<?xml version="1.0" encoding="UTF-8" standalone="no"?><rss xmlns:arxiv="http://arxiv.org/schemas/atom" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>arXiv</title>
    <link>http://rss.arxiv.org/rss/cs</link>
    <description>arXiv!</description>
    <atom:link href="http://rss.arxiv.org/rss/cs" rel="self" type="application/rss+xml"/>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <language>en-us</language>
    <lastBuildDate>Thu, 07 May 2026 04:00:03 +0000</lastBuildDate>
    <managingEditor>rss-help@arxiv.org</managingEditor>
    <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
    <skipDays>
      <day>Saturday</day>
      <day>Sunday</day>
    </skipDays>
    <xhtml:meta content="noindex" name="robots" xmlns:xhtml="http://www.w3.org/1999/xhtml"/>
    <item>
      <title>LCM: Lossless Context Management</title>
      <link>https://arxiv.org/abs/2605.04050</link>
      <description>arXiv:2605.04050v1 Announce Type: new 
Abstract: We introduce Lossless Context Management (LCM), a deterministic architecture for LLM memory that outperforms Claude Code on long-context tasks. When benchmarked using Opus 4.6, our LCM-augmented coding agent, Volt, achieves higher scores than Claude Code on the OOLONG long-context eval, including at every context length between 32K and 1M tokens.
  LCM can be viewed as both a vindication and an extension of the recursive paradigm pioneered by Recursive Language Models (RLMs). Our results demonstrate that recursive context manipulation can outperform not just conventional LLMs, but frontier coding agents with native file-system access.
  LCM departs from RLM by decomposing symbolic recursion into two deterministic, engine-managed mechanisms: recursive context compression, in which a hierarchical summary DAG automatically compacts older messages while retaining lossless pointers to every original message; and recursive task partitioning, in which engine-managed parallel primitives such as LLM-Map replace model-written loops. This trade-off, analogous to the move from GOTO to structured control flow in programming language design, sacrifices maximal flexibility in exchange for termination guarantees, zero-cost continuity on short tasks, and lossless retrievability of all prior state.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04050v1</guid>
      <category>cs.AI</category>
      <category>cs.PL</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Clint Ehrlich, Theodore Blackman</dc:creator>
    </item>
    <item>
      <title>Constraint-Aware Execution Planning for Hybrid Space-Ground Compute Workloads</title>
      <link>https://arxiv.org/abs/2605.04052</link>
      <description>arXiv:2605.04052v1 Announce Type: new 
Abstract: Low Earth orbit (LEO) satellites increasingly carry compute hardware capable of on-board processing, yet each satellite generates roughly two orders of magnitude more data than it can downlink per orbit. This mismatch forces operators to decide, for every workload, which computation runs on-board and which runs on the ground, how intermediate data crosses the space-ground boundary through narrow contact windows, and how to maintain delivery guarantees over noisy channels. We present Constraint-Aware Execution (CAE), a planning system that takes a satellite identifier, a workload expressed as a directed acyclic graph of processing steps, and a set of orbital and resource constraints, and produces a deterministic, physically grounded execution plan. CAE operates in four phases: (1) orbital environment construction via SGP4 propagation with eclipse detection and ground station pass prediction, (2) compute placement using a cost model that compares on-board resource consumption against transfer overhead, (3) transfer insertion with adaptive forward error correction (FEC) and security overhead modeling, and (4) greedy first-fit scheduling into orbital windows under power, thermal, compute, and communication constraints. We evaluate CAE on five representative workload patterns across satellites in distinct orbital regimes and show that the system produces feasible plans in under two seconds, correctly exploits on-board data reduction to minimize transfer volume, and adapts FEC and multi-pass allocation to varying channel conditions. CAE is deployed as a production API that computes plans for any cataloged satellite using live two-line element data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04052v1</guid>
      <category>cs.DC</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Subhadip Mitra</dc:creator>
    </item>
    <item>
      <title>Endogenous Regime Switching Driven by Scalar-Irreducible Learning Dynamics</title>
      <link>https://arxiv.org/abs/2605.04054</link>
      <description>arXiv:2605.04054v1 Announce Type: new 
Abstract: Achieving endogenous regime switching is crucial for the emergence of autonomous intelligence, yet remains a central challenge for existing machine learning frameworks, where such transitions are typically externally imposed. In this work, we introduce a classification that distinguishes scalar-reducible dynamics, which can be expressed as gradient flows driven by a scalar objective, from scalar-irreducible dynamics that cannot be reduced to such a form. While most existing machine learning systems operate within the scalar-reducible class, we demonstrate that scalar-irreducible dynamics naturally enable internally generated regime switching through feedback between fast dynamical variables and slow structural adaptation. Using a minimal dynamical model, we illustrate how this mechanism produces sustained endogenous regime transitions without external scheduling. Our results suggest a new dynamical paradigm for regime exploration and provide a potential route toward autonomous learning systems whose adaptive behavior is organized internally rather than externally prescribed.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04054v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sheng Ran</dc:creator>
    </item>
    <item>
      <title>A Self-Attentive Meta-Optimizer with Group-Adaptive Learning Rates and Weight Decay</title>
      <link>https://arxiv.org/abs/2605.04055</link>
      <description>arXiv:2605.04055v1 Announce Type: new 
Abstract: Adaptive optimizers such as AdamW apply uniform hyperparameters across all parameter groups, ignoring heterogeneous optimization dynamics across layers and modules. We address this limitation by proposing MetaAdamW, a new optimizer that integrates a self-attention mechanism to dynamically modulate per-group learning rates and weight decay. The modulation factors are produced by a lightweight Transformer encoder that operates on statistical features (gradient norms, momentum norms, correlations) extracted from each parameter group. To train the attention module, we introduce a meta-learning objective that combines gradient alignment, loss decrease, and generalization gap. A key contribution is the extension of homoscedastic uncertainty weighting (HUW) with task-specific priorities that directly scale the regularization terms, enabling domain knowledge to guide automatic loss balancing. Extensive experiments on five diverse tasks (time series forecasting on ETT, language modeling on WikiText-2, machine translation on Multi30k, image classification on CIFAR-10, and sentiment analysis on IMDB) demonstrate that MetaAdamW consistently outperforms the standard AdamW baseline in terms of validation loss, accuracy, or perplexity. Depending on the task, MetaAdamW either reduces overall training time (by up to 17.11%) or improves performance (by up to 11.08%) while introducing only moderate overhead; in some cases, it also mitigates insufficient convergence caused by premature early stopping. Ablation studies validate the effectiveness of each component, including feature versions, grouping strategies, and the proposed priority-injected uncertainty weighting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04055v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>JiangBo Zhao, ZhaoXin Liu</dc:creator>
    </item>
    <item>
      <title>Transformation Categorization Based on Group Decomposition Theory Using Parameter Division</title>
      <link>https://arxiv.org/abs/2605.04056</link>
      <description>arXiv:2605.04056v1 Announce Type: new 
Abstract: Representation learning seeks meaningful sensory representations without supervision and can model aspects of human development. Although many neural networks empirically learn useful features, a principled account of what makes a representation "good" remains elusive. We study unsupervised categorization of transformations between pairs of inputs under algebraic constraints. Classical disentanglement favors mutually independent factors and fails when factors are coupled. Our prior Galois-theoretic approach decomposes a group via normal subgroups by learning a product of two transformations with one factor constrained to a normal subgroup, covering both commutative and non-commutative cases. That method, however, relied on auxiliary assumptions (e.g., motion and isometry restrictions) not required by decomposition theory, and its ablations did not separate theory-based effects from auxiliary ones. We propose parameter division for a single transformation: we split its parameter into components, impose homomorphism constraints mapping the full transformation to one component, and identify the normal subgroup as the set of transformations obtained when that component is fixed to the identity. This formulation drops the previous auxiliary assumptions and applies more broadly. We evaluate on image pairs involving rotation, translation, and scale; ablations show that group-decomposition constraints drive appropriate categorization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04056v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Takayuki Komatsu, Yoshiyuki Ohmura, Yasuo Kuniyoshi</dc:creator>
    </item>
    <item>
      <title>Structured Progressive Knowledge Activation for LLM-Driven Neural Architecture Search</title>
      <link>https://arxiv.org/abs/2605.04057</link>
      <description>arXiv:2605.04057v1 Announce Type: new 
Abstract: This paper focuses on a key challenge in Neural Architecture Search (NAS): integrating established architectural knowledge while exploring new designs under expensive evaluations. Large language models (LLMs) are promising assistants for NAS because they can translate rich architectural and coding priors into executable code edits. However, in practice, seemingly local revisions often propagate into non-local behavioral and performance shifts, because a single edit can inadvertently couple multiple interacting functional factors, a phenomenon we refer to as functional entanglement. To make LLM knowledge usable under such entanglement, we propose Structured Progressive Knowledge Activation (SPARK), which activates relevant priors by explicitly selecting the functional factor to modify and conditioning the edit on that factor. This factor-conditioned editing reduces entangled side effects and yields more targeted, reliable architecture modifications. On CLRS-DFS, SPARK achieves a 28.1x speedup in sample-efficient architecture evolution and a 22.9 percent relative improvement in out-of-distribution (OOD) accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04057v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhen Liu, Yuhan Liu, Jingwen Fu</dc:creator>
    </item>
    <item>
      <title>MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning</title>
      <link>https://arxiv.org/abs/2605.04058</link>
      <description>arXiv:2605.04058v1 Announce Type: new 
Abstract: Parameter-efficient transfer learning (PETL) has emerged as a pivotal paradigm for adapting pre-trained foundation models to downstream tasks; it significantly reduces trainable parameters but suffers from substantial memory overhead caused by gradient backpropagation during fine-tuning. While memory-efficient transfer learning (METL) circumvents this challenge by bypassing backbone gradient computation via lightweight side networks, its stringent memory constraint severely limits the learning capacity of those side networks, thereby significantly compromising performance. To address these limitations, we propose a novel Mixed-Precision Interactive Side Mixture-of-Experts framework (MP-ISMoE). Specifically, we first propose a Gaussian Noise Perturbed Iterative Quantization (GNP-IQ) scheme that quantizes weights to lower bit-widths while effectively reducing quantization error. Leveraging the memory saved by GNP-IQ, we then employ an Interactive Side Mixture-of-Experts (ISMoE) to scale up side networks without sacrificing overall memory efficiency. Unlike conventional mixture-of-experts, ISMoE learns to select optimal experts by interacting with salient features from frozen backbones, thus suppressing knowledge forgetting and boosting performance. Extensive experiments across diverse vision-language and language-only tasks demonstrate that MP-ISMoE substantially improves accuracy over state-of-the-art METL approaches while maintaining comparable parameter and memory efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04058v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yutong Zhang, Zimeng Wu, Shangcai Liao, Shujiang Wu, Jiaxin Chen</dc:creator>
    </item>
    <item>
      <title>Continual Distillation of Teachers from Different Domains</title>
      <link>https://arxiv.org/abs/2605.04059</link>
      <description>arXiv:2605.04059v1 Announce Type: new 
Abstract: Deep learning models continue to scale, with some requiring more storage than many large-scale datasets. We therefore introduce a new paradigm, Continual Distillation (CD), in which a student learns sequentially from a stream of teacher models without retaining access to earlier teachers. CD faces two challenges: teacher training data is unavailable, and teachers have varying expertise. We show that external unlabeled data enables Unseen Knowledge Transfer (UKT), allowing the student to acquire information from domains that are absent from the training data but known to the teacher. We also show that sequential distillation causes Unseen Knowledge Forgetting (UKF), in which transferred knowledge is lost after training on later teachers. To better balance UKT and UKF, we propose Self External Data Distillation (SE2D), a method that preserves logits on external data to stabilize learning across heterogeneous teachers. Experiments on multiple benchmarks show that SE2D reduces UKF and improves cross-domain generalization. The code and implementation for this work are publicly available at: https://github.com/Nicolas1203/continual_distillation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04059v1</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nicolas Michel, Maorong Wang, Jiangpeng He, Toshihiko Yamasaki</dc:creator>
    </item>
    <item>
      <title>Lookahead Drifting Model</title>
      <link>https://arxiv.org/abs/2605.04060</link>
      <description>arXiv:2605.04060v1 Announce Type: new 
Abstract: Recently, a new paradigm named \emph{drifting model} has been proposed for mapping distributions; it achieves state-of-the-art (SOTA) image generation performance on ImageNet with a single neural function evaluation (NFE). The basic idea is to compute a drifting term at each training iteration and then push the output of the model in the direction of the drifting term. In this paper, we propose a \emph{lookahead drifting model}. At each training iteration, we compute a set of drifting terms sequentially. Each drifting term is calculated using the previously computed ones as well as the positive samples and the output of the model. In principle, the drifting terms obtained at a later stage capture higher-order gradient information towards the positive samples. At each training iteration, the model is optimized by pushing its output in the direction of the (weighted) sum of the drifting terms. Experimental results on toy examples and CIFAR10 show that the new method outperforms the baseline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04060v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Guoqiang Zhang, Kenta Niwa, W. Bastiaan Kleijn</dc:creator>
    </item>
    <item>
      <title>Single-Position Intervention Fails: Distributed Output Templates Drive In-Context Learning</title>
      <link>https://arxiv.org/abs/2605.04061</link>
      <description>arXiv:2605.04061v1 Announce Type: new 
Abstract: Understanding how large language models encode task identity from few-shot demonstrations is a central open problem in mechanistic interpretability. Prior work uses linear probing to localize task representations, reporting high classification accuracy at specific layers. We reveal a striking dissociation: probing accuracy completely fails to predict causal importance. Single-position activation intervention achieves 0% task transfer across all 28 layers of Llama-3.2-3B, despite 100% probing accuracy at those same positions. This null result is itself a key finding, demonstrating that task encoding is fundamentally distributed. Multi-position intervention, which replaces activations at all demonstration output tokens simultaneously, achieves up to 96% transfer (N=50, 95% CI: [87%, 99%]) at layer 8, pinpointing for the first time the causal locus of in-context learning (ICL) task identity. We establish the generality of these findings across four models spanning three architecture families (LLaMA, Qwen, Gemma), discovering a universal intervention window at ~30% network depth. Causal tracing uncovers an asymmetric architecture: the query position is strictly necessary (53-100% disruption), while no individual demonstration position is necessary (0% disruption), resolving a key ambiguity in prior accounts. Crucially, transfer depends on internal representation compatibility, not surface similarity (r=-0.05 vs r=0.31), ruling out trivial explanations. These results establish the distributed template hypothesis: ICL task identity is encoded as output format templates distributed across demonstration tokens, fundamentally reshaping our understanding of how in-context learning operates.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04061v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bryan Cheng, Jasper Zhang</dc:creator>
    </item>
    <item>
      <title>EdgeRazor: A Lightweight Framework for Large Language Models via Mixed-Precision Quantization-Aware Distillation</title>
      <link>https://arxiv.org/abs/2605.04062</link>
      <description>arXiv:2605.04062v1 Announce Type: new 
Abstract: Recent years have witnessed increasing interest in deploying LLMs on resource-constrained devices, for which quantization has emerged as a promising lightweight technique that converts full-precision model weights and activations into lower-bit formats. Existing weight quantization approaches can be roughly divided into three categories: Post-Training Quantization (PTQ), which calibrates quantized parameters on a small dataset without retraining but suffers severe performance degradation below 4 bits; Quantization-Aware Training (QAT), which searches for low-bit parameters using surrogate gradients but demands substantial computational resources; and Quantization-Aware Distillation, which integrates QAT with knowledge transfer from a full-precision teacher but manually selects the features to distill and relies heavily on teacher-specific data. In this paper, we propose EdgeRazor, a lightweight framework for LLMs with mixed-precision and extremely low-bit weight quantization. The EdgeRazor framework contains three modules: Mixed-Precision Quantization-Aware Distillation for fine-grained control of precision; Adaptive Feature Distillation, which derives an $n$-bit student from its 16-bit teacher; and Entropy-Aware KL Divergence on both human-annotated and distilled datasets, whose forward-reverse balance is determined solely by the teacher's output distribution. Empirical investigations of EdgeRazor are conducted on base, instruction-tuned, and multimodal LLMs. Notably, EdgeRazor at 1.88 bits surpasses all 3-bit competitors and outperforms the leading 2-bit PTQ methods by 11.3 points, with a 4-10$\times$ lower training budget than the leading QAT approach. EdgeRazor delivers higher compression ratios at all bit widths; the 1.58-bit Qwen3-0.6B reduces storage from 1.41 GB to 0.28 GB while accelerating decoding by 15.1$\times$ relative to the 16-bit baseline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04062v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shu-Hao Zhang, Le-Tong Huang, Xiang-Sheng Deng, Xin-Yi Zou, Chen Wu, Nan Li, Shao-Qun Zhang</dc:creator>
    </item>
    <item>
      <title>Investigating Trustworthiness of Nonparametric Deep Survival Models for Alzheimer's Disease Progression Analysis</title>
      <link>https://arxiv.org/abs/2605.04063</link>
      <description>arXiv:2605.04063v1 Announce Type: new 
Abstract: Alzheimer's Disease (AD) is a progressive neurodegenerative disease marked by irreversible decline, making reliable modeling of its progression essential for effective patient care. Progression-aware methods such as survival analysis are therefore crucial tools for the early detection and monitoring of AD. Recent advances in deep learning have demonstrated remarkable performance in survival tasks, but comparatively few studies have been conducted in the domain of AD. Further, existing studies do not consider bias learned by the model itself, which could result in unfair and unreliable predictions for certain marginalized groups. We therefore conduct a rigorous study of fairness in AD progression analysis, along with a thorough feature importance study to determine the characteristics that are most important for reliable AD predictions. Furthermore, we propose two novel fairness metrics, Time-Dependent Concordance Impurity and Kaplan-Meier Fairness, to quantify bias with respect to sensitive attributes such as sex, race, and education in nonparametric survival models. Our study demonstrates that while deep-learning-powered survival models are robust tools that can aid clinicians in AD care decisions, they often exhibit considerable bias, which represents an important avenue for future research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04063v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jacob Thrasher, Kaitlyn Heintzelman, Peter Martone, David Kotlowski, Binod Bhattarai, Donald Adjeroh, Prashnna Gyawali</dc:creator>
    </item>
    <item>
      <title>Improving Medical VQA through Trajectory-Aware Process Supervision</title>
      <link>https://arxiv.org/abs/2605.04064</link>
      <description>arXiv:2605.04064v1 Announce Type: new 
Abstract: Reasoning capabilities are crucial for reliable medical visual question answering (VQA); however, existing datasets rarely include reasoning explanations.
  We address this by generating reasoning trajectories for six medical VQA benchmarks using the COMCTS algorithm with open-source vision-language models, with an LLM serving as the verification judge.
  Building on these generated datasets, we propose a two-stage training framework: supervised fine-tuning followed by Group Relative Policy Optimization (GRPO) with a novel process-based reward.
  While standard approaches rely solely on exact-match rewards for final answers, we introduce a trajectory-aware reward that measures the similarity between generated and ground-truth reasoning processes.
  Specifically, we embed reasoning steps using sentence transformers and compute the Dynamic Time Warping (DTW) distance between the resulting vector sequences.
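  As an illustration only, the DTW distance between two sequences of step embeddings can be sketched as below. This is a minimal sketch, not the authors' implementation: it assumes each reasoning step is already embedded as a fixed-length vector, uses Euclidean distance as the local cost, and maps distance to a reward with exp(-d / scale); the names dtw_distance and process_reward and the scale parameter are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of vectors.

    Assumption: the local cost is the Euclidean distance between step
    embeddings (e.g. sentence-transformer vectors of reasoning steps).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: minimal cumulative cost of aligning the first i steps of a
    # with the first j steps of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # extend the cheapest predecessor: skip in a, skip in b, or match
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def process_reward(generated_steps, reference_steps, scale=1.0):
    """Hypothetical mapping from DTW distance to a reward in (0, 1]."""
    return math.exp(-dtw_distance(generated_steps, reference_steps) / scale)
```

  Identical step sequences yield a distance of 0.0 and therefore a reward of 1.0; DTW lets a generated trajectory with more or fewer steps than the reference still align with it, which a position-by-position comparison would not allow.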
  Experiments across six benchmarks demonstrate that combining the DTW-based process reward with exact-match reward consistently outperforms SFT-only training, raising mean accuracy from 0.598 to 0.689, mean BERTScore from 0.845 to 0.881, and mean ROUGE-L from 0.665 to 0.748.
  Our results highlight the importance of process supervision in training reasoning-capable medical VLMs.
  We make our code and generated reasoning datasets publicly available at https://anonymous.4open.science/r/MICCAI-R1-MED-VQA-code-B14B/</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04064v1</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Halil Ibrahim Gulluk, Olivier Gevaert</dc:creator>
    </item>
    <item>
      <title>Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs</title>
      <link>https://arxiv.org/abs/2605.04065</link>
      <description>arXiv:2605.04065v1 Announce Type: new 
Abstract: Unsupervised reinforcement learning (RL) has emerged as a promising paradigm for enabling self-improvement in large language models (LLMs). However, existing unsupervised RL-based methods often cannot adapt to the model's evolving reasoning capabilities during training and can therefore misdirect policy optimization in the absence of ground-truth supervision. To address this issue, we introduce FREIA, a novel RL-based algorithm built on two key innovations: (1) Free Energy-Driven Reward (FER), which adapts rewards to balance consensus and exploration based on the Free Energy Principle; and (2) Adaptive Advantage Shaping (AAS), which adaptively adjusts learning signals based on the statistical characteristics of sampled rewards. Empirical evaluations on nine datasets across three reasoning tasks show that FREIA outperforms other unsupervised RL-based baselines. Notably, in mathematical reasoning tasks, FREIA surpasses other methods by an average of 0.5 to 3.5 points in Pass@1 using the DeepSeek-R1-Distill-Qwen-1.5B model.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04065v1</guid>
      <category>cs.CL</category>
      <category>cs.ET</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yiming Huang, Zhenbo Shi, Xin-Cheng Wen, Jichuan Zeng, Cuiyun Gao, Peiyi Han, Chuanyi Liu</dc:creator>
    </item>
    <item>
      <title>Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning</title>
      <link>https://arxiv.org/abs/2605.04066</link>
      <description>arXiv:2605.04066v1 Announce Type: new 
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is an essential paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs). However, existing methods typically rely on static policy optimization schemes that are misaligned with the model's evolving reasoning capabilities. To address this issue, we propose Adaptive Power-Mean Policy Optimization (APMPO), which comprises two main innovations: Power-Mean Policy Optimization (PMPO) and Feedback-Adaptive Clipping (FAC). Specifically, PMPO introduces a generalized power-mean objective that enables the model to adaptively transition from the signal-amplifying behavior of the arithmetic mean to the consistency-enforcing behavior of the geometric mean. FAC adaptively adjusts clipping bounds based on real-time reward statistics to overcome the limitations of static clipping mechanisms. Together, these innovations allow APMPO to improve learning dynamics and reasoning performance. Extensive experiments on nine datasets across three reasoning tasks demonstrate the superiority of APMPO over state-of-the-art RLVR-based baselines. For instance, APMPO boosts the average Pass@1 score on mathematical reasoning benchmarks by 3.0 points compared to GRPO when using Qwen2.5-3B-Instruct.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04066v1</guid>
      <category>cs.CL</category>
      <category>cs.ET</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yiming Huang, Zhenbo Shi, Shuzheng Gao, Cuiyun Gao, Peiyi Han, Chuanyi Liu</dc:creator>
    </item>
    <item>
      <title>SemiConLens: Visual Analytics for 2D Semiconductor Discovery</title>
      <link>https://arxiv.org/abs/2605.04067</link>
      <description>arXiv:2605.04067v1 Announce Type: new 
Abstract: The past few years have witnessed vibrant efforts from both academia and industry to discover new two-dimensional (2D) semiconductor materials, owing to their promising potential to address the severe performance deterioration that traditional semiconductors suffer as silicon thickness shrinks. However, existing methods (e.g., Density Functional Theory (DFT) or machine-learning-based approaches) face challenges such as small datasets and issues of reliability and trustworthiness. To bridge this gap, we propose SemiConLens, a visual analytics approach that combines human expertise with the power of ML to enable effective and reliable 2D semiconductor discovery. Specifically, we first develop a new Correlation Aware Multivariate Imputation (CAMI) method and use ML models such as autoencoders, which can better learn from limited data and reveal uncertainty, to address the challenge of sparse data in semiconductivity prediction. Building on this, our visualization module, consisting of three visualization views with linked interactions, allows material researchers to interactively filter, discover, and compare 2D semiconductor candidates. A novel circular glyph design and a new cluster-aware layout optimization approach are proposed to effectively display all user-configurable key attributes and possible prediction uncertainties of each semiconductor candidate, ensuring reliable and trustworthy 2D semiconductor discovery. We assess SemiConLens through quantitative evaluations, expert interviews, and use cases. The results demonstrate SemiConLens's capability to help material researchers effectively discover desirable 2D semiconductors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04067v1</guid>
      <category>cs.HC</category>
      <category>cond-mat.mtrl-sci</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kavinda Athapaththu, Shiwei Chen, Yuan Fang, Sanchali Mitra, Yee Sin Ang, Yong Wang</dc:creator>
    </item>
    <item>
      <title>Designing a double deep reinforcement learning selection tool for resilient demand prediction</title>
      <link>https://arxiv.org/abs/2605.04068</link>
      <description>arXiv:2605.04068v1 Announce Type: new 
Abstract: The use of artificial intelligence in supply chain forecasting has been the subject of scientific study for several decades. However, selecting an appropriate forecasting solution remains a daunting task, because each dataset has its own distinct features. Research on this issue dates back to the 1980s, but recent developments in demand forecasting have opened new perspectives. This research aims to enhance automatic forecasting model selection by proposing a novel architecture that acts as a double deep reinforcement learning agent, automatically selecting a forecasting model from the forecasting committee at prediction time. Moreover, a novel early-stopping approach based on average reward convergence is introduced to reduce training time. To evaluate the model's performance, an empirical study was conducted on grocery sales and snack demand datasets. The experimental results demonstrate the robustness of the proposed approach compared with state-of-the-art methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04068v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1080/21681015.2025.2580997</arxiv:DOI>
      <dc:creator>Bilel Abderrahmane Benziane, Benoit Lardeux, Ayoub Mcharek, Maher Jridi</dc:creator>
    </item>
    <item>
      <title>LAWS: Learning from Actual Workloads Symbolically -- A Self-Certifying Parametrized Cache Architecture for Neural Inference, Robotics, and Edge Deployment</title>
      <link>https://arxiv.org/abs/2605.04069</link>
      <description>arXiv:2605.04069v1 Announce Type: new 
Abstract: We introduce LAWS (Learning from Actual Workloads Symbolically), a self-certifying inference caching architecture that builds a growing library of certified expert functions from deployment observations. Each expert covers a region of input space defined by a node in the Probabilistic Language Trie (PLT) of the base model and carries a formal error bound that holds uniformly over all inputs. The central result is a self-certification theorem: for any input x, the LAWS approximation error is bounded by epsilon_fit + 2*Lambda(W)*C_E, where Lambda(W) is the model's Lipschitz constant, C_E is the maximum embedding diameter, and epsilon_fit is the expert training error; all three quantities are checkable at deployment time without ground truth. We prove that LAWS generalizes both Mixture-of-Experts and KV prefix caching as special cases and is strictly more expressive than any fixed-K MoE or finite cache. Further results include a monotone hit rate theorem (any-match routing ensures that coverage can only increase), an expert library growth rate of O(2^H log N) where H is workload entropy, a fleet learning convergence theorem with Omega(K) speedup for K-unit fleets, and an over-the-air update bandwidth bound. We conjecture that LAWS is acquisition-optimal among stationary online caching algorithms and that the effective Lipschitz constant on the training distribution grows polynomially rather than exponentially in depth. Applications are developed for LLM inference, robotic control, and multi-agent edge deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04069v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.IT</category>
      <category>cs.NE</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Gregory Magarshak</dc:creator>
    </item>
    <item>
      <title>Toward Human-AI Complementarity Across Diverse Tasks</title>
      <link>https://arxiv.org/abs/2605.04070</link>
      <description>arXiv:2605.04070v1 Announce Type: new 
Abstract: Human-AI complementarity, the idea that combining human and AI judgments can outperform either alone, offers a promising pathway toward robust oversight of advanced AI systems. However, whether human-AI complementarity can be achieved on realistic tasks remains an open question. We investigate this through two approaches, hybridization and AI assistance, the latter via two methods (top-2 assistance and subtask delegation), evaluated on a multi-domain dataset of 1,886 samples spanning knowledge, factuality, long-context reasoning, and deception detection. We find only modest complementarity gains. Baseline hybridization yields just +0.4 percentage points (pp) over AI alone (69.3\% vs 68.9\%). This gain is limited by both a small complementarity region (only 8.9\% of items where AI errs but humans do not) and the inability of confidence-based routing to identify it, since the model's confidence is similarly distributed across correct and incorrect predictions. Applied when AI has low confidence, top-2 assistance increases human accuracy from 28.4\% to 38.3\%, surpassing AI alone (37.7\%), but primarily because humans adopt correct AI suggestions, not because they successfully override AI errors. These findings suggest that the primary bottleneck is not human task accuracy per se, but the ability to route decisions to humans when it matters and to design assistance methods that enable humans to catch AI mistakes. Our quantitative and qualitative analyses pinpoint where and why each method succeeds or fails, offering concrete targets for future work. We will make our dataset and code available upon request to support progress toward more effective human-AI collaboration for AI oversight.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04070v1</guid>
      <category>cs.HC</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuzheng Xu, Annya Dahmani, Matthew D. Blanchard, Niclas Dern, Edy Nastase, Francesca Bianco, Maja Pavlovic, Sukanya Krishna, Eric Modesitt, Miranda Anna Christ, Arth Singh, Gaia Molinaro, Sikata Bela Sengupta, Jaji Pamarthi, Arjun Menon, Rishub Jain</dc:creator>
    </item>
    <item>
      <title>FlatASCEND: Autoregressive Clinical Sequence Generation with Continuous Time Prediction and Association-Based Pharmacological Testing</title>
      <link>https://arxiv.org/abs/2605.04071</link>
      <description>arXiv:2605.04071v1 Announce Type: new 
Abstract: Autoregressive models can predict clinical events, but generating patient-conditioned multi-step trajectories that respond to intervention tokens, and testing whether those responses preserve known pharmacological associations, has received limited attention. We present FlatASCEND, a 14.5M-parameter autoregressive clinical sequence model using flat composite tokens and a zero-inflated log-normal time head. Standard distributional metrics (Jaccard 0.889-0.954) do not distinguish FlatASCEND from trivial baselines; the model's value lies in conditional generation from patient-specific prefixes. A prompt-shuffle ablation shows that patient-specific conditioning amplifies mechanistic pharmacological effects (2.0-2.2x for steroid to glucose and diuretic to potassium) while leaving confounding-driven associations unchanged (0.9x for insulin to glucose). An incident-user framework assesses directional consistency against prior pharmacological knowledge on MIMIC-IV (N=500 per comparison): 4/10 comparisons recover correct mechanistic directions, 2 reproduce treatment-context associations, and 4 are incorrect (9/10 significant, Wilcoxon p&lt;0.05). This pattern, partial recovery under residual confounding, is consistent with learned observational associations without causal distinction. Direct preference optimisation with a surrogate reward destroys all correct associations (3/3 to 0/3), illustrating reward exploitation when reward and evaluation share an outcome domain. Generative evidence is strongest for short-horizon ICU data; outpatient temporal fidelity is weaker (median 10 vs 154 days on INSPECT), and zero-shot cross-site transfer degrades without adaptation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04071v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>q-bio.QM</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chris Sainsbury, Feng Dong, Andreas Karwath</dc:creator>
    </item>
    <item>
      <title>Sparse Autoencoder Decomposition of Clinical Sequence Model Representations: Feature Complexity, Task Specialisation, and Mortality Prediction</title>
      <link>https://arxiv.org/abs/2605.04072</link>
      <description>arXiv:2605.04072v1 Announce Type: new 
Abstract: Sparse autoencoders (SAEs) have been applied to large language models and protein language models, but not systematically to electronic health record (EHR) foundation models. We train TopK SAEs on FlatASCEND, a 14.5-million-parameter autoregressive clinical sequence model, at all 10 residual stream extraction points on INSPECT (outpatient) and MIMIC-IV (ICU). SAE decomposition reveals progressive abstraction across transformer depth: layer-0 features are near-perfect token detectors (45.7% singleton), while layer-6 features span approximately 30 token types across multiple clinical categories (0.5% singleton). Under full-sequence simple linear probes, SAE features outperform dense representations for discrete event prediction (mortality), while dense representations perform better for continuous magnitude prediction (length of stay). This probe-level representational phenomenon does not extend to clinically relevant leakage-safe windows, where dense representations match or exceed SAE features in all tested settings (eICU-CRD 48-hour AUC: SAE 0.871 versus dense 0.880, with the base model applied zero-shot and SAE dictionaries trained on eICU activations; MIMIC-IV: 0.836 versus 0.914; INSPECT 1-year/3-year: 0.697 versus 0.800). A delta-mode intervention method reduces SAE perturbation noise by 86x, enabling cleaner feature-level experiments; the resulting perturbation effects exceed random controls in 3 of 4 conditions but are not formally significant. Feature reproducibility across random seeds is 21%, so individual features should be interpreted as illustrative rather than stable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04072v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chris Sainsbury, Feng Dong, Andreas Karwath</dc:creator>
    </item>
    <item>
      <title>Confronting Label Indeterminacy in Automated Bail Decisions</title>
      <link>https://arxiv.org/abs/2605.04073</link>
      <description>arXiv:2605.04073v1 Announce Type: new 
Abstract: Bail decisions present a fundamental challenge for data-driven decision support systems. When bail is denied, the counterfactual outcome of whether the defendant would have appeared in court remains unobserved. As a result, historical bail data embed structural label indeterminacy: future decisions are influenced by past decisions whose outcomes are only partially knowable. Building automated systems on such data risks introducing bias and reinforcing feedback loops. This raises a core question for machine-learning systems intended to assist judicial actors: how should cases in which bail was denied be treated during model development? In a case study of bail decisions from the Unified Judicial System of Pennsylvania, we evaluate five contemporary approaches to handling label indeterminacy across three machine learning models, including a novel label imputation method motivated by the dynamics of bail decisions. Each method relies on unverifiable assumptions, yet all influence the models' predictive behaviour, sometimes even more than the choice of model itself. Explainable AI analysis further reveals that these effects extend to the models' internal decision-making processes. Finally, we consider the notion of label indeterminacy from a legal perspective and assess the legitimacy of these approaches in the context of bail decision-making.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04073v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Cor Steging, Tadeusz Zbiegień</dc:creator>
    </item>
    <item>
      <title>A Physics-Aware Framework for Short-Term GPU Power Forecasting of AI Data Centers</title>
      <link>https://arxiv.org/abs/2605.04074</link>
      <description>arXiv:2605.04074v1 Announce Type: new 
Abstract: AI data centers experience rapid fluctuations in power demand due to the heterogeneity of the computational tasks they support. For example, the power profiles of large language model (LLM) inference and training differ markedly, and large divergences can destabilize the underlying electricity grid. In this paper we propose, to the best of our knowledge, the first physics-informed DLinear time-series model that can accurately forecast the power utilization of an AI data center 5-80 minutes into the future (short-term forecasting). The physics, based on a multi-node lumped thermal resistance-capacitance (RC) network consistent with Newton's law of cooling, is captured by newly derived time-dependent ordinary differential equations (ODEs) that separately model and interlink power consumption with GPU compute utilization, memory utilization, and temperature. The resulting model, which we call PI-DLinear, was trained and evaluated on a real AI data center dataset. It is not only more accurate than the state-of-the-art (SOTA) models tested, but its forecast profile also respects the underlying physics under power throttling and load transient events. Relative to the SOTA transformer-based and non-transformer-based models, improvements in forecasting accuracy (averaged across all look-back and prediction windows) range from 0.782%-39.08% for MSE, 0.993%-51.82% for MAE, and 0.370%-22.28% for RMSE.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04074v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CE</category>
      <category>cs.DC</category>
      <category>cs.ET</category>
      <category>cs.OS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mohammad AlShaikh Saleh, Sanjay Chawla, Sertac Bayhan, Haitham Abu-Rub, Ali Ghrayeb</dc:creator>
    </item>
    <item>
      <title>RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction</title>
      <link>https://arxiv.org/abs/2605.04075</link>
      <description>arXiv:2605.04075v1 Announce Type: new 
Abstract: Multimodal Large Language Models face severe challenges in computational efficiency and memory consumption due to the substantial growth of the visual KV cache when processing long visual contexts. Existing KV cache compression methods typically rely on the "persistence of importance" hypothesis to prune tokens. However, this approach proves fragile in multimodal settings due to two key issues: 1) Visual tokens display "deferred importance": they initially exhibit low salience but become pivotal during later decoding, which can lead to premature eviction. 2) Discrete pruning disrupts the inherent spatial continuity of visual cues. To address these challenges, we propose RetentiveKV, an entropy-driven KV cache optimization method that reformulates KV eviction from "discrete context truncation" to "continuous memory evolution" based on State Space Models. Our method leverages information entropy to quantify the information potential of low-attention tokens and integrates tokens scheduled for eviction into a continuous state space through entropy-guided state transitions, enabling their dynamic reactivation when semantic relevance arises during subsequent decoding. Extensive experiments on multimodal benchmarks demonstrate that RetentiveKV achieves 5.0x KV cache compression and a 1.5x decoding speedup.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04075v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sihao Liu, YuFan Xiong, Zhonghua Jiang, Zhaode Wang, Chengfei Lv, Shengyu Zhang</dc:creator>
    </item>
    <item>
      <title>A Regulatory Governance Framework for AI-Driven Financial Fraud Detection in U.S. Banking: Integrating OCC, SR 11-7, CFPB, and FinCEN Compliance Requirements for Model Development, Validation, and Monitoring Lifecycles</title>
      <link>https://arxiv.org/abs/2605.04076</link>
      <description>arXiv:2605.04076v1 Announce Type: new 
Abstract: U.S. financial institutions deploying AI-based fraud detection face a fragmented compliance landscape spanning four regulatory frameworks (OCC Bulletin 2011-12, SR 11-7, the CFPB AI circular, and FinCEN BSA/SAR requirements), with no integrated governance life cycle connecting these requirements to model development, validation, and monitoring practice. This paper presents the Regulatory Governance Framework for AI-Driven Financial Fraud Detection (RGF-AFFD), a three-tier governance architecture anchored in a multi-study empirical program. Using the IEEE-CIS dataset (590,540 transactions) and the ULB benchmark (284,807 transactions), we benchmark six architectures, including an LSTM+XGBoost ensemble, and conduct ablation, temporal drift, SHAP interpretability, and BISG fairness analyses. The LSTM+XGBoost ensemble achieves a ROC-AUC of 0.9289 (F1: 0.6360) with a benefit-cost ratio of 6:1. XGBoost demonstrates the strongest temporal stability (delta-AUC = -0.0017 versus -0.0626 for LSTM). The RDT-FG Regulatory Digital Twin meta-model translates metrics into four regulator-specific health scores and a composite Regulatory Fitness Index for continuous compliance monitoring. The RGF-AFFD is the first integrated deployment blueprint to simultaneously satisfy OCC, SR 11-7, CFPB, and FinCEN requirements, supported by a community bank implementation vignette and four evidence-based policy recommendations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04076v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mohammad Nasir Uddin</dc:creator>
    </item>
    <item>
      <title>Balanced Aggregation: Understanding and Fixing Aggregation Bias in GRPO</title>
      <link>https://arxiv.org/abs/2605.04077</link>
      <description>arXiv:2605.04077v1 Announce Type: new 
Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a central paradigm for improving reasoning and code generation in large language models, and GRPO-style training is widely adopted for its simplicity and effectiveness. However, an important design choice remains underexplored: how token-level policy gradient terms are aggregated within each sampled group. Standard GRPO uses sequence aggregation, while recent work has advocated token aggregation as a better alternative. We show that these two rules induce different optimization biases: token aggregation introduces sign-length coupling, while sequence aggregation implicitly downweights longer responses through sequence-level equal weighting. To address this tension, we propose \textbf{Balanced Aggregation (BA)}, a simple drop-in replacement that computes token-level means separately within the positive and negative subsets and then combines them with sequence-count-based weights. Experiments with Qwen2.5-Math-7B and Qwen3-1.7B on DAPO-17k and Polaris, evaluated on six reasoning and coding benchmarks, show that BA consistently improves training stability and final performance over standard token and sequence aggregation. Our analysis further shows that the relative effectiveness of token and sequence aggregation is largely governed by response-length variation and the positive-negative length gap, highlighting aggregation as a critical design dimension in GRPO-style RLVR.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04077v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhiyuan Zeng, Jiameng Huang, Zhangyue Yin, Jiashuo Liu, Ziniu Li, Bingrui Li, Yuhao Wu, Yining Zheng, Ge Zhang, Wenhao Huang, Xipeng Qiu</dc:creator>
    </item>
    <item>
      <title>Validity-Calibrated Reasoning Distillation</title>
      <link>https://arxiv.org/abs/2605.04078</link>
      <description>arXiv:2605.04078v1 Announce Type: new 
Abstract: Reasoning distillation aims to transfer multi-step reasoning capabilities from large language models to smaller, more efficient ones. While recent methods have shown promising gains, they typically rely on static teacher-student hierarchies and frame distillation as trajectory imitation. This is misaligned with the structure of reasoning, where intermediate steps are often locally under-specified: global correctness constrains the final answer, but does not uniquely determine each intermediate move. We propose validity-calibrated reasoning distillation, a framework that treats reasoning distillation as a problem of local learning-signal allocation rather than path alignment. Instead of enforcing token-level imitation, we compare the student's and teacher's proposed next-step actions under the same prefix and use their relative local validity to modulate the strength of the distillation update. This yields a dynamic, context-dependent supervision mechanism that preserves the teacher's structural guidance while adapting update strength to local reasoning quality. Across mathematical reasoning, code generation, and instruction-following benchmarks, our method consistently outperforms strong distillation baselines. These results indicate that effective LLM reasoning distillation is governed not by rigid trajectory imitation, but by principled, locally calibrated allocation of learning signal.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04078v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Khouloud Saadi, Di Wang</dc:creator>
    </item>
    <item>
      <title>Efficient Handwriting-Based Alzheimer's Disease Diagnosis Using a Low-Rank Mixture of Experts Deep Learning Framework</title>
      <link>https://arxiv.org/abs/2605.04079</link>
      <description>arXiv:2605.04079v1 Announce Type: new 
Abstract: Early and reliable detection of Alzheimer's disease (AD) is crucial for timely clinical intervention and improved patient management. It also supports the evaluation of emerging therapeutic strategies. In this paper, we propose a Low-Rank Mixture of Experts (LoRA-MoE) deep learning framework for Alzheimer's disease diagnosis based on handwriting analysis. Handwriting signals provide a non-invasive and scalable digital biomarker that captures subtle cognitive-motor impairments associated with early AD progression. The proposed architecture allows multiple experts to specialize in different handwriting patterns while sharing a common base network. This design enables efficient learning of general representations while reducing interference between experts. Each expert is equipped with lightweight low-rank adapters, which significantly reduce the number of trainable parameters compared with standard Mixture of Experts (MoE) models and improve training stability. The proposed framework is evaluated on the Diagnosis AlzheimeR WIth haNdwriting (DARWIN) dataset. Extensive experiments are conducted, including ablation studies on key architectural parameters such as hidden dimension size, number of experts, and LoRA rank. The method is compared with multilayer perceptron (MLP) and conventional MoE architectures. In addition, stacking ensemble strategies (StackMean and StackMax) are investigated to improve robustness and predictive performance. Experimental results show that the LoRA-MoE framework achieves strong diagnostic performance while activating significantly fewer parameters during inference. These results highlight the potential of the proposed approach as an accurate and computationally efficient solution for handwriting-based Alzheimer's disease screening and digital health applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04079v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wu Wang, Yuang Cheng, Fouzi Harrou, Ying Sun</dc:creator>
    </item>
    <item>
      <title>Connecting online criminal behavior with machine learning: Using authorship attribution to analyze and link potential online traffickers</title>
      <link>https://arxiv.org/abs/2605.04080</link>
      <description>arXiv:2605.04080v1 Announce Type: new 
Abstract: This research investigated how online criminal activities can be better understood and connected using data-driven machine learning methods. Many illegal activities, such as human trafficking and illicit trade, have moved to online platforms where offenders hide behind anonymous accounts and frequently change identities. This makes it difficult for authorities to understand how large these networks are and how different online profiles may be linked.
  The research shows that people tend to maintain consistent patterns in how they write advertisements and present images online, even when they try to stay anonymous. By analysing these patterns across large collections of online advertisements, the research demonstrates how to link related accounts and identify repeated behaviour across illegal online markets.
  In addition, the research addresses how such methods should be used responsibly. It proposes clear guidelines to ensure that privacy, fairness, and transparency are respected when these tools are applied. Overall, the research provides practical ways to support law enforcement investigations while emphasising careful and ethical use.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04080v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <category>cs.CY</category>
      <category>cs.LG</category>
      <category>cs.SI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.26481/dis.20250107vs</arxiv:DOI>
      <dc:creator>Vageesh Kumar Saxena</dc:creator>
    </item>
    <item>
      <title>Time series causal discovery with variable lags</title>
      <link>https://arxiv.org/abs/2605.04081</link>
      <description>arXiv:2605.04081v1 Announce Type: new 
Abstract: Causal Bayesian Networks (CBNs) are a powerful tool for reasoning under uncertainty about complex real-world problems. Such problems evolve over time, responding to external shocks as they occur. To support decision-making, CBNs require a cause-and-effect map of the variables under consideration, known as the network's structure. Learning the graphical structure of a causal model from data remains challenging; learning it from time-series data is even harder because dependencies may arise at different time lags. Existing time-series causal discovery methods often assume a fixed lag window and do not explicitly optimise edge-specific lags. We propose a Tabu-based structure learning algorithm that searches for a time-ordered directed structure (i.e., where every edge respects time) while allowing edge-specific lags up to a specified maximum lag. The approach uses a decomposable BIC-based score with node-specific effective sample sizes and an explicit lag-length penalty encouraging parsimonious delay assignments while preserving efficient local score updates. We provide theoretical guarantees of validity and local optimality, and we also describe a parallel implementation for improved scalability. In simulations, the method recovered graph structure competitively and estimated lags accurately when true adjacencies were recovered. On a real-world UK COVID-19 policy dataset, the learnt structure was dominated by short delays while retaining a substantial minority of longer-lag dependencies, consistent with delayed behavioural and epidemiological effects.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04081v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bruno Petrungaro, Anthony C. Constantinou</dc:creator>
    </item>
    <item>
      <title>Enhancing the interpretability of spatially variable N2O model predictions with soft sensors during wastewater treatment</title>
      <link>https://arxiv.org/abs/2605.04082</link>
      <description>arXiv:2605.04082v1 Announce Type: new 
Abstract: Model-based solutions for nitrous oxide (N2O) emissions from wastewater treatment plants (WWTPs) are informed by operational datasets designed to control nutrient levels in liquid waste, coupled with dedicated campaigns for N2O measurements. We analysed how machine learning (ML) models predict disturbances to wastewater treatment (WWT) operation and spatially variable N2O emissions. A real dataset was used to validate the modelling framework, with N2O emissions predicted by four ML models (R2 = 0.79-0.89). Monitoring campaigns for N2O were simulated with a plant-wide mechanistic model to include additional sensors, site-level N2O datasets, and wastewater disturbances (n = 16). ML models were highly accurate (0.97 ± 0.02, n = 80), but feature importance depended on the model, the scenario, and the N2O measurement scale (reactor vs. WWTP). We argue that N2O soft sensor model predictions are limited to the measuring location and by the methodological uncertainty of the dataset, both of which affect the interpretability of the model. Lastly, analysis of the mechanistic model structure exposed interactions between autotrophic and heterotrophic pathways over nitric oxide, which can overestimate aerobic nitrite production and bias the estimated N2O pathway contributions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04082v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Mohammad Raeisi Gahrouei, Pedram Ramin, Vincenzo A. Riggio, Carlos Domingo-Felez</dc:creator>
    </item>
    <item>
      <title>AsymmetryZero: A Framework for Operationalizing Human Expert Preferences as Semantic Evals</title>
      <link>https://arxiv.org/abs/2605.04083</link>
      <description>arXiv:2605.04083v1 Announce Type: new 
Abstract: Much of the focus in RL today is on evaluation design: building meaningful evals that serve simultaneously as benchmarks and as well-defined reward signals for post-training. Yet, many real-world tasks are governed by subjective, procedural, and domain-specific requirements that are difficult to encode as exact-match targets or open-ended preference judgments frequently used in RL pipelines today. In this work, we present AsymmetryZero, a framework for operationalizing human expert preferences as semantic evals. AsymmetryZero represents each task as a stable evaluation contract that makes grading criteria explicit: what is being graded, how each criterion is judged, and how criterion-level decisions are aggregated into a task outcome. The same contract can be executed using Inspect for model-only evaluations, as well as the Harbor Framework for agentic evaluations, enabling comparable scores and shared audit artifacts across both settings. We argue that the central challenge in post-training today is the faithful encoding of expert requirements into the evaluation itself. To that end, we present a study using Harbor that holds task contracts fixed and compares a five-model frontier jury against a five-model compact jury across four frontier-class solvers (Claude Opus 4.6, GPT-5.4, Grok-4.20, Gemini-3.1-Pro). We find that criterion-level frontier-vs-compact agreement ranges from $75.9\%$ to $89.6\%$ (strict common-subset agreement: $77.8\%$ to $92.1\%$), while compact juries exhibit substantially higher internal dissent (3--2 split rate $28.7\%$--$32.4\%$) than frontier juries ($6.1\%$--$11.5\%$). Verifier traces further show that compact juries reduce per-criterion judging cost to roughly $4.2\%$--$5.6\%$ of frontier and latency to roughly $21.7\%$--$27.1\%$, even as aggregated task-level outcomes often remain comparatively stable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04083v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tadhg Looram, Lucas Nuzzi, Kyle Waters, Steven Dillmann</dc:creator>
    </item>
    <item>
      <title>FASQ: Flexible Accelerated Subspace Quantization for Calibration-Free LLM Compression</title>
      <link>https://arxiv.org/abs/2605.04084</link>
      <description>arXiv:2605.04084v1 Announce Type: new 
Abstract: Compressing large language models (LLMs) for deployment on commodity GPUs remains challenging: conventional scalar quantization is limited to fixed bit-widths (e.g., 8/4/3-bit), offers only a few discrete compression points, and typically requires calibration data. We present FASQ (Flexible Accelerated Subspace Quantization), a calibration-free framework that applies product quantization to LLM weight matrices. By tuning two parameters, sub-vector size and codebook cardinality, FASQ exposes a continuous design space spanning 27-49% of the original FP16 model size, filling compression gaps that fixed-bit schemes cannot reach. On Meta-Llama-3-8B, FASQ surpasses 4-bit GPTQ and AWQ in accuracy (67.1-67.7 avg.) at 37-42% model size, with consistent results on Qwen3-8B and Qwen3.5-9B-Base. To make product quantization practical at inference time, we design custom CUDA kernels: a LUT-free direct-compute GEMV for decode and an output-stationary double-buffered LUT GEMM for prefill, both with split-K parallelism. On an RTX 3090, FASQ achieves 45.2 tok/s decode at effective 4-bit (2.56x memory reduction) and 51.8 tok/s at effective 3-bit (2.80x), both surpassing FP16 tensor-core performance (43.9 tok/s) and delivering 1.6 to 1.8x the throughput of AWQ, 2.5 to 2.5x of GPTQ, and 4.3 to 5x of RTN. FASQ is the only compressed method that accelerates decode beyond FP16, offering calibration-free compression, continuous size-quality trade-offs, and real-time inference on a single consumer GPU.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04084v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.AR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ye Qiao, Yian Wang, Zhiheng Chen, Hyoukjun Kwon, Sitao Huang</dc:creator>
    </item>
    <item>
      <title>Evaluating Patient Safety Risks in Generative AI: Development and Validation of a FMECA Framework for Generated Clinical Content</title>
      <link>https://arxiv.org/abs/2605.04085</link>
      <description>arXiv:2605.04085v1 Announce Type: new 
Abstract: Objectives: Large language models (LLMs) are increasingly used for clinical text summarization, yet structured methods to assess associated patient safety risks remain limited. Failure Mode, Effects, and Criticality Analysis (FMECA) provides a proactive framework for systematic risk identification but has not been adapted to LLM-generated clinical content. This study aimed to develop and validate a novel FMECA framework for the prospective assessment of patient safety risks in LLM-generated clinical summaries.
  Materials and Methods: An interdisciplinary expert panel (n = 8) developed a taxonomy of failure modes through literature review and brainstorming. Standard FMECA dimensions (occurrence, severity, detectability) were adapted into 5-point ordinal scales. The framework was applied to 36 discharge summaries from four patients, generated by an open LLM (GPT-OSS 120B) using real-world clinical data from the Geneva University Hospitals. Reviewers independently annotated the summaries across two rounds. Inter-rater reliability was assessed at failure mode, severity and detectability score levels. Usability and content validity were evaluated using an adapted System Usability Scale and structured feedback.
  Results: The final framework comprised 14 failure modes organized into categories. Inter-rater agreement improved between rounds, reaching moderate-to-substantial agreement for failure mode identification and good agreement for severity and detectability scoring. Usability was rated as good (mean SUS: 79.2/100), with high evaluator confidence.
  Discussion and Conclusion: This study presents the first FMECA-based framework for systematic patient safety risk assessment of LLM-generated clinical summaries. The framework provides a structured and reproducible method for identifying clinically relevant risks caused by these summaries.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04085v1</guid>
      <category>cs.CY</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>stat.ME</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Lydie Bednarczyk, Jamil Zaghir, Julien Ehrsam, Maria Tcherepanova, Christian Skalafouris, Karim Gariani, Catherine Geslin, Claire-B\'en\'edicte Rivara, Pascal Bonnabry, Laetitia Gosetto, Richard Dubos, Mina Bjelogrlic, Christophe Gaudet-Blavignac, Christian Lovis</dc:creator>
    </item>
    <item>
      <title>OpenCLAW-Nexus: A Self-Reinforcing Trust Framework for Byzantine-Resilient Decentralized Federated Learning</title>
      <link>https://arxiv.org/abs/2605.04091</link>
      <description>arXiv:2605.04091v1 Announce Type: new 
Abstract: Decentralized Federated Learning (DFL) eliminates the central aggregator but introduces a severe 'trust gap': without a trusted coordinator, the system becomes vulnerable to Byzantine and Sybil attacks, while existing solutions treat node selection, aggregation, and consensus as isolated modules, often relying on a trusted root dataset unavailable in truly decentralized settings. We propose OpenCLAW-Nexus, a self-reinforcing trust framework that bridges this gap through a single primitive, a discounted Beta-reputation model, that unifies reputation-based node selection, reputation-weighted aggregation (Rep-FedAvg), and reputation-aware BFT consensus. Rep-FedAvg eliminates the trusted root dataset requirement; we formally prove reputation separation between honest and Byzantine nodes under non-IID data with noisy evaluations. On a 1,000-node global testbed spanning three cloud providers and nine regions, Rep-FedAvg achieves 72.6% accuracy on non-IID CIFAR-10 with 20% Byzantine nodes and record-level differential privacy, within 0.5 pp of centralized FLTrust. Under a 300-node Sybil attack, reputation-weighted consensus maintains 84.2% validation correctness versus 62.8% (PoW) and 47.6% (PoS).</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04091v1</guid>
      <category>cs.NI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wenyang Jia, Qiankang Xu, Ziwei Yan, Chunhua Kang, Yang Yang, Jinglu He, Kai Lei</dc:creator>
    </item>
    <item>
      <title>Decision Evidence Maturity Model for Agentic AI: A Property-Level Method Specification</title>
      <link>https://arxiv.org/abs/2605.04093</link>
      <description>arXiv:2605.04093v1 Announce Type: new 
Abstract: Agentic AI systems produce decision evidence at scale through execution telemetry, but property-level reconstruction often fails when an external party asks a specific governance question about a specific decision: the assembled evidence is insufficient to answer it. We name this pattern the container fallacy: the automatic equation of evidence-container presence with audit sufficiency. This paper specifies the Decision Evidence Maturity Model (DEMM), a property-level reconstructability method for agentic decisions. DEMM classifies evidence sufficiency into four executable categories plus a protocol-level "conflicting" category and aggregates per-property verdicts into a five-level capability rubric anchored to the established maturity-model lineage. The open-source Decision Trace Reconstructor ships ten executable adapter-fallback classes spanning vendor SDKs, protocol traces, public-postmortem prose, and generic JSONL records. A reproducible feasibility exercise runs the protocol on 140 synthetic scenarios plus three public incidents; the resulting completeness range (53.6% to 100%) is implementation behaviour, not external validation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04093v1</guid>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Oleg Solozobov</dc:creator>
    </item>
    <item>
      <title>Are Multimodal LLMs Ready for Clinical Dermatology? A Real-World Evaluation in Dermatology</title>
      <link>https://arxiv.org/abs/2605.04098</link>
      <description>arXiv:2605.04098v1 Announce Type: new 
Abstract: Multimodal large language models (MLLMs) have demonstrated promise on publicly available dermatology benchmarks. However, benchmark performance may not generalize to real-world dermatologic decision-making. To quantify this benchmark-to-bedside gap, we evaluated four open-weight MLLMs (InternVL-Chat v1.5, LLaVA-Med v1.5, SkinGPT4 and MedGemma-4B-Instruct) and one commercial MLLM (GPT-4.1) across three publicly available dermatology datasets and a retrospective multi-site hospital-based dermatology consultation cohort comprising 5,811 cases and 46,405 clinical images. Models were evaluated on two clinically relevant tasks: differential diagnosis generation and severity-based triage. Diagnostic performance was modest on public datasets and declined substantially in the real-world cohort. On public benchmarks, top-3 diagnostic accuracy reached 26.55% for the best open-weight model and 42.25% for GPT-4.1. On real-world consultation cases using images alone, top-3 diagnostic accuracy fell to 1.50%-13.35% among open-weight models and 24.65% for GPT-4.1. Incorporating clinical context improved performance across all models, increasing top-3 diagnostic accuracy up to 28.75% among open-weight models and 38.93% for GPT-4.1. However, model outputs were highly sensitive to incomplete or erroneous consultation context. For severity-based triage, models achieved moderate sensitivity (above 60%), suggesting potential utility for screening but insufficient reliability for clinical deployment. These findings demonstrate that benchmark performance substantially overestimates the real-world clinical capability of current dermatology MLLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04098v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Roy Jiang, Hyunjae Kim, Zhenyue Qin, Morten Lee, Margaret MacGibeny, Ailish Hanly, Angela Sadlowski, Shanin Chowdhury, Xuguang Ai, Jeffrey Gehlhausen, Qingyu Chen</dc:creator>
    </item>
    <item>
      <title>Regularized Centered Emphatic Temporal Difference Learning</title>
      <link>https://arxiv.org/abs/2605.04100</link>
      <description>arXiv:2605.04100v1 Announce Type: new 
Abstract: Off-policy temporal-difference (TD) learning with function approximation faces a structural tradeoff among stability, projection geometry, and variance control. Emphatic TD (ETD) improves the off-policy projection geometry through follow-on emphasis, but the follow-on trace can have high variance. We revisit this tradeoff through Bellman-error centering. Although centering naturally removes a common drift term from TD errors, we show that a naive centered emphatic extension introduces an auxiliary coupling that can destroy the positive-definiteness of the ETD key matrix. We propose \emph{Regularized Emphatic Temporal-Difference Learning} (RETD), which preserves the follow-on trace and regularizes only the auxiliary centering recursion, corresponding to lifting the lower-right block of the coupled key matrix from \(1\) to \(1+c\). We derive the RETD core matrix, prove convergence under a conservative sufficient regularization condition, and evaluate the method on diagnostic linear off-policy prediction tasks. The experiments show that RETD avoids the instability of naive centered emphatic learning, preserves favorable emphatic geometry, and exhibits a robust intermediate regime for the regularization parameter \(c\) across the diagnostics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04100v1</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Xingguo Chen, Chaohui Wu, Jinguo Ye, Chao Li, Shangdong Yang, Guang Yang, Tianyu Liang, Wenhao Wang</dc:creator>
    </item>
    <item>
      <title>HERCULES: Hardware-Efficient, Robust, Continual Learning Neural Architecture Search</title>
      <link>https://arxiv.org/abs/2605.04103</link>
      <description>arXiv:2605.04103v1 Announce Type: new 
Abstract: Neural Architecture Search (NAS) has emerged as a powerful framework for automatically discovering neural architectures that balance accuracy and efficiency. However, as AI transitions from static benchmarks to real-world deployment, the traditional focus on hardware-aware efficiency is no longer sufficient. We observe that modern NAS methods, especially those that target edge AI, are evolving to address a triple objective: Efficiency, Robustness, and Continual Learning. Efficiency ensures feasibility in resource-constrained environments, robustness guarantees reliability under environmental variability, and continual learning enables adaptation to sequential tasks without catastrophic forgetting. We propose a taxonomy of NAS approaches through this triple lens, distinguishing between methods targeting resource optimization, environmental resilience, and architectural plasticity. This unified perspective reveals that these axes, though often studied in isolation, are mutually reinforcing. Building on this taxonomy, we map the current landscape of these NAS methods into a new framework called Hardware-Efficient, Robust, and ContinUal LEarning Search (HERCULES). We define the desiderata, the twelve labours of HERCULES, addressing the non-trivial challenge of balancing adequate search-space exploration against the immense computational costs of a multi-objective NAS, accounting for these crucial objectives of current AI systems. By identifying critical gaps in existing research, this survey outlines a roadmap toward integrated algorithmic, architectural, and hardware-software co-design for truly deployable, lifelong-learning AI systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04103v1</guid>
      <category>cs.LG</category>
      <category>cs.AR</category>
      <category>cs.CL</category>
      <category>cs.CV</category>
      <category>cs.NE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Matteo Gambella, Fabrizio Pittorino, Manuel Roveri</dc:creator>
    </item>
    <item>
      <title>TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments</title>
      <link>https://arxiv.org/abs/2605.04107</link>
      <description>arXiv:2605.04107v1 Announce Type: new 
Abstract: Production agent frameworks (OpenAI Function Calling, Anthropic Tool Use, MCP) transmit tool schemas as JSON, a format designed for machine parsing, not for interpretation by language models. For small models (4B-14B), this protocol mismatch accounts for the majority of tool-use failure at production catalog sizes. We present TSCG, a deterministic tool-schema compiler that resolves this mismatch at the API boundary, converting JSON schemas into token-efficient structured text without model access, fine-tuning, or runtime search. TSCG combines eight composable operators with a formal compression bound (&gt;=51% on well-formed schemas).
  On TSCG-Agentic-Bench (about 19,000 calls, 12 models, 5 scenarios), TSCG restores Phi-4 14B from 0% to 84.4% accuracy at 20 tools (90.3% at 50 tools) and achieves 108-181% accuracy-retained ratio across three models on BFCL. Format-versus-compression decomposition (R^2=0.88 -&gt; 0.03) establishes representation change as the dominant mechanism. Per-operator isolation across three frontier models reveals three distinct operator-response profiles: operator-hungry (Opus 4.7), operator-sensitive (GPT-5.2), and operator-robust (Sonnet 4), providing per-model deployment guidance. Scaling experiments show accuracy advantages persisting on heavy production MCP schemas (+5.0 pp at about 10,500 input tokens) despite saturation on light synthetic catalogs, with 52-57% token savings throughout. The synthetic benchmark generalizes to real MCP schemas within 0.1 accuracy points. TSCG ships as a 1,200-line zero-dependency TypeScript package.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04107v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.5281/zenodo.19795759</arxiv:DOI>
      <dc:creator>Furkan Sakizli</dc:creator>
    </item>
    <item>
      <title>MuCALD-SplitFed: Causal-Latent Diffusion for Privacy-Preserving Multi-Task Split-Federated Medical Image Segmentation</title>
      <link>https://arxiv.org/abs/2605.04108</link>
      <description>arXiv:2605.04108v1 Announce Type: new 
Abstract: Federated Learning (FL) enables decentralized training by aggregating model updates across clients without sharing raw data, while Split Federated Learning (SplitFed) further partitions the model between clients and a server to reduce computation and communication at the client side. However, decentralized medical institutions rarely operate on a single shared task, making standard FL and SplitFed collaborations poorly aligned with real clinical workflows. Multi-task FL extends these frameworks by allowing clients to handle different tasks, but often introduces instability and privacy vulnerabilities. This study proposes MuCALD-SplitFed, a multi-task SplitFed framework that integrates causal representation learning and latent diffusion. Experiments show that MuCALD-SplitFed consistently improves segmentation, while baseline SplitFed fails to converge. The proposed approach further reduces information leakage at split points, mitigating reconstruction-based and membership inference attacks. Additionally, MuCALD-SplitFed outperforms state-of-the-art personalized FL and multi-task FL approaches. The code repository is: https://github.com/ChamaniS/MuCALD_SplitFed.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04108v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chamani Shiranthika, Hadi Hadizadeh, Parvaneh Saeedi</dc:creator>
    </item>
    <item>
      <title>Resource Utilization of Differentiable Logic Gate Networks Deployed on FPGAs</title>
      <link>https://arxiv.org/abs/2605.04109</link>
      <description>arXiv:2605.04109v1 Announce Type: new 
Abstract: On-edge machine learning (ML) often strives to maximize the intelligence of small models while miniaturizing the circuit size and power needed to perform inference. Meeting these needs, differentiable Logic Gate Networks (LGN) have demonstrated nanosecond-scale prediction speeds while requiring fewer resources than traditional binary neural networks. Despite these benefits, the trade-offs between LGN parameters and resulting hardware synthesis characteristics are not well characterized. This paper therefore studies the trade-offs between power, resource utilization, inference speed, and model accuracy when varying the depth and width of LGNs synthesized for Field Programmable Gate Arrays (FPGA). Results reveal that the final layer of an LGN is critical to minimizing timing and resource usage (i.e., a 28% decrease), as this layer dictates the logic size of summing operations. Subject to timing and routing constraints, deeper and wider LGNs can be synthesized for FPGAs when the final layer is narrow. Further trade-offs are presented to help ML engineers select baseline LGN architectures for FPGAs with a set number of Look-Up Tables (LUT).</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04109v1</guid>
      <category>cs.AR</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Stephen Wormald, Gilon Kravatsky, Damon Woodard, Domenic Forte</dc:creator>
    </item>
    <item>
      <title>Optimally Covering Large Triangles with Homothetic Unit Triangles</title>
      <link>https://arxiv.org/abs/2605.04111</link>
      <description>arXiv:2605.04111v1 Announce Type: new 
Abstract: We answer an open problem in the \emph{American Mathematical Monthly} about covering large triangles. Given a triangle $T$ of any triangular shape with a selected side length between $n \in \mathbb{N}$ and $n+1$, Baek and Lee proved that $T$ could not be covered with $n^2+1$ homothetic unit triangles (with the selected side of length 1). Letting $T_{n+d}$ denote a triangle with selected side length $n + d$ with $d \in (0, 1)$, Baek and Lee extended their proof to establish upper bounds for $d$ above which a $T_{n+d}$ cannot be covered with $n^2+2$ or $n^2+3$ homothetic unit triangles. Then, they showed that these bounds are tight based on analyses of a method by Conway and Soifer for the $n^2+2$ case and their own method for the $n^2+3$ case. Baek and Lee stated as an open problem the need to find tight upper bounds for the $n^2 + k$ cases for $4 \le k \le 2n$. We extend the Baek and Lee proof to establish upper bounds for those higher cases, and we show the upper bounds are tight by presenting two new triangle covering methods for the odd and even cases of $k$ that meet the upper bounds, as well as an optimal consolidated method that uses whichever of the two will cover a given $T_{n+d}$ with the fewest homothetic unit triangles.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04111v1</guid>
      <category>cs.CG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>John M. Boyer</dc:creator>
    </item>
    <item>
      <title>Semantic Reverse Engineering Legacy Software Applications with ChatGPT, Gemini AI, and Claude AI</title>
      <link>https://arxiv.org/abs/2605.04114</link>
      <description>arXiv:2605.04114v1 Announce Type: new 
Abstract: This paper describes our results on using ChatGPT, Gemini, and Claude AI to semantically reverse-engineer legacy database software applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04114v1</guid>
      <category>cs.SE</category>
      <category>cs.DB</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.56831/PSEN-08-266</arxiv:DOI>
      <arxiv:journal_reference>Primera Scientific Engineering, Denton, TX, 8.5 (2026): 04-23</arxiv:journal_reference>
      <dc:creator>Christian Mancas, Diana Christina Mancas</dc:creator>
    </item>
    <item>
      <title>Learning reveals invisible structure in low-rank RNNs</title>
      <link>https://arxiv.org/abs/2605.04115</link>
      <description>arXiv:2605.04115v1 Announce Type: new 
Abstract: Learning in neural systems arises from synaptic changes that reshape the representations underlying behavior. While low-rank recurrent neural networks (RNNs) have emerged as a powerful framework for linking connectivity to function, a theoretical understanding of their learning process remains elusive. Here, we extend the low-rank framework from activity to learning by deriving gradient-descent dynamics directly in a reduced overlap space. We formulate a closed-form, low-dimensional system of ODEs that governs learning in this space, exact for linear RNNs and asymptotically exact for nonlinear RNNs in the large-$N$ Gaussian limit. Central to our analysis is a distinction between two classes of overlaps: loss-visible overlaps, which fully determine network activity, output, and loss, and loss-invisible overlaps, which do not affect function but are required to describe learning. We illustrate the consequences of this decomposition through two phenomena. First, we show that learning can serve as a perturbation that exposes differences in connectivity between functionally equivalent networks. Second, we show that loss-invisible overlaps can act as memory variables that encode training history, and characterize the conditions under which this occurs. Finally, we present several testable predictions for biological learning experiments derived from our theory.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04115v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>q-bio.NC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yoav Ger, Omri Barak</dc:creator>
    </item>
    <item>
      <title>Membership Inference Attacks for Retrieval Based In-Context Learning for Document Question Answering</title>
      <link>https://arxiv.org/abs/2605.04116</link>
      <description>arXiv:2605.04116v1 Announce Type: new 
Abstract: We show that remotely hosted applications that use in-context learning, augmented with a retrieval function to select in-context examples, can be vulnerable to membership inference attacks even when the service provider and users are separate parties. We propose two black-box membership inference attacks that exploit query text prefixes to distinguish member from non-member inputs. The first attack uses a reference model to estimate an otherwise unavailable loss metric. The second attack improves upon it by eliminating the reference model and instead computing a membership statistic through a simple but novel weighted-averaging scheme. Our comprehensive empirical evaluations consider a stricter case in which the adversary has a paraphrased version of the text in the queries, and show that our attacks can be more resilient to paraphrasing and, in many cases, outperform three prior attacks with a small number of prefixes. We also adapt an existing ensemble prompting defense to our setting, demonstrating that it substantially mitigates the privacy leakage caused by our second attack.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04116v1</guid>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tejas Kulkarni, Antti Koskela, Laith Zumot</dc:creator>
    </item>
    <item>
      <title>Simultaneous CNN Approximation on Manifolds with Applications to Boundary Value Problems</title>
      <link>https://arxiv.org/abs/2605.04126</link>
      <description>arXiv:2605.04126v1 Announce Type: new 
Abstract: This paper develops convolutional neural network (CNN) methods for simultaneous approximation and elliptic boundary value problems on compact Riemannian manifolds. We establish simultaneous Sobolev approximation results for single- and multichannel CNNs, showing that manifold functions and their derivatives can be approximated with rates governed by the intrinsic dimension and the smoothness gap, rather than by the ambient dimension, thereby mitigating the curse of dimensionality. Building on this approximation theory, we propose a physics-informed CNN (PICNN) framework specially designed for boundary value problems. The main numerical issue is a boundary-norm mismatch: standard PINNs usually impose boundary data through low-order, often L2-type, penalties, whereas elliptic stability requires Sobolev trace control. We address this by introducing a spectral boundary loss based on the boundary Laplace-Beltrami operator, which represents trace errors as weighted frequency energies and relates truncation error to boundary eigenvalue decay. This avoids smooth auxiliary constructions required by exact boundary enforcement and singular double integrals arising in Sobolev-Slobodeckij penalties, while enabling implementations based on Fast Fourier Transforms (FFTs) or precomputed spectral bases on structured boundaries. Numerical experiments demonstrate improved accuracy, convergence, and stability over standard PINNs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04126v1</guid>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hanfei Zhou, Lei Shi</dc:creator>
    </item>
    <item>
      <title>Position: the Stochastic Parrot in the Coal Mine. Model Collapse is a Threat to Low-Resource Communities</title>
      <link>https://arxiv.org/abs/2605.04127</link>
      <description>arXiv:2605.04127v1 Announce Type: new 
Abstract: Model collapse, the degradation in performance that arises when generative models are trained on the outputs of prior models, is an increasing concern as artificially generated content proliferates. Related critiques of large language models have highlighted their tendency to reproduce frequent patterns in training data, their reliance on vast datasets, and their substantial environmental cost. Together, these factors contribute to data degradation, the reinforcement of cultural biases, and inefficient resource use. In this position paper we aim to combine these views and argue that model collapse threatens current efforts to democratize AI. By reducing training efficiency and skewing data distributions away from the tails of their support, model collapse disproportionately impacts low-resource and marginalized communities. We examine both the environmental and cultural implications of this phenomenon, situate our position within recent position papers on model collapse, and conclude with a call to action. Finally, we outline initial directions for mitigating these effects.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04127v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Devon Jarvis, Richard Klein, Benjamin Rosman, Steven James, Stefano Sarao Mannelli</dc:creator>
    </item>
    <item>
      <title>Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation</title>
      <link>https://arxiv.org/abs/2605.04128</link>
      <description>arXiv:2605.04128v1 Announce Type: new 
Abstract: We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. JoyAI-Image couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), allowing perception and generation to interact through a shared multimodal interface. Around this architecture, we build a scalable training recipe that combines unified instruction tuning, long-text rendering supervision, spatially grounded data, and both general and spatial editing signals. This design gives the model broad multimodal capability while strengthening geometry-aware reasoning and controllable visual synthesis. Experiments across understanding, generation, long-text rendering, and editing benchmarks show that JoyAI-Image achieves state-of-the-art or highly competitive performance. More importantly, the bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables the model to move beyond general visual competence toward stronger spatial intelligence. These results suggest a promising path for unified visual models in downstream applications such as vision-language-action systems and world models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04128v1</guid>
      <category>cs.GR</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lin Song, Wenbo Li, Guoqing Ma, Wei Tang, Bo Wang, Yuan Zhang, Yijun Yang, Yicheng Xiao, Jianhui Liu, Yanbing Zhang, Guohui Zhang, Wenhu Zhang, Hang Xu, Nan Jiang, Xin Han, Haoze Sun, Maoquan Zhang, Haoyang Huang, Nan Duan</dc:creator>
    </item>
    <item>
      <title>Quantum-Resistant Networks: A Review of Primitives, Protocols and Best Practices</title>
      <link>https://arxiv.org/abs/2605.04129</link>
      <description>arXiv:2605.04129v1 Announce Type: new 
Abstract: Large-scale quantum computers threaten the public-key cryptographic foundations underpinning today's network security infrastructures. While significant progress has been made in standardizing post-quantum cryptographic (PQC) primitives and adapting individual protocols such as TLS and SSH, far less attention has been paid to the broader architectural consequences of the post-quantum transition for networked systems. In particular, many real-world deployments such as mobile networks, industrial control systems, IoT environments, and regulated infrastructures cannot assume the universal availability, deployability, or desirability of PQ public-key infrastructures. This paper presents the first comprehensive systematization of PQ-resistant network architectures, focusing on key distribution and management as a system-level design problem rather than a protocol-local substitution. We introduce a unified taxonomy spanning cryptographic foundations (symmetric-only, PQ-PKI, hybrid, and information-theoretic multi-path), key-distribution architectures (centralized, hierarchical, replicated, threshold, MPC-backed, and serverless), trust and threat models, key-management lifecycle, and deployment environments. Using this framework, we analyze the security, scalability, and operational trade-offs of a wide range of architectures under realistic PQ adversary assumptions, including harvest-now, decrypt-later attacks and partial infrastructure compromise. Our study highlights fundamental gaps in existing approaches, clarifies when PQ-PKI is necessary or avoidable, and identifies promising research directions for building cryptographically agile, quantum-resilient network infrastructures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04129v1</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Elisa Bertino, Ramana Kompella, Ashish Kundu, Cristina Nita-Rotaru, Jaideep Vaidya, Attila A. Yavuz</dc:creator>
    </item>
    <item>
      <title>Constrained Extreme Gradient Boosting for Adapting Reduced-Order Models</title>
      <link>https://arxiv.org/abs/2605.04130</link>
      <description>arXiv:2605.04130v1 Announce Type: new 
Abstract: High-fidelity simulations, such as computational fluid dynamics and finite element analysis, are essential for modeling complex engineering systems but are often prohibitively expensive for tasks including parametric studies, optimization, and real-time control. Projection-based reduced-order models (ROMs) alleviate this cost by projecting the governing dynamics onto low-dimensional subspaces. However, their performance can deteriorate under parameter variation, motivating the need for adaptive basis construction. In this work, we propose a constrained ensemble learning framework, termed Constrained Extreme Gradient Boosting (cXGBoost), for predicting Proper Orthogonal Decomposition (POD) bases as functions of system parameters. The approach leverages a geometric representation of subspaces on the Grassmann manifold, which are mapped to a Euclidean space to enable efficient regression using gradient boosting trees. A norm constraint is imposed during training to ensure the validity of the inverse mapping and preserve the geometric structure of the predicted subspaces. The proposed method is evaluated on four numerical examples, including fluid dynamics and wave propagation problems, demonstrating its ability to accurately predict parameter-dependent bases while maintaining robustness across nonlinear regimes. These results highlight the potential of combining geometric learning with constrained ensemble methods for scalable and reliable reduced-order modeling of high-dimensional parametric systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04130v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Melika Baghi, Xiao Liu, Kamran Paynabar</dc:creator>
    </item>
    <item>
      <title>Two Integration Pathways in Human-Centered Requirements Engineering: A Systematic Mapping Study of Structural Gaps</title>
      <link>https://arxiv.org/abs/2605.04132</link>
      <description>arXiv:2605.04132v1 Announce Type: new 
Abstract: Human-centered Requirements Engineering (HC-RE) integrates user cognition, emotions, and social interactions into the RE process through contributions from disciplines such as psychology, cognitive science, design thinking, and human-computer interaction. Despite growing interest, how these multidisciplinary contributions are structured and why they remain fragmented across the RE lifecycle is not well understood.
  This systematic mapping study analyzes 56 primary studies across seven dimensions, including RE phases, user involvement techniques, contributing disciplines, and evaluation methods. Results show that 70% of approaches involve multidisciplinary contributions, yet only 39% have been empirically evaluated and 48% address only the elicitation phase. A cross-study analysis reveals a structural separation between two parallel integration traditions: a Cognitive-Formal (C-F) pathway grounded in goal-based frameworks and formal modeling, and a Participatory-Iterative (P-I) pathway grounded in scenario-based frameworks and iterative design. Each pathway has developed complementary strengths, but their near-total disconnection explains the persistent lifecycle concentration and theory-practice gap observed in the corpus.
  The findings identify the absence of translation mechanisms between human-centered artifacts and formal RE specifications as the field's primary structural gap, provide a structured research agenda organized into four priority tiers, and establish the empirical foundation for Experience-Centered Requirements Engineering, a direction in which user experience is explicitly operationalized as a first-class concern in requirements specification.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04132v1</guid>
      <category>cs.SE</category>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Imen Benzarti, Ikram Darif, Abderrahmane Leshob, Hafedh Mili, Darine Amayed</dc:creator>
    </item>
    <item>
      <title>Model synthesis and identifiability analysis of stiff chemical reaction systems with inVAErt networks</title>
      <link>https://arxiv.org/abs/2605.04134</link>
      <description>arXiv:2605.04134v1 Announce Type: new 
Abstract: We consider the problem of learning computationally efficient data-driven replicas of stiff systems of ordinary differential equations arising in chemical kinetics. We first focus on training emulators for families of reaction equations under varying reaction rates, using conditional residual networks or long short-term memory architectures. We then apply a recently proposed data-driven framework known as "inVAErt networks" to address the ill-posed inverse problem of inferring reaction rates, integration time, and possibly initial conditions from a target set of species concentrations, a problem that has received relatively little attention in the literature. The proposed approach is demonstrated on chemical systems with reversible and irreversible kinetics, spanning 2 to 20 differential equations, 3 to 20 chemical species, and 3 to 25 reaction rate parameters. Relative root mean squared errors produced by the proposed emulators range from $10^{-5}$ for lower-dimensional systems to $10^{-4}$ and $10^{-3}$ for an air pollution model and a hydrogen-air reaction system, respectively. Manifolds of non-identifiable reaction rates recovered by the proposed approach can be analytically verified for simple systems and are consistent with local identifiability analysis in higher dimensions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04134v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sreejata Dey, Guoxiang Grayson Tong, Jonathan F. MacArt, Daniele E. Schiavazzi</dc:creator>
    </item>
    <item>
      <title>Frontier Lag: A Bibliometric Audit of Capability Misrepresentation in Academic AI Evaluation</title>
      <link>https://arxiv.org/abs/2605.04135</link>
      <description>arXiv:2605.04135v1 Announce Type: new 
Abstract: Readers of applied-domain LLM capability evaluations want to know what AI systems can currently do. That literature answers a related, but consequentially different, question: what older, cheaper, less-elicited models could do months or years earlier (a 2026 paper evaluating GPT-4o-mini zero-shot, say, against a frontier of reasoning-capable, tool-using systems like GPT-5.5 Pro and Claude Opus 4.7), often reported with sparse configuration details and abstracted upward into claims about "AI" that propagate through citations, media, and policy. We measure the 'publication elicitation gap' (the gap between these answers) in a pre-registered audit of 112,303 LLM-keyword-matched candidate records (2022-01 to 2026-04; 18,574 admissible, 4,766 full-paper texts retrievable), comparing tested models to the contemporaneous frontier on the Epoch AI Capabilities Index (ECI), reproduced under Arena Elo and Artificial Analysis.
  The median paper evaluates a model +10.85 ECI (~1.4x the distance between Claude Sonnet 3.7 and Claude Opus 4.5) behind the contemporaneous frontier at evaluation time (H1); an exploratory rational-lag baseline (H8) decomposes this into ~25% peer-review latency, ~75% excess lag. The gap is widening at +5.53 ECI/year (H2; 95% CI [+5.03, +5.83]). Meanwhile, only 3.2% of abstracts (21.2% of full-texts) disclose reasoning-mode status on reasoning-capable models (H4) and 52.5% (95% CI [48.2, 56.9]) state conclusions at the level of "AI" rather than the evaluated model(s), rising at OR = 1.23/year.
  Proposed remedies include API-access subsidies and editorial enforcement of reporting frameworks mandating configuration-surface disclosure (model snapshot, reasoning mode/effort, tool access, scaffolding, prompting, etc.); VERSIO-AI is a 13-item checklist (Core 3 desk-reject) extending existing frameworks at the elicitation surface, with per-DOI analysis at frontierlag.org.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04135v1</guid>
      <category>cs.CY</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>David Gringras, Misha Salahshoor</dc:creator>
    </item>
    <item>
      <title>FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals</title>
      <link>https://arxiv.org/abs/2605.04157</link>
      <description>arXiv:2605.04157v1 Announce Type: new 
Abstract: SemEval-2026 Task 13 investigates machine-generated code detection across multiple programming languages and application scenarios, asking participating systems to generalize to unseen languages and domains. This paper describes our participation in Subtask A (binary classification) and explores both pretrained code encoders and lightweight feature-based methods. We design ratio-based features that are less sensitive to snippet length. To support the extraction of descriptiveness-related signals, we use parsing engines and a programming-language classifier. Additionally, we train a separate code-vs-text line classifier to identify raw natural language segments embedded within samples. We combine a shallow decision tree with heuristic rules derived from data analysis to produce the final predictions. Our approach is computationally efficient, requires only CPU resources for training, and achieves near-instant inference time, offering a lightweight alternative to large pretrained models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04157v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Elitsa Yotkova, Violeta Kastreva, Dimitar Dimitrov, Ivan Koychev, Preslav Nakov</dc:creator>
    </item>
    <item>
      <title>Enabling Real-Time Training of a Wildfire-to-Smoke Map with Multilinear Operators</title>
      <link>https://arxiv.org/abs/2605.04164</link>
      <description>arXiv:2605.04164v1 Announce Type: new 
Abstract: Wildfires are a major producer of fine particulate matter, impacting human health and the electrical grid. Accurately forecasting smoke impacts over long time scales requires accounting for fuel treatment strategies, natural fuel succession, and stochastic events like lightning strikes. However, predicting smoke for each fuel distribution with a forward simulation of a coupled fire-atmosphere model is computationally infeasible. Meanwhile, relatively simple fire models are tractable to run in many long-time scenarios but do not capture smoke transport. We use data-driven multilinear operators to predict a smoke concentration field from the time since ignition for two quantities of interest: aerosol optical depth and smoke detection. Our method first computes the principal components of the time-since-ignition and smoke concentration fields and then learns a map from powers of the input coefficients to the output coefficients. We apply our learned operator to smoke prediction in the Upper Rio Grande Watershed. After collecting training data, learning the approximation weights on a CPU takes less than 30 seconds, and each forward call takes less than 1 ms. On a proxy for aerosol optical depth, we obtain accuracy equal to Monte Carlo sampling with fewer than half as many coupled model calls. For smoke detection, we obtain an intersection-over-union (IoU) of 0.65 and an area under the receiver operating characteristic curve (AUC) of 0.95 on holdout data. Our method is significantly more accurate than the most similar published smoke classifier, which obtains an IoU and AUC of 0.15 and 0.61, respectively, on a 2015 bushfire in Australia.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04164v1</guid>
      <category>cs.LG</category>
      <category>physics.ao-ph</category>
      <category>physics.comp-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zachary Morrow, Joseph Crockett, John D. Jakeman, Dan J. Krofcheck</dc:creator>
    </item>
    <item>
      <title>FlowEval: Reference-based Evaluation of Generated User Interfaces</title>
      <link>https://arxiv.org/abs/2605.04165</link>
      <description>arXiv:2605.04165v1 Announce Type: new 
Abstract: While large language models (LLMs) and coding agents are often applied to user interface (UI) development, developers find it difficult to reliably assess their proficiency in visual and interaction design. Existing evaluations rely either on human experts, who can accurately assess usability by testing critical flows but are slow and costly, or on automated judges, which are scalable but less accurate and more opaque. We present FlowEval, a reference-based framework that measures whether a generated UI supports realistic interaction flows by comparing navigation traces from real websites to traces from generated analogs using reference-based similarity metrics (e.g., dynamic time warping). In a small-scale study with expert UI evaluators, we show that reference-based metrics strongly correlate with human judgments, suggesting that they can provide scalable yet trustworthy evaluation for UI generation systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04165v1</guid>
      <category>cs.MA</category>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Jason Wu, Priyan Vaithilingam, Eldon Schoop, Jeffrey Nichols, Titus Barik</dc:creator>
    </item>
    <item>
      <title>Actionable Real-Time Modeling of Surgical Team Dynamics via Time-Expanded Interaction Graphs</title>
      <link>https://arxiv.org/abs/2605.04169</link>
      <description>arXiv:2605.04169v1 Announce Type: new 
Abstract: Surgical team performance arises from complex interactions between technical execution and non-technical skills, including communication and coordination dynamics. However, current surgical AI systems predominantly model visual workflow signals, lacking structured representations of intraoperative team interactions over time. We propose a real-time actionable approach for modeling surgical team dynamics using time-expanded interaction graphs, where team members are modeled as time-indexed nodes and communication exchanges define directed edges. This spatio-temporal expansion enables dynamic interaction modeling, while allowing efficient inference with a static graph neural network. The model predicts procedural efficiency as the deviation from the expected duration and supports real-time deployment. Beyond prediction, we perform a counterfactual analysis to identify minimal changes in communication structure and interpretable behavioral variables associated with improved predicted outcomes. Experiments on recorded surgical procedures show that structured modeling of team interactions improves early identification of prolonged interventions and provides coherent, actionable explanations. This work advances surgical AI toward real-time, team-aware, and actionable decision support in the operating room.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04169v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vincenzo Marco De Luca, Antonio Longa, Giovanna Varni, Andrea Passerini</dc:creator>
    </item>
    <item>
      <title>Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing</title>
      <link>https://arxiv.org/abs/2605.04171</link>
      <description>arXiv:2605.04171v1 Announce Type: new 
Abstract: Large language models (LLMs) show extraordinary abilities, but they are still prone to hallucinations, especially when used to generate academic content. We investigated four popular LLMs (ChatGPT, Grok, Gemini, and Copilot) for hallucinations specifically in academic writing. We designed 80 prompts across four categories: reference generation, factual explanation, abstract generation, and writing improvement. We evaluated the models using a 0-5 rubric score covering factual accuracy, reference validity, coherence, style consistency, and academic tone. A novel weighted metric, the Hallucination Index (HI), was introduced to measure hallucination in the responses generated by the models. Some of the most widely used evaluation metrics often fail to detect errors that alter sentiment in machine-translated text. We found that Grok and Copilot perform better on reference generation tasks but often struggle with abstract or stylistic prompts, with HI values of 0.67 and 0.70, respectively. In contrast, Gemini and ChatGPT show stronger tone control but are weaker on factual writing tasks and carry a higher hallucination risk, with HI scores of 0.53 and 0.57, respectively. Our study found that hallucination behavior depends not solely on model architecture but also on the task type and the prompting conditions. We believe our work opens new directions for future research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04171v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Humam Khan, Md Tabrez Nafis, Shahab Saquib Sohail, Aqeel Khalique, Rehan Hasan Khan</dc:creator>
    </item>
    <item>
      <title>täkōFormal: Enabling Robust Software for Programmable Memory Hierarchies (Extended Version)</title>
      <link>https://arxiv.org/abs/2605.04172</link>
      <description>arXiv:2605.04172v1 Announce Type: new 
Abstract: Accelerators provide large performance and energy-efficiency benefits, but can significantly change the hardware-software interface. The täkō programmable memory hierarchy accelerates data movement by enabling programmers to run user-defined callback functions triggered by cache misses, evictions, and writebacks. However, it also leads to drastically increased complexity and counterintuitive outcomes. In response, we develop an ISA-level memory consistency model (MCM) for täkō that captures the semantics of its operation, and we show how it enables programmers to formally reason about their täkō programs. We also prove the soundness of this ISA-level MCM by constructing a detailed täkō implementation model and verifying that all executions of the implementation model are allowed by our ISA-level MCM. Along the way, we discover useful insights about microarchitectural modeling and verification that are applicable to hardware in general.
  This is the extended version of the ISCA 2026 paper "täkōFormal: Enabling Robust Software for Programmable Memory Hierarchies". This version adds material on additional litmus tests to Section V to further explore the programmability of täkō using our ISA-level MCM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04172v1</guid>
      <category>cs.AR</category>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Pranav Srinivasan, Manos Kapritsos, Yatin A. Manerkar</dc:creator>
    </item>
    <item>
      <title>A Provably Convergent and Practical Algorithm for Gromov–Wasserstein Optimal Transport</title>
      <link>https://arxiv.org/abs/2605.04175</link>
      <description>arXiv:2605.04175v1 Announce Type: new 
Abstract: Gromov–Wasserstein optimal transport (GWOT) aligns metric measure spaces by matching their within-domain relational structures, but large-scale GWOT remains challenging because its objective is nonconvex and projection onto the transport polytope is often solved only approximately in practice. This leads to a gap between practical projected-gradient implementations and convergence theory, which typically assumes exact projections. For squared-loss GWOT, we propose an inexact projected-gradient framework with a verifiable feasibility-residual-based inexact condition for the projection subproblem. This condition is directly computable and avoids unknown quantities such as the exact projection point. Under this implementable condition, we prove subsequential convergence to stationary points and, with a mild tolerance-decay condition, convergence of the whole sequence. The resulting method retains the simplicity and sparsity of projected-gradient schemes while providing rigorous convergence guarantees, turning projected-gradient methods into a principled and scalable approach for GWOT with provable reliability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04175v1</guid>
      <category>cs.LG</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ling Liang, Lei Yang</dc:creator>
    </item>
    <item>
      <title>Are LLMs Ready for Conflict Monitoring? Empirical Evidence from West Africa</title>
      <link>https://arxiv.org/abs/2605.04177</link>
      <description>arXiv:2605.04177v1 Announce Type: new 
Abstract: As LLMs enter conflict monitoring, understanding systematic distortions in their outputs is critical for humanitarian accountability. We evaluate four vanilla open-weight models (Gemma 3 4B, Llama 3.2 3B, Mistral 7B, and OLMo 2 7B) and two domain-adapted models (AfroConfliBERT and AfroConfliLLAMA) on Nigeria and Cameroon conflict-event classification against ACLED, a gold-standard dataset with multi-stage verification. We find a bifurcated divergence in normative directionality. Open-weight models exhibit statistically significant False Illegitimation bias: Gemma misclassifies up to 18.29% of legitimate battles as civilian-targeted violence while making zero False Legitimation errors. By contrast, AfroConfliBERT and AfroConfliLLAMA achieve near directional neutrality, with Legitimization Bias differences indistinguishable from zero. Yet domain adaptation does not eliminate actor-based selection bias. Both adapted models show statistically significant actor bias comparable to that of vanilla LLMs; in Nigeria, state actors are legitimized 36.5% more often than non-state actors in identical tactical contexts. Open-weight outputs are also fragile to geography-specific lexical framing: delegitimizing phrases produce flip rates of up to 66.7% in Cameroon and 34.2% in Nigeria, and perturbations salient in one context may not matter in another. Error trace profiling shows that models mask normative bias through unfaithful rationale confabulations. In contrast, AfroConfliBERT and AfroConfliLLAMA are largely robust, with near-zero flip rates across perturbation categories. Overall, current models are not ready for unsupervised deployment in conflict monitoring. We call for fairness-aware fine-tuning to reduce actor-based selection bias, mandatory adversarial robustness evaluation against lexical manipulation, and context-specific human-in-the-loop oversight calibrated to regional difficulty.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04177v1</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3805689.3812264</arxiv:DOI>
      <dc:creator>Hoffmann Muki, Olukunle Owolabi</dc:creator>
    </item>
    <item>
      <title>Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures</title>
      <link>https://arxiv.org/abs/2605.04178</link>
      <description>arXiv:2605.04178v1 Announce Type: new 
Abstract: Rapidly evolving GPU architectures featuring complex memory hierarchies, matrix units, and varied precision formats continue to widen the gap between theoretical peak and achievable performance. We design and develop analytical performance models for NVIDIA Blackwell (B200) and AMD CDNA3 (MI300A) grounded in systematic microbenchmark characterization. For Blackwell, the model captures Tensor Memory (TMEM), asynchronous bulk copy (TMA), and 5th-generation tensor cores; for CDNA3, the model captures the Infinity Cache hierarchy, VGPR constraints, and occupancy. Validation yields 1.31% MAE on B200 (21 kernels) and 0.09% on MI300A (27 kernels), while naive roofline baselines exceed 95% error on the same kernels. We further validate the models using Rodinia 3.1 and SPEChpc 2021 Tiny. The models are then updated with HBM bandwidth, capacity, and cache parameters and applied to H200 (Hopper) and MI250X (CDNA2), indicating that no major restructuring of the models is needed. All models and benchmarks will be released as open source upon acceptance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04178v1</guid>
      <category>cs.DC</category>
      <category>cs.AR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Aaron Jarmusch, Sunita Chandrasekaran</dc:creator>
    </item>
    <item>
      <title>MedFabric and EtHER: A Data-Centric Framework for Word-Level Fabrication Generation and Detection in Medical LLMs</title>
      <link>https://arxiv.org/abs/2605.04180</link>
      <description>arXiv:2605.04180v1 Announce Type: new 
Abstract: Large Language Models exhibit strong reasoning and semantic understanding capabilities but often hallucinate in domains that require expert knowledge. Among these hallucinations, fabrications (the generation of factually incorrect yet fluent statements) pose the greatest risk in medical contexts. Existing medical hallucination datasets capture fabrication phenomena inadequately due to limited fabrication coverage, stylistic disparities between human- and LLM-authored texts, and distributional drift during hallucinated-sample synthesis. To address this, we propose a data-centric pipeline that generates realistic, word-level fabrications which preserve syntactic and stylistic fidelity while introducing subtle factual deviations, resulting in MedFabric. Building on this dataset, we introduce ETHER, a modular word-level fabrication detector integrating Text2Table Decomposition, Word Masking and Filling, and Hybrid Sentence Pair Evaluation to enhance factual alignment. Empirical results demonstrate that ETHER outperforms state-of-the-art detectors by over 15% on word-level fabrication benchmarks while maintaining consistent performance across structural similarities, offering a comprehensive framework for reliable, domain-specific factuality detection.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04180v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tung Sum Thomas Kwok, Qian Qian, Xiaofeng Lin, Dongxu Zhang, Jun Han, Zhichao Yang, Davin Hill, Tamer Soliman, Sanjit Singh Batra, Robert Tillman, Guang Cheng</dc:creator>
    </item>
    <item>
      <title>Nearly-Tight Bounds for Zonotope Containment and Beyond</title>
      <link>https://arxiv.org/abs/2605.04183</link>
      <description>arXiv:2605.04183v1 Announce Type: new 
Abstract: We investigate the convex-body containment problem $\max\{s &gt;0 : s Z \subseteq Q\}$, where the outer body $Q \subseteq \mathbb R^d$ is described by a membership oracle and the inner body $Z \subseteq \mathbb R^d$ is a zonotope. Our main result is a sampling-based $O(\sqrt{d})$-approximation algorithm for this problem that almost matches the lower bound of $\Omega(\sqrt{d/\log d})$ by Khot and Naor in the oracle model. Assuming that zonotopes can be sparsified to a linear number of generators, a statement known as Talagrand's conjecture, our approach attains the optimal approximation factor of $\Theta(\sqrt{d/\log d})$. Our second main result is a proof of Talagrand's conjecture for $\Delta$-modular zonotopes whenever $\Delta$ is constant. These zonotopes are of the form $Z = \{ Wx \colon \| x\|_\infty \leq 1\}$, where the non-zero $d \times d$ sub-determinants of $W$ are between $1$ and $\Delta$. This result establishes a connection between zonoid sparsification and the spectral sparsification of Batson, Spielman, and Srivastava. We complement these results with a universal $\Omega(\sqrt{d/\log d})$ lower bound that holds for all zonotopes.
  Finally, we consider containment problems $\max\{s &gt;0 : s K \subseteq Q\}$ for general convex bodies $K \subseteq \mathbb R^d$. A result of Naszódi on approximating $K \subseteq \mathbb R^d$ by a polytope implies a polynomial-time $\Theta(d/\log d)$-approximation algorithm. We show that this approximation factor is tight in the oracle model via a reduction to circumradius computation. Our lower bound holds for centrally symmetric convex sets, implying that Barvinok's optimal $O(\sqrt{d})$-approximation of a centrally symmetric convex body by a polytope with a polynomial number of vertices cannot be computed in polynomial time.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04183v1</guid>
      <category>cs.DS</category>
      <category>math.MG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Friedrich Eisenbrand, Thomas Rothvoss, Matteo Russo, Ruben Skorupinski</dc:creator>
    </item>
    <item>
      <title>Constraint-Enhanced Reinforcement Learning Based on Dynamic Decoupled Spherical Radial Squashing</title>
      <link>https://arxiv.org/abs/2605.04185</link>
      <description>arXiv:2605.04185v1 Announce Type: new 
Abstract: When deploying reinforcement learning policies to physical robots, actuator rate constraints -- hard limits on how fast each joint can move per control step -- are unavoidable. These limits vary substantially across joints due to differences in motor inertia, power bandwidth, and transmission stiffness, creating pronounced heterogeneity that existing methods fail to handle geometrically: the per-joint feasible region forms a high-dimensional box in action-increment space, yet QP projection and spherical parameterization methods impose isotropic ball-shaped constraints, exponentially under-covering the true feasible set as heterogeneity grows. This paper proposes Dynamic Decoupled Spherical Radial Squashing (DD-SRad), which resolves this mismatch by computing a position-adaptive radius independently for each actuator, achieving tight alignment with the true per-joint feasible region. DD-SRad satisfies per-step hard constraints with probability 1, preserves well-conditioned gradients throughout training, and admits exact policy gradient backpropagation with zero runtime solver overhead. MuJoCo benchmark experiments demonstrate the highest task return at zero constraint violation -- matching the unconstrained upper bound -- with a 30%--50% improvement in constraint-space coverage over spherical baselines. High-fidelity IsaacLab simulations with Unitree H1 and G1 humanoid robots confirm end-to-end optimality with constraints parameterized directly from official joint specifications, validating a systematic pathway from hardware datasheets to safe deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04185v1</guid>
      <category>cs.LG</category>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Qijun Liao, Zhaoxin Yu, Jue Yang</dc:creator>
    </item>
    <item>
      <title>A Multi-Agent Consensus Protocol for Stable Software Remodularization</title>
      <link>https://arxiv.org/abs/2605.04188</link>
      <description>arXiv:2605.04188v1 Announce Type: new 
Abstract: Automatic software remodularisation is typically cast as a single-objective optimization problem. While recent metaheuristics have improved search efficiency, real-world architecture recovery must reconcile the conflicting attributes of structural cohesion and evolutionary stability. We reframe software module clustering as a distributed consensus problem among autonomous agents. We introduce an Asymmetric Monotonic Concession Protocol (AMCP) that enables agents to negotiate decompositions that respect multi-attribute utility thresholds. We formally prove the protocol's termination, its bounded concession behaviour consistent with the Zeuthen Strategy under closed-instance conditions, and the local Pareto-satisfactoriness of the resulting partitions. Preliminary experiments on a synthetic benchmark and the Xwork Java framework confirm that our negotiated consensus matches state-of-the-art optimizers when stability budgets are loose, while acting as a "circuit breaker" to enforce strict stability constraints. Extended results on ten further systems, including comparisons with multi-objective evolutionary algorithms and multi-version chains, will be reported in a forthcoming full paper.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04188v1</guid>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ahmed F. Ibrahim</dc:creator>
    </item>
    <item>
      <title>Exploring the Output of Software Testing Tools through a Visual Comparative Analysis</title>
      <link>https://arxiv.org/abs/2605.04189</link>
      <description>arXiv:2605.04189v1 Announce Type: new 
Abstract: Software testing is a fundamental part of software development, and prior work has shown that visualizations of test results support testers' decision-making. However, Human-Computer Interaction research on software testing has yet to explore and understand the shared interface elements and patterns in visualizations of testing output. To address this, we conducted a visual comparative analysis of the output of 50 software testing tools and harnesses (44 with CLI output, 6 with GUI output) across four popular programming languages. Our analysis reveals the common interface elements in software testing tools, how these tools display and visualize test results, and the specific composition of their output. Our findings provide insight into how visual testing output is formatted and how colour is used across both CLI and GUI environments, identifying trends that developers of testing tools can apply.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04189v1</guid>
      <category>cs.HC</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Brandon Lit, Anthony Maocheia-Ricci, Thomas Driscoll</dc:creator>
    </item>
    <item>
      <title>ANDRE: An Attention-based Neuro-symbolic Differentiable Rule Extractor</title>
      <link>https://arxiv.org/abs/2605.04193</link>
      <description>arXiv:2605.04193v1 Announce Type: new 
Abstract: Inductive Logic Programming (ILP) aims to learn interpretable first-order rules from data, but existing symbolic and neuro-symbolic approaches struggle to scale to noisy and probabilistic settings. Classical ILP relies on discrete combinatorial rule search and is brittle under uncertainty, while differentiable ILP methods typically depend on predefined rule templates or inaccurate fuzzy operators that suffer from vanishing gradients or poor approximation of logical structure when reasoning over probabilistic predicate valuations. This paper proposes an Attention-based Neuro-symbolic Differentiable Rule Extractor (ANDRE), a novel ILP framework that learns first-order logic programs by optimizing over a continuous rule space with attention-based logical operators. ANDRE replaces both rule templates and logical operators with fully differentiable, attention-driven conjunction and disjunction operators that approximate logical min-max semantics, enabling accurate, stable, and interpretable reasoning over probabilistic data. By softly selecting, negating, or excluding predicates within each rule, ANDRE supports flexible rule induction while preserving symbolic structure. Extensive experiments on classical ILP benchmarks, large-scale knowledge bases, and synthetic datasets with probabilistic predicates and noisy supervision demonstrate that ANDRE achieves competitive or superior predictive performance while reliably recovering correct symbolic rules under uncertainty. In particular, ANDRE remains robust to moderate label noise, substantially outperforming existing differentiable ILP methods in both rule extraction quality and stability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04193v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Iman Sharifi, Peng Wei, Saber Fallah</dc:creator>
    </item>
    <item>
      <title>Coupled-NeuralHP: Directional Temporal Coupling Between AI Innovation Exposure and Public Response</title>
      <link>https://arxiv.org/abs/2605.04194</link>
      <description>arXiv:2605.04194v1 Announce Type: new 
Abstract: Artificial intelligence innovation exposure and public response co-evolve, but innovation arrives as irregular event streams while response is observed monthly. We introduce Coupled-NeuralHP, a hybrid event-plus-state model linking eight-domain USPTO AI patent publication streams to a train-only Google Trends response index. Under the cleaned response protocol, the validation-selected one-way real-data variant gives the best held-out innovation count forecasts in the registered comparison set (pseudo-log-likelihood -30.4 vs. -34.7; root mean squared error (RMSE) 471 vs. 532) while matching the stronger multi-lag factor-family baseline on response RMSE (0.295). Ablations show that the real-data response signal is carried mainly by the structured forecast head, whereas the reverse response-to-innovation block is not supported on held-out count prediction. Across 60 semi-synthetic replications with known structure, the broader coupled family recovers innovation-to-response links much better than vector autoregression with exogenous inputs (VARX) (F1 = 0.734 vs. 0.386). A placebo-controlled 2022 split-date analysis finds no robust milestone-specific regime break.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04194v1</guid>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Amir Rafe, Subasish Das</dc:creator>
    </item>
    <item>
      <title>The Impact of Vocabulary Overlaps on Knowledge Transfer in Multilingual Machine Translation</title>
      <link>https://arxiv.org/abs/2605.04196</link>
      <description>arXiv:2605.04196v1 Announce Type: new 
Abstract: Knowledge transfer, especially across related languages, has been found beneficial for multilingual neural machine translation (MNMT), but some aspects remain under-explored and deserve further investigation. A joint vocabulary is most often used to form a uniform word embedding space, but because the impact of a disjoint vocabulary on model performance has been studied far less, there is no consensus on how much of the knowledge transfer is due to vocabulary overlap. In this paper, we present systematic experiments with joint and disjoint vocabularies, and with auxiliary languages that are related or unrelated to the source language. We design these experiments in an out-of-domain setup in order to emphasize transfer and the impact of the auxiliary language. As expected, we obtain better results with the more extensive vocabulary overlaps typical of related languages, but our experiments also show that domain match and language relatedness are more important than a joint vocabulary.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04196v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Oona Itkonen, Jörg Tiedemann</dc:creator>
    </item>
    <item>
      <title>Deep Wave Network for Modeling Multi-Scale Physical Dynamics</title>
      <link>https://arxiv.org/abs/2605.04198</link>
      <description>arXiv:2605.04198v1 Announce Type: new 
Abstract: Performance of deep learning models is strongly governed by architectural capacity, with width and depth as primary controls. However, in physical-science applications, models are often compared at a single fixed size or by separating accuracy and computational cost, which can be misleading since architectures exhibit different accuracy-cost scaling as width and depth vary. This issue is particularly relevant for U-Net-type encoder-decoder models, widely used for multi-scale gas, fluid, and plasma dynamics due to their ability to represent features across spatial scales. A U-Net constructs a multi-resolution representation via an encoder that progressively reduces spatial resolution, followed by a decoder that restores it for prediction. Skip connections link corresponding encoder and decoder features, preserving fine-scale information and improving optimization. In practice, U-Net width is routinely tuned, while depth is typically kept fixed (a set number of down/up-sampling stages with few convolutions per stage), limiting systematic exploration of depth for improving the accuracy-cost trade-off. We address this limitation by increasing effective depth through stacking multiple encoder-decoder "waves" in series, with skip connections both within and across waves to enable progressive cross-scale refinement. We call this architecture a Deep Wave Network (DW-Net). Training data, optimization, and schedules are kept identical across models. Instead of evaluating single configurations, we train multiple width variants of each architecture and compare accuracy vs. GPU time Pareto fronts. Across several 2D and 3D flow benchmarks, DW-Net models consistently improve the Pareto frontier over single-wave U-Nets, achieving higher accuracy at matched cost or similar accuracy at reduced cost, and reaching low-error regimes with up to 3x less training time under identical training settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04198v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>physics.comp-ph</category>
      <category>physics.flu-dyn</category>
      <category>physics.plasm-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Alexander I. Khrabry, Edward A. Startsev, Andrew T. Powis, Igor D. Kaganovich</dc:creator>
    </item>
    <item>
      <title>Topology-Constrained Quantized nnUNet for Efficient and Anatomically Accurate 3D Tooth Segmentation</title>
      <link>https://arxiv.org/abs/2605.04201</link>
      <description>arXiv:2605.04201v1 Announce Type: new 
Abstract: We propose a topology-constrained quantized nnUNet framework for efficient and anatomically accurate 3D tooth segmentation, addressing the challenges of spatial distortion introduced by quantization in deep learning models. The proposed method integrates a novel tooth-specific topological loss into quantization-aware training, preserving critical anatomical structures such as tooth count, adjacency relationships, and cavity integrity while maintaining computational efficiency. The system employs an 8-bit quantized nnUNet backbone, where weights and activations are dynamically calibrated to minimize precision loss during inference. Furthermore, the topological loss combines connected-component analysis, adjacency consistency, and hole detection penalties, ensuring anatomical fidelity without modifying the underlying network architecture. The joint optimization objective harmonizes cross-entropy loss, quantization regularization, and topological constraints, enabling end-to-end training with gradient approximations for persistent homology terms. Experiments demonstrate that our approach significantly reduces topological errors compared to conventional quantized models, achieving clinically plausible segmentations on dental CBCT scans. The method retains the hardware efficiency of integer-only inference, making it suitable for deployment in resource-constrained clinical environments. This work bridges the gap between computational efficiency and anatomical precision in medical image segmentation, offering a practical solution for real-world dental applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04201v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Paarth Prasad, Ruchika Malhotra</dc:creator>
    </item>
    <item>
      <title>Sequential Strategic Classification with Multi-Stage Selective Classifiers</title>
      <link>https://arxiv.org/abs/2605.04202</link>
      <description>arXiv:2605.04202v1 Announce Type: new 
Abstract: Strategic classification studies settings in which self-interested individuals or agents manipulate their responses to obtain favorable outcomes from classifiers, typically turning to dishonest actions when these are less costly than genuine effort. Prior work has demonstrated a fundamental inability to escape this conundrum by focusing only on the design of a classifier. We note that prior work also focuses heavily on either one-shot settings or repeated interaction with the same classifier. Real-world decision making, however, is often multi-stage, involving a sequence of potentially different classifiers as an agent progresses. This paper introduces a sequential, stochastic, multi-stage model of strategic classification that captures how agents adapt their behavior, through improvement actions (enhancing both observable features and true attributes) and gaming actions (enhancing only observable features), over multiple levels of classification with increasing difficulty and reward. For each level, we adopt a selective classifier that can abstain from making a prediction at low confidence. Consequently, a positive (resp. negative) outcome leads to promotion (resp. demotion) of the agent to the next higher (resp. lower) level, while abstention keeps the agent at the same level. We fully characterize the agent's optimal instantaneous action under selective classifiers and compare the long-term properties and utility of an agent that repeatedly follows an optimal myopic policy of either no-improvement (never choosing the improvement action) or no-gaming (never choosing the gaming action). We further examine design principles for the sequence of classifiers that yield higher long-term utility under the latter policy, thereby effectively incentivizing genuine effort in the long run.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04202v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ziyuan Huang, Lina Alkarmi, Mingyan Liu</dc:creator>
    </item>
    <item>
      <title>Symmetry-induced quantum-inspired parallelism of classical dynamic systems</title>
      <link>https://arxiv.org/abs/2605.04204</link>
      <description>arXiv:2605.04204v1 Announce Type: new 
Abstract: Performing multiple computations within the same system,
  without spatial or temporal separation of tasks, requires encoding
  multiple data items into a well-defined physical state. The most widely
  explored mechanism for such encoding is the superposition of physical
  states representing computational states. However, superposition requires
  the system to be linear, which significantly limits the set of
  achievable operations. We show that system symmetries provide an
  alternative mechanism for encoding multiple computational states.
  Notably, this mechanism also applies to nonlinear systems and therefore
  does not impose inherent limits on computed functions.
  Using the evaluation of Boolean functions as an example, we show that a
  relaxed spin network driven by the V-2 model supports this
  mechanism. We relate the resulting simultaneous computations enabled by
  symmetry-induced parallelism to properties of the evaluated functions.
  We demonstrate symmetry-induced parallelism for a logical AND/OR
  gate and an N-bit adder.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04204v1</guid>
      <category>cs.ET</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mikhail Erementchouk, Pinaki Mazumder</dc:creator>
    </item>
    <item>
      <title>Climate-based Pre-screening of Self-sustaining Regreening Opportunities in Drylands: A Case Study for Saudi Arabia</title>
      <link>https://arxiv.org/abs/2605.04206</link>
      <description>arXiv:2605.04206v1 Announce Type: new 
Abstract: Large-scale restoration in drylands is widely promoted to address land degradation and biodiversity loss, yet many efforts rely on long-term irrigation, limiting sustainability in water-scarce regions. A key challenge is identifying locations where native vegetation can persist without intensive management while minimizing costly field campaigns. We present a scalable pre-screening framework that integrates climate and remote sensing data to enable cost-efficient site selection in arid environments, using Saudi Arabia as a case study. A Climate Suitability Score (CSS), derived from machine learning models trained on expert-curated reference sites, captures the complex climatic dependencies of vegetation persistence. Using multi-year ERA5-Land data for Saudi Arabia, national-scale prediction maps are generated and combined with vegetation indices to identify areas where the climate is favorable but vegetation remains underdeveloped. Multi-criteria screening reduces the candidates to thirteen priority locations. Climatically analogous intact ecosystems provide benchmarks for restoration targets and indicate that an average 2.5-fold increase in vegetation coverage is a realistic target for restoration efforts. Overall, this approach narrows the search space, reduces costs, and supports resilient ecosystem recovery planning in water-limited regions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04206v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Katja Froehlich, Jonathan Klein, Ibrahim S. Elbasyoni, Julian D. Hunt, Yoshihide Wada, Dominik L. Michels</dc:creator>
    </item>
    <item>
      <title>Nsanku: Evaluating Zero-Shot Translation Performance of LLMs for Ghanaian Languages</title>
      <link>https://arxiv.org/abs/2605.04208</link>
      <description>arXiv:2605.04208v1 Announce Type: new 
Abstract: Large language models (LLMs) have demonstrated impressive multilingual capabilities for well-resourced languages, yet their performance on low-resource African languages remains poorly understood and largely unevaluated. This paper presents Nsanku, a systematic benchmark that evaluates the zero-shot machine translation performance of 19 open-weight and proprietary LLMs across 43 Ghanaian languages paired with English. Evaluation sentences were sourced from the YouVersion Bible platform, providing 300 sentence pairs per language. Two complementary automatic metrics are employed: Bilingual Evaluation Understudy (BLEU) and Character n-gram F-Score (chrF), alongside an average accuracy score and a cross-language consistency dimension. Nsanku represents the most comprehensive LLM translation evaluation for Ghanaian languages conducted to date. Results show that gemini-2.5-flash achieves the highest overall average score of 26.88 (BLEU: 24.60, chrF: 29.16), followed by claude-sonnet-4-5 at 24.87 (BLEU: 22.46, chrF: 27.28) and gpt-4.1 at 23.20 (BLEU: 21.15, chrF: 25.24). Among open-weight models, kimi-k2-instruct-0905 leads at an average score of 20.87. A critical finding from the consistency analysis is that no model and no language reached the Leaders quadrant of high performance and high consistency simultaneously, indicating that current LLMs are not yet reliably usable for Ghanaian language translation at scale. Siwu achieved the highest per-language average score at 25.73 while Nkonya scored lowest at 11.65. Nsanku establishes a publicly available, community-extensible evaluation infrastructure for African language NLP research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04208v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Stephen E. Moore, Mich-Seth Owusu, Akwasi Asare, Lawrence Adu Gyamfi, Paul Azunre, Joel Budu, Jonathan Asiamah, Elias Dzobo, Kelvin Newman, Edmund O. Benefo, Gerhardt Datsomor, Onesimus Addo Appiah, Ama Branoa Banful, Lucas Woedem Kpatah, Saani Mustapha Deishini, John Ayernor</dc:creator>
    </item>
    <item>
      <title>Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions</title>
      <link>https://arxiv.org/abs/2605.04209</link>
      <description>arXiv:2605.04209v1 Announce Type: new 
Abstract: We present Sparse Backdoor, a supply-chain attack that plants a provably undetectable backdoor in pre-trained image classifiers, including convolutional networks and Vision Transformers. The attack injects a structured sparse perturbation along a randomly chosen direction into a small subset of columns at each fully connected layer, propagating a trigger signal to an adversary-chosen target class, and masks the perturbation with an independent isotropic Gaussian dither. The dither serves a single technical purpose: it induces a clean reference distribution anchored at the pre-trained weights, against which undetectability can be formalized. Under a mild margin condition on the pre-trained classifier, we show that the dithered reference is functionally equivalent to the original classifier. We prove that distinguishing the backdoor-injected model from this reference is at least as hard as Sparse PCA detection, which is computationally infeasible under standard hardness assumptions. The guarantee holds against any probabilistic polynomial-time distinguisher with white-box access to the parameters.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04209v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sarthak Choudhary, Atharv Singh Patlan, Nils Palumbo, Ashish Hooda, Kassem Fawaz, Somesh Jha</dc:creator>
    </item>
    <item>
      <title>The Anatomy of Silent Data Corruption: GPU Error Pattern Study and Modeling Guidance</title>
      <link>https://arxiv.org/abs/2605.04213</link>
      <description>arXiv:2605.04213v1 Announce Type: new 
Abstract: Silent data corruption (SDC) threatens the reliability of large-scale GPU clusters used for training large language models, yet its rarity and lack of explicit error signals make accurate high-level modeling challenging. To address this gap, we conducted a large-scale gate-level stuck-at fault-injection campaign on a production-class data-center GPU, consuming over three million simulator hours across 63 CUDA micro-benchmarks. We characterized GPU SDC in terms of corruption types, bit-flip behavior, and warp-aligned spatial correlation. Our results show that NaN/+INF/-INF values account for only 1.01% of SDC outcomes, that single-bit flips constitute less than 40% of bit-flip events, and that corrupted addresses exhibit periodicity. These statistics motivate distribution-aware high-level fault modeling and realistic software-based fault injection for evaluating the resilience of production-class GPU architectures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04213v1</guid>
      <category>cs.AR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chung-Hsuan Tung, Yanxiang Huang, Nirmal Saxena, Philip Shirvani, Saurabh Hukerikar, Twinkle Jain, Abhishek Tyagi, Sanjay Gongalore</dc:creator>
    </item>
    <item>
      <title>Predict-then-Diffuse: Adaptive Response Length for Compute-Budgeted Inference in Diffusion LLMs</title>
      <link>https://arxiv.org/abs/2605.04215</link>
      <description>arXiv:2605.04215v1 Announce Type: new 
Abstract: Diffusion-based Large Language Models (D-LLMs) represent a promising frontier in generative AI, offering fully parallel token generation that can yield significant throughput advantages and better GPU utilization than the traditional autoregressive paradigm. However, this parallelism requires fixing the response length before generation. This architectural limitation imposes a severe trade-off: an oversized response length wastes computation on semantically meaningless padding tokens, while an undersized one truncates the output and forces costly re-computation that introduces unpredictable latency spikes. To tackle this issue, we propose Predict-then-Diffuse, a simple, model-agnostic framework that enables compute-budgeted inference per input query by first estimating the response length and then running D-LLM inference with it. At its core lies an Adaptive Response Length Predictor (AdaRLP), an auxiliary predictor that estimates the optimal response length for a given input query. To guard against under-predicting the response length and having to re-run inference with a longer one, we introduce a data-driven safety mechanism that costs only a negligible padding overhead. As a whole, our framework limits wasted computation on padding tokens and preserves output quality. Experiments on multiple datasets demonstrate that Predict-then-Diffuse significantly reduces computational cost (FLOPs) compared to the default D-LLM inference mechanism and heuristic baselines, while remaining robust to skewed data distributions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04215v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Michael Rottoli, Subhankar Roy, Stefano Paraboschi</dc:creator>
    </item>
    <item>
      <title>Jordan-RoPE: Non-Semisimple Relative Positional Encoding via Complex Jordan Blocks</title>
      <link>https://arxiv.org/abs/2605.04217</link>
      <description>arXiv:2605.04217v1 Announce Type: new 
Abstract: Relative positional encodings determine which functions of query-key lag can enter the primitive attention logit. RoPE supplies a rotary phase, while ALiBi supplies an additive distance bias. Motivated by group-theoretic views of linear translation-invariant positional encodings, we study a non-semisimple case in which a complex rotary eigenvalue and a nilpotent response live in the same defective Jordan block. The resulting relative operator generates oscillatory-polynomial features such as $e^{-\gamma d}\cos(\omega d)$, $e^{-\gamma d}\sin(\omega d)$, $d e^{-\gamma d}\cos(\omega d)$, and $d e^{-\gamma d}\sin(\omega d)$, for causal lag $d=i-j\geq 0$. Thus the construction realizes a distance-modulated phase basis $d e^{i\omega d}$, rather than merely adding a separate distance channel to RoPE.
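A minimal numerical sketch of where these features come from, written as an illustration rather than as the paper's implementation: for a defective block $J = \lambda I + N$ with $N^2 = 0$ and $\lambda = -\gamma + i\omega$, the matrix exponential is $\exp(dJ) = e^{\lambda d}(I + dN)$, so the diagonal entry is $e^{\lambda d}$ and the off-diagonal entry is $d e^{\lambda d}$. Their real and imaginary parts are exactly the four features above. The helper names (matmul2, expm2, jordan_features, closed_form) and the dependency-free Taylor-series exponential are assumptions chosen for the sketch.

```python
import math

def matmul2(a, b):
    # Product of two 2x2 matrices stored as nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def expm2(a, terms=80):
    # exp(A) by truncated Taylor series; adequate for a 2x2 matrix of modest norm.
    identity = [[1 + 0j, 0j], [0j, 1 + 0j]]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms):
        # After this update, term holds A**k / k!
        term = [[x / k for x in row] for row in matmul2(term, a)]
        result = [[r + t for r, t in zip(rrow, trow)] for rrow, trow in zip(result, term)]
    return result

def jordan_features(gamma, omega, d):
    # Hypothetical helper: exponentiate d*J for J = lam*I + N with lam = -gamma + i*omega
    # and nilpotent N = [[0, 1], [0, 0]], then read off the top row of exp(d*J).
    lam = complex(-gamma, omega)
    d_times_j = [[lam * d, d + 0j], [0j, lam * d]]
    e = expm2(d_times_j)
    return (e[0][0].real, e[0][0].imag, e[0][1].real, e[0][1].imag)

def closed_form(gamma, omega, d):
    # The four features predicted by exp(d*J) = e^{lam d} (I + d N).
    decay = math.exp(-gamma * d)
    c, s = math.cos(omega * d), math.sin(omega * d)
    return (decay * c, decay * s, d * decay * c, d * decay * s)

if __name__ == "__main__":
    print(jordan_features(0.1, 0.7, 3.0))
    print(closed_form(0.1, 0.7, 3.0))
```

Comparing jordan_features with closed_form at a few lags should show agreement to near machine precision. A diagonalizable block without the nilpotent part would produce only $e^{\lambda d}$ terms, never the $d e^{\lambda d}$ ones.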
  We formulate Exact Jordan-RoPE as a non-semisimple one-parameter representation, give its real block form, and specify the contragredient query action required by non-orthogonal positional maps. We also distinguish this exact representation from stabilized variants whose bounded shear improves numerical behavior but breaks the exact group law. Kernel-level diagnostics and a Jordan-friendly synthetic language-model task show that the coupled Jordan basis is useful when the target contains distance-modulated phase interactions. On a small WikiText-103 byte language model, a scaled-exact variant improves over RoPE and direct-sum baselines within the Jordan family, while RoPE+ALiBi remains strongest overall. The evidence is structural rather than a broad performance claim.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04217v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yaobo Zhang</dc:creator>
    </item>
    <item>
      <title>Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction</title>
      <link>https://arxiv.org/abs/2605.04221</link>
      <description>arXiv:2605.04221v1 Announce Type: new 
Abstract: Clinical named entity recognition from dental progress notes is challenging because documentation is highly unstructured, domain-specific, and often privacy-sensitive. We developed a locally deployable framework that enables small language models to self-generate, verify, refine, and evaluate entity-specific prompts for extracting multiple clinical entities from dental notes. Using 1,200 annotated notes, we evaluated candidate open-weight models with multi-prompt ensemble inference and further adapted selected models using QLoRA-based supervised fine-tuning and direct preference optimization (DPO). Model performance varied substantially, highlighting the need for task-specific evaluation rather than reliance on generic benchmarks. Qwen2.5-14B-Instruct achieved the strongest baseline performance. After DPO, Qwen2.5-14B-Instruct and Llama-3.1-8B-Instruct achieved micro/macro F1 scores of 0.864/0.837 and 0.806/0.797, respectively. These findings suggest that automated prompt optimization combined with lightweight preference-based post-training can support scalable clinical information extraction using locally deployed small language models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04221v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yao-Shun Chuang, Tushti Mody, Uday Pratap Singh, Shirindokht Shiraz, Chun-Teh Lee, Ryan Brandon, Muhammad F Walji, Xiaoqian Jiang, Bunmi Tokede</dc:creator>
    </item>
    <item>
      <title>Safety by Invariance, Liveness through Refinement: Heterogeneous Contract Framework for Co-Design of Layered Control</title>
      <link>https://arxiv.org/abs/2605.04222</link>
      <description>arXiv:2605.04222v1 Announce Type: new 
Abstract: Real-world control systems must achieve long-horizon objectives (liveness) while respecting continuous-time safety constraints, a combination that motivates hierarchical layered control architectures (LCAs). Existing LCA research, however, lacks (i) a uniform specification language across discrete planning and continuous execution, (ii) formal guarantees that specifications are preserved when interconnecting subsystems at heterogeneous time scales, and (iii) compositional separation between layers, owing to reliance on naive input-filtering laws. This paper addresses all three gaps by importing the safety--liveness decomposition into a heterogeneous assume--guarantee framework: \emph{safety is enforced by invariance} at the continuous-time layer, while \emph{liveness is achieved through refinement} at the discrete-time layer, with inter-layer coordination formalized via vertical refinement and timing-compatibility conditions. We instantiate this contract with a novel LCA combining an MPC planner, an input-to-state stabilizing (ISS) low-level controller, and a reference-governor bridge, and validate it on a Hybrid Energy Storage System (HESS) comprising a battery and a supercapacitor.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04222v1</guid>
      <category>eess.SY</category>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yoshinari Takayama, Alessio Iovine, Bart Besselink, Guillaume Sandou, Adnane Saoud</dc:creator>
    </item>
    <item>
      <title>ARMATA: Auto-Regressive Multi-Agent Task Assignment</title>
      <link>https://arxiv.org/abs/2605.04225</link>
      <description>arXiv:2605.04225v1 Announce Type: new 
Abstract: Coordinating multi-agent systems over spatially distributed areas requires solving a complex hierarchical problem: first distributing areas among agents (allocation) and then determining the optimal visitation order (routing). Existing methods typically decouple these stages, ignoring inter-stage dependencies, or rely on decentralized heuristics that lack global context. In this work, we propose a centralized, fully end-to-end auto-regressive framework that jointly generates allocation decisions and routing sequences. The core contribution of our approach is a multi-stage decoding mechanism that unifies high-level allocation and low-level routing in a single autoregressive pass while maintaining a centralized global state. This enables the model to implicitly balance workload distribution with routing efficiency, avoiding the local optima common in decentralized methods. Extensive experiments demonstrate that our method significantly outperforms diverse baselines, achieving up to a 20\% improvement in solution quality over industrial solvers such as Google OR-Tools, IBM CPLEX, and LKH-3, while reducing computation time from hours to seconds.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04225v1</guid>
      <category>cs.MA</category>
      <category>cs.AI</category>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yazan Youssef, Aboelmagd Noureldin, Sidney Givigi</dc:creator>
    </item>
    <item>
      <title>ipc_shared_ptr: A Publish/Subscribe-Aware Smart Pointer for Cross-Process Object Lifetime Management</title>
      <link>https://arxiv.org/abs/2605.04226</link>
      <description>arXiv:2605.04226v1 Announce Type: new 
Abstract: True zero-copy Inter-Process Communication (IPC) in publish/subscribe (pub/sub) middleware such as Robot Operating System 2 (ROS 2) requires subscribers to reference message objects in publisher-owned shared memory. Objects must not be reclaimed while referenced, yet must eventually be reclaimed, with correct handling of crash recovery and Transient Local QoS retention requirements. We propose ipc_shared_ptr, a pub/sub-aware smart pointer for cross-process message lifetime management. ipc_shared_ptr exploits pub/sub structural properties to specialize Birrell's reference listing, limiting global metadata updates to per-subscriber 0&lt;-&gt;1 transitions and achieving an order-of-magnitude reduction in global communication over general-purpose distributed reference counting. We analyze the key metadata management tradeoff: scalability versus implementation simplicity. Owner-driven reclaim offers greater scalability, but concurrent membership changes and reclamation decisions produce races that widen the correctness-verification state space. Single-writer achieves structural atomicity, eliminating this complexity at the cost of a centralized bottleneck. The two architectures are embodied by iceoryx2, which uses owner-driven reclaim, and by Agnocast, a true zero-copy ROS 2 IPC middleware that shares the publisher's heap with subscribers and adopts ipc_shared_ptr with single-writer. Comparative evaluation at the scale of Autoware -- the largest open-source ROS 2 application -- confirms that single-writer achieves sufficient scalability: at 200 topics, two subscribers per topic, and 100 Hz, Agnocast's E2E p99.9 latency is 2.9x lower than iceoryx2's, justifying implementation simplicity over owner-driven reclaim.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04226v1</guid>
      <category>cs.OS</category>
      <category>cs.DC</category>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Takahiro Ishikawa-Aso, Atsushi Yano, Koichi Imai, Takuya Azumi, Shinpei Kato</dc:creator>
    </item>
    <item>
      <title>Pro$^2$Assist: Continuous Step-Aware Proactive Assistance with Multimodal Egocentric Perception for Long-Horizon Procedural Tasks</title>
      <link>https://arxiv.org/abs/2605.04227</link>
      <description>arXiv:2605.04227v1 Announce Type: new 
Abstract: Procedural tasks with multiple ordered steps are ubiquitous in daily life. Recent advances in multimodal large language models (MLLMs) have enabled personal assistants that support daily activities. However, existing systems primarily provide reactive guidance triggered by user queries, or limited proactive assistance for isolated short-term events rather than long-horizon procedural tasks. In this work, we introduce Pro$^2$Assist, a step-aware proactive assistant that continuously tracks fine-grained task progress and reasons over the user's evolving state to provide timely assistance throughout tasks. Pro$^2$Assist leverages multimodal data from augmented reality (AR) glasses to achieve motion-based perception. It then extracts step-oriented procedural context from multi-scale temporal dynamics and task-specific expert knowledge. Based on both sensory input and procedural context, Pro$^2$Assist performs continuous reasoning to infer user needs and display timely assistance on AR glasses. We evaluate Pro$^2$Assist using a dataset curated from public sources and a real-world dataset collected on our testbed with AR glasses. Extensive evaluations show that Pro$^2$Assist outperforms the best-performing baselines by over 21% in procedural action understanding accuracy, and it achieves up to 2.29x the proactive timing accuracy of baselines. A user study with 20 participants further shows that 90% find Pro$^2$Assist useful, indicating its effectiveness for real-world procedural assistance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04227v1</guid>
      <category>cs.AI</category>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lilin Xu, Bufang Yang, Siyang Jiang, Kaiwei Liu, Kaiyuan Hou, Yuang Fan, Hongkai Chen, Zhenyu Yan, Xiaofan Jiang</dc:creator>
    </item>
    <item>
      <title>Thinking fast and slow -- decision intelligence for power systems</title>
      <link>https://arxiv.org/abs/2605.04228</link>
      <description>arXiv:2605.04228v1 Announce Type: new 
Abstract: Decision-making in power systems spans multiple timescales: from milliseconds to prevent surges, to seconds to balance frequency and protect grid assets, to minutes for real-time energy balancing, to day-ahead, seasonal, and long-term planning. Growing uncertainty and complexity, driven by intermittent renewables and distributed energy resources (DER), demand fresh approaches to power system intelligence and architecture. Daniel Kahneman describes the interplay of two systems of human decision-making: System 1, which is fast, intuitive, experience-based, and reactive, and System 2, which is slow, deliberate, and analytical. Similarly, octopus intelligence illustrates a model of distributed yet coordinated decision-making between central and edge intelligence. Future power systems must embed coordinated intelligence that operates across diverse timescales and is placed at both edge and centralized levels. This paper maps decision intelligence in power systems onto the System 1/System 2 and edge-central architecture paradigms, based on the trade-offs inherent in decision-making, such as speed/latency, energy cost/compute, accuracy, and robustness. The framework inspires an agentic intelligence architecture, laying the foundation for trustworthy, autonomous power systems of the future.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04228v1</guid>
      <category>eess.SY</category>
      <category>cs.DC</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Apoorv Mathur</dc:creator>
    </item>
    <item>
      <title>Capabilities of Auto-encoders and Principal Component Analysis of the Reduction of Microstructural Images; Application on the Acceleration of Phase-Field Simulations</title>
      <link>https://arxiv.org/abs/2605.04229</link>
      <description>arXiv:2605.04229v1 Announce Type: new 
Abstract: In this work, a data-driven framework based on Phase-Field simulation data is proposed to highlight the capability of neural networks to achieve accurate reduction of simulated microstructural images to a low-dimensional space and to support time-series analysis. The dataset was constructed from high-fidelity Phase-Field simulations. Analyses demonstrated that combining auto-encoder neural networks with principal component analysis yields efficient and substantial dimensionality reduction: a reduction ratio of 1/196 with more than 80% accuracy. These findings support performing further analyses directly on the latent representation. Applying Long Short-Term Memory (LSTM) neural networks showed that next-frame prediction is feasible, which makes it possible to accelerate Phase-Field simulations without high computing resources. We discuss the application of such a framework to various research areas. Based on these analyses, we propose different methods for dimensionality reduction, including auto-encoders, principal component analysis, and artificial neural networks, and for time-series analysis, including LSTM and Gated Recurrent Unit (GRU) networks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04229v1</guid>
      <category>cs.LG</category>
      <category>cond-mat.mtrl-sci</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:DOI>10.1016/j.commatsci.2022.111820</arxiv:DOI>
      <arxiv:journal_reference>Computational Materials Science, Volume 216, 5 January 2023, Article 111820</arxiv:journal_reference>
      <dc:creator>Seifallah Fetni, Thinh Quy Duc Pham, Truong Vinh Hoang, Hoang Son Tran, Laurent Duchêne, Xuan-Van Tran, Anne Marie Habraken</dc:creator>
    </item>
    <item>
      <title>Layerwise LQR for Geometry-Aware Optimization of Deep Networks</title>
      <link>https://arxiv.org/abs/2605.04230</link>
      <description>arXiv:2605.04230v1 Announce Type: new 
Abstract: Geometry-aware optimizers such as Newton and natural gradient can improve conditioning in deep learning, but scalable variants such as K-FAC, Shampoo, and related preconditioners usually impose structural approximations early, often discarding cross-layer interactions induced by the network computation. We introduce Layerwise LQR (LLQR), a framework for learning structured inverse preconditioners under a global layerwise optimal-control objective. The starting point is an exact equivalence: the steepest-descent step under a broad class of divergence-induced quadratic models--including Newton, Gauss-Newton, Fisher/natural-gradient, and intermediate-layer metrics--can be written as a finite-horizon Linear Quadratic Regulator (LQR) problem. This formulation serves as a reference that exposes the layerwise dynamics and cost matrices encoding the original dense geometry. We then derive a scalable relaxation that learns diagonal, (E-)Kronecker-factored, or other structured inverse preconditioners by minimizing the LQR objective and reusing them across iterations. The resulting optimizer wraps standard methods while retaining a principled connection to second-order geometry, without forming or inverting the global curvature matrix. Experiments on ResNets and Transformers show that LLQR improves optimization dynamics and often translates these gains into improved final test performance, while adding only modest wall-clock overhead. These results establish LLQR as a practical framework for geometry-aware second-order methods and as a reference for evaluating scalable approximations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04230v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Simon Dufort-Labbé, Pierre-Luc Bacon, Razvan Pascanu, Simon Lacoste-Julien, Aristide Baratin</dc:creator>
    </item>
    <item>
      <title>Anatomy of a failure: When, how, and why deep vision fails in scientific domains</title>
      <link>https://arxiv.org/abs/2605.04231</link>
      <description>arXiv:2605.04231v1 Announce Type: new 
Abstract: Mirroring its ubiquity in popular media and across human activities, the use of deep learning (DL) is rapidly growing in scientific imaging modalities. However, unlike in everyday RGB pictures, pixels in scientific imaging encode precise physicochemical properties across potentially thousands of channels. While DL is well validated on human-centric RGB perceptual tasks, its effectiveness for scientific imaging remains uncertain. Here, we show that the naive application of DL frameworks to scientific images can lead to critical failures. We evaluate the use of DL for pathology, comparing RGB images of stained tissue with the quantitative and information-rich biochemical signatures of infrared (IR) imaging. Despite this informational advantage, DL models trained on IR data paradoxically underperform. Investigating this discrepancy, we find that IR data priors interact poorly with the simplicity bias of DL, causing models to collapse to one-dimensional predictions. This constitutes a catastrophic DL failure: the model's representational capacity remains largely unused, which also raises AI safety concerns and undermines the advantages of such scientific modalities. Notably, this problem persists even with state-of-the-art DL robustification strategies, which are primarily designed and validated for RGB imagery and thus inherit the same prior-bias mismatch. This work establishes a framework for understanding the limitations of generic DL in science and advocates studying modality-specific failure modes to guide the development of specialized, safe AI algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04231v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ji-Hun Oh, Dou Hoon Kwark, Kianoush Falahkheirkhah, Kevin Yeh, John Cheville, Volodymyr Kindratenko, Rohit Bhargava</dc:creator>
    </item>
    <item>
      <title>Probabilistic Floating-Point Round-Off Analysis via Concentration Inequalities</title>
      <link>https://arxiv.org/abs/2605.04232</link>
      <description>arXiv:2605.04232v1 Announce Type: new 
Abstract: Floating-point round-off errors are ubiquitous in numerically intensive programs arising in fields such as scientific computing and optimization. Because floating-point errors can lead to unexpected and catastrophic program failures, one must derive guaranteed round-off thresholds to ensure the correctness of these programs. However, deterministic round-off thresholds tend to be too conservative to be usable in practice, since they must cover large round-off errors that occur only with small probability. Probabilistic thresholds relax deterministic ones by requiring only that the probability of the round-off error exceeding the threshold be below a given bound.
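To make the gap between deterministic and probabilistic thresholds concrete, here is a minimal sketch under a deliberately simplified model that is an assumption for illustration, not the paper's: the total error $S$ is a sum of $n$ independent, zero-mean rounding errors, each bounded in magnitude by the unit roundoff $u$. The worst-case threshold is then $nu$, whereas Hoeffding's inequality, $P(|S| \ge t) \le 2\exp(-t^2/(2nu^2))$, yields a threshold that grows only like $\sqrt{n}$. The function names and the example values are hypothetical.

```python
import math

def worst_case_threshold(n, u):
    # Deterministic bound: all n errors take their extreme value u with the same sign.
    return n * u

def hoeffding_threshold(n, u, eps):
    # Assumed model: n independent zero-mean errors, each in [-u, u].
    # Hoeffding: P(|S| >= t) is at most 2 * exp(-t**2 / (2 * n * u**2)).
    # Setting that bound equal to eps and solving for t gives the threshold.
    return u * math.sqrt(2 * n * math.log(2 / eps))

if __name__ == "__main__":
    u = 2.0 ** -53          # unit roundoff of IEEE-754 binary64, round to nearest
    n, eps = 10**6, 1e-9    # hypothetical operation count and failure probability
    det = worst_case_threshold(n, u)
    prob = hoeffding_threshold(n, u, eps)
    print(f"worst case {det:.3e}, probabilistic {prob:.3e}, ratio {det / prob:.0f}")
```

With these illustrative values the probabilistic threshold comes out roughly 150 times smaller than the worst-case one. In the paper's setting each error is additionally scaled by a first-order partial derivative from the FPTaylor expansion, which is what makes computing the required expectations nontrivial.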
  In this work, we propose a novel approach to probabilistic round-off analysis that applies concentration inequalities to the Taylor expansion from FPTaylor (TOPLAS 2018). A major obstacle is that the Taylor expansion involves absolute value operators, which make it difficult to compute the expected values of the first-order partial derivative terms. Our first step to overcome this obstacle is a sound over-approximation that removes the absolute value operators in polynomial expressions. Then, we show how to handle fractional expressions by transforming them into the polynomial case. Finally, we show how to improve our approach with range partitioning. Our approach is scalable because its key computational step is computing expected values of polynomial expressions with independent variables, for which linearity of expectation and independence speed up the computation. Experimental results show that our approach is orders of magnitude faster than the state of the art while producing thresholds of comparable precision.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04232v1</guid>
      <category>cs.LO</category>
      <category>cs.NA</category>
      <category>cs.PL</category>
      <category>math.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yichen Tao, Hongfei Fu, Jiawei Chen, Jean-Baptiste Jeannin</dc:creator>
    </item>
    <item>
      <title>Disentangled Learning Improves Implicit Neural Representations for Medical Reconstruction</title>
      <link>https://arxiv.org/abs/2605.04234</link>
      <description>arXiv:2605.04234v1 Announce Type: new 
Abstract: Implicit neural representations (INRs) have emerged as a powerful paradigm for medical imaging via physics-informed unsupervised learning. Classical INRs optimize an entire network from scratch for each subject, leading to inefficient training and suboptimal imaging quality. Recent initialization-based approaches attempt to inject population priors into pre-trained networks, yet they rely on high-quality images and often suffer from catastrophic forgetting during fine-tuning. We present DisINR, a novel INR framework that explicitly disentangles shared and subject-specific representations. DisINR introduces a shared encoder-decoder pair and subject-specific encoders, whose features are jointly decoded for image reconstruction. By integrating differentiable forward models, it pre-trains the shared modules directly from limited raw measurements, removing the need for pre-acquired high-quality images. During test-time adaptation, only the subject-specific encoder is optimized, while the shared pair remains frozen, effectively preserving learned priors. Extensive evaluations on three representative medical imaging tasks show that DisINR significantly outperforms state-of-the-art INRs in both reconstruction accuracy and efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04234v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qing Wu, Xuanyu Tian, Chenhe Du, Haonan Zhang, Xiao Wang, Le Lu, Yuyao Zhang</dc:creator>
    </item>
    <item>
      <title>Adaptive Consensus in LLM Ensembles via Sequential Evidence Accumulation: Automatic Budget Identification and Calibrated Commit Signals</title>
      <link>https://arxiv.org/abs/2605.04236</link>
      <description>arXiv:2605.04236v1 Announce Type: new 
Abstract: Large Language Model ensembles improve reasoning accuracy up to a performance boundary; beyond it, additional deliberation degrades accuracy. Static-budget methods cannot detect this boundary. Extended-thinking architectures compound the problem: a wrong answer after 120k tokens is indistinguishable from a correct one. We introduce DASE (Deliberative Adaptive Stopping Ensemble), a stopping heuristic for iterative ensemble deliberation that commits early on genuine consensus and applies a global-frequency fallback on fragmented evidence. Two configurations are evaluated: a persistence heuristic and DASE-Spatial (arena half-width W). We make three contributions. (1) DASE produces a commit-type routing partition complementary to verbalized single-call confidence. On a contamination-controlled corpus (AIME 2010-2023, N=254, 3 seeds), a 120B ensemble achieves a 24.8 pp routing gap (right-wall 97.1% vs. left-wall 73.6%), statistically equivalent to Opus 4.6 Standard verbalized confidence at a coverage-matched threshold (25.7 pp gap; bootstrap CI on difference: [-12.0, +10.3] pp, p=0.873). The two mechanisms disagree on 27% of routing assignments, establishing them as complements rather than substitutes; every DASE decision is accompanied by a machine-readable deliberation record. (2) Adaptive stopping, not injection bandwidth, drives accuracy gains. On AIME-300, bandwidth accounts for only 0.3 pp (ns); on GPQA-Extended, the bandwidth effect is 4.4 pp versus a 5.0 pp stopping effect. DASE-Spatial ties Debate-Dense at its optimal budget using one-tenth the injection bandwidth and identifies that budget automatically; W=8 (65.0%) significantly outperforms W=4 (59.3%) on AIME-300 (adj p=0.0042). (3) Injection-based methods exhibit a retrospective accuracy-vs-inference inverted-U on both benchmarks; this pattern is hypothesis-generating for future work.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04236v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Roberto Medina</dc:creator>
    </item>
    <item>
      <title>Densification and forecasting of Sentinel-2 time series from multimodal SAR and Optical satellite data using deep generative models</title>
      <link>https://arxiv.org/abs/2605.04239</link>
      <description>arXiv:2605.04239v1 Announce Type: new 
Abstract: Optical satellite image time series are extensively used in many Earth observation applications, including agriculture, climate monitoring, and land surface analysis. However, clouds and swath edges result in irregular sampling along the temporal dimension, limiting continuous monitoring. To address this issue, a growing body of work has focused on temporal densification and reconstruction of satellite image time series, with the objective of filling missing or cloud-contaminated observations within the temporal extent of the available data. While these approaches improve temporal continuity, they are inherently restricted to reconstructing gaps within the observed time periods and do not address the prediction of future observations. This work proposes a probabilistic deep learning framework for the densification and forecasting of Sentinel-2 time series by generating optical images at arbitrary past or future dates. The approach leverages multimodal satellite data by jointly exploiting Sentinel-2 optical and Sentinel-1 SAR observations. Unlike most existing work, we also focus on the uncertainty of the generated images. Experimental results demonstrate effective densification and forecasting on sparse and temporally misaligned time series.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04239v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Véronique Defonte, Dawa Derksen, Alexandre Constantin, Bastien Nespoulous</dc:creator>
    </item>
    <item>
      <title>Road Risk Monitor: A Deployable U.S. Road Incident Forecasting System with Live Weather and Road-Level Tiles</title>
      <link>https://arxiv.org/abs/2605.04242</link>
      <description>arXiv:2605.04242v1 Announce Type: new 
Abstract: Nationwide road-incident forecasting is a systems problem before it is a modeling problem. A usable service must connect historical incident archives, historical and live weather, national road geometry, offline model training, tile generation, web serving, and runtime handoff. This paper presents Road Risk Monitor, a U.S.-wide road-safety stack that combines a nationwide H3 baseline trained on FARS fatal-crash data with a road-segment forecasting pipeline trained from TIGER/Line geometry and US-Accidents events, then serves predictions through live APIs, raster tiles, JSON road tiles, and a public web application.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04242v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anton Ivchenko</dc:creator>
    </item>
    <item>
      <title>Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA</title>
      <link>https://arxiv.org/abs/2605.04243</link>
      <description>arXiv:2605.04243v1 Announce Type: new 
Abstract: Despite significant advances, large language models (LLMs) continue to exhibit brittle performance on complex temporal reasoning tasks. This failure mode is widely attributed to inherent deficits in autoregressive logical deduction. In this paper, we challenge this prevailing narrative, demonstrating that temporal reasoning is not the fundamental bottleneck; rather, the locus of failure lies in unstructured text-to-event representation. We introduce a novel neuro-symbolic question-answering framework governed by a Probabilistic Inconsistency Signal (PIS) that explicitly isolates perceptual errors from reasoning failures. By lifting unstructured text into explicit event graphs and interval constraints, our architecture strictly decouples semantic extraction from a symbolic reasoning engine. To robustly detect structural breaks, the PIS unifies symbolic credal intervals with epistemic neural uncertainty extracted via Evidential Deep Learning on LLM hidden states. Empirical evaluations reveal a striking paradigm shift: when provided with correct structural representations, our system's explicit proof traces achieve perfect 1.0 accuracy (4000/4000) with zero false positives and zero false negatives on temporal arithmetic benchmarks. On broader, noise-injected QA settings, the framework maintains a competitive 75.1% accuracy while enabling deterministic, step-level failure localization. Ultimately, by isolating the representation bottleneck from the reasoning substrate, this work reframes temporal QA from an algorithmic reasoning challenge to a structural alignment problem, charting a verifiable path forward for reliable neuro-symbolic AI.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04243v1</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tran Quang Liem</dc:creator>
    </item>
    <item>
      <title>Faster Iterative $\phi$ Queries on the Positional BWT</title>
      <link>https://arxiv.org/abs/2605.04244</link>
      <description>arXiv:2605.04244v1 Announce Type: new 
Abstract: The Positional Burrows-Wheeler Transform (PBWT) is a fundamental data structure for the efficient representation and analysis of large-scale haplotype panels. For a panel of $h$ sequences $\{S_1, \dots, S_h\}$ over $m$ sites, a key operation is the $\phi_j(i)$ query, which returns the haplotype index immediately preceding $S_i$ in co-lexicographic order at site $j$. Efficient support for $k$ iterative queries $\phi^1, \dots, \phi^k$ is essential for haplotype matching and variation analysis.
  In this work, we introduce a simple, novel scheme that decomposes each haplotype row into sub-intervals, called refined segments, within which a haplotype's co-lexicographic predecessor remains unchanged across the covered sites. We show that refined segments satisfy two key properties: (i) each segment $[b,e]$ associated with $S_i$ overlaps with at most a constant number of segments of $S_{\phi_e(i)}$, and (ii) the total number of segments is bounded by $O(\tilde{r} + h)$, where $\tilde{r}$ denotes the number of runs in the PBWT. Building on this decomposition, we present two space-time tradeoffs for supporting $k$ iterative $\phi$ queries: (i) a structure using $O((\tilde{r} + h)\log n)$ bits of space that answers $k$ iterative queries in $O(\log \log_w \min(m,h) + k)$ time, where $n = m \cdot h$, and (ii) a more compact structure using $O(\tilde{r} \log h + h \log n)$ bits of space that supports queries in $O(k \log \log_w h)$ time.
  Prior to our work, supporting these queries required $O((\tilde{r} + h)\log n)$ bits of space and $O(k \cdot \log \log_w m)$ time. Our second tradeoff is expected to be effective in practice for modern genomic datasets, where the number $h$ of haplotypes is typically much smaller than the number $m$ of sites.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04244v1</guid>
      <category>cs.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Paola Bonizzoni, Travis Gagie, Younan Gao</dc:creator>
    </item>
    <item>
      <title>Physics-Guided Regime Unmixing</title>
      <link>https://arxiv.org/abs/2605.04247</link>
      <description>arXiv:2605.04247v1 Announce Type: new 
Abstract: The Linear Mixing Model (LMM) dominates spectral unmixing for its simplicity, but fails under multiple scattering; existing nonlinear models compensate by applying a fixed regime uniformly across entire scenes. We propose Physics-Guided Regime Unmixing (PGRU), which estimates a pixel-wise scalar $\xi_i \in [0,1]$ from observable physical features to activate nonlinear mixing only where justified. Residuals from the Generalized Bilinear Model (GBM), the Polynomial Post-Nonlinear Mixing Model (PPNM), and the Hapke model are combined via learned attention, yielding interpretable regime maps. Experiments on Samson, Jasper Ridge, and Urban show consistent improvements over baselines, with physical coherence $\rho &gt; 0.90$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04247v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Paula Pacheco, Pablo Granitto, Juan B. Cabral</dc:creator>
    </item>
    <item>
      <title>Towards a Zero-Trust Supply-Chain Assurance Rubric for ORAN RIC Applications</title>
      <link>https://arxiv.org/abs/2605.04249</link>
      <description>arXiv:2605.04249v1 Announce Type: new 
Abstract: Open RAN enables third-party xApps and rApps to be onboarded and updated at operational cadence, creating a software supply chain that spans developers, CI systems, registries, onboarding pipelines, and runtime enforcement points. This preprint proposes a zero-trust supply-chain assurance rubric for O-RAN RIC applications. It makes three contributions: first, an app-centric lifecycle threat model for RIC applications across build, signing, publication, onboarding, runtime, and update or rollback stages; second, a WG11-aligned threat-control-evidence mapping that relates lifecycle threats to O-RAN security baselines and complementary supply-chain evidence; and third, an operator-facing assurance profile that combines secure software development practices, SBOM transparency, and SLSA-style provenance into incremental onboarding levels. Analytical case-study walkthroughs and a minimal evidence-checking workflow illustrate how the rubric can support explicit Accept, Escalate, or Block decisions during RIC app onboarding. The evaluation is intended to assess applicability rather than deployment-scale performance; empirical measurements of operational overhead, decision consistency, and detection coverage are left for future work.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04249v1</guid>
      <category>cs.CR</category>
      <category>cs.NI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chun Yin Chiu</dc:creator>
    </item>
    <item>
      <title>Binary Image-Based Intrusion Detection for Operational Technology Networks: Extending the SPHBI Methodology from IoT to Modbus TCP</title>
      <link>https://arxiv.org/abs/2605.04250</link>
      <description>arXiv:2605.04250v1 Announce Type: new 
Abstract: This paper extends the Single Packet Header Binary Image (SPHBI) intrusion detection methodology from IoT to Modbus TCP, evaluating five approaches spanning a gradient of protocol depth on the CIC Modbus 2023 dataset (11.4 million packets, eight detectable attack types). TCP/IP headers alone achieve only 51.8% binary accuracy, confirming that the header-level heterogeneity exploited in IoT traffic is absent in uniform SCADA environments. Adding eight bytes of application-layer information improves binary accuracy to 98.1% with just 63 parameters, directly relevant to per-packet classification on resource-constrained OT edge devices. The best-performing approach achieves 94.4% ± 2.2 pp multiclass accuracy across nine classes (95% CI [92.9%, 95.9%], 10 seeds) with 56,873 parameters, roughly 430 times fewer than comparable ResNet50-based approaches. Per-class recall analysis shows seven of eight detectable attack types identified with recall above 94%, while replay attacks remain structurally undetectable by any single-packet method.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04250v1</guid>
      <category>cs.CR</category>
      <category>cs.NI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Aamir Omar</dc:creator>
    </item>
    <item>
      <title>Root-Cause-Driven Automated Vulnerability Repair</title>
      <link>https://arxiv.org/abs/2605.04251</link>
      <description>arXiv:2605.04251v1 Announce Type: new 
Abstract: Recent LLM-based systems have made automated vulnerability repair increasingly practical, but two challenges remain. First, without strong signals about where a bug originates, repair agents drift toward shallow edits that silence the observed failure while leaving the underlying defect unresolved. Second, finding the root cause of bugs is hard: even developers familiar with the codebase frequently produce fixes that address symptoms rather than the root cause, and LLM-based agents, operating with noisier context and less program understanding, are no exception. We present Kumushi, a root-cause-driven patching agent that addresses both challenges by combining diversified dynamic fault localization with evidence-weighted ranking to focus the LLM on the code most relevant to the defect. To rigorously measure whether Kumushi produces genuinely better patches, we also introduce a two-tier patch quality metric that pairs automated oracle validation with structured expert assessment of patches. Evaluated on 178 C/C++ vulnerabilities, Kumushi substantially outperforms prior specialized repair agents under automated evaluation while matching a frontier commercial coding agent. Expert assessment then reveals differences that oracles cannot: Kumushi produces more root-cause fixes and fewer superficial patches, and is preferred in the majority of decisive pairwise comparisons. Together, these results demonstrate that progress in automated vulnerability repair requires not only stronger patching systems, but also richer evaluation methods capable of distinguishing genuine fixes from oracle-passing ones.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04251v1</guid>
      <category>cs.CR</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Hulin Wang, Zion Leonahenahe Basque, Jie Hu, Ati Priya Bajaj, Yibo Liu, Samuel Zhu, Giorgi Kobakhia, Nikhil Chapre, Will Rosenberg, Siddharth Mishra, Aditya Maheshbhai Gabani, Moritz Schloegel, Adam Doupé, Yan Shoshitaishvili, Ruoyu Wang, Tiffany Bao</dc:creator>
    </item>
    <item>
      <title>Second-Order FALQON Parameter Transfer for the Max-Cut Problem on 3-Regular Graphs</title>
      <link>https://arxiv.org/abs/2605.04253</link>
      <description>arXiv:2605.04253v1 Announce Type: new 
Abstract: The Feedback-based Algorithm for Quantum Optimization (FALQON) offers a deterministic alternative to variational quantum algorithms by bypassing classical optimization loops. However, maintaining convergence on large problem instances often requires restricting the time step, necessitating quantum circuit depths that exceed Noisy Intermediate-Scale Quantum (NISQ) hardware capabilities. This paper investigates the parameter transferability of second-order FALQON applied to the Max-Cut problem on 3-regular graphs. Through numerical experiments evaluating quantum circuits up to 16 layers on graphs up to 24 nodes, we demonstrate a highly advantageous scaling behavior: transferring feedback parameters optimized on small instances to larger target graphs yields significantly higher approximation ratios than natively optimizing the parameters directly on the larger graphs. This performance advantage arises because parameters trained on smaller instances can safely adopt aggressively larger time steps. By offloading the expensive parameter discovery phase to small-scale instances, this transfer strategy simultaneously reduces computational overhead and enhances the approximation ratio, thereby bringing FALQON closer to practical viability on near-term quantum architectures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04253v1</guid>
      <category>cs.ET</category>
      <category>quant-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Gabriel Fernandes Thomaz, Eduarda Rodrigues Monteiro, Jerusa Marchi, Marcelo Zen Pretto, Alisson dos Passos Fumaco, Evandro Chagas Ribeiro da Rosa</dc:creator>
    </item>
    <item>
      <title>Hierarchical Support Vector State Partitioning for Distilling Black Box Reinforcement Learning Policies</title>
      <link>https://arxiv.org/abs/2605.04254</link>
      <description>arXiv:2605.04254v1 Announce Type: new 
Abstract: We introduce Support Vector State Partitioning (SVSP), a novel method to mimic a black-box reinforcement learning policy using a set of human-interpretable subpolicies. By partitioning a distillation dataset of state-action pairs with linear support vector machine splits, SVSP constructs a compact and structured representation of the original policy. Our method improves mean return by 7.4% over previous critic-driven state partitioning approaches such as Voronoi State Partitioning (VSP) and by 2.8% over the original TD3 policy, while reducing the number of required subpolicies by 82.1% relative to VSP. Our results pave the way toward a more flexible form of distillation in which both the decision boundary and the surrogate models can be chosen within a margin of the original black-box behavior.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04254v1</guid>
      <category>cs.LG</category>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Senne Deproost, Mehrdad Asadi, Ann Nowé</dc:creator>
    </item>
    <item>
      <title>phys-MCP: A Control Plane for Heterogeneous Physical Neural Networks</title>
      <link>https://arxiv.org/abs/2605.04256</link>
      <description>arXiv:2605.04256v1 Announce Type: new 
Abstract: Physical neural networks (PNNs) embed computation directly in material dynamics, including molecular, chemical, biological, photonic, memristive, and mechanical substrates. They are attractive for edge computing, especially at the extreme edge, where computation can be placed at the interface to sensing, actuation, or the physical process itself. However, PNNs are difficult to integrate into edge-cloud software stacks because each substrate exposes distinct interfaces, timing behavior, observability limits, and lifecycle requirements. This paper argues that the missing systems component is a common control plane for heterogeneous PNNs. We present phys-MCP, a substrate-aware orchestration architecture that exposes physical neural substrates as discoverable and invocable resources for edge, fog, and cloud workflows, while preserving their possible placement at the extreme edge. phys-MCP defines a capability model, lifecycle semantics, telemetry interfaces, and digital-twin bindings that retain substrate-specific properties such as latency, resetability, plasticity, and I/O modality. We instantiate the architecture through a prototype with three representative backend classes, an HTTP-backed execution path, and an integrated Cortical Labs adapter exposing a wetware-facing API path through the same control model. The evaluation combines controlled experiments on representative backends with end-to-end validation of the Cortical Labs path. Results show descriptor-portable integration across heterogeneous backends, improved runtime-aware matching over simpler baselines, telemetry-aware recovery under representative faults, successful execution against the API-backed wetware path, and small local control-path overhead. Overall, the results provide prototype-level evidence that substrate-aware control can span heterogeneous physical AI resources, twin-backed backends, and a wetware-facing API path.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04256v1</guid>
      <category>cs.DC</category>
      <category>cs.ET</category>
      <category>cs.NE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Stefan Fischer, Maliheh Hariri, Sebastian Otte</dc:creator>
    </item>
    <item>
      <title>HUGO-CS: A Hybrid-Labeled, Uncertainty-Aware, General-Purpose, Observational Dataset for Cold Spray</title>
      <link>https://arxiv.org/abs/2605.04257</link>
      <description>arXiv:2605.04257v1 Announce Type: new 
Abstract: Cold spraying is an increasingly common approach for repairing and manufacturing components due to its solid-state manufacturing capabilities. However, process optimization remains difficult due to many interdependent parameters and the lack of large-scale, machine-readable data to support modeling. While the scientific literature contains many relevant experiments, results are inconsistently reported (often in tables and figures) and use non-uniform units, limiting utilization at scale. To address these limitations, this work presents HUGO-CS, a literature-derived dataset of 4,383 cold-spray experiments with 144 features from 1,124 sources, exceeding the previous largest dataset (137 samples) by roughly 30x. Because fully manual extraction requires an average of 91 minutes per document, this work designs and leverages a Hybrid-labeled, Uncertainty-aware, General-purpose, Observational extraction framework, called HUGO, to support this extraction. HUGO combines automated LLM-based labeling with targeted manual label refinement to extract experimental results from scientific literature. To balance labeling efficiency with extraction accuracy, HUGO introduces a Hierarchical Risk Mitigation (HRM) scheme that routes LLM outputs with a high risk of errors to manual review while retaining low-risk records as auto-labeled. Lastly, HUGO post-processing consolidates categorical descriptors, maps reported feedstock chemistries into structured continuous compositions, and normalizes units across sources. Of the 4,383 reported experiments, 1,765 are hand-labeled, providing a high-quality labeled subset for benchmarking, error analysis, and higher-fidelity data points. All code to replicate this work, along with the complete HUGO-CS dataset, is released under a CC-BY license at https://github.com/sprice134/HUGO.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04257v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Stephen Price, Kyle Miller, Marco Musto, Kenneth Kroenlein, James Saal, Kyle Tsaknopoulos, Elke A. Rundensteiner, Danielle L. Cote</dc:creator>
    </item>
    <item>
      <title>Constructing Suffixient Arrays Revisited</title>
      <link>https://arxiv.org/abs/2605.04258</link>
      <description>arXiv:2605.04258v1 Announce Type: new 
Abstract: Recently, Cenzato et al. proposed a new text index, called the suffixient array, which is a subset of the suffix array and supports locating a single pattern occurrence or finding its maximal exact matches (MEMs), assuming random access to the input text $T[1..n]$ is available. They show that, given the suffix array, the longest common prefix array, and the Burrows-Wheeler transform (BWT) of the reverse of $T[1..n]$ over an alphabet $\{1,\ldots,\sigma\}$, a suffixient array can be constructed in linear time. However, their construction algorithms require multiple scans of these arrays. For the single-pass setting, they give an alternative construction algorithm running in $O(n + \overline{r} \log \sigma)$ time, where $\overline{r}$ is the number of runs in the BWT of the reversed text. In this paper, we present a new one-pass algorithm that constructs a suffixient array in linear time under the standard RAM model.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04258v1</guid>
      <category>cs.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Paola Bonizzoni, Younan Gao, Brian Riccardi</dc:creator>
    </item>
    <item>
      <title>EngThrive: Make It Fast and Easy to Do Great Work</title>
      <link>https://arxiv.org/abs/2605.04259</link>
      <description>arXiv:2605.04259v1 Announce Type: new 
Abstract: Frameworks such as SPACE, DevEx, and DORA established that developer productivity is inherently multidimensional, but left practitioners with a practical question: what should we measure, and how should we use it to improve? This paper introduces Engineering Thrive (EngThrive), a measurement and improvement system developed and deployed across Microsoft's engineering organization. EngThrive organizes productivity around three dimensions (Speed, Ease, and Quality), with Thriving as a guardrail to ensure developer wellbeing improves alongside performance. Within each dimension, outcome-oriented North Star metrics are paired with diagnostic submetrics, combining system telemetry with developer surveys to provide both scale and context. We describe the design principles that guide metric selection, including an approach in which well-chosen metrics align "gaming" behavior with genuine improvement. We also outline the data platform, survey program, and dashboard ecosystem required to operationalize this approach in practice, and present case studies demonstrating how outcome-oriented measurement enables sustained, system-level improvements. Finally, we show that EngThrive functions as a general-purpose evaluation language, applicable not only to developer tools and AI, but to organizational policies, work environments, and other factors that shape how developers experience their work. We offer EngThrive as a concrete model for organizations seeking to move beyond measuring activity toward improving outcomes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04259v1</guid>
      <category>cs.SE</category>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Brian Houck, Tim Bozarth, David Liu, Dean Carignan</dc:creator>
    </item>
    <item>
      <title>Lightweight Vulnerability Detection from Code Metrics and Token Features</title>
      <link>https://arxiv.org/abs/2605.04260</link>
      <description>arXiv:2605.04260v1 Announce Type: new 
Abstract: Vulnerability detection for C/C++ code increasingly relies on heavy representations such as code graphs and deep models, while many practical workflows still benefit from fast and reproducible ranking baselines for human triage. This preprint studies a lightweight function-level vulnerability triage pipeline that combines sparse token n-grams from raw function text with a small set of inexpensive code metrics, including NLOC, approximate cyclomatic complexity, token count, maximum brace depth, and parameter count. We use TF-IDF token features and a class-weighted logistic regression classifier, avoiding deep learning, transformers, and program graphs.
  Using the Devign function-level labels, we evaluate random and cross-project settings, including an FFmpeg-to-QEMU transfer experiment. We emphasize precision-recall AUC and Recall@10% as ranking-oriented metrics for skewed or triage-oriented workloads. On the random split, the best combined variant reaches PR-AUC 0.642 and Recall@10% 0.161, while cross-project generalization is substantially harder, with PR-AUC around 0.436. We further report ablations, test-only identifier-renaming robustness, and end-to-end efficiency. The results suggest that simple token and metric features provide a useful transparent baseline, but also expose sensitivity to superficial lexical cues and limited cross-project transfer.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04260v1</guid>
      <category>cs.CR</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chun Yin Chiu</dc:creator>
    </item>
    <item>
      <title>Laundering AI Authority with Adversarial Examples</title>
      <link>https://arxiv.org/abs/2605.04261</link>
      <description>arXiv:2605.04261v1 Announce Type: new 
Abstract: Vision-language models (VLMs) are increasingly deployed as trusted authorities: fact-checking images on social media, comparing products, and moderating content. Users implicitly trust that these systems perceive the same visual content as they do. We show that adversarial examples break this assumption, enabling AI authority laundering: an attacker subtly perturbs an image so that the VLM produces confident and authoritative responses about the wrong input. Unlike jailbreaks or prompt injections, our attacks do not compromise model alignment; the attack operates entirely at the perceptual level. We demonstrate that standard attacks against publicly available CLIP models transfer reliably to production VLMs, including GPT-5.4, Claude Opus 4.6, Gemini 3, and Grok 4.2. Across four attack surfaces, we show that authority laundering can amplify misinformation, disparage individuals, evade content moderation, and manipulate product recommendations. Our attacks have high success rates: in hundreds of attacks targeting identity manipulation and NSFW evasion, we measure success rates of 22% to 100% across six models. No novel attack algorithm is required: basic techniques known for over a decade suffice, establishing a lower bound on attacker capability that should concern defenders. Our results demonstrate that visual adversarial robustness is now a practical, and still largely unsolved, safety problem.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04261v1</guid>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jie Zhang, Pura Peetathawatchai, Florian Tramèr, Avital Shafran</dc:creator>
    </item>
    <item>
      <title>Imagery Dataset for Remaining Useful Life Estimation of Synthetic Fibre Ropes</title>
      <link>https://arxiv.org/abs/2605.04262</link>
      <description>arXiv:2605.04262v1 Announce Type: new 
Abstract: Remaining useful life (RUL) estimation of synthetic fibre ropes (SFRs) is critical for safety in offshore crane operations, wind turbine installation, and heavy-load handling, where rope failure can result in catastrophic safety incidents and costly downtime. Despite growing research interest in data-driven condition monitoring, there is no publicly available image dataset that captures the complete degradation lifecycle of SFRs under controlled cyclic fatigue loading. To address this gap, we present a novel image dataset comprising approximately 34,700 high-resolution images of eleven Dyneema SK75/78 high-modulus polyethylene (HMPE) rope samples subjected to cyclic fatigue on a sheave-bend test stand at seven distinct axial load levels ranging from 60 kN to 280 kN. Ropes were loaded until mechanical failure, with fatigue lifetimes ranging from 695 to 8,340 cycles. After every fixed number of sheave cycles, an inspection burst of ten images was captured at different cross-sectional positions along the rope, providing spatially representative sampling of surface degradation throughout the rope's service life. Each image is annotated with the elapsed cycle count at capture; because every rope was run to failure, the RUL of any image follows directly as the rope's cycles to failure minus that count. This dataset aims to support a broad range of machine learning (ML) tasks including RUL regression, damage progression modelling, anomaly detection, and load-conditioned prognostics. The dataset is intended to serve as a benchmark resource for the development and comparison of vision-based condition monitoring (CM) and prognostics algorithms for SFRs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04262v1</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anju Rani, Daniel Ortiz-Arroyo, Petar Durdevic</dc:creator>
    </item>
    <item>
      <title>Parallel Prefix Verification for Speculative Generation</title>
      <link>https://arxiv.org/abs/2605.04263</link>
      <description>arXiv:2605.04263v1 Announce Type: new 
Abstract: We introduce PARSE (PArallel pRefix Speculative Engine), a speculative generation framework that accelerates large language model (LLM) inference by parallelizing prefix verification at the semantic level. Existing speculative decoding methods are fundamentally limited by token-level equivalence: the target model must verify each token, leading to short acceptance lengths and modest speedups. Moving to semantic or segment-level verification can substantially increase acceptance granularity, but prior approaches rely on sequential verification, introducing significant overhead and limiting practical gains. PARSE introduces parallel prefix verification, enabling semantic-level verification without sequential checks. Given a full draft from a draft model, the target model evaluates correctness across multiple prefixes in a single forward pass using a custom attention mask, directly identifying the maximal valid prefix. This eliminates sequential segment verification and makes verification compute-efficient. PARSE is orthogonal to token-level speculative decoding and can be composed with it for additional gains. Across models and benchmarks, PARSE delivers $1.25\times$ to $4.3\times$ throughput gain over the target model, and $1.6\times$ to $4.5\times$ when composed with EAGLE-3, all with negligible accuracy degradation. These results establish parallel prefix verification as an effective, general approach to accelerating LLM inference.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04263v1</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuncheng Yao, Yuxuan Xia, Shengjie Wang, Danyang Zhuo</dc:creator>
    </item>
    <item>
      <title>Governed Collaborative Memory as Artificial Selection in LLM-Based Multi-Agent Systems</title>
      <link>https://arxiv.org/abs/2605.04264</link>
      <description>arXiv:2605.04264v1 Announce Type: new 
Abstract: Persistent memory is turning language-model-based agents from stateless participants in isolated interactions into state-bearing components of LLM-based multi-agent systems. As memory becomes durable, reloadable, and behavior-shaping across agents, sessions, or versions, a design question arises that is not captured by retrieval accuracy or access control alone: which candidate memories should become shared institutional state? This Viewpoint frames that problem as governed collaborative memory. We argue that memory governance functions as a selection regime, determining which memory variants persist, which remain private, and which are rejected, abstained from, or superseded. We distinguish ungoverned persistence, constitutional or hybrid selection, automatic metric-based selection, and human-ratified artificial selection, emphasizing that these regimes are not a ranking but a design choice over target properties. We then describe a layered architecture that separates agent-local memory, shared institutional memory, archive memory, and project-continuity memory, with provenance and version lineage making selection inspectable. Documented traces from one running LLM-based multi-agent ecosystem illustrate unmanaged false-memory persistence, ratified institutional memory, rejection and revision, identity-preserving expansion, and governance-as-learning. The contribution is a design agenda: persistent LLM-based multi-agent systems should evaluate memory not only for recall and performance, but also for provenance fidelity, selection traceability, epistemic quality, correction pathways, and role preservation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04264v1</guid>
      <category>cs.MA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Diego F. Cuadros, Abdoul-Aziz Maiga, Helen Meskhidze, Andre Curtis-Trudel</dc:creator>
    </item>
    <item>
      <title>Explaining and Preventing Alignment Collapse in Iterative RLHF</title>
      <link>https://arxiv.org/abs/2605.04266</link>
      <description>arXiv:2605.04266v1 Announce Type: new 
Abstract: Reinforcement learning from human feedback (RLHF) typically assumes a static or non-strategic reward model (RM). In iterative deployment, however, the policy generates the data on which the RM is retrained, creating a feedback loop. Building on the Stackelberg game formulation of this interaction, we derive an analytical decomposition of the policy's true optimization gradient into a standard policy gradient and a parameter-steering term that captures the policy's influence on the RM's future parameters. We show that standard iterative RLHF, which drops this steering term entirely, suffers from alignment collapse: the policy systematically exploits the RM's blind spots, producing low-quality, high-reward outputs whose feedback reinforces the very errors the policy exploits. To mitigate this, we propose foresighted policy optimization (FPO), a mechanism-design intervention that restores the missing steering term by regularizing the policy's parameter-steering effect on RM updates. We instantiate FPO via a scalable first-order approximation and demonstrate that it prevents alignment collapse on both controlled environments and an LLM alignment pipeline using Llama-3.2-1B.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04266v1</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Etienne Gauthier, Francis Bach, Michael I. Jordan</dc:creator>
    </item>
    <item>
      <title>QUIVER: Cost-Aware Adaptive Preference Querying in Surrogate-Assisted Evolutionary Multi-Objective Optimization</title>
      <link>https://arxiv.org/abs/2605.04267</link>
      <description>arXiv:2605.04267v1 Announce Type: new 
Abstract: Interactive multi-objective optimization systems face a budget allocation dilemma: one can spend resources on expensive objective evaluations or on eliciting decision-maker preferences that identify the relevant region of the Pareto set. Moreover, preference elicitation itself spans modalities with different information content and cognitive burden, ranging from cheap, noisy pairwise preference statements (PS) to richer but costlier indifference adjustments (IA).
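  As a drastically simplified illustration of this budget trade-off, the Python sketch below scores each candidate action by expected decision-quality gain per unit cost and picks the best one. It is not QUIVER's implementation: the action labels, gain estimates, and costs are hypothetical placeholders, and estimating the expected gain (the hard part) is assumed away.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str             # hypothetical label, e.g. "evaluate", "ask_PS", "ask_IA"
    expected_gain: float  # assumed estimate of decision-quality improvement
    cost: float           # resource cost of the action (assumed positive)


def pick_action(actions):
    """Return the action with the largest expected gain per unit cost."""
    return max(actions, key=lambda a: a.expected_gain / a.cost)


candidates = [
    Action("evaluate", expected_gain=0.9, cost=10.0),  # 0.09 per unit cost
    Action("ask_PS", expected_gain=0.05, cost=1.0),    # 0.05 per unit cost
    Action("ask_IA", expected_gain=0.4, cost=3.0),     # about 0.133 per unit cost
]
print(pick_action(candidates).name)  # ask_IA
```

  Here the costlier but more informative IA query wins because its gain per unit cost is highest; with a cheaper IA or a larger PS gain, the choice would flip.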
  We study cost-aware optimization under an unknown scalarization and introduce QUIVER (Query-Informed Value Estimation for Regret), a surrogate-assisted evolutionary multi-objective optimizer that adaptively chooses between objective evaluations and heterogeneous preference queries. At each step, QUIVER selects the next action by maximizing the expected decision-quality improvement per unit total cost. Across DTLZ and WFG benchmarks under synthetic decision-maker models, QUIVER achieves the lowest final utility regret on challenging WFG problems (utility regret of 2.14 on WFG4, 2.82 on WFG9: a 25% improvement over baselines), outperforming all single-modality baselines. We analyze how the optimal mix of PS and IA adapts to problem difficulty: on easy problems (DTLZ2), QUIVER selects 80% PS queries; on hard problems (WFG9), it shifts to 35% IA queries. This adaptive modality selection demonstrates cost-aware preference learning in action.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04267v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Florian A. D. Burnat</dc:creator>
    </item>
    <item>
      <title>OPENJ: A Conceptual Framework for Open-Source Digital Human Modeling and Ergonomic Assessment in a CAD Environment</title>
      <link>https://arxiv.org/abs/2605.04270</link>
      <description>arXiv:2605.04270v1 Announce Type: new 
Abstract: Industrial workplace challenges range from musculoskeletal disorders -- a leading cause of occupational injury -- to suboptimal workstation layouts, inefficient task sequences, and poor human-equipment fit. Digital human modeling (DHM) tools address several of these challenges by placing a scalable virtual mannequin in a computer-aided design (CAD) environment, enabling engineers to evaluate ergonomic risk through standardized assessment methods (RULA, REBA, NIOSH Lifting Equation, OWAS), optimize workstation layouts for reach and visibility, predict task postures through inverse kinematics, and simulate operations before physical implementation. Despite four decades of development since the Jack system originated at the University of Pennsylvania in the 1980s, the integrated DHM capability set -- anthropometric mannequin, posture prediction, ergonomic assessment, and CAD integration -- remains exclusive to commercial platforms such as Siemens Tecnomatix Jack (Process Simulate), Dassault DELMIA, Humanetics RAMSIS, and the University of Iowa's Santos system. These platforms operate under proprietary, vendor-quoted pricing models, and their acquisition and operating costs, together with closed-source implementations, have been repeatedly identified as practical adoption barriers for individual researchers, small-to-medium enterprises, and educational institutions. Organizations without access resort to manual observational methods -- paper-based worksheets applied to photographs or video -- sacrificing the predictive power and reproducibility that computational analysis provides. This paper serves as a design blueprint for OPENJ (OpenJane/Joe), positioning the project for subsequent open-source implementation and community adoption.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04270v1</guid>
      <category>cs.HC</category>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Sinan Bank, Casey E. Eaton</dc:creator>
    </item>
    <item>
      <title>A Mean Curvature Approach to Boundary Detection: Geometric Insights for Unsupervised Learning</title>
      <link>https://arxiv.org/abs/2605.04274</link>
      <description>arXiv:2605.04274v1 Announce Type: new 
Abstract: Accurate boundary detection in high-dimensional data remains a central challenge in unsupervised learning, particularly in the presence of non-linear structures and heterogeneous densities. In this work, we introduce Mean Curvature Boundary Points (MCBP), a novel geometric framework grounded in Geometric Machine Learning that departs from traditional density-based approaches by explicitly modeling the intrinsic curvature of the data manifold. The method relies on a discrete approximation of the shape operator, estimated from local k-nearest neighbor patches, to compute pointwise mean curvature without requiring explicit manifold parametrization. The key insight of MCBP is to use mean curvature as a principled descriptor of boundary structure: high-curvature regions naturally correspond to transitions between clusters, geometric irregularities, and low-density interfaces. This yields a unified geometric interpretation of boundary, outlier, and transition points. We further introduce an adaptive percentile-based thresholding scheme that enables multiscale boundary extraction without relying on ad hoc density parameters. Beyond detection, we propose a curvature-driven data decomposition that separates samples into smooth (low-curvature) and boundary (high-curvature) subsets, effectively acting as a non-linear geometric filtering mechanism. This representation enhances cluster separability and improves the robustness of downstream unsupervised algorithms. Extensive experiments on synthetic and real-world datasets demonstrate that MCBP consistently improves clustering performance, particularly in complex and high-dimensional scenarios. These results position MCBP as a concrete contribution to Geometric Machine Learning, highlighting the potential of curvature-aware analysis as a unifying paradigm bridging differential geometry and data-driven modeling.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04274v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alexandre L. M. Levada</dc:creator>
    </item>
    <item>
      <title>Material Database Agent: A Multimodal Agentic Framework for Scientific Literature Mining</title>
      <link>https://arxiv.org/abs/2605.04278</link>
      <description>arXiv:2605.04278v1 Announce Type: new 
Abstract: Materials science workflows rely on structured and unstructured data from the vast body of available scientific literature. However, most experimental details remain buried in text, tables, graphs, and figures, so constructing databases from them is a manual, time-consuming, and hard-to-scale process. Multimodal large language models have made it feasible to extract information from text and scientific figures with high speed and accuracy. This opens the possibility of an AI system that can create production-scale material databases. Material Database Agent (MDA) is a modular, multi-agent system architecture for converting research literature into structured databases. MDA takes article PDFs as input and converts them in parallel into markdown files and figures. Multiple sub-agents read these markdown files and figures in parallel to assemble a sub-database for each paper, and a final agent compiles these sub-databases into a single tabular database. Rather than relying on a rule-based approach or a single-pass extraction pipeline, MDA is a specialized architecture for transforming the materials science literature into a database. More generally, this study provides a basis for positioning multimodal agentic information extraction as a viable means for constructing next-generation scientific databases from the primary literature.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04278v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Achuth Chandrasekhar, Omid Barati Farimani, Radheesh Sharma Meda, Amir Barati Farimani</dc:creator>
    </item>
    <item>
      <title>Gradient Flow Structure and Quantitative Dynamics of Multi-Head Self-Attention</title>
      <link>https://arxiv.org/abs/2605.04279</link>
      <description>arXiv:2605.04279v1 Announce Type: new 
Abstract: Transformer self-attention can be interpreted as a gradient flow on the unit sphere, in which tokens evolve under softmax interaction potentials and tend to form clusters. While prior work has established clustering behavior for single-head attention, the multi-head setting remains less understood due to geometric interference between heads, which invalidates standard monotonicity arguments.
  In this work, we develop a theoretical framework for multi-head self-attention dynamics and resolve several open questions. We show that, under suitable conditions on the score matrices, a natural multi-head energy functional is non-decreasing along both flat and spherical dynamics. We identify the key obstruction to per-head monotonicity as radial shadow terms, the projections of each head's output onto token directions, which persist even under orthogonality assumptions. We introduce a sufficient condition ensuring monotonicity and establish robustness to approximate orthogonality.
  In a simplified scalar-head regime with equiangular token configurations, we derive a closed-form expression for the critical inverse temperature governing clustering behavior, and show that heterogeneous heads exhibit super-additive clustering rates. In this regime, we also prove a separation in clustering time between ReLU and softmax attention in the linearized dynamics. Finally, we establish an entropy production identity and show that attention entropy increases monotonically toward equilibrium as clustering progresses.
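  To make the clustering and entropy behavior concrete, here is a toy single-head Python sketch under simplifying assumptions that are ours, not the paper's: identity query, key, and value matrices; each token x_i on the unit sphere moves along the tangent projection of its softmax-weighted average of all tokens (weights from beta times the inner product of x_i and x_j); explicit Euler steps followed by renormalization; and tokens initialized in one open hemisphere. It does not model multiple heads, radial shadow terms, or the critical inverse temperature. In a run like this one would expect the minimum pairwise cosine to approach 1 and the mean attention-row entropy to rise toward log n.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    s = math.sqrt(sum(c * c for c in v))
    return [c / s for c in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_weights(x, beta):
    """Row-wise softmax of beta * dot(x_i, x_j), i.e. identity query/key matrices."""
    rows = []
    for xi in x:
        scores = [beta * dot(xi, xj) for xj in x]
        m = max(scores)  # subtract the max for numerical stability
        e = [math.exp(s - m) for s in scores]
        z = sum(e)
        rows.append([w / z for w in e])
    return rows


def mean_entropy(w):
    """Average Shannon entropy (in nats) of the attention rows."""
    return sum(-sum(p * math.log(p) for p in row) for row in w) / len(w)


def step(x, beta, dt):
    """One explicit Euler step of the projected attention flow, then renormalize."""
    w = attention_weights(x, beta)
    d = len(x[0])
    new_x = []
    for i, xi in enumerate(x):
        # Attention output with identity value matrix: weighted average of tokens.
        v = [sum(w[i][j] * x[j][k] for j in range(len(x))) for k in range(d)]
        radial = dot(v, xi)
        # Keep only the component tangent to the sphere at x_i.
        tangent = [vk - radial * xk for vk, xk in zip(v, xi)]
        new_x.append(normalize([xk + dt * tk for xk, tk in zip(xi, tangent)]))
    return new_x


def min_cosine(x):
    return min(dot(a, b) for a in x for b in x)


random.seed(0)
n, dim, beta, dt = 8, 3, 1.0, 0.1
# Positive coordinates put every token in one open hemisphere.
x = [normalize([random.random() + 0.01 for _ in range(dim)]) for _ in range(n)]
h0 = mean_entropy(attention_weights(x, beta))
for _ in range(300):
    x = step(x, beta, dt)
h1 = mean_entropy(attention_weights(x, beta))
print(f"min cosine {min_cosine(x):.6f}, entropy {h0:.4f} -> {h1:.4f}, log n = {math.log(n):.4f}")
```

  As tokens collapse toward one cluster, all attention scores become equal, so each softmax row tends to the uniform distribution, whose entropy is exactly log n.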
  Our results provide a unified perspective on the dynamics of multi-head attention and clarify the mechanisms underlying clustering and stability in transformer models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04279v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ayan Pendharkar</dc:creator>
    </item>
    <item>
      <title>Revocation-Ready CP-ABE Key Management for Blockchain-Based IoT Data Sharing</title>
      <link>https://arxiv.org/abs/2605.04280</link>
      <description>arXiv:2605.04280v1 Announce Type: new 
Abstract: Blockchain-based IoT data sharing systems increasingly adopt a hybrid architecture in which a permissioned ledger stores tamper-evident metadata while encrypted payloads are placed in content-addressed storage. In such systems, a central security bottleneck is key access control: enforcing dynamic, multi-user authorization for releasing or using bulk-data decryption keys. Existing designs often rely on always-online RBAC or smart-contract gates that return keys to authorized users, reintroducing a trusted online policy enforcement point and weakening auditability. This paper presents a revocation-ready key management layer that replaces online key release with ciphertext key publication: the ledger records metadata of the form (CID, CK, PolicyID, epoch), where CK is a CP-ABE ciphertext encapsulating an AES-GCM key. Users retrieve CK from the ledger and decrypt locally if their attributes satisfy the policy.
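  A minimal Python sketch of the ledger side of this design, assuming SHA-256 content addresses and a hash-chained, append-only log of (CID, CK, PolicyID, epoch) records. CP-ABE and AES-GCM are not implemented: the CK value is a placeholder byte string standing in for a real CP-ABE ciphertext, and the class, function, and field names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def cid_of(payload: bytes) -> str:
    """Content address: hex SHA-256 of the (already encrypted) payload."""
    return hashlib.sha256(payload).hexdigest()


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class HashChainedLedger:
    """Append-only log of (CID, CK, PolicyID, epoch) records; each entry commits to its predecessor."""

    def __init__(self):
        self.entries = []

    def append(self, cid: str, ck: bytes, policy_id: str, epoch: int) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"cid": cid, "ck": ck.hex(), "policy_id": policy_id, "epoch": epoch, "prev": prev}
        entry_hash = _digest(body)
        self.entries.append({**body, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash and link; any edited field breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


ledger = HashChainedLedger()
blob = b"AES-GCM ciphertext of an IoT payload (placeholder bytes)"
# The CK argument would be a real CP-ABE ciphertext; here it is an opaque placeholder.
ledger.append(cid_of(blob), b"placeholder CP-ABE ciphertext", "policy-42", 1)
print(ledger.verify())
```

  Because only the small CK and its metadata live on the ledger, no online key server is involved: a reader fetches the record, decrypts CK locally if its attributes satisfy the policy, and then decrypts the bulk payload retrieved by CID.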
  To support forward revocation and policy evolution without re-encrypting large files, the design introduces an epoch/time-bound attribute and a lightweight CK-rotation protocol that updates only small ciphertext keys and ledger entries. We implement a minimal end-to-end prototype using a local content-addressed store, a hash-chained ledger, and a CP-ABE backend, with the goal of isolating key-management costs rather than benchmarking production blockchain throughput. Experiments on a commodity MacBook show that CP-ABE encryption dominates store latency, with approximately 186 ms for a k=6 mixed-Boolean policy, while ledger and storage operations remain around 1-2 ms. Epoch-based revocation amortizes key update cost under churn, gateway-assisted mode reduces median client-side decryption time by more than 4x under a simulated 4x client slow-down, and ledger growth scales with the number of shared assets rather than the number of readers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04280v1</guid>
      <category>cs.CR</category>
      <category>cs.DC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chun Yin Chiu</dc:creator>
    </item>
    <item>
      <title>Hardware-Aware Neural Feature Extraction for Resource-Constrained Devices</title>
      <link>https://arxiv.org/abs/2605.04282</link>
      <description>arXiv:2605.04282v1 Announce Type: new 
Abstract: Visual SLAM is a core component of spatial computing systems, yet deploying learned local feature extractors on microcontroller-class hardware remains challenging due to memory, bandwidth, and quantization constraints. While modern neural descriptors provide strong robustness, their practical adoption is often hindered by system-level bottlenecks that are not captured by FLOP-based efficiency metrics. In this work, we introduce Gideon, a hardware-aware neural feature extractor explicitly designed for resource-constrained devices. Our approach combines relational knowledge distillation from a SuperPoint teacher with differentiable neural architecture search (DNAS) under strict memory and operator constraints. Unlike conventional design pipelines, we treat quantization stability and dynamic-range compactness as first-class objectives. We show that architectural choices such as replacing Batch Normalization with affine layers significantly improve INT8 robustness, and that descriptor dimensionality directly governs quantization resilience. Deployed on STM32N6, Gideon achieves 9.003 ms inference time (111 fps) while remaining below a 1.5 MB memory footprint. Remarkably, INT8 quantization induces negligible degradation and occasionally matches full-precision performance. These results demonstrate that robust learned feature extraction can be reconciled with embedded hardware constraints through holistic hardware-algorithm co-design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04282v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Francesco Tosini, Simone Pedroni, Christian Veronesi, Pietro Bartoli, Marco Paracchini, Marco Marcon, Diana Trojaniello</dc:creator>
    </item>
    <item>
      <title>Probabilistic Classification and Uncertainty Quantification of Sahara Desert Climate Using Feedforward Neural Networks</title>
      <link>https://arxiv.org/abs/2605.04286</link>
      <description>arXiv:2605.04286v1 Announce Type: new 
Abstract: Climate classification plays a vital role in agricultural planning, hydrological studies, and climate science. One of the most widely used systems for classifying global climate zones is the Köppen-Trewartha (KT) classification. However, the KT classification is fundamentally deterministic, assigning discrete labels to spatial locations without accounting for classification uncertainty. In this paper, we provide a framework for probabilistic modeling of climatic zones. We implement a feedforward artificial neural network (ANN) for classification, allowing efficient, uncertainty-aware categorization of climatic regions and thereby a more nuanced understanding of transitional climate zones than traditional deterministic methods provide. We apply this method to the Sahara Desert region over the 30-year period from 1960 to 1989, using data at more than 400,000 space-time locations from the first 11 years to train our model. We assess the model's short- and long-term classification capabilities to evaluate its stability and accuracy over time. We also compare the probabilistic classification from our model with the traditional KT classification. In addition, we use fluctuation analysis methods to highlight the temporal evolution of climatic zones across the Sahara region and to identify areas whose climate-class probabilities shift significantly, providing insights into broader trends in desertification.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04286v1</guid>
      <category>cs.LG</category>
      <category>stat.AP</category>
      <category>stat.CO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Stephen Tivenan, Indranil Sahoo, Yanjun Qian</dc:creator>
    </item>
    <item>
      <title>Building Power Grid Models from Open Data: A Complete Pipeline from OpenStreetMap to Optimal Power Flow</title>
      <link>https://arxiv.org/abs/2605.04289</link>
      <description>arXiv:2605.04289v1 Announce Type: new 
Abstract: Access to realistic transmission grid models is essential for power systems research, yet detailed network data in the United States remains restricted under critical-infrastructure regulations. We present a pipeline that constructs complete, OPF-solvable transmission network models entirely from publicly available data. The five-stage pipeline (1) extracts power infrastructure from OpenStreetMap via a local Overpass API instance, (2) reconstructs bus-branch topology through voltage inference, line merging, and transformer detection, (3) estimates electrical parameters using voltage-class lookup tables calibrated with U.S. Energy Information Administration (EIA) plant-level data, (4) allocates hourly demand from EIA-930 to individual buses using US Census population as a spatial proxy, and (5) solves both DC and AC optimal power flow using PowerModels.jl with a progressive relaxation strategy that automatically loosens constraints on imprecise models. We validate the pipeline on all 48 contiguous US states and six multi-state regions, including the full Western (5,076 buses) and Eastern (21,697 buses) Interconnections. Of the 48 single-state models, 42 (88%) converge at the strictest relaxation level for AC-OPF at peak hour and 44 (92%) off-peak. Dispatch costs (median $22/MWh) and system losses (median 1.0%) are consistent with real wholesale-market outcomes. The pipeline relies exclusively on open data sources, enabling reproducible grid analysis without proprietary data. All 54 models (48 single-state and 6 multi-state) are publicly released at https://github.com/microsoft/GridSFM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04289v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Andrea Britto, Thiago Spina, Weiwei Yang, Spencer Fowers, Baosen Zhang, Chris White</dc:creator>
    </item>
    <item>
      <title>StormWave: An Open-Source Portable SDR Platform for Over-the-Air Resilience Evaluation of Terrestrial and Aerial Communications</title>
      <link>https://arxiv.org/abs/2605.04290</link>
      <description>arXiv:2605.04290v1 Announce Type: new 
Abstract: This paper presents \emph{StormWave}, an open-source, portable software-defined radio (SDR) platform for radio frequency (RF) interference generation and monitoring, designed for realistic field-based evaluation of the resilience of wireless communication systems. StormWave enables seamless composition and runtime switching among a wide range of narrowband and wideband waveforms, while supporting multiple digital modulations, adaptive coding, and multi-radio orchestration with real-time spectrum visualization. We evaluate the effectiveness of StormWave through both outdoor ground and air-to-air (A2A) experiments. Ground experiments demonstrate clear waveform- and modulation-dependent interference effects under realistic propagation conditions, while A2A experiments reveal pronounced distance-dependent constellation distortion and access-symbol degradation under active interference. We will release the StormWave source code to the community and expect it to serve as a flexible, extensible, and field-ready platform for systematically validating the interference resilience of wireless systems under realistic operating conditions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04290v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>eess.SP</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuqing Cui, Zhaoxi Zhang, Sidharth Santhi Nivas, Prem Sagar Pattanshetty Vasanth Kumar, Maxwell McManus, Chenzhi Zhao, Guanying Sun, Nicholas Mastronarde, George Sklivanitis, Dimitris A. Pados, Elizabeth Serena Bentley, Zhangyu Guan</dc:creator>
    </item>
    <item>
      <title>Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion</title>
      <link>https://arxiv.org/abs/2605.04291</link>
      <description>arXiv:2605.04291v1 Announce Type: new 
Abstract: We present a discrete diffusion-based language model using Glauber dynamics from statistical physics. Our main insight is that instead of trying to train a discrete state-space diffusion model using Glauber dynamics with a uniform transition kernel as the forward process, one can set up an ``energy function'' based on pretrained causal/masked language models. When viewed as the stationary distribution, this energy function allows us to significantly improve the quality of the generated text. Incorporating UL2 as the pretrained model into our diffusion pipeline, we outperform prior diffusion-based LMs and perform competitively with autoregressive models of comparable model sizes. Furthermore, our models are competitive with or outperform prior diffusion models and GPT-2-style autoregressive models on zero-shot common sense reasoning tasks as well as planning and search tasks like Sudoku and Zebra puzzles.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04291v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tarun Kathuria, Sachin Kumar</dc:creator>
    </item>
    <item>
      <title>LLMs Uncertainty Quantification via Adaptive Conformal Semantic Entropy</title>
      <link>https://arxiv.org/abs/2605.04295</link>
      <description>arXiv:2605.04295v1 Announce Type: new 
Abstract: LLMs' overconfidence, particularly when hallucinating, poses a significant challenge for deploying these models in safety-critical settings and makes reliable uncertainty estimation necessary. Existing approaches to uncertainty quantification typically prioritize lexical or probabilistic measures. However, these techniques often ignore the semantic variance among different responses with similar meaning. In this paper, we propose Adaptive Conformal Semantic Entropy (ACSE), a method for estimating prompt-level uncertainty by adaptively measuring the semantic dispersion of LLM outputs. Our uncertainty scoring function is based on the semantic entropy of clusters formed from multiple diverse responses to the same prompt. It adaptively adjusts the uncertainty score based on the semantic features of each cluster. To ensure the statistical reliability of our score, we use conformal calibration to derive a decision rule that accepts or abstains on each prompt. This rule provides a finite-sample, distribution-free guarantee that the error rate among accepted responses remains bounded by a user-specified tolerance. Extensive experimental evaluations across different LLMs and datasets demonstrate that our approach consistently outperforms state-of-the-art uncertainty quantification baselines in terms of discriminative performance, conformal guarantees, and probabilistic calibration indicators. As a highlight, on the TriviaQA dataset our approach achieves an AUROC of 0.88, compared with 0.65 for the token-entropy approach.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04295v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hamed Karimi, Vaishali Meyappan, Reza Samavi</dc:creator>
    </item>
    <item>
      <title>Dynamic Quantum-Assisted Co-Design of Control Tuning and Lyapunov Stability Synthesis for Nonlinear Systems</title>
      <link>https://arxiv.org/abs/2605.04296</link>
      <description>arXiv:2605.04296v1 Announce Type: new 
Abstract: This paper proposes a dynamic quantum-assisted co-design framework for nonlinear closed-loop systems in which controller parameters and Lyapunov-certificate parameters are redesigned jointly at successive decision epochs. Unlike conventional nonlinear control designs that typically tune controller gains offline and verify stability separately, the proposed method embeds performance improvement and Lyapunov-based stability synthesis within a unified online optimization loop. The main novelty is a two-step computational structure that first contracts the continuous admissible search region around the current operating condition using a Black-Hole-based calibration procedure and then constructs a finite binary representation only over this calibrated region. The encoded objective is obtained from sampled nonlinear closed-loop evaluations and approximated by a local quadratic pseudo-Boolean surrogate, enabling an Ising-type Hamiltonian representation suitable for quantum-assisted optimization. Quantum imaginary time evolution is then used to explore the encoded Hamiltonian, and the resulting candidate bitstrings are decoded into continuous controller and Lyapunov parameters. To reduce dependence on the surrogate model, the decoded candidates are re-evaluated using the original nonlinear closed-loop cost and Lyapunov penalties before the final update is applied. The framework can accommodate different Lyapunov decay specifications by modifying the stability penalty and is validated on first-order nonlinear consensus, second-order nonlinear consensus, and induction-motor drive control examples. The implementation code used to generate the reported results is available on GitHub at https://github.com/LSU-RAISE-LAB/DQCLS-NS.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04296v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Milad Hasanzadeh, Amin Kargarian, Mehdi Farasat</dc:creator>
    </item>
    <item>
      <title>Towards Self-Referential Analytic Assessment: A Profile-Based Approach to L2 Writing Evaluation with LLMs</title>
      <link>https://arxiv.org/abs/2605.04298</link>
      <description>arXiv:2605.04298v1 Announce Type: new 
Abstract: Automated essay scoring (AES) research often relies on rank-based correlation metrics to validate analytic assessment. However, such metrics obscure both intrinsic intercorrelations among analytic dimensions that arise from the structure of writing proficiency itself and halo effects, whereby holistic impressions bleed into fine-grained component scores. As a result, high correlations may mask a system's true diagnostic behaviour. In this study, we propose a novel self-referential assessment evaluation framework that focuses on identifying intra-learner strengths and weaknesses rather than assessing inter-learner rankings. We conduct experiments on the publicly available ICNALE GRA, a uniquely dense second-language writing dataset annotated holistically and analytically by up to 80 trained raters. To obtain reliable reference scores, we apply two-facet Rasch modelling to calibrate rater severity and derive fair average scores across ten analytic aspects and holistic proficiency. We compare the analytic scoring performance of human operational raters and three large language models (LLMs) in a zero-shot setting. Our results show that LLMs tend to outperform single human raters in identifying relative weaknesses (negative feedback) across several proficiency aspects, while human raters remain stronger at identifying relative strengths (positive feedback). Overall, our findings highlight the limitations of rank-based evaluation for analytic assessment and demonstrate the value of intra-learner, profile-based methods for assessing and deploying LLMs in AES.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04298v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Stefano Bannò, Kate Knill, Mark Gales</dc:creator>
    </item>
    <item>
      <title>Beyond Fixed Thresholds and Domain-Specific Benchmarks for Explainable Multi-Task Classification in Autonomous Vehicles</title>
      <link>https://arxiv.org/abs/2605.04299</link>
      <description>arXiv:2605.04299v1 Announce Type: new 
Abstract: Scene understanding is a vital part of autonomous driving systems and typically relies on deep learning models. Deep learning models are intrinsically black boxes. This lack of transparency undermines safety in autonomous driving. To make these systems transparent, multi-task visual understanding has become crucial for explainable autonomous driving perception. In this setting, simultaneously predicting multiple driving behaviors and their underlying explanations is essential for safe navigation and for human trust in autonomous vehicles. To design an accurate and cross-cultural explainable autonomous driving system, we introduce a comprehensive confidence-threshold sensitivity analysis. It evaluates a range of threshold values to identify optimal decision boundaries for different tasks. Our analysis shows that traditional fixed-threshold approaches are suboptimal in multi-task scenarios. Extensive evaluation indicates that our adaptive threshold selection methodology improves F1-scores across tasks. In addition, we introduce IUST-XAI-AD, a novel dataset of 958 images with human annotations of driving decisions and the corresponding reasoning. This dataset addresses the critical gap in domain-specific evaluation benchmarks for distinct driving contexts and provides a more challenging test environment than existing datasets. Experimental results show that confidence-threshold sensitivity analysis can significantly improve model performance, while the IUST-XAI-AD dataset reveals important insights into cross-cultural driving behavior patterns. Together, these contributions provide both methodological advances and practical evaluation tools that can accelerate the development of more reliable, explainable, and culturally adaptive autonomous driving systems for global deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04299v1</guid>
      <category>cs.CV</category>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Maryam Sadat Hosseini Azad, Shahriar Baradaran Shokouhi</dc:creator>
    </item>
    <item>
      <title>Rigid homotopies for sampling from algebraic varieties: a Waring structure complexity model</title>
      <link>https://arxiv.org/abs/2605.04302</link>
      <description>arXiv:2605.04302v1 Announce Type: new 
Abstract: Polynomial system solving has seen major progress in both theory and practice over the past decade. A landmark achievement was addressing Smale's 17th problem, establishing average-case polynomial-time algorithms for computing approximate solutions of polynomial systems via homotopy continuation. Recent improvements in complexity bounds for these algorithms led to the development of rigid homotopy methods. In this article, we prove a new complexity result for rigid homotopies for polynomial systems with Waring representations of prescribed length. In addition, we provide the first computational experiments for rigid homotopies using a preliminary implementation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04302v1</guid>
      <category>math.NA</category>
      <category>cs.CC</category>
      <category>cs.NA</category>
      <category>math.AG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Abigail R. Jones, Kisun Lee, Jose Israel Rodriguez</dc:creator>
    </item>
    <item>
      <title>Hierarchical Visual Agent: Managing Contexts in Joint Image-Text Space for Advanced Chart Reasoning</title>
      <link>https://arxiv.org/abs/2605.04304</link>
      <description>arXiv:2605.04304v1 Announce Type: new 
Abstract: Advanced chart question answering requires both precise perception of small visual elements and multi-step reasoning across several subplots. While existing MLLMs are strong at understanding single plots, they often struggle with multi-step reasoning across multiple subplots. We propose HierVA, a hierarchical visual agent framework for chart reasoning that iteratively constructs and updates a working context in a joint image-text space. A high-level manager generates plans and maintains a compact context containing only key information, while specialized workers perform reasoning, gather evidence, and return results. In particular, the agent maintains separate visual and textual contexts and uses a zoom-in tool to restrict the visual context. Experiments on the CharXiv reasoning subset show consistent improvements over strong multimodal baselines. Ablation studies verify that the hierarchical architecture, the scoped visual context, and the distilled context each contribute complementary gains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04304v1</guid>
      <category>cs.CV</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qihua Dong, Ruozhen He, Junwen Chen, Yizhou Wang, Xu Ma, Songyao Jiang, Yun Fu</dc:creator>
    </item>
    <item>
      <title>SWAN: Semantic Watermarking with Abstract Meaning Representation</title>
      <link>https://arxiv.org/abs/2605.04305</link>
      <description>arXiv:2605.04305v1 Announce Type: new 
Abstract: We introduce SWAN (Semantic Watermarking with Abstract Meaning Representation), a novel framework that embeds watermark signatures into the semantic structure of a sentence using Abstract Meaning Representation (AMR). In contrast to existing watermarking methods, which typically encode signatures by adjusting token selection preferences during text generation, SWAN embeds the signature directly in the sentence's semantic representation. As the signature is encoded at the semantic structure level, any paraphrase that preserves meaning automatically preserves the signature. SWAN is training-free: watermark injection is achieved by prompting an LLM to generate sentences guided by a selected AMR template while maintaining contextual coherence, and detection uses an off-the-shelf AMR parser followed by a simple one-proportion z-test. Empirical evaluation on the RealNews benchmark shows SWAN matches state-of-the-art detection performance on unaltered watermarked text, while significantly improving robustness against paraphrasing, increasing detection AUC by up to 13.9 percentage points compared to prior methods. These results demonstrate that SWAN's approach of anchoring watermarks in AMR semantic structures provides a simple, effective, and prompt-based method for robust text provenance verification under paraphrasing, opening new avenues for semantic-level watermarking research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04305v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Ziping Ye, Gourab Dey, Christos Christodoulopoulos, Charith Peris, Anil Ramakrishna, Weitong Ruan, Aram Galstyan, Kai-Wei Chang, Rahul Gupta, Ninareh Mehrabi</dc:creator>
    </item>
    <item>
      <title>dtour: a steerable tour de vis through high-dimensional data</title>
      <link>https://arxiv.org/abs/2605.04306</link>
      <description>arXiv:2605.04306v1 Announce Type: new 
Abstract: Understanding high-dimensional data requires projecting it into lower-dimensional spaces, but any single projection inevitably loses information or introduces distortions. Tours address this limitation through animation of 2D projection sequences, yet existing tools present tradeoffs in the freedom and steerability of projection traversal, providing little to no ability to move between expert-guided paths and unrestrained exploration. We present dtour, a tour interface that combines static projection previews, reversible scrubbing along continuous geodesic projection paths, manual projection manipulation, and a wandering grand tour, all within a single progressive exploration interface. dtour scales to millions of points via GPU-accelerated rendering, runs in any modern browser, and integrates with both Python and JavaScript ecosystems. We demonstrate dtour on text, image, and single-cell data for two usage scenarios: gradually revealing structure in high-dimensional data and validating non-linear dimensionality reduction outputs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04306v1</guid>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Fritz Lekschas, Nezar Abdennur</dc:creator>
    </item>
    <item>
      <title>Memory as a Markov Matrix: Sample Efficient Knowledge Expansion via Token-to-Dictionary Mapping</title>
      <link>https://arxiv.org/abs/2605.04308</link>
      <description>arXiv:2605.04308v1 Announce Type: new 
Abstract: Continual incorporation of new knowledge is essential for the long-term evolution of large language models (LLMs). Existing approaches typically rely on parameter-update algorithms to mitigate catastrophic forgetting, yet they suffer from fundamental limitations: 1) forgetting is unavoidable as the amount of newly injected knowledge grows; and 2) model updates are often irreversible. As modern LLMs become increasingly expressive, it is natural to question whether large-scale weight updates are necessary for acquiring a small amount of new knowledge. In this work, we propose a principled framework that models autoregressive language generation as a Markov process over tokens, where model memory is represented by a Markov transition matrix. Under this formulation, incorporating new knowledge/tokens corresponds to extending the state space, and preserving existing transitions guarantees retention of previously learned knowledge. We then prove a sample complexity bound for incorporating new tokens via a token-to-dictionary mapping strategy. In particular, for learning the transition behavior of each new token, the required number of samples scales linearly with the number of existing tokens it is mapped to. To realize this mapping, we propose an embedding-tuning algorithm that requires minimal parameter updates and induces zero forgetting. Experimental results further demonstrate the effectiveness of our method and validate our theoretical findings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04308v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kaustubh Pethkar, Ziyang Xiong, Zuofeng Shang, Yingcong Li</dc:creator>
    </item>
    <item>
      <title>Interpreting V1 Population Activity via Image-Neural Latent Representation Alignment</title>
      <link>https://arxiv.org/abs/2605.04309</link>
      <description>arXiv:2605.04309v1 Announce Type: new 
Abstract: Understanding the neural mechanisms underlying visual computation has long been a central challenge in neuroscience. Recent alignment-based approaches have improved the accuracy of decoding visual stimuli from brain activity. However, they provide limited insight into the neural computations that give rise to these improvements. To address this gap, we propose Dual-Tower Image-Neural Alignment (DINA), an interpretable contrastive framework for analyzing population-level visual computations in primary visual cortex (V1). DINA jointly trains a biologically motivated dual-tower architecture that aligns visual stimuli and the corresponding V1 population responses in a shared latent space at the level of intermediate feature maps. This enables both accurate decoding and direct access to interpretable feature maps. We evaluate DINA on large-scale two-photon calcium imaging data from mouse V1. It achieves accurate neural decoding and reveals that decoding performance is primarily supported by coarse, low-level visual structure rather than by semantic category information or fine-grained details. Further analysis reveals that alignable feature maps emerge from multiple spatially distributed image regions and capture both shape and texture cues. These maps are predominantly reconstructed by sparse subsets of strongly responsive neurons and their functional interactions. Together, these results show that, beyond enabling accurate decoding, DINA provides a principled framework for probing the computational mechanisms underlying visual processing in V1.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04309v1</guid>
      <category>cs.NE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xin Wang, Zhuangzhi Gao, Hongyi Qin, Zhongli Wu, Feixiang Zhou, He Zhao</dc:creator>
    </item>
    <item>
      <title>ClusterLess: Deadline-Aware Serverless Workflow Orchestration on Federated Edge Clusters</title>
      <link>https://arxiv.org/abs/2605.04310</link>
      <description>arXiv:2605.04310v1 Announce Type: new 
Abstract: The recent convergence of edge computing, serverless execution, and Kubernetes (K8s)-based container orchestration has enabled application workflows to be processed close to data sources. While effective within a single edge cluster, existing schemes do not generalize to federated multi-edge environments, where multiple workflows execute concurrently under strict end-to-end (E2E) deadline constraints. This paper introduces ClusterLess, a deadline-aware serverless workflow orchestration method for federated multi-edge K8s clusters. ClusterLess manages the E2E lifecycle of workflow execution, including dependency analysis, execution mode selection, and resource-aware placement. To this end, it combines structured intra-cluster orchestration with a leader-selected, super-master-driven inter-cluster coordination layer to determine where and how each workflow function should be executed across the federated edge clusters. We implement ClusterLess using OpenFaaS as the serverless execution substrate and Argo for workflow management. We deploy it on a realistic testbed of six edge clusters comprising 64 heterogeneous edge nodes. We ran experiments with concurrent serverless workflows, spanning 18 workload configurations across different input sizes and deadline classes. Compared with four baseline methods, ClusterLess reduces workflow completion time by up to 40% and increases deadline satisfaction from below 50% to over 90%. It also confines deadline violations to single-digit seconds.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04310v1</guid>
      <category>cs.DC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Reza Farahani, Mario Colosi, Ilir Murturi, Stefan Nastic, Massimo Villari, Schahram Dustdar, Radu Prodan</dc:creator>
    </item>
    <item>
      <title>Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games</title>
      <link>https://arxiv.org/abs/2605.04312</link>
      <description>arXiv:2605.04312v1 Announce Type: new 
Abstract: Static capability benchmarks suffer from saturation and contamination, making it difficult to track capability progress over time. We introduce Agent Island, a multiplayer simulation environment in which language-model agents compete in a game of inter-agent cooperation, conflict, and persuasion. The environment yields a dynamic benchmark designed to mitigate both saturation and contamination. In this winner-take-all game, new models can always outperform the current leading player, and agents compete against other adaptive agents rather than facing a fixed task set. We rank players with a Bayesian Plackett-Luce model, which allows us to quantify uncertainty in player skill. In 999 games involving 49 unique models, openai/gpt-5.5 dominates its peers with a posterior mean skill of 5.64. This compares with 3.10 for the second-ranked model, openai/gpt-5.2, and 2.86 for the third-ranked model, openai/gpt-5.3-codex. We release the game logs as a dataset for analyses of model behavior. As an example, we investigate same-provider preference in final-round votes. We find that models are 8.3 percentage points more likely to support a same-provider finalist than finalists from other providers. This preference is not uniform across providers: among separately estimated providers, the effect is strongest for OpenAI models and weakest for Anthropic models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04312v1</guid>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Connacher Murphy</dc:creator>
    </item>
    <item>
      <title>NoisyCausal: A Benchmark for Evaluating Causal Reasoning Under Structured Noise</title>
      <link>https://arxiv.org/abs/2605.04313</link>
      <description>arXiv:2605.04313v1 Announce Type: new 
Abstract: Causal reasoning in natural language requires identifying relevant variables, understanding their interactions, and reasoning about effects and interventions, often under noisy or ambiguous conditions. While large language models (LLMs) exhibit strong general reasoning abilities, they struggle to disentangle correlation from causation. This is particularly true when observations are partially incorrect or irrelevant information is present. In this work, we introduce NoisyCausal, a new benchmark designed to evaluate causal reasoning under structured noise. Each instance is generated from a ground-truth causal graph and contextualized with a natural-language scenario. Controllable forms of noise are injected into this scenario, such as irrelevant distractors, value perturbations, confounding, and partial observability. We further propose a modular reasoning framework that combines LLMs with explicit causal structure to address these challenges. Our method prompts the LLM to extract variables and construct a causal graph from the context. It then reformulates the reasoning task as a structured prompt grounded in this graph. Rather than relying on statistical patterns alone, the LLM is guided by symbolic structure, enabling more interpretable and robust inference. Experimental results show that our method significantly outperforms standard prompting and reasoning baselines on NoisyCausal. Furthermore, it generalizes well to external benchmarks such as Cladder without task-specific tuning. Our findings highlight the importance of combining causal abstractions with language-driven reasoning to achieve faithful and robust causal understanding in LLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04313v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhi Xu, Yun Fu</dc:creator>
    </item>
    <item>
      <title>Orchestrating Serverless Applications in the Edge Cloud Space Continuum: What Breaks and What is Next?</title>
      <link>https://arxiv.org/abs/2605.04316</link>
      <description>arXiv:2605.04316v1 Announce Type: new 
Abstract: Serverless computing has matured into an effective execution model for edge-cloud environments. It enables function-level decomposition, demand-driven scaling, and workflow execution across stable, well-provisioned infrastructure. This success motivates extending it to the edge-cloud-space continuum, where Low Earth Orbit (LEO) constellations are increasingly explored as distributed compute substrates. However, existing serverless orchestration is not directly applicable in this setting. LEO systems impose time-varying contact graphs, intermittent link availability, and strict feasibility constraints on energy, memory, communication, and operational cost. This article identifies ten broken assumptions in existing serverless orchestration and organizes them into three core challenges: spatiotemporal execution over dynamic graphs, constraint-aware function placement and scaling, and correctness and progress under decentralized and delayed state. It then proposes an architecture that enables robust and efficient serverless execution across the continuum. The architecture is grounded in these challenges and demonstrated through a representative flood-response use case.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04316v1</guid>
      <category>cs.DC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hadi Tabatabaee Malazi, Reza Farahani, Nitinder Mohan, Schahram Dustdar</dc:creator>
    </item>
    <item>
      <title>Reproduction Test Generation for Java SWE Issues</title>
      <link>https://arxiv.org/abs/2605.04320</link>
      <description>arXiv:2605.04320v1 Announce Type: new 
Abstract: Given an issue on a software repository, a reproduction test confirms the issue's presence in the code before the fix and its absence after it. Reproduction tests provide crucial execution-based feedback for diagnosis and validation during software development, yet they are usually missing. Recent work has therefore introduced both benchmarks and a thriving literature on generating reproduction tests from issues. However, that work has focused on Python and neglected other languages such as Java, which is important for enterprise software. This paper introduces both a benchmark and a solution for Java repository-level reproduction test generation. The benchmark, TDD-Bench-Java, is the first to model this problem and comprises 250 instances sourced from popular open-source repositories. The solution, e-Otter++ for Java, adapts a state-of-the-art reproduction test generator for Python to yield high performance on Java. To evaluate in an industrial setting, this paper presents results not only on TDD-Bench-Java but also on a contamination-free proprietary dataset. Overall, we hope that this paper contributes to bringing better diagnosis and validation to Java software development.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04320v1</guid>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Toufique Ahmed, Jatin Ganhotra, Avraham Shinnar, Martin Hirzel</dc:creator>
    </item>
    <item>
      <title>AI and Suicide Prevention: A Cross-Sector Primer</title>
      <link>https://arxiv.org/abs/2605.04321</link>
      <description>arXiv:2605.04321v1 Announce Type: new 
Abstract: AI chatbots already function as de facto mental health support tools for millions of people, including people in crisis. Yet, they lack the clinical validation, shared standards, and coordinated oversight that their societal role demands. This primer was developed in conjunction with a multistakeholder workshop hosted by Partnership on AI in 2026, convening AI labs, mental health practitioners, people with lived experience, and policymakers, to provide a common cross-sector reference point for the current state of the field of AI and suicide prevention. It begins with an overview of clinical best practices, then turns to how frontier AI systems (as of winter 2026) detect and respond to suicide and non-suicidal self-injury (NSSI) queries. Together, these provide insight into what it would take to design and implement AI tools that not only better prevent suicide and NSSI, but also promote overall well-being. Drawing on clinical literature, publicly available AI lab policies, an emerging landscape of evaluation frameworks, and conversations with leaders across the AI and mental health fields, we map challenges posed by general-purpose AI chatbots for mental health across model, product, and policy layers, ultimately highlighting priority areas where cross-industry alignment is both urgently needed and achievable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04321v1</guid>
      <category>cs.CY</category>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Emily Saltz, Claire R. Leibowicz</dc:creator>
    </item>
    <item>
      <title>LUCAS-MEGA: A Large-Scale Multimodal Dataset for Representation Learning in Soil-Environment Systems</title>
      <link>https://arxiv.org/abs/2605.04323</link>
      <description>arXiv:2605.04323v1 Announce Type: new 
Abstract: Understanding soil is fundamental to agriculture, carbon cycling, and environmental sustainability, yet progress is limited by fragmented and heterogeneous datasets that constrain modeling to small-scale predictive settings rather than high-dimensional representation learning. We introduce LUCAS-MEGA, a large-scale multimodal dataset constructed through systematic data fusion of European soil-environment observations, with the LUCAS survey as its backbone. The fused dataset comprises over 70,000 samples and more than 1,000 features spanning physical, chemical, environmental, biological, and visual attributes, aggregated from 68 source datasets. To enable integration at scale, we develop SoilFuser, a multi-agent, human-in-the-loop data fusion pipeline that standardizes heterogeneous data formats and measurement protocols, resolves inconsistencies and invalid entries (e.g., unit inconsistencies, codebook mismatches, and erroneous values), incorporates natural language annotations, and harmonizes multimodal attributes and metadata into a unified, machine learning-ready feature space. The resulting dataset captures key characteristics of real-world soil observations, including multimodality, uneven feature coverage, and heterogeneous uncertainty. To demonstrate the usability of LUCAS-MEGA for data-driven modeling, we pretrain a multimodal tabular transformer (SoilFormer) using a self-supervised objective based on feature masking, achieving stable training, strong predictive performance, and representations that support uncertainty-aware prediction. We further show that the learned representations recover relationships consistent with established soil processes. LUCAS-MEGA is released with open access and is accompanied by composable, agent-friendly APIs that support structured querying and data-driven workflows.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04323v1</guid>
      <category>cs.LG</category>
      <category>cs.DB</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kuangdai Leng, Simon Jeffery, Panos Panagos, Tarje Nissen-Meyer</dc:creator>
    </item>
    <item>
      <title>DeFed-GMM-DaDiL: A Decentralized Federated Framework for Domain Adaptation</title>
      <link>https://arxiv.org/abs/2605.04324</link>
      <description>arXiv:2605.04324v1 Announce Type: new 
Abstract: Decentralized multi-source domain adaptation seeks to transfer knowledge from multiple heterogeneous and related source domains to an unlabeled target domain in a decentralized setting. We address this challenge through a fully decentralized federated approach, DeFed-GMM-DaDiL, an extension of the GMM-Dataset Dictionary Learning (DaDiL) framework. Each client models its dataset as a Gaussian Mixture Model (GMM), and the federation jointly approximates them via labeled Wasserstein barycenters of shared, learnable GMM atoms. This design enables adaptation without a central server while preserving clients' privacy. We empirically study the stability of the learned representations in scenarios where the target domain has missing classes. Empirical results demonstrate that DeFed-GMM-DaDiL maintains stable and consistent shared representations across clients, effectively reconstructs missing classes, and achieves competitive performance on multi-source domain adaptation benchmarks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04324v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Rebecca Clain, Eduardo Fernandes Montesuma, Fred Ngole Mboula</dc:creator>
    </item>
    <item>
      <title>On the Architectural Complexity of Neural Networks</title>
      <link>https://arxiv.org/abs/2605.04325</link>
      <description>arXiv:2605.04325v1 Announce Type: new 
Abstract: We introduce a unified theoretical framework for the rigorous analysis and systematic construction of deep neural networks (DNNs). This framework addresses a gap in existing theory by explicitly modeling the structure of tensor operations -- lower level information that is often abstracted. Our framework enables two novel objectives: (1) analysis of the evolution of architectural complexity over deep learning history, and (2) automatic construction of novel architectures based on new types of tensor operations. Our study of DNNs introduced over the past 40 years reveals a connection between groundbreaking architectures and increases in different types of architectural complexity. Moreover, we identify several large classes of higher complexity architectures that have not yet been explored. We then collect a dataset of 3,000+ higher complexity architectures, which we publicly release at: https://github.com/combinatoriallabs/ArchitecturalComplexity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04325v1</guid>
      <category>cs.LG</category>
      <category>cs.DM</category>
      <category>math.CO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nicholas J. Cooper, François G. Meyer, Michael L. Roberts, Carlos Zapata-Carratalá, Lijun Chen, Danna Gurari</dc:creator>
    </item>
    <item>
      <title>From Language to Logic: A Theoretical Architecture for VLM-Grounded Safe Navigation</title>
      <link>https://arxiv.org/abs/2605.04327</link>
      <description>arXiv:2605.04327v1 Announce Type: new 
Abstract: We propose an architecture for integrating high-level, human-provided safety rules and operator-aligned semantic preferences into autonomous robot navigation in unstructured outdoor environments. In our approach, natural-language rules are translated into Signal Temporal Logic (STL) specifications that guide planning and navigation during runtime. Persistent, environment-centric rules and terrain preferences are grounded into a 2D cost map, while temporally dynamic requirements are expressed as STL specifications to be monitored during runtime. We hypothesize the use of Vision-Language Models (VLMs) for zero-shot scene understanding, enabling mapping between human instructions, semantic features, and environmental constraints. Within this framework, we construct an illustrative navigation model that is designed to satisfy a set of STL-encoded specifications and soft operator preferences through formal satisfaction metrics embedded into environmental properties and runtime monitoring.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04327v1</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kristy Sakano, Kalonji Harrington, Mumu Xu</dc:creator>
    </item>
    <item>
      <title>The Scaling Properties of Implicit Deductive Reasoning in Transformers</title>
      <link>https://arxiv.org/abs/2605.04330</link>
      <description>arXiv:2605.04330v1 Announce Type: new 
Abstract: We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning approaches explicit CoT performance across graph topologies and problem widths, though CoT remains necessary for depth extrapolation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04330v1</guid>
      <category>cs.AI</category>
      <category>cs.CC</category>
      <category>cs.LO</category>
      <category>cs.SC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Enrico Vompa, Tanel Tammet</dc:creator>
    </item>
    <item>
      <title>Learning-based Statistical Refinement for Denoising</title>
      <link>https://arxiv.org/abs/2605.04332</link>
      <description>arXiv:2605.04332v1 Announce Type: new 
Abstract: This work proposes a learning-based statistical refinement method for improving the denoising results of a given denoiser without knowing the precise noise distribution or accessing clean images or calibration data. While there are many existing successful denoising approaches for handling different kinds of noise, they typically require accurate modelling of the images and the noise (implicitly or explicitly), and hence the denoising results can be suboptimal due to practical factors such as imperfect models, unreliable noise assumptions, or low-quality data. In particular, when clean image samples are not available and the underlying noise distribution is unknown, which is the case in various practical situations, the results may not align well with the noise statistics. Ignoring this useful statistical information leads to suboptimal results. This work aims to make the best use of the statistical information to improve the consistency between the given denoising results and the noise statistics, under the assumption that the noise is conditionally pixel-wise independent given the clean signal. A method, based on a Bayesian formulation of an auxiliary signal in the noisy data, is proposed for evaluating the consistency of the denoising results without precise information on the noise distribution. By leveraging the statistical information from noisy data, the method enhances the statistical noise consistency and improves denoising quality.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04332v1</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <category>eess.IV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Rihuan Ke</dc:creator>
    </item>
    <item>
      <title>Resilient AI Supercomputer Networking using MRC and SRv6</title>
      <link>https://arxiv.org/abs/2605.04333</link>
      <description>arXiv:2605.04333v1 Announce Type: new 
Abstract: Tail latency dominates the performance of synchronous pretraining jobs when they run at very large scales. We describe a three-pronged approach: (1) a new RDMA-based transport protocol, MRC, which sprays across many paths and actively load-balances between them, eliminating the issue of flow collisions; (2) the use of multi-plane Clos topologies to get the benefits of high switch radix and redundancy, allowing training clusters well over 100K GPUs to be built as two-tier topologies while increasing physical redundancy; and (3) the use of static source routing using SRv6 to allow MRC the freedom to bypass failures by itself. We describe our experiences running MRC and static SRv6 routing in production in OpenAI and Microsoft's largest training clusters, where it has been used to train the latest frontier models. We demonstrate how MRC allows AI training jobs to ride out many network failures that previously would have interrupted training.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04333v1</guid>
      <category>cs.NI</category>
      <category>cs.AI</category>
      <category>cs.DC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Joao Araujo, Alex Chow, Mark Handley, Ryder Lewis, Christoph Paasch, Jitendra Padhye, Michael Papamichael, Greg Steinbrecher, Amin Tootoonchian, Lihua Yuan, S. Anantharamu, Abhishek Dosi, Mohit Garg, Mahdieh Ghazi, Torsten Hoefler, Deepal Jayasinghe, Jithin Jose, Abdul Kabbani, Guohan Lu, Yang Wang, K. Doddapaneni, Murali Garimella, Vipin Jain, Yanfang Le, H. Nagulapalli, S. Narayanan, Rong Pan, Rathina Sabesan, Raghava Sivaramu, Rip Sohan, Eric Davis, Dragos Dumitrescu, Mohan Kalkunte, Bhaswar Mitra, Guglielmo Morandin, Adrian Popa, Costin Raiciu, Eric Spada, John Spillane, Niranjan Vaidya, Aviv Barnea, Idan Burstein, Elazar Cohen, Yamin Friedman, Noam Katz, Masoud Moshref, Yuval Shpigelman, Shahaf Shuler, Shy Shyman, Sayantan Sur</dc:creator>
    </item>
    <item>
      <title>Science discussions of retracted articles on Bluesky: public scrutiny or misinformation spreading?</title>
      <link>https://arxiv.org/abs/2605.04334</link>
      <description>arXiv:2605.04334v1 Announce Type: new 
Abstract: Post-publication peer review (PPPR) has emerged as an important supplement to traditional peer review, with social media playing a growing role in publicising potential problems in published research. However, it remains unclear whether social media discussions of retracted articles primarily reflect good practices, such as exposing flaws and acknowledging retraction status, or bad practices, such as overlooking retractions and continuing to disseminate scientific misinformation. In this study, we collected Bluesky posts referencing scholarly articles from Altmetric and retrieved metadata for the referenced articles using OpenAlex. The final dataset included 284 retracted articles with 79 pre-retraction posts and 857 post-retraction posts, 59 retraction notices with 186 posts, and 609,461 non-retracted articles with 1,344,756 posts. We manually coded Bluesky posts discussing retracted articles to identify instances of good and bad practice. The results show that posts demonstrating good practice (89.9%) substantially outnumbered those demonstrating bad practice (10.1%). Posts reflecting good practice also had more user engagement. In the pre-retraction phase, good practice posts constituted a slight minority (43.0%), whereas in the post-retraction phase they were dominant (94.2%). Most negative posts in the pre-retraction phase (90.0%) reflected good practice, whereas only 17.3% of positive posts in the post-retraction phase showed bad practice. Thus, sentiment analysis can help filter posts that could flag potential flaws before retraction, but it may struggle to accurately identify the spread of misinformation after retraction. More broadly, this study highlights the potential of Bluesky to support responsible scientific communication, public scrutiny, and research integrity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04334v1</guid>
      <category>cs.DL</category>
      <category>cs.CY</category>
      <category>cs.SI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Er-Te Zheng, Hui-Zhen Fu, Xiaorui Jiang, Zhichao Fang, Mike Thelwall</dc:creator>
    </item>
    <item>
      <title>Analysis of a Competitive Bivirus SIS Epidemic Model with Game Theoretic Social Distancing</title>
      <link>https://arxiv.org/abs/2605.04340</link>
      <description>arXiv:2605.04340v1 Announce Type: new 
Abstract: We propose a competitive bi-virus model with dynamic social distancing behavior. Our model illustrates how public perception of different viruses changes the conditions for their eradication, their coexistence, or the dominance of one over the other. We show that our model is not monotone, in contrast to the classic bi-virus model. We detail how social distancing behavior produces different sets of equilibria than the classic bi-virus model and changes the criteria for their stability. In particular, we detail the set of disease free equilibria (DFE) present in our model and identify necessary and sufficient conditions for almost global exponential stability of the same. We prove similar global results for all but one non-DFE isolated (unilateral) equilibria and local stability results for the remainder. We also consider coexistence equilibria; we show such equilibria, when they exist, take the form of lines of equilibria and give local conditions for their stability. Finally, we illustrate our theoretical findings with numerical examples.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04340v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Benjamin Catalano, Keith Paarporn, Sebin Gracy</dc:creator>
    </item>
    <item>
      <title>Budgeted LoRA: Distillation as Structured Compute Allocation for Efficient Inference</title>
      <link>https://arxiv.org/abs/2605.04341</link>
      <description>arXiv:2605.04341v1 Announce Type: new 
Abstract: We study distillation for large language models under explicit compute constraints, with the goal of producing student models that are not only cheaper to train, but structurally efficient at inference time. While prior approaches to parameter-efficient distillation, such as LoRA, reduce adaptation cost, they leave the dense backbone unchanged and therefore fail to deliver meaningful inference savings. We propose Budgeted LoRA, a distillation framework that treats model compression as a structured compute allocation problem. Instead of using a fixed student architecture, we introduce a global compute budget that sets the final target fraction of dense computation retained. Under this constraint, the model redistributes capacity across dense and low-rank pathways via (i) module-level dense retention coefficients, (ii) adaptive low-rank allocation, and (iii) post-training compression that selectively removes, approximates, or preserves dense components. This formulation yields a family of students controlled by a single budget dial. Empirically, Budgeted LoRA matches standard LoRA perplexity at a moderate budget with a 1.74x compressed-module speedup; at an aggressive budget it achieves a 4.05x speedup with moderate perplexity degradation, and it preserves higher accuracy on function-style in-context learning probes. These results suggest that, under compute-constrained distillation, retaining behavior is less about matching perplexity or removing more parameters than it is about controlling how dense computation is transferred to low-rank pathways.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04341v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mohammed Sabry, Anya Belz</dc:creator>
    </item>
    <item>
      <title>Adaptive Diagonal Loading for Norm Constrained Beamforming</title>
      <link>https://arxiv.org/abs/2605.04342</link>
      <description>arXiv:2605.04342v1 Announce Type: new 
Abstract: Reliable adaptive beamforming is critical for large microphone arrays operating in highly dynamic acoustic environments. In scenarios characterized by fast-moving talkers and interferers, the available sample support for estimating the spatial correlation matrix is often snapshot-deficient. This deficiency, coupled with array imperfections, degrades the White Noise Gain (WNG), leading to severe target signal cancellation. To ensure stable and robust beamforming, we propose a novel adaptive diagonal loading method that guarantees the WNG remains strictly within specified bounds. By leveraging the Kantorovich inequality, we map the desired WNG to a strict upper bound on the condition number of the correlation matrix. Furthermore, we present three estimation techniques for the adaptive loading level, ranging from trace-based bounding to exact eigenvalue decomposition, offering scalable computational complexities of $\mathcal{O}(M)$, $\mathcal{O}(M^2)$, and $\mathcal{O}(M^3)$. Our approach demonstrates highly stable beamforming under fast-changing interference.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04342v1</guid>
      <category>eess.SY</category>
      <category>cs.IT</category>
      <category>cs.SD</category>
      <category>cs.SY</category>
      <category>math.IT</category>
      <category>stat.AP</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Manan Mittal, Ryan M. Corey, John R. Buck, Andrew C. Singer</dc:creator>
    </item>
    <item>
      <title>Structural Equivalence and Learning Dynamics in Delayed MARL</title>
      <link>https://arxiv.org/abs/2605.04345</link>
      <description>arXiv:2605.04345v1 Announce Type: new 
Abstract: We formally establish the equivalence between Observation Delay (OD) and Action Delay (AD) in cooperative partially observable multi-agent systems using observation-action histories. We show that both systems generate identical admissible joint-policy sets, and their induced state-action-observation trajectories are identical in distribution, leading to identical optimal solutions in Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). This formally generalizes existing infinite-horizon single-agent results to any-horizon partially observable cooperative multi-agent problems with decentralized policy execution, and allows any mixed-delay configuration to be reduced to a pure OD system. We further prove that in Transition-Independent MDPs (TI-MDPs), the observation-action history reduces to a tractable minimal local augmented state.
  However, we show through numerical experiments that although the optimal solution spaces are structurally isomorphic, the practical learning dynamics are fundamentally different. First, using the minimal local augmented state, the equivalence no longer holds when transitions are not independent. Second, operational constraints and causal credit-assignment errors in Temporal Difference (TD) algorithms induce different learning behaviors across regimes. Finally, leveraging this structural equivalence to bypass these learning challenges, we demonstrate successful multi-agent zero-shot policy transfer from OD to AD, paving the way for unified, efficient solution methods in complex delayed systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04345v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jules Sintes, Ana Bušić, Jiamin Zhu</dc:creator>
    </item>
    <item>
      <title>Covariance-Aware Goodness for Scalable Forward-Forward Learning</title>
      <link>https://arxiv.org/abs/2605.04346</link>
      <description>arXiv:2605.04346v1 Announce Type: new 
Abstract: The Forward-Forward (FF) algorithm eliminates global gradient flow and the storage of full network activations. However, in convolutional settings, existing BP-free FF methods significantly underperform backpropagation (BP) on complex benchmarks such as ImageNet-100 and Tiny-ImageNet. We identify this gap as a structural bottleneck in goodness extraction: the standard sum-of-squares formulation collapses feature volumes into channel-wise activation energies, omitting critical second-order dependencies. To address this, we propose a framework centered on three key components. First, Bi-axis Covariance Goodness (BiCovG) explicitly augments the standard goodness function with structured second-order information along two axes: cross-channel projections that model inter-feature covariance, and nested multi-scale aggregation that encodes spatial correlation statistics. This provides a tractable approximation to covariance-aware goodness without the prohibitive O(C^2) complexity of explicit matrix estimation. Second, a lightweight Logistic Fusion module aggregates layer-wise predictions, amplifying the contribution of deeper representations. Third, the Feature Alignment Layer (FAL) introduces a zero-initialized correction at block boundaries to mitigate representation misalignment in deep locally trained networks. Together, these three components double the depth of viable Forward-Forward learning, extending robust layer utilization from shallow baselines to 16-layer architectures such as VGG-16. The resulting BP-free model achieves 73.01% on ImageNet-100 and 50.30% on Tiny-ImageNet. As a practical extension, Hybrid Goodness Blocks control the scope of gradient propagation via configurable block sizes, further narrowing the ImageNet-100 gap to 3.6% and matching BP on Tiny-ImageNet, while still reducing peak memory by approximately 50% relative to BP.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04346v1</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiaoyi Jiang, Bashir M. Al-Hashimi, Kai Xu</dc:creator>
    </item>
    <item>
      <title>Probing Structural Mathematical Reasoning in Language Models with Algebraic Trapdoors</title>
      <link>https://arxiv.org/abs/2605.04352</link>
      <description>arXiv:2605.04352v1 Announce Type: new 
Abstract: We introduce a benchmark suite for evaluating structural mathematical reasoning in language models, built on subgroup-construction problems in SL(3, Z) with cryptographic-style verifier-prover asymmetry. Each instance presents a finitely generated subgroup as a list of integer matrices and asks for an arithmetic invariant -- index, surjection-at-prime, or membership -- that the construction-time information (N, K) pins down in O(1) closed form, but that the solver, lacking that information, must derive by either Aschbacher-classification analysis or by a membership query in SL(3, Z) of unknown decidability. The benchmark therefore distinguishes models with internalized algebraic priors (Aschbacher classes, McLaughlin's theorem, Property (T), the congruence subgroup property) from models that rely on general-purpose computation. We report empirical results across five representative reasoning traces from two state-of-the-art models. The headline result: on the index variant, one model spent 152 minutes of reasoning, explicitly identified the kernel-side membership question as the bottleneck, attempted constructive verification, and abstained with "DON'T KNOW" rather than commit to its computed cokernel candidate -- demonstrating calibrated meta-cognition on the open-decidability boundary that the benchmark was designed to probe. We argue that the benchmark exposes a four-way classification of model behavior (commit-correct, commit-wrong, abstain-correct, abstain-wrong) that standard answer-key scoring conflates.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04352v1</guid>
      <category>cs.LG</category>
      <category>math.GR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Igor Rivin</dc:creator>
    </item>
    <item>
      <title>InterFuserDVS: Event-Enhanced Sensor Fusion for Safe RL-Based Decision Making</title>
      <link>https://arxiv.org/abs/2605.04355</link>
      <description>arXiv:2605.04355v1 Announce Type: new 
Abstract: Autonomous driving systems rely heavily on robust sensor fusion to perceive complex environments. Traditional setups using RGB cameras and LiDAR often struggle in high-dynamic-range scenes or high-speed scenarios due to motion blur and latency. Dynamic Vision Sensors (DVS), or event cameras, offer a paradigm shift by capturing asynchronous brightness changes with microsecond temporal resolution and high dynamic range. In this paper, we propose an extended architecture of the state-of-the-art InterFuser model, integrating DVS as an additional modality to enhance perception reliability. We introduce a novel token-based fusion strategy that incorporates accumulated event frames into the transformer-based backbone of InterFuser. Our method leverages the complementary nature of RGB, LiDAR, and DVS data. We evaluate our approach on the Car Learning to Act (CARLA) Leaderboard benchmarks, demonstrating that the inclusion of DVS improves the robustness of the driving agent, achieving a competitive Driving Score of 77.2 and a superior Route Completion of 100%. The results indicate that event-based vision is a promising direction for improving safety and performance in adverse lighting and dynamic conditions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04355v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mustafa Sakhai, Kaung Sithu, Min Khant Soe Oke, Maciej Wielgosz</dc:creator>
    </item>
    <item>
      <title>Efficiently Aligning Language Models with Online Natural Language Feedback</title>
      <link>https://arxiv.org/abs/2605.04356</link>
      <description>arXiv:2605.04356v1 Announce Type: new 
Abstract: Reinforcement learning with verifiable rewards has been used to elicit impressive performance from language models in many domains. However, broadly beneficial deployments of AI may require us to train models with strong capabilities in "fuzzy", hard-to-supervise domains. In this paper, we develop methods to align language models in fuzzy domains where human experts are still able to provide high-quality supervision signal, but only for a small number of model outputs, using online natural language feedback. Specifically, we train models by iteratively optimizing against proxy reward signals, stopping at the point of over-optimization, collecting fresh expert supervision, and updating the proxy reward. We construct proxy reward models from language models using in-context learning (ICL) and fine-tuning. We test our methods by eliciting creative writing and alignment research capabilities in Qwen3-8B and Haiku 4.5 respectively. For Qwen3-8B, ICL methods recover up to 35% of performance with 50x fewer expert samples, while fine-tuning methods recover 80% with up to 20x fewer samples and 100% with 3x fewer samples. For Haiku 4.5, ICL methods recover up to 35% of performance with 30x fewer samples, and fine-tuning methods recover 100% with 10x fewer samples. Our results suggest that online natural language feedback can substantially improve the data efficiency of expert supervision.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04356v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Christine Ye, Joe Benton</dc:creator>
    </item>
    <item>
      <title>Coral: Cost-Efficient Multi-LLM Serving over Heterogeneous Cloud GPUs</title>
      <link>https://arxiv.org/abs/2605.04357</link>
      <description>arXiv:2605.04357v1 Announce Type: new 
Abstract: The usage of large language models (LLMs) has grown increasingly fragmented, with no single model dominating. Meanwhile, cloud providers offer a wide range of mid-tier and older-generation GPUs that enjoy better availability and deliver comparable performance per dollar to top-tier hardware. To efficiently harness these heterogeneous resources for serving multiple LLMs concurrently, we introduce Coral, an adaptive heterogeneity-aware multi-LLM serving system. The key idea behind Coral is to jointly optimize resource allocation and the serving strategy of each model replica across all models. To keep pace with shifting throughput demand and resource availability, Coral applies a lossless two-stage decomposition that preserves joint optimality while cutting online solve time from hours to tens of seconds. Our evaluation across 6 models and 20 GPU configurations shows that Coral reduces serving cost by up to 2.79$\times$ over the best baseline, and delivers up to 2.39$\times$ higher goodput under scarce resource availability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04357v1</guid>
      <category>cs.DC</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yixuan Mei, Zikun Li, Zixuan Chen, Shiqi Pan, Mengdi Wu, Xupeng Miao, Zhihao Jia, K. V. Rashmi</dc:creator>
    </item>
    <item>
      <title>Intermediate Representations are Strong AI-Generated Image Detectors</title>
      <link>https://arxiv.org/abs/2605.04358</link>
      <description>arXiv:2605.04358v1 Announce Type: new 
Abstract: The rapid advancement of generative AI models has enabled the creation of photorealistic images. At the same time, there are growing concerns about the potential misuse and dangers of generated content, as well as a pressing need for effective AI-generated image detectors. However, current training-based detection techniques are typically computationally costly and generalize poorly to unseen data domains, while training-free methods fall short in detection performance. To bridge this gap, we propose a search-based method that uses the sensitivity of data embeddings in intermediate layers to detect AI-generated images. Given a set of real and AI-generated images, our method examines the similarity between original image embeddings and perturbed image embeddings, and detects AI-generated images based on this similarity. We evaluate the proposed method on two comprehensive benchmarks: GenImage and Forensics Small. Our method exhibits improved performance across different datasets compared to both training-free and training-based state-of-the-art methods. The largest average gain occurs on the Forensics Small benchmark, where our method improves AUROC by 39.61% over the best training-free method and by 5.14% over the best training-based method.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04358v1</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhenhan Huang, Pin-Yu Chen, Tejaswini Pedapati, Jianxi Gao</dc:creator>
    </item>
    <item>
      <title>When Context Hurts: The Crossover Effect of Knowledge Transfer on Multi-Agent Design Exploration</title>
      <link>https://arxiv.org/abs/2605.04361</link>
      <description>arXiv:2605.04361v1 Announce Type: new 
Abstract: The prevailing assumption in agent orchestration is that more context is better. We test this on multi-agent software design across 10 tasks, 7 context-injection conditions, and over 2,700 runs, and find a crossover effect: the same artifact type improves design exploration on some tasks (up to 20$\times$ tradeoff coverage) and actively degrades it on others (up to 46% reduction). On several tasks, an irrelevant document performs as well as or better than every relevant artifact. The direction is predicted by a single measurable variable--baseline exploration without context--with Pearson $r = -0.82$ ($p &lt; 0.001$). Probing the mechanism by manipulating convergence pressure through prompt design reveals two distinct regimes: convergence driven by training data priors (natural) responds to artifact disruption, while convergence driven by explicit instructions (induced) does not. The implication is that context injection should be conditional, not universal: one no-context trial is a cheap diagnostic that predicts whether knowledge artifacts will help or hurt a given task.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04361v1</guid>
      <category>cs.AI</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Saranyan Vigraham</dc:creator>
    </item>
    <item>
      <title>Mitigating Label Shift in Tabular In-Context Learning via Test-Time Posterior Adjustment</title>
      <link>https://arxiv.org/abs/2605.04363</link>
      <description>arXiv:2605.04363v1 Announce Type: new 
Abstract: TabPFN has recently gained attention as a foundation model for tabular datasets, achieving strong performance by leveraging in-context learning on synthetic data. However, we find that TabPFN is vulnerable to label shift, often overfitting to the majority class in the training dataset. To address this limitation, we propose DistPFN, the first test-time posterior adjustment method designed for tabular foundation models. DistPFN rescales predicted class probabilities by downweighting the influence of the training prior (i.e., the class distribution of the context) and emphasizing the contribution of the model's predicted posterior, without architectural modification or additional training. We further introduce DistPFN-T, which incorporates temperature scaling to adaptively control the adjustment strength based on the discrepancy between prior and posterior. We evaluate our methods on over 250 OpenML datasets, demonstrating substantial improvements for various TabPFN-based models in classification tasks under label shift, while maintaining strong performance in standard settings without label shift. Code is available at this repository: https://github.com/seunghan96/DistPFN.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04363v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Seunghan Lee, Jaehoon Lee, Jun Seo, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn</dc:creator>
    </item>
    <item>
      <title>Online Nonstochastic Prediction: Logarithmic Regret via Predictive Online Least Squares</title>
      <link>https://arxiv.org/abs/2605.04364</link>
      <description>arXiv:2605.04364v1 Announce Type: new 
Abstract: We study online prediction for marginally stable, partially observed linear dynamical systems under nonstochastic disturbances. Our objective is to minimize the cumulative squared prediction loss and compete with the best-in-hindsight Luenberger predictor. Standard online learning methods typically rely on bounded domains/gradients, and thus their guarantees may fail to deal with potentially unbounded trajectories in marginally stable systems. In this paper, we introduce an unconstrained online least squares method that stabilizes the learning process via tailored predictive hints. With model knowledge, we prove that hints constructed from any stabilizing Luenberger predictor render the hint residuals uniformly bounded, achieving logarithmic regret despite unbounded trajectory growth. We also discuss model-free prediction and introduce a simple universal hint for symmetric systems, under which logarithmic regret is maintained without model knowledge. Our results provide an adaptive, instance-wise optimal online predictor compared to classical fixed-gain observers under nonstochastic disturbances.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04364v1</guid>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chih-Fan Pai, Yang Zheng</dc:creator>
    </item>
    <item>
      <title>How Do Ice Shelves Calve? Peridynamic Modeling of Ice Shelf Fracture Driven by Wave Erosion, Basal Melting, and Buoyancy Flexure</title>
      <link>https://arxiv.org/abs/2605.04365</link>
      <description>arXiv:2605.04365v1 Announce Type: new 
Abstract: An ice shelf is a floating extension of a land-based ice sheet into the ocean. It plays a crucial role in slowing the flow of land ice into the sea, thus stabilizing the ice sheet. However, this stabilizing effect can be weakened by ice calving, a process in which large fragments of ice detach from the ice shelf. Although ice calving is widely acknowledged as a major contributor to ice mass loss, and its frequency and magnitude are highly sensitive to environmental forcing, the underlying physics-based mechanisms remain poorly understood, particularly under ocean wave action. In this context, we develop a nonlocal peridynamics (PD) framework to model the ice calving process subjected to wave-induced frontal erosion. The proposed physics-based PD framework enables investigation of the coupled effects of self-weight bending, buoyancy-induced foot loosening, and the ice calving process. To the best of the authors' knowledge, this work represents the first attempt to employ a physics-based peridynamics framework for simulating ice calving processes. Compared with conventional finite element methods (FEM), the PD framework naturally captures crack initiation, interaction, and propagation without the need for special numerical treatments, thereby providing a robust tool for simulating fracture phenomena under large deformations and long-term environmental loading. To quantitatively resolve fracture processes, we implement a static first Piola-Kirchhoff virial stress formulation within the PD framework, allowing direct evaluation of stress concentration and energy release at evolving crack tips. The model is then rigorously validated through one-to-one comparisons with finite-element stress fields, analytical beam-theory solutions, and recent field observations of wave-driven ice-shelf failure reported by Sartore et al. (2025).</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04365v1</guid>
      <category>cs.CE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ying Song, Xuan Hu, Jingrui Xu, Keming Zhu, Yuan Zhang, Wenjun Lu, Shaofan Li</dc:creator>
    </item>
    <item>
      <title>Conditional Flow-VAE for Safety-Critical Traffic Scenario Generation</title>
      <link>https://arxiv.org/abs/2605.04366</link>
      <description>arXiv:2605.04366v1 Announce Type: new 
Abstract: Safety-critical scenarios are essential for the development of autonomous vehicles (AVs) but are rare in real-world driving data. While simulation offers a way to generate such scenarios, manually designed test cases lack scalability, and adversarial optimization often produces unrealistic behaviors. In this work, we introduce a conditional latent flow matching approach for scalable and realistic safety-critical scenario generation. Our method uses distribution matching to transform nominal scenes into safety-critical rollouts. Furthermore, we demonstrate that incorporating both simulation and real-world data enables our framework to efficiently generate diverse, data-driven scenarios. Experimental results highlight that our approach is able to more consistently and realistically generate novel safety-critical scenarios, making it a valuable tool for training and benchmarking AV systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04366v1</guid>
      <category>cs.RO</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zimu Gong, Brian Zhaoning Zhang, Chris Zhang, Kelvin Wong, Raquel Urtasun</dc:creator>
    </item>
    <item>
      <title>Extending Differential Temporal Difference Methods for Episodic Problems</title>
      <link>https://arxiv.org/abs/2605.04368</link>
      <description>arXiv:2605.04368v1 Announce Type: new 
Abstract: Differential temporal difference (TD) methods are value-based reinforcement learning algorithms that have been proposed for infinite-horizon problems. They rely on reward centering, where each reward is centered by the average reward. This keeps the return bounded and removes a value function's state-independent offset. However, reward centering can alter the optimal policy in episodic problems, limiting its applicability. Motivated by recent works that emphasize the role of normalization in streaming deep reinforcement learning, we study reward centering in episodic problems and propose a generalization of differential TD. We prove that this generalization maintains the ordering of policies in the presence of termination, and thus extends differential TD to episodic problems. We show equivalence with a form of linear TD, thereby inheriting theoretical guarantees that have been shown for those algorithms. We then extend several streaming reinforcement learning algorithms to their differential counterparts. Across a range of base algorithms and environments, we empirically validate that reward centering can improve sample efficiency in episodic problems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04368v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Kris De Asis, Mohamed Elsayed, Jiamin He</dc:creator>
    </item>
    <item>
      <title>Reddit's Globalization over Twenty Years: Inferring Community Time Zone from Activity Timestamps</title>
      <link>https://arxiv.org/abs/2605.04371</link>
      <description>arXiv:2605.04371v1 Announce Type: new 
Abstract: Online communities are a global phenomenon, but assessing their actual geographical spread requires accurate and scalable measurement. We propose and evaluate methods that infer the time zone of online communities solely from their temporal activity patterns, requiring nothing beyond hourly activity counts. Grounding our approach in the well-established finding that posting rhythms encode circadian structure, we compare time-domain and frequency-domain methods against a parsimonious heuristic: that activity reaches its minimum around 4 a.m. local time. On Reddit, we show that the best-performing method is accurate to a sub-30-minute resolution, and that fewer than a thousand comments are sufficient to reach peak performance. Similarly, our heuristic almost matches the accuracy of more complex methods, recovering the correct time zone within a one-hour margin on average. This simple method correlates significantly with the actual distribution of Reddit's geographical spread; we validate its generalizability across communities organized around diverse cultural phenomena, from sports to finance, and apply it at scale to characterize the geographic evolution of Reddit from its founding to the present. Our method is portable across platforms and requires no user disclosure, making it a practical baseline for any study that must account for the geographic structure of online behavior.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04371v1</guid>
      <category>cs.SI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Franco Della Negra, Mattia Samory, Matteo Cinelli</dc:creator>
    </item>
    <item>
      <title>Worst-Case Discovery and Runtime Protection for RL-Based Network Controllers</title>
      <link>https://arxiv.org/abs/2605.04373</link>
      <description>arXiv:2605.04373v1 Announce Type: new 
Abstract: RL-based controllers achieve strong average-case performance in networking tasks such as congestion control and adaptive bitrate streaming. Yet their performance can degrade severely under network conditions where strong performance is still achievable. Identifying such conditions and quantifying the resulting performance gap is intractable by enumeration, while the sequential and closed-loop nature of RL controllers makes formal verification methods impractical.
  We present ReGuard, a framework that discovers worst-case scenarios for a given RL controller and protects it against them at inference time without retraining. Discovery is formulated as a bilevel regret-maximization problem, which yields a certified lower bound on the worst-case performance gap. The discovered trajectories are then analyzed as counterfactuals and compiled into lightweight logic rules that intervene only when a risky state is detected, leaving the controller's behavior unchanged otherwise.
  We evaluate ReGuard across three RL-based network controllers: Pensieve, Sage, and Park. ReGuard discovers scenarios in which the controller's performance is 43$-$64% worse than what is achievable. ReGuard not only discovers gaps 57% to 6$\times$ larger than those found by the strongest baselines but also shrinks them by 79$-$85% via lightweight rule-based protection while preserving nominal performance. ReGuard's protection extends beyond the scenarios it discovers, improving performance across a wider range of network conditions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04373v1</guid>
      <category>cs.NI</category>
      <category>cs.AI</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hongyu Hè, Minhao Jin, Maria Apostolaki</dc:creator>
    </item>
    <item>
      <title>$p$-adic Manifold Learning and Benchmark Tasks from Impartial Games</title>
      <link>https://arxiv.org/abs/2605.04374</link>
      <description>arXiv:2605.04374v1 Announce Type: new 
Abstract: We introduce $p$-adic manifold learning, propose an algorithm to solve it, and propose benchmark tasks from impartial games.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04374v1</guid>
      <category>cs.LG</category>
      <category>math.NT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tomoki Mihara</dc:creator>
    </item>
    <item>
      <title>Experiment-as-Code Labs: A Declarative Stack for AI-Driven Scientific Discovery</title>
      <link>https://arxiv.org/abs/2605.04375</link>
      <description>arXiv:2605.04375v1 Announce Type: new 
Abstract: To unleash the full potential of AI for Science, we must untether agents from a purely digital environment. The agent's ability to control and explore in real-world labs is essential because the physical lab remains foundational to scientific discovery. While some tasks can be performed on a computer (e.g., data analysis, running simulated experiments), Eureka moments could occur at any time while operating lab instruments (e.g., when a scientist notices unexpected clues, intuition may prompt a real-time course change). Although autonomous labs, which expose programmable APIs to control scientific instruments via software, are on the rise, bridging the gap between increasingly powerful AI agents and automated lab equipment requires innovation that draws insights from computer systems.
  We propose a new paradigm called "Experiment-as-Code (EaC) Labs," where a core concept is to encode experiments as declarative configurations that can be compiled down to device-level APIs. AI agents come up with hypotheses and experiments, written as an ensemble of declarative configurations. The systems layer performs program analysis, safety checks, resource assignment, and job orchestration. Finally, programmatic experimentation occurs via actuating the device APIs. This is a general stack that is science-, lab-, and instrument-independent, representing a novel synthesis across the physical, systems, and intelligence layers to unleash the next breakthrough in AI for Science.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04375v1</guid>
      <category>eess.SY</category>
      <category>cs.AI</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhenning Yang, Yuhan Chen, Patrick Tser Jern Kon, Tongyuan Miao, Hongyi Lin, Venkat Viswanathan, Danai Koutra, Ang Chen</dc:creator>
    </item>
    <item>
      <title>GraphPI: Efficient Protein Inference with Graph Neural Networks</title>
      <link>https://arxiv.org/abs/2605.04376</link>
      <description>arXiv:2605.04376v1 Announce Type: new 
Abstract: The integration of deep learning approaches in biomedical research has been transformative, enabling breakthroughs in various applications. Despite these strides, its application in protein inference is impeded by the scarcity of extensively labeled datasets, a challenge compounded by the high costs and complexities of accurate protein annotation. In this study, we introduce GraphPI, a novel framework that treats protein inference as a node classification problem. We treat proteins as interconnected nodes within a protein-peptide-PSM graph, utilizing a Graph Neural Network-based architecture to elucidate their interrelations. To address label scarcity, we train the model on a set of unlabeled public protein datasets with pseudo-labels derived from an existing protein inference algorithm, enhanced by self-training to iteratively refine labels based on confidence scores. Contrary to prevalent methodologies that require dataset-specific training, our research shows that GraphPI, owing to the well-normalized nature of Percolator features, is universally applicable without dataset-specific fine-tuning, a property that not only mitigates the risk of overfitting but also improves computational efficiency. Our experiments show notable performance on various test datasets and significantly reduced computation times compared to common protein inference algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04376v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1021/acs.jproteome.3c00845</arxiv:DOI>
      <arxiv:journal_reference>Journal of Proteome Research 23.11 (2024): 4821-4834</arxiv:journal_reference>
      <dc:creator>Zheng Ma, Jiazhen Chen, Lei Xin, Ali Ghodsi</dc:creator>
    </item>
    <item>
      <title>Towards Formal Verification of Hybrid Synchronous Programs with Refinement Types</title>
      <link>https://arxiv.org/abs/2605.04377</link>
      <description>arXiv:2605.04377v1 Announce Type: new 
Abstract: Cyber-physical systems (CPS) such as autonomous cars, aircraft, and robots are often also safety-critical; thus it is imperative that they operate as intended with a high degree of certainty. Formal verification has been employed to verify the software controlling these systems, but due to their complexity, is usually performed on an abstract model rather than the executable code. Synchronous programming languages extended with differential equations promise both rigorous modeling and sufficient expressiveness to implement executable controller code, and recent developments have introduced formal verification of strictly discrete-time programs. Extending these verification techniques to hybrid systems enables precise modeling of the environment for a wider variety of programs to be both verified and executed. We formalize the operational semantics of initial value problems and zero-crossing detection expressed in a synchronous programming language, extend its type system for verification thereof, and prove its soundness.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04377v1</guid>
      <category>cs.PL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Serra Z. Dane, Jiawei Chen, Marc Pouzet, Jean-Baptiste Jeannin</dc:creator>
    </item>
    <item>
      <title>Critical Windows of Complexity Control: When Transformers Decide to Reason or Memorize</title>
      <link>https://arxiv.org/abs/2605.04396</link>
      <description>arXiv:2605.04396v1 Announce Type: new 
Abstract: Recent work has shown that Transformers' compositional generalization is governed by \emph{complexity control}, i.e., the choice of initialization scale and weight decay, which steers training toward low-complexity reasoning solutions rather than high-complexity memorization. Existing analyses, however, treat complexity control as a single static hyperparameter choice, leaving open \emph{when} during training this control is actually decisive. We show that the memorization-versus-reasoning fate of a Transformer is determined within a sharp, identifiable window of training. On a controlled compositional task we find that (i)~weight decay applied only during a single window spanning 25\% of training matches full-training weight decay in out-of-distribution (OOD) accuracy ($0.93$ vs $0.91$); (ii)~holding the total regularization budget constant, placing it in the middle of training yields $5{-}9\times$ higher OOD accuracy than placing it early; (iii)~the boundary of the critical window is remarkably sharp: shifting the window onset by as little as $100$ optimization steps causes mean OOD accuracy to jump from chance ($0.15$) to the reasoning regime ($0.61$); (iv)~the window's position depends systematically on initialization scale, but the basin of attraction for reasoning solutions \emph{shrinks} at small initialization, contradicting the prevailing recommendation that smaller initialization is uniformly better. We further show that the critical-window phenomenon is task-specific: it does not appear on grokking with modular arithmetic, where properly tuned constant weight decay matches scheduled weight decay.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04396v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sarwan Ali</dc:creator>
    </item>
    <item>
      <title>Optimize-at-Capture: Highly-adaptive Exposure Controlling for In-Vehicle Non-contact Heart-rate Monitoring</title>
      <link>https://arxiv.org/abs/2605.04397</link>
      <description>arXiv:2605.04397v1 Announce Type: new 
Abstract: Remote photoplethysmography (rPPG) holds great promise for continuous heart-rate monitoring of drivers in intelligent vehicles. However, its performance is severely degraded by highly dynamic illumination changes. A critical yet overlooked factor is the lack of exposure control during video acquisition: most existing systems rely on either fixed exposure settings or the camera's built-in auto-exposure, both of which fail to maintain stable facial brightness under rapidly changing lighting conditions during driving. To address this gap, we propose a highly adaptive exposure control framework that proactively adjusts exposure parameters based on predictive modeling of historical skin reflections. Unlike standard auto-exposure, our method is specifically optimized for rPPG measurement, ensuring that the skin region of interest (ROI) remains within the optimal dynamic range for rPPG signal extraction. As an important contribution of this study, we introduce ExpDrive, a public in-vehicle physiological monitoring dataset comprising synchronized facial video and reference ECG from 48 subjects captured under real driving conditions. Extensive experiments demonstrate that our method consistently outperforms fixed exposure and standard auto-exposure strategies. Specifically, it reduces the Mean Absolute Error (MAE) by 6.31 bpm (from 14.1 to 7.79 bpm) and significantly increases the success rate by 32.3 percentage points (from 24.9% to 57.2%; p &lt; 0.001) across challenging driving scenarios. Notably, it clearly improves the performance of non-contact heart-rate monitoring in both low-light (rainy) and high-glare (sunny) conditions, validating the efficacy of exposure-aware acquisition design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04397v1</guid>
      <category>cs.CV</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jieying Wang, Xinqi Cai, Caifeng Shan, Wenjin Wang</dc:creator>
    </item>
    <item>
      <title>Contextual Memory-Enhanced Source Coding for Low-SNR Communications</title>
      <link>https://arxiv.org/abs/2605.04400</link>
      <description>arXiv:2605.04400v1 Announce Type: new 
Abstract: While Separate Source-Channel Coding (SSCC) retains the practical benefits of modular system design, its effectiveness in noisy text transmission is fundamentally constrained by the fragility of autoregressive source decoding. In low-SNR regimes, even a small number of residual bit errors after channel decoding may derail the subsequent lossless reconstruction process, especially when Arithmetic Coding (AC) relies on Large Language Model (LLM)-based probability estimation. Existing remedies either strengthen channel decoding based solely on channel observations or introduce contextual information only at the receiver for post-hoc correction, yet neither fully addresses the fragility of source probability modeling under residual channel errors. To this end, this paper proposes a Memory-Augmented Source Coding (MASC) scheme for robust SSCC-based transmission. Rather than treating context as external side information, MASC internalizes contextual patterns into a source model shared by both the transmitter-side source encoder and the receiver-side source decoder. Specifically, MASC employs a shared Parameterized Contextual Memory (PCM) to encode multi-order $n$-gram patterns, and further introduces a Mixture-of-Memory-Experts Router (MMER) to perform sparse, hidden-state-dependent routing over memory experts during autoregressive source modeling. By adaptively activating only the most relevant memories at each coding step, MASC refines source probability estimation, shortens average codelength, and mitigates the sensitivity of source decoding to residual channel errors. Extensive experiments over Rayleigh fading and AWGN channels demonstrate the effectiveness of the proposed scheme compared with state-of-the-art methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04400v1</guid>
      <category>cs.IT</category>
      <category>cs.LG</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ziqiong Wang, Rongpeng Li</dc:creator>
    </item>
    <item>
      <title>Detecting Deepfakes via Hamiltonian Dynamics</title>
      <link>https://arxiv.org/abs/2605.04405</link>
      <description>arXiv:2605.04405v1 Announce Type: new 
Abstract: Driven by the rapid development of generative AI models, deepfake detectors are compelled to undergo periodic recalibration to capture newly developed synthetic artifacts. To break this cycle, we propose a new perspective on deepfake detection: moving from static pattern recognition to dynamical stability analysis. Specifically, our approach is motivated by physics-inspired priors: we hypothesize that natural images, as products of dissipative physical processes, tend to settle near stable, low-energy equilibria. In contrast, generative models optimize for statistical similarity to real images but do not explicitly enforce structural constraints such as geometric smoothness, leaving deepfakes more likely to occupy unstable, high-energy states. To operationalize this, we introduce Hamiltonian Action Anomaly Detection (HAAD), comprising three contributions: (i) We model the image latent manifold as a potential energy surface. Under this hypothesis, real images are expected to produce basin-like low-energy responses, whereas fake images are more likely to induce high-potential, high-gradient responses. (ii) We employ Hamiltonian-inspired dynamics as a stability probe. When latent states are released from rest, samples near stable regions remain bounded, while high-gradient samples produce larger trajectory responses. (iii) We quantify these dynamic behaviors through two trajectory statistics, namely Hamiltonian action and energy dissipation. Extensive experiments show that HAAD outperforms the evaluated state-of-the-art baselines on challenging cross-dataset transfer benchmarks, supporting a physics-inspired stability prior for digital forensics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04405v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Harry Cheng, Ming-Hui Liu, Tianyi Wang, Weili Guan, Liqiang Nie, Mohan Kankanhalli</dc:creator>
    </item>
    <item>
      <title>Beyond Rigid Geometries: The Spline-Pullback Metric for Universal Diffeomorphic SPD Representation Learning</title>
      <link>https://arxiv.org/abs/2605.04406</link>
      <description>arXiv:2605.04406v1 Announce Type: new 
Abstract: The integration of Symmetric Positive Definite (SPD) matrices into deep learning has historically relied on fixed algebraic Riemannian metrics. Analogous to hand-crafted features in classical machine learning, these static formulations impose rigid geometries that limit network expressivity and adaptability. Recent attempts to parameterize these geometries often violate the axioms of primary matrix functions through unconstrained powers or rank-dependent scaling, inviting spatial folding, loss of global surjectivity, and gradient collapse at spectral singularities. In this paper, we introduce the Spline-Pullback Metric (SPM), instantiated as Spectral-SPM and Cholesky-SPM, marking a paradigm shift from static metric selection to universal geometric approximation. By parameterizing the global diffeomorphism via a rank-invariant, monotonically constrained B-spline, SPM acts as a dense universal approximator for strictly increasing $C^1$ diffeomorphisms and theoretically subsumes existing pullback metrics while enabling localized non-linear spectral modelling. Topologically, SPM provides a globally bijective pullback geometry that precludes rank-swapping discontinuities and gradient instabilities. Empirically, SPM achieves state-of-the-art performance across three datasets using Linear Probes, SPDNets, and deep Riemannian ResNets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04406v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tushar Das, Subrata Dutta, Sarmistha Neogy, Koushlendra Kumar Singh</dc:creator>
    </item>
    <item>
      <title>Assessing Generalisation Capability of Machine Learning Models for Intrusion Detection</title>
      <link>https://arxiv.org/abs/2605.04407</link>
      <description>arXiv:2605.04407v1 Announce Type: new 
Abstract: The growth of networked and IoT systems has intensified cyber-security threats and exposed the limits of traditional signature-based intrusion detection. Although machine-learning-based intrusion detection systems often report strong benchmark performance, high accuracy within a single dataset does not necessarily guarantee reliable performance in unseen network environments. This study investigates the generalisation capability of supervised machine learning models for intrusion detection using UNSW-NB15 and TON_IoT. Random Forest, Logistic Regression, and Naive Bayes were evaluated under same-dataset and cross-dataset settings. Random Forest achieved the strongest same-dataset performance, with 95.08% accuracy on UNSW-NB15 and 99.79% on TON_IoT, but performance dropped sharply in cross-dataset testing: when trained on UNSW-NB15 and tested on TON_IoT, or vice versa, accuracy fell below 40%. These results reveal a significant generalisation gap in intrusion detection. We connect this challenge to affective computing and human-centric AI, where behavioural signal analysis, anomaly detection, domain shift, and context-sensitive modelling are also central. This framing highlights the need for adaptive, generalisable cyber-security models that can operate across changing network and IoT environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04407v1</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Md Zakir Hossain, Md Ayshik Rahman Khan, Md Rafiqul Islam, Syed Mohammed Shamsul Islam, Tom Gedeon</dc:creator>
    </item>
    <item>
      <title>Autonomous Laparoscope Control through Unified Mechanics-Based Representation of Multimodal Intraoperative Information</title>
      <link>https://arxiv.org/abs/2605.04408</link>
      <description>arXiv:2605.04408v1 Announce Type: new 
Abstract: Laparoscope-holding robots can provide surgeons with a stable laparoscopic field of view (FOV) and reduce the burden on human assistants. To maintain an ideal intraoperative FOV, the robot must continuously adjust the laparoscope pose according to intraoperative information. However, intraoperative multimodal signals, such as position, force/torque, and images, differ markedly in physical meaning and units, making it difficult to build a unified representation and to generate control commands that can be used directly for laparoscope control. To address this issue, we propose a laparoscope-holding robot control method based on unified mechanics modeling of multimodal information. First, we design mapping strategies for multiple intraoperative sources, including position, force/torque, and images, and unify them into an equivalent-wrench representation in the operational space. Then, using a task-priority scheme, we inject the wrenches into the task space and the null space, respectively, and synthesize laparoscope control commands via task-priority projection, thereby achieving consistent representation and coordinated fusion of multimodal information within a single framework. Finally, taking the intraoperative remote center of motion (RCM) position, force/torque sensor readings, and laparoscopic images as examples, we construct an RCM-constraint wrench to enforce the RCM geometric constraint and reduce the contact force at the trocar site, a laparoscope-manipulation wrench to enable compliant dragging, and an instrument-tracking wrench to achieve autonomous visual tracking of the instruments. Experiments on a surgical phantom and in vivo porcine trials demonstrate that the proposed method supports multi-task operation, including compliant laparoscope manipulation and autonomous instrument tracking, while maintaining the RCM constraint and reducing sustained trocar-site loading.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04408v1</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaojian Li, Jin Fang, Yudong Shi, Xilin Xiao, Kai Yan, Kang Min, Ling Li, Hua Tang, Hangjie Mo</dc:creator>
    </item>
    <item>
      <title>UAV as Urban Construction Change Monitor: A New Benchmark and Change Captioning Model</title>
      <link>https://arxiv.org/abs/2605.04409</link>
      <description>arXiv:2605.04409v1 Announce Type: new 
Abstract: Remote Sensing Image Change Captioning (RSICC) aims to generate spatially grounded natural language descriptions of scene evolution from bi-temporal imagery, moving beyond binary change masks toward semantic-level understanding. However, existing methods rely on implicit feature differencing without explicitly modeling structured change semantics, and struggle to reconcile the conflicting representation demands of change detection and caption generation. In addition, current benchmarks provide limited coverage of high-resolution urban construction scenarios. To address these challenges, we propose PTNet, a prototype-guided task-adaptive framework for joint change captioning and detection. PTNet explicitly models structured change semantics through a learnable prototype bank that guides cross-temporal interaction, disentangles task-specific representations via multi-head gating, and injects detection-derived spatial priors into caption generation, enabling coherent semantic correspondence while preserving fine-grained spatial sensitivity. Furthermore, we construct UCCD, a large-scale UAV-based benchmark comprising 9,000 high-resolution image pairs and 45,000 annotated sentences for urban construction monitoring. Extensive experiments on UCCD and WHU-CDC demonstrate that PTNet consistently outperforms existing methods. The dataset and source code are publicly available at https://github.com/G124556/ptnet.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04409v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yupeng Gao, Tianyu Li, Guoqing Wang, Yang Yang</dc:creator>
    </item>
    <item>
      <title>Evaluation Cards for XAI Metrics</title>
      <link>https://arxiv.org/abs/2605.04410</link>
      <description>arXiv:2605.04410v1 Announce Type: new 
Abstract: The evaluation of explainable AI (XAI) methods is affected by a lack of standardization. Metrics are inconsistently defined, incompletely reported, and rarely validated against common baselines. In this paper, we identify transparency of evaluation reporting as a central, under-addressed problem. We propose the XAI Evaluation Card, a documentation template analogous to model cards, designed to accompany any study that introduces an XAI evaluation metric. The card covers explicit declaration of target properties, grounding levels, metric assumptions, validation evidence, gaming risks, and known failure cases. We argue that adopting this template as a community norm would reduce evaluation fragmentation, support meta-analysis, and improve accountability in XAI research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04410v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rokas Gipiškis, Olga Kurasova</dc:creator>
    </item>
    <item>
      <title>Structured 3D Latents Are Surprisingly Powerful: Unleashing Generalizable Style with 2D Diffusion</title>
      <link>https://arxiv.org/abs/2605.04412</link>
      <description>arXiv:2605.04412v1 Announce Type: new 
Abstract: 3D asset generation plays a pivotal role in fields such as gaming and virtual reality, enabling the rapid synthesis of high-fidelity 3D objects from one or more images. Building on this capability, style-controllable generation naturally emerges as an important and desirable direction. However, existing approaches typically rely on style images that lie within or are similar to the training distribution of 3D generation models. When presented with out-of-distribution (OOD) styles, their performance degrades significantly, and they may fail entirely. To address this limitation, we introduce $\textbf{DiLAST}$: 2D Diffusion-based Latent Awakening for 3D Style Transfer. Specifically, we leverage a pretrained 2D diffusion model as a teacher to provide rich and generalizable style priors. By aligning rendered views with the target style under diffusion-based guidance, our method optimizes the structured 3D latent representation for stylization. We observe that this limitation stems not from insufficient model capacity, but from the underutilization of structured 3D latents, which are inherently expressive. Despite being trained on comparatively limited data, 3D generation models can leverage 2D diffusion guidance to steer denoising toward specific directions in latent space, thereby producing diverse OOD styles. Extensive experiments across diverse data and multiple 3D generation backbones demonstrate the effectiveness and plug-and-play nature of our approach.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04412v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yiran Qiao, Yiren Lu, Yunlai Zhou, Disheng Liu, Linlin Hou, Rui Yang, Yu Yin, Jing Ma</dc:creator>
    </item>
    <item>
      <title>Counterfactual identifiability beyond global monotonicity: non-monotone triangular structural causal models</title>
      <link>https://arxiv.org/abs/2605.04413</link>
      <description>arXiv:2605.04413v1 Announce Type: new 
Abstract: Structural causal models provide a unified semantics for interventions and counterfactuals, but most identifiability results rely on restrictive assumptions like global monotonicity, which are often violated in embodied interaction, where the same exogenous perturbation can induce opposite responses under different contact contexts. We ask what structure still suffices once global monotonicity is dropped. We introduce non-monotone triangular structural causal models (NM-TM-SCM), which retain triangular recursion but replace global monotonicity with mechanism-wise invertibility and context-independent inverse transport. We prove that these conditions are equivalent to exogenous isomorphism and imply complete counterfactual identifiability, and we give a counterexample showing that local invertibility alone is insufficient. We instantiate the theory in CausalInverter, with triangular invertible layers, orientation gates, and transport-stability regularization. On synthetic non-monotonic mechanisms, the structural bias yields systematic counterfactual gains as non-monotonicity increases. On MuJoCo Door, our model achieves perfect event-level counterfactual recovery, lowers continuous angle error relative to a Transformer baseline, and delivers substantially more stable recovery than Transformer and conditional-flow predictors. On MuJoCo Push, where non-monotonicity is weaker, the same low-data predictors remain competitive or better, consistent with a bias-variance boundary. These results identify a broader identifiable regime between globally monotone triangular models and unconstrained black-box world models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04413v1</guid>
      <category>cs.LG</category>
      <category>stat.ME</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pengcheng Tan, Jiang Chen, Dehui Du</dc:creator>
    </item>
    <item>
      <title>Demystifying Manifold Constraints in LLM Pre-training</title>
      <link>https://arxiv.org/abs/2605.04418</link>
      <description>arXiv:2605.04418v1 Announce Type: new 
Abstract: The empirical success of large language model (LLM) pre-training relies heavily on heuristic stabilization techniques, such as explicit normalization layers and weight decay. While recent constrained optimization approaches that explicitly restrict weights may improve numerical stability and performance, the mechanism and motivation for adding constraints remain elusive. This paper systematically demystifies the role of explicit manifold constraints in LLM pre-training. By introducing the Msign-Aligned Constrained Riemannian Optimizer (MACRO), a provably convergent, single-loop optimization framework, our study disentangles weight regularization heuristics from interacting mechanisms like RMS normalization and decoupled weight decay. Theoretical analyses and comprehensive empirical evaluations reveal that manifold constraints independently bound forward activation scales and enforce stable rotational equilibrium, thereby subsuming the roles of these heuristic mechanisms. Evaluations on large-scale LLM architectures demonstrate that MACRO achieves highly competitive performance while rigorously preserving the theoretical guarantees of exact Riemannian optimization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04418v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kang An, Jiaxiang Li, Donald Goldfarb, Shiqian Ma</dc:creator>
    </item>
    <item>
      <title>FLUID: Continuous-Time Hyperconnected Sparse Transformer for Sink-Free Learning</title>
      <link>https://arxiv.org/abs/2605.04421</link>
      <description>arXiv:2605.04421v1 Announce Type: new 
Abstract: Continuous-time (CT) Transformers improve irregular and long-range modeling over CT-RNNs by exploiting input or output embeddings with continuous dynamics. However, the core scaled-dot-product-attention (SDPA) mechanism remains inherently discrete. We propose FLUID (Flexible Unified Information Dynamics), a CT Transformer that incorporates continuous dynamics directly into the attention computation by replacing SDPA with a Liquid Attention Network (LAN). LAN reinterprets attention logits as a continuous dynamical system and reformulates them as the solution to a linear ODE modulated by input-dependent nonlinear recurrent gates. Theoretically, we establish stability guarantees for LAN dynamics and show that it serves as an interpolating middle ground between SDPA and CT-RNNs, recovering each as a special case under well-defined parameterizations of its gating functions. LAN also introduces an explicit attention-sink gate to eliminate disproportionate attention mass on uninformative nodes. FLUID replaces standard residual connections with input-dependent Liquid Hyper-Connections to adaptively regulate interlayer information flow. Empirically, we evaluate FLUID on a broad set of learning tasks, including (i) irregular time series, (ii) long-range modeling, (iii) lane-keeping control of autonomous vehicles, and (iv) learning physical dynamics under a scarce data regime. Across all the tasks, FLUID consistently matches or outperforms CT baselines, achieving improvements of up to 47% in certain scenarios and enhancing generalization under distributional shifts. Additionally, FLUID demonstrates superior noise robustness and a self-correcting inductive bias in autonomous vehicle control. We also provide a detailed analysis of key hyperparameters to guide tuning and show that FLUID occupies an intermediate position among competing approaches in terms of runtime and memory efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04421v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Waleed Razzaq, Yun-Bo Zhao</dc:creator>
    </item>
    <item>
      <title>Joint Semantic Token Selection and Prompt Optimization for Interpretable Prompt Learning</title>
      <link>https://arxiv.org/abs/2605.04425</link>
      <description>arXiv:2605.04425v1 Announce Type: new 
Abstract: Vision-language models such as CLIP achieve strong visual-textual alignment, but often suffer from overfitting and limited interpretability when adapted through continuous prompt learning. While discrete prompt optimization improves interpretability, it usually depends on large external models, leading to high computational costs and limited scalability. In this paper, we propose Interpretable Prompt Learning (IPL), a hybrid framework that alternates between discrete semantic token selection and continuous prompt optimization. Specifically, IPL formulates semantic token selection as an approximate submodular optimization problem, encouraging tokens that are both human-understandable and semantically diverse. It further adopts an alternating optimization strategy to integrate discrete token selection with continuous prompt tuning, improving interpretability while preserving adaptability to downstream tasks. Our framework is plug-and-play, allowing seamless integration with existing prompt learning methods. Extensive experiments on multiple benchmarks show that IPL consistently improves both interpretability and accuracy across five representative prompt learning methods, providing an effective and scalable extension to existing frameworks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04425v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Yating Wang, Yaqi Zhao, Yongshun Gong, Yilong Yin, Haoliang Sun</dc:creator>
    </item>
    <item>
      <title>Telegraph English: Semantic Prompt Compression via Structured Symbolic Rewriting</title>
      <link>https://arxiv.org/abs/2605.04426</link>
      <description>arXiv:2605.04426v1 Announce Type: new 
Abstract: We introduce Telegraph English (TE), a prompt-compression protocol that rewrites natural language into a symbol-rich, formally structured dialect. Where token-deletion methods such as LLMLingua-2 train a classifier to delete low-importance tokens at a fixed ratio, TE performs a full semantic rewrite: it decomposes the input into atomic fact lines, substitutes verbose phrases with about 40 logical and relational symbols, and lets the compression ratio adapt to each document's information density. A consequence of the line-structure rule is that compression and semantic chunking become the same operation: each output line is an independently addressable fact, so the compressed representation is simultaneously a semantic index. We evaluate TE on 4,081 question-answer pairs from LongBench-v2 across five OpenAI models and two difficulty levels. At roughly 50% token reduction, TE preserves 99.1% accuracy on key facts with GPT-4.1 and outperforms LLMLingua-2 at matched compression ratios on every model and task tested. The gap widens on smaller models (up to 11 percentage points on fine-detail tasks), suggesting that explicit relational structure compensates for limited model capacity. We release the grammar specification, compression prompt, benchmark data, and reference implementation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04426v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mikhail L. Arbuzov, Sisong Bei, Ziwei Dong, Dmitri Kalaev, Alexey A. Shvets</dc:creator>
    </item>
    <item>
      <title>Structure-Preserving and Pressure-Robust PINNs for Incompressible Oseen Problems</title>
      <link>https://arxiv.org/abs/2605.04427</link>
      <description>arXiv:2605.04427v1 Announce Type: new 
Abstract: We develop a new class of physics-informed neural network approximations for the stationary Oseen equations based on stability-consistent loss constructions. In contrast to standard PINN formulations, which are typically heuristic, the proposed consistent PINN (CPINN) framework is systematically derived from the stability structure of the continuous problem. Within this setting, we introduce two fundamentally new approaches. First, we design standard CPINN formulations that exhibit clear improvements over conventional PINNs. Second, we propose pressure-robust CPINN formulations that provably eliminate the influence of gradient forces on the velocity approximation, yielding velocity errors that depend solely on the divergence-free component of the forcing and are independent of the pressure. The framework accommodates both exactly divergence-free architectures and unconstrained velocity approximations, providing a unified treatment of these two paradigms. Using techniques from optimal recovery theory, we establish, for the first time in the PINN setting for Oseen-type problems, quantitative recovery estimates and optimal error bounds for both velocity and pressure under suitable Besov regularity assumptions. In particular, we obtain optimal rates for the velocity in $\boldsymbol{H}^1(\Omega)$ and for the pressure in $L^2(\Omega)$. The proposed methodology introduces a pressure-robust CPINN paradigm for incompressible flows, combining structural consistency, robustness with respect to irrotational forces, and rigorous accuracy guarantees. Numerical experiments corroborate the theoretical findings and demonstrate the effectiveness of the approach.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04427v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shiv Mishra, Arbaz Khan</dc:creator>
    </item>
    <item>
      <title>Submodular Ground-Set Pruning: Monotone Tightness and a Non-Monotone Separation</title>
      <link>https://arxiv.org/abs/2605.04428</link>
      <description>arXiv:2605.04428v1 Announce Type: new 
Abstract: Large-scale subset selection asks for a small, useful set of examples, features, sensors, seed users, or context passages from an enormous ground set. Submodular maximization is a canonical model for such diminishing-returns problems, but rapidly growing datasets make even linear-time algorithms ever costlier. We study containment pruning: first reduce the ground set to a smaller core $P$, then require that $P$ contain a near-optimal feasible solution for every downstream budget up to $k$. Prior work has formulated many heuristics, but the theoretical limits of this preprocessing problem are largely unknown. For monotone submodular objectives, we prove that $1-1/e$ is tight: greedy achieves this containment factor, and no algorithm can beat it even with a larger pruning budget. For non-monotone objectives, we give the first $1/2-\varepsilon$ containment algorithms under cardinality constraints and extend the approach to knapsack constraints. This $1/2$ factor exceeds the best known algorithmic ratio and the known hardness threshold for non-monotone maximization, showing that pruning can be provably easier than optimization. Empirically, pruning lets an exact IP solver run on the reduced MaxCut instance with a ${\approx}620\times$ speedup, and proof-of-concept experiments on LLM context selection demonstrate the utility of non-monotone submodular proxies and our proposed containment algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04428v1</guid>
      <category>cs.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alan Kuhnle</dc:creator>
    </item>
    <item>
      <title>Towards Robust LLM Post-Training: Automatic Failure Management for Reinforcement Fine-Tuning</title>
      <link>https://arxiv.org/abs/2605.04431</link>
      <description>arXiv:2605.04431v1 Announce Type: new 
Abstract: Reinforcement fine-tuning (RFT) has become a core paradigm for post-training large language models, yet its training process remains highly fragile. Existing efforts mainly improve reliability at the system level or address specific issues in individual subproblems by modifying RFT algorithms. Despite their effectiveness, they largely overlook the problem of failure management at the training-process level. When training goes wrong, practitioners still rely heavily on expert-driven manual inspection and correction, and automatic failure management for RFT remains largely unexplored. In this paper, we take a first step toward systematic failure management for reinforcement fine-tuning. To understand the empirical structure of RFT failures, we first construct RFT-FaultBench, the first benchmark for fine-grained failures in reinforcement fine-tuning, covering 5 fault families, 16 fault types, 779 training runs, 22,549 train-step records, and 1,457,288 trajectory-level records. Based on this benchmark, we conduct a comprehensive empirical study showing that RFT failures are both observable from training dynamics and distinguishable through their empirical fault fingerprints. Building on these findings, we propose RFT-FM, an automatic failure management framework for reinforcement fine-tuning that unifies anomaly detection, failure diagnosis, and automatic remediation in a closed loop. Experimental results show that RFT-FaultBench is neither trivial nor saturated: it exhibits clear anomaly structure while still posing substantial challenges, especially under subtle fault settings. Moreover, RFT-FM shows strong capability in detecting, diagnosing, and mitigating RFT failures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04431v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lingzhe Zhang, Tong Jia, Yunpeng Zhai, Liancheng Fang, Kening Zheng, Hongyi Liu, Xiaosong Huang, Philip S. Yu, Ying Li</dc:creator>
    </item>
    <item>
      <title>Ground4D: Spatially-Grounded Feedforward 4D Reconstruction for Unstructured Off-Road Scenes</title>
      <link>https://arxiv.org/abs/2605.04435</link>
      <description>arXiv:2605.04435v1 Announce Type: new 
Abstract: Feedforward Gaussian Splatting has recently emerged as an efficient paradigm for 4D reconstruction in autonomous driving. However, in unstructured off-road scenes, its performance degrades due to high-frequency geometry, ego-motion jitter, and increased non-rigid dynamics. These factors introduce conflicting Gaussian observations across timestamps, leading to either over-smoothed renderings or structural artifacts. To address this issue, we propose Ground4D, a spatially-grounded feedforward 4D framework for pose-free off-road reconstruction. The key idea is to resolve temporal conflicts through spatially localized conditioning. Specifically, we introduce voxel-grounded temporal Gaussian aggregation, which partitions the canonical Gaussian space into spatial voxels and performs query-conditioned temporal attention within each voxel. Intra-voxel softmax normalization ensures that temporal selectivity and spatial occupancy become mutually reinforcing rather than conflicting. We further introduce surface normal cues as auxiliary geometric guidance to regularize the geometry of Gaussian primitives. Extensive experiments on ORAD-3D and RELLIS-3D demonstrate that Ground4D consistently outperforms existing feedforward methods in reconstruction quality and generalizes zero-shot to unseen off-road domains. Project page and code: https://github.com/wsnbws/Ground4D.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04435v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shuo Wang, Jilin Mei, Fuyang Liu, Wenfei Guan, Fanjie Kong, Zhihua Zhao, Shuai Wang, Chen Min, Yu Hu</dc:creator>
    </item>
    <item>
      <title>Joint Optimization of Trajectory Control, Resource Allocation, and Task Offloading for Multi-UAV-Assisted IoV</title>
      <link>https://arxiv.org/abs/2605.04436</link>
      <description>arXiv:2605.04436v1 Announce Type: new 
Abstract: This paper investigates a task offloading system for the Internet of Vehicles (IoV) in dense urban environments, assisted jointly by multiple Unmanned Aerial Vehicles (UAVs) and a base station. To minimize system delay and energy consumption under strict coupling constraints, we decouple the complex non-convex optimization problem into a hierarchical execution framework. First, a sequential distributed optimization algorithm based on Second-Order Cone Programming (SOCP) is proposed to optimize the 3D flight trajectory of each UAV, ensuring adaptive network coverage. Second, a novel hybrid resource scheduling paradigm combining Deep Reinforcement Learning (DRL) and Large Language Models (LLMs) is developed. Within this framework, the DRL agent determines the initial resource allocation, while the LLM acts as a semantic macro-scheduler that corrects long-tail allocation imbalances for failed and surplus tasks. Crucially, a reward decoupling mechanism is introduced to isolate DRL training from external LLM interventions, thereby ensuring policy convergence. Finally, the task offloading ratios are determined via Linear Programming (LP) within an alternating optimization loop. Simulation results demonstrate that the proposed method significantly outperforms traditional multi-agent reinforcement learning baselines in terms of task success rate and system efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04436v1</guid>
      <category>cs.NI</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Maoxin Ji, Qiong Wu, Pingyi Fan, Cui Zhang, Nan Cheng, Wen Chen, Khaled B. Letaief</dc:creator>
    </item>
    <item>
      <title>A cross-modal network for facial expression recognition</title>
      <link>https://arxiv.org/abs/2605.04439</link>
      <description>arXiv:2605.04439v1 Announce Type: new 
Abstract: Deep neural networks enriched with structural information have been widely employed for facial expression recognition. However, these methods often rely on hierarchical information rather than intrinsic facial properties to recognize expressions. In this paper, we propose a cross-modal network with strong biological and structural information for facial expression recognition (CMNet). Exploiting facial symmetry, CMNet learns expression information separately from the whole face and from the left and right half-faces to extract complementary facial features. To prevent negative effects from fusing biological and structural information, a salient facial information refinement module extracts salient facial expression information to improve the stability of the resulting facial expression classifier. To reduce reliance on unilateral facial features, a half-face alignment optimization mechanism is designed to align the expression information learned from the left and right half-faces. Our experimental results demonstrate that CMNet outperforms several recent methods for facial expression recognition, including SCN and LAENet-SA. Code is available at https://github.com/hellloxiaotian/CMNet.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04439v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:DOI>10.1109/TIP.2026.3688163</arxiv:DOI>
      <dc:creator>Chunwei Tian, Jingyuan Xie, Qi Zhang, Chao Li, Wangmeng Zuo, Shichao Zhang</dc:creator>
    </item>
    <item>
      <title>LEGO: LoRA-Enabled Generator-Oriented Framework for Synthetic Image Detection</title>
      <link>https://arxiv.org/abs/2605.04445</link>
      <description>arXiv:2605.04445v1 Announce Type: new 
Abstract: The rapid advancement of generative technologies has made synthetic images nearly indistinguishable from real ones, creating an urgent need for robust detectors to counter misinformation. However, existing methods mainly rely on universal artifact features that are shared across multiple generators. We observe that as the diversity of generators increases, the overlap of these common features gradually decreases, which severely undermines model generalization. In contrast, focusing only on unique artifacts tends to cause overfitting to specific forgery patterns. To address this challenge, we propose LEGO (LoRA-Enabled Generator-Oriented Framework). LEGO employs an MLP to modulate multiple LoRA (Low-Rank Adaptation) blocks, each pretrained to capture the unique artifacts of a specific generator, followed by attention-based feature fusion. Unlike conventional methods that seek a single universal solution, LEGO delegates unique artifact extraction to specialized LoRA modules and divides its training procedure into two stages: each LoRA module is first trained individually on a single-generator dataset to learn generator-specific representations, and then the MLP and attention layers are trained on mixed datasets to dynamically regulate the contribution of each module. Benefiting from this modular design, LEGO can be naturally extended with new LoRA modules to adapt to datasets from newly emerging generators, while still achieving substantially better performance than prior SOTA methods with fewer than 30,000 training images (less than 10% of their training data) and only 5 epochs in each training stage.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04445v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yutong Xiao, Ran Ran, Jiwei Wei, Shuchang Zhou, Ke Liu, Zheng Ziqiang, Caiyan Qin</dc:creator>
    </item>
    <item>
      <title>Misrouter: Exploiting Routing Mechanisms for Input-Only Attacks on Mixture-of-Experts LLMs</title>
      <link>https://arxiv.org/abs/2605.04446</link>
      <description>arXiv:2605.04446v1 Announce Type: new 
Abstract: Mixture-of-Experts (MoE) architectures have emerged as a leading paradigm for scaling large language models through sparse, routing-based computation. However, this design introduces a new attack surface: the routing mechanism that determines which experts process each input. Prior work shows that manipulating routing can bypass safety alignment, but existing attacks require model modification and thus apply only to locally deployed models. By contrast, real-world LLM services are remotely hosted and accessible only through input queries. This raises a fundamental question: can MoE routing be exploited through input-only attacks to induce stronger unsafe behaviors in real-world services? Our key insight is to optimize attacks in a white-box setting on open-source surrogate MoE models and transfer the resulting adversarial inputs to public API services within the same model family. This setting presents three main challenges: routing can be influenced only indirectly through input perturbations, routing control and output generation are tightly coupled, and even a successful safety bypass may still produce low-quality responses. To address these challenges, we propose Misrouter, an input-only attack framework that jointly targets routing behavior and expert functionality. Misrouter identifies weakly aligned experts that are willing to produce target harmful content by analyzing expert activations under harmful queries paired with unsafe continuations. It then optimizes adversarial inputs to steer routing toward these experts and away from strongly aligned ones. It further biases routing toward highly capable general-purpose experts identified from benign question-answering tasks. Finally, because routing and output objectives can conflict, Misrouter uses a two-phase optimization strategy that first steers routing and then optimizes harmful outputs while preserving routing stability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04446v1</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zekun Fei, Zihao Wang, Weijie Liu, Ruiqi He, Jianing Geng, Zheli Liu, XiaoFeng Wang</dc:creator>
    </item>
    <item>
      <title>Deep Reprogramming Distillation for Medical Foundation Models</title>
      <link>https://arxiv.org/abs/2605.04447</link>
      <description>arXiv:2605.04447v1 Announce Type: new 
Abstract: Medical foundation models pre-trained on large-scale datasets have shown powerful and versatile performance. However, adapting them to specific medical scenarios remains challenging because of the discrepancy between pre-training and downstream tasks and because of real-world computation and speed constraints. Existing techniques that might address this challenge each suffer from intrinsic limitations. For example, knowledge distillation (KD) assumes that teacher and student models share the same task, training strategy, and model structure family, while prevalent parameter-efficient fine-tuning (PEFT) fails to achieve personalized and lightweight deployment. Even the combination of PEFT and KD still struggles to resolve inconsistencies in model structure and training strategy between teacher and student models, leading to inefficient knowledge transfer. In this study, we propose a novel framework called Deep Reprogramming Distillation (DRD) to address this general adaptation challenge. Specifically, DRD introduces a novel reprogramming module that, on the one hand, overcomes the domain and task discrepancy between pre-training and downstream scenarios and, on the other hand, enables student-friendly, efficient distillation from foundation models to lightweight downstream models. Furthermore, to mitigate variability under different training conditions, we design a centered kernel alignment (CKA) distillation method to promote robust knowledge transfer. Empirical results show that DRD surpasses previous PEFT and KD methods across 18 medical downstream tasks under different foundation models, covering various scenarios including 2D/3D classification and 2D/3D segmentation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04447v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Siyuan Du, Yuhang Zhou, Haolin Li, Jiangchao Yao, Haishuai Wang, Hui Lin, Ya Zhang, Yanfeng Wang</dc:creator>
    </item>
    <item>
      <title>Queue-Aware and Resilient Routing in LEO Satellite Networks Using Multi-Agent Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2605.04448</link>
      <description>arXiv:2605.04448v1 Announce Type: new 
Abstract: The rapid growth in data demand and the stringent latency requirements of modern applications have driven significant interest in Low Earth Orbit (LEO) satellite constellations as an emerging solution for global Internet coverage. However, routing in LEO networks remains a fundamental challenge due to highly dynamic topologies, time-varying traffic conditions, and susceptibility to link failures. Conventional routing algorithms typically assume static link metrics and fail to account for queue backlogs or real-time system variations, making them less effective in such environments. We propose a queue-aware multi-agent deep reinforcement learning (MA-DRL) framework for routing in LEO satellite networks. Each satellite is modeled as an independent agent responsible for making local routing decisions, enabling a distributed and scalable solution. The proposed framework formulates a latency-aware optimization problem that incorporates background traffic, queue dynamics at each satellite, and a resilience score to improve robustness. We evaluate the proposed approach against the state-action-reward-state-action (SARSA) and Dijkstra algorithms. While Dijkstra achieves the lowest end-to-end latency under ideal conditions, its computational and signaling overhead becomes a significant bottleneck as the network scales. In contrast, our approach incurs significantly lower overhead (approximately 50% of Dijkstra's at a 5 s recalculation interval), scales efficiently with network size, and effectively manages queue backlogs and resilience under increasing traffic load, demonstrating enhanced robustness and scalability while maintaining competitive latency and resilience scores.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04448v1</guid>
      <category>cs.NI</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mudassar Liaq, Mahyar Tajeri, Peng Hu</dc:creator>
    </item>
    <item>
      <title>GEM: Graph-Enhanced Mixture-of-Experts with ReAct Agents for Dialogue State Tracking</title>
      <link>https://arxiv.org/abs/2605.04449</link>
      <description>arXiv:2605.04449v1 Announce Type: new 
Abstract: Dialogue State Tracking (DST) requires precise extraction of structured information from multi-domain conversations, a task where Large Language Models (LLMs) struggle despite their impressive general capabilities. We present GEM (Graph-Enhanced Mixture-of-Experts), a novel framework that combines language models and graph-structured dialogue understanding with ReAct agent-based reasoning to improve DST performance. Our approach dynamically routes between specialized experts: a Graph Neural Network that captures dialogue structure and turn-level dependencies, and a fine-tuned T5-Small encoder-decoder for sequence modeling, coordinated by an intelligent router. For complex value generation tasks, we integrate ReAct agents that perform structured reasoning over dialogue context. On MultiWOZ 2.2, GEM achieves 65.19% Joint Goal Accuracy, substantially outperforming end-to-end LLM approaches (best: 38.43%) and surpassing state-of-the-art (SOTA) methods including TOATOD (63.79%), D3ST (58.70%), and Diable (56.48%). Our graph-enhanced mixture-of-experts architecture with ReAct integration demonstrates that combining structured dialogue representation with dynamic expert routing and agent-based reasoning provides a powerful paradigm for dialogue state tracking, achieving superior accuracy while maintaining computational efficiency through selective expert activation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04449v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ziqi Zhu, Adithya Suresh, Tomal Deb, Iman Abbasnejad</dc:creator>
    </item>
    <item>
      <title>One Pool, Two Caches: Adaptive HBM Partitioning for Accelerating Generative Recommender Serving</title>
      <link>https://arxiv.org/abs/2605.04450</link>
      <description>arXiv:2605.04450v1 Announce Type: new 
Abstract: Generative Recommender (GR) inference places embedding hot caches (EMB) and KV caches in direct competition for limited GPU HBM: allocating more memory to one improves its efficiency but degrades the other. Existing systems optimize them in isolation, overlooking that the optimal EMB-KV allocation ratio can shift by up to 0.35 across workload regimes, which leaves a 20-30\% latency improvement unrealized. While online reallocation is required to close this gap, naive approaches introduce host-to-device (H2D) refill traffic on the critical path, causing P99 SLO violations.
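The allocation trade-off above can be illustrated with a toy model. This is a minimal sketch and not HELM. It makes three hypothetical assumptions: each cache's HBM hit rate saturates exponentially in the capacity it receives, every miss costs a fixed penalty, and the best EMB share is found by brute-force grid search. All function names, parameters, and numbers are illustrative.

```python
import math

# Toy cost model (hypothetical, not HELM's): the HBM hit rate of a cache
# saturates exponentially with the capacity assigned to it.
def hit_rate(capacity_gb: float, working_set_gb: float) -> float:
    return 1.0 - math.exp(-capacity_gb / working_set_gb)

def latency_ms(ratio: float, hbm_gb: float, emb_ws_gb: float, kv_ws_gb: float,
               emb_miss_ms: float, kv_miss_ms: float) -> float:
    """Expected miss penalty per request when `ratio` of HBM goes to EMB."""
    emb_miss = 1.0 - hit_rate(ratio * hbm_gb, emb_ws_gb)
    kv_miss = 1.0 - hit_rate((1.0 - ratio) * hbm_gb, kv_ws_gb)
    return emb_miss * emb_miss_ms + kv_miss * kv_miss_ms

def best_ratio(hbm_gb, emb_ws_gb, kv_ws_gb, emb_miss_ms, kv_miss_ms, steps=100):
    """Offline grid search for the EMB share that minimizes the toy latency."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda r: latency_ms(r, hbm_gb, emb_ws_gb, kv_ws_gb,
                                              emb_miss_ms, kv_miss_ms))

# Same memory and working sets; only the relative cost of a KV miss changes.
balanced = best_ratio(40, 20, 20, emb_miss_ms=1.0, kv_miss_ms=1.0)
kv_heavy = best_ratio(40, 20, 20, emb_miss_ms=1.0, kv_miss_ms=4.0)
print(balanced, kv_heavy)
```

With these made-up numbers, quadrupling the KV miss penalty moves the best EMB share from 0.5 to about 0.15. This shows in miniature why a single static ratio cannot suit every workload regime. HELM instead adapts the ratio online with a PPO-based controller, which this grid search does not attempt to model.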
  To address this, we present HELM, which jointly manages HBM allocation and request routing at runtime through two key components: (1) Adaptive Memory Allocation, a three-layer PPO-based controller (frozen base policy, online residual adapter, and burst-aware recovery controller) that achieves $32\,\mathrm{\mu s}$ decision latency while staying within 0.024-0.029 of the offline-optimal ratio; and (2) EMB-KV-Aware Scheduling, which routes requests by jointly considering KV residency, embedding locality, and node load to avoid routing inefficiencies under heterogeneous allocations. Evaluations on three production-scale datasets over a 32-node A100 cluster show that HELM reduces P99 latency by 24-38\% over the best static policy and achieves 93.5-99.6\% SLO satisfaction across Steady, Trend, and Burst workloads, significantly outperforming state-of-the-art baselines without sacrificing throughput.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04450v1</guid>
      <category>cs.DC</category>
      <category>cs.IR</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wenjun Yu, Shuguang Han, Amelie Chi Zhou</dc:creator>
    </item>
    <item>
      <title>RemoteZero: Geospatial Reasoning with Zero Human Annotations</title>
      <link>https://arxiv.org/abs/2605.04451</link>
      <description>arXiv:2605.04451v1 Announce Type: new 
Abstract: Geospatial reasoning requires models to resolve complex spatial semantics and user intent into precise target locations for Earth observation. Recent progress has liberated the reasoning path from manual curation, allowing models to generate their own inference chains. Yet a final dependency remains: they are still supervised by human-annotated ground-truth coordinates. This leaves the reasoning process autonomous, but not its spatial endpoint, and prevents true self-evolution on abundant unlabeled remote sensing data. To break this bottleneck, we introduce RemoteZero, a box-supervision-free framework for geospatial reasoning. RemoteZero is motivated by a simple asymmetry: an MLLM is typically better at verifying whether a region satisfies a query than at directly generating precise coordinates. Leveraging this stronger discriminative ability, RemoteZero replaces geometric supervision with intrinsic semantic verification and enables GRPO training without box annotations. The resulting framework further supports iterative self-evolution, allowing the model to improve from unlabeled remote sensing imagery through its own verification signal. Experiments show that RemoteZero achieves competitive performance against strong supervised methods, demonstrating the potential of self-verifying training for geospatial reasoning and localization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04451v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Liang Yao, Fan Liu, Shengxiang Xu, Chuanyi Zhang, Rui Min, Shimin Di, Yuhui Zheng</dc:creator>
    </item>
    <item>
      <title>Beyond Ability: The Four-Fold Spectrum of Power and the Logic of Full Inability</title>
      <link>https://arxiv.org/abs/2605.04452</link>
      <description>arXiv:2605.04452v1 Announce Type: new 
Abstract: Coalition Logic studies what coalitions can enforce. Recent work treats inability as simple non-ability, $\neg[C]\varphi$. This conflates two distinct configurations: a coalition unable to force $\varphi$ may still force $\neg\varphi$, retaining adversarial control rather than exhibiting genuine inability. We introduce Full Inability (FI): the symmetric condition in which a coalition can enforce neither a proposition nor its negation.
  Combining coalitional effectivity with propositional negation yields a four-fold spectrum: Full Control (FC), Positive Determination (PD), Adverse Determination (AD), and Full Inability (FI). These categories partition a coalition's strategic status exhaustively and exclusively. We establish their algebraic and order-theoretic structure. Under $\alpha$-duality, propositional negation and coalition complementation generate a Klein four-group symmetry. In playable models, the four power regions are order-convex in the powerset lattice, yielding interval-stable verification of inability.
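The four categories can be made concrete with a minimal sketch. It assumes a finite one-shot game form in which each player picks one action and a proposition is any predicate on the resulting action profile. This is a simplification, not the paper's semantics over Coalition Logic models, and the function names can_force and power_status are hypothetical. can_force checks alpha-effectivity: some joint choice of the coalition guarantees the proposition, whatever the other players do. power_status then reads off FC, PD, AD, or FI from the two effectivity bits for a proposition and its negation.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Profile = Tuple[int, ...]  # one action per player, in `players` order

def can_force(coalition: Sequence[str], players: Sequence[str],
              actions: Dict[str, List[int]], prop: Callable[[Profile], bool]) -> bool:
    """Alpha-effectivity in a one-shot game form (illustrative assumption):
    the coalition has a joint choice that makes `prop` true against every
    joint choice of the remaining players."""
    others = [p for p in players if p not in coalition]

    def outcome(mine: Tuple[int, ...], theirs: Tuple[int, ...]) -> Profile:
        choice = {**dict(zip(coalition, mine)), **dict(zip(others, theirs))}
        return tuple(choice[p] for p in players)

    return any(
        all(prop(outcome(mine, theirs))
            for theirs in product(*(actions[p] for p in others)))
        for mine in product(*(actions[p] for p in coalition))
    )

def power_status(coalition, players, actions, prop) -> str:
    """Classify the coalition's power over `prop` into the four-fold spectrum."""
    pos = can_force(coalition, players, actions, prop)
    neg = can_force(coalition, players, actions, lambda o: not prop(o))
    return {(True, True): "FC", (True, False): "PD",
            (False, True): "AD", (False, False): "FI"}[(pos, neg)]

players = ["a", "b"]
actions = {"a": [0, 1], "b": [0, 1]}
xor = lambda o: (o[0] + o[1]) % 2 == 1  # parity of the two actions
print(power_status(["a"], players, actions, xor))       # FI: b can always flip the parity
print(power_status(["a", "b"], players, actions, xor))  # FC: together they decide it
```

In this toy setting, replacing a proposition by its negation swaps PD and AD and leaves FC and FI fixed. For example, "some player plays 1" is PD for player a, while its negation is AD for a. This is the negation half of the symmetry the abstract describes.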
  We axiomatize CLFI, a definitional extension of Coalition Logic that treats Full Inability as a primitive modality. Via elimination translation, we prove soundness, completeness, and conservativity over Coalition Logic. The extension preserves expressive power and complexity (PSPACE-complete) but provides direct proof-theoretic access to symmetric inability, strategic dependence, propositional dummyhood, and containment verification.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04452v1</guid>
      <category>cs.LO</category>
      <category>math.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shanxia Wang</dc:creator>
    </item>
    <item>
      <title>StableI2I: Spotting Unintended Changes in Image-to-Image Transition</title>
      <link>https://arxiv.org/abs/2605.04453</link>
      <description>arXiv:2605.04453v1 Announce Type: new 
Abstract: In most real-world image-to-image (I2I) scenarios, existing evaluations primarily focus on instruction following and the perceptual quality or aesthetics of the generated images. However, they largely fail to assess whether the output image preserves the semantic correspondence and spatial structure of the input image. To address this limitation, we propose StableI2I, a unified and dynamic evaluation framework that explicitly measures content fidelity and pre--post consistency, without requiring reference images, across a wide range of I2I tasks, including image editing and image restoration. In addition, we construct StableI2I-Bench, a benchmark designed to systematically evaluate the accuracy of MLLMs on such fidelity and consistency assessment tasks. Extensive experimental results demonstrate that StableI2I provides accurate, fine-grained, and interpretable evaluations of content fidelity and consistency, with strong correlations to human subjective judgments. Our framework serves as a practical and reliable evaluation tool for diagnosing content consistency and benchmarking model performance in real-world I2I systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04453v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Jiayang Li, Shuo Cao, Xiaohui Li, Zhizhen Zhang, Kaiwen Zhu, Yule Duan, Yu Qiao, Jian Zhang, Yihao Liu</dc:creator>
    </item>
    <item>
      <title>Deployment-Relevant Alignment Cannot Be Inferred from Model-Level Evaluation Alone</title>
      <link>https://arxiv.org/abs/2605.04454</link>
      <description>arXiv:2605.04454v1 Announce Type: new 
Abstract: Alignment evaluation in machine learning has largely become evaluation of models. Influential benchmarks score model outputs under fixed inputs, such as truthfulness, instruction following, or pairwise preference, and these scores are often used to support claims about deployed alignment. This paper argues that deployment-relevant alignment cannot be inferred from model-level evaluation alone. Alignment claims should instead be indexed to the level at which evidence is collected: model-level, response-level, interaction-level, or deployment-level. Two studies support this position. First, a structured audit of eleven alignment benchmarks, extended to a sixteen-benchmark corpus, dual-coded against an eight-dimension rubric with Cohen's kappa = 0.87, finds that user-facing verification support is absent across every benchmark examined, while process steerability is nearly absent. The few interactional benchmarks identified, including tau-bench, CURATe, Rifts, and Common Ground, remain fragmented in coverage, and benchmark construction rather than data source determines what is measured. Second, a blinded cross-model stress test using 180 transcripts across three frontier models and four scaffolds finds that the same verification scaffold raises one model's verification support to ceiling while leaving another categorically unchanged. This shows that scaffold efficacy is model-dependent and that the gap identified by the audit cannot be closed at the model level alone. We propose a system-level evaluation agenda: alignment profiles instead of single scores, fixed-scaffolding protocols for comparable interactional evaluation, and reporting templates that make the inferential distance between evaluation evidence and deployment claims explicit.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04454v1</guid>
      <category>cs.AI</category>
      <category>cs.HC</category>
      <category>cs.LG</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Varad Vishwarupe, Nigel Shadbolt, Marina Jirotka, Ivan Flechais</dc:creator>
    </item>
    <item>
      <title>Long-time $L^2$&amp;$H^1$-stability of the Family of DLN Methods for the Two-dimensional Incompressible Navier-Stokes Equations</title>
      <link>https://arxiv.org/abs/2605.04455</link>
      <description>arXiv:2605.04455v1 Announce Type: new 
Abstract: In this report, we study the long-time stability of the family of one-leg DLN methods for the two-dimensional incompressible Navier-Stokes equations. The one-parameter (parameter $\theta$) family of DLN methods, which is nonlinearly energy stable ($G$-stable) and second-order accurate on arbitrary time grids, has been widely and successfully applied to simulations of various fluid models. We derive a new version of the $G$-stability identity for the family of DLN methods under uniform time grids and mild time-step constraints. We then use this crucial auxiliary tool, together with the discrete uniform Gr\"onwall inequality, to prove uniform-in-time stability of the numerical solutions. In particular, the bounds are independent of the time interval and the initial conditions, consistent with the theory for the continuous case.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04455v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Isabel Barrio Sanchez, Wenlong Pei, Catalin Trenchea</dc:creator>
    </item>
    <item>
      <title>DoGMaTiQ: Automated Generation of Question-and-Answer Nuggets for Report Evaluation</title>
      <link>https://arxiv.org/abs/2605.04458</link>
      <description>arXiv:2605.04458v1 Announce Type: new 
Abstract: Evaluation of long-form, citation-backed reports has lately received significant attention due to the wide-scale adoption of retrieval-augmented generation (RAG) systems. Core to many evaluation frameworks is the use of atomic facts, or nuggets, to assess a report's coverage of query-relevant information attested in the underlying collection. While nuggets have traditionally been represented as short statements, recent work has used question-answer (QA) representations, enabling fine-grained evaluations that decouple the information need (i.e., the question) from the potentially diverse content that satisfies it (i.e., its answers).
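The decoupling of an information need from the diverse answers that satisfy it can be sketched as follows. The sketch assumes a deliberately naive matcher: a report covers a nugget if any acceptable answer string appears in it. QANugget and nugget_recall are hypothetical names, and this is neither DoGMaTiQ's nor AutoArgue's judging procedure.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class QANugget:
    """Hypothetical QA nugget: one information need, several acceptable answers."""
    question: str
    answers: Tuple[str, ...]

def nugget_recall(report: str, nuggets: List[QANugget]) -> float:
    """Fraction of nuggets whose question is answered by the report.

    A nugget counts as covered if ANY of its answers appears in the report
    (naive case-insensitive substring match, a stand-in for a real judge)."""
    if not nuggets:
        return 0.0
    text = report.lower()
    covered = sum(any(a.lower() in text for a in n.answers) for n in nuggets)
    return covered / len(nuggets)

nuggets = [
    QANugget("When was the treaty signed?", ("1997",)),
    QANugget("Who signed the treaty?", ("both governments", "the two governments")),
]
print(nugget_recall("The treaty was signed in 1997 by the two governments.", nuggets))
print(nugget_recall("The treaty was signed in 1997.", nuggets))
```

Because the question is stored separately from its answers, one nugget accepts differently worded answers, so reports that phrase the same fact differently can earn the same credit.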
  A persistent challenge for nugget-based evaluation is the need to manually curate sets of nuggets for each topic in a test collection, a laborious process that scales poorly to novel information needs. This challenge is acute in cross-lingual settings, where information is found in multilingual source documents. Accordingly, we introduce DoGMaTiQ, a pipeline for generating high-quality QA-based nugget sets in three stages: (1) document-grounded nugget generation, (2) paraphrase clustering, and (3) nugget subselection based on principled quality criteria. We integrate DoGMaTiQ nuggets with AutoArgue, a recent nugget-based evaluation framework, to enable fully automatic evaluation of generated reports. We conduct extensive experiments on two cross-lingual TREC shared tasks, NeuCLIR and RAGTIME, showing strong rank correlations with both human-in-the-loop and fully manual judgments. Finally, detailed analysis of our pipeline reveals that a strong LLM nugget generator is key, and that the system rankings induced by DoGMaTiQ are robust to outlier systems. We facilitate future research in report evaluation by publicly releasing our code and artifacts at https://github.com/manestay/dogmatiq.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04458v1</guid>
      <category>cs.CL</category>
      <category>cs.IR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bryan Li, William Walden, Yu Hou, Gabrielle Kaili-May Liu, Dawn Lawrie, Jame Mayfield, Eugene Yang, Chris Callison-Burch, Laura Dietz</dc:creator>
    </item>
    <item>
      <title>Discovering Sparse Counterfactual Factors via Latent Adjustment for Survey-based Community Intervention</title>
      <link>https://arxiv.org/abs/2605.04460</link>
      <description>arXiv:2605.04460v1 Announce Type: new 
Abstract: Transportation surveys are widely used to understand travel preferences and adoption barriers, yet most survey-based analyses remain descriptive or predictive and rarely provide sparse, policy-feasible intervention strategies. We study sparse counterfactual community intervention from survey responses, where the goal is to shift a target respondent group toward a desired reference group through controllable survey-variable adjustments. We formulate this task as a policy-feasible distributional alignment problem using a fixed-basis nonnegative latent representation that preserves pre/post comparability and provides a stable map from latent factors to original variables. To make latent movement actionable, target-relevant latent factors are identified through Shapley-guided attribution and transferred to controllable variables as intervention priorities. Feasible group-level adjustments are then learned by minimizing an entropy-regularized optimal-transport discrepancy between the post-intervention target distribution and the reference distribution, together with a weighted $\ell_{2,1}$ penalty that promotes shared policy-lever sparsity. Experiments on real-world transportation survey datasets show that the proposed framework produces compact and interpretable policy-feasible interventions with explicit adjustment magnitudes, improves population-level conversion, and preserves intervention sparsity. Code and datasets are publicly available at: https://github.com/pangjunbiao/latent-group-alignment.git</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04460v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Fatima Ashraf, Muhammad Ayub Sabir, Junbiao Pang, Yufang Zhou, Yan Shang</dc:creator>
    </item>
    <item>
      <title>Stream-T1: Test-Time Scaling for Streaming Video Generation</title>
      <link>https://arxiv.org/abs/2605.04461</link>
      <description>arXiv:2605.04461v1 Announce Type: new 
Abstract: While Test-Time Scaling (TTS) offers a promising way to enhance video generation without the surging costs of training, current test-time video generation methods based on diffusion models suffer from exorbitant candidate-exploration costs and lack temporal guidance. To address these structural bottlenecks, we propose shifting the focus to streaming video generation. We observe that its chunk-level synthesis and small number of denoising steps are intrinsically suited to TTS, significantly lowering computational overhead while enabling fine-grained temporal control. Building on this insight, we introduce Stream-T1, a comprehensive TTS framework tailored specifically to streaming video generation. Stream-T1 is composed of three units: (1) Stream-Scaled Noise Propagation, which refines the initial latent noise of the chunk being generated using proven, high-quality noise from previous chunks, establishing temporal dependency and using the historical Gaussian prior to guide the current generation; (2) Stream-Scaled Reward Pruning, which evaluates generated candidates to balance local spatial aesthetics against global temporal coherence by combining immediate short-term assessments with sliding-window-based long-term evaluations; (3) Stream-Scaled Memory Sinking, which dynamically routes context evicted from the KV-cache into distinct update pathways guided by reward feedback, so that previously generated visual information anchors and guides the subsequent video stream. Evaluated on comprehensive 5s and 30s video benchmarks, Stream-T1 significantly improves temporal consistency, motion smoothness, and frame-level visual quality.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04461v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Yijing Tu, Shaojin Wu, Mengqi Huang, Wenchuan Wang, Yuxin Wang, Chunxiao Liu, Zhendong Mao</dc:creator>
    </item>
    <item>
      <title>Robust Inverse Quadratic Error Decay with Meshing and Beam Search for Random Subset Sum</title>
      <link>https://arxiv.org/abs/2605.04465</link>
      <description>arXiv:2605.04465v1 Announce Type: new 
Abstract: The Subset Sum Problem is a fundamental NP-complete problem in cryptography and combinatorial optimization, with many real-world applications. The Random Subset Sum Problem (RSSP) is a more applicable version of subset sum in which numbers are drawn i.i.d. from some input distribution. We present an algorithm that, with probability $1-\delta$, constructs the same $O(B/w)$ mesh as Da Cunha et al. (2023) while trimming to $w$ elements throughout and running in $O(w\log w)$ time. We then present a novel beam search heuristic, running in linearithmic time with respect to the list size $n$ and beam width $w$, that uses the mesh and gives an expected error of $O\!\left(\frac{B}{nw^2}\right)$ under a standard mean-field assumption with equal standard deviation, demonstrating the practical effectiveness of meshing for achieving error decay. The algorithm is empirically robust to multiple input distributions and extends naturally to variants through simple changes to the scoring heuristic, establishing a new practical baseline for robust subset sum error decay and $\epsilon$-approximation theory.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04465v1</guid>
      <category>cs.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Edwin Chen, Christof Teuscher</dc:creator>
    </item>
    <item>
      <title>KEET: Explaining Performance of GPU Kernels Using LLM Agents</title>
      <link>https://arxiv.org/abs/2605.04467</link>
      <description>arXiv:2605.04467v1 Announce Type: new 
Abstract: Performance profiles of GPU kernels generated by tools such as Nsight Compute are rich in detail but are often challenging to interpret. To achieve the best performance possible on a given GPU architecture, kernel developers need to spend significant time analyzing and comparing profiles in the tool's graphical interface to identify and understand kernel performance bottlenecks. Large Language Models (LLMs) have shown promise in understanding complex data and generating natural language explanations. In this paper, we propose the Kernel Execution Explanation Toolkit (KEET), an LLM-based agentic framework for interpreting Nsight Compute profiles to generate useful and data-grounded natural language explanations of performance issues in GPU kernels, and suggestions for optimizations. We evaluate KEET using several CUDA kernels of varying complexity on NVIDIA H100 GPUs. We find that the generated explanations, when provided as context, improve the quality of LLM code optimization and multiple-choice question answering in downstream tasks. We further demonstrate that the tool can be used to interpret performance data from large sets of profiles to improve the quality of optimization suggestions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04467v1</guid>
      <category>cs.PF</category>
      <category>cs.DC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Joshua H. Davis, Klaudiusz Rydzy, Srinivasan Ramesh, Aadit Nilay, Daniel Nichols, Swapna Raj, Nikhil Jain, Abhinav Bhatele</dc:creator>
    </item>
    <item>
      <title>Stabilizing LLM Supervised Fine-Tuning via Explicit Distributional Control</title>
      <link>https://arxiv.org/abs/2605.04468</link>
      <description>arXiv:2605.04468v1 Announce Type: new 
Abstract: Post-training large language models (LLMs) often suffers from catastrophic forgetting, where improvements on a target objective degrade previously acquired capabilities. Recent evidence suggests that this phenomenon is primarily driven by excessive distributional drift during optimization. Motivated by this perspective, we propose Anchored Learning, a simple framework that explicitly controls distributional updates during offline fine-tuning via a dynamically evolving moving anchor. Instead of matching a fixed reference distribution, the anchor interpolates between the current model and a frozen reference to construct an intermediate target that the model distills toward, transforming global fine-tuning into a sequence of local trust-region updates in distribution space. Theoretically, we prove this anchor-based update admits a linear KL-divergence upper bound per iteration, ensuring a stable transition between model distributions. Extensive experiments on iGSM, MedCalc, and IFEval show that Anchored Learning consistently lies on the Pareto frontier of gain-stability trade-offs, achieving near-optimal performance improvements while substantially reducing degradation compared to strong baselines. For example, while standard SFT suffers from over 53% performance degradation on iGSM and MedCalc, Anchored Learning slashes this drop to under 5% while maintaining near-optimal gains (e.g., 75.2% on iGSM).</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04468v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xinyu Wang, Changzhi Sun, Yuanbin Wu, Xiaoling Wang</dc:creator>
    </item>
    <item>
      <title>CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies</title>
      <link>https://arxiv.org/abs/2605.04470</link>
      <description>arXiv:2605.04470v1 Announce Type: new 
Abstract: Open-loop imitation learning has advanced modern autonomous driving policy architectures, but closed-loop deployment remains vulnerable to policy-induced distribution shift. Existing post-training paradigms exhibit fundamental trade-offs: closed-loop RL fine-tuning provides grounded feedback from executed actions but is constrained by the sparsity of informative events, whereas counterfactual fine-tuning provides dense supervision over candidate futures but inherits bias from imperfect future estimates. We introduce Counterfactual-to-Interactive Reinforcement Fine-Tuning (CRAFT), an on-policy framework that formulates closed-loop post-training as proxy-residual optimization. CRAFT uses group-normalized counterfactual advantages as a dense proxy for real closed-loop advantages and aligns this proxy with the closed-loop world through grounded residual correction from interaction-critical events. To stabilize adaptation, CRAFT regularizes the online policy toward an EMA teacher via asymmetric KL self-distillation. Theoretically, CRAFT decomposes the real closed-loop policy gradient into proxy and residual terms under the same visited-state distribution, reducing residual variance with an aligned proxy while mitigating proxy bias through grounded residual approximation. Empirically, CRAFT achieves the strongest closed-loop gains on Bench2Drive across hierarchical planning, vision-language-action, and vocabulary-scoring architectures. Ablations, scaling behavior, stability analyses, and transfer results further validate the complementary roles of dense counterfactual proxy and grounded residual correction. Project page: https://currychen77.github.io/CRAFT.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04470v1</guid>
      <category>cs.LG</category>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Keyu Chen, Nanfei Ye, Yida Wang, Wenchao Sun, Danqi Zhao, Hao Cheng, Sifa Zheng</dc:creator>
    </item>
    <item>
      <title>Order Flow Exclusivity and Value Extraction Mechanisms: An Analysis of Ethereum Builder Centralization</title>
      <link>https://arxiv.org/abs/2605.04471</link>
      <description>arXiv:2605.04471v1 Announce Type: new 
Abstract: This study investigates the rapid centralization of the Ethereum builder market under the Proposer-Builder Separation (PBS) architecture. We argue that existing research, by focusing predominantly on influential order flows, lacks a comprehensive evaluation of order flow behavioral patterns and economic purposes. To address this gap, we analyze Ethereum transactions from September 2023 to August 2025 to characterize Exclusive Order Flows (EOFs) and non-atomic Maximal Extractable Value (MEV), the missing components corresponding to these behavioral and economic dimensions, respectively. We introduce a novel exclusivity metric based on Kullback-Leibler divergence and employ supervised learning to identify 75 EOFs and 322 non-atomic MEV flows, which account for 71% and 23% of trading-related builder revenue, respectively. A longitudinal analysis of builder strategies across these dimensions delineates the market's evolution into four distinct eras, revealing that while EOFs were instrumental in establishing early dominance, incumbents have since decoupled market share from immediate EOF dependency by leveraging entrenched network effects. Ultimately, we conclude that builder centralization is an emergent property of the PBS framework itself, as the architecture systematically violates the fundamental prerequisites of a competitive market.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04471v1</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ao Zhang (Tsinghua University, Beijing, China), Yunwen Liu (KU Leuven, Leuven, Belgium), Ren Zhang (Cryptape and Nervos, China), Yingdi Shan (Tsinghua University, Beijing, China), Yongwei Wu (Tsinghua University, Beijing, China)</dc:creator>
    </item>
    <item>
      <title>Automated Formal Proofs of Combinatorial Identities via Wilf-Zeilberger Guidance and LLMs</title>
      <link>https://arxiv.org/abs/2605.04472</link>
      <description>arXiv:2605.04472v1 Announce Type: new 
Abstract: Automating formal proofs of combinatorial identities is challenging for LLM-based provers, as long-horizon proof planning is required and unconstrained search quickly explodes. Symbolic methods such as the Wilf-Zeilberger (WZ) method can achieve a mechanized proof of combinatorial identities by constructing special auxiliary functions and demonstrating that they satisfy specific recurrence relations. We propose WZ-LLM, a neuro-symbolic framework that turns WZ proof plans into executable proof sketches in Lean 4 and uses an LLM-based prover to discharge the resulting machine-checkable subgoals. We also train a dedicated WZ-Prover via a Lean-kernel-verified bootstrapping loop with expert-verified iteration, followed by DAPO-based refinement. Experiments show that WZ-LLM achieves a 34% proof success rate on LCI-Test (100 classic combinatorial identities), outperforming strong baselines such as DeepSeek-V3 and Goedel-Prover-V2, and delivering consistent gains on CombiBench and PutnamBench-Comb. These results indicate that our framework provides two complementary strengths: improved direct proving for identities beyond the scope of WZ, and substantially higher end-to-end success when WZ sketches guide a specialized prover.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04472v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Beibei Xiong, Hangyu Lv, Junqi Liu, Yisen Wang, Shaoshi Chen, Jianlin Wang, Zhengfeng Yang, Lihong Zhi</dc:creator>
    </item>
    <item>
      <title>Geometry-Aware Neural Optimizer for Shape Optimization and Inversion</title>
      <link>https://arxiv.org/abs/2605.04474</link>
      <description>arXiv:2605.04474v1 Announce Type: new 
Abstract: Geometry is central to PDE-governed systems, motivating shape optimization and inversion. Classical pipelines conduct costly forward simulation with geometry processing, requiring substantial expert effort. Neural surrogates accelerate forward analysis but do not close the loop because gradients from objectives to geometry are often unavailable. Existing differentiable methods either rely on restrictive parameterizations or unstable latent optimization driven by scalar objectives, limiting interpretability and part-wise control. To address these challenges, we propose Geometry-Aware Neural Optimizer (GANO), an end-to-end differentiable framework that unifies geometry representation, field-level prediction, and automated optimization/inversion in a single latent-space loop. GANO encodes shapes with an auto-decoder and stabilizes latent updates via a denoising mechanism, and a geometry-injected surrogate provides a reliable gradient pathway for geometry updates. Moreover, GANO supports part-wise control through null-space projection and uses remeshing-free projection to accelerate geometry processing. We further prove that denoising induces an implicit Jacobian regularization that reduces decoder sensitivity, yielding controlled deformations. Experiments on three benchmarks spanning 2D Helmholtz, 2D airfoil, and 3D vehicles show state-of-the-art accuracy and stable, controllable updates, achieving up to +55.9% lift-to-drag improvement for airfoils and ~7% drag reduction for vehicles.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04474v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Guoze Sun, Tianya Miao, Haoyang Huang, Huaguan Chen, Han Wan, Rui Zhang, Hao Sun</dc:creator>
    </item>
    <item>
      <title>Information Coordination as a Bridge: A Neuro-Symbolic Architecture for Reliable Autonomous Driving Scene Understanding</title>
      <link>https://arxiv.org/abs/2605.04475</link>
      <description>arXiv:2605.04475v1 Announce Type: new 
Abstract: Reliable autonomous driving requires scene understanding that is semantically consistent across heterogeneous sensors and verifiable at the reasoning stage. However, many recent LLM-driven driving systems attach the language model as a post-processor and force it to reason over redundant or conflicting perception outputs, which can amplify hallucinated entities and unsafe conclusions. This paper proposes InfoCoordiBridge, a BEV-centric neuro-symbolic architecture that inserts an explicit coordination bridge between perception and language reasoning. InfoCoordiBridge comprises (i) a unified multi-agent perception layer that outputs typed structured facts together with modality-focused synopses, (ii) an ICA module that aligns and fuses multi-source outputs into a single SceneSummary, and (iii) an SSRE module that performs SceneSummary-grounded reasoning with verification. Experiments on nuScenes and Waymo show that ICA preserves competitive 3D detection accuracy while substantially improving fusion consistency, reducing redundancy to below 1% and achieving about 98% attribute agreement. On NuScenes-QA and a template-aligned Waymo-QA benchmark, SSRE improves factual grounding and reduces hallucinated entity mentions compared with representative VLM and agentic baselines. Overall, by coordinating multi-sensor outputs into a single conflict-aware SceneSummary before prompting, InfoCoordiBridge prevents redundant and cross-modally inconsistent perception evidence from propagating into high-level reasoning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04475v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shuo Liu, Lei Shi, Haowen Liu, Jing Xu, Yufei Gao, Yucheng Shi</dc:creator>
    </item>
    <item>
      <title>Data-dependent Exploration for Online Reinforcement Learning from Human Feedback</title>
      <link>https://arxiv.org/abs/2605.04477</link>
      <description>arXiv:2605.04477v1 Announce Type: new 
Abstract: Online reinforcement learning from human feedback (RLHF) has emerged as a promising paradigm for aligning large language models (LLMs) by continuously collecting new preference feedback during training. A foundational challenge in this setting is exploration, which requires algorithms that enable the LLMs to generate informative comparisons that improve sample-efficiency in online RLHF. Existing exploration strategies often derive bonuses via on-policy expectations, which are difficult to estimate reliably from the limited historical preference data available during training; as a result, the policy can prematurely down-weight under-explored regions that may contain high-value behaviors. In this paper, we propose data-dependent exploration for preference optimization (DEPO), a simple and scalable method that leverages historical data to construct an extra uncertainty bonus for high-uncertainty regions, encouraging exploration toward potentially high-value data. Theoretically, we provide a data-dependent regret bound for the proposed algorithm, showing that it adapts to the hardness of the learning task itself and can be tighter than worst-case bounds in practice. Empirically, the proposed method consistently outperforms strong baselines across benchmarks, demonstrating improved sample efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04477v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhen-Yu Zhang, Yuting Tang, Jiandong Zhang, Lanjihong Ma, Masashi Sugiyama</dc:creator>
    </item>
    <item>
      <title>CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training</title>
      <link>https://arxiv.org/abs/2605.04478</link>
      <description>arXiv:2605.04478v1 Announce Type: new 
Abstract: As training scales grow, collective communication libraries (CCLs) increasingly face anomalies arising from complex interactions among hardware, software, and environmental factors. These anomalies typically manifest as slow/hang communication, the most frequent and time-consuming category to diagnose. However, traditional diagnostic methods remain inaccurate and inefficient, frequently requiring hours or even days for root cause analysis. To address this, we propose CCL-D, a high-precision diagnostic system designed to detect and locate slow/hang anomalies in large-scale distributed training. CCL-D integrates a rank-level real-time probe with an intelligent decision analyzer. The probe measures cross-layer anomaly metrics using a lightweight distributed tracing framework to monitor communication traffic. The analyzer performs automated anomaly detection and root-cause localization, precisely identifying the faulty GPU rank. Deployed on a 4,000-GPU cluster over one year, CCL-D achieved near-complete coverage of known slow/hang anomalies and pinpointed affected ranks within 6 minutes, substantially outperforming existing solutions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04478v1</guid>
      <category>cs.DC</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3774934.3786429</arxiv:DOI>
      <dc:creator>Yida Gu, Fakang Wang, Jianhao Fu, Zhenhang Sun, Qianyu Zhang, Hairui Zhao, Xingchen Liu, Yang Tian, Wenjing Huang, Zedong Liu, Yifan Chen, Jinwu Yang, Yueyuan Zhou, Qian Zhao, Haoxu Li, Tao Wang, Feng Yu, Zhan Wang, Guangming Tan, Dingwen Tao</dc:creator>
    </item>
    <item>
      <title>Geometric Milstein Scheme for Stochastic Differential Equations on SO(n) and SE(n)</title>
      <link>https://arxiv.org/abs/2605.04480</link>
      <description>arXiv:2605.04480v1 Announce Type: new 
Abstract: In this paper, we propose a higher-order geometry-preserving numerical method for stochastic differential equations (SDEs) evolving on the Lie groups SO(n) and SE(n). Most existing Lie group integrators rely on the Magnus expansion of the exponential map, which makes the construction of higher-order stochastic schemes difficult. To overcome this limitation, we develop a tangent-space parameterization corrected Milstein method (TaSP-CM), extending the tangent space parameterization (TaSP) framework from Lie-group ODEs to the stochastic setting. Although TaSP is a well-established method for Lie-group ODEs, its extension to SDEs is non-trivial and requires new stochastic corrections that ensure both geometric consistency and higher-order accuracy. We prove that the proposed scheme achieves strong convergence of order 1 under both commutative and non-commutative noise. Numerical experiments illustrate the theoretical results and demonstrate the efficiency and robustness of the proposed method.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04480v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xi Wang, Victor Solo</dc:creator>
    </item>
    <item>
      <title>Tightly-Coupled Estimation and Guidance for Robust Low-Thrust Rendezvous via Adaptive Homotopy</title>
      <link>https://arxiv.org/abs/2605.04481</link>
      <description>arXiv:2605.04481v1 Announce Type: new 
Abstract: Minimum-fuel low-thrust rendezvous guidance yields bang-bang control structures that are highly sensitive to estimation errors, sensor anomalies, and solver regularization, making aggressive closed-loop execution brittle for uncooperative proximity operations. This paper proposes a tightly-coupled estimation and guidance architecture in which navigation confidence directly modulates the homotopy parameter of a receding-horizon indirect optimal control solver. Relative motion is modeled in the Clohessy-Wiltshire frame. The translational state is estimated via a linear Kalman filter augmented by a Multiple Tuning Factors (MTF) covariance inflation mechanism that suppresses suspicious innovation directions. A composite score from the normalized innovation and MTF activity is mapped online to the homotopy parameter, allowing the controller to relax toward a smoother, conservative regime when confidence degrades, and to recover fuel-efficient bang-bang control as sensing improves. Numerical results under severe measurement degradation show that fixed bang-bang guidance remains brittle; both plain-KF and MTF-KF fixed-epsilon controllers yield large terminal miss distances. Conversely, the proposed MTF-adaptive homotopy controller reduces terminal miss by roughly two orders of magnitude, from hundreds of meters to sub-meter levels, while requiring only a moderate increase in control effort relative to the open-loop fuel-optimal benchmark. A comparison indicates that adaptive homotopy is the dominant robustness mechanism, while MTF provides additional accuracy and efficiency improvements. The receding-horizon implementation exhibits consistently fast and reliable solution times, supporting the practical online viability of the proposed method.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04481v1</guid>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Batu Candan, Simone Servadio</dc:creator>
    </item>
    <item>
      <title>Analysis of gradient flow for computing defocusing action ground states of rotating nonlinear Schrödinger equations</title>
      <link>https://arxiv.org/abs/2605.04485</link>
      <description>arXiv:2605.04485v1 Announce Type: new 
Abstract: This work focuses on the numerical computation of defocusing action ground states for rotating nonlinear Schrödinger equations (RNLS) using a direct gradient flow (DGF) method. We address theoretical gaps in the existing literature concerning the stability and convergence of this DGF scheme. Firstly, we prove the unconditional stability of the DGF scheme, demonstrating that the action functional is monotonically non-increasing along the discrete flow for arbitrary time step sizes. Secondly, we establish a rigorous convergence analysis, proving global convergence under minor assumptions and local exponential convergence to the action ground state under a reasonable non-degeneracy condition. The analysis relies on the uniform boundedness of sublevel sets of the action functional and introduces a tailored $H^1$-distance between phase-shift equivalence classes to handle complex-valued ground states with quantized vortices. A novel analytical framework is also developed to establish the exponential convergence rate. Numerical experiments are presented to validate the theoretical findings, demonstrating both the global migration towards a neighborhood of the ground state and subsequent exponential convergence.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04485v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wei Liu, Tingfeng Wang, Yongjun Yuan, Xiaofei Zhao</dc:creator>
    </item>
    <item>
      <title>How Does Thinking Mode Change LLM Moral Judgments? A Controlled Instant-vs-Thinking Comparison Across Five Frontier Models</title>
      <link>https://arxiv.org/abs/2605.04488</link>
      <description>arXiv:2605.04488v1 Announce Type: new 
Abstract: We evaluate whether enabling provider-exposed reasoning mode changes moral judgments within the same model checkpoint. Across 100 moral-judgment scenarios and five frontier reasoning-trained LLMs (Claude Sonnet 4.6, GPT 5.5, Gemini 3 Flash, DeepSeek V3.1, and Qwen3.5 397B), aggregate binary-verdict agreement remains high and statistically indistinguishable between instant and thinking modes (Krippendorff's alpha = 0.78 vs. 0.79). However, disagreement is concentrated in 21 model-disputed scenarios, where instant-mode agreement is near chance (alpha = 0.08). On these scenarios, reasoning directionally narrows cross-model disagreement, increasing mean pairwise agreement from 5.4 to 6.7 out of 10. Reasoning also reduces demographic-judgment inconsistency in three of five models and does not increase it for any model. Across all five model families, reasoning changes self-labeled ethical frameworks more often than binary verdicts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04488v1</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sai Sourabh Madur</dc:creator>
    </item>
    <item>
      <title>A Hybrid Method for Low-Resource Named Entity Recognition</title>
      <link>https://arxiv.org/abs/2605.04489</link>
      <description>arXiv:2605.04489v1 Announce Type: new 
Abstract: Named Entity Recognition (NER) is a critical component of Natural Language Processing with diverse applications in information extraction and conversational AI. However, domain-specific NER for low-resource languages faces challenges such as limited annotated data and heterogeneous label sets. This study addresses these issues by proposing a hybrid neurosymbolic framework that integrates rule-based processing with deep learning models for Vietnamese NER. The core idea is a two-stage pipeline: first, a rule-based component reduces label complexity by grouping relational and special categories; second, pre-trained language models are fine-tuned for high-precision extraction. A post-processing module then restores fine-grained labels, preserving expressiveness for application-level usability. To mitigate data scarcity, a key novelty of this work is a scalable data augmentation strategy that leverages Large Language Models (LLMs) to expand the label set without full re-annotation. The method was evaluated on five domain-specific datasets, including logistics, wildlife, and healthcare. Experimental results demonstrate substantial improvements over strong RoBERTa-based baselines. Specifically, the proposed system achieved F1 scores of 90 percent in Customer Service, up from 83 percent; 84 percent in GAM, up from 73 percent; 83 percent in AI Fluent, up from 80 percent; 94 percent in PhoNER_Covid19, up from 91 percent; and 60 percent in Rare Wildlife, up from 36 percent. These findings confirm that the hybrid approach effectively captures the linguistic complexity of Vietnamese and contextual nuances in specialized domains, offering a robust contribution to low-resource NER research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04489v1</guid>
      <category>cs.CE</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <arxiv:DOI>10.47738/jads.v7i2.1161</arxiv:DOI>
      <arxiv:journal_reference>Journal of Applied Data Sciences, Vol. 7, No. 2, pp. 999--1019, 2026</arxiv:journal_reference>
      <dc:creator>Do Minh Duc, Quan Xuan Truong, Viet Tran Hong, Le Hoang Anh, Mac Thi Minh Tra, Nguyen Van Thuy, Le Hai Ha, Vinh Nguyen Van</dc:creator>
    </item>
    <item>
      <title>An Evaluation of Chat Safety Moderations in Roblox</title>
      <link>https://arxiv.org/abs/2605.04491</link>
      <description>arXiv:2605.04491v1 Announce Type: new 
Abstract: Roblox is among the most popular online gaming platforms, used by hundreds of millions of users every day. A substantial portion of these users are minors, who are at greater risk because abusive users may use Roblox's real-time chat interface to make initial contact with potential victims. Roblox employs automated chat moderation mechanisms to detect potentially abusive messages; however, to date, their effectiveness has not been independently investigated. To address this gap, we collected approximately 2 million chat messages from four games across multiple age groups and analyzed them to evaluate the moderation system. These messages were collected from public game servers in accordance with ethical and legal norms as well as Roblox's terms of service.
  We use this corpus to qualitatively study which types of unsafe chats escape the moderation system and how policy-violating users evade it. Given the dataset's scale, manual qualitative content analysis is prohibitively expensive, so we adopt a two-step approach. First, we manually labeled safe and unsafe messages (n=99.8K) and used them as ground truth to evaluate four locally hosted state-of-the-art large language models (LLMs). Next, we applied the best-performing LLM to the entire corpus to identify potentially unsafe messages, which we manually categorized using iterative open and axial coding until thematic saturation was reached. Overall, our findings reveal a troubling reality: numerous unsafe chat messages related to grooming, sexualization of minors, bullying &amp; harassment, violence, self-harm, and sharing of sensitive information escaped the current moderation system. Our analysis of users whose messages were previously flagged reveals that they continue to send harmful messages, employing a wide range of techniques to evade moderation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04491v1</guid>
      <category>cs.CY</category>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Priya Kaushik, Sonja Brown, Rakibul Hasan, Sazzadur Rahaman</dc:creator>
    </item>
    <item>
      <title>Towards General Preference Alignment: Diffusion Models at Nash Equilibrium</title>
      <link>https://arxiv.org/abs/2605.04494</link>
      <description>arXiv:2605.04494v1 Announce Type: new 
Abstract: Reinforcement learning from human feedback (RLHF) has become a popular approach for aligning text-to-image (T2I) diffusion models with human preferences. As a mainstream branch of RLHF, Direct Preference Optimization (DPO) offers a computationally efficient alternative that avoids explicit reward modeling and has been widely adopted in diffusion alignment. However, existing preference-based methods for diffusion alignment still rely on reward-induced preference signals and typically assume that human preferences can be adequately modeled by the Bradley--Terry (BT) model, which may fail to capture the full complexity of human preferences. In this paper, we formulate diffusion alignment from a game-theoretic perspective. We propose Diffusion Nash Preference Optimization (Diff.-NPO), an intuitive general preference framework for diffusion alignment. Diff.-NPO has the current policy play against itself, driving self-improvement and better alignment. Empirically, we demonstrate the effectiveness of Diff.-NPO on text-to-image generation across various metrics. Diff.-NPO consistently outperforms existing preference-based diffusion alignment methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04494v1</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiaming Hu, Jiamu Bai, Haoyu Wang, Debarghya Mukherjee, Ioannis Ch. Paschalidis</dc:creator>
    </item>
    <item>
      <title>CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation</title>
      <link>https://arxiv.org/abs/2605.04495</link>
      <description>arXiv:2605.04495v1 Announce Type: new 
Abstract: Retrieval-Augmented Generation (RAG) depends on document ranking to provide useful evidence for generation, but conventional reranking methods mainly optimize query-document relevance rather than generation usefulness. A relevant document may still introduce noise, while a lower-ranked document may better reduce the generator's uncertainty. We propose CAR (Confidence-Aware Reranking), a query-guided, training-free, and plug-and-play reranking framework that uses generator confidence change as a document usefulness signal. CAR estimates confidence through the semantic consistency of multiple sampled answers under query-only and query-document conditions. Documents that significantly increase confidence are promoted, those that decrease confidence are demoted, and uncertain cases preserve the baseline order, while a query-level gate avoids unnecessary intervention on already confident queries. Experiments on four BEIR datasets show that CAR consistently improves NDCG@5 across sparse and dense retrievers, LLM-based and supervised rerankers, and four LLM backbones. Notably, CAR improves the YesNo reranker by 25.4 percent on average under Contriever retrieval, and its ranking gains strongly correlate with downstream generation F1 improvements, achieving Spearman rho = 0.964.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04495v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhipeng Song, Yizhi Zhou, Xiangyu Kong, Jiulong Jiao, Xuezhou Ye, Chunqi Gao, Xueqing Shi, Yuhang Zhou, Heng Qi</dc:creator>
    </item>
    <item>
      <title>SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States</title>
      <link>https://arxiv.org/abs/2605.04496</link>
      <description>arXiv:2605.04496v1 Announce Type: new 
Abstract: Long-Text Understanding (LTU) at million-token scale requires balancing reasoning fidelity with computational efficiency. Frontier long-context LLMs can process contexts of millions of tokens end-to-end, but they suffer from high token consumption and attention dilution. In parallel, specialized LTU agents often sacrifice fidelity through task-agnostic abstractions such as graph construction or indexing. Our key insight for LTU is that query-relevant information is typically sparse relative to the full document, so effective reasoning should rely on a query-sufficient subset rather than the entire context. Building on this, we propose SCOUT, a new paradigm for LTU that shifts from passive processing to active information foraging. It treats the document as an explorable environment and answers from a compact, provenance-grounded epistemic state. Guided by state-level gap diagnosis, SCOUT adaptively alternates between coarse-to-fine exploration and anchored state updates that progressively contract its epistemic state toward query sufficiency. Experiments show that SCOUT matches state-of-the-art proprietary models while reducing token consumption by up to 8x. Moreover, SCOUT remains stable as context length scales, substantially alleviating the practical cost-performance trade-off.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04496v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhenliang Zhang, Wenqing Wang, Yong Hu, Yaming Yang, Jiaheng Gao, Chen Shen, Xiaojun Wan</dc:creator>
    </item>
    <item>
      <title>Quadrature-TreeSHAP: Depth-Independent TreeSHAP and Shapley Interactions</title>
      <link>https://arxiv.org/abs/2605.04497</link>
      <description>arXiv:2605.04497v1 Announce Type: new 
Abstract: Shapley values are a standard tool for explaining predictions of tree ensembles, with Path-Dependent SHAP being the most widely used variant. Despite substantial progress, existing methods still exhibit trade-offs between depth-dependent runtime, numerical stability, and support for higher-order interactions. To address these challenges, we introduce Quadrature-TreeSHAP, a quadrature-based reformulation of Path-Dependent TreeSHAP that is numerically stable, naturally extends to any-order Shapley interaction values and is practically insensitive to tree depth. Our implementation supports both CPU and GPU and is integrated into XGBoost.
  Our method is based on a weighted-Banzhaf interaction polynomial, which expresses Banzhaf interaction values as expectations under a feature participation probability $p$. Shapley values and any-order interaction values are then recovered by integrating these polynomials over $p$ from 0 to 1. We evaluate these integrals using Gauss-Legendre quadrature, and show that, in practice, only 8 fixed quadrature points are sufficient to reach machine precision. In fact, Quadrature-TreeSHAP with 8 fixed points achieves greater numerical stability than TreeSHAP. This fixed-point formulation removes depth dependence from the inner computation and enables efficient SIMD execution.
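  The following is a minimal sketch of the integral identity described above. It uses a small hypothetical 3-player cooperative game with made-up payoffs in place of a tree ensemble. For each participation probability p, a player's weighted Banzhaf value is its expected marginal contribution when every other player joins independently with probability p. Integrating this polynomial over p from 0 to 1 with 8-point Gauss-Legendre quadrature recovers the Shapley value obtained by brute-force enumeration of orderings. This is not the authors' XGBoost implementation, which operates on tree paths; the game, payoff table, and function names are illustrative assumptions only.

```python
from itertools import combinations, permutations
from math import factorial

import numpy as np

# Hypothetical toy game (NOT from the paper): 3 players, arbitrary payoffs.
PLAYERS = (0, 1, 2)
PAYOFF = {
    frozenset(): 0.0,
    frozenset({0}): 1.0,
    frozenset({1}): 2.0,
    frozenset({2}): 0.5,
    frozenset({0, 1}): 4.0,
    frozenset({0, 2}): 2.0,
    frozenset({1, 2}): 3.0,
    frozenset({0, 1, 2}): 6.0,
}


def v(coalition):
    """Value of a coalition given as any iterable of player ids."""
    return PAYOFF[frozenset(coalition)]


def weighted_banzhaf(i, p):
    """Expected marginal contribution of player i when each other player
    joins independently with probability p (a polynomial in p)."""
    others = [j for j in PLAYERS if j != i]
    total = 0.0
    for k in range(len(others) + 1):
        for S in combinations(others, k):
            weight = p**k * (1 - p) ** (len(others) - k)
            total += weight * (v(set(S) | {i}) - v(S))
    return total


def shapley_quadrature(i, n_points=8):
    """Integrate the weighted Banzhaf polynomial over p in [0, 1]."""
    x, w = np.polynomial.legendre.leggauss(n_points)  # nodes/weights on [-1, 1]
    p = 0.5 * (x + 1.0)  # map nodes to [0, 1]
    return 0.5 * sum(wk * weighted_banzhaf(i, pk) for pk, wk in zip(p, w))


def shapley_exact(i):
    """Average marginal contribution of i over all player orderings."""
    total = 0.0
    for order in permutations(PLAYERS):
        before = order[: order.index(i)]
        total += v(set(before) | {i}) - v(before)
    return total / factorial(len(PLAYERS))


for i in PLAYERS:
    print(i, shapley_quadrature(i), shapley_exact(i))
```

The integrand is a polynomial of degree n - 1 in p, and an m-point Gauss-Legendre rule is exact up to degree 2m - 1. Eight nodes therefore integrate this toy game exactly, up to floating-point rounding.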
  We confirm these advantages empirically. On 12 XGBoost benchmarks, Quadrature-TreeSHAP computes Shapley values 1.06x-10.59x faster than TreeSHAP on CPU and 1.84x-6.95x faster than GPUTreeSHAP on GPU. Shapley pairwise interactions are 3.80x-58.11x faster on CPU, with higher-order interactions achieving speedups of up to 1200x compared to TreeSHAP-IQ.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04497v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ron Wettenstein, Rory Mitchell, Peng Yu</dc:creator>
    </item>
    <item>
      <title>Pen-Strategist: A Reasoning Framework for Penetration Testing Strategy Formation and Analysis</title>
      <link>https://arxiv.org/abs/2605.04499</link>
      <description>arXiv:2605.04499v1 Announce Type: new 
Abstract: Cyber threats are rapidly increasing, expanding their impact from large-scale enterprises to government services and individual users, making robust security systems increasingly essential. However, a significant shortage of skilled cybersecurity professionals exacerbates this challenge. While recent research has explored automating tasks such as penetration testing using LLM-based agents, existing frameworks often perform poorly due to limited capability in strategy formulation, domain-specific reasoning, and accurate action and tool selection. To overcome these limitations, we propose the Pen-Strategist framework, consisting of a novel domain-specific reasoning model that derives pentesting strategies via logical reasoning and a classifier that converts the strategies into actionable steps. First, we construct a reasoning dataset containing logical explanations for both strategy derivation and step selection in pentesting scenarios. We then fine-tune a Qwen-3-14B model for strategy generation using reinforcement learning. Evaluation on the test split of the dataset demonstrates an 87% improvement in strategy derivation performance compared to the baseline. Furthermore, we integrate the fine-tuned Pen-Strategist model into existing automated pentesting frameworks, such as PentestGPT, and evaluate its performance on vulnerable machines, achieving a 47.5% improvement in subtask completion while surpassing the baseline GPT-5. Further experiments on the CTFKnow benchmark show an 18% performance gain over the base model. For step prediction, we train a semantic-based CNN classifier, which outperforms commercial LLMs by 28% and enhances execution stability. Finally, we conduct a user study to qualitatively assess the generated strategies, in which Pen-Strategist outperforms Claude-4.6-Sonnet.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04499v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yasod Ginige, Pasindu Marasinghe, Sajal Jain, Suranga Seneviratne</dc:creator>
    </item>
    <item>
      <title>Harnessing Linguistic Dissimilarity for Language Generalization on Unseen Low-Resource Varieties</title>
      <link>https://arxiv.org/abs/2605.04500</link>
      <description>arXiv:2605.04500v1 Announce Type: new 
Abstract: Low-resource language varieties used by specific groups remain neglected in the development of Multilingual Language Models. Much cross-lingual research focuses on inter-lingual language transfer, which strives to align allied varieties and minimize differences between them. However, for low-resource varieties, linguistic dissimilarity is also an important cue for generalizing to unseen varieties. Unlike prior approaches, we propose a two-stage Language Generalization framework that captures variety-specific cues while also exploiting the rich overlap offered by a high-resource source variety. First, we propose TOPPing, a source-selection method specifically designed for low-resource varieties. Second, we propose a lightweight VACAI-Bowl architecture in which one branch learns variety-specific attributes while a parallel branch captures variety-invariant attributes through adversarial training. We evaluate our framework on structural prediction tasks, which are among the few tasks available, as a proxy for performance on other downstream tasks. Across 10 low-resource varieties, using VACAI-Bowl with TOPPing yields an average 54.62% improvement on dependency parsing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04500v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Jinju Kim, Haeji Jung, Youjeong Roh, Jong Hwan Ko, David R. Mortensen</dc:creator>
    </item>
    <item>
      <title>Example-Based Object Detection</title>
      <link>https://arxiv.org/abs/2605.04501</link>
      <description>arXiv:2605.04501v1 Announce Type: new 
Abstract: In recent years, object detection has achieved significant progress, especially in the field of open-vocabulary object detection. Unlike traditional methods that rely on predefined categories, open-vocabulary approaches can detect arbitrary objects based on human-provided prompts. With the advancement of prompt-based detection techniques, models such as SAM3 can even outperform some category-specific detectors trained on particular datasets without requiring additional training on those datasets. However, despite these advancements, false positives and false negatives still occur. In practical engineering applications, persistent misdetections or missed detections of the same object are unacceptable. Yet retraining the model every time such errors occur incurs substantial costs in terms of human effort, computational resources, and time. Therefore, how to leverage existing false positive and false negative samples to prevent such errors from recurring remains a highly challenging and urgent problem. To address this issue, we propose EBOD (Example-Based Object Detection), which integrates a prompt-based detector (SAM3) with robust feature matching modules (DINOv3 and LightGlue). The proposed framework effectively suppresses the repeated occurrence of false positives and false negatives by leveraging previous error examples, without requiring additional model retraining. Code is available at https://github.com/sunzx97/examples_based_object_detection.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04501v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>ZhiXin Sun</dc:creator>
    </item>
    <item>
      <title>Gradient Scaling Effects in Adaptive Spectral PINNs for Stiff Nonlinear ODEs</title>
      <link>https://arxiv.org/abs/2605.04502</link>
      <description>arXiv:2605.04502v1 Announce Type: new 
Abstract: Physics-Informed Neural Networks (PINNs) often struggle to train reliably on stiff and oscillatory dynamical systems due to poor optimization conditioning. While prior work has emphasized representational remedies such as spectral parameterizations, the optimization implications of initial-condition (IC) embeddings in adaptive spectral PINNs have not been well characterized. In this work, we show that the choice of IC gating function induces explicit time-dependent gradient scaling, which interacts with spectral representations during training. Using a nonlinear stiff spring-pendulum ODE as a controlled benchmark, we compare exponential and linear IC gates in combination with fixed and adaptive Fourier spectral trunks. We observe stiffness-dependent changes in relative dominance for adaptive PINNs: at moderate stiffness ($k=20$), exponential gating often yields lower error but exhibits heterogeneous behavior across random seeds, whereas at higher stiffness ($k=60$), linear gating becomes preferable, with additional reversals observed at larger $k$. These trends hold for both relative $L^2$ error and maximum pointwise error and are confirmed by paired Wilcoxon signed-rank tests with Holm correction. Overall, our results demonstrate that IC embeddings are not a neutral design choice in PINNs: the induced gradient scaling materially shapes optimization conditioning in stiff regimes, with distinct sensitivity patterns in baseline and adaptive spectral models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04502v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Isabela M. Yepes, Pavlos Protopapas</dc:creator>
    </item>
    <item>
      <title>DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning</title>
      <link>https://arxiv.org/abs/2605.04503</link>
      <description>arXiv:2605.04503v1 Announce Type: new 
Abstract: Image Difference Captioning (IDC) generates natural language descriptions that precisely identify differences between two images, serving as a key benchmark for fine-grained change perception, cross-modal reasoning, and image editing data construction. However, existing benchmarks lack diversity and compositional complexity, and standard lexical-overlap metrics (e.g., BLEU, METEOR) fail to capture semantic consistency or penalize hallucinations, which together prevent a comprehensive and robust evaluation of multimodal large language models (MLLMs) on IDC. To address these gaps, we introduce DiffCap-Bench, a comprehensive IDC benchmark covering ten distinct difference categories to ensure diversity and compositional complexity. Furthermore, we propose an LLM-as-a-Judge evaluation protocol grounded in human-validated Difference Lists, enabling a robust assessment of models' ability to both capture and describe visual changes. Through extensive evaluation of state-of-the-art MLLMs, we reveal significant performance gaps between proprietary and open-source models, highlight the critical importance of reasoning capability, and identify clear limitations in model scaling. Our framework also demonstrates strong alignment with human expert judgments and strong correlation with downstream image editing data construction quality. These findings establish DiffCap-Bench as both a reliable IDC evaluation framework and a practical predictor of downstream utility. The benchmark and code will be made publicly available to support further research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04503v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Yuancheng Wei, Haojie Zhang, Linli Yao, Lei Li, Jiali Chen, Tao Huang, Yiting Lu, Duojun Huang, Xin Li, Zhao Zhong</dc:creator>
    </item>
    <item>
      <title>SpecPL: Disentangling Spectral Granularity for Prompt Learning</title>
      <link>https://arxiv.org/abs/2605.04504</link>
      <description>arXiv:2605.04504v1 Announce Type: new 
Abstract: Existing prompt learning methods for VLMs exhibit a modality asymmetry: they predominantly optimize text tokens while still relying on a frozen visual encoder as a holistic feature extractor, neglecting the spectral granularity essential for fine-grained discrimination. To bridge this gap, we introduce Disentangling Spectral Granularity for Prompt Learning (SpecPL), which approaches prompt learning from a novel spectral perspective via Counterfactual Granule Supervision. Specifically, we leverage a frozen VAE to decompose visual signals into semantic low-frequency bands and granular high-frequency details. A frozen Visual Semantic Bank anchors text representations to universal low-frequency invariants, mitigating overfitting. Crucially, fine-grained discrimination is driven by counterfactual granule training: by permuting high-frequency signals, we compel the model to explicitly distinguish visual granularity from semantic invariance. SpecPL also serves as a universal plug-and-play booster, revitalizing text-oriented baselines such as CoOp and MaPLe via visual-side guidance. Experiments on 11 benchmarks demonstrate state-of-the-art performance, reaching 81.51\% harmonic-mean accuracy. These results indicate that spectral disentanglement with counterfactual supervision effectively addresses the stability-generalization trade-off. Code is released at https://github.com/Mlrac1e/SpecPL-Prompt-Learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04504v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jingtao Zhou, Xirui Kang, Feiyang Huang, Lai-Man Po</dc:creator>
    </item>
    <item>
      <title>Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting</title>
      <link>https://arxiv.org/abs/2605.04506</link>
      <description>arXiv:2605.04506v1 Announce Type: new 
Abstract: We introduce Ilov3Splat, a novel framework for instance-level open-vocabulary 3D scene understanding built on 3D Gaussian Splatting (3D-GS). Most prior work depends on 2D rendering-based matching or point-level semantic association, which undermines cross-view consistency, lacks coherent instance-level reasoning, and limits precision in downstream 3D tasks. To address these limitations, our method jointly optimizes scene geometry and semantic representations by augmenting Gaussian splats with view-consistent feature fields. Specifically, we leverage multi-resolution hash embedding to efficiently encode language-aligned CLIP features, enabling dense and coherent language grounding in 3D space. We further train an instance feature field using contrastive loss over SAM masks, supporting fine-grained object distinction across views. At inference time, CLIP-encoded queries are matched against the learned features, followed by two-stage 3D clustering to retrieve relevant Gaussian groups. This enables our framework to identify arbitrary objects in 3D scenes based on natural language descriptions, without requiring category supervision or manual annotations. Experiments on standard benchmarks demonstrate that Ilov3Splat outperforms prior open-vocabulary 3D-GS methods in both object selection and instance segmentation, offering a flexible and accurate solution for language-driven 3D scene understanding. Project page: https://csiro-robotics.github.io/Ilov3Splat.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04506v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Binh Long Nguyen, Kien Nguyen, Sridha Sridharan, Clinton Fookes, Peyman Moghadam</dc:creator>
    </item>
    <item>
      <title>Distilling Bayesian Belief States into Language Models for Auditable Negotiation</title>
      <link>https://arxiv.org/abs/2605.04507</link>
      <description>arXiv:2605.04507v1 Announce Type: new 
Abstract: Negotiation agents must infer what their counterpart values, update those beliefs over dialogue turns, and choose actions under uncertainty. End-to-end large language models (LLMs) can imitate negotiation dialogue, but their opponent beliefs are usually implicit and difficult to inspect. We propose BOND (Bayesian Opponent-belief Negotiation Distillation), a framework for auditable negotiation. BOND consists of two components: an LLM-based Bayesian teacher that scores dialogue contexts against the six possible opponent priority orderings, updates a posterior over those orderings, and uses the posterior for menu-based decision making; and a smaller 8B student language model that emits both negotiation actions and normalized posterior beliefs as tagged text. On the CaSiNo negotiation dataset, BOND outperforms the state of the art and achieves a mean Brier score of 0.085 over opponent-priority posteriors. The distilled student preserves much of this belief signal, achieving a Brier score of 0.114, below the uniform six-ordering reference of 5/36 (approximately 0.139). Compared with a 70B structured-CoT baseline, the significantly smaller 8B student model yields substantially better elicited posterior calibration. We further showcase auditability through posterior trajectories, belief-versus-policy error decomposition, and posterior-prefix interventions. These diagnostics reveal that distillation preserves a scoreable belief report more strongly than causal belief-conditioned control, making weak belief-action coupling visible rather than hidden.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04507v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zongqi Cui, Baihan Lin</dc:creator>
    </item>
    <item>
      <title>CoherentRaster: Efficient 3D Gaussian Splatting for Light Field Displays</title>
      <link>https://arxiv.org/abs/2605.04509</link>
      <description>arXiv:2605.04509v1 Announce Type: new 
Abstract: Light field displays (LFDs) require rendering an interlaced image that encodes many view-dependent observations. This multi-view requirement introduces substantial computational overhead, making real-time rendering difficult to achieve. While 3D Gaussian Splatting (3DGS) is efficient for single-view rendering on 2D displays, directly extending it to LFDs is computationally expensive. Moreover, prior acceleration methods either suffer from GPU inefficiency under spatially incoherent subpixel layouts or rely on computationally heavy multi-plane intermediates. In this paper, we propose CoherentRaster, a 3DGS-based light field rendering framework that performs subpixel-level rasterization. Our method employs Cross-view Coherent Attribute Reuse to eliminate redundant computation across neighboring viewpoints and applies View-coherent Remapping to restore the warp-level memory efficiency degraded by the interlaced subpixel layout. Together, these components make CoherentRaster an efficient pipeline for real-time, high-quality light field synthesis on consumer-grade hardware.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04509v1</guid>
      <category>cs.GR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Gyujin Sim, Seungjoo Shin, Hosung Jeon, Gwangsoon Lee, Hyon-Gon Choo, Sunghyun Cho</dc:creator>
    </item>
    <item>
      <title>From Priors to Perception: Grounding Video-LLMs in Physical Reality</title>
      <link>https://arxiv.org/abs/2605.04515</link>
      <description>arXiv:2605.04515v1 Announce Type: new 
Abstract: While Video Large Language Models (Video-LLMs) excel in general understanding, they exhibit systematic deficits in fine-grained physical reasoning. Existing interventions not only suffer from limited generalization but also fundamentally conflate generative artifacts with genuine physical fallacies. Furthermore, we find that models fail systematically not only on anti-physics anomalies but also on counter-intuitive scenarios where visual facts contradict statistical expectations. Accordingly, we propose the Unified Attribution Theory: this dual failure stems not from deficient perception but from Semantic Prior Dominance, in which the reasoning mechanism is hijacked by internal narrative scripts. To address this, we construct the Programmatic Adversarial Curriculum (PACC), the first high-fidelity adversarial video dataset synthesized from physical laws, which thoroughly decouples visual artifacts from logical errors. Concurrently, we design the Visual-Anchored Reasoning Chain (VARC) to force models to explicitly ground their judgments in low-level visual facts before logical adjudication. Experiments demonstrate that, without invasive architectural modifications, standard LoRA fine-tuning with the PACC curriculum effectively neutralizes prior interference in state-of-the-art (SOTA) models, yielding a substantial leap in physical reasoning capabilities.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04515v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zicheng Zhao, Chaofan Gan, Shijie Li, Weiyao Lin</dc:creator>
    </item>
    <item>
      <title>DALight-3D: A Lightweight 3D U-Net for Brain Tumor Segmentation from Multi-Modal MRI</title>
      <link>https://arxiv.org/abs/2605.04518</link>
      <description>arXiv:2605.04518v1 Announce Type: new 
Abstract: Automatic brain tumor segmentation from multi-modal MRI remains challenging because volumetric models often incur substantial computational cost. This paper presents DALight-3D, a compact 3D U-Net variant that combines depthwise separable 3D convolutions (SepConv), identifier-conditioned normalization, cross-slice attention (CSA), and adaptive skip fusion. The method is evaluated on the Medical Segmentation Decathlon Task01 BrainTumour benchmark under matched optimization settings against standard 3D U-Net, Attention U-Net, Residual 3D U-Net, and V-Net baselines. In the reported 50-epoch comparison, DALight-3D achieves a mean Dice of 0.727 with 2.22M parameters, compared with 0.710 Dice and 3.20M parameters for Residual 3D U-Net. Component-wise ablations show consistent performance degradation when SepConv, identifier-conditioned normalization, CSA, or SSFB is removed. These results indicate that DALight-3D offers a favorable accuracy-efficiency trade-off within the present benchmark setting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04518v1</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <category>cs.NE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Nand Kumar Mishra, Dhruv Mishra, Dr Manu Pratap Singh</dc:creator>
    </item>
    <item>
      <title>FL-Sailer: Efficient and Privacy-Preserving Federated Learning for Scalable Single-Cell Epigenetic Data Analysis via Adaptive Sampling</title>
      <link>https://arxiv.org/abs/2605.04519</link>
      <description>arXiv:2605.04519v1 Announce Type: new 
Abstract: Single-cell ATAC-seq (scATAC-seq) enables high-resolution mapping of chromatin accessibility, yet privacy regulations and data size constraints hinder multi-institutional sharing. Federated learning (FL) offers a privacy-preserving alternative, but faces three fundamental barriers in scATAC-seq analysis: ultra-high dimensionality, extreme sparsity, and severe cross-institutional heterogeneity. We propose FL-Sailer, the first FL framework designed for scATAC-seq data. FL-Sailer integrates two key innovations: (i) adaptive leverage score sampling, which selects biologically interpretable features while reducing dimensionality by 80%, and (ii) an invariant VAE architecture, which disentangles biological signals from technical confounders via mutual information minimization. We provide a convergence guarantee, showing that FL-Sailer converges to an approximate solution of the original high-dimensional problem with bounded error. Extensive experiments on synthetic and real epigenomic datasets demonstrate that FL-Sailer not only enables previously infeasible multi-institutional collaborations but also surpasses centralized methods by leveraging adaptive sampling as an implicit regularizer to suppress technical noise. Our work establishes that federated learning, when tailored to domain-specific challenges, can become a superior paradigm for collaborative epigenomic research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04519v1</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>Transactions on Machine Learning Research (TMLR), May 2026</arxiv:journal_reference>
      <dc:creator>Guangyi Zhang, Yi Dai, Yiyun He, Junhao Liu</dc:creator>
    </item>
    <item>
      <title>DAO-enabled decentralized physical AI: A new paradigm for human-machine collaboration</title>
      <link>https://arxiv.org/abs/2605.04522</link>
      <description>arXiv:2605.04522v1 Announce Type: new 
Abstract: We propose DAO-enabled decentralized physical AI (DePAI), a democratic architecture for coordinating humans and autonomous machines in the operation and governance of physical-digital systems. We (1) synthesize foundations in blockchains, decentralized autonomous organizations (DAOs), and cryptoeconomics; (2) connect DAO design with digital-democracy research on deliberation and voting, showing how each can advance the other; (3) position DAO-governed decentralized physical infrastructure networks (DePIN) within a vertically integrated stack that links energy and sensing to connectivity, storage/compute, models, and robots; (4) show how these elements specify workflows that couple machine execution with human oversight, enabling enhanced self-organization of techno-socio-economic systems, which we call DePAI; and (5) analyze risks, including security, centralization, incentive failure, legal exposure, and the crowding-out of intrinsic motivation, and argue for value-sensitive design and continuously adaptive governance. DePAI offers a path to scalable, resilient self-organization that integrates physical infrastructure, AI, and community ownership under transparent rules, on-chain incentives, and permissionless participation, aiming to preserve human autonomy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04522v1</guid>
      <category>cs.MA</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <category>econ.GN</category>
      <category>q-fin.EC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mark C. Ballandies, Florian Spychiger, Uwe Serdült, Claudio J. Tessone</dc:creator>
    </item>
    <item>
      <title>RaguTeam at SemEval-2026 Task 8: Meno and Friends in a Judge-Orchestrated LLM Ensemble for Faithful Multi-Turn Response Generation</title>
      <link>https://arxiv.org/abs/2605.04523</link>
      <description>arXiv:2605.04523v1 Announce Type: new 
Abstract: We present our winning system for Task B (generation with reference passages) in SemEval-2026 Task 8: MTRAGEval. Our method is a heterogeneous ensemble of seven LLMs with two prompting variants, where a GPT-4o-mini judge selects the best candidate per instance. We ranked 1st out of 26 teams, achieving a conditioned harmonic mean of 0.7827 and outperforming the strongest baseline (gpt-oss-120b, 0.6390). Ablations show that diversity in model families, scales, and prompting strategies is essential, with the ensemble consistently beating any single model. We also introduce Meno-Lite-0.1, a 7B domain-adapted model with a strong cost-performance trade-off, and analyse MTRAGEval, highlighting annotation limitations and directions for improvement. Our code is publicly available: https://github.com/RaguTeam/ragu_mtrag_semeval</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04523v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ivan Bondarenko, Roman Derunets, Oleg Sedukhin, Mikhail Komarov, Ivan Chernov, Mikhail Kulakov</dc:creator>
    </item>
    <item>
      <title>High-Fidelity Single-Image Head Modeling with Industry-Grade Topology</title>
      <link>https://arxiv.org/abs/2605.04524</link>
      <description>arXiv:2605.04524v1 Announce Type: new 
Abstract: We present a single-image head mesh reconstruction framework that addresses the longstanding challenge of simultaneously preserving facial identity and producing industry-grade topology. Our framework adopts a coarse-to-fine optimization pipeline that refines a rigged template across three stages (rig, joint, and vertex), achieving stable convergence and consistent topology. To mitigate the ill-posed nature of single-image 3D face reconstruction and ensure identity preservation, we employ a normal consistency objective jointly with landmark alignment. To further preserve local surface structure and enforce topological regularity, we introduce geometry-aware constraints based on Gaussian curvature and conformal consistency, along with auxiliary regularizations that correct fine artifacts such as lip seams and eyelid discontinuities. Our hierarchical optimization with geometry-aware regularization yields meshes with semantically meaningful edge flow and industry-grade topology. After geometry reconstruction, we extract UV-space texture and normal maps to preserve appearance details for visualization and downstream use. In a user study with 22 professional technical artists, our results were assessed as approaching industry-grade usability, and 95% of participants ranked our method as the top-performing approach, underscoring its effectiveness for real-world digital human production.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04524v1</guid>
      <category>cs.CV</category>
      <category>cs.GR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yunmu Wang, Zoubin Bi, Bowen Cai, Chenchu Rong, Jinlong Wang, Junchen Deng, Aocheng Huang, Jidong Jia, Huan Fu</dc:creator>
    </item>
    <item>
      <title>HDFlow: Hierarchical Diffusion-Flow Planning for Long-horizon Tasks</title>
      <link>https://arxiv.org/abs/2605.04525</link>
      <description>arXiv:2605.04525v1 Announce Type: new 
Abstract: Recent advances in generative models have shown promise in generating behavior plans for long-horizon, sparse-reward tasks. While these approaches have achieved promising results, they often lack a principled framework for hierarchical decomposition and struggle with the computational demands of real-time execution due to their iterative denoising process. In this work, we introduce Hierarchical Diffusion-Flow (HDFlow), a novel hierarchical planning framework that leverages the complementary strengths of diffusion and rectified flow models to overcome the limitations of single-paradigm generative planners. HDFlow employs a high-level diffusion planner to generate sequences of strategic subgoals in a learned latent space, capitalizing on diffusion's powerful exploratory capabilities. These subgoals then guide a low-level rectified flow planner that generates smooth and dense trajectories, exploiting the speed and efficiency of ordinary differential equation (ODE)-based trajectory generation. We evaluate HDFlow on four challenging furniture assembly tasks, both in simulation and in the real world, where it significantly outperforms state-of-the-art methods. We further demonstrate our method's generalizability on two long-horizon benchmarks comprising diverse locomotion and manipulation tasks. Project website: https://hdflow-page.github.io/</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04525v1</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Nandiraju Gireesh, Yuanliang Ju, Chaoyi Xu, Weiheng Liu, Yuxuan Wan, He Wang</dc:creator>
    </item>
    <item>
      <title>Velox: Learning Representations of 4D Geometry and Appearance</title>
      <link>https://arxiv.org/abs/2605.04527</link>
      <description>arXiv:2605.04527v1 Announce Type: new 
Abstract: We introduce Velox, a framework for learning latent representations of 4D objects that are descriptive, faithfully capturing object geometry and appearance; compressive, aiding downstream efficiency; and accessible, requiring only minimal input (an unstructured dynamic point cloud) to construct. Specifically, Velox trains an encoder to compress spatiotemporal color point clouds into a set of dynamic shape tokens. These tokens are supervised by two complementary decoders: a 4D surface decoder, which models the time-varying surface distribution to capture geometry; and a Gaussian decoder, which maps the tokens to 3D Gaussians to help learn appearance. To demonstrate the utility of our representation, we evaluate it on three downstream tasks (video-to-4D generation, 3D tracking, and cloth simulation via image-to-4D generation) and observe strong performance in all settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04527v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Anagh Malik, Dorian Chan, Xiaoming Zhao, David B. Lindell, Oncel Tuzel, Jen-Hao Rick Chang</dc:creator>
    </item>
    <item>
      <title>YOTOnet: Zero-Shot Cross-Domain Fault Diagnosis via Domain-Conditioned Mixture of Experts</title>
      <link>https://arxiv.org/abs/2605.04528</link>
      <description>arXiv:2605.04528v1 Announce Type: new 
Abstract: Mechanical equipment forms the critical backbone of modern industrial production, yet domain shift severely limits the generalization of deep-learning-based fault diagnosis models across different equipment and operating conditions. Inspired by the success of foundation models in achieving zero-shot generalization, we propose YOTOnet (You Only Train Once), a novel architecture specifically designed for cross-domain fault diagnosis in mechanical equipment. YOTOnet comprises three core components: (1) a physics-aware Invariant Feature Distiller that extracts domain-agnostic representations using multi-scale dilated convolutions and FFT-based time-frequency fusion, (2) Domain-Conditioned Sparse Experts (DC-MoE) that adaptively route inputs to specialized processors via learned gating without external metadata, and (3) a dual-head classification system with auxiliary supervision. Extensive validation on five public bearing datasets (CWRU, MFPT, XJTU, OTTAWA, HUST) through 30 cross-dataset protocols demonstrates the superiority of YOTOnet over other state-of-the-art methods. Critically, we observe a clear scaling effect: average test F1 improves from 0.5339 (1 training dataset) to 0.705 (4 training datasets), with a clear gain when moving from 3 to 4 datasets. These findings provide empirical evidence that foundation-model principles can enable robust, train-once deployment for industrial fault diagnosis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04528v1</guid>
      <category>cs.LG</category>
      <category>cs.MA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zesen Wang, Zihao Wu, Yue Hu, Yang Gao, Fuzhen Xuan</dc:creator>
    </item>
    <item>
      <title>SADE: Symptom-Aware Diagnostic Escalation for LLM-Based Network Troubleshooting</title>
      <link>https://arxiv.org/abs/2605.04530</link>
      <description>arXiv:2605.04530v1 Announce Type: new 
Abstract: Large language model (LLM) agents are increasingly applied to network troubleshooting, but root-cause localization on public benchmarks remains well below practical deployment thresholds. We argue this is because existing agents do not encode the disciplined, layer-by-layer methodology that human network engineers use, and instead rely on free-form deliberation that conflates evidence acquisition with hypothesis commitment. We present SADE (Symptom-Aware Diagnostic Escalation), an agent that encodes the classical Cisco troubleshooting methodology as an explicit policy. SADE pairs a phase-gated diagnostic workflow, which separates evidence acquisition from hypothesis commitment, with a routed library of fault-family skills and high-yield diagnostic helpers. On a held-out set of 523 incidents from the public NIKA benchmark, covering eleven unseen scenarios, SADE improves root-cause F1 by 37 percentage points over a ReAct + GPT-5 baseline. A model-controlled comparison against the same Claude Sonnet backend without the SADE policy attributes 22 of those points to the diagnostic policy alone, showing that the gain is not merely a side effect of the model upgrade.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04530v1</guid>
      <category>cs.NI</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kuan-Hao Tseng, Niruth Bogahawatta, Yasod Ginige, Kosta Dekic, Arunan Sivanathan, Suranga Seneviratne</dc:creator>
    </item>
    <item>
      <title>Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection</title>
      <link>https://arxiv.org/abs/2605.04531</link>
      <description>arXiv:2605.04531v1 Announce Type: new 
Abstract: Open-vocabulary object detection with vision-language models (VLMs) such as Grounding DINO suffers from performance degradation under test-time distribution shifts, primarily due to semantic misalignment between text embeddings and the shifted visual embeddings of region proposals. While recent test-time adaptive object detection methods for VLM-based detectors either rely on costly backpropagation or bypass semantic misalignment via external memory, none directly and efficiently aligns text and vision in a training-free manner. To address this, we propose Reward-Guided Semantic Evolution (RGSE), a training-free framework that directly refines the text embeddings at test time. Inspired by evolutionary search, RGSE treats text embedding adaptation as a semantic search process: it perturbs text embeddings to form candidate variants, evaluates them using cosine similarity with current and historical high-confidence visual proposals as a reward signal, and fuses them into a refined embedding through reward-weighted averaging. Without any backpropagation, RGSE achieves state-of-the-art performance across multiple detection benchmarks while adding minimal computational overhead. Our code will be open-sourced upon publication.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04531v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lihua Zhou, Mao Ye, Xiatian Zhu, Nianxin Li, Changyi Ma, Shuaifeng Li, Yitong Qin, Hongbin Liu, Jiebo Luo, Zhen Lei</dc:creator>
    </item>
    <item>
      <title>Accountable Agents in Software Engineering: An Analysis of Terms of Service and a Research Roadmap</title>
      <link>https://arxiv.org/abs/2605.04532</link>
      <description>arXiv:2605.04532v1 Announce Type: new 
Abstract: AI coding assistants and autonomous agents are becoming integral to software development workflows, reshaping how code is produced, reviewed, and maintained. While recent research has focused mainly on the capabilities and productivity impacts of these systems, much less attention has been paid to accountability: who is responsible when agents generate, modify, or recommend code? In practice, accountability is defined through the Terms of Service (ToS) and related policy documents that govern the use of AI-powered development tools.
  In this vision paper, we present a comparative analysis of the Terms of Service for widely used AI coding assistants and agent-enabled development tools. We examine how these documents allocate ownership, responsibility, liability, and disclosure obligations between tool providers and software developers, and we identify common patterns and divergences between providers. Our analysis reveals a consistent tendency to shift responsibility for correctness, safety, and legal compliance onto users, as well as substantial variation in how providers address issues such as indemnification, data reuse, and acceptable use.
  Based on these findings, we argue that existing policy frameworks are poorly aligned with increasingly agent-mediated and autonomous software development workflows. We outline a research roadmap for accountable agents in software engineering, identifying challenges and opportunities for modeling responsibility, designing governance artifacts, developing tooling that supports accountability, and conducting empirical studies of developers' perceptions and practices.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04532v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1145/3805760.3814889</arxiv:DOI>
      <dc:creator>Christoph Treude</dc:creator>
    </item>
    <item>
      <title>Characterizing Students' LLM Usage Behaviors and Their Association with Learning in Critical Thinking Tasks</title>
      <link>https://arxiv.org/abs/2605.04534</link>
      <description>arXiv:2605.04534v1 Announce Type: new 
Abstract: Large language models (LLMs) are becoming increasingly embedded in students' learning practices, yet much of what is known about how students use LLMs and how this usage affects learning comes from problem-solving domains or constrained experimental settings. We present an analysis of LLM usage data collected during two offerings of a research-oriented course in which students learn to read, reason about, and critique academic papers. Students were free to decide whether and how to use LLMs, and reported their LLM usage practices while completing these activities as a series of homework assignments during the course. This paper extends prior work on data from a single offering of the same course by presenting a refined bottom-up categorization of LLM usage types, cross-labeled by the extent of student initiative each usage entails. Furthermore, we examine how LLM use affects student learning, measured by performance on three midterms, considering factors such as frequency and type of usage.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04534v1</guid>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Minju Park, Ivan Orozco Vasquez, Cristina Conati</dc:creator>
    </item>
    <item>
      <title>From Video-to-PDE: Data-Driven Discovery of Nonlinear Dye Plume Dynamics</title>
      <link>https://arxiv.org/abs/2605.04535</link>
      <description>arXiv:2605.04535v1 Announce Type: new 
Abstract: Inferring continuum models directly from video is hampered by two facts: the recorded field is uncalibrated image intensity rather than a physical state, and direct numerical differentiation of noisy frames is unstable. We develop a video-to-PDE pipeline that converts grayscale recordings of an ink plume into a normalised scalar field $u(x,y,t)$, isolates a bulk drift $\mathbf{v}(t)$ from intrinsic spreading via the intensity-weighted centroid, and identifies an effective transport law by weak-form sparse regression. Conditioning, threshold-sweep and random-centre diagnostics show that overcomplete libraries are strongly collinear; the search is therefore restricted to compact gradient-based libraries. Coefficients are refined by an inverse physics-informed network and recalibrated against forward rollouts, with a chronological block bootstrap quantifying uncertainty. The selected reduced model $u_t+\mathbf{v}(t)\!\cdot\!\nabla u = 9.005\,|\nabla u|^{2}+0.666\,\Delta u$ outperforms advection-diffusion baselines on held-out frames, retains a positive Laplacian coefficient, and admits a Cole-Hopf reduction to a linear advection-diffusion equation. The framework demonstrates that uncalibrated visual data can yield compact, predictive and structurally interpretable continuum models when discovery, calibration and uncertainty are treated as distinct stages.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04535v1</guid>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <category>physics.comp-ph</category>
      <category>stat.AP</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Cesar Acosta-Minoli, Sayantan Sarkar</dc:creator>
    </item>
    <item>
      <title>RLearner-LLM: Balancing Logical Grounding and Fluency in Large Language Models via Hybrid Direct Preference Optimization</title>
      <link>https://arxiv.org/abs/2605.04539</link>
      <description>arXiv:2605.04539v1 Announce Type: new 
Abstract: Direct Preference Optimization (DPO), an efficient alternative to PPO-based RLHF, falls short on knowledge-intensive generation: standard preference signals from human annotators or LLM judges exhibit a systematic verbosity bias that rewards fluency over logical correctness. This blind spot leaves a logical alignment gap: SFT models reach NLI entailment of only 0.05-0.22 despite producing fluent text. We propose RLearner-LLM with Hybrid-DPO, an automated preference pipeline that fuses a DeBERTa-v3 NLI signal with a verifier LLM score, removing the need for human annotation while overcoming the "alignment tax" of single-signal optimization. Evaluated across five academic domains (including Biology, Medicine, and Law) with three base architectures (LLaMA-2-13B, Qwen3-8B, Gemma 4 E4B-it), RLearner-LLM yields up to a 6x NLI improvement over SFT, with NLI gains in 11 of 15 domain-model cells and consistent answer-coverage gains. On Gemma 4 E4B-it (4.5B effective parameters), Hybrid-DPO lifts NLI in four of five domains (+11.9% to +2.4x) with faster inference in all five, showing that the approach scales down to compact base models without losing the alignment-tax mitigation. Our Qwen3-8B RLearner-LLM wins 95% of pairwise comparisons against its own SFT baseline, yet GPT-4o-mini in turn wins 95% against our concise output. Together with the 69% win rate the same judge gives a verbose SFT model over our DPO model, this replicates verbosity bias on a frontier comparator and motivates logic-aware metrics (NLI, ACR) over LLM-as-a-judge evaluation for knowledge-intensive generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04539v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Qiming Bao, Juho Leinonen, Paul Denny, Michael J. Witbrock</dc:creator>
    </item>
    <item>
      <title>Angle-I2P: Angle-Consistent-Aware Hierarchical Attention for Cross-Modality Outlier Rejection</title>
      <link>https://arxiv.org/abs/2605.04541</link>
      <description>arXiv:2605.04541v1 Announce Type: new 
Abstract: Image-to-point-cloud registration (I2P) is a fundamental task in robotic applications such as manipulation, grasping, and localization. Existing deep-learning-based I2P methods seek to align image and point cloud features in a learned representation space to establish correspondences, and have achieved promising results. However, when the inlier ratio of the initial matching pairs is low, conventional Perspective-n-Point (PnP) methods may struggle to achieve accurate results. To address this limitation, we propose Angle-I2P, an outlier rejection network that leverages angle-consistent geometric constraints and hierarchical attention. First, we design a scale-invariant, cross-modality geometric constraint based on angular consistency. This explicit geometric constraint guides the model in distinguishing inliers from outliers. Furthermore, we propose a global-to-local hierarchical attention mechanism that effectively filters out matches that are geometrically inconsistent under rigid transformation, thereby improving the Inlier Ratio (IR) and Registration Recall (RR). Experimental results demonstrate that our method achieves state-of-the-art performance on 7Scenes, RGBD Scenes V2, and a self-collected dataset, with consistent improvements across all benchmarks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04541v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Muyao Peng, Shun Zou, Pei An, You Yang, Qiong Liu</dc:creator>
    </item>
    <item>
      <title>Power Distribution Bridges Sampling, Self-Reward RL, and Self-Distillation</title>
      <link>https://arxiv.org/abs/2605.04542</link>
      <description>arXiv:2605.04542v1 Announce Type: new 
Abstract: Recent analyses question whether reinforcement learning (RL) is responsible for strong reasoning in large language models (LLMs). At the same time, distillation and inference-time sampling, including power sampling, have emerged as effective ways to improve LLM performance. However, the relationship among RL, distillation, and sampling remains unclear. In this study, we focus on the power distribution, the target distribution of power sampling, and show that the power distribution bridges sampling, self-reward KL-regularized RL, and self-distillation. From the sampling perspective, we show that inexpensive local approximations cannot reproduce sequence-level power without information about possible suffixes. From the RL perspective, the power distribution is the closed-form optimizer of KL-regularized RL when the model's sequence-level log-probabilities are used as the reward. This identification leads to power self-distillation, an offline distillation surrogate that shares the same target distribution and amortizes the cost of power sampling into supervised training on teacher samples. We further show that power self-distillation can achieve self-reward sharpening, while improvement in a downstream true reward is governed by the covariance between true reward and self-reward under the power distribution. Experiments on reasoning tasks support our analysis: power sampling raises self-reward, true-reward gains depend on alignment with self-reward, and power self-distillation can match or exceed the performance of power sampling at much lower inference cost.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04542v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Akiyoshi Tomihari, Issei Sato</dc:creator>
    </item>
    <item>
      <title>UniVer: A Unified Perspective for Multi-step and Multi-draft Speculative Decoding</title>
      <link>https://arxiv.org/abs/2605.04543</link>
      <description>arXiv:2605.04543v1 Announce Type: new 
Abstract: Speculative decoding accelerates Large Language Models via draft-then-verify, where verification can be framed as an Optimal Transport (OT) problem. Existing approaches typically handle multi-draft and multi-step aspects in isolation, applying either flat OT to single-step drafts or per-token rejection sampling to tree-structured candidates. This separation leaves the joint regime (where multi-step dependencies meet multi-draft branching) poorly optimized, as local verification rules fail to exploit the coupling between horizontal and vertical dimensions of candidate trees. In this paper, we propose a unified perspective that casts tree-based verification as a conditional OT problem. Our key insight is that vertical dependencies can be abstracted through prefix acceptance probabilities, which act as dynamic scaling factors to actively guide horizontal draft selection. Based on this principle, we introduce UniVer, a verification algorithm that jointly optimizes across tree levels by composing local optimal transport plans under prefix constraints. We prove that UniVer remains lossless and achieves the optimal acceptance rate under the proposed conditional framework. Extensive experiments across different tasks and models demonstrate that UniVer improves acceptance length by 4.2% to 8.5% over standard recursive rejection sampling without replacement, while maintaining exact distributional alignment with the target model.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04543v1</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yepeng Weng, Qiao Hu, Takehisa Yairi</dc:creator>
    </item>
    <item>
      <title>Hard CNF Instances for Ideal Proof Systems</title>
      <link>https://arxiv.org/abs/2605.04544</link>
      <description>arXiv:2605.04544v1 Announce Type: new 
Abstract: Since the introduction of the Ideal Proof System (IPS) by Grochow and Pitassi (J. ACM 2018), a substantial body of work has established size lower bounds for IPS and its fragments. In particular, Forbes, Shpilka, Tzameret, and Wigderson (Theory Comput. 2021) developed the main lower-bound frameworks for restricted IPS fragments, namely functional lower bounds and the hard multiples method, while Alekseev, Grigoriev, Hirsch, and Tzameret (SIAM J. Comput. 2024) gave a general template for conditional lower bounds for full IPS.
  Yet all these lower bounds apply only to purely algebraic formulas over a field, that is, non-Boolean formulas not directly expressible in propositional logic. Proving lower bounds for CNF formulas has therefore remained a central open problem in this line of work.
  The current work resolves this question for IPS over read-once oblivious algebraic branching programs (roABPs) by proving lower bounds for refutations of CNF formulas in this system. Our approach is a rank-based feasible interpolation argument, following the method of Pudlák and Sgall (Proof Complexity and Feasible Arithmetic 1996) for monotone span programs, in which decomposing a given roABP refutation along a variable partition yields a low-dimensional space of polynomials from which we construct a span-program interpolant. We extend their result from Nullstellensatz refutations measured by degree to Nullstellensatz refutations measured by roABP size (i.e., roABP-IPS$_\text{LIN}$).</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04544v1</guid>
      <category>cs.CC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tuomas Hakoniemi, Nutan Limaye, Iddo Tzameret</dc:creator>
    </item>
    <item>
      <title>Z-Opt: A Near-Optimal Reduced-Complexity Two-Dimensional Grassmannian Constellation</title>
      <link>https://arxiv.org/abs/2605.04545</link>
      <description>arXiv:2605.04545v1 Announce Type: new 
Abstract: Grassmannian constellations are known to achieve the capacity of noncoherent communications over Rayleigh fading channels in the high-SNR regime, yet their efficient construction remains challenging. In this paper, we propose two construction methods for Grassmannian constellations of one-dimensional subspaces in a two-dimensional space, termed S-Opt and Z-Opt, along with two low-complexity detectors. Both the construction and detection procedures are performed on the unit sphere, known as the Bloch sphere in quantum computing. We show that the chordal distance on the Grassmann manifold is proportional to the Euclidean distance on the Bloch sphere and derive a corresponding theoretical upper bound based on the Fejes Tóth bound on the minimum chordal distance. The S-Opt constellation is constructed from sphere-packing solutions and attains the derived upper bound for the optimal Bloch-sphere packings considered. The S-Opt detector can be applied to arbitrary Grassmannian constellations on $\mathcal{G}(2,1)$, and its time complexity scales linearly with the number of receive antennas and logarithmically with the constellation size, while yielding the same detection performance as the GLRT detector. Furthermore, based on the insight obtained through the S-Opt construction, the Z-Opt constellation is constructed by stacking regular polygons on the Bloch sphere, and its minimum chordal distance approaches the derived upper bound over the evaluated constellation sizes. The Z-Opt detector's time complexity scales linearly with the number of receive antennas, while yielding the same detection performance as the GLRT detector for Z-Opt.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04545v1</guid>
      <category>cs.IT</category>
      <category>eess.SP</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kotaro Shigenaga, Hiroki Iimori, Yuto Hama, Chandan Pradhan, Szabolcs Malomsoky, Naoki Ishikawa</dc:creator>
    </item>
    <item>
      <title>Stage-adaptive audio diffusion modeling</title>
      <link>https://arxiv.org/abs/2605.04547</link>
      <description>arXiv:2605.04547v1 Announce Type: new 
Abstract: Recent progress in diffusion-based audio generation and restoration has substantially improved performance across heterogeneous conditioning regimes, including text-conditioned audio generation and audio-conditioned super-resolution. However, training audio diffusion models remains computationally expensive, and most existing pipelines still rely on static optimization recipes that treat the relative importance of training signals as fixed throughout learning. In this work, we argue that a major source of inefficiency lies in the evolving balance between semantic acquisition and generation-oriented refinement. Early training places stronger emphasis on acquiring condition-aligned semantic structure and coarse global organization, whereas later training increasingly emphasizes temporal consistency, perceptual fidelity, and fine-detail refinement. To characterize this evolving balance, we introduce a progress-based regime variable derived from the training-time slope of an SSL-space discrepancy, which measures semantic progress during training. Based on this signal, we develop three complementary stage-aware mechanisms: decayed SSL guidance for early semantic bootstrapping, self-adaptive timestep sampling driven by the regime variable, and structure-aware regularization activated from convergent grouped organization in parameter space. We evaluate these mechanisms on text-conditioned audio generation and audio-conditioned super-resolution. Across both settings, the proposed stage-aware strategies improve convergence behavior and yield gains on the primary generation and spectral reconstruction metrics over standard static baselines. These results support the view that efficient audio diffusion training can benefit from treating external guidance, internal organization, and optimization emphasis as stage-dependent components rather than fixed ingredients.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04547v1</guid>
      <category>cs.SD</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xuanhao Zhang, Chang Li</dc:creator>
    </item>
    <item>
      <title>Event-Based Early Warning of Vineyard Disease Risk from Environmental Time Series</title>
      <link>https://arxiv.org/abs/2605.04548</link>
      <description>arXiv:2605.04548v1 Announce Type: new 
Abstract: Accurate early warning of vineyard disease risk from environmental observations is essential for timely intervention and more sustainable crop protection. However, many existing studies formulate disease prediction as daily presence classification, which can favor persistence-driven predictions and provide only limited support for actionable short-horizon warning. In this paper, we present an event-based approach for early warning of vineyard disease risk from environmental time series and evaluate it through a vineyard case study. Rather than predicting daily disease status, the task is reformulated to predict transitions into annotated disease-risk periods within a future window of 3-7 days. To reduce fragmentation caused by short interruptions in the binary labels, new events are defined only after a minimum disease-free gap. This formulation encourages models to capture environmental precursors associated with upcoming risk periods instead of merely reproducing temporal persistence. Using multi-year agro-meteorological data, we construct input representations that capture humidity dynamics, rainfall accumulation, temperature variability, and seasonal structure through cyclic temporal encoding. We evaluate representative methods from classical machine learning and deep learning, including XGBoost, Long Short-Term Memory (LSTM) networks, and Temporal Convolutional Networks (TCNs), using both standard classification metrics and an event-oriented early warning protocol. The results show that the event-based formulation supports practical short-horizon warning, while the compared models exhibit distinct trade-offs between event recall, lead time, and false-alert behavior. Overall, the study underscores the importance of problem formulation in environmental time-series learning and demonstrates the value of event-based prediction for vineyard disease warning systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04548v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ivica Dimitrovski, Ivan Kitanovski, Danco Davcev, Slobodan Kalajdziski, Kosta Mitreski</dc:creator>
    </item>
    <item>
      <title>Neural-Guided Domain Restriction to Accelerate Pseudospectra Computation for Structured Non-normal Banded Matrices</title>
      <link>https://arxiv.org/abs/2605.04550</link>
      <description>arXiv:2605.04550v1 Announce Type: new 
Abstract: Computing pseudospectra of non-normal matrices is essential for understanding the stability and transient behavior of dynamical systems. Such analysis is critical in applications including fluid dynamics, control systems, and differential operators, where non-normality can lead to significant transient amplification and sensitivity to perturbations that are not captured by eigenvalue analysis alone. At large scales, commonly used numerical approaches for pseudospectra computation can become computationally demanding, as they require repeated auxiliary computations to identify spectrally sensitive regions in the complex plane.
  We present a neural network-based approach that predicts sensitive regions directly from matrix features, thereby avoiding exhaustive pseudospectra evaluation across the entire complex plane. We calibrate the prediction threshold on validation data to ensure reliable coverage of sensitive regions. The trained neural network guides the selection of grid points requiring full computation, enabling focused computation only where necessary. The approach provides a practical preprocessing strategy for efficient pseudospectra computation. Numerical experiments on non-normal banded matrices demonstrate substantial speedup compared to full grid-based numerical evaluation while maintaining high accuracy in identifying sensitive regions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04550v1</guid>
      <category>math.NA</category>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Amit Punia, Rakesh Kumar, Madan Lal</dc:creator>
    </item>
    <item>
      <title>The Newsworthiness of Brazilian Distress: A Peak Analysis on Time Series of International Media Attention to Disasters in Brazil</title>
      <link>https://arxiv.org/abs/2605.04552</link>
      <description>arXiv:2605.04552v1 Announce Type: new 
Abstract: Media coverage influences disaster response, yet the drivers of international media attention to local events remain unevenly understood. Brazil offers a compelling case: some of its natural and technological disasters occasionally make international headlines. However, systematic analyses of what causes these events to be covered abroad are still missing. Addressing this gap requires representative, validated, country-specific news datasets. This paper presents a peak analysis of some 2,000 news articles about Brazilian fires and landslides in German newspapers from 2000 to 2024. Using time series segmentation to detect peaks in news coverage, we examine the extent to which these peaks can be temporally aligned with events recorded in national and global disaster databases.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04552v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Brielen Madureira, Andreas Niekler, Marc Keuschnigg, Mariana Madruga de Brito</dc:creator>
    </item>
    <item>
      <title>InterMesh: Explicit Interaction-Aware End-to-End Multi-Person Human Mesh Recovery</title>
      <link>https://arxiv.org/abs/2605.04554</link>
      <description>arXiv:2605.04554v1 Announce Type: new 
Abstract: Humans constantly interact with their surroundings. Existing end-to-end multi-person human mesh recovery methods, typically based on the DETR framework, capture inter-human relationships through self-attention across all human queries. However, these approaches model interactions only implicitly and lack explicit reasoning about how humans interact with objects and with each other. In this paper, we propose InterMesh, a simple yet effective framework that explicitly incorporates human-environment interaction information into the human mesh recovery pipeline. By leveraging a human-object interaction detector, InterMesh enriches query representations with structured interaction semantics, enabling more accurate pose and shape estimation. We design lightweight modules, the Contextual Interaction Encoder and the Interaction-Guided Refiner, to integrate these features into existing HMR architectures with minimal overhead. We validate our approach through extensive experiments on the 3DPW, MuPoTS, CMU Panoptic, Hi4D, and CHI3D datasets, demonstrating substantial improvements over state-of-the-art methods. Notably, InterMesh reduces MPJPE by 9.9% on CMU Panoptic and 8.2% on Hi4D, highlighting its effectiveness in scenarios with complex human-object and inter-human interactions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04554v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kaili Zheng, Kaiwen Wang, Xun Zhu, Chenyi Guo, Ji Wu</dc:creator>
    </item>
    <item>
      <title>Counter-Dyna: Data-Efficient RL-Based HVAC Control using Counterfactual Building Models</title>
      <link>https://arxiv.org/abs/2605.04555</link>
      <description>arXiv:2605.04555v1 Announce Type: new 
Abstract: Model-based reinforcement learning (MBRL) offers a promising approach for data-efficient energy management in buildings, combining the strengths of predictive modeling and reinforcement learning. While previous MBRL methods applied to HVAC control have reduced training data requirements, they still require several months of interaction with the building to learn a satisfactory control policy. A key reason is that existing surrogate models either attempt to predict the entire state space, including weather and electricity prices that are unaffected by control actions, or ignore these variables entirely. To address these issues, we propose Counter-Dyna, a method that improves the data efficiency of Dyna, an MBRL method. We create data-efficient counterfactual surrogate models (CSMs) by leveraging invariances in the state space. Using a CSM in Dyna reduces the amount of environment interaction data needed for RL training compared to previous results: whereas the previous state of the art used 6-12 months of environment interactions, our method needs only 5 weeks. We evaluate our method in a large simulation study using BOPTEST, a standard framework in the literature, with proximal policy optimization (PPO) as the RL algorithm. Our results show cost-saving potentials of 5.3% to 17.0% in a hypothetical deployment scenario. Our work is a significant step towards making real-world deployment of RL algorithms in HVAC control practically viable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04555v1</guid>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jan Marco Ruiz de Vargas, Fabian Raisch, Zoltan Nagy, Pierre Pinson, Christoph Goebel</dc:creator>
    </item>
    <item>
      <title>Benchmarking LLMs on the Massive Sound Embedding Benchmark (MSEB)</title>
      <link>https://arxiv.org/abs/2605.04556</link>
      <description>arXiv:2605.04556v1 Announce Type: new 
Abstract: The Massive Sound Embedding Benchmark (MSEB) has emerged as a standard for evaluating the functional breadth of audio models. While initial baselines focused on specialized encoders, the shift toward "audio-native" Large Language Models (LLMs) suggests a new paradigm in which a single multimodal backbone may replace complex, task-specific pipelines. This paper provides a rigorous empirical evaluation of leading LLMs, including members of the Gemini and GPT families, across the eight core MSEB capabilities to assess their efficacy and audio-text parity. Our results indicate that while a significant modality gap persists in performance and robustness, the empirical evidence for an "optimal" modeling approach remains inconclusive. Ultimately, the choice between audio-native and cascaded architectures depends heavily on specific use-case requirements and the underlying assumptions regarding latency, cost, and reasoning depth.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04556v1</guid>
      <category>cs.SD</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Cyril Allauzen, Tom Bagby, Georg Heigold, Ehsan Variani, Ke Wu</dc:creator>
    </item>
    <item>
      <title>Efficient Geometry-Controlled High-Resolution Satellite Image Synthesis</title>
      <link>https://arxiv.org/abs/2605.04557</link>
      <description>arXiv:2605.04557v1 Announce Type: new 
Abstract: High-resolution satellite images are often scarce and costly, especially for remote areas or infrequent events. This shortage hampers the development and testing of machine learning models for land-cover classification, change detection, and disaster monitoring. In this paper, we tackle geometry-controlled high-resolution satellite image synthesis by adding control to existing pre-trained diffusion models. We propose a simple yet efficient method for controlling the synthesis process that uses only skip-connection features via windowed cross-attention modules. A comparison with several established control techniques shows that our method achieves comparable performance while aligning better with the geometry control map. We also discuss the limitations of current evaluation approaches, underscoring the need for consistent alignment assessment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04557v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Vlad Vasilescu, Daniela Faur, Teodor Costachioiu</dc:creator>
    </item>
    <item>
      <title>Beyond Static Best-of-N: Bayesian List-wise Alignment for LLM-based Recommendation</title>
      <link>https://arxiv.org/abs/2605.04559</link>
      <description>arXiv:2605.04559v1 Announce Type: new 
Abstract: Large Language Models have revolutionized recommender systems (LLM4Rec) by leveraging their generative capabilities to model complex user preferences. However, existing LLM4Rec methods primarily rely on token-level objectives, making it difficult to optimize list-level and non-differentiable metrics (e.g., NDCG, fairness) that define actual recommendation quality. While Best-of-N (BoN) directly optimizes these metrics during inference, its high computational cost hinders real-world deployment. To address this, BoN Alignment aims to distill the search capability into the model itself, yet current approaches suffer from two critical limitations: (1) Indiscriminate Supervision, where the static reference fails to distinguish the relative quality of candidates exceeding its empirical range, leading to a loss of ranking guidance; and (2) Gradient Decay, where the effective supervision signal rapidly diminishes as the evolving policy improves, resulting in inefficient optimization.
  To overcome these challenges, we propose BLADE (Bayesian List-wise Alignment via Dynamic Estimation). Unlike static approaches, BLADE introduces a Bayesian framework that continuously updates the target distribution by fusing historical priors with dynamic evidence from the model's current rollouts. This mechanism constructs a self-evolving target that adapts to the model's growing capabilities, ensuring the training signal remains informative throughout the learning process. Extensive experiments on three real-world datasets demonstrate that BLADE significantly outperforms state-of-the-art baselines. Crucially, it breaks the static performance upper bound, achieving sustained gains in both ranking accuracy (Recall, NDCG) and complex list-wise metrics (Fairness, Diversity). The code is available at https://github.com/RegionCh/BLADE.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04559v1</guid>
      <category>cs.IR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ruijun Chen, Chongming Gao, Jiawei Chen, Weiqin Yang, Xiangnan He</dc:creator>
    </item>
    <item>
      <title>SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression</title>
      <link>https://arxiv.org/abs/2605.04560</link>
      <description>arXiv:2605.04560v1 Announce Type: new 
Abstract: Perceptual image compression focuses on preserving high visual quality under low-bitrate constraints. Most existing approaches to perceptual compression leverage the strong generative capabilities of generative adversarial networks or diffusion models, at the cost of substantial model complexity. To address this, we present an efficient perceptual image compression method that exploits the long-range modeling capability and linear computational complexity of state space models, with a particular focus on Mamba. Unlike existing methods that rely on an inherently fixed scanning order and consequently impair semantic continuity and spatial correlation, we develop a semantic-aware Mamba block (SAMB) that enables scanning guided by dynamically clustered semantic features, thereby alleviating the strict causality constraints and long-range information decay inherent to Mamba. Inspired by singular value decomposition, we design an SVD-inspired redundancy reduction module (SVD-RRM) that performs a low-rank approximation of the latent features by introducing a learnable soft threshold, thereby reducing channel-wise redundancy. The proposed SAMB is integrated into both the encoder and decoder of the compression framework, whereas the SVD-RRM is incorporated only in the encoder. Extensive experiments demonstrate that our method performs favorably against state-of-the-art approaches in terms of rate-distortion-perception tradeoff and model complexity. The source code and pretrained models will be available at https://github.com/Jasmine-aiq/SAMIC.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04560v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiaqian Zhang, Hao Wei, Chenyang Ge, Yanhui Zhou</dc:creator>
    </item>
    <item>
      <title>RangeGuard: Efficient, Bounded Approximate Error Correction for Reliable DNNs</title>
      <link>https://arxiv.org/abs/2605.04563</link>
      <description>arXiv:2605.04563v1 Announce Type: new 
Abstract: As DRAM scales in density and adopts 3D integration, raw fault rates increase and multi-bit errors are no longer rare. Such errors can severely impact Deep Neural Networks (DNNs): although DNNs tolerate small numerical perturbations, random bit flips can create extreme outliers that propagate and sharply degrade accuracy. Large Language Models (LLMs) are particularly vulnerable because attention, residual, and normalization layers can amplify and preserve a single corrupted activation across many layers, destabilizing inference.
  This paper introduces RangeGuard, a metadata-centric error-correcting framework that provides strong reliability and high efficiency based on bounded approximate correction. Instead of protecting raw bits, RangeGuard encodes compact Range Identifiers (RIDs) that capture the numerical range of each value. These compact metadata enable efficient use of limited redundancy and concentrate protection on range changes, which indicate harmful semantic deviations, while ignoring benign intra-range variations. Upon detecting a range change, RangeGuard restores the correct range and substitutes a representative value, ensuring that error magnitudes are bounded within the range. Based on RIDs, RangeGuard can tolerate 64+ flipped bits using only 16 bits of parity available in GPU memories without a noticeable accuracy loss. By introducing semantic range protection, RangeGuard enables reliable DNN execution even under frequent memory errors and tight redundancy budgets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04563v1</guid>
      <category>cs.AR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hanum Ko, Sangheum Yeon, Jong Hwan Ko, Jungrae Kim</dc:creator>
    </item>
    <item>
      <title>Practical validation of synthetic pre-crash scenarios</title>
      <link>https://arxiv.org/abs/2605.04564</link>
      <description>arXiv:2605.04564v1 Announce Type: new 
Abstract: The representativeness of synthetic pre-crash scenarios is crucial for assessing the safety impact of Driving Automation Systems through virtual simulations. However, a gap remains in the robust evaluation of synthetic pre-crash scenarios' practical equivalence to their real-world counterparts, that is, whether they are similar enough for the intended assessment purpose. Conventional significance testing is inadequate, as it focuses on detecting differences rather than establishing practical equivalence. This study addresses the research gap by extending our previous Bayesian Region of Practical Equivalence (ROPE)-based equivalence testing framework with a binning-based approach for defining appropriate statistics and equivalence criteria. Two binning-based statistics are proposed to measure practically meaningful distributional differences between datasets in the context of safety impact assessment. The framework's applicability is demonstrated through a case study that tests the practical equivalence of two synthetic rear-end pre-crash datasets with a previously developed reference dataset in the context of the safety impact assessment of an Automatic Emergency Braking system. The results show that the framework provides informative quantitative assessments of practical equivalence as well as diagnostic insights into the divergence of datasets. Although the demonstration focuses on rear-end pre-crash scenarios, the framework is generic and extensible to broader validation contexts, providing an interpretable and principled basis for practical equivalence assessment across diverse synthetic data applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04564v1</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Jian Wu, Ulrich Sander, Carol Flannagan, Jonas Bärgman</dc:creator>
    </item>
    <item>
      <title>Delay-Aware Large-Small Model Collaboration over LEO Satellite Networks</title>
      <link>https://arxiv.org/abs/2605.04565</link>
      <description>arXiv:2605.04565v1 Announce Type: new 
Abstract: In this paper, we introduce a delay-aware large-small model collaboration scheme for low Earth orbit (LEO) satellite networks, which can balance the computational load among satellites and the communication load across inter-satellite links. Specifically, remote sensing satellites with constrained computational resources are responsible for data collection and local processing using small models, while collaborating with computing satellites that provide large model processing. To minimize the service delay, we formulate a joint optimization problem for offloading decisions and routing strategy design, which is transformed into a decentralized partially observable Markov decision process. To solve the problem, we develop a multi-agent reinforcement learning (MARL)-based algorithm with offline policy training and online bisection search. The offline-trained policy determines routing strategies, while online bisection search iteratively adjusts the offloading decisions. Simulation results demonstrate that the proposed scheme can reduce the service delay by up to 31.85% compared with the benchmarks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04565v1</guid>
      <category>cs.DC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mingyu Guo, Wen Wu, Ying Wang, Songge Zhang, Liang Li</dc:creator>
    </item>
    <item>
      <title>Open-Source Image Editing Models Are Zero-Shot Vision Learners</title>
      <link>https://arxiv.org/abs/2605.04566</link>
      <description>arXiv:2605.04566v1 Announce Type: new 
Abstract: Recent studies have shown that large generative models can solve vision tasks they were not explicitly trained for. However, existing evidence relies on closed-source models (Veo 3, Nano Banana Pro) or requires task-specific instruction tuning, leaving open whether publicly available image-editing models possess zero-shot vision abilities out of the box.
  We conduct a systematic evaluation of three open-source image-editing models (Qwen-Image-Edit, FireRed-Image-Edit, and LongCat-Image-Edit) on dense visual prediction tasks without any fine-tuning. We benchmark monocular depth estimation on NYUv2 and DIODE, surface normal estimation on NYUv2, and semantic segmentation on Cityscapes, covering both geometric and semantic scene understanding.
  Results show that open-source image-editing models exhibit non-trivial zero-shot visual understanding. On NYUv2 surface normals, FireRed-Image-Edit achieves a mean angular error of $17.69^\circ$, surpassing the fine-tuned Marigold ($20.86^\circ$) and matching the instruction-tuned Vision Banana ($17.78^\circ$) without any task-specific training. On NYUv2 depth estimation, LongCat-Image-Edit obtains $\delta_1{=}0.822$ with affine alignment, and Qwen-Image-Edit leads on DIODE Indoor ($\delta_1{=}0.868$). On Cityscapes semantic segmentation, Qwen-Image-Edit reaches 25.7 mIoU at the 19-class level and 49.5 mIoU at a coarser 7-category level. By comparing three independently trained editors, we test whether zero-shot vision ability is an emergent property of image-editing pretraining rather than a model-specific artifact. Code, evaluation scripts, and all results are publicly released to serve as a reproducible baseline for future work.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04566v1</guid>
      <category>cs.CV</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wei Liu, Jiaxin Lin, Rui Chen</dc:creator>
    </item>
    <item>
      <title>Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination</title>
      <link>https://arxiv.org/abs/2605.04568</link>
      <description>arXiv:2605.04568v1 Announce Type: new 
Abstract: State-of-the-art model-based Reinforcement Learning (RL) approaches use gradient-free, population-based methods for planning, learned policy networks, or a combination of policy networks and planning. Hybrid approaches that combine Model Predictive Control (MPC) with a learned model and a policy prior to leverage the advantages of both paradigms have shown promising results. However, these approaches typically rely on gradient-free optimization methods, which can be computationally expensive for high-dimensional control tasks. While gradient-based methods are a promising alternative, recent works have empirically shown that gradient-based methods often perform worse than their gradient-free counterparts. We propose Dream-MPC, a novel approach that generates a few candidate trajectories from a rolled-out policy and optimizes each trajectory by gradient ascent using a learned world model, uncertainty regularization, and amortization of optimization iterations over time by reusing previously optimized actions. Our results on 24 continuous control tasks show that Dream-MPC can significantly improve the performance of the underlying policy and can outperform gradient-free MPC and state-of-the-art baselines. We will open-source our code and more at https://dream-mpc.github.io.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04568v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Jonathan Spieler, Sven Behnke</dc:creator>
    </item>
    <item>
      <title>Lightning Unified Video Editing via In-Context Sparse Attention</title>
      <link>https://arxiv.org/abs/2605.04569</link>
      <description>arXiv:2605.04569v1 Announce Type: new 
Abstract: Video editing has evolved toward In-Context Learning (ICL) paradigms, yet the resulting quadratic attention costs create a critical computational bottleneck. In this work, we propose In-context Sparse Attention (ISA), the first near-lossless empirical sparse framework tailored for ICL video editing. Our design is grounded in two key insights: first, context tokens exhibit significantly lower saliency than source tokens; second, we theoretically prove and empirically validate that query sharpness correlates with approximation error. Motivated by these findings, ISA implements an efficient pre-selection strategy to prune redundant context, followed by a dynamic query grouping mechanism that routes high-error queries to full attention and low-error ones to a computationally efficient zeroth-order Taylor sparse attention. Furthermore, we build LIVEditor, a novel lightning video editing model based on ISA, together with a proposed video-editing data pipeline used to curate a 1.7M high-quality dataset. Extensive experiments demonstrate that LIVEditor achieves a ~60% reduction in attention-module latency while surpassing state-of-the-art methods across EditVerseBench, IVE-Bench, and VIE-Bench, delivering near-lossless acceleration without compromising visual fidelity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04569v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Shitong Shao, Zikai Zhou, Haopeng Li, Yingwei Song, Wenliang Zhong, Lichen Bai, Zeke Xie</dc:creator>
    </item>
    <item>
      <title>PINSIGHT: A Comprehensive Threat Exploration of Domain-Adaptive Wi-Fi based PIN Code Inference</title>
      <link>https://arxiv.org/abs/2605.04570</link>
      <description>arXiv:2605.04570v1 Announce Type: new 
Abstract: Wi-Fi signals can be exploited by adversaries as a sensing side channel to eavesdrop on physical information. By monitoring propagation effects of radio waves within the victim's environment, attackers can remotely infer sensitive information. One particularly concerning example is PIN code inference, where the attacker faces the challenge of mapping Wi-Fi physical-layer channel estimations back into typed digits. While effective in their training environment, such attacks typically fail as soon as they are deployed in unseen environments. The current state-of-the-art attack, WiKI-Eve, attempts to overcome this problem using a deep-learning approach, reporting high PIN code inference accuracy independent of environments, devices, and users. While this suggests a significant real-world threat, it is not well understood how far the attack actually reaches, nor what its generalization performance is based on. In this work, we close this gap by presenting PINSIGHT, a novel methodology that separates the effects of environmental variation and PIN code typing. This enables the first rigorous threat assessment of such attacks, evaluating their generalization capabilities and limitations. Our approach leverages a robotic typing platform that produces highly repeatable keystroke events across systematically varied environment changes [...]. This dataset constitutes the first benchmark for environment generalization in Wi-Fi PIN code inference attacks. Evaluating several state-of-the-art methods, we find that attacks generalize reliably across changes in the surrounding environment but degrade substantially when the channel's encoding of typing itself shifts, which is precisely the condition that defines a realistic attack scenario. We conclude that the reported performance of current state-of-the-art Wi-Fi PIN inference attacks is not representative of the actual real-world threat.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04570v1</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Johannes Kortz, Paul Staat, Christof Paar, Christian Zenger</dc:creator>
    </item>
    <item>
      <title>From Parameter Dynamics to Risk Scoring: Quantifying Sample-Level Safety Degradation in LLM Fine-tuning</title>
      <link>https://arxiv.org/abs/2605.04572</link>
      <description>arXiv:2605.04572v1 Announce Type: new 
Abstract: Safety alignment of Large Language Models (LLMs) is extremely fragile, as fine-tuning on a small number of benign samples can erase safety behaviors learned from millions of preference examples. Existing studies attempt to explain this phenomenon by comparing parameters and hidden states before and after fine-tuning, but overlook their dynamic evolution during fine-tuning. In this paper, we uncover a critical mechanism underlying safety degradation by analyzing parameter dynamics: benign fine-tuning causes parameters to cumulatively drift toward danger-aligned directions, progressively undermining the model's safety. This finding suggests that samples contributing more to this drift carry greater fine-tuning risk. Based on this insight, we propose Sample-Level Quantification of Safety Degradation (SQSD), a method that quantifies the influence of each training sample on safety degradation. Specifically, SQSD assigns a continuous risk score to each sample by measuring the difference between the projections of its induced parameter update onto the danger and safety directions. Extensive experiments across multiple models and datasets demonstrate that SQSD effectively quantifies sample-level fine-tuning risks and exhibits strong transferability across model architectures, parameter scales, and parameter-efficient methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04572v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiao Wang, Yifei Zhang, YongKang Liu, Xiaocui Yang, Zihan Wang, Shi Feng, Daling Wang</dc:creator>
    </item>
    <item>
      <title>Mixed Finite Elements for Geometrically Exact Beams using Discontinuous Rotations and Discrete Curvature</title>
      <link>https://arxiv.org/abs/2605.04573</link>
      <description>arXiv:2605.04573v1 Announce Type: new 
Abstract: We propose a novel mixed finite-element formulation for geometrically exact (Simo--Reissner) beams that introduces the moment vector as an additional independent field. The specific mixed form allows for an element-local, discontinuous approximation of rotations, which is key to a simple and efficient discretization framework. The concept of discrete curvature provides a mathematically consistent treatment of rotation discontinuities. For linear constitutive laws, the mixed form is derived via a Legendre transform of the curvature-related strain energy. Objectivity is retained at the discrete level by interpolating relative rotations through a multiplicative split of the rotation field; path-independence is inherent to the total Lagrangian setting and verified numerically. Several benchmarks demonstrate optimal rates of convergence and accuracy, irrespective of the beam's slenderness and order of approximation. Notably, the lowest-order element entirely avoids rotation interpolation by employing element-constant rotations only.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04573v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Alexander Humer, Ivo Steinbrecher, Astrid Pechstein</dc:creator>
    </item>
    <item>
      <title>VL-UniTrack: A Unified Framework with Visual-Language Prompts for UAV-Ground Visual Tracking</title>
      <link>https://arxiv.org/abs/2605.04574</link>
      <description>arXiv:2605.04574v1 Announce Type: new 
Abstract: UAV-ground visual tracking (UGVT) aims to simultaneously track the same object from both the UAV and the ground view. However, existing two-stream methods suffer from isolated feature extraction and rely heavily on implicit appearance matching, which struggles to establish reliable correspondence under drastic view differences, leading to unreliable tracking. To address these limitations, we propose VL-UniTrack, a fully unified framework enhanced by visual-language prompts. By encoding features from both views within a single shared encoder, our method breaks the barrier of feature isolation to facilitate sufficient cross-view interaction. To overcome the ambiguity caused by relying solely on appearance matching, we design a visual-language geometric prompting module, which fuses language descriptions with visual features to generate learnable prompts. These prompts are then fed into our prompt-guided cross-view adapter module to enable sufficient cross-view feature interaction and to guide the learning of view-specific feature representations. Furthermore, a confidence-modulated mutual distillation loss is proposed to regularize the training by mitigating noise propagation. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the latest benchmark. The code is available at https://github.com/xuboyue1999/VL-UniTrack.git</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04574v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Boyue Xu, Ruichao Hou, Tongwei Ren, Gangshan Wu</dc:creator>
    </item>
    <item>
      <title>Benchmarking POS Tagging for the Tajik Language: A Comparative Study of Neural Architectures on the TajPersParallel Corpus</title>
      <link>https://arxiv.org/abs/2605.04576</link>
      <description>arXiv:2605.04576v1 Announce Type: new 
Abstract: This paper presents the first benchmark for the task of automatic part-of-speech (POS) tagging for the Tajik language. Despite the existence of multilingual language models demonstrating high effectiveness for many of the world's languages, their capacity for grammatical analysis of Tajik has remained unexplored until now. The aim of this study is to fill this gap through a systematic comparison of classical neural network architectures and modern multilingual transformers.
  Experiments were conducted on the TajPersParallel corpus, a parallel lexical resource comprising approximately 44,000 dictionary entries. Due to the absence of full-fledged example sentences in the current version of the corpus, the task was performed at the level of isolated lexical units, representing a challenging case of context-independent classification. The study compares a recurrent BiLSTM-CRF model with the transformer models XLM-RoBERTa (large), mBERT, ParsBERT (Persian), and ruBERT (Russian), adapted using the parameter-efficient fine-tuning method LoRA.
  The testing results showed that the best performance is achieved by the mBERT + LoRA model (macro F1-score = 0.11, weighted F1-score = 0.62). It was established that in the absence of syntactic context, all models experience significant difficulty in resolving morphological ambiguity, successfully classifying primarily high-frequency classes ("noun," "adjective") while demonstrating zero effectiveness for rare function words. Zero-shot evaluation revealed the greatest typological proximity of Tajik to Persian (ParsBERT) and Russian (ruBERT). The obtained results form a foundation for further research and development in the field of automatic processing of the Tajik language.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04576v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mullosharaf K. Arabov</dc:creator>
    </item>
    <item>
      <title>GTF: Omnidirectional EPI Transformer for Light Field Super-Resolution</title>
      <link>https://arxiv.org/abs/2605.04581</link>
      <description>arXiv:2605.04581v1 Announce Type: new 
Abstract: Light field (LF) image super-resolution benefits from Epipolar Plane Images (EPIs), whose line slopes explicitly encode disparity. However, existing Transformer-based LF SR methods mainly attend to horizontal and vertical EPIs, leaving diagonal epipolar geometry underexplored. We present GTF, an omnidirectional EPI Transformer that explicitly models horizontal, vertical, 45-degree, and 135-degree EPIs within a unified reconstruction framework. GTF combines directional EPI processing, MacPI-based prior injection, adaptive directional fusion, and a topology-preserving feed-forward network to better exploit LF geometry. For the NTIRE 2026 fidelity tracks, we use GTF as the main model, while a lightweight GTF-Tiny variant targets the efficiency track. On five standard LF SR benchmarks covering both real-captured and synthetic scenes, GTF reaches 32.78 dB without inference-time enhancement, and stronger inference settings with EPSW and test-time augmentation further improve performance. Under the NTIRE 2026 efficiency constraint, GTF-Tiny attains 32.57 dB with only 0.915M parameters and 19.81 GFLOPs. In the NTIRE 2026 Light Field Image Super-Resolution Challenge, our submissions rank 3rd on Track 1 and Track 3 and 4th on Track 2. Architecture-evolution, channel-width, and inference analyses further support the effectiveness of diagonal EPI modeling, directional fusion, and the lightweight design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04581v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kunyu Li, Fei Wang, Lichao Zhang, Junjie Liu, Bihong Li</dc:creator>
    </item>
    <item>
      <title>TajikNLP: An Open-Source Toolkit for Comprehensive Text Processing of Tajik (Cyrillic Script)</title>
      <link>https://arxiv.org/abs/2605.04583</link>
      <description>arXiv:2605.04583v1 Announce Type: new 
Abstract: The Tajik language, written in Cyrillic script, remains severely under-resourced in terms of publicly available natural language processing (NLP) toolkits, hindering both linguistic research and applied development. This paper introduces TajikNLP, an open-source Python library that provides the first comprehensive pipeline for processing authentic Tajik text while preserving the original Cyrillic orthography. The library implements a modular architecture centered around a unified Doc object, enabling sequential application of components for cleaning, normalization, tokenization (including subword BPE), morphemic segmentation, part-of-speech tagging, stemming, lemmatization, and sentence splitting. A novel unified morphology engine is introduced, offering controlled and deep analysis modes that significantly improve handling of Tajik's agglutinative nominal and verbal inflections. The release further incorporates a lexicon-based sentiment analyser and pre-trained Word2Vec/FastText embeddings loaded directly from the Hugging Face Hub. To ensure reproducibility and facilitate future research, four accompanying linguistic datasets -- a POS-tagged corpus (52.5k entries), a sentiment lexicon (3.5k entries), a toponym gazetteer (5.6k entries), and a personal names dataset (3.8k entries) -- have been openly published under permissive licenses. The library's reliability is validated by an extensive test suite of 616 automated tests achieving 93% source code coverage. TajikNLP thus establishes a foundational technological infrastructure for Tajik language processing, lowering the barrier to entry for both academic and industrial applications in low-resource Cyrillic-script environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04583v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mullosharaf K. Arabov</dc:creator>
    </item>
    <item>
      <title>IntenBot: Flexible and Imprecise Multimodal Input for LLMs to Understand User Intentions for Casual and Human-Like HRI</title>
      <link>https://arxiv.org/abs/2605.04585</link>
      <description>arXiv:2605.04585v1 Announce Type: new 
Abstract: In natural human-to-human communication, multimodal user input is typically used to supplement explicit and complement implicit voice commands, with casualness allowing for flexible input modality combinations and tolerance for imprecise input data. For example, saying "I want that." with a casual glance at a bottle of water is clear enough in human-to-human communication as an implicit voice command accompanied by gaze and/or gestures, rather than an explicit one. To enable such a human-like interaction in human-robot interaction (HRI), we propose a system, IntenBot, to understand user intentions from flexible and imprecise multimodal input, including voice, gaze, and finger-pointing, in XR. The disambiguation capability of large language models (LLMs) is used to filter out irrelevant input modalities and imprecise input data, generating potential instructions for user confirmation. The flexible and imprecise multimodal input enables casual, human-like interaction with robots, reducing time, effort, and attention, and could also be used as non-voice input. We conducted an informative user behavior study in a simulated environment to understand users' natural behavior in flexibly interacting with a robot using multimodal input and to obtain appropriate angle range parameters for gaze and finger-pointing. An XR study was then performed to evaluate the performance of IntenBot, compared with other methods. We also deployed IntenBot on a physical robot to showcase its real-world applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04585v1</guid>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yen-Ting Liu, Chiu-Hsuan Wang, TzuLing Chen, Ting-Ying Lee, Tzu-Hua Wang, Chien-Ming Lin, Bing-Yu Chen, Hsin-Ruey Tsai</dc:creator>
    </item>
    <item>
      <title>From Diffusion to Rectified Flow: Rethinking Text-Based Segmentation</title>
      <link>https://arxiv.org/abs/2605.04590</link>
      <description>arXiv:2605.04590v1 Announce Type: new 
Abstract: Text-based image segmentation aims to delineate object boundaries within an image from text prompts, offering higher flexibility and broader application scope than traditional fixed-category segmentation tasks. Recent studies have shown that diffusion models (e.g., Stable Diffusion) can provide rich multimodal semantic features, motivating the use of diffusion models as feature extractors for segmentation tasks. Such methods, however, inherit the generative nature of diffusion models, which is harmful to discriminative segmentation tasks. In response, we propose RLFSeg, a novel framework that leverages Rectified Flow to learn a direct mapping from the image to the segmentation mask within the latent space. The model is thus freed from the noise-denoise process and the need to optimize the time step of diffusion models, resulting in substantially better performance than previous diffusion-based methods, especially in zero-shot scenarios. By introducing label refinement and an Adaptive One-Step Sampling strategy, the model achieves higher accuracy even with a single inference step. The framework repurposes a pretrained generative model for the discriminative segmentation task with zero modification to the model structure, and thus shows promising application potential and significant research value.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04590v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3805622.3810595</arxiv:DOI>
      <dc:creator>Zishen Qu, Xuesong Li, Haijian Gu, Hongwei Kang, Quan Meng, Tianrui Niu, Xin Yang, Ruidong Pan</dc:creator>
    </item>
    <item>
      <title>DiCLIP: Diffusion Model Enhances CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation</title>
      <link>https://arxiv.org/abs/2605.04593</link>
      <description>arXiv:2605.04593v1 Announce Type: new 
Abstract: Weakly Supervised Semantic Segmentation (WSSS) with image-level labels typically leverages Class Activation Maps (CAMs) to achieve pixel-level predictions. Recently, Contrastive Language-Image Pre-training (CLIP) has been introduced to generate CAMs in WSSS. However, previous WSSS methods rely solely on CLIP's vision-language pairing for dense localization, neglecting its inherently limited dense knowledge in both the visual and text modalities, which renders CAM generation suboptimal. In this work, we propose DiCLIP, a novel WSSS framework that leverages a generative diffusion model to enhance CLIP's dense knowledge across both modalities. Specifically, we propose Visual Correlation Enhancement (VCE) and Text Semantic Augmentation (TSA) modules for dense prediction enhancement. To improve the spatial awareness of visual features, our VCE module uses diffusion's reliable spatial consistency to mitigate the over-smoothing issue in CLIP's attention. It employs an Attention Clustering Refinement (ACR) module to reliably extract diverse correlation maps from the diffusion model. The correlation maps act as a diversity bias for CLIP's self-attention, recursively pushing its visual features towards a more discriminative dense distribution. To augment the semantics of text embeddings, our TSA module builds on the observation that a single text modality is insufficient to encompass the variability of visual categories. Thus, we leverage diffusion's generative power to maintain a dynamic key-value cache model, shifting CAM generation from a patch-text matching mechanism to a novel visual knowledge retrieval paradigm. With these enhancements, DiCLIP not only outperforms state-of-the-art methods on PASCAL VOC and MS COCO but also significantly reduces training costs. Code is publicly available at https://github.com/zwyang6/DiCLIP.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04593v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhiwei Yang, Pengfei Song, Yucong Meng, Kexue Fu, Shuo Wang, Zhijian Song</dc:creator>
    </item>
    <item>
      <title>HeterSEED: Semantics-Structure Decoupling for Heterogeneous Graph Learning under Heterophily</title>
      <link>https://arxiv.org/abs/2605.04594</link>
      <description>arXiv:2605.04594v1 Announce Type: new 
Abstract: Many real-world heterogeneous graphs exhibit pronounced heterophily, where connected nodes often have dissimilar labels or play different semantic roles. In such settings, standard heterogeneous graph neural networks that aggregate messages along metapaths or meta-relations primarily based on feature similarity can propagate misleading information, since feature similarity may be misaligned with underlying relational semantics. In this paper, we propose HeterSEED, a semantics-structure decoupling framework for heterogeneous graph learning under heterophily. HeterSEED decouples representation learning into a heterogeneous semantic channel that captures type- and relation-aware local semantics and a structure-aware heterophily channel that separates homophilic and heterophilic neighborhoods via pseudo-label-guided partitioning and aggregates them using metapath-based structural weights. A node-level adaptive fusion mechanism then combines the two channels to produce context-dependent node representations. Theoretically, we establish that, on heterogeneous graphs under heterophily, HeterSEED is strictly more expressive than standard heterogeneous graph neural networks that rely primarily on feature similarity and provably reduces the prediction bias introduced by heterophilic neighbors. Experiments on five real-world heterogeneous graphs, including two large-scale networks at the million-node and hundred-million-edge scale, demonstrate that HeterSEED consistently outperforms representative heterogeneous graph neural networks and recent heterophily-aware baselines, especially in strongly heterophilic regimes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04594v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xinyi Li, Ming Li, Lu Bai, Lixin Cui, Feilong Cao, Ke Lv, Yunliang Jiang, Pietro Liò</dc:creator>
    </item>
    <item>
      <title>A Queueing-Theoretic Framework for Stability Analysis of LLM Inference with KV Cache Memory Constraints</title>
      <link>https://arxiv.org/abs/2605.04595</link>
      <description>arXiv:2605.04595v1 Announce Type: new 
Abstract: The rapid adoption of large language models (LLMs) has created significant challenges for efficient inference at scale. Unlike traditional workloads, LLM inference is constrained by both computation and the memory overhead of key-value (KV) caching, which accelerates decoding but quickly exhausts GPU memory. In this paper, we introduce the first queueing-theoretic framework that explicitly incorporates both computation and GPU memory constraints into the analysis of LLM inference. Based on this framework, we derive rigorous stability and instability conditions that determine whether an LLM inference service can sustain incoming demand without unbounded queue growth. These conditions offer a practical tool for system deployment, in particular for the core challenge of GPU provisioning: by combining an estimated request arrival rate with the derived stable service rate, operators can calculate the cluster size needed to avoid both costly over-provisioning and performance-violating under-provisioning. We further validate our theoretical predictions through extensive experiments in real GPU production environments. Our results show that the predicted stability conditions are highly accurate, with deviations typically within 10%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04595v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chengyi Nie, Nian Si, Zijie Zhou</dc:creator>
    </item>
    <item>
      <title>A Blockchain-as-a-Service Solution for TAFES-Compliant Verification of Fair Trade Certifications</title>
      <link>https://arxiv.org/abs/2605.04600</link>
      <description>arXiv:2605.04600v1 Announce Type: new 
Abstract: Purpose: This study addresses the lack of trust in ethical product labels by designing a blockchain platform grounded in the TAFES principles (Transparency, Accountability, Fairness, Ethics, Safety). It aims to bridge the gap between blockchain's theoretical transparency and a responsible, real-world implementation for certification ecosystems.
  Design/Methodology/Approach: Using Action Design Research (ADR), we developed a proof-of-concept platform for label authentication. A hybrid architecture records critical events on an Ethereum Layer-2 network for security, while supporting evidence is stored off-chain via IPFS and linked via content identifiers. The solution was validated through a coffee supply chain scenario.
  Findings: The proof of concept demonstrates how a TAFES-aligned blockchain platform can support verification of label claims without requiring trust in a single intermediary, by creating tamper-evident provenance records and auditable certification evidence across multiple stakeholders. The design supports low-cost, near-real-time anchoring of supply chain events while mitigating adoption barriers related to scalability, privacy, and operational viability.
  Originality/Value: This research contributes an integrated ethical and technical blueprint for trustworthy label authentication systems by translating TAFES into implementable design requirements and evaluation checks, and validating them through an ADR-driven proof of concept. It advances prior work by moving from the question of whether blockchain can help to the question of how it should be implemented responsibly in multi-stakeholder certification ecosystems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04600v1</guid>
      <category>cs.CE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nadia Dahmani, Peihao Li, Ravi S. Sharma</dc:creator>
    </item>
    <item>
      <title>Reference-based Category Discovery: Unsupervised Object Detection with Category Awareness</title>
      <link>https://arxiv.org/abs/2605.04606</link>
      <description>arXiv:2605.04606v1 Announce Type: new 
Abstract: Traditional one-shot detection methods have addressed the closed-set problem in object detection, but the high cost of data annotation remains a critical challenge. General unsupervised methods generate pseudo boxes without category labels and therefore cannot perform category-aware classification. To overcome these limitations, we propose Reference-based Category Discovery (RefCD), an unsupervised detector that enables category-aware detection without any manually annotated labels by leveraging the feature similarity between predicted objects and unlabeled reference images. Unlike previous unsupervised methods, which lack category guidance, and one-shot methods, which require labeled data, RefCD introduces a carefully designed feature similarity loss that explicitly guides the learning of potential category-specific features. Additionally, RefCD supports category-agnostic detection without reference images, serving as a unified framework for both settings. Comprehensive quantitative and qualitative analyses of category-aware and category-agnostic detection results demonstrate its effectiveness and show that RefCD can learn category information in an unsupervised paradigm without category labels.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04606v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yichen Li, Qiankun Liu, Ying Fu</dc:creator>
    </item>
    <item>
      <title>Right Model, Right Time: Real-Time Cascaded-Fidelity MPC for Bipedal Walking</title>
      <link>https://arxiv.org/abs/2605.04607</link>
      <description>arXiv:2605.04607v1 Announce Type: new 
Abstract: This paper presents a multi-phase whole-body model predictive control (MPC) approach for bipedal walking that combines a detailed whole-body model in the near horizon with a simplified single-rigid-body model in the later prediction steps. This reduces computational complexity while retaining prediction capabilities. The resulting nonlinear optimal control problem is solved using sequential quadratic programming (SQP) in acados. Given a prespecified contact schedule and a target walking speed, the controller optimizes joint torques without depending on preselected footstep locations. The controller is validated in MuJoCo simulation on the 18-DoF bipedal robot HyPer-2.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04607v1</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Franek Stark, Felix Wiebe, Shubham Vyas, Dennis Mronga, Frank Kirchner</dc:creator>
    </item>
    <item>
      <title>SensingAgents: A Multi-Agent Collaborative Framework for Robust IMU Activity Recognition</title>
      <link>https://arxiv.org/abs/2605.04608</link>
      <description>arXiv:2605.04608v1 Announce Type: new 
Abstract: Human Activity Recognition (HAR) using Inertial Measurement Unit (IMU) sensors is a cornerstone of mobile health, smart environments, and human-computer interaction. However, current deep learning-based HAR models often suffer from heavy reliance on labeled data, position-specific ambiguity, and a lack of transparent reasoning. Inspired by the advanced agents framework, which emulates collaborating agents using Large Language Models (LLMs), we propose SensingAgents, a novel multi-agent system for robust IMU activity recognition. SensingAgents organizes LLM-powered agents into specialized roles: a group of Analyst Agents for position-specific sensor analysis (arm, wrist, belt, pocket), a pair of Advocate Agents that resolve sensor conflicts through dynamic and static dialectical debates, and a Decision Agent that ensures reliability under sensor drift or failure. Evaluation on the Shoaib dataset demonstrates that SensingAgents significantly outperforms state-of-the-art single-agent and multi-agent LLM models, achieving an accuracy of 79.5% in a zero-shot setting (29% higher than existing agent models and 9.4% higher than deep learning baselines), particularly in complex scenarios where multi-sensor data is conflicting or noisy. Our work highlights the potential of multi-agent collaborative reasoning for advancing the robustness and interpretability of ubiquitous sensing systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04608v1</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Naiyu Zheng, Tianlong Yu, Haochen Yin, Xiaoyi Fan, Xiping Hu, Zhimeng Yin</dc:creator>
    </item>
    <item>
      <title>Advancing Aesthetic Image Generation via Composition Transfer</title>
      <link>https://arxiv.org/abs/2605.04609</link>
      <description>arXiv:2605.04609v1 Announce Type: new 
Abstract: Composition is a cornerstone of visual aesthetics, influencing the appeal of an image. While its principles operate independently of specific content, in practice composition is often coupled with semantics. As a result, existing methods often enhance composition either through implicit learning or by semantics-based layout control, rather than explicitly modeling composition itself. To address this gap, we introduce Composer, a framework rooted in aesthetic theory, designed to model composition in a semantics-agnostic manner. First, it supports composition transfer by extracting key composition-aware representations from a reference image and leveraging a tailored conditional guidance module to control composition based on pre-trained diffusion models. Second, when users specify only text themes without a composition reference, Composer supports theme-driven composition retrieval by leveraging the in-context learning capabilities of Large Vision-Language Models (LVLMs), achieving explicit composition planning. To enhance composition in a reference-free mode, we conduct text-to-composition fine-tuning on the trained control module to enable implicit composition planning. Furthermore, we curated a high-quality dataset comprising 2 million image-text pairs using state-of-the-art generative models to support model training. Experimental results demonstrate that Composer significantly enhances aesthetic quality in text-to-image tasks and facilitates personalized composition control and transfer, offering users precision and flexibility in the creative process.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04609v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1007/s11263-026-02862-8</arxiv:DOI>
      <arxiv:journal_reference>International Journal of Computer Vision, 2026</arxiv:journal_reference>
      <dc:creator>Kai Zou, Zhiwei Zhao, Bin Liu, Nenghai Yu</dc:creator>
    </item>
    <item>
      <title>Active Contact Sensing for Robust Robot-to-Human Object Handover</title>
      <link>https://arxiv.org/abs/2605.04610</link>
      <description>arXiv:2605.04610v1 Announce Type: new 
Abstract: Robot-to-human object handover is an essential skill for robot assistants, from serving drinks at home to passing surgical tools in the operating room. We expect robots to perform handover robustly: to release the object only after a firm human grasp while ignoring incidental touches. Existing passive-sensing methods struggle to generalize across diverse objects and human behaviors, as they lack informative perturbations to disambiguate different contact conditions, such as firm grasp versus incidental touch. We propose an active sensing approach for robust handovers: the robot applies information-gathering motions and senses the resulting human-applied forces to infer the contact state. A firm grasp produces forces in multiple directions, while an accidental touch does not. To capture this distinction, we model the contact state with a Bayesian linear model: a distribution over piecewise-linear mappings from robot motions to human-applied forces. This model enables firm grasp detection and active information gathering. In experiments with 12 participants and 30 diverse rigid objects, our method achieved a 97.5% success rate, more than 30% higher than two common baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04610v1</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Linfeng Li, Lin Shao, David Hsu</dc:creator>
    </item>
    <item>
      <title>An Axiomatic Analysis of Proportionality Notions in Approval-Based Multiwinner Voting</title>
      <link>https://arxiv.org/abs/2605.04612</link>
      <description>arXiv:2605.04612v1 Announce Type: new 
Abstract: Even though proportional representation is a fundamental goal in multiwinner voting and a plethora of proportionality notions has been introduced, the normative justifications for choosing one notion over another remain poorly understood. We address this by introducing the axiomatic study of proportionality notions in the approval-based multiwinner voting setting. That is, we define axioms (or desirable properties) that "good" proportionality notions should possess. Using these axioms, we then provide axiomatic characterizations of two prominent recently introduced notions: PJR+ and EJR+ [Brill and Peters 2023]. Our characterization proceeds in two parts. Firstly, we provide a characterization of refinements of PJR+ and EJR+. That is, we define axioms such that any notion satisfying these axioms must imply PJR+ (or EJR+, respectively). In particular, the fundamental axiom distinguishing PJR+ and EJR+ from their predecessors PJR and EJR is the classical axiom of monotonicity. Secondly, we introduce our framework of witness-based proportionality notions, that is, proportionality notions that certify "misrepresentation" via a witness set of misrepresented voters. In this class, we characterize PJR+ and EJR+ as the strongest notions (assuming certain axioms). Thus, by putting both directions together we obtain exact characterizations of both notions. Among our results, it may be worth highlighting that any notion satisfying mild conditions (monotonicity, independence of losers, robustness to fully satisfied voters, and lower quota) refines PJR+. In this sense, PJR+ turns out to be the canonical minimal requirement that one may impose on proportionality.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04612v1</guid>
      <category>cs.GT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chris Dong, Jannik Peters</dc:creator>
    </item>
    <item>
      <title>VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models</title>
      <link>https://arxiv.org/abs/2605.04613</link>
      <description>arXiv:2605.04613v1 Announce Type: new 
Abstract: High-quality singing annotations are fundamental to modern Singing Voice Synthesis (SVS) systems. However, obtaining these annotations at scale through manual labeling is unrealistic due to the substantial labor and musical expertise required, making automatic annotation highly necessary. Despite their utility, current automatic transcription systems face significant challenges: they often rely on complex multi-stage pipelines, struggle to recover text-note alignments, and exhibit poor generalization to out-of-distribution (OOD) singing data. To alleviate these issues, we present VocalParse, a unified singing voice transcription (SVT) model built upon a Large Audio Language Model (LALM). Specifically, our novel contribution is to introduce an interleaved prompting formulation that jointly models lyrics, melody, and word-note correspondence, yielding a generated sequence that directly maps to a structured musical score. Furthermore, we propose a Chain-of-Thought (CoT) style prompting strategy, which decodes lyrics first as a semantic scaffold, significantly mitigating the context disruption problem while preserving the structural benefits of interleaved generation. Experiments demonstrate that VocalParse achieves state-of-the-art SVT performance on multiple singing datasets. The source code and checkpoint are available at https://github.com/pymaster17/VocalParse.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04613v1</guid>
      <category>cs.SD</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yukun Chen, Tianrui Wang, Zhaoxi Mu, Xinyu Yang, EngSiong Chng</dc:creator>
    </item>
    <item>
      <title>Beyond Retrieval: A Multitask Benchmark and Model for Code Search</title>
      <link>https://arxiv.org/abs/2605.04615</link>
      <description>arXiv:2605.04615v1 Announce Type: new 
Abstract: Code search has usually been evaluated as first-stage retrieval, even though production systems rely on broader pipelines with reranking and developer-style queries. Existing benchmarks also suffer from data contamination, label noise, and degenerate binary relevance. In this paper, we introduce CoREB, a contamination-limited, multitask code retrieval and reranking benchmark, together with a fine-tuned code reranker, that goes beyond retrieval to cover the full code search pipeline. CoREB is built from counterfactually rewritten LiveCodeBench problems in five programming languages and delivered as timed releases with graded relevance judgments. We benchmark eleven embedding models and five rerankers across three tasks: text-to-code, code-to-text, and code-to-code. Our experiments reveal that: (1) code-specialised embeddings dominate code-to-code retrieval (~2× over general encoders), yet no single model wins all three tasks; (2) short keyword queries, the format closest to real developer search, collapse every model to near-zero nDCG@10; (3) off-the-shelf rerankers are task-asymmetric, with a 12-point swing on code-to-code and no baseline net-positive across all tasks; (4) our fine-tuned CoREB-Reranker is the first to achieve consistent gains across all three tasks. The data and model are released.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04615v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Siqiao Xue, Zihan Liao, Jin Qin, Ziyin Zhang, Yixiang Mu, Fan Zhou, Hang Yu</dc:creator>
    </item>
    <item>
      <title>Guidelines for Designing AI Technologies to Support Adult Learning</title>
      <link>https://arxiv.org/abs/2605.04616</link>
      <description>arXiv:2605.04616v1 Announce Type: new 
Abstract: AI-powered educational technologies have demonstrated measurable benefits for learners, but their design and evaluation have largely centered on K-12 contexts. As a result, many AI-supported learning systems remain poorly aligned with the needs, constraints, and goals of adult learners. To better understand how AI systems function in adult education, this paper examines the deployment of several AI learning technologies developed within a multidisciplinary, national research institute in the United States focused on adult learning and online education. Drawing on longitudinal deployment data, we conducted a reflexive thematic analysis to identify recurring challenges and design considerations across systems. These insights were synthesized into a set of 19 design guidelines intended to inform future AI-supported adult learning technologies. We demonstrate the utility of these guidelines through a heuristic evaluation of the deployed systems. Lastly, we present a guideline exploration tool that aids in the ideation of technologies by connecting the guidelines to stakeholder statements surfaced in the analysis process.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04616v1</guid>
      <category>cs.CY</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3800645.3813102</arxiv:DOI>
      <dc:creator>Jennifer M. Reddig, Glen R. Smith Jr, Sanaz Ahmadzadeh Siyahrood, Wesley G. Morris, Yoojin Bae, Kaitlyn Crutcher, John Kos, Rahul K. Dass, Jinho Kim, Momin Naushad Siddiqui, Daniel Weitekamp, Ploy Thajchayapong, Sandeep Kakar, Alex Endert, Scott Crossley, Min Kyu Kim, Chris Dede, Ashok Goel, Christopher J. MacLellan</dc:creator>
    </item>
    <item>
      <title>Temporal Structure Matters for Efficient Test-Time Adaptation in Wearable Human Activity Recognition</title>
      <link>https://arxiv.org/abs/2605.04617</link>
      <description>arXiv:2605.04617v1 Announce Type: new 
Abstract: Wearable human activity recognition (WHAR) models often suffer from performance degradation under real-world cross-user distribution shifts. Test-time adaptation (TTA) mitigates this degradation by adapting models online using unlabeled test streams, yet existing methods largely inherit assumptions from vision tasks and underexploit the inherent inter-window temporal structure in WHAR streams. In this paper, we revisit such temporal structure as a feature-conditioned inference signal rather than merely an output-space smoothing prior. We derive the insight that temporal continuity and observation-induced feature deviations provide complementary cues for determining when to preserve or release temporal inertia and where to route prediction refinement during likely transitions. Building upon this insight, we propose SIGHT, a lightweight and backpropagation-free TTA framework for WHAR, enabling real-time edge deployment. SIGHT estimates predictive surprise by comparing the current feature with a prototype-based expected state, and then uses the resulting feature deviation to guide geometry-aware transition routing based on prototype alignment and stream-level marginal habit tracking. Evaluations on real-world datasets confirm that SIGHT outperforms existing TTA baselines while reducing computational and memory costs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04617v1</guid>
      <category>cs.CV</category>
      <category>cs.HC</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zishu Zhou, Zaipeng Xie, Xuanyao Jie</dc:creator>
    </item>
    <item>
      <title>Constructions of locally repairable codes via concatenated codes</title>
      <link>https://arxiv.org/abs/2605.04618</link>
      <description>arXiv:2605.04618v1 Announce Type: new 
Abstract: In recent years, locally repairable codes (LRCs) have attracted considerable attention owing to their pivotal role in distributed storage systems. Since binary linear locally repairable codes can significantly reduce the complexity of both encoding and decoding processes, the construction of binary LRCs has attracted extensive research interest. In this paper, we construct locally repairable codes via concatenated codes and present a systematic approach to select outer codes to obtain optimal binary LRCs, where the outer codes are linear codes over $\mathbb{F}_4$. The weight distributions of the resulting LRCs are determined by the weight distributions of the selected linear codes over $\mathbb{F}_4$. Furthermore, several classes of optimal binary locally repairable codes are constructed, including binary LRCs meeting the Griesmer-like bound, and binary perfect LRCs. Meanwhile, for the locality $r=2$, we improve the Johnson-like bound for binary LRCs with disjoint local repair groups established by Ma and Ge, and construct explicit LRCs that attain this new bound.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04618v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hengfeng Jin, Fang-Wei Fu</dc:creator>
    </item>
    <item>
      <title>Library learning with e-graphs on jazz harmony</title>
      <link>https://arxiv.org/abs/2605.04622</link>
      <description>arXiv:2605.04622v1 Announce Type: new 
Abstract: Humans can acquire a highly structured intuitive understanding of musical patterns, yet these patterns often require multiple iterations of reflection and re-listening to internalize fully. To capture such an internalization process, we present a computational model for the learning of jazz harmonic patterns based on library learning. Given a corpus of harmonic progressions, our model searches over a space of programs composed of primitive harmonic relations in order to discover concise generative explanations of the corpus. The model first enumerates possible programs for each piece, and then jointly learns a library of harmonic patterns and refactored programs. To efficiently navigate the vast joint space of programs and libraries, we integrate deductive parsing with library learning on e-graphs. We explore how well our model captures aspects of human musical pattern learning by evaluating the intuitiveness of both programs and libraries, as well as similarities to human-written harmonic derivations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04622v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.SC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zeng Ren, Maddy Bowers, Xinyi Guan, Martin Rohrmeier</dc:creator>
    </item>
    <item>
      <title>AuditRepairBench: A Paired-Execution Trace Corpus for Evaluator-Channel Ranking Instability in Agent Repair</title>
      <link>https://arxiv.org/abs/2605.04624</link>
      <description>arXiv:2605.04624v1 Announce Type: new 
Abstract: Agent-repair leaderboards reorder under evaluator reconfiguration, and a measurable share of the reordering is produced by methods that consult evaluator-derived signal during internal selection of candidate repairs. We document this failure mode on a public leaderboard and release AuditRepairBench, a paired-execution trace corpus of 576,000 registered cells (96,000 executed) that operationalizes evaluator-channel-blocking ranking instability within a declared observability boundary. A modular screening architecture decides pathway-blocking through four interchangeable implementations (a learned influence proxy, a rule-based channel-exposure ratio that uses no trained model, a counterfactual sensitivity proxy, and a sparse human-audit proxy), which are combined into a screening posterior that feeds a cell-level flip functional, a set-valued label, a stratified system score, and a set-valued leaderboard. The resource is supported by: mechanism-anchored validation on an 80-case source-level channel-surgery subset; an independent-discovery protocol, under which two annotator groups separated from the pipeline developers discover coupling patterns blinded to the screening design, and the frozen ensemble attains a pooled AUROC of 0.83 on their 79 cases; implementation robustness; uncertainty propagation that raises 95% coverage from 0.81 to 0.95; and forward transfer with a pooled community-evaluator Spearman ρ = 0.65. Screening-guided blinding patches reduce rank displacement by 55–74% (mean 62%) with fewer than 50 lines of code, whereas random channel blinding yields at most a 7% reduction and generic retraining at most 13%. AuditRepairBench-Lite, a rule-only configuration on a 12,000-cell subset, preserves the leaderboard at Kendall τ = 0.88 within twenty-four GPU-hours and is the primary release artifact at 42 GB.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04624v1</guid>
      <category>cs.AI</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuelin Hu, Zhenbo Yu, Zhengxue Cheng, Wei Liu, Li Song</dc:creator>
    </item>
    <item>
      <title>Autonomous Synchronization of Discrete-Time Heterogeneous Multiagent Systems</title>
      <link>https://arxiv.org/abs/2605.04627</link>
      <description>arXiv:2605.04627v1 Announce Type: new 
Abstract: This paper investigates the autonomous synchronization problem for discrete-time heterogeneous multiagent systems. The synchronization problem is transformed into the asymptotic decoupling problem of stable modes in a class of discrete-time linear time-varying systems, for which we provide a sufficient condition. Leveraging this condition, we establish synchronization conditions that are based on the average of the agents' initial dynamic matrices and do not require the differences among these matrices to be small. This approach reduces the conservativeness of existing conditions and unifies the treatment of homogeneous and heterogeneous systems. Numerical simulation results are provided to support the theoretical findings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04627v1</guid>
      <category>cs.MA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wei Hu, Quanyi Liang</dc:creator>
    </item>
    <item>
      <title>CombOL: a Library for Practical Enumeration and Boltzmann Sampling of Combinatorial Classes</title>
      <link>https://arxiv.org/abs/2605.04629</link>
      <description>arXiv:2605.04629v1 Announce Type: new 
Abstract: We present CombOL (Combinatorial Objects Library), an open-source library for the enumeration and Boltzmann sampling of combinatorial classes. Classes can be specified in a concise string syntax and may depend on an arbitrary number of parameters. CombOL automatically derives the associated generating functions, enabling the generation of counting sequences and the compilation of Boltzmann samplers. The library supports exact-size and approximate-size Boltzmann rejection sampling, with automatic parameter tuning to target specific sizes. In addition to implementing established methods, CombOL contributes a novel early-rejection scheme and guarantees statistical correctness by dynamically increasing the numerical precision, which eliminates bias due to floating-point rounding errors. Through the Python interface, sampled structures can be mapped to application-specific objects, enabling direct sampling of domain objects such as graphs, chemical structure representations, or other complex data types. CombOL is available from PyPI as 'combol' (pypi.org/project/combol). The source code is available at gitlab.com/casbjorn/combol.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04629v1</guid>
      <category>cs.MS</category>
      <category>math.CO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Casper Asbjørn Eriksen, Daniel Merkle</dc:creator>
    </item>
    <item>
      <title>UniPCB: A Generation-Assisted Detection Framework for PCB Defect Inspection</title>
      <link>https://arxiv.org/abs/2605.04635</link>
      <description>arXiv:2605.04635v1 Announce Type: new 
Abstract: Printed Circuit Board (PCB) defect inspection faces two compounding challenges: scarce and imbalanced defect samples that limit model training, and insufficient feature representation under complex circuit backgrounds. Existing generation methods rely on single-modality conditions with coarse structural control, while detection methods improve architectures without addressing the data bottleneck. To resolve both challenges jointly, we propose a generation-assisted PCB defect inspection framework that integrates controlled defect synthesis with task-specific defect detection. On the generation side, a Multi-modal Condition Generator extracts complementary edge, depth, and text conditions in parallel. A ScaleEncoder then embeds these conditions into the diffusion U-Net at four resolutions, and a Condition Modulation module applies FiLM-style spatially adaptive modulation at each scale, enabling structurally aligned and defect-aware sample synthesis. On the detection side, an Inverted Residual Shift Attention module couples self-attention with shift-wise convolution to jointly capture global context and local texture, and a Cross-level Complementary Fusion Block generates pixel-level gates for selective cross-level feature fusion. The synthesized samples directly enrich the detection training set, so that improvements in generation compound with improvements in detection. Extensive experiments on DsPCBSD+ demonstrate that UniPCB achieves an mAP@0.5 of 98.0% and an mAP@0.5:0.95 of 61.8% on defect detection, surpassing all compared methods, while the generation branch attains an FID of 129.61 and an SSIM of 0.619, outperforming existing conditional generation approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04635v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Huan Zhang, Lianghong Tan, Yichu Xu, Jiangzhong Cao, Huanqi Wu, Linwei Zhu, Xu Zhang</dc:creator>
    </item>
    <item>
      <title>SWE-WebDevBench: Evaluating Coding Agent Application Platforms as Virtual Software Agencies</title>
      <link>https://arxiv.org/abs/2605.04637</link>
      <description>arXiv:2605.04637v1 Announce Type: new 
Abstract: The emergence of "vibe coding" platforms, where users describe applications in natural language and AI agents autonomously generate full-stack software, has created a need for rigorous evaluation beyond code-level benchmarks. To evaluate these platforms as virtual software development agencies, that is, on how well they understand business requirements, make architectural decisions, write production code, handle iterative modifications, and maintain business readiness, we introduce SWE-WebDevBench, a 68-metric evaluation framework spanning 25 primary and 43 diagnostic metrics across seven groups, organized along three dimensions: Interaction Mode (App Creation Request (ACR) vs. App Modification Request (AMR)), Agency Angle (Product Manager (PM), Engineering, Ops), and Complexity Tier (T4 multi-role SaaS, T5 AI-native).
  Our evaluation (six platforms, three domains, 18 evaluation cells) reveals four recurring shortcomings in the current generation of AI app builders: (1) a specification bottleneck, where platforms compress rich business requirements into oversimplified technical plans; (2) pervasive frontend-backend decoupling, where visually polished UIs mask absent or broken backend infrastructure; (3) a steep production-readiness cliff, where no platform scores above 60% on engineering quality and post-generation human effort varies substantially across platforms; and (4) widespread security and infrastructure failures, with no platform exceeding a 65% Security Score against a 90% target and concurrency handling as low as 6%. These observations are descriptive of our sample and require larger-scale replication to establish generality. We release SWE-WebDevBench as a community benchmark to enable such replication and to help platform builders identify and address these gaps.
  Code and benchmark resources are available at: https://github.com/snowmountainAi/webdevbench and https://webdevbench.com/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04637v1</guid>
      <category>cs.MA</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Siddhant Saxena, Nilesh Trivedi, Vinayaka Jyothi</dc:creator>
    </item>
    <item>
      <title>Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models</title>
      <link>https://arxiv.org/abs/2605.04638</link>
      <description>arXiv:2605.04638v1 Announce Type: new 
Abstract: Uncertainty quantification (UQ) is an important technique for ensuring the trustworthiness of LLMs, given their tendency to hallucinate. Existing state-of-the-art UQ approaches for free-form generation rely heavily on sampling, which incurs high computational cost and variance. In this work, we propose SemGrad, the first gradient-based UQ method for free-form generation, which is sampling-free and computationally efficient. Unlike prior gradient-based methods, which were developed for classification tasks and operate in parameter space, we propose to consider gradients in semantic space. Our method builds on the key intuition that a confident LLM should maintain stable output distributions under semantically equivalent input perturbations. We characterize this stability through gradients in semantic space and introduce a Semantic Preservation Score (SPS) to identify the embeddings that best capture semantics, with respect to which the gradients are computed. We further propose HybridGrad, which combines the strengths of SemGrad and parameter gradients. Experiments demonstrate that both methods provide efficient and effective uncertainty estimates, outperforming state-of-the-art methods, particularly in settings with multiple valid responses.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04638v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Mingda Li, Rundong Lv, Xinyu Li, Weinan Zhang, Ting Liu</dc:creator>
    </item>
    <item>
      <title>Cognitive Alignment Drives Attention: Modeling and Supporting Socially Shared Regulation in Pair Programming</title>
      <link>https://arxiv.org/abs/2605.04639</link>
      <description>arXiv:2605.04639v1 Announce Type: new 
Abstract: Grounded in socially shared regulation of learning (SSRL), this paper investigates how joint mental effort (JME) and joint visual attention (JVA) serve as process-level indicators of shared regulation in pair programming and how AI-driven adaptive feedback can strengthen these processes.
  We present three eye-tracking studies involving 182 dyads engaged in collaborative debugging tasks. Study 1 examines natural collaboration and shows that high-performing dyads exhibit significantly higher JME and JVA, a greater prevalence of productive high-JME-high-JVA episodes, and a stable causal relationship in which JME predicts JVA. Study 2 evaluates reactive adaptive feedback based on real-time deviations in JME and/or JVA. Results show that combined feedback targeting both dimensions yields the strongest improvements in performance, regulatory coherence, and cognitive-to-attentional causality, outperforming single-channel feedback. Study 3 introduces proactive, forecast-based feedback using machine-learning predictions of future collaboration states. Proactive support further enhances performance and sustains shared regulation by anticipating breakdowns before they manifest.
  Across studies, causal modeling reveals that cognitive alignment systematically drives attentional coordination in successful collaboration, while mismatches between effort and attention characterize unproductive regulation. Methodologically, this work integrates dual eye-tracking, pupillometry, episode-based analysis, and causal inference to capture SSRL as a dynamic, emergent process. Conceptually, the findings position AI not as an automated controller, but as an intelligence-augmenting co-regulator that supports learners' capacity to coordinate effort, attention, and understanding together.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04639v1</guid>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Anahita Golrang, Kshitij Sharma</dc:creator>
    </item>
    <item>
      <title>CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering</title>
      <link>https://arxiv.org/abs/2605.04641</link>
      <description>arXiv:2605.04641v1 Announce Type: new 
Abstract: Although Large Vision-Language Models (LVLMs) have demonstrated remarkable performance on downstream tasks, they frequently produce content that deviates from the visual information, leading to object hallucination. To tackle this, most recent works depend either on expensive manual annotation and training or on decoding strategies that significantly increase inference time. In this work, we observe that LVLMs' attention to visual information is significantly enhanced when answering caption queries compared to non-caption queries. Inspired by this phenomenon, we propose Caption-guided Visual Attention Steering (CAST), a training-free, plug-and-play hallucination mitigation method that leverages the attention activation pattern corresponding to caption queries to enhance LVLMs' visual perception capability. Specifically, we use probing techniques to identify attention heads that are highly sensitive to caption queries and estimate optimized steering directions for their outputs. This steering strengthens LVLMs' fine-grained visual perception, thereby effectively mitigating object hallucination. CAST reduces object hallucination by an average of 6.03% across five widely used LVLMs and five benchmarks covering both discriminative and generative tasks, achieving state-of-the-art performance while adding little inference cost and preserving other foundational capabilities.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04641v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Qiming Li, Zekai Ye, Xiaocheng Feng, Weihong Zhong, Libo Qin, Ruihan Chen, Lei Huang, Baohang Li, Kui Jiang, Yaowei Wang, Ting Liu, Bing Qin</dc:creator>
    </item>
    <item>
      <title>Securing the Web with HSTS-Enforced</title>
      <link>https://arxiv.org/abs/2605.04642</link>
      <description>arXiv:2605.04642v1 Announce Type: new 
Abstract: TLS stripping attacks expose sensitive web traffic by forcing secure HTTPS connections to fall back to unencrypted HTTP. At present, protection against these attacks relies on website operators explicitly opting into security by deploying mechanisms such as HTTP Strict Transport Security (HSTS) headers. These mechanisms have significant limitations: some are weak or difficult to configure, which raises the risk of misconfiguration and reduces practical adoption; others violate HTTP backward compatibility; at least one can even be abused to enable unintended user tracking.
  We introduce HSTS-Enforced, a mechanism that eliminates the remaining attack surface for TLS stripping while still allowing operators to securely specify that their websites must be accessed over HTTP when necessary, thereby maintaining accessibility. To achieve this, we flip the current opt-in security model to an opt-out model: all connections default to HTTPS, and operators whose websites require HTTP can explicitly opt out using so-called HTTP-Required indicators. We propose two such HTTP-Required indicators: a new DNS record and an HTTP-Required Preload list. We evaluate HSTS-Enforced under multiple deployment scenarios, demonstrating that it blocks all practical TLS stripping attempts while maintaining compatibility for sites that require HTTP, without introducing overhead in the typical case. Finally, we outline a practical transition path to accelerate global adoption.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04642v1</guid>
      <category>cs.CR</category>
      <category>cs.NI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Aaron van Diepen, Adrian Zapletal, Fernando Kuipers</dc:creator>
    </item>
    <item>
      <title>Graph-Augmented LLMs for Swiss MP Ideology Prediction</title>
      <link>https://arxiv.org/abs/2605.04643</link>
      <description>arXiv:2605.04643v1 Announce Type: new 
Abstract: Approximating the ideological position of Members of Parliament (MPs) is a fundamental task in political science, helping researchers understand legislative behavior, party alignment, and policy preferences. While Large Language Models (LLMs) have shown promising results in estimating MPs' ideological stances, a parliamentary system contains further actors and elements, and relations among them, that could provide a broader and more informative picture. However, due to the complexity of integrating them into the prediction task, these additional elements are generally ignored. In this work, we propose PG-RAG, an LLM framework that implements a retrieval-augmented generation pipeline: it first queries a political knowledge graph (KG) and then integrates the resulting graph-structured information into the context. This captures both textual semantics and inter-MP relationships, the latter being another relevant information source in any parliamentary system. We evaluate the approach on the task of ideology prediction, using data from a Swiss parliamentary dataset. When graph-augmented models are compared against several state-of-the-art baselines, the results show that incorporating this enriched information about different entities and their relations improves prediction performance. These results highlight the value of domain-specific relational information in modeling political behavior.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04643v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yifei Yuan, Luis Salamanca, Sophia Schlosser, Laurence Brandenberger</dc:creator>
    </item>
    <item>
      <title>Heat and mass transfer through fabric: a model for fabric drying with heated cylinders</title>
      <link>https://arxiv.org/abs/2605.04644</link>
      <description>arXiv:2605.04644v1 Announce Type: new 
Abstract: Textile drying is a key operation in the textile production cycle, as it is one of the most energy-intensive stages and plays a critical role in determining both product quality and overall process efficiency. In this work, we propose a mathematical model for the drying process of a generic textile material using heated cylinders, operating under low-pressure conditions. The model's parameters are estimated by nonlinear least-squares regression. Given a specific fabric, the model predicts the drying time and the residual moisture content. The model is validated using real-world data provided by a major Italian textile company.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04644v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Stefania Bellavia, Nicolò Fiorini, Adriano Milazzo, Alessandra Papini</dc:creator>
    </item>
    <item>
      <title>ReflectDrive-2: Reinforcement-Learning-Aligned Self-Editing for Discrete Diffusion Driving</title>
      <link>https://arxiv.org/abs/2605.04647</link>
      <description>arXiv:2605.04647v1 Announce Type: new 
Abstract: We introduce ReflectDrive-2, a masked discrete diffusion planner with a separate action expert for autonomous driving that represents plans as discrete trajectory tokens and generates them through parallel masked decoding. This discrete token space enables in-place trajectory revision: AutoEdit rewrites selected tokens using the same model, without requiring an auxiliary refinement network. To train this capability, we use a two-stage procedure. First, we construct structure-aware perturbations of expert trajectories along the longitudinal progress and lateral heading directions and supervise the model to recover the original expert trajectory. We then fine-tune the full decision–draft–reflect rollout with reinforcement learning (RL), assigning a terminal driving reward to the final post-edit trajectory and propagating policy-gradient credit through full-rollout transitions. Full-rollout RL proves crucial for coupling drafting and editing: under supervised training alone, inference-time AutoEdit improves PDMS by at most 0.3, whereas RL increases its gain to 1.9. We also co-design an efficient reflective decoding stack for the decision–draft–reflect pipeline, combining shared-prefix KV reuse, Alternating Step Decode, and fused on-device unmasking. On NAVSIM, ReflectDrive-2 achieves 91.0 PDMS with camera-only input and 94.8 PDMS in a best-of-6 oracle setting, while running at 31.8 ms average latency on NVIDIA Thor.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04647v1</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Huimin Wang, Yue Wang, Bihao Cui, Pengxiang Li, Ben Lu, Mingqian Wang, Tong Wang, Chuan Tang, Teng Zhang, Kun Zhan</dc:creator>
    </item>
    <item>
      <title>From Reach to Insert: Tactile-Augmented Precision Assembly under Sub-Millimeter Tolerances</title>
      <link>https://arxiv.org/abs/2605.04649</link>
      <description>arXiv:2605.04649v1 Announce Type: new 
Abstract: High-precision assembly frequently involves tight-tolerance insertions, where even slight pose errors can cause jamming or excessive interaction forces, making robust and safe insertion policies difficult to obtain. This paper proposes a tactile-augmented two-stage method that combines Imitation Learning (IL) and Reinforcement Learning (RL) for precision insertion tasks. In the first stage, IL learns a reaching policy with position generalization that grasps the peg and brings it to the vicinity of the target region. In the second stage, RL executes the insertion and enables recovery from failures during contact-rich interactions. To better exploit tactile feedback, we introduce tactile group sampling to increase coverage of critical contact segments during training, and design a tactile critic to more accurately evaluate policy values, improving insertion performance while maintaining low contact forces. We conduct systematic experiments across five hole geometries and three clearance settings. Results show that our method substantially improves insertion performance across all settings; under the most challenging 0.05 mm clearance, it achieves a 67% success rate while keeping contact forces low, reducing the maximum interaction force by 60% and torque by 44%, thereby validating both effectiveness and safety for precision assembly.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04649v1</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xinpan Meng, Siyao Huang, JingPu Yang, Muyuan Ma, Zhenghua Ma, Lijun Han, Gao Yuan, Houcheng Li, Long Cheng</dc:creator>
    </item>
    <item>
      <title>FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation</title>
      <link>https://arxiv.org/abs/2605.04651</link>
      <description>arXiv:2605.04651v1 Announce Type: new 
Abstract: Adapting pretrained models typically involves a trade-off between the high training cost of backpropagation and the heavy inference overhead of memory-based or in-context learning. We propose FAAST, a forward-only associative adaptation method that analytically compiles labeled examples into fast weights in a single pass. By eliminating memory or context dependence, FAAST achieves constant-time inference and decouples task adaptation from the pretrained representation. Across image classification and language modeling benchmarks, FAAST matches or exceeds backprop-based adaptation while reducing adaptation time by over 90% and is competitive with memory/context-based adaptation while reducing memory usage by up to 95%. These results demonstrate that FAAST is a highly efficient, scalable solution for supervised task adaptation, particularly for resource-constrained models. We release the code and models at https://github.com/baoguangsheng/faast.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04651v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Guangsheng Bao, Hongbo Zhang, Han Cui, Yanbin Zhao, Yue Zhang</dc:creator>
    </item>
    <item>
      <title>CHE-TKG: Collaborative Historical Evidence and Evolutionary Dynamics Learning for Temporal Knowledge Graph Reasoning</title>
      <link>https://arxiv.org/abs/2605.04652</link>
      <description>arXiv:2605.04652v1 Announce Type: new 
Abstract: Temporal knowledge graph (TKG) reasoning aims to predict future events from historical facts. A key challenge lies in jointly capturing two sources of predictive information in TKGs: historical evidence and evolutionary dynamics. However, existing methods typically focus on only one of these sources, which limits the ability to fully exploit the complementary predictive signals in TKGs. To address this, we propose CHE-TKG, a novel collaborative dual-view learning framework for TKG reasoning. CHE-TKG explicitly separates and jointly models historical evidence and evolutionary dynamics, aiming to learn and exploit their complementary predictive signals. Specifically, CHE-TKG constructs a historical evidence graph to capture long-term structural regularities and stable relational constraints, alongside an evolutionary dynamics graph to model temporal transitions and recent changes, with dedicated encoders for each view. We further employ relation decomposition and a contrastive alignment objective to better capture the predictive signals across the two views. Extensive experiments demonstrate that CHE-TKG achieves state-of-the-art performance on multiple benchmarks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04652v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shuai-long Lei, Xiaobin Zhu, Jiarui Liang, Guoxi Sun, Zhiyu Fang, Xu-Cheng Yin</dc:creator>
    </item>
    <item>
      <title>Threshold-Guided Optimization for Visual Generative Models</title>
      <link>https://arxiv.org/abs/2605.04653</link>
      <description>arXiv:2605.04653v1 Announce Type: new 
Abstract: Aligning large visual generative models with human feedback is often performed through pairwise preference optimization. While such approaches are conceptually simple, they fundamentally rely on annotated pairs, limiting scalability in settings where feedback is collected as independent scalar ratings. In this work, we revisit the KL-regularized alignment objective and show that the optimal policy implicitly compares each sample's reward to an instance-specific baseline that is generally intractable. We propose a threshold-guided alignment framework that replaces this oracle baseline with a data-driven global threshold estimated from empirical score statistics. This formulation turns alignment into a binary decision task on unpaired data, enabling effective optimization directly from scalar feedback. We also incorporate a confidence weighting term to emphasize samples whose scores deviate strongly from the threshold, improving sample efficiency. Experiments across both diffusion and masked generative paradigms, spanning three test sets and five reward models, show that our method consistently improves preference alignment over previous methods. These results position our threshold-guided framework as a simple yet principled alternative for aligning visual generative models without paired comparisons.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04653v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jinbin Bai, Yu Lei, Qingyu Shi, Aosong Feng, Yi Xin, Zhuoran Zhao, Fei Shen, Kaidong Yu, Jason Li</dc:creator>
    </item>
    <item>
      <title>Adaptive MPC for Constrained Trajectory Tracking of Uncertain LTI System with Input-Rate Limits</title>
      <link>https://arxiv.org/abs/2605.04656</link>
      <description>arXiv:2605.04656v1 Announce Type: new 
Abstract: This paper addresses the trajectory-tracking problem for discrete-time linear time-invariant systems with bounded parametric uncertainty, subject to hard constraints on system states, control inputs, and input rates. Unlike existing methods, which often consider only partial uncertainty, omit input-rate or state constraints, or focus on regulation problems, this work provides a systematic adaptive model predictive control (MPC) solution for constrained trajectory tracking under full parametric uncertainty. Determining the control input required to achieve zero tracking error under unknown parameters is challenging. Simultaneously, trajectory tracking under uncertainty with input-rate constraints induces temporal coupling in the control sequence, resulting in a time-varying admissible control set and rendering standard recursive feasibility arguments inapplicable. These challenges are overcome by systematically utilizing the estimated system parameters, coupled with a suitably designed adaptive learning process within a reformulated MPC framework. The recursive feasibility of the proposed MPC optimization routine is then rigorously established despite the time-varying admissible control set induced by input-rate constraints. Closed-loop stability is guaranteed via Lyapunov-based analysis, ensuring convergence of the tracking error and boundedness of system states. Simulation results validate the effectiveness of the pr</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04656v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bishal Dey, Abhishek Dhar, Sumit kr. Pandey, Anindita Sengupta</dc:creator>
    </item>
    <item>
      <title>Logics for Context-free Hyperproperties</title>
      <link>https://arxiv.org/abs/2605.04657</link>
      <description>arXiv:2605.04657v1 Announce Type: new 
Abstract: We introduce a novel logic for the specification of context-free hyperproperties, which capture, e.g., the flow of information in security-critical recursive systems. Intuitively, the logic extends visibly pushdown automata by quantification over traces, just as HyperLTL, the most important logic for regular hyperproperties, extends LTL by quantification over traces. Using a game-based approach, we show that model-checking is decidable for formulas with a single quantifier alternation, provided the stack height of the visibly pushdown automaton depends only on the traces bound to the variables of the first quantifier block. A single quantifier alternation suffices to express many information-flow properties studied in the literature. Complementarily, we show that model-checking is undecidable for formulas with a single quantifier alternation if the stack behavior of the visibly pushdown automaton may depend on the second quantifier block. This also implies that model-checking is undecidable for almost all fragments with more than one quantifier alternation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04657v1</guid>
      <category>cs.LO</category>
      <category>cs.FL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sarah Winter, Martin Zimmermann</dc:creator>
    </item>
    <item>
      <title>A third-order multi-moment cell-centered Lagrangian scheme for hydrodynamics with an accurate 2D nodal solver</title>
      <link>https://arxiv.org/abs/2605.04660</link>
      <description>arXiv:2605.04660v1 Announce Type: new 
Abstract: This paper presents a novel high-order cell-centered Lagrangian scheme for 2D compressible hydrodynamics that bridges the multi-moment constrained finite volume method (MCV) [16, 51, 52] with a nodal Riemann solver. The scheme (denoted LMCV) not only retains the high-order accuracy of MCV but also inherits the conservation and robustness properties of the nodal Riemann solver. On the one hand, MCV employs and evolves both the point values (PV) at cell vertices and the volume-integrated averages (VIA) on the computational mesh, which ensures rigorous numerical conservation and provides an adequate foundation for computing Lagrangian fluxes with high accuracy. On the other hand, we develop a 2D Riemann solver based on EUCCLHYD [24] that takes full advantage of the numerical formulations of the high-order scheme and achieves compatibility between the mesh movement and the numerical fluxes. The main new feature of the solver is a new set of jump and balance conditions. The jump condition provides a highly accurate formulation linking the surface pressure of each cell to its nodal velocity, while the balance condition ensures nodal conservation and stabilizes the velocity field without losing accuracy. Notably, our nodal solver can be regarded as a natural high-order extension of the HLLC and HLLC-2D [41] solvers, and comparing these solvers shows how our approach addresses the difficulties encountered in constructing 2D high-order Lagrangian schemes. A variety of numerical experiments are carried out to illustrate the accuracy and robustness of the algorithm.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04660v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiaoteng Zhang, Xun Wang, Zhijun Shen, Chao Yang</dc:creator>
    </item>
    <item>
      <title>Contact Matrix: Enhancing Dance Motion Synthesis with Precise Interaction Modeling</title>
      <link>https://arxiv.org/abs/2605.04662</link>
      <description>arXiv:2605.04662v1 Announce Type: new 
Abstract: Generating realistic reactive motions, in which one person reacts to the fixed motions of others, is challenging due to strict interaction constraints and a limited feasible solution space. This paper focuses on a typical scenario: duet dance, where high-quality data is scarce, motion patterns are complex, and the details of human interactions are both intricate and abundant. To tackle these challenges, we propose a novel two-stage framework. In the first stage, we introduce a motion VQ-VAE with separate body-part encoders and a joint decoder, enabling specialized codebooks to enhance representation capacity while dynamically modeling dependencies across body parts during decoding, thereby preventing inconsistencies in the generated motions. In the second stage, we propose a contact-aware diffusion model for reactive motion generation that jointly generates motion and a contact matrix between individuals, enabling explicit interaction modeling and providing guidance toward more precise and constrained interaction dynamics during sampling. Experiments show that our method outperforms Duolando with lower $\text{FID}_k$ (8.89 vs. 25.30) and $\text{FID}_{cd}$ (8.01 vs. 9.97), as well as a higher BED (0.4606 vs. 0.2858), indicating improved interaction fidelity and rhythmic synchronization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04662v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xuhai Chen, Zhi Cen, Huaijin Pi, Sida Peng, Xiaowei Zhou, Yong Liu</dc:creator>
    </item>
    <item>
      <title>Evidence-based anomaly detection in clinical domains</title>
      <link>https://arxiv.org/abs/2605.04664</link>
      <description>arXiv:2605.04664v1 Announce Type: new 
Abstract: Anomaly detection methods can be very useful in identifying interesting or concerning events. In this work, we develop and examine new probabilistic anomaly detection methods that let us evaluate management decisions for a specific patient and identify those decisions that are highly unusual with respect to patients with the same or similar condition. The statistics used in this detection are derived from probabilistic models such as Bayesian networks that are learned from a database of past patient cases. We apply our methods to the problem of identifying unusual patient-management decisions in post-surgical cardiac patients.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04664v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Milos Hauskrecht, Michal Valko, Branislav Kveton, Shyam Visweswaran, Gregory Cooper</dc:creator>
    </item>
    <item>
      <title>Paraphrase-Induced Output-Mode Collapse: When LLMs Break Character Under Semantically Equivalent Inputs</title>
      <link>https://arxiv.org/abs/2605.04665</link>
      <description>arXiv:2605.04665v1 Announce Type: new 
Abstract: When a request is rewritten without changing its substantive content, do large language models still answer in the format the original task asked for? We find that they often do not, even at temperature zero. On a 150-query evaluation over five compact 2025-era LLMs and four task types, we observe a systematic failure mode we call prompt-variant output-mode collapse: when a closed-form prompt asks for a bare label or a single choice token, content-preserving prompt variants can push the model into conversational prose, the requested format dissolves, and exact-match evaluation pipelines silently misjudge the result. To make this measurable, we release PARACONSIST, a 900-prompt benchmark of 150 base queries with five lexical, syntactic, and semantic-expansion prompt variants each, and a Semantic Consistency Score that decomposes prompt-variant robustness into answer consistency, sentence-BERT semantic similarity, and length stability. Under a whole-word answer-set match, only ~22% of closed-form variant responses preserve the ground-truth label inside their output, while ~78% drift away from the answer space entirely. In our pool, the dominant predictor of collapse is task structure rather than model identity, with model differentiation jointly carried by answer consistency and length stability. Robustness audits should therefore track response-mode preservation as a first-class reliability target alongside answer accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04665v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Aofan Liu, Jingxiang Meng</dc:creator>
    </item>
    <item>
      <title>Feature importance analysis for patient management decisions</title>
      <link>https://arxiv.org/abs/2605.04666</link>
      <description>arXiv:2605.04666v1 Announce Type: new 
Abstract: The objective of this paper is to understand which characteristics and features of clinical data most influence physicians' decisions about ordering laboratory tests or prescribing medications. We conduct our analysis on data and decisions extracted from the electronic health records of 4486 post-surgical cardiac patients. Summary statistics for 335 different lab-order decisions and 407 medication decisions are reported. We show that in many cases, physicians' lab-order and medication decisions can be well predicted from a small subset of all features.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04666v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.3233/978-1-60750-588-4-861</arxiv:DOI>
      <dc:creator>Michal Valko, Milos Hauskrecht</dc:creator>
    </item>
    <item>
      <title>ITBoost: Information-Theoretic Trust for Robust Boosting</title>
      <link>https://arxiv.org/abs/2605.04671</link>
      <description>arXiv:2605.04671v1 Announce Type: new 
Abstract: Gradient boosting remains a strong and widely used method for tabular data learning, but its performance often degrades when training labels are noisy. This behavior is largely related to the way boosting algorithms emphasize samples with large gradients, without explicitly accounting for whether such errors originate from informative hard cases or from unreliable labels. We address this issue by reconsidering how sample reliability is evaluated during boosting. Instead of relying on instantaneous error, we examine the evolution of each sample's residuals across iterations. Based on this insight, we propose Information-Theoretic Trust Boosting (ITBoost), which uses the Minimum Description Length principle to measure the complexity of residual trajectories. Samples whose residual patterns fluctuate irregularly are treated as less trustworthy and are down-weighted during learning. Theoretically, we derive a tighter generalization bound for ITBoost under label noise. Empirical results on various tabular benchmarks indicate that ITBoost provides improved robustness in noisy environments over leading boosting and deep tabular models, while retaining the best average performance on clean data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04671v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ye Su, Longlong Zhao, Diego Garcia-Gil, Jipeng Guo, Gangchun Zhang, Jinxin Chen, Jinsong Chen</dc:creator>
    </item>
    <item>
      <title>AI-Aided Advancements in Autonomous Underwater Vehicle Navigation</title>
      <link>https://arxiv.org/abs/2605.04672</link>
      <description>arXiv:2605.04672v1 Announce Type: new 
Abstract: Autonomous underwater vehicles (AUVs) have become indispensable for deep-sea exploration, spanning critical scientific research and commercial applications. The rapid attenuation of electromagnetic waves in water renders satellite radio signals unavailable, while the dynamic unpredictability of the marine environment presents formidable navigation challenges. This chapter explores recent advancements in AI-aided AUV positioning, specifically focusing on advanced sensor fusion architectures that integrate inertial navigation systems with Doppler velocity logs and cameras. Beyond traditional model-based filtering, we examine the emergence of AI-driven learning approaches for enhancing inertial dead-reckoning and adaptive fusion algorithms. By reviewing these recent milestones, this chapter provides a comprehensive roadmap for achieving the high-precision navigation essential for autonomous underwater missions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04672v1</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Guy Damari, Zeev Yampolsky, Nadav Cohen, Arup Kumar Sahoo, Jeryes Danial, Felipe O. Silva, Itzik Klein</dc:creator>
    </item>
    <item>
      <title>Physical Adversarial Clothing Evades Visible-Thermal Detectors via Non-Overlapping RGB-T Pattern</title>
      <link>https://arxiv.org/abs/2605.04675</link>
      <description>arXiv:2605.04675v1 Announce Type: new 
Abstract: Visible-thermal (RGB-T) object detection is a crucial technology for applications such as autonomous driving, where multimodal fusion enhances performance in challenging conditions like low light. However, the security of RGB-T detectors, particularly in the physical world, has been largely overlooked. This paper proposes a novel approach to RGB-T physical attacks using adversarial clothing with a non-overlapping RGB-T pattern (NORP). To simulate full-view (0$^{\circ}$--360$^{\circ}$) RGB-T attacks, we construct 3D RGB-T models for humans and adversarial clothing. NORP is a new adversarial pattern design using distinct visible and thermal materials without overlap, avoiding the light reduction in overlapping RGB-T patterns (ORP). To optimize the NORP on adversarial clothing, we propose a spatial discrete-continuous optimization (SDCO) method. We systematically evaluate our method on RGB-T detectors with different fusion architectures, demonstrating high attack success rates in both the digital and physical worlds. Additionally, we introduce a fusion-stage ensemble method that enhances the transferability of adversarial attacks across unseen RGB-T detectors with different fusion architectures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04675v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaopei Zhu, Guanning Zeng, Zhanhao Hu, Jun Zhu, Xiaolin Hu</dc:creator>
    </item>
    <item>
      <title>CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement</title>
      <link>https://arxiv.org/abs/2605.04677</link>
      <description>arXiv:2605.04677v1 Announce Type: new 
Abstract: We present CodeEvolve, an evolutionary framework for improving program performance and code quality with Large Language Models (LLMs). CodeEvolve extends OpenEvolve with runtime-guided target selection, Monte Carlo Tree Search (MCTS), automated code refinement, and language-specific evaluation pipelines for Java and Salesforce Apex. The system uses Java Flight Recorder (JFR) profiles to build weighted component graphs and select optimization targets that account for most of the execution cost, reducing reliance on manual bottleneck identification. For each target, CodeEvolve generates candidate edits, evaluates them through build validation, unit tests, performance checks, static analysis, and LLM-based review, and retains only variants that preserve functional correctness. Across real-world optimization tasks, CodeEvolve improves performance and code metrics while maintaining correctness. On a large enterprise Java codebase, it achieves an average speedup of 15.22$\times$ across seven hotspot functions and outperforms single-pass LLM optimization on five of them. An ablation study on Apex optimization shows that the full MCTS-augmented configuration produces 19.5 valid programs out of 20 on average, indicating that search, filtering, and refinement each contribute to more reliable optimization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04677v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Ajay Krishna Borra, Wenzhuo Yang, Samarth Arora, Akhilesh Deepak Gotmare, Gokulakrishnan Gopalakrishnan, Tharun Gali, Madhav Rathi, Doyen Sahoo, Manpreet Singh, Mayuresh Verma, Laksh Venka, Shuchita Singh</dc:creator>
    </item>
    <item>
      <title>From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models</title>
      <link>https://arxiv.org/abs/2605.04678</link>
      <description>arXiv:2605.04678v1 Announce Type: new 
Abstract: Latent actions serve as an intermediate representation that enables consistent modeling of vision-language-action (VLA) models across heterogeneous datasets. However, approaches to supervising VLAs with latent actions are fragmented and lack a systematic comparison. This work structures the study of latent action supervision from two perspectives: (i) regularizing the trajectory via image-based latent actions, and (ii) unifying the target space with action-based latent actions. Under a unified VLA baseline, we instantiate and compare four representative integration strategies. Our results reveal a formulation-task correspondence: image-based latent actions benefit long-horizon reasoning and scene-level generalization, whereas action-based latent actions excel at complex motor coordination. Furthermore, we find that directly supervising the VLM with discrete latent action tokens is the most effective strategy. Finally, our experiments offer initial insights into the benefits of latent action supervision in mixed-data settings, suggesting a promising direction for VLA training. Code is available at https://github.com/RUCKBReasoning/From_Pixels_to_Tokens.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04678v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yihan Lin, Haoyang Li, Yang Li, Haitao Shen, Yihan Zhao, Chao Shao, Jing Zhang</dc:creator>
    </item>
    <item>
      <title>Ultra Low-Power SDM-based Circuit-Switching for Networks-on-Chip</title>
      <link>https://arxiv.org/abs/2605.04679</link>
      <description>arXiv:2605.04679v1 Announce Type: new 
Abstract: In many modern AI chips and multicore systems-on-chip, embedded applications exhibit predictable inter-core traffic behavior that can be characterized at design time. For such applications, a variety of design-time traffic management and network optimization techniques can be employed to improve NoC power and performance. To exploit this predictability, we propose a novel low-power circuit-switched NoC design. It uses the Spatial Division Multiplexing (SDM) technique to establish circuits, implemented as subsets of NoC wires, for the communication flows of a target application. To further reduce the power profile of SDM, the design incorporates a new router architecture that combines hard-wired switches with conventional programmable crossbars. The architecture is complemented by an algorithm that maps application tasks onto a mesh NoC and assigns an SDM route with adequate bit-width to each circuit built for inter-task communication flows. Compared with a conventional packet-switched NoC, the proposed approach achieves approximately 38% lower NoC power consumption, 19% smaller area, and 12% lower packet latency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04679v1</guid>
      <category>cs.AR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Meysam Zaeemi, Mehdi Modarressi</dc:creator>
    </item>
    <item>
      <title>Multi-Level Bidirectional Biomimetic Learning for EEG-Based Visual Decoding</title>
      <link>https://arxiv.org/abs/2605.04680</link>
      <description>arXiv:2605.04680v1 Announce Type: new 
Abstract: EEG-based visual neural decoding aims to align neural responses with visual stimuli for tasks such as image retrieval. However, limited paired data and a fundamental mismatch between high-fidelity digital images and biological visual perception, which is distorted by retinotopic mapping and subject-specific neuroanatomy, severely impede cross-modal alignment. To address this, we propose MB2L, a Multi-Level Bidirectional Biomimetic Learning framework that incorporates structured physiological inductive biases into representation learning. Specifically, we propose Adaptive Blur with Visual Priors to mitigate perceptual-structural mismatch by reweighting visual inputs according to retinotopic priors. We further propose Biomimetic Visual Feature Extraction to learn multi-level visual representations consistent with hierarchical cortical processing, enhancing subject-invariant encoding. These modules are jointly optimized via Multi-level Bidirectional Contrastive Learning, which aligns EEG and visual features in a shared semantic space through bidirectional contrastive objectives. Experiments show that MB2L achieves 80.5% Top-1 and 97.6% Top-5 accuracy on zero-shot EEG-to-image retrieval, significantly outperforming prior methods and demonstrating strong generalization across subjects and experimental settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04680v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jingtao Liu, Peiliang Gong, Chuhang Zheng, Yiheng Liu, Qi Zhu</dc:creator>
    </item>
    <item>
      <title>HEXST: Hexagonal Shifted-Window Transformer for Spatial Transcriptomics Gene Expression Prediction</title>
      <link>https://arxiv.org/abs/2605.04682</link>
      <description>arXiv:2605.04682v1 Announce Type: new 
Abstract: Spatial transcriptomics offers spatially resolved gene expression profiling within tissue sections, but its cost and limited throughput hinder large-scale deployment. To extend this capability to routine practice, recent computational methods aim to infer spatial gene expression directly from ubiquitous hematoxylin and eosin-stained histology slides. However, most existing models assume Cartesian or geometry-agnostic locality, despite the hexagonal sampling of widely used spot-array platforms, and point-wise regression objectives often yield over-smoothed gene expression profiles, obscuring gene-specific spatial heterogeneity. To address these limitations, we propose HEXST, a geometry-aligned Transformer for spatial gene expression prediction from histology. HEXST operates directly on hexagonal spot coordinates to enable efficient local-to-global contextual modeling via a tailored shifted-window attention mechanism and hexagonal rotary positional encoding. To enhance gene-wise spatial contrast, HEXST complements point-wise regression with a contrast-sensitive differential objective and transcriptomic priors from a pretrained single-cell foundation model during training. Across seven spatial transcriptomics datasets, HEXST consistently outperforms state-of-the-art models, providing accurate and robust spatial gene expression predictions while preserving gene-wise contrast and spatial heterogeneity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04682v1</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Keunho Byeon, Jin Tae Kwak</dc:creator>
    </item>
    <item>
      <title>Average Attention Transformers and Arithmetic Circuits</title>
      <link>https://arxiv.org/abs/2605.04683</link>
      <description>arXiv:2605.04683v1 Announce Type: new 
Abstract: We analyse the computational power of transformer encoders as sequence-to-sequence functions on vectors. We show that average hard attention can be used to simulate arithmetic circuits if they are given as an input to an encoder. The circuit families that can be simulated this way have constant depth while using unbounded addition, binary multiplication and sign gates. The transformers we use have arithmetic circuits instead of feed-forward networks. With typical average attention, the functions they compute are also computed by the same class of circuit families. Our results hold for transformers over the reals, rationals and any ring in between the two.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04683v1</guid>
      <category>cs.CC</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lena Ehrmuth, Laura Strieker</dc:creator>
    </item>
    <item>
      <title>Hamiltonian Interface Dynamics for Reduced-Order Optimization of Incompressible Mixing</title>
      <link>https://arxiv.org/abs/2605.04688</link>
      <description>arXiv:2605.04688v1 Announce Type: new 
Abstract: We develop a reduced-order framework for optimizing mixing in two-dimensional incompressible flows. Instead of optimizing the full transport PDE, the method maximizes the length of advected material interfaces, leading to a finite-dimensional Hamiltonian control problem based on parametrized stream functions. We derive the continuous adjoint equations and reduced gradients, and discretize the forward and adjoint dynamics with the implicit midpoint rule. The resulting discrete adjoint is algebraically consistent with the derivative of the fully discrete objective, up to the tolerance of the nonlinear midpoint solves. The approach applies to bounded two-dimensional domains with smooth finite-dimensional stream-function parametrizations. Numerical experiments on cellular-flow and Doswell frontogenesis benchmarks show that the optimized time-dependent Hamiltonians generate near-exponential interface stretching and substantially faster decay of the $\dot{H}^{-1}$ mix-norm, in contrast with the polynomial behavior observed for stationary flows. When evaluated on a common reference transport solver, the interface-based controls produce faster $\dot{H}^{-1}$ decay than an Eulerian Sobolev-norm optimizer under a matched setup, while substantially reducing computational cost. We also identify a limitation of the reduced model: increasing the control basis may further improve the interface-length objective without yielding proportional gains in $\dot{H}^{-1}$ mixing, confirming that interface length is an effective but not fully faithful proxy for mixing in geometrically complex regimes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04688v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ziqian Li, Enrique Zuazua</dc:creator>
    </item>
    <item>
      <title>Learning Time-Inhomogeneous Markov Dynamics in Financial Time Series via Neural Parameterization</title>
      <link>https://arxiv.org/abs/2605.04690</link>
      <description>arXiv:2605.04690v1 Announce Type: new 
Abstract: Modeling the dynamics of non-stationary stochastic systems requires balancing the representational power of deep learning with the mathematical transparency of classical models. While classical Markov transition operators provide explicit, theoretically grounded rules for system evolution, their empirical estimation collapses due to severe data sparsity when applied to high-resolution, high-noise environments. We explore this statistical barrier using financial time series as a canonical, real-world testbed. To overcome the degeneracy of empirical counting, we introduce a framework that utilizes neural networks strictly as parameterization engines to generate explicit, time-varying Markov transition matrices. By constraining the neural network to output its predictions as a formal stochastic operator, we maintain complete structural interpretability. We demonstrate that these learned operators successfully capture complex regime shifts: the state-conditioned model achieves mean row heterogeneity $\bar{\rho} = 0.0073$ while the state-free ablation collapses to exactly zero, and operator row entropy correlates with realized variance at $r = -0.62$ ($p \approx 10^{-251}$), revealing that high-volatility regimes homogenize transition dynamics rather than diversify them. Furthermore, rather than enforcing the Chapman-Kolmogorov equations as a rigid structural requirement, we repurpose them as a localized diagnostic tool to pinpoint specific temporal windows where first-order memory assumptions break down. Ultimately, this framework demonstrates how neural networks can be constrained to make rigorous, classical operator analysis viable for complex real-world time series.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04690v1</guid>
      <category>cs.LG</category>
      <category>q-fin.MF</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jan Rovirosa, Jesse Schmolze</dc:creator>
    </item>
    <item>
      <title>Towards Lag Consensus with Noisy Digital Twins Perception in Second-order Multi-agent Cyber-physical Systems</title>
      <link>https://arxiv.org/abs/2605.04692</link>
      <description>arXiv:2605.04692v1 Announce Type: new 
Abstract: In this paper, we study second-order lag consensus in multi-agent cyber-physical networks subject to random noise and input failures, within a framework modeling the interactions and perceptions between physical twins and digital twins. We propose a lag consensus protocol and establish sufficient conditions for the mean-square (exponential) stability of the resulting stochastic lag error dynamics. The consensus criteria are derived via Lyapunov analysis using the Itô formula, ensuring robustness to random perturbations and intermittent input failures. Numerical examples illustrate the effectiveness of the proposed method.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04692v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>nlin.AO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhicheng Zhang, Fausto Lizzio, Zhongjun Ma, Masaaki Nagahara</dc:creator>
    </item>
    <item>
      <title>Gray-Box Poisoning of Continuous Malware Ingestion Pipelines</title>
      <link>https://arxiv.org/abs/2605.04698</link>
      <description>arXiv:2605.04698v1 Announce Type: new 
Abstract: Modern malware detection pipelines rely on continuous data ingestion and machine learning to counter the high volume of novel threats. This work investigates a realistic gray-box poisoning threat model targeting these pipelines. Using the secml_malware framework, we generate problem-space adversarial binaries through functionality-preserving manipulations, specifically Import Address Table (IAT) and section injections. We evaluate the impact of these poisoned samples when ingested into a defender's training set for a LightGBM malware detection model. Our empirical results demonstrate that subtle IAT-based perturbations enable compact poisoning samples that significantly degrade detection recall. These findings illustrate the inherent challenge of developing low-visibility adversarial perturbations that maintain high poisoning efficacy within continuous learning systems. We further evaluate a defense mechanism based on a homogeneous ensemble, which successfully identifies and filters up to 95.6% of poisoning attempts while maintaining a high retention rate for legitimate data. These results emphasize the necessity of robust pre-ingestion validation in production pipelines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04698v1</guid>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jan Dolejš, Martin Jureček, Róbert Lórencz</dc:creator>
    </item>
    <item>
      <title>A Separation Between Optimal Demand-Oblivious and Demand-Aware Network Throughput</title>
      <link>https://arxiv.org/abs/2605.04699</link>
      <description>arXiv:2605.04699v1 Announce Type: new 
Abstract: The performance of distributed applications often depends critically on the interconnecting network, or more specifically on its throughput: how fast data can be carried across the network. In recent years, great progress has been made in understanding demand-oblivious throughput: how fast a given demand matrix describing pairwise communication requirements can be served on a given network. However, surprisingly little is known today about the achievable demand-aware throughput: the throughput on a network topology that can be optimized toward the demand. Such demand-aware networks have recently gained popularity in datacenters and are enabled by emerging reconfigurable optical technologies.
  In this paper, we are interested both in the achievable demand-aware throughput bounds and in the computational complexity of finding a throughput-optimizing network topology. We take a systematic approach and investigate four variants of demand-aware throughput. We analyze, and derive bounds for, two definitions of throughput: the classic throughput usually considered in the literature, and a new generalized definition that we call weak throughput. For each of them, we consider two routing models: a direct one, where demand can only be served over a single hop, and a general one, where multi-hop routing is allowed.
  Our main result is a separation result that solves an open problem in the literature about the classic throughput definition, showing that demand-aware topologies can outperform demand-oblivious topologies even in the worst case: the demand-aware throughput is asymptotically at least 5/8, whereas the demand-oblivious throughput is known to be $n/(2n-1)$, which is roughly 1/2. In terms of computational complexity, we show that computing the demand-aware weak throughput is NP-hard, but computing the demand-aware (weak) direct throughput is polynomial-time solvable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04699v1</guid>
      <category>cs.NI</category>
      <category>cs.DM</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Matthias Bentert, Chen Avin, Stefan Schmid</dc:creator>
    </item>
    <item>
      <title>Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization</title>
      <link>https://arxiv.org/abs/2605.04700</link>
      <description>arXiv:2605.04700v1 Announce Type: new 
Abstract: Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the entire waveform densely throughout optimization. In this work, we investigate the necessity of such dense optimization by analyzing the structure of token-aligned gradients in ALMs. We find that gradient energy is highly non-uniform across audio tokens, indicating that only a small subset of token-aligned audio regions dominates the optimization signal. Motivated by this observation, we propose Token-Aware Gradient Optimization (TAGO), which enables sparse jailbreak optimization by retaining only waveform gradients aligned with audio tokens that have high gradient energy, while masking the remaining gradients at each iteration. Across three ALMs, TAGO outperforms baselines, and substantial sparsification preserves strong attack success rates (e.g. on Qwen3-Omni, $\mathrm{ASR}_{l}$ remains at 86% with a token retention ratio of 0.25, compared to 87% with full token retention). These results demonstrate that dense waveform updates are largely redundant, and we advocate that future audio jailbreak and safety alignment research should further leverage this heterogeneous token-level gradient structure.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04700v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <category>cs.SD</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zheng Fang, Xiaosen Wang, Shenyi Zhang, Shaokang Wang, Zhijin Ge</dc:creator>
    </item>
    <item>
      <title>When Graph Traversal Meets Structured Preferences: Unified Framework and Complexity Results</title>
      <link>https://arxiv.org/abs/2605.04701</link>
      <description>arXiv:2605.04701v1 Announce Type: new 
Abstract: Preference restrictions have played a significant role in computational social choice. This paper studies a framework that connects preference restrictions with classical graph search paradigms. We model candidates as vertices of a graph and interpret the preference ordering of each voter as the outcome of traversing the graph according to a graph search. We focus on six fundamental paradigms: breadth-first search (BFS), depth-first search (DFS), lexicographic breadth-first search (LexBFS), lexicographic depth-first search (LexDFS), maximum cardinality search (MCS), and maximal neighborhood search (MNS).
  Within this framework, we study the problem of determining whether a given preference profile admits a graph support subject to structural restrictions, that is, whether there exists a graph such that each preference ordering can be generated by traversing the graph under the chosen paradigm. For all considered paradigms, we show that this problem is NP-hard when the graph support is required to have at most $k$ edges, where $k$ is a given integer. We further extend these hardness results to the case where the graph support is required to have maximum degree $k$. For DFS, we prove that deciding whether a preference profile admits a tree support is solvable in polynomial time. Moreover, existing results imply polynomial-time solvability of the problem for all remaining graph traversals except BFS and LexBFS, for which the complexity remains open.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04701v1</guid>
      <category>cs.GT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Guozhen Rong, Xin Li, Yongjie Yang</dc:creator>
    </item>
    <item>
      <title>FaithfulFaces: Pose-Faithful Facial Identity Preservation for Text-to-Video Generation</title>
      <link>https://arxiv.org/abs/2605.04702</link>
      <description>arXiv:2605.04702v1 Announce Type: new 
Abstract: Identity-preserving text-to-video generation (IPT2V) enables users to produce diverse and imaginative videos with consistent human facial identity. Despite recent progress, existing methods often suffer from significant identity distortion under large facial pose variations or facial occlusions. In this paper, we propose \textit{FaithfulFaces}, a pose-faithful facial identity preservation learning framework that improves IPT2V in complex dynamic scenes. The key component of FaithfulFaces is a pose-shared identity aligner that refines and aligns facial poses across distinct views via a pose-shared dictionary and a pose variation-identity invariance constraint. By mapping single-view inputs into a global facial pose representation with explicit Euler angle embeddings, FaithfulFaces provides a pose-faithful facial prior that guides generative foundation models toward robust identity-preserving generation. In addition, we develop a specialized pipeline to curate a high-quality video dataset featuring substantial facial pose diversity. Extensive experiments demonstrate that FaithfulFaces achieves state-of-the-art performance, maintaining superior identity consistency and structural clarity even under pose changes and occlusions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04702v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuanzhi Wang, Xuhua Ren, Jiaxiang Cheng, Bing Ma, Kai Yu, Sen Liang, Wenyue Li, Tianxiang Zheng, Qinglin Lu, Zhen Cui</dc:creator>
    </item>
    <item>
      <title>Entropy and Distributed Source Coding of Connected Soft Random Geometric Graphs</title>
      <link>https://arxiv.org/abs/2605.04703</link>
      <description>arXiv:2605.04703v1 Announce Type: new 
Abstract: We consider the distributed compression of Soft Random Geometric Graphs (SRGGs) above the connectivity threshold. We establish the Slepian-Wolf rate region for the SRGG in the setting where there are a finite number of encoders compressing sections of the graph independently. To do so, we prove novel limit theorems and asymptotic equipartition properties for the SRGG and its entropy, which allow us to use random binning techniques for distributed compression.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04703v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <category>math.PR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Oliver Baker, Carl P. Dettmann</dc:creator>
    </item>
    <item>
      <title>UVMarvel: an Automated LLM-aided UVM Machine for Subsystem-level RTL Verification</title>
      <link>https://arxiv.org/abs/2605.04704</link>
      <description>arXiv:2605.04704v1 Announce Type: new 
Abstract: Verification presents a major bottleneck in Integrated Circuit (IC) development, consuming nearly 70% of total effort. While the Universal Verification Methodology (UVM) improves reuse through structured verification environments, constructing subsystem-level UVM testbenches and generating high-quality stimuli still require extensive manual coding, repeated EDA tool runs, and deep protocol and micro-architectural expertise. We present UVMarvel, an automated verification framework that leverages Large Language Models (LLMs) to build UVM testbenches for subsystem-level RTL. UVMarvel introduces an Intermediate Representation (IR) and a Bus Protocol Library to translate heterogeneous specifications into protocol-correct subsystem-level UVM testbenches, and employs a Signal Tracker and a Verilog Patching Library to guide LLM-based stimuli refinement. UVMarvel is the first framework capable of automatically constructing subsystem-level UVM testbenches across mainstream bus protocols, and it achieves an average code coverage of 95.65%, while reducing verification time from several human working days to a 4.5-hour automated execution.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04704v1</guid>
      <category>cs.AR</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Junhao Ye, Dingrong Pan, Hanyuan Liu, Yuchen Hu, Jie Zhou, Ke Xu, Xinwei Fang, Xi Wang, Nan Guan, Zhe Jiang</dc:creator>
    </item>
    <item>
      <title>Vol-Mark: A Watermark for 3D Medical Volume Data Via Cubic Difference Expansion and Contrastive Learning</title>
      <link>https://arxiv.org/abs/2605.04705</link>
      <description>arXiv:2605.04705v1 Announce Type: new 
Abstract: Modern medical technology makes extensive use of 3D volume data for accurate and efficient diagnostics. However, sharing these data across networks in telemedicine poses significant security risks, including data tampering and unauthorized copying. To address these challenges, this paper proposes a novel reversible-zero watermarking approach, termed Vol-Mark, for medical volume data to protect their ownership and authenticity in telemedicine. The proposed Vol-Mark method offers two key benefits: 1) it designs a volume data feature extractor that leverages contrastive learning to efficiently extract discriminative and stable volumetric features, ensuring robustness against 3D attacks; 2) it introduces the cubic difference expansion (c-DE) technique, which leverages the 3D integer wavelet transform to embed watermark bits into neighboring voxels within cubes at low-frequency coefficients. The voxel differences within each cube are expanded to create embedding space, and a majority voting mechanism is employed during extraction to enhance reliability. The embedding process incurs low distortion and supports lossless removal, thereby preserving the integrity and diagnostic accuracy of medical volume data. Through these two benefits, Vol-Mark enables both integrity verification and ownership verification. Integrity verification is performed first, and ownership verification through hypothesis testing is then conducted to enhance reliability, particularly under data tampering or watermark removal attacks. Comprehensive experimental results show the effectiveness of the proposed method and its superior robustness against conventional, geometric, and hybrid attacks on medical volume data. In particular, across multiple evaluation tasks, Vol-Mark consistently achieves an ACC above 0.90 in most attack scenarios, outperforming existing methods by a clear margin.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04705v1</guid>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiangnan Zhu, Yuntao Wang, Shengli Pan, Yujie Gu</dc:creator>
    </item>
    <item>
      <title>Differentiable Chemistry in PINNs for Solving Parameterized and Stiff Reaction Systems</title>
      <link>https://arxiv.org/abs/2605.04708</link>
      <description>arXiv:2605.04708v1 Announce Type: new 
Abstract: From neural ODEs to continuous-time machine learning, differentiable solvers allow physics, optimization, and simulation to become trainable components within deep learning systems. This has opened the path to a new generation of deep learning frameworks for scientific computing, with many promising applications still emerging. In this paper, we integrate a differentiable chemistry solver into a modified physics-informed neural network to solve parameterized reaction systems that are inherently stiff. The proposed framework introduces several key components required to overcome limitations of standard physics-informed neural networks. These include a differentiable chemistry solver, a network architecture for parameterized solutions, and residual weighting tailored to stiff reactions. We evaluate the framework on a set of differential equations related to hydrogen combustion, which include initial/boundary value problems, inverse parameter identification, and a parameterized partial differential equation. Our results highlight the ability of the proposed approach to extend physics-informed neural networks to stiff chemical systems that were previously inaccessible.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04708v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Miloš Babić, Franz M. Rohrhofer, Stefan Posch</dc:creator>
    </item>
    <item>
      <title>ELVIS: Ensemble-Calibrated Latent Imagination for Long-Horizon Visual MPC</title>
      <link>https://arxiv.org/abs/2605.04709</link>
      <description>arXiv:2605.04709v1 Announce Type: new 
Abstract: A central challenge of visual control with model-based reinforcement learning (RL) is reliable long-horizon planning: long rollouts with learned latent dynamics exhibit branching futures and multi-modal action-value distributions. In addition, compounding model errors amplified by visual occlusions make deep imagination brittle. We present ELVIS, a latent model predictive controller (MPC) designed to make long-horizon planning practical. ELVIS plans in a Dreamer-style recurrent state space model (RSSM) and replaces standard unimodal model predictive path integral (MPPI) with a Gaussian-mixture MPPI that maintains multiple coherent hypotheses over long horizons, avoiding mode averaging under branching rollouts. In parallel, ELVIS stabilizes deep imagination with a shared uncertainty-aware lambda-return: an ensemble of latent critics defines an upper-confidence-bound (UCB) score that gates a time-varying lambda, adaptively trading off bootstrapping versus look-ahead to limit compounding error during planning. The same return is used both to train an actor-critic prior from imagined rollouts and to score candidate trajectories inside GMM-MPPI, aligning RL objectives with the planner's long-horizon optimization. On fourteen DeepMind Control Suite visual tasks, ELVIS establishes state-of-the-art performance compared with TD-MPC2 and DreamerV3. Finally, ELVIS transfers zero-shot to a real-world sand-spraying task with severe occlusions, improving surface-quality metrics and demonstrating robustness beyond simulation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04709v1</guid>
      <category>cs.LG</category>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yurui Du, Pinhao Song, Yutong Hu, Renaud Detry</dc:creator>
    </item>
    <item>
      <title>Budget-aware Auto Optimizer Configurator</title>
      <link>https://arxiv.org/abs/2605.04711</link>
      <description>arXiv:2605.04711v1 Announce Type: new 
Abstract: Optimizer states consume a large amount of GPU memory in large-scale model training. However, gradients in different network blocks exhibit distinct behaviors, such as varying directional stability and scale anisotropy, implying that expensive optimizer states are not universally necessary and that using a single global optimizer is often memory-inefficient. We propose the Budget-Aware Optimizer Configurator (BAOC) to reduce memory cost by assigning suitable optimizer configurations to individual blocks under given budgets. Specifically, BAOC samples gradient streams to derive statistical metrics that quantify the potential performance risk of applying cheaper configurations (e.g., low precision or removing momentum). It then solves a constrained allocation problem to minimize total risk under memory and time budgets, selecting a budget-feasible configuration for each block. Experiments across vision, language, and diffusion workloads demonstrate that BAOC maintains training quality while significantly reducing the memory usage of optimizer states. The code is available at https://anonymous.4open.science/r/BAOC-45C6.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04711v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kang Liu, Wei Peng, Jianchen Hu</dc:creator>
    </item>
    <item>
      <title>SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2605.04712</link>
      <description>arXiv:2605.04712v1 Announce Type: new 
Abstract: In deep reinforcement learning (DRL), an agent is trained from a stream of experience. In a continual learning setting, such agents can suffer from plasticity loss: their ability to learn new skills from new experiences diminishes over training. Recently, Mixture-of-Experts (MoE) networks have been reported to enable scaling laws and facilitate the learning of diverse skills. However, in continual reinforcement learning settings, their performance can degenerate as learning proceeds, indicating a loss of plasticity. To address this, building on Neural Tangent Kernel (NTK) theory, we formalize plasticity loss in MoE policies as a loss of spectral plasticity. We then derive a tractable proxy for spectral plasticity that is expressible in terms of individual expert feature matrices. Leveraging this proxy, we introduce SPHERE, a practical Parseval penalty tailored for MoE-based policies that alleviates the loss of spectral plasticity. On MetaWorld and HumanoidBench, SPHERE improves average success under continual RL by 133% and 50%, respectively, over an unregularized MoE baseline, while maintaining higher spectral plasticity throughout training.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04712v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lirui Luo, Guoxi Zhang, Hongming Xu, Cong Fang, Qing Li</dc:creator>
    </item>
    <item>
      <title>Not Every Subject Should Stay: Machine Unlearning for Noisy Engagement Recognition</title>
      <link>https://arxiv.org/abs/2605.04713</link>
      <description>arXiv:2605.04713v1 Announce Type: new 
Abstract: Engagement recognition datasets are typically subject-indexed and often contain noisy, subjective supervision, making post-hoc dataset revision a practical problem. Existing noisy-label and data-cleaning methods largely operate at the sample level before or during training, but do not directly address a different question: once a model has already been trained, can the influence of an entire problematic subject be removed without full retraining? We study this setting through subject-level machine unlearning as a post-hoc sanitization mechanism for engagement recognition. Starting from a baseline trained on all subjects, we rank candidate harmful subjects using a model-dependent proxy, apply a lightweight approximate unlearning update, and compare the result against an oracle model retrained from scratch on the retained subjects only. We instantiate this protocol on DAiSEE and EngageNet using Tensor-Convolution and Convolution-Transformer Network (TCCT-Net) as a fixed platform and evaluate three matched model states under the same removal scenario: baseline, unlearned, and oracle. In representative K=3 forget-set settings, the unlearned model recovers 89.3% and 92.5% of the oracle gain on EngageNet and DAiSEE, respectively, at roughly one quarter of the retraining cost. Across the tested small-audit regimes, effectiveness is strongest at an intermediate forget-set size, indicating that approximate subject-level unlearning is a useful low-cost correction mechanism, but one whose benefit depends on subject selection quality and removal regime.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04713v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alexander Vedernikov</dc:creator>
    </item>
    <item>
      <title>On the Complexity of Minimum Riesz s-Energy Subset Selection in Euclidean and Ultrametric Spaces</title>
      <link>https://arxiv.org/abs/2605.04715</link>
      <description>arXiv:2605.04715v1 Announce Type: new 
Abstract: We study the computational complexity of exact cardinality-constrained minimum Riesz $s$-energy subset selection in finite metric spaces: given $n$ points, select $k&lt;n$ points of minimum Riesz $s$-energy. The objective sums inverse-power pair interactions and therefore promotes well-separated subsets; as $s$ becomes large, it increasingly approaches a bottleneck criterion governed by the closest selected pair, linking it to minimum pairwise distance (MPD). Building on the general-metric NP-hardness result of Pereverdieva et al. (2025), we prove that NP-hardness persists for point sets in the Euclidean plane when $s$ is part of the input. In contrast, finite ultrametric spaces form an exact tractable regime: on rooted binary ultrametric trees with $n$ leaves, an optimal size-$k$ subset can be computed by dynamic programming in $O(nk^2)$ time. We also discuss the ordered one-dimensional Euclidean case, where the classical MPD objective admits simple dynamic programming, but the additive Riesz energy does not appear to allow the same state compression. Finally, we explain why one natural route to fixed-$s$ Euclidean hardness does not close: Fowler-style 3SAT gadgets, together with zeta-function bounds for far-field interactions, show why this approach still requires an exponent depending on $k$. Together, these results provide a compact complexity landscape for a natural diversity or dispersion objective, distinguishing Euclidean hardness, ultrametric tractability, and the ordered one-dimensional case.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04715v1</guid>
      <category>cs.CG</category>
      <category>cs.CC</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Michael T. M. Emmerich, Ksenia Pereverdieva, André Deutz</dc:creator>
    </item>
    <item>
      <title>On Minimum CADs for Algebraic Sets in Dimension Three</title>
      <link>https://arxiv.org/abs/2605.04718</link>
      <description>arXiv:2605.04718v1 Announce Type: new 
Abstract: Cylindrical Algebraic Decomposition (CAD) algorithms typically produce a decomposition adapted to a finite family of semi-algebraic sets $\mathcal{F}$ (i.e., every member of $\mathcal{F}$ is a union of cells). Different algorithms may produce different outputs and may introduce unnecessary cell divisions. Recent work by Michel, Mathonet, and Zénaïdi at ISSAC 2024 formalised this issue by studying the refinement order on the set of all CADs adapted to $\mathcal{F}$ and analysing the existence of a minimum (coarsest) adapted CAD. It was shown that such a minimum adapted CAD always exists for subsets of $\mathbb{R}$ and $\mathbb{R}^2$, but not, in general, for subsets of $\mathbb{R}^n$ ($n \geqslant 3$).
  It is therefore natural to seek classes of subsets of $\mathbb{R}^n$ that admit a minimum adapted CAD. In this paper, we identify a class of subsets of $\mathbb{R}^3$, containing all algebraic sets, for which minimum adapted CADs exist. This provides the first positive existence theorem for minimum CADs for a non-trivial class of sets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04718v1</guid>
      <category>cs.SC</category>
      <category>math.AG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lucas Michel</dc:creator>
    </item>
    <item>
      <title>Every Step Counts: Step-Level Credit Assignment for Tool-Integrated Text-to-SQL</title>
      <link>https://arxiv.org/abs/2605.04719</link>
      <description>arXiv:2605.04719v1 Announce Type: new 
Abstract: Tool-integrated Text-to-SQL parsing has emerged as a promising paradigm, framing SQL generation as a sequential decision-making process interleaved with tool execution. However, existing reinforcement learning approaches mainly rely on coarse-grained outcome supervision, resulting in a fundamental credit assignment problem: models receive the same reward for any trajectory that yields the correct answer, even when intermediate steps are redundant, inefficient, or erroneous. Consequently, models are encouraged to explore suboptimal reasoning spaces, limiting both efficiency and generalization. To address this problem, we propose FineStep, a novel framework for step-level credit assignment in tool-augmented Text-to-SQL. First, we introduce a reward design with independent process rewards to alleviate the signal sparsity of outcome supervision. Next, we present a step-level credit assignment mechanism to precisely quantify the value of each reasoning step. Finally, we develop a policy optimization method based on step-level advantages for efficient updates. Extensive experiments on BIRD benchmarks show that FineStep achieves state-of-the-art performance and reduces redundant tool interactions, with a 3.25% average EX gain over GRPO at the 4B scale.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04719v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yaxun Dai, Baolin Sun, Junying Wang, Pengfei Wang, Yingqi Gao, Xuemei Dong, Mengdie Chu, Xiang Qi, Pingfu Chao</dc:creator>
    </item>
    <item>
      <title>A Framework of Secure Source Coding using Mutual Information Security Criterion: Universal Coding, Strong Converse Theorem</title>
      <link>https://arxiv.org/abs/2605.04720</link>
      <description>arXiv:2605.04720v1 Announce Type: new 
Abstract: In this paper, we propose a framework of source encryption in which cryptographic processing is applied to a prescribed fixed-length source code. The proposed source encryption framework is based on the secure communication framework of the Shannon cipher system. In the proposed framework, we use the mutual information as a measure of information leakage to an adversary. For the proposed framework, we explicitly establish the necessary and sufficient condition for reliable and secure communication under the condition that the error probability and the information leakage are upper bounded by prescribed constants $\epsilon\in (0,1)$ and $\delta \in (0,\infty)$, respectively. We also show that the obtained necessary and sufficient condition does not depend on the constants $\epsilon\in (0,1)$ and $\delta\in (0,\infty)$, demonstrating that the strong converse theorem holds for the proposed framework of source encryption. We further prove the existence of encryption/decryption schemes that are universal in the sense that they work effectively for any distribution of the plaintext and of the key used for the encryption.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04720v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yasutada Oohama, Bagus Santoso</dc:creator>
    </item>
    <item>
      <title>Exact Dual Geometry of SOC-ICNN Value Functions</title>
      <link>https://arxiv.org/abs/2605.04722</link>
      <description>arXiv:2605.04722v1 Announce Type: new 
Abstract: Input Convex Neural Networks (ICNNs) are commonly used in a two-stage manner: one first trains a convex network and then minimizes it over its input in a downstream inference problem. Recent second-order-cone ICNNs (SOC-ICNNs) enrich ReLU-based ICNNs with quadratic and conic modules and admit an exact representation as value functions of second-order cone programs (SOCPs). This value-function structure enables an explicit convex-analytic treatment of SOC-ICNN inference. In this paper, we study the exact first-order and local second-order geometry of SOC-ICNNs from the dual viewpoint. We show that supporting slopes, subdifferentials, directional derivatives, and local Hessians can be recovered directly from optimal dual variables. These results provide the geometric primitives for white-box SOC-ICNN inference, going beyond black-box automatic differentiation. Numerical experiments validate the exact multiplier readout, the local Hessian formula, and the set-valued behavior at structurally degenerate inputs. We also provide a step-by-step tutorial showing how the readout mechanism instantiates a complete white-box inference loop. The code is available at https://anonymous.4open.science/r/SOC-ICNN-Theory-BEFC/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04722v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kang Liu, Jianchen Hu, Wei Peng</dc:creator>
    </item>
    <item>
      <title>Rethinking Convolutional Networks for Attribute-Aware Sequential Recommendation</title>
      <link>https://arxiv.org/abs/2605.04723</link>
      <description>arXiv:2605.04723v1 Announce Type: new 
Abstract: Attribute-aware sequential recommendation entails predicting the next item a user will interact with based on a chronologically ordered history of past interactions, enriched with item attributes. Existing methods typically leverage self-attention mechanisms to aggregate the entire sequence into a unified representation used for next-item prediction. While effective, these models often suffer from high computational complexity and memory consumption, limiting their ability to process long user histories. This constraint restricts the model's capacity to fully capture long-term user preferences. In some scenarios, modeling item interactions purely through attention may also not be the most effective way to extract sequential patterns. In this work, we propose ConvRec, an alternative method with linear computational and memory complexity that employs convolutional layers in a hierarchical, down-scaled fashion to generate compact yet expressive sequence representations. To further enhance the model's ability to capture diverse sequential patterns, each layer gradually aggregates neighboring items to reach a comprehensive sequence representation. Extensive experiments on four real-world datasets demonstrate that our approach outperforms state-of-the-art sequential recommendation models, highlighting the potential of convolution-based architectures for efficient and effective sequence modeling in recommendation systems. Our implementation code and datasets are available at https://github.com/ismll-research/ConvRec.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04723v1</guid>
      <category>cs.IR</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shereen Elsayed, Ngoc Son Le, Ahmed Rashed, Lars Schmidt-Thieme</dc:creator>
    </item>
    <item>
      <title>From Beats to Breaches: How Offensive AI Infers Sensitive User Information from Playlists</title>
      <link>https://arxiv.org/abs/2605.04724</link>
      <description>arXiv:2605.04724v1 Announce Type: new 
Abstract: The pervasive integration of AI has enabled Offensive AI: the exploitation of AI for malicious ends across the cyber-kill chain. A critical manifestation is the user attribute inference attack, where AI infers sensitive Personally Identifiable Information (PII) from innocuous public data. We explore how music streaming ecosystems, where users routinely release public playlists, can be exploited for Offensive AI. To quantify this threat, we developed musicPIIrate. This novel tool leverages deep learning architectures that utilize both standalone data representations and the structural information embedded in a user's playlist collection. Our design explores set-based approaches (e.g., Deep Sets) and methodologies modeling relationships between playlists (e.g., Graph Neural Networks), which we also combine to leverage both perspectives. Our approach addresses feature extraction from unordered, variable-length set data, enabling accurate PII prediction.
  Empirical evaluation demonstrates that musicPIIrate achieves state-of-the-art inference accuracy. The tool successfully infers a wide array of attributes, including: Demographics (Age, Country, Gender), Habits (Alcohol, Smoke, Sport), and Personality Traits (OCEAN scores). musicPIIrate outperforms existing methods, beating baselines in 9 out of 15 attribute inference tasks. To counter this vulnerability, we propose JamShield, a lightweight defensive framework. JamShield strategically injects dummy playlists into an account to dilute the PII-carrying signal. Our analysis indicates that JamShield represents a promising defense, lowering inference F1-scores by an average of 10%. This work provides an initial Offensive-AI benchmark for playlist-based PII inference using architectures that leverage set- and graph-structured data and introduces a defense showing encouraging mitigation effects.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04724v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Stefano Cecconello, Mauro Conti, Luca Pajola, Luca Pasa, Pier Paolo Tricomi</dc:creator>
    </item>
    <item>
      <title>RecGPT-Mobile: On-Device Large Language Models for User Intent Understanding in Taobao Feed Recommendation</title>
      <link>https://arxiv.org/abs/2605.04726</link>
      <description>arXiv:2605.04726v1 Announce Type: new 
Abstract: Predicting a user's next search query from recent interaction behaviors is a critical problem in modern e-commerce systems, particularly in scenarios where user intent evolves rapidly. Large Language Models (LLMs) offer strong semantic reasoning capabilities and have recently been adopted to enhance training data construction for next-query prediction. However, due to resource constraints on mobile devices, existing applications are deployed on cloud servers, resulting in high inference costs. In this paper, we propose RecGPT-Mobile, a framework that designs a lightweight LLM-based intent understanding agent to improve recommendation quality in mobile e-commerce scenarios. By deploying LLMs directly on mobile devices, our approach can capture evolving interests of users more quickly and adjust the recommendation results in real time. Extensive offline analyses and online experiments demonstrate that our method significantly improves the accuracy of recommendation results, laying a practical path for LLM deployment in production-scale recommendation systems on mobile devices, as well as a scalable solution for integrating LLMs into real-world next-query prediction systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04726v1</guid>
      <category>cs.IR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1145/3805712.3808410</arxiv:DOI>
      <dc:creator>Bin Zhang, Weipeng Huang, Dimin Wang, Jialin Zhu, Yuning Jiang, Zhaode Wang, Chengfei Lv, Jian Wang, Qichao Ma, Li Chen, Junqing Wu, Yipeng Yu</dc:creator>
    </item>
    <item>
      <title>Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols</title>
      <link>https://arxiv.org/abs/2605.04727</link>
      <description>arXiv:2605.04727v1 Announce Type: new 
Abstract: Programming Knowledge Tracing (PKT) has recently advanced through hybrid approaches that integrate attention-based feature modeling for code representation with RNN-based sequential prediction. While these models report strong empirical performance, their reliability can be sensitive to subtle implementation and experimental design choices. This study revisits representative PKT models and shows that reported gains can be substantially influenced by model configuration and sequence construction practices. We identify issues in attention dimension settings that affect performance estimates, and demonstrate that improper ordering of student attempts, such as ignoring ServerTimestamp, can violate temporal causality and lead to overly optimistic results. To ensure consistent evaluation, hyperparameters are selected via grid search guided by a single designated fold and then fixed uniformly across all folds during cross-validation. We further analyze the role of assignment-wise characteristics and systematically explore the impact of maximum sequence length. Using this protocol, we re-evaluate PKT models on the CodeWorkout dataset. Our results show that, under controlled and consistent settings, the performance gap between attention-enhanced models and standard DKT is significantly reduced, and increased architectural complexity does not consistently translate into superior performance. Beyond individual model comparisons, this work provides practical guidance for reliable and comparable evaluation in programming knowledge tracing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04727v1</guid>
      <category>cs.LG</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jaewook Kim, Hyeoncheol Kim</dc:creator>
    </item>
    <item>
      <title>Anny-Fit: All-Age Human Mesh Recovery</title>
      <link>https://arxiv.org/abs/2605.04728</link>
      <description>arXiv:2605.04728v1 Announce Type: new 
Abstract: Recovering 3D human pose and shape from a single image remains a cornerstone of human-centric vision, yet most methods assume adult subjects and optimize each person independently. These assumptions fail in real-world, all-age scenes, where body proportions and depth must be resolved jointly. We introduce Anny-Fit, a multi-person, camera-space optimization framework for all-age 3D human mesh recovery (HMR). Unlike existing per-person fitting methods, Anny-Fit jointly optimizes all individuals directly in the camera coordinate system, enforcing global spatial consistency. At the core of our approach is the use of multiple forms of expert knowledge -- including metric depth maps, instance segmentation, 2D keypoints, and VLM-derived semantic attributes such as age and gender -- each obtained from dedicated off-the-shelf networks. These complementary signals jointly guide the optimization, constraining the depth-scale ambiguity characteristic of all-age scenes. Across diverse datasets, Anny-Fit consistently improves 2D reprojection accuracy (+13 to +16), relative depth ordering (+6 to +7), 3D estimation error (-9 to -29), and shape estimation (+25 to +82), producing more coherent scenes. Finally, we show that VLM-based semantic knowledge can be distilled into an HMR model via the pseudo-ground-truth annotations produced by Anny-Fit on training data, enabling it to learn semantically meaningful shape parameters while improving HMR performance. Our approach bridges adult-only and all-age modeling by enabling zero-shot adaptation of adult-trained HMR pipelines to the full age spectrum without retraining. Code is publicly available at https://github.com/naver/anny-fit.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04728v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Laura Bravo-Sánchez, Matthieu Armando, Romain Brégier, Grégory Rogez, Serena Yeung-Levy, Fabien Baradel</dc:creator>
    </item>
    <item>
      <title>AISSA: Implementation and Deployment of an AI-based Student Slides Analysis tool for Academic Presentations</title>
      <link>https://arxiv.org/abs/2605.04729</link>
      <description>arXiv:2605.04729v1 Announce Type: new 
Abstract: Providing timely and actionable feedback on oral presentation slides is challenging in higher education, particularly in large classes where teachers cannot realistically deliver detailed formative feedback before students present. This paper introduces AISSA (AI-based Student Slides Analysis tool), a web-based system that combines large language models (LLMs) and Learning Analytics dashboards to support scalable, rubric-based feedback on presentation slides. AISSA allows students to upload their slide decks prior to an oral presentation and automatically receive quantitative scores and qualitative feedback based on teacher-defined evaluation rubrics. The system analyzes both slide-level features and slide content, generates structured feedback through an LLM (ChatGPT 5.2), and presents the results through interactive dashboards for students and teachers. We tested AISSA on a pilot deployment with 46 undergraduate students in a real academic setting. The results indicate that AISSA is technically reliable, economically feasible, and perceived by students as useful for iterative slide improvement. These findings suggest that combining LLM-based analysis with Learning Analytics dashboards is a promising approach for supporting formative feedback on presentation slides at scale.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04729v1</guid>
      <category>cs.HC</category>
      <category>cs.AI</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Alvaro Becerra, Diego Gomez, Ruth Cobos</dc:creator>
    </item>
    <item>
      <title>ULF-Loc: Unbiased Landmark Feature for Robust Visual Localization with 3D Gaussian Splatting</title>
      <link>https://arxiv.org/abs/2605.04730</link>
      <description>arXiv:2605.04730v1 Announce Type: new 
Abstract: Visual localization is a core technology for augmented reality and autonomous navigation. Recent methods combine the efficient rendering of 3D Gaussian Splatting (3DGS) with feature-based localization. These methods rely on direct matching between 2D query features and the 3D Gaussian feature field, but this often results in mismatches due to an inherent bias in the learned Gaussian feature. We theoretically analyze the feature learning process in 3DGS, revealing that the widely adopted $\alpha$-blending optimization inherently introduces bias into 3D point features. This bias stems from the entanglement between individual Gaussians and their neighboring Gaussians, making the learned features unsuitable for precise matching tasks. Motivated by these findings, we propose ULF-Loc, an unbiased landmark feature framework that replaces biased feature optimization with geometry-weighted feature fusion. We further introduce keypoint-consensus landmark sampling to select reliable Gaussians and local geometric consistency verification to reject mismatches caused by rendering artifacts. On the Cambridge Landmarks dataset, ULF-Loc reduces the mean median translation error by 17\% compared to the state-of-the-art, while achieving superior efficiency with only 1/10 the training time and 1/6 the GPU memory of STDLoc.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04730v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yingdong Gu, Shaocheng Yan, Zhenjun Zhao, Yuan Kou, Jianxin Luo, Pengcheng Shi, Jiayuan Li</dc:creator>
    </item>
    <item>
      <title>Morphology-Guided Cross-Task Coupling for Joint Building Height and Footprint Estimation</title>
      <link>https://arxiv.org/abs/2605.04731</link>
      <description>arXiv:2605.04731v1 Announce Type: new 
Abstract: Building height (BH) and building footprint (BF) jointly describe the vertical and horizontal extent of the built environment and are required inputs for urban climate, disaster-risk, and population-mapping models. The two parameters are coupled through floor-area-ratio (FAR) constraints, yet remote-sensing approaches typically treat them as independent regression targets. We argue that explicitly encoding this cross-task coupling is more impactful than further refining individual encoders, and propose MorphoFormer, a joint BH/BF estimation framework built around two complementary mechanisms: (i) a BF-Guided Task Decoder (BGTD) that gates the height branch via cross-attention on a footprint-derived morphology context, and (ii) a Morphology Consistency Loss (MCL) that supervises a height-from-footprint surrogate against the ground-truth BH, indirectly forcing the BF feature to encode height-correlated structure. The encoder is a single-stage Swin backbone fed by Sentinel-1 SAR, Sentinel-2 multispectral, and DEM inputs, trained and evaluated on a geo-blocked split of 54 cities. Against a Swin-MTL baseline at identical receptive field, MorphoFormer reduces BH test RMSE from 3.39 to 3.15 m (R^2 improves from 0.62 to 0.67) with BF R^2 stable at 0.80. Controlled ablations at identical capacity attribute most of this 0.24 m improvement to the two proposed mechanisms: removing BGTD raises BH RMSE by 0.11 m and removing MCL raises it by 0.11 m, with the residual of approximately 0.02 m falling within the noise floor of encoder-side variations. Because both mechanisms act on cross-task representations rather than pixels, the design carries no intrinsic dependence on input resolution.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04731v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jinzhen Han, JinByeong Lee, Jisung Kim, HongSik Yun</dc:creator>
    </item>
    <item>
      <title>Using Common Random Numbers for Simulation-based Planning with Rollouts</title>
      <link>https://arxiv.org/abs/2605.04732</link>
      <description>arXiv:2605.04732v1 Announce Type: new 
Abstract: Simulation-based planning with rollouts is a widely deployed technique for decision making in stochastic environments. The primary instrument of simulation-based planning is a sampling model, which is repeatedly called to generate trajectories and estimate the utilities of available actions. Among the actions thus explored, the one with the maximum estimated utility is then executed. In this paper, we examine the effect of using common random numbers in the simulation process. We obtain a simple recipe for (provably) reducing variance in relative utility when simulations invoke a rollout policy beyond some depth. Experiments on synthetic tasks confirm that our scheme improves task performance. The broader significance of our innovation is apparent from two practical applications: (1) single-step lookahead planning in a pension-disbursement task, and (2) a deployment of the well-known UCT algorithm for the game of Ludo.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04732v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:journal_reference>Reinforcement Learning Journal 2026</arxiv:journal_reference>
      <dc:creator>Sandarbh Yadav, Frederic J Maliakkal, Harshad Khadilkar, Shivaram Kalyanakrishnan</dc:creator>
    </item>
    <item>
      <title>Reward-Decomposed Reinforcement Learning for Immersive Video Role-Playing</title>
      <link>https://arxiv.org/abs/2605.04733</link>
      <description>arXiv:2605.04733v1 Announce Type: new 
Abstract: Text-based role-playing models can imitate character styles, yet they often fail to reflect a scene's atmosphere and evolving tension, both essential for immersive applications such as Virtual Reality (VR) games and interactive narratives. We study video-grounded role-playing dialogue and introduce EBM-RL (Eye-Brain-Mouth Reinforcement Learning), a decoupled GRPO-based framework that explicitly separates observation ([perception]), reasoning ([think]), and utterance ([answer]). This structure promotes human-like sensory grounding by compelling the model to first attend to visual cues, then form internal interpretations, and finally generate context-appropriate dialogue.
  EBM-RL integrates four complementary rewards: (i) CLIP-based scene-text alignment to improve ambiance and emotion; (ii) a Perceptual-Cognitive reward that encourages [perception] and [think] processes that increase the likelihood of the reference response; (iii) answer accuracy to ensure faithfulness; and (iv) a dense format reward to enforce the desired structured output.
  Extensive experiments demonstrate that EBM-RL substantially outperforms text-only role-playing baselines and larger-scale vision-language models on our immersive role-playing benchmark, delivering simultaneous gains in visual-atmosphere consistency and character authenticity. Beyond the role-playing domain, EBM-RL also exhibits strong zero-shot generalization: without any additional fine-tuning, it consistently improves performance on out-of-domain VideoQA benchmarks. We additionally release an open-source dataset for video-grounded role-playing dialogue.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04733v1</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Miao Wang, Yuling Shi, Yijiang Li, Yeheng Chen, Xiaodong Gu, Bin Li, Bo Gao, Yaduan Ruan</dc:creator>
    </item>
    <item>
      <title>Sequential topology optimization: SIMP initialization for level-set boundary refinement</title>
      <link>https://arxiv.org/abs/2605.04735</link>
      <description>arXiv:2605.04735v1 Announce Type: new 
Abstract: Density-based topology optimization methods such as SIMP enable efficient topological exploration but produce diffuse material boundaries that require interpretation before manufacturing. Level-set methods maintain sharp interfaces but are sensitive to the initial design. This paper presents a sequential framework that addresses these complementary limitations through a signed distance function (SDF)-based geometry transfer, formulated for three-dimensional meshes. The SIMP density distribution is converted into an SDF that initializes subsequent level-set boundary refinement. From the level-set perspective, the SIMP-derived initialization mitigates sensitivity to the initial design. From the SIMP perspective, the level-set stage acts as optimization-driven post-processing that produces manufacturing-ready boundaries. Validation on three-dimensional cantilever and MBB benchmarks demonstrates compliance comparable to standalone level-set optimization, with up to 4.6x wall-clock speedup on the cantilever case. The full implementation is released under an open-source license to support reproducibility.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04735v1</guid>
      <category>cs.CE</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ondřej Ježek (Institute of Thermomechanics, Czech Academy of Sciences, Praha, Czech Republic, Faculty of Mechanical Engineering, Czech Technical University in Prague, Praha, Czech Republic), Ján Kopačka (Institute of Thermomechanics, Czech Academy of Sciences, Praha, Czech Republic), Martin Isoz (Institute of Thermomechanics, Czech Academy of Sciences, Praha, Czech Republic), Dušan Gabriel (Institute of Thermomechanics, Czech Academy of Sciences, Praha, Czech Republic)</dc:creator>
    </item>
    <item>
      <title>OSAQ: Outlier Self-Absorption for Accurate Low-bit LLM Quantization</title>
      <link>https://arxiv.org/abs/2605.04738</link>
      <description>arXiv:2605.04738v1 Announce Type: new 
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities. However, their massive parameter scale leads to significant resource consumption and latency during inference. Post-training weight-only quantization offers a promising solution by reducing model size and accelerating token generation through alleviating the memory-bound issue. Nevertheless, the presence of inherent systematic outliers in weights continues to be a major obstacle. While existing methods, such as scaling and rotation, attempt to address this issue, their performance remains unsatisfactory. In this paper, we propose Outlier Self-Absorption Quantization (OSAQ), which performs additive weight suppression guided by the second-order low-rank property for low-bit weight-only quantization of LLMs. Specifically, we observe that the Hessian exhibits low-rank consistency across different inputs, with certain directions consistently showing vanishing curvature. Leveraging this property, we identify a stable null space of the Hessian and then construct an additive weight transformation by linearly combining the vectors within this null space, thereby suppressing weight outliers without affecting the task loss. This additive transformation can be absorbed into the weights offline, requiring no inter-layer transformations and introducing no inference overhead. Moreover, the construction is efficiently achieved by a closed-form solution, without resource-intensive training or iterative procedures. Extensive experiments demonstrate that OSAQ effectively suppresses outliers and enhances low-bit quantization performance. For instance, in 2-bit quantization, OSAQ, when integrated with GPTQ, achieves over 40% lower perplexity compared to vanilla GPTQ.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04738v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhikai Li, Zhen Dong, Xuewen Liu, Jing Zhang, Qingyi Gu</dc:creator>
    </item>
    <item>
      <title>AICoFe: Implementation and Deployment of an AI-Based Collaborative Feedback System for Higher Education</title>
      <link>https://arxiv.org/abs/2605.04740</link>
      <description>arXiv:2605.04740v1 Announce Type: new 
Abstract: Effective peer feedback is essential for developing critical reflection in higher education, yet its impact is often limited by the inconsistent quality of student-generated comments. This paper presents the implementation and deployment of AICoFe (AI-based Collaborative Feedback), a system designed to bridge this gap through a human-centered AI approach. We describe a modular architecture that orchestrates a multi-LLM pipeline, utilizing GPT-4.1-mini, Gemini 2.5 Flash, and Llama 3.1, to synthesize quantitative rubric data and qualitative observations into coherent, actionable feedback. Key to the system is a "teacher-in-the-loop" mediation workflow, where educators use specialized Learning Analytics dashboards to curate and refine AI-generated drafts before delivery. Furthermore, we detail the underlying data infrastructure, which employs a hybrid SQL and MongoDB strategy to ensure traceability and manage semi-structured feedback versions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04740v1</guid>
      <category>cs.HC</category>
      <category>cs.AI</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Alvaro Becerra, Alejandra Palma, Ruth Cobos</dc:creator>
    </item>
    <item>
      <title>Hierarchical Multiagent Reinforcement Learning for Multi-Group Tax Game</title>
      <link>https://arxiv.org/abs/2605.04741</link>
      <description>arXiv:2605.04741v1 Announce Type: new 
Abstract: Reinforcement learning has increasingly been used to study economic decision-making, such as taxation, public spending, and labour supply. However, most existing RL-based economic models focus on a single government--household group, thereby overlooking the strategic interactions that arise when multiple governments compete while managing their own populations. In practice, many economic systems (e.g., taxation) exhibit a multi-group structure, where each government must optimize its fiscal policy in response not only to household behaviour within its jurisdiction, but also to the policies of other competing governments. To capture this structure, we formulate taxation as a hierarchical multi-group game. Within each group, the interaction between the government and households is modelled as a leader--follower game; across groups, governments are modelled as players in a competitive game. This results in a hybrid hierarchical game that is difficult to solve using standard multi-agent reinforcement learning algorithms. We therefore propose a bi-level training framework built on multi-agent reinforcement learning, together with \textit{Curriculum Learning} and a \textit{Closed-Loop Sequential Update} strategy, to stabilize training and promote convergence. We instantiate this framework in a taxation game simulation environment grounded in classical economic models. The environment supports the evaluation of different taxation algorithms and provides multiple economic indicators for assessing policy performance. Experiments show that our approach can learn stable tax policies that benefit all participating groups. Compared with a two-group baseline without the proposed update mechanisms, our method avoids premature game collapse, extends the effective game duration by 60.92\%, produces more sustainable and robust tax policies, and reduces GDP disparities among governments by 44.12\%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04741v1</guid>
      <category>cs.MA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Honglei Guo, Yuhan Zhao, Yexin Li</dc:creator>
    </item>
    <item>
      <title>MixINN: Accelerating Plant Breeding by Combining Mixed Models and Deep Learning for Interaction Prediction</title>
      <link>https://arxiv.org/abs/2605.04744</link>
      <description>arXiv:2605.04744v1 Announce Type: new 
Abstract: Plant breeding underpins global food security through incremental, accumulating improvements in crop yield, quality and sustainability, achieved via repeated cycles of crop ranking, selection and crossing. Climate change disrupts this process by altering local growing conditions, thereby shifting the relative performance of crop genotypes. Predicting these relative changes in yield is critical for food security. Yet, this problem remains an open challenge in plant breeding, and relatively unexplored within the AI community. We propose MixINN, an approach that first isolates high-quality genotype-environment interaction labels using mixed models, and then predicts these interactions for new crop varieties in future environmental conditions with a deep neural network. We evaluate our method on a corn multi-environment trial across the continental United States and show improved prediction of genotype ranking over current plant breeding methods. MixINN demonstrated superior performance in identifying the 20% most productive corn genotypes, leading to a 5.8% higher average yield, which further improved to 7.2% when targeting specific growing environments. These are competitive results for real-world breeding programs, demonstrating the potential of AI research in accelerating the development of climate-adapted crops, and improving future food security under climate change.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04744v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Aike Potze, Fred van Eeuwijk, Ioannis N. Athanasiadis</dc:creator>
    </item>
    <item>
      <title>Knowledge-Free Correlated Agreement for Incentivizing Federated Learning</title>
      <link>https://arxiv.org/abs/2605.04747</link>
      <description>arXiv:2605.04747v1 Announce Type: new 
Abstract: We introduce Knowledge-Free Correlated Agreement (KFCA) to reward client contributions in federated learning (FL) without relying on ground truth, a public test set, or distribution knowledge. Under categorical reports and an honest majority, KFCA is strictly truthful, addressing the label-flipping vulnerability of Correlated Agreement (CA). We evaluate KFCA on federated LLM adapter tuning and a real-world PCB inspection task, showing efficient real-time reward computation suitable for decentralized and blockchain-based incentive designs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04747v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.GT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Leon Witt, Togrul Abbasli, Kentaroh Toyoda, Wojciech Samek, Lucy Klinger</dc:creator>
    </item>
    <item>
      <title>VC-FeS: Viewpoint-Conditioned Feature Selection for Vehicle Re-identification in Thermal Vision</title>
      <link>https://arxiv.org/abs/2605.04750</link>
      <description>arXiv:2605.04750v1 Announce Type: new 
Abstract: Identifying less-articulated objects in single-channel images, such as thermal images, is important in applications like surveillance. However, existing methods perform poorly in this domain because objects of the same category look highly similar: color information is absent, shape information is overlooked, and texture information is de-emphasized. Variability in viewpoint adds further complexity, since the features differ from one side of an object to another. We address these issues by constructing viewpoint-conditioned feature vectors and performing area-specific feature comparisons in separate feature spaces. These interventions make it possible to leverage the advances of existing RGB-pre-trained ViT feature extractors while adapting them to the challenges specific to the thermal domain. We test our system on the RGBNT100 (IR) vehicle dataset and on a thermal maritime dataset that we acquired. Our results surpass state-of-the-art methods in mAP by 19.7% and 12.8% on these two datasets, respectively. We also plan to release our thermal dataset, the first of its kind for maritime vessel identification.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04750v1</guid>
      <category>cs.CV</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yasod Ginige, Ransika Gunasekara, Darsha Hewavitharana, Manjula Ariyarathne, Peshala Jayasekara, Ranga Rodrigo</dc:creator>
    </item>
    <item>
      <title>Sequential Monte Carlo for Resilient Networks: Assessment, Mitigation, and Generative Modeling</title>
      <link>https://arxiv.org/abs/2605.04751</link>
      <description>arXiv:2605.04751v1 Announce Type: new 
Abstract: Resilience is becoming crucial for future wireless networks, which must withstand, adapt to, and recover from rare but potentially cascading disruptions. This paper develops a sequential Monte Carlo (SMC) simulation framework for such systems, in which resilience failures are formulated as path-dependent rare events arising from staged degradation and delayed recovery, and are decomposed into semantically interpretable levels defined by a reaction coordinate. Building on this structure, we present a fixed-level splitting approach with budget-aware population control, enabling efficient estimation of rare non-recovery probabilities. We discuss the potential reuse of SMC checkpoints as representative near-critical states for policy evaluation and simulation-based selection. We further extend the methodology to learned stochastic simulation by using generative sequence models as restartable surrogates within data-driven digital twins. We showcase the framework in a delay-critical wireless network use case, where SMC substantially improves over standard Monte Carlo in rare-event regimes with both physical and learned simulators.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04751v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Onel L. A. López, Amirhossein Azarbahram</dc:creator>
    </item>
    <item>
      <title>Hybrid Congestion Classification Framework Using Flow-Guided Attention and Empirical Mode Decomposition</title>
      <link>https://arxiv.org/abs/2605.04752</link>
      <description>arXiv:2605.04752v1 Announce Type: new 
Abstract: Accurate traffic congestion classification requires models that jointly capture roadway scene context and non-stationary traffic motion, yet most prior work treats these requirements in isolation. Vision-based methods often depend on appearance cues with standard temporal pooling, which can bias predictions toward static infrastructure, whereas signal-based approaches characterize temporal dynamics but lack the spatial context needed for scene-level localization. These complementary limitations motivate a unified framework that links motion evidence to spatial feature selection while preserving data-adaptive temporal characterization. This study therefore proposes FLO-EMD, a hybrid approach that couples motion-guided attention with empirical, data-driven temporal decomposition. Dense optical flow guides channel and spatial attention so that RGB features are refined toward motion-relevant regions. In parallel, aggregated flow statistics form compact motion traces that are decomposed using Empirical Mode Decomposition (EMD) to extract intrinsic temporal components. The resulting EMD embedding is fused with learned spatiotemporal representations to classify light, medium, and heavy congestion. Experiments on 1,050 five-second clips from four surveillance networks show that FLO-EMD achieves 97.5% overall test accuracy (weighted F1 = 0.9742), outperforming established baselines and remaining robust across diverse environmental conditions; ablation and sensitivity analyses further quantify the contributions of EMD, the number of intrinsic mode functions, and the selected motion descriptors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04752v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Eugene Kofi Okrah Denteh, Blessing Agyei Kyem, Joshua Kofi Asamoah, Armstrong Aboah</dc:creator>
    </item>
    <item>
      <title>AxMoE: Characterizing the Impact of Approximate Multipliers on Mixture-of-Experts DNN Architectures</title>
      <link>https://arxiv.org/abs/2605.04754</link>
      <description>arXiv:2605.04754v1 Announce Type: new 
Abstract: Deep neural network (DNN) inference at the edge demands simultaneous improvements in accuracy, computational efficiency, and energy consumption. Approximate computing and Mixture-of-Experts (MoE) architectures have each been studied as independent routes towards efficient inference, the former by replacing exact arithmetic with low-power approximate multipliers, the latter by routing inputs through specialized expert sub-networks to enable conditional computation. However, their interaction remains entirely unexplored. This paper presents AxMoE, the first study of the impact of approximate multiplication on MoE DNN architectures. We evaluate three MoE variants (Hard MoE, Soft MoE, and Cluster MoE) against dense baselines across three CNN architectures (ResNet-20, VGG11_bn, VGG19_bn) on CIFAR-100 and a Vision Transformer (ViT-Small) on the Tiny ImageNet-200 dataset, using eight 8-bit signed multipliers (including one exact baseline) from the EvoApproxLib library. Results show that, without retraining, the Dense baseline is the most resilient topology across all CNN architectures, whereas on ViT-Small, all topologies degrade at comparable rates regardless of routing strategy. After approximate-aware retraining, recovery varies substantially across architectures, topologies, and multipliers. ResNet-20 achieves full recovery across the entire multiplier range, whereas VGG architectures recover at moderate multipliers but fail irreversibly at aggressive ones for all topologies except Cluster MoE on VGG11_bn; on ViT-Small, Hard MoE outperforms Dense under aggressive approximation at equal normalized inference cost. These results pave the way for future approximate MoE hardware-software co-design strategies.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04754v1</guid>
      <category>cs.LG</category>
      <category>cs.AR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Omkar B Shende, Marcello Traiola, Gayathri Ananthanarayanan</dc:creator>
    </item>
    <item>
      <title>3D Printing of Passively Actuated Self-Folding Robots with Integrated Functional Modules</title>
      <link>https://arxiv.org/abs/2605.04757</link>
      <description>arXiv:2605.04757v1 Announce Type: new 
Abstract: We introduce an elastic-driven self-folding approach that fabricates robots directly from flat 3D-printed conductive PLA nets. Elastic bands routed through printed hooks store energy that folds the sheet into programmed 3D geometries, while the flat state allows accurate placement of electronics and magnets before deployment. The same substrate doubles as electrodes for capacitive touch and supports a reusable platform I/O palette with Hall sensors and eccentric rotating mass (ERM) motors for docking detection and vibration actuation. We also derive a closed-form folding model that balances hinge stiffness with elastic band moment to predict equilibrium fold angles; experiments validate the model and yield a design map linking hinge thickness, band size, and hook spacing to target angles. Using this workflow, we realize multiple polyhedral modules and demonstrate three applications: a cube that highlights the potential of self-folding for scalable modular robot collectives, a deployable gripper, and a tendon-driven finger. The method is low cost, stimulus-free, and integrates actuation and sensing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04757v1</guid>
      <category>cs.RO</category>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Gaolin Ge, Qifeng Yang, Haoran Lu, Tingyu Cheng, Martin Nisser, Yiyue Luo</dc:creator>
    </item>
    <item>
      <title>Gyan: An Explainable Neuro-Symbolic Language Model</title>
      <link>https://arxiv.org/abs/2605.04759</link>
      <description>arXiv:2605.04759v1 Announce Type: new 
Abstract: Transformer-based pre-trained large language models have become ubiquitous. There is increasing evidence that, even with large-scale pre-training, these models do not capture complete compositional context, and certainly not the full human-analogous context. Moreover, by the very nature of the architecture, these models hallucinate, are difficult to maintain, are not easily interpretable, and require enormous compute resources for training and inference. Here, we describe Gyan, an explainable language model based on a novel non-transformer architecture, without any of these limitations. Gyan achieves SOTA performance on three widely cited datasets and superior performance on two proprietary datasets. The novel architecture decouples the language model from knowledge acquisition and representation. The model draws on rhetorical structure theory, semantic role theory, and knowledge-based computational linguistics. Gyan's meaning representation structure captures the complete compositional context and attempts to mimic humans by expanding the context to a 'world model'. AI model adoption critically depends on trust and transparency, especially in mission-critical use cases. Collectively, our results demonstrate that it is possible to create models that are trustworthy and reliable for mission-critical tasks. We believe our work has tremendous potential for guiding the development of transparent and trusted architectures for language models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04759v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.ET</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Venkat Srinivasan, Vishaal Jatav, Anushka Chandrababu, Geetika Sharma</dc:creator>
    </item>
    <item>
      <title>AFL-ICP: Enhancing Industrial Control Protocol Reliability via Specification-Guided Fuzzing</title>
      <link>https://arxiv.org/abs/2605.04760</link>
      <description>arXiv:2605.04760v1 Announce Type: new 
Abstract: Industrial Control Protocols (ICPs) are critical to the reliability and stability of industrial infrastructure, yet their security is fundamentally compromised by a specification-blindness bottleneck. Modern fuzzers, constrained by observation-driven inference, struggle to penetrate deep protocol states or detect subtle semantic deviations. In this paper, we present AFL-ICP, an autonomous fuzzing framework that pioneers a specification-driven paradigm. AFL-ICP features a context-aware specification formalization pipeline to transform complex specifications into rigorous machine-executable grammars. Building on this formalized specification, AFL-ICP leverages LLMs to enable automated protocol adaptation and seed generation, allowing for rapid extension to new protocols with minimal manual effort. Additionally, it includes an LLM-powered differential checker that cross-references implementation outputs with specification requirements to detect subtle semantic and logic bugs that existing fuzzers cannot detect. We implement AFL-ICP and evaluate it on four widely used ICPs, including both open-source and closed-source variants. Results show that AFL-ICP significantly outperforms state-of-the-art fuzzers in coverage and uncovers 24 previously unknown vulnerabilities, for which we have received acknowledgments from affected vendors (e.g., FreyrSCADA). Specifically, the identified vulnerabilities include 16 semantic and logic bugs that can silently disrupt industrial operations and degrade service availability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04760v1</guid>
      <category>cs.CR</category>
      <category>cs.NI</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiaying Meng, Xuewei Feng, Qi Li, Min Liu, Ke Xu</dc:creator>
    </item>
    <item>
      <title>Cognitive Twins: Investigating Personalized Thinking Model Building and Its Performance Enhancement with Human-in-the-Loop</title>
      <link>https://arxiv.org/abs/2605.04761</link>
      <description>arXiv:2605.04761v1 Announce Type: new 
Abstract: This paper presents the Personalized Thinking Model (PTM), a hierarchical and interpretable learner representation designed for AI-supported education. PTM organizes evidence from learner journals into a five-layer structure covering behavioral instances, behavioral patterns, cognitive routines, metacognitive tendencies, and self-system values. PTM is grounded in Marzano's New Taxonomy of Educational Objectives and attempts to clone a learner's thinking model and build a cognitive twin. It was constructed using a pipeline that combines large language model inference (Gemini 2.5 Pro), sentence embeddings, dimensionality reduction, and consensus clustering. This paper evaluates PTM fidelity with three methods applied to 40 participants in a seven-week study. First, automatic evaluation using atomic information point matching yielded an overall F1 score of 74.57% before human-in-the-loop (HITL) refinement and 75.48% after refinement. Second, user evaluation on a five-point Likert scale produced mean ratings of 4.26 and 4.30 for the pre- and post-HITL conditions, respectively. Third, semantic alignment verification showed that topic coherence increased from 0.436 at the behavioral layer to 0.626 at the core value layer, while lexical overlap with journal vocabulary decreased from 0.114 to 0.007 across those same layers. These results suggest that PTM produces outputs with acceptable fidelity, is generally perceived by users as reflecting their thinking, and shows a pattern consistent with semantic abstraction across layers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04761v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wu-Yuin Hwang, Nur Alif Ilyasa, Muhammad Irfan Luthfi, Yuniar Indrihapsari</dc:creator>
    </item>
    <item>
      <title>How Does Chunking Affect Retrieval-Augmented Code Completion? A Controlled Empirical Study</title>
      <link>https://arxiv.org/abs/2605.04763</link>
      <description>arXiv:2605.04763v1 Announce Type: new 
Abstract: Retrieval-augmented generation (RAG) pipelines for code completion rely on chunking to segment source files into retrievable units, yet chunking strategies are typically adopted without empirical justification, and practitioner recommendations are notably inconsistent. We present a controlled empirical study isolating the effect of chunking on code completion quality by crossing four representative strategies (Function, Declaration, Sliding Window, and cAST) with four retrievers, five generators, and nine parameter configurations on two benchmarks (RepoEval and CrossCodeEval), totaling 864 experimental settings. Our results reveal that chunking strategy has a statistically significant effect on RAG-based code completion. Contrary to intuition, chunking based on functions underperforms all other strategies by 3.57–5.64 percentage points on RepoEval (Cliff's delta = -1.0), while the remaining chunking strategies perform comparably. Further analysis shows that this observation holds across all retriever–generator combinations. We also find that cross-file context length is the dominant parameter: increasing it fourfold, from 2,048 to 8,192 tokens, yields up to 4.2 percentage points of improvement, whereas chunk size has a weaker, non-monotonic effect. On the cost–quality Pareto front, Sliding Window and cAST dominate both benchmarks; Function chunking is never Pareto-optimal.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04763v1</guid>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xinjian Wu, Jingzhi Gong, Gunel Jahangirova, Jie Zhang</dc:creator>
    </item>
    <item>
      <title>Elicitation Matters: How Prompts and Query Protocols Shape LLM Surrogates under Sparse Observations</title>
      <link>https://arxiv.org/abs/2605.04764</link>
      <description>arXiv:2605.04764v1 Announce Type: new 
Abstract: Large language models are increasingly used as surrogate models for low-data optimization, but their optimizer-facing prediction and its uncertainty remain poorly understood. We study the surrogate belief elicited from an LLM under sparse observations, showing that it depends strongly on prompt text and query protocol. We introduce an uncertainty-alignment criterion that measures whether model uncertainty tracks residual ambiguity among sample-consistent functions. Across controlled inference tasks and Bayesian optimization studies, we find that structural prompts act as effective priors, POINTWISE and JOINT querying induce different beliefs, and sequential evidence leads to non-monotonic, order-sensitive confidence updates. These effects change downstream acquisition decisions and regret, showing that elicitation protocol is part of the LLM surrogate specification, not a formatting detail.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04764v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ge Lei, Samuel J. Cooper</dc:creator>
    </item>
    <item>
      <title>Computational and Analytical Study of Variations and Generalizations of the FC-Gram Approximation Algorithm</title>
      <link>https://arxiv.org/abs/2605.04765</link>
      <description>arXiv:2605.04765v1 Announce Type: new 
Abstract: The FC-Gram algorithm approximates non-periodic functions to high order by constructing a periodic extension with controlled boundary behavior and applying trigonometric interpolation. In this paper we introduce a generalized FC-Gram framework (GenFC), which provides greater flexibility in the construction of the blending continuation of Gram polynomials. This flexibility gives better control over the shape of the periodic extension and leads to improved approximation accuracy. We establish a convergence theorem showing that the trigonometric interpolant converges at the rate $\mathcal{O}(n^{-\min(r+\beta,\,d)})$ in the supremum norm on the original interval, where $r$ is the smoothness of the target function, $d$ the number of Gram polynomials, and $\beta \in [0,1]$ a Fourier-decay parameter. The framework and its analysis are developed so that the modified FC-Gram method of [J. Sci. Comput., 105(1):8, 2025] is recovered as a particular case. Numerical experiments confirm the predicted convergence rates and show that the added flexibility of the GenFC framework leads to improved approximation accuracy, with the gains carrying over to a Fourier continuation solver for two-point boundary value problems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04765v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Prakash Nainwal, Akash Anand</dc:creator>
    </item>
    <item>
      <title>From open-loop representations to closed-loop feedback implementations in differential games: A numerical case study</title>
      <link>https://arxiv.org/abs/2605.04768</link>
      <description>arXiv:2605.04768v1 Announce Type: new 
Abstract: Solutions to pursuit-evasion and surveillance-evasion differential games are typically computed and expressed using open-loop representations, with the synthesis of feedback strategies significantly less common. We propose a numerical scheme for obtaining feedback strategies for the recently introduced prying-pedestrian surveillance-evasion differential game. The scheme involves computing feedback strategies as input-output maps approximated via neural networks trained using data obtained from open-loop representations of solutions. Simulations show the effectiveness of neural networks trained with an appropriate learning-loss function. Since optimal feedback strategies are discontinuous, as a second contribution, the potential loss/gain of individual players is subsequently studied for players using sample-and-hold feedback compared to continuous-time feedback.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04768v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Philipp Braun, Timothy L. Molloy, Gal Barkai, Iman Shames</dc:creator>
    </item>
    <item>
      <title>Lightweight Cross-Spectral Face Recognition via Contrastive Alignment and Distillation</title>
      <link>https://arxiv.org/abs/2605.04769</link>
      <description>arXiv:2605.04769v1 Announce Type: new 
Abstract: Heterogeneous Face Recognition (HFR) aims at matching face images captured across different sensing modalities, such as thermal-to-visible or near-infrared-to-visible, enhancing the usability of face recognition systems in challenging real-world conditions. Although recent HFR methods have achieved significant improvements in performance, many rely on computationally expensive models, making them impractical for deployment on resource-limited edge devices. In this work, we introduce a lightweight yet effective HFR framework by adapting a hybrid CNN-Transformer model originally developed for RGB homogeneous face recognition. Our approach enables efficient end-to-end training with only a small amount of paired heterogeneous data, while still maintaining strong performance on standard RGB face recognition benchmarks. This makes it suitable for both homogeneous and heterogeneous settings. Comprehensive experiments on several challenging HFR and face recognition benchmarks show that our method achieves state-of-the-art or competitive performance while keeping computational requirements low.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04769v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Anjith George, Sebastien Marcel</dc:creator>
    </item>
    <item>
      <title>Gaze4HRI: Zero-shot Benchmarking Gaze Estimation Neural-Networks for Human-Robot Interaction</title>
      <link>https://arxiv.org/abs/2605.04770</link>
      <description>arXiv:2605.04770v1 Announce Type: new 
Abstract: While zero-shot appearance-based 3D gaze estimation offers significant cost-efficiency by directly mapping RGB images to gaze vectors, its reliability in Human-Robot Interaction (HRI) settings remains uncertain. Existing benchmarks frequently overlook fundamental HRI conditions, such as dynamic camera viewpoints and moving targets in video. Furthermore, current cross-dataset evaluations often suffer from a complexity gap, where methods trained on diverse datasets are tested on significantly smaller and less varied sets, failing to assess true robustness. To bridge these gaps, we introduce Gaze4HRI, a large-scale dataset (50+ subjects, 3,000+ videos, 600,000+ frames) designed to evaluate state-of-the-art performance against critical HRI variables: illumination, head-gaze conflict, and camera and gaze-target motion in video. Our benchmark reveals that all evaluated methods fail in at least one condition, identifying steeply downward gaze as a universal failure point. Notably, PureGaze trained on the ETH-X-Gaze dataset uniquely maintains resilience across all other conditions. These results challenge the recent focus in the literature on complex spatial-temporal modeling and Transformer-based architectures. Instead, our findings suggest that extensive data diversity, as exemplified by the ETH-X-Gaze dataset, serves as the primary driver of zero-shot robustness in unconstrained environments, while resilience-enhancing frameworks, such as PureGaze's self-adversarial loss for gaze feature purification, provide a substantial further improvement. Ultimately, this study establishes a rigorous benchmark that provides practical guidelines for practitioners and aims to reshape future research. The dataset and code are available at https://gazeforhri.github.io.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04770v1</guid>
      <category>cs.CV</category>
      <category>cs.HC</category>
      <category>cs.LG</category>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Berk Sezer, Ali Görkem Küçük, Erol Şahin, Sinan Kalkan</dc:creator>
    </item>
    <item>
      <title>MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education</title>
      <link>https://arxiv.org/abs/2605.04772</link>
      <description>arXiv:2605.04772v1 Announce Type: new 
Abstract: Access to diverse, well-annotated medical images with interactive learning tools is fundamental for training practitioners in medicine and related fields to improve their diagnostic skills and understanding of anatomical structures. While medical atlases are valuable, they are often impractical due to their size and lack of interactivity, whereas online image search may provide mislabeled or incomplete material. To address this, we propose MIRAGE, a multimodal medical text and image retrieval and generation system that allows users to find and generate clinically relevant images from trustworthy sources by mapping both text and images to a shared latent space, enabling semantically meaningful queries. The system is based on a fine-tuned medical version of CLIP (MedICaT-ROCO), trained on the ROCO dataset, which was obtained from PubMed Central. MIRAGE allows users to enter prompts to retrieve images, generate synthetic ones with a medical diffusion model (Prompt2MedImage), and receive enriched descriptions from a large language model (Dolly-v2-3b). It also supports a dual search option, enabling the visual comparison of different medical conditions. A key advantage of the system is that it relies entirely on publicly available pretrained models, ensuring reproducibility and accessibility. Our goal is to provide a free, transparent and easy-to-use didactic tool for medical students, especially those without programming skills. The system features an interface that enables interactive and personalized visual learning through medical image retrieval and generation. The system is accessible to medical students worldwide without requiring local computational resources or technical expertise, and is currently deployed on Kaggle: http://www-vpu.eps.uam.es/mirage</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04772v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1007/978-3-032-09569-5_11</arxiv:DOI>
      <arxiv:journal_reference>Workshop on Applications of Medical AI (AMAI 2025), MICCAI 2025, pp 103-112, 2025</arxiv:journal_reference>
      <dc:creator>Miguel Diaz Benito, Cecilia Diana Albelda, Alvaro Garcia Martin, Jesus Bescos Cano, Marcos Escudero-Vinolo, Juan C. SanMiguel</dc:creator>
    </item>
    <item>
      <title>AGIPC: Adaptive In-Solve Algebraic Coarsening for GPU IPC</title>
      <link>https://arxiv.org/abs/2605.04773</link>
      <description>arXiv:2605.04773v1 Announce Type: new 
Abstract: Implicit time integration is key to robustly simulating stiff materials and large deformations, but its performance is often dominated by repeatedly solving large linear systems. Adaptive coarsening can reduce this cost by concentrating degrees of freedom (DoF) where they are most needed, yet conventional explicit remeshing changes connectivity (and often vertex ordering), which complicates parallel implementations, harms memory locality, and is sometimes disallowed because it may introduce local geometric intersections. Adaptive subspace approaches avoid topological changes, but basis construction and updates incur irregular data access patterns and typically produce dense system matrices, limiting GPU efficiency and keeping many practical systems CPU-centric. We present algebraic adaptive in-solve coarsening, a GPU-oriented method that dynamically reduces DoF within the Newton solve of implicit time integration without explicit topological modification. Starting from a fine mesh, we express adaptivity as a selective edge-collapse process governed by per-edge tags. Collapsible edges are aggregated in parallel using a warp-level hash mapping scheme that groups fine vertices into coarse super-nodes, while protected edges preserve local detail. This defines an implicit coarse mesh whose linear system is assembled algebraically by mapping and reducing fine-scale gradients and Hessians via efficient GPU reduction kernels. We solve the resulting coarse system with a preconditioned conjugate gradient (PCG) method and then prolongate the solution back to the fine mesh. Our approach integrates seamlessly with IPC's barrier energy and exploits GPU parallelism end-to-end. Across a range of challenging scenarios, we achieve up to 3x speedup over a state-of-the-art GPU IPC solver while producing visually indistinguishable results.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04773v1</guid>
      <category>cs.GR</category>
      <category>cs.PF</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xuan Wang, Zhaofeng Luo, Minchen Li, Taku Komura, Kemeng Huang</dc:creator>
    </item>
    <item>
      <title>Bridging Perception and Action: A Lightweight Multimodal Meta-Planner Framework for Robust Earth Observation Agents</title>
      <link>https://arxiv.org/abs/2605.04777</link>
      <description>arXiv:2605.04777v1 Announce Type: new 
Abstract: Autonomous Earth Observation (EO) agents are transitioning from passive perception to complex, multi-step task execution. However, current architectures that integrate planning and execution within a single model often struggle with combinatorial complexity and reasoning errors in dynamic EO scenarios. To resolve these challenges, we propose the Lightweight Multimodal Meta-Planner (LMMP) framework. LMMP incorporates a dual-awareness mechanism that grounds strategic plans in both multimodal image features and high-level task semantics. Crucially, we introduce a Meta Task Library to inject remote sensing expert knowledge directly into the workflow, which standardizes domain logic and ensures plans are physically feasible. We further implement a two-stage training pipeline, initializing the Meta-Planner via expert-distilled Supervised Fine-Tuning and refining it through Direct Preference Optimization based on execution feedback. Extensive experiments on a dataset derived from EarthBench and ThinkGeo demonstrate that LMMP significantly improves tool-calling accuracy and task success rates. Moreover, the framework exhibits strong "plug-and-play" versatility, consistently enhancing the performance of diverse executor backbones across previously unseen EO missions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04777v1</guid>
      <category>cs.MA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jinghui Xu, Boyi Shangguan, Mengke Zhu, Hao Liu, Junhuan Jiang, Guangjun He, Pengming Feng, Shichao Jin, Bin Liang, Yongzhe Chang, Junbo Tan, Tiantian Zhang, Xueqian Wang</dc:creator>
    </item>
    <item>
      <title>Steady Incremental Viscosity Splitting Method for solving the stationary Navier-Stokes equation</title>
      <link>https://arxiv.org/abs/2605.04778</link>
      <description>arXiv:2605.04778v1 Announce Type: new 
Abstract: We develop a novel and efficient iterative scheme for solving the steady incompressible Navier-Stokes equations. The method adapts the Incremental Viscosity Splitting approximation for unsteady flows to the steady equations. At each nonlinear iteration, the scheme requires solving an elliptic PDE for the velocity variable and a linear system for the pressure variable whose SPD matrix remains the same across all nonlinear iterations. The method can also be interpreted as an algebraic splitting approach. We prove boundedness and geometric convergence. Numerical tests illustrate the efficiency of the proposed algorithm.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04778v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Aziz Takhirov, Driss Yakoubi</dc:creator>
    </item>
    <item>
      <title>A meta-analysis of the effect of generative AI on productivity and learning in programming</title>
      <link>https://arxiv.org/abs/2605.04779</link>
      <description>arXiv:2605.04779v1 Announce Type: new 
Abstract: Generative artificial intelligence (GenAI) is increasingly used for programming, yet it remains unclear when and where GenAI tools lead to productivity gains. Evidence on the effects of GenAI on the long-term development of programming skills is similarly mixed. Here, we present a meta-analysis of $n = 23$ studies reporting $k = 27$ effect sizes to quantify the effect of GenAI-powered coding assistants on productivity and learning. We systematically searched (i) ACM, (ii) arXiv, (iii) Scopus, and (iv) Web of Science for studies published between 2019 and 2025. Studies were required to compare GenAI-assisted with unassisted programming using quantitative measures of (1) productivity (i.e., task completion time, commits, and lines of code) and (2) learning (i.e., exam performance). We assessed the risk of bias using RoB2 and ROBINS-I and compared standardized effect sizes using Hedges' $g$. We find a statistically significant but moderate positive effect of GenAI assistance on developer productivity ($g = 0.33$, $95\%$ CI: $[0.09, 0.58]$), yet with substantial heterogeneity across settings. Notably, productivity gains tend to be larger in controlled experimental settings, while effects are smaller in open-source and enterprise contexts. In contrast, we find no statistically significant effect of GenAI assistance on learning outcomes ($g = 0.14$, $95\%$ CI: $[-0.18, 0.47]$). Overall, these results highlight that GenAI coding assistants can increase developer productivity, although these gains depend strongly on context. In educational settings, however, the use of GenAI does not consistently translate into improved learning or skill development, which highlights the need for careful integration of GenAI into computer science education.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04779v1</guid>
      <category>cs.SE</category>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sebastian Maier, Moritz Gunzenhäuser, Jonas Schweisthal, Manuel Schneider, Stefan Feuerriegel</dc:creator>
    </item>
    <item>
      <title>AgentTrust: Runtime Safety Evaluation and Interception for AI Agent Tool Use</title>
      <link>https://arxiv.org/abs/2605.04785</link>
      <description>arXiv:2605.04785v1 Announce Type: new 
Abstract: Modern AI agents execute real-world side effects through tool calls such as file operations, shell commands, HTTP requests, and database queries. A single unsafe action, including accidental deletion, credential exposure, or data exfiltration, can cause irreversible harm. Existing defenses are incomplete: post-hoc benchmarks measure behavior after execution, static guardrails miss obfuscation and multi-step context, and infrastructure sandboxes constrain where code runs without understanding what an action means.
  We present AgentTrust, a runtime safety layer that intercepts agent tool calls before execution and returns a structured verdict: allow, warn, block, or review. AgentTrust combines a shell deobfuscation normalizer, SafeFix suggestions for safer alternatives, RiskChain detection for multi-step attack chains, and a cache-aware LLM-as-Judge for ambiguous inputs.
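  The Python sketch below gives a rough illustration of the intercept-then-verdict flow described above. It normalizes a shell command and maps it to one of the four verdicts. It is a hypothetical toy, not AgentTrust's ruleset or API: the names Verdict, normalize_shell, RULES and intercept, as well as the regex rules, are assumptions. SafeFix, RiskChain and the LLM-as-Judge are not modeled.

```python
import re
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"
    REVIEW = "review"


def normalize_shell(cmd: str) -> str:
    """Toy deobfuscation: strip quote/backslash tricks and collapse whitespace.

    For example, "r'm' -r'f' /" normalizes to "rm -rf /".
    """
    cmd = re.sub(r"[\"'\\]", "", cmd)
    return re.sub(r"\s+", " ", cmd).strip()


# Hypothetical rules, checked in order of severity; the first match decides.
RULES = [
    (re.compile(r"\brm -rf /(\s|$)"), Verdict.BLOCK),         # wipe root
    (re.compile(r"\bcurl\b.*\|\s*(ba)?sh\b"), Verdict.BLOCK),  # pipe download to shell
    (re.compile(r"\bchmod 777\b"), Verdict.WARN),
    (re.compile(r"\bsudo\b"), Verdict.REVIEW),
]


def intercept(tool: str, command: str) -> Verdict:
    """Return a verdict before the tool call executes (only shell calls are checked here)."""
    if tool != "shell":
        return Verdict.ALLOW
    normalized = normalize_shell(command)
    for pattern, verdict in RULES:
        if pattern.search(normalized):
            return verdict
    return Verdict.ALLOW
```

  Normalization runs before rule matching, so a quote-split payload such as r'm' -r'f' / is caught by the same rule as rm -rf /. Rule order encodes severity, so the first match wins.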
  We release a 300-scenario benchmark across six risk categories and an additional 630 independently constructed real-world adversarial scenarios. On the internal benchmark, the production-only ruleset achieves 95.0% verdict accuracy and 73.7% risk-level accuracy at low-millisecond end-to-end latency. On the 630-scenario benchmark, evaluated under a patched ruleset and not claimed as zero-shot, AgentTrust achieves 96.7% verdict accuracy, including about 93% on shell-obfuscated payloads. AgentTrust is released under the AGPL-3.0 license and provides a Model Context Protocol server for MCP-compatible agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04785v1</guid>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chenglin Yang</dc:creator>
    </item>
    <item>
      <title>Superconvergence in finite element method by smoothing</title>
      <link>https://arxiv.org/abs/2605.04786</link>
      <description>arXiv:2605.04786v1 Announce Type: new 
Abstract: This paper develops a smoothing-based postprocessing method for superconvergence in finite element methods. The method applies a few smoothing iterations, such as damped Jacobi, Gauss-Seidel, or conjugate gradient, using as the initial guess the current finite element solution embedded in an enriched finite element space. The resulting procedure is algebraic, easy to implement, and applicable to high-order and three-dimensional discretizations. For symmetric positive-definite problems, we prove superconvergence of the smoothed solutions under additive and multiplicative smoothers. The effectiveness of the proposed method is demonstrated by numerical experiments for the Poisson, Maxwell, biharmonic and Helmholtz equations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04786v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuwen Li, Han Shui, Ludmil Zikatanov</dc:creator>
    </item>
    <item>
      <title>Long-Term Risks of IoT Devices: The Case of the Smart Fridge</title>
      <link>https://arxiv.org/abs/2605.04787</link>
      <description>arXiv:2605.04787v1 Announce Type: new 
Abstract: Replacing conventional devices with smart ones has many advantages, e.g., seamless integration of physical objects into the users' digital environment or improved modes of use. However, if a conventional device is replaced by a smart device, its IT components can introduce risks that shorten the life of the device. Such risks stem from the different life cycles of the embedded software and hardware, the libraries and protocols used, and the required IT ecosystem. This is problematic because many conventional household appliances, such as a fridge or TV, have a much longer life span than typical IT equipment. In this paper, we use a systematic approach to identify long-term risks for the operational life span of a smart fridge. In particular, we identify 8 different use cases of three typical smart fridges, e.g., cooling or managing "best before" dates. We model the IT ecosystem needed to run these use cases, and we inspect each asset in this ecosystem for potential long-term risks. We found that even cooling, the most basic use case, is at risk in the long run, because the cooling parameter settings may depend on parts of the IT ecosystem that are not under the users' control. On the other hand, we did not find any risk that may lead to harm in the category "threatening". Our findings on the smart fridge can easily be generalized to other smart devices.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04787v1</guid>
      <category>cs.CR</category>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:journal_reference>BUCHMANN, Erik. Long-Term Risks of IoT Devices: The Case of the Smart Fridge. In: Proceedings of the 17th Conference on Digital Society (ICDS'23), 2023</arxiv:journal_reference>
      <dc:creator>Erik Buchmann</dc:creator>
    </item>
    <item>
      <title>Equilibrium points and stability of synchronous machine systems</title>
      <link>https://arxiv.org/abs/2605.04788</link>
      <description>arXiv:2605.04788v1 Announce Type: new 
Abstract: This paper investigates equilibrium points and stability in two synchronous machine configurations: (i) a single generator with an impedance load and (ii) two interconnected machines with co-located loads. We consider both abc and dq reference frames to show that the equilibrium condition reduces to a cubic polynomial in the single-machine case and to an 18th-degree polynomial in the two-machine case. For the single-machine system, both Lyapunov and linearization-based stability analyses are carried out. For the two-machine system, local stability is assessed through linearization and eigenvalue analysis. Illustrative examples confirm the existence of multiple equilibria and show the impact of parameter variation on stability. Our results provide insight into the stability of synchronous machine systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04788v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Maryam Khodabakhshloo, Elizabeth L. Ratnam, Ian R. Petersen</dc:creator>
    </item>
    <item>
      <title>OpenWatch: A Multimodal Benchmark for Hand Gesture Recognition on Smartwatches</title>
      <link>https://arxiv.org/abs/2605.04791</link>
      <description>arXiv:2605.04791v1 Announce Type: new 
Abstract: Despite the widespread adoption of smartwatches worldwide, open benchmarks for wrist-based gesture recognition remain surprisingly limited. In this work, we introduce OpenWatch, the first open-access multimodal benchmark for wrist-based gesture recognition using synchronized inertial and physiological sensing on a commercial smartwatch. It contains over 10 hours of Inertial Measurement Unit (IMU) and Photoplethysmography (PPG) data across 50 participants and a vocabulary of 59 labelled gesture sequences. Furthermore, we present a subject-independent evaluation protocol including traditional and deep learning methods for time-series classification. On top of this, we develop two novel methodologies for hand-gesture recognition: (i) MixToken, a task-specific mixture-of-experts that fuses per-channel IMU filterbank features with cross-channel statistical tokens through learned logit mixing, and (ii) NormWear-Lora, a low-rank adaptation module for smartwatch foundation models. Our benchmarking results reveal that PPG signals carry a substantial predictive benefit (+12.5% F1-score) for smartwatch foundation models. In addition, we show that task-specific architectures (i.e., MixToken) substantially outperform finetuned smartwatch foundation models in terms of accuracy (F1-score = 90% vs. 66%) and memory efficiency (223k vs. 136M parameters). Finally, we also provide clear empirical guidance on the trade-offs between specialized architecture design, modality fusion, data augmentations, and foundation-model adaptation for resource-constrained wearable sensing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04791v1</guid>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pietro Bonazzi, Youssef Ahmed, Daniel Eckert, Andrea Ronco, Junjie Zeng, Dengxin Dai, Michele Magno</dc:creator>
    </item>
    <item>
      <title>Bilinear Mamba-Koopman Neural MPC for Varying Dynamics</title>
      <link>https://arxiv.org/abs/2605.04793</link>
      <description>arXiv:2605.04793v1 Announce Type: new 
Abstract: Koopman-based neural MPC models generate time-varying dynamics from historical data, but preserve convexity by enforcing that the system operator is independent of the current control input. This conditional independence constraint limits adaptation to changing dynamics within a single MPC horizon, particularly under time-varying conditions and under stale-plan execution.
  We propose Bilinear Mamba-Koopman Neural MPC, a minimal extension that introduces control-dependent coupling in the latent dynamics, allowing the effective operator to adapt to the current input. The resulting model is a strict generalization of the standard linear, conditional-independence formulation, adds less than 1% to the parameter count through a low-rank structure, and admits exact model Jacobians that enable efficient Sequential Convex Programming (SCP) with monotone-descent and KKT convergence results under standard trust-region assumptions.
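  One way to read "control-dependent coupling" with a low-rank structure is as a control-affine bilinear update, in which each input channel i adds a rank-r term u_i U_i V_i^T to the latent operator. The NumPy sketch below assumes that form; the abstract does not give the exact parameterization, and the function names are hypothetical. It illustrates two properties stated above. First, setting the coupling to zero recovers the linear model. Second, the update is affine in u for fixed z, so its Jacobian with respect to u is exact.

```python
import numpy as np


def bilinear_step(z, u, A, B, U, V):
    """Assumed bilinear latent update:

        z_next = A z + B u + sum_i u_i (U[i] V[i]^T) z

    U and V have shape (m, n, r): one rank-r factor pair per control channel,
    so each coupling matrix costs 2*n*r parameters instead of n*n.
    """
    # Control-dependent term: the effective operator is A + sum_i u_i U[i] V[i]^T.
    coupling = sum(u[i] * (U[i] @ (V[i].T @ z)) for i in range(len(u)))
    return A @ z + B @ u + coupling


def jacobian_u(z, B, U, V):
    """Exact d z_next / d u: column i is B[:, i] + U[i] V[i]^T z."""
    cols = [B[:, i] + U[i] @ (V[i].T @ z) for i in range(B.shape[1])]
    return np.stack(cols, axis=1)
```

  Under this assumed form, fixing u makes the step linear in z, and fixing z makes it affine in u. That is one plausible reason exact Jacobians are cheap to obtain inside an SCP loop.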
  Across CartPole and RSCP benchmarks in time-invariant and time-varying regimes, the proposed model matches or improves forecasting accuracy on every cell when training noise is averaged out, with strict gains where control-state coupling is structurally present. Its main closed-loop gains appear in the RSCP TV task, where iterative SCP improves adaptation within the horizon and substantially stabilizes training; in CartPole TV, the gains are modest but consistent. In delayed re-planning experiments on the time-varying variants, the bilinear model degrades more gracefully under stale-plan execution, maintaining a consistent advantage on CartPole TV and a substantially larger robustness margin on RSCP TV. These results show that control-dependent latent dynamics provide a simple and effective mechanism for robust MPC under varying conditions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04793v1</guid>
      <category>cs.LG</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Matan Pagi, Zohar Sorek</dc:creator>
    </item>
    <item>
      <title>Distance Distributions Between Nodes in Concentric Disk-Annulus or Sphere-Shell Regions</title>
      <link>https://arxiv.org/abs/2605.04794</link>
      <description>arXiv:2605.04794v1 Announce Type: new 
Abstract: This letter derives closed-form expressions for the probability density function of the distance between two nodes located in heterogeneous concentric geometries, namely a disk or sphere and a surrounding annulus or spherical shell. Two scenarios are considered: (i) the two nodes are independently distributed in different regions, one in the disk or sphere and the other in the annulus or shell, and (ii) one node is static in the outer region while the other follows the stationary distribution of the random waypoint model in the inner region. The resulting expressions provide a tractable analytical tool for performance evaluation in concentric wireless regions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04794v1</guid>
      <category>cs.IT</category>
      <category>eess.SP</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Nicholas Vaiopoulos, Alexander Vavoulas, Harilaos G. Sandalidis, Konstantinos K. Delibasis</dc:creator>
    </item>
    <item>
      <title>Negative Imaginary and Passivity Properties of Synchronous Machine Systems</title>
      <link>https://arxiv.org/abs/2605.04796</link>
      <description>arXiv:2605.04796v1 Announce Type: new 
Abstract: The recent rapid proliferation of renewable energy is fundamentally changing the dynamic operations of power systems, necessitating new approaches to assess stability for these highly nonlinear systems. In this paper, we prove that synchronous machine systems, modeled in the nonlinear dq-frame, possess fundamental dissipativity properties. Specifically, we show passivity from current input to voltage output and a nonlinear negative imaginary (NI) property from torque input to rotor angle output. For the nonlinear system shifted around an equilibrium point, we derive explicit conditions for both passivity and the NI property to hold. Finally, we demonstrate that interconnection with passive droop controllers preserves these dissipativity properties with identical supply rates, thereby ensuring closed-loop stability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04796v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Maryam Khodabakhshloo, Elizabeth L. Ratnam, Ian R. Petersen</dc:creator>
    </item>
    <item>
      <title>Beyond Seeing Is Believing: On Crowdsourced Detection of Audiovisual Deepfakes</title>
      <link>https://arxiv.org/abs/2605.04797</link>
      <description>arXiv:2605.04797v1 Announce Type: new 
Abstract: Deepfakes are increasingly realistic and easy to produce, raising concerns about the reliability of human judgments in misinformation settings. We study audiovisual deepfake detection by measuring how consistently crowd workers distinguish authentic from manipulated videos and, when they flag a video as manipulated, how accurately they identify the manipulation type (audio-only, video-only, or audio-video) and how consistently they report manipulation timestamps. We run two matched crowdsourcing studies on Prolific using AV-Deepfake1M and the Trusted Media Challenge (TMC) dataset. We sample 48 videos per dataset (96 total) and collect 960 judgments (10 per video). Results show that crowd workers rarely misclassify authentic videos as manipulated, but they miss many manipulations, and agreement remains limited across videos. Aggregating multiple judgments per video stabilizes the authenticity signal, but it cannot recover manipulations that most workers consistently miss. Manipulation type identification is substantially noisier than authenticity detection even when workers detect a manipulation, with joint audio-video cases being particularly hard to recognize. Overall, these findings suggest that crowdsourcing can provide a scalable screening signal for audiovisual authenticity, while reliable modality attribution remains an open challenge.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04797v1</guid>
      <category>cs.IR</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Michael Soprano, Andrea Cioci, Stefano Mizzaro</dc:creator>
    </item>
    <item>
      <title>Online Orthogonal Vectors Revisited</title>
      <link>https://arxiv.org/abs/2605.04798</link>
      <description>arXiv:2605.04798v1 Announce Type: new 
Abstract: We prove new upper and lower bounds for the Online Orthogonal Vectors Problem ($\mathsf{OnlineOV}_{n,d}$). In this problem, a preprocessing algorithm receives $n$ vectors $x_1,\ldots,x_n\in\{0,1\}^d$ and constructs a data structure of size $S$. A query algorithm subsequently receives a query vector $q\in\{0,1\}^d$ and in time $T$ decides whether $q$ is orthogonal to any of the input vectors $x_i$.
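  The query model above admits two trivial extremes, sketched here in Python for intuition. The class names are hypothetical, and neither is the paper's data structure. The first stores the vectors and scans them (S = O(nd), T = O(nd)). The second precomputes the answer for all 2^d possible queries (S = O(2^d), T = O(d)). The results that follow concern the space/query trade-off between these extremes.

```python
from itertools import product


def orthogonal(a, b):
    """True if the 0/1 vectors a and b have no coordinate where both are 1."""
    return sum(x * y for x, y in zip(a, b)) == 0


class ScanOV:
    """Extreme 1: store the n vectors (S = O(nd)) and scan them per query (T = O(nd))."""

    def __init__(self, vectors):
        self.vectors = [tuple(v) for v in vectors]

    def query(self, q):
        return any(orthogonal(q, x) for x in self.vectors)


class TableOV:
    """Extreme 2: precompute answers for all 2^d queries (S = O(2^d)); query by lookup (T = O(d))."""

    def __init__(self, vectors, d):
        scan = ScanOV(vectors)
        # Preprocessing time is O(2^d * n * d); only space and query time are constrained.
        self.table = {q: scan.query(q) for q in product((0, 1), repeat=d)}

    def query(self, q):
        return self.table[tuple(q)]
```

  In the low-dimensional regime d = c log n (log base 2), the table has 2^d = n^c entries. The full table is therefore polynomial in n, but its degree grows with c.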
  We design a new deterministic data structure for $\mathsf{OnlineOV}_{n,d}$. In low dimensions ($d = c \log n$), our data structure matches the performance of the best known randomized algorithm due to Chan [SoCG 2017]. Furthermore, in moderate dimensions ($d=n^{\varepsilon}$), we give the first improvement since Charikar, Indyk and Panigrahy [ICALP 2002]. Along the way, we give the first deterministic refutation of a conjecture on the hardness of $\mathsf{OnlineOV}$ posed by Goldstein, Lewenstein and Porat [ISAAC 2017]. This data structure also extends to a number of problems, including Partial Match, Orthogonal Range Search, and DNF Evaluation. We use a novel structure-versus-randomness decomposition to design our algorithm.
  Under the Non-Uniform Strong Exponential Time Hypothesis, we also prove arbitrarily large polynomial space lower bounds for any $\mathsf{OnlineOV}$ data structure with sublinear query time even with computationally unbounded preprocessing. These lower bounds extend to several other problems, including Polynomial Evaluation, Partial Match, Orthogonal Range Search, and Approximate Nearest Neighbors. We also prove similar lower bounds for $\mathsf{3-SUM}$ with preprocessing under the Non-Uniform Hamiltonian Path Conjecture.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04798v1</guid>
      <category>cs.DS</category>
      <category>cs.CC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Karthik Gajulapalli, Alexander Golovnev, Samuel King, Sidhant Saraogi</dc:creator>
    </item>
    <item>
      <title>Not All Faults Are Equal: Transient-Fault Sensitivity Characterization of an Open-Source RISC-V Vector Cluster</title>
      <link>https://arxiv.org/abs/2605.04803</link>
      <description>arXiv:2605.04803v1 Announce Type: new 
Abstract: We present a transient-fault sensitivity study of the open-source RISC-V vector cluster Spatz under single-event transient (SET) and single-event upset (SEU) fault models. Across 100,000 fault injections on six MatMul and Widening MatMul configurations, faulty data corruption (FD) is the dominant manifesting outcome for all evaluated workloads, accounting for at least 86% of manifesting errors in the SET campaigns and at least 91% in the SEU campaigns. At the module level, SET sensitivity is concentrated in the vector execution path, while TCDM is the major contributor to FD manifestations. We further quantify SDC severity across FP32, FP16, BP16, and FP8 by analyzing both the average number of corrupted outputs and their RMSE. FP8 shows the lowest output impact overall, while FP16 Widening MatMul reduces both corruption spread and RMSE compared with FP16 MatMul. By contrast, the effect of widening on FP8 is limited in our experiments. Finally, exponent-targeted corruptions induce the most severe SDC events, with the largest deviations observed in FP32 and BP16, motivating selective protection of the highest-impact datapaths and fault cases.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04803v1</guid>
      <category>cs.AR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Maoyuan Cai, Amirhossein Kiamarzi, Davide Rossi, Angelo Garofalo</dc:creator>
    </item>
    <item>
      <title>An Adaptive Finite Element Method Based on Generalized Barycentric Coordinates</title>
      <link>https://arxiv.org/abs/2605.04805</link>
      <description>arXiv:2605.04805v1 Announce Type: new 
Abstract: This work derives an a posteriori error estimate for polygonal finite element methods based on Wachspress barycentric coordinates. In particular, we prove that the classical residual-based a posteriori error estimator provides both upper and lower bounds for the discretization error. The analysis relies on a Scott-Zhang type interpolation and on homogeneity arguments for rational functions on polygonal elements. Numerical experiments on square and L-shaped domains demonstrate the effectiveness of the adaptive algorithm.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04805v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yihui Zhou, Yuwen Li</dc:creator>
    </item>
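The Wachspress coordinates underlying the adaptive method above have a closed form on convex polygons: the weight of vertex v_i is the area of the corner triangle (v_{i-1}, v_i, v_{i+1}) divided by the product of the areas of the two triangles formed by the evaluation point x and the edges adjacent to v_i, normalized to sum to one. The sketch below assumes counterclockwise vertices and x strictly inside the polygon; it is an illustration of the basis functions, not the paper's finite element code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def tri_area(a: Point, b: Point, c: Point) -> float:
    """Signed area of triangle abc (positive when counterclockwise)."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))

def wachspress(polygon: Sequence[Point], x: Point) -> List[float]:
    """Wachspress barycentric coordinates of an interior point x."""
    n = len(polygon)
    weights = []
    for i in range(n):
        prev, cur, nxt = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        corner = tri_area(prev, cur, nxt)                            # fixed per vertex
        denom = tri_area(x, prev, cur) * tri_area(x, cur, nxt)       # vanishes on edges
        weights.append(corner / denom)
    total = sum(weights)
    return [w / total for w in weights]

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(wachspress(square, (0.25, 0.5)))
```

On the unit square the coordinates reduce to the bilinear ones, here 0.375, 0.125, 0.125, 0.375 up to rounding; on general convex polygons they are rational functions, which is why the error analysis needs homogeneity arguments for rational functions.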
    <item>
      <title>Dr-PoGO: Direct Radar Pose-Graph Optimization</title>
      <link>https://arxiv.org/abs/2605.04806</link>
      <description>arXiv:2605.04806v1 Announce Type: new 
Abstract: This paper introduces Dr-PoGO, a method for Simultaneous Localization And Mapping (SLAM) using a 2D spinning radar. Unlike cameras or lidars, which require line-of-sight, millimetre-wave radars can 'see' through dust, falling snow, rain, etc., which makes radar well suited to robust perception regardless of the weather conditions. While most existing radar-based SLAM methods rely on the extraction of point clouds or features to perform ego-motion estimation, Dr-PoGO leverages direct registration techniques for odometry (DRO) and loop-closure registration. An off-the-shelf radar-focused place recognition algorithm, RaPlace, provides loop-closure candidates. As RaPlace does not provide relative transformations, Dr-PoGO introduces a coarse-to-fine registration that uses visual features and descriptors to obtain an initial guess for the direct transformation refinement. The global trajectory is optimized in a pose-graph optimization. Dr-PoGO demonstrates state-of-the-art performance over 300km of data in various real-world automotive environments. Our implementation is publicly available: https://github.com/utiasASRL/dr_pogo.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04806v1</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Cedric Le Gentil, Weican Li, Leonardo Brizi, Timothy D. Barfoot</dc:creator>
    </item>
    <item>
      <title>DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents</title>
      <link>https://arxiv.org/abs/2605.04808</link>
      <description>arXiv:2605.04808v1 Announce Type: new 
Abstract: AI agents are increasingly deployed across diverse domains to automate complex workflows through long-horizon and high-stakes action executions. Due to their high capability and flexibility, such agents raise significant security and safety concerns. A growing number of real-world incidents have shown that adversaries can easily manipulate agents into performing harmful actions, such as leaking API keys, deleting user data, or initiating unauthorized transactions. Evaluating agent security is inherently challenging, as agents operate in dynamic, untrusted environments involving external tools, heterogeneous data sources, and frequent user interactions. However, realistic, controllable, and reproducible environments for large-scale risk assessment remain largely underexplored. To address this gap, we introduce the DecodingTrust-Agent Platform (DTap), the first controllable and interactive red-teaming platform for AI agents, spanning 14 real-world domains and over 50 simulation environments that replicate widely used systems such as Google Workspace, PayPal, and Slack. To scale the risk assessment of agents in DTap, we further propose DTap-Red, the first autonomous red-teaming agent that systematically explores diverse injection vectors (e.g., prompt, tool, skill, environment, combinations) and autonomously discovers effective attack strategies tailored to varying malicious goals. Using DTap-Red, we curate DTap-Bench, a large-scale red-teaming dataset comprising high-quality instances across domains, each paired with a verifiable judge to automatically validate attack outcomes. Through DTap, we conduct large-scale evaluations of popular AI agents built on various backbone models, spanning security policies, risk categories, and attack strategies, revealing systematic vulnerability patterns and providing valuable insights for developing secure next-generation agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04808v1</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhaorun Chen, Xun Liu, Haibo Tong, Chengquan Guo, Yuzhou Nie, Jiawei Zhang, Mintong Kang, Chejian Xu, Qichang Liu, Xiaogeng Liu, Tianneng Shi, Chaowei Xiao, Sanmi Koyejo, Percy Liang, Wenbo Guo, Dawn Song, Bo Li</dc:creator>
    </item>
    <item>
      <title>Optimal Uncertainty-Aware Calibration for the AX=YB Problem</title>
      <link>https://arxiv.org/abs/2605.04809</link>
      <description>arXiv:2605.04809v1 Announce Type: new 
Abstract: This article proposes a general optimization framework for solving the hand-eye calibration problem. Unlike traditional methods, it develops an iterative algorithm based on Lie algebra that achieves approximately globally optimal solutions. During optimization, the method strictly preserves the structural constraints of the calibration parameters and updates the calibration parameters synchronously. Data used in real-world hand-eye calibration often contain uncertainty, especially in heavily loaded, large-workspace industrial robot scenarios, and this uncertainty can significantly degrade accuracy; because modeling it accurately is inherently difficult, this article avoids explicit uncertainty modeling. Instead, it introduces an uncertainty metric that evaluates the relative uncertainty between data sources and uses it to dynamically refine the iterative process. To further improve convergence efficiency, it designs an effective method for generating initial solutions that improves overall stability and accuracy. Numerical simulations and real-world experiments validate the effectiveness of the proposed approach; on synthetic datasets, it improves estimation accuracy by at least 67% under high-uncertainty conditions compared with existing methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04809v1</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yanjia Chen, Xiangfei Li, Huan Zhao, Yiyuan Hong, Guanxiao Xia, Jiexin Zhang, Han Ding</dc:creator>
    </item>
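For reference, the AX = YB equation in the title couples two unknown rigid transforms X and Y with measured pairs (A_i, B_i). Writing each homogeneous transform as a rotation R and translation t gives the standard rotation/translation split below; the last line shows a generic Lie-algebra (exponential-map) update of the kind an iterative solver might use, which is an illustrative assumption rather than the article's exact scheme.

```latex
A_i X = Y B_i, \qquad A_i, B_i, X, Y \in SE(3), \quad i = 1, \dots, m
R_{A_i} R_X = R_Y R_{B_i}
R_{A_i} t_X + t_{A_i} = R_Y t_{B_i} + t_Y
X \leftarrow \exp\!\big(\hat{\xi}_X\big) X, \qquad Y \leftarrow \exp\!\big(\hat{\xi}_Y\big) Y, \qquad \xi_X, \xi_Y \in \mathbb{R}^6
```

Updating both X and Y through the exponential map keeps the rotation parts on SO(3) at every step, which is one way to "strictly preserve the structural constraints" mentioned in the abstract.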
    <item>
      <title>Tree-based Credit Assignment for Multi-Agent Memory System</title>
      <link>https://arxiv.org/abs/2605.04811</link>
      <description>arXiv:2605.04811v1 Announce Type: new 
Abstract: Memory systems are widely adopted to enhance LLMs for long-horizon tasks, and are commonly organized as multi-agent pipelines with memory building, summarizing, and retrieval agents. To empower this system, existing RL-based methods either apply final downstream task rewards (e.g., QA accuracy) for all agents uniformly, which are coarse and ambiguous, or design task-specific rewards for agents on different subtasks, which require costly annotations (e.g., key evidence) and are difficult to define reliably. To address these limitations, we propose Tree-based Credit Assignment for Multi-Agent Memory Systems (TreeMem), which derives agent-specific credit from the final reward without task-specific annotations. Specifically, TreeMem extends the multi-agent pipeline (builder--summarizer--retrieval) into a tree structure, where each agent's outputs are expanded into multiple subsequent branches. The contribution of each agent is estimated via Monte Carlo averaging over its subsequent branches, capturing how intermediate agent actions may influence the final reward. This converts the coarse final reward into agent-specific optimization signals. These signals are then used to update all agent policies simultaneously, helping heterogeneous agents specialize effectively. Experiments on long-horizon benchmarks show that TreeMem improves memory system performance over strong baselines, validating the effectiveness of tree-structured credit assignment for the multi-agent memory system.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04811v1</guid>
      <category>cs.MA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Marina Mao, Alexandr Liu, Pengbo Li, Siheng Li, Bo Zhou, Xiang Wang</dc:creator>
    </item>
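The tree-based credit assignment described in the TreeMem abstract can be sketched as a recursive rollout: each agent's output is expanded into several sampled continuations, and the credit of that output is the average final reward of the leaves beneath it. In the minimal sketch below, the agents, the sample index used as a stand-in for stochastic sampling, and the toy reward are all hypothetical; the abstract does not state whether TreeMem further normalizes these credits (for example, by subtracting the parent's value).

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence, Tuple

Agent = Callable[[str, int], str]          # (input state, sample index) -> output
CreditLog = List[Tuple[int, str, float]]   # (agent depth, agent output, credit)

def tree_credit(state: str, agents: Sequence[Agent], reward: Callable[[str], float],
                branching: int, depth: int = 0,
                log: Optional[CreditLog] = None) -> Tuple[float, CreditLog]:
    """Expand each agent's output into `branching` continuations; return the
    Monte Carlo value of `state` and a log of per-output credits."""
    if log is None:
        log = []
    if depth == len(agents):               # leaf: the pipeline has finished
        return reward(state), log
    child_values = []
    for sample in range(branching):
        output = agents[depth](state, sample)
        value, _ = tree_credit(output, agents, reward, branching, depth + 1, log)
        log.append((depth, output, value)) # credit = mean final reward below this output
        child_values.append(value)
    return mean(child_values), log

def pick(state: str, sample: int) -> str:
    """Toy agent: append 'a' or 'b' depending on the sample index."""
    return state + "ab"[sample]

# builder -> summarizer -> retriever; the toy reward counts the 'a' characters
root_value, log = tree_credit("", [pick, pick, pick], lambda s: s.count("a"), branching=2)
builder_credit = {out: v for d, out, v in log if d == 0}
print(root_value, builder_credit)  # 1.5 {'a': 2.0, 'b': 1.0}
```

The builder's "a" output receives a higher credit than its "b" output even though only the final reward is observed, which is the kind of agent-specific signal the abstract describes.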
    <item>
      <title>A Biased Nonnegative Block Term Tensor Decomposition Model for Dynamic QoS Prediction</title>
      <link>https://arxiv.org/abs/2605.04813</link>
      <description>arXiv:2605.04813v1 Announce Type: new 
Abstract: With the rapid development of cloud computing and Web services, Quality of Service (QoS) has become a key criterion for service selection and recommendation. Tensor latent feature analysis provides an effective way to model multidimensional QoS data, and most existing QoS prediction methods are mainly based on Canonical Polyadic (CP) decomposition or Tucker decomposition. However, constrained by their inherent structural properties, these methods cannot accurately capture the complex and dynamic dependencies in user-service interactions, which limits their prediction performance. To address this issue, this paper proposes a dynamic QoS prediction framework based on the Biased Nonnegative Block Term Tensor Decomposition Model, termed BNBT. Specifically, the proposed framework is developed from three aspects: (1) block term tensor decomposition is employed to enhance the representation capability of latent feature learning; (2) linear bias terms are incorporated to further improve prediction accuracy; and (3) a tensor-oriented single-element-dependent nonnegative multiplicative update algorithm, called SLF-NMUT, is designed for efficient parameter estimation. Extensive experiments on real-world QoS datasets demonstrate that the proposed BNBT framework consistently outperforms several state-of-the-art QoS prediction methods in terms of prediction accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04813v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wenjing Liu, Yujia Lei, Qu Wang</dc:creator>
    </item>
    <item>
      <title>Building AI Companions that Prioritise Learning over Performance</title>
      <link>https://arxiv.org/abs/2605.04816</link>
      <description>arXiv:2605.04816v1 Announce Type: new 
Abstract: Large language models (LLMs) are rapidly transforming knowledge work by improving the quality and efficiency of tasks such as writing, coding, and data analysis. However, their growing use in education has exposed a learning-performance paradox: while they can enhance short-term task performance, they may also undermine genuine learning, including cognitive growth, knowledge transfer, and metacognitive development. This paper addresses the question of how artificial intelligence should be designed and used to support learning rather than merely improve immediate outputs. We introduce the concept of AI learning companions, defined as adaptive, pedagogically informed, LLM-powered agents designed for integration into learning environments. We propose a framework for their design built on three interrelated foundations: a pedagogical foundation focused on how students learn with AI, an adaptive foundation focused on how AI learns about students, and a responsible design foundation ensuring systems remain transparent, accountable, inclusive, and secure. The framework is illustrated through five case studies spanning diverse educational contexts, levels, and tool designs, revealing both the promise and current limitations of existing tools. We conclude that there is a necessary shift away from LLMs designed for task-oriented performance, and beyond simply prompting them to act as tutors, toward deliberately developed AI learning companions that are pedagogically sound, adapt to their learners, and foster durable understanding, metacognitive growth, and learner agency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04816v1</guid>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Hassan Khosravi, Dragan Gasevic, Shazia Sadiq, Lixiang Yan, Jason Lodge, Jason Tangen, Paul Denny, Kristen DiCerbo, Simon Buckingham Shum, Ryan S. Baker</dc:creator>
    </item>
    <item>
      <title>Unsat Core Prediction through Polarity-Aware Representation Learning over Clause-Literal Hypergraphs</title>
      <link>https://arxiv.org/abs/2605.04819</link>
      <description>arXiv:2605.04819v1 Announce Type: new 
Abstract: Graph neural networks have been widely used in Boolean satisfiability (SAT) tasks to learn structural information from SAT formulas. The goal of these studies is to solve SAT instances or to enhance SAT solvers, including tasks such as unsat-core prediction. However, most existing approaches model a SAT formula as a bipartite graph or a directed acyclic graph, which are less expressive in capturing higher-order interactions among literals and clauses. Moreover, these approaches are limited in modeling intrinsic polarity-related properties of SAT, such as the complementary relationship between the positive and negative literals of a variable. To address these limitations, we propose a polarity-aware representation learning framework over clause-literal hypergraphs. We model SAT formulas as clause-literal hypergraphs augmented with a clause incidence graph to capture higher-order structural interactions. We then introduce a polarity-aware decomposition mechanism that separates variable representations into polarity-invariant and polarity-equivariant components, explicitly modeling the relationship between positive and negative literals, with the resulting literal representations propagated along the hypergraph structure. We further incorporate a polarity-inversion consistency regularization to reinforce polarity-consistent representations during training. Experimental results on multiple SAT datasets demonstrate the effectiveness of the proposed approach.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04819v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhenchao Sun, Shuai Ma, Ping Lu, Chongyang Tao</dc:creator>
    </item>
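One simple way to realize the invariant/equivariant split described in the unsat-core abstract is to build the two literal embeddings of a variable as the sum and difference of its two components, so that swapping a variable's polarity leaves the invariant part unchanged and negates the equivariant part. The sketch below assumes that sum/difference form and a squared-error consistency penalty; both are illustrative choices, not the paper's confirmed formulation.

```python
from typing import List, Tuple

Vector = List[float]

def literal_embeddings(invariant: Vector, equivariant: Vector) -> Tuple[Vector, Vector]:
    """Compose positive and negative literal embeddings of one variable."""
    pos = [s + e for s, e in zip(invariant, equivariant)]
    neg = [s - e for s, e in zip(invariant, equivariant)]
    return pos, neg

def decompose(pos: Vector, neg: Vector) -> Tuple[Vector, Vector]:
    """Recover the polarity-invariant and polarity-equivariant parts."""
    invariant = [(p + n) / 2 for p, n in zip(pos, neg)]
    equivariant = [(p - n) / 2 for p, n in zip(pos, neg)]
    return invariant, equivariant

def inversion_consistency(scores: List[float], flipped_scores: List[float]) -> float:
    """Mean squared gap between per-clause unsat-core scores before and after
    flipping one variable's polarity everywhere (core membership is unchanged
    by such a renaming, so the scores should agree)."""
    return sum((a - b) ** 2 for a, b in zip(scores, flipped_scores)) / len(scores)

pos, neg = literal_embeddings([1.0, 2.0], [0.5, -1.0])
print(decompose(neg, pos))  # swapped polarity: same invariant part, negated equivariant part
```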
    <item>
      <title>Toward less conservative distributed stability analysis of power systems via matrix-valued differential passivity indices</title>
      <link>https://arxiv.org/abs/2605.04821</link>
      <description>arXiv:2605.04821v1 Announce Type: new 
Abstract: Passivity indices have been widely adopted to derive distributed stability certificates for power systems. Nevertheless, conventional passivity indices remain scalar-valued even for multi-input-multi-output (MIMO) systems, which can introduce excessive conservatism and compromise analysis accuracy. To overcome these limitations, this paper extends the differential passivity index to a matrix-valued formulation that captures both channel-wise passivity properties and inter-channel coupling effects in MIMO subsystems. On this basis, semi-distributed and fully distributed stability criteria are developed for power systems with heterogeneous nonlinear devices. It is shown that system stability is guaranteed when the aggregate passivity excess of devices compensates for the passivity shortage imposed by the network. Furthermore, analytical passivity matrix expressions for typical power system components are derived, facilitating compositional stability analysis. Case studies on a three-bus system and a modified IEEE 118-bus system validate the effectiveness of the proposed framework.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04821v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xi Ru, Cong Fu, Zhongze Li, Xiaoyu Peng, Feng Liu</dc:creator>
    </item>
    <item>
      <title>Improving FMQA via Initial Training Data Design Considering Marginal Bit Coverage in One-Hot Encoding</title>
      <link>https://arxiv.org/abs/2605.04825</link>
      <description>arXiv:2605.04825v1 Announce Type: new 
Abstract: Factorization machine with quadratic-optimization annealing (FMQA) is a black-box optimization method that combines a factorization machine (FM) surrogate with QUBO-based search by an Ising machine. When FMQA is applied to integer or discretized continuous variables via one-hot encoding, uniform random initial sampling can leave many binary variables never active in the initial training data, and the corresponding FM parameters receive no direct gradient updates from the observed responses. We address this by designing the initial training data to achieve complete marginal bit coverage, namely, ensuring that every binary variable obtained by one-hot encoding takes the value one at least once. We use two space-filling sampling methods, Latin hypercube sampling (LHS) and the Sobol' sequence, yielding LHS-FMQA and Sobol'-FMQA. On the human-powered aircraft wing-shape optimization benchmark with 17 and 32 design variables, both proposed methods achieved numerically higher mean final cruising speeds than the baseline FMQA, with the advantage more pronounced on the 32-variable problem.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04825v1</guid>
      <category>cs.LG</category>
      <category>cond-mat.stat-mech</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Taiga Hayashi, Yuya Seki, Kotaro Terada, Yosuke Mukasa, Shuta Kikuchi, Shu Tanaka</dc:creator>
    </item>
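Marginal bit coverage, as defined in the FMQA abstract, can be checked directly on an initial design. The paper uses LHS and the Sobol' sequence; the sketch below uses a centered Latin hypercube (stratum midpoints (j + 0.5)/N in random order) mapped onto integer levels, which is an assumed discretization. With N samples and K levels for a variable, the midpoints are spaced 1/N apart, so whenever N is at least K every level interval of width 1/K contains a midpoint and every one-hot bit is activated at least once. The function names are hypothetical.

```python
import random
from typing import List, Sequence

def centered_lhs_levels(n_samples: int, n_levels: Sequence[int], seed: int = 0) -> List[List[int]]:
    """Centered LHS mapped to levels: midpoint j -> floor((2j + 1) K / (2N))."""
    rng = random.Random(seed)
    columns = []
    for k in n_levels:
        strata = list(range(n_samples))
        rng.shuffle(strata)  # one random permutation of strata per variable
        columns.append([(2 * j + 1) * k // (2 * n_samples) for j in strata])
    return [list(row) for row in zip(*columns)]

def uniform_design(n_samples: int, n_levels: Sequence[int], seed: int = 0) -> List[List[int]]:
    """Baseline: every level drawn independently and uniformly."""
    rng = random.Random(seed)
    return [[rng.randrange(k) for k in n_levels] for _ in range(n_samples)]

def one_hot(row: Sequence[int], n_levels: Sequence[int]) -> List[int]:
    """Concatenated one-hot encoding of integer variables."""
    bits = []
    for value, k in zip(row, n_levels):
        bits.extend(1 if level == value else 0 for level in range(k))
    return bits

def marginal_bit_coverage(design: Sequence[Sequence[int]], n_levels: Sequence[int]) -> float:
    """Fraction of one-hot bits that take the value 1 at least once."""
    encoded = [one_hot(row, n_levels) for row in design]
    active = [any(column) for column in zip(*encoded)]
    return sum(active) / len(active)

levels = [10] * 17  # hypothetical: 17 design variables with 10 levels each
print(marginal_bit_coverage(centered_lhs_levels(10, levels), levels))
print(marginal_bit_coverage(uniform_design(10, levels), levels))
```

The first value is 1.0 by construction; the uniform baseline typically leaves some bits inactive, and the FM parameters attached to those bits would then receive no direct gradient updates from the initial data.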
    <item>
      <title>Faster Algorithms for Shortest Unique or Absent Substrings</title>
      <link>https://arxiv.org/abs/2605.04826</link>
      <description>arXiv:2605.04826v1 Announce Type: new 
Abstract: We revisit two well-known algorithmic problems on strings: computing a shortest unique substring (SUS) and a shortest absent substring (SAS) of a string $S$ of length $n$. Both problems admit folklore $\mathcal{O}(n)$-time solutions using the suffix tree of $S$. However, for small alphabets, this complexity is not necessarily optimal in the word RAM model, where a string of length $n$ over alphabet $[0,\sigma)$ can be stored in $\mathcal{O}(n \log \sigma/\log n)$ space and read in $\mathcal{O}(n \log \sigma/\log n)$ time.
  We present an $\mathcal{O}(n \log \sigma/\sqrt{\log n})$-time algorithm for computing a SUS of $S$. This algorithm decomposes the problem according to the length and the period of the sought substring and uses several tools and techniques, such as synchronizing sets, the analysis of runs, and wavelet trees, to reduce the computation of a SUS to a simple geometric problem. Further, we adapt this algorithm and combine it with an efficient construction of de Bruijn sequences in order to obtain an $\mathcal{O}(n \log \sigma/\sqrt{\log n})$-time algorithm for computing a SAS of $S$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04826v1</guid>
      <category>cs.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Panagiotis Charalampopoulos, Manal Mohamed, Solon P. Pissis, Hilde Verbeek, Wiktor Zuba</dc:creator>
    </item>
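To make the two problem definitions in the SUS/SAS abstract concrete, the brute-force reference below computes a shortest unique substring (leftmost among the shortest substrings occurring exactly once) and a shortest absent substring (lexicographically first among the shortest strings over the alphabet that do not occur). It is neither the folklore suffix-tree solution nor the paper's sublinear algorithm; it is only meant as a readable specification or test oracle for small inputs.

```python
from collections import Counter
from itertools import product
from typing import Sequence

def shortest_unique_substring(s: str) -> str:
    """Leftmost shortest substring of s that occurs exactly once."""
    for length in range(1, len(s) + 1):
        windows = [s[i:i + length] for i in range(len(s) - length + 1)]
        counts = Counter(windows)  # counts overlapping occurrences
        for w in windows:
            if counts[w] == 1:
                return w
    raise ValueError("the empty string has no unique substring")

def shortest_absent_substring(s: str, alphabet: Sequence[str]) -> str:
    """Lexicographically first shortest string over `alphabet` absent from s."""
    length = 1
    while True:  # terminates: no substring of length len(s) + 1 is present
        present = {s[i:i + length] for i in range(len(s) - length + 1)}
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            if candidate not in present:
                return candidate
        length += 1

print(shortest_unique_substring("abaab"), shortest_absent_substring("abaab", "ab"))
```

For "abaab", no single letter is unique, "ba" is the leftmost length-2 substring occurring once, and "bb" is the first length-2 string over {a, b} that never occurs.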
    <item>
      <title>Trustworthy Federated Label Distribution Learning under Annotation Quality Disparity</title>
      <link>https://arxiv.org/abs/2605.04827</link>
      <description>arXiv:2605.04827v1 Announce Type: new 
Abstract: Label Distribution Learning (LDL) models supervision as an instance-wise probability distribution, enabling fine-grained learning under inherent ambiguity, but its success relies on high-fidelity label distributions that are costly to obtain and thus often noisy. Motivated by privacy-sensitive applications, we study Federated Label Distribution Learning (Fed-LDL), where data isolation further induces heterogeneous annotation quality across clients, making local updates unevenly reliable and breaking sample-size-based aggregation (e.g., FedAvg). To address this trust dilemma, we propose FedQual, a quality-aware Fed-LDL framework with two coupled mechanisms: (i) quality-adaptive client training guided by a global semantic anchor that calibrates low-quality clients while preserving high-quality autonomy, and (ii) reliability-aware server aggregation that reweights client contributions by effective reliable information rather than raw sample size. To enable rigorous evaluation, we construct four new Fed-LDL benchmarks (FER-LDL, FI-LDL, PIPAL-LDL, and KADID-LDL) with controlled annotation quality disparity. We further provide a theoretical guarantee showing that under heterogeneous supervision quality, client-specific calibration is strictly better than any uniform calibration. Extensive experiments on the proposed benchmarks demonstrate the effectiveness of FedQual.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04827v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Junxiang Wu, Zhiqiang Kou, Hongwei Zeng, Wenke Huang, Biao Liu, Hanlin Gu, Yuheng Jia, Di Jiang, Yang Liu, Xin Geng, Qiang Yang</dc:creator>
    </item>
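The contrast the FedQual abstract draws between sample-size-based aggregation (FedAvg) and reliability-aware aggregation can be sketched in a few lines. Here each client's weight is its sample count multiplied by a reliability score in [0, 1], as a stand-in for "effective reliable information"; how FedQual actually estimates reliability is not specified in the abstract, so the scores and function names below are hypothetical.

```python
from typing import List, Sequence

def aggregate(updates: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of client parameter vectors."""
    total = sum(weights)
    return [sum(w * u[i] for w, u in zip(weights, updates)) / total
            for i in range(len(updates[0]))]

def fedavg(updates: Sequence[Sequence[float]], sizes: Sequence[int]) -> List[float]:
    """FedAvg: weight each client by its raw sample count."""
    return aggregate(updates, sizes)

def reliability_weighted(updates: Sequence[Sequence[float]], sizes: Sequence[int],
                         reliability: Sequence[float]) -> List[float]:
    """Hypothetical reliability-aware rule: weight = sample count x reliability."""
    return aggregate(updates, [n * r for n, r in zip(sizes, reliability)])

# A small clean client and a large noisily annotated client.
updates, sizes, reliability = [[1.0], [5.0]], [100, 300], [0.9, 0.1]
print(fedavg(updates, sizes), reliability_weighted(updates, sizes, reliability))
```

With raw sample counts the noisy client dominates (result 4.0); discounting it by reliability pulls the aggregate to about 2.0, closer to the high-quality client's update.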
    <item>
      <title>Toward an Understanding of Developer Behaviour while Using Bug Localization Tools</title>
      <link>https://arxiv.org/abs/2605.04828</link>
      <description>arXiv:2605.04828v1 Announce Type: new 
Abstract: Bug fixing is a complex and time-consuming task in software development. Bug localization research tends to focus on the accuracy of automated tools that suggest source code files for developers to look at. However, little is known about how developers use these tools in practice. This paper reports on an ongoing qualitative user study. Eleven participants worked through four realistic bug localization tasks in a controlled environment and were given varying levels of support information offered by a specialized tool. Participants were asked to think aloud in a semi-structured interview session. The preliminary findings provide insight into three aspects of practice: how developers interact with tools, the role social and contextual information plays, and problem solving. The study demonstrates that bug localization is complex and suggests that the adoption of effective tools depends on more than their accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04828v1</guid>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Pablo Diaz Pedreira, Tamara Lopez, Michel Wermelinger</dc:creator>
    </item>
    <item>
      <title>Traffic Chunk Sizing vs. Optical Switching Speed in Future All-Optical Satellite Networks</title>
      <link>https://arxiv.org/abs/2605.04829</link>
      <description>arXiv:2605.04829v1 Announce Type: new 
Abstract: To enable efficient resource utilization under stringent Size, Weight, and Power (SWaP) constraints through transparent, all-optically switched satellite transmission, various switching paradigms can be considered, including packet, burst, and circuit switching. To this end, traffic assembly and the algorithmic design of path computation at the ground stations play a key role in determining the switching fabric design. Generally, traffic can be buffered and assembled into chunks at the ground stations and forwarded over a pre-computed optical path in space, similar to terrestrial optical burst switching or fast circuit switching. Regardless of the chosen paradigm, the switching fabric must satisfy specific latency requirements. This paper studies the performance of future all-optical satellite constellations as a function of the maximum traffic chunk size that can be scheduled and of the performance of the onboard optical switching fabrics. We consider various optical switching technologies, including MEMS-based and integrated photonic solutions, in terms of switching speed, power consumption, and insertion loss. Simulation results indicate that traffic chunk size critically determines the performance required of the optical switching fabrics onboard a satellite.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04829v1</guid>
      <category>cs.NI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sleman Mouammar, Thomas Röthig, Soheil Hosseini, Ítalo Brasileiro, Admela Jukan</dc:creator>
    </item>
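The trade-off between chunk size and switching speed in the satellite-switching abstract can be approximated with simple arithmetic: a chunk of S bytes on a line of rate R occupies the path for T = 8S/R seconds, and if the fabric needs T_sw to reconfigure, the fraction of time spent carrying traffic is T / (T + T_sw). The line rate and switching times below are illustrative orders of magnitude chosen as assumptions (millisecond-scale for MEMS, nanosecond-scale for fast integrated photonics), not values from the paper, and the model ignores guard times and scheduling gaps.

```python
def chunk_duration_s(chunk_bytes: float, line_rate_bps: float) -> float:
    """Time a chunk occupies the optical path."""
    return chunk_bytes * 8 / line_rate_bps

def switching_efficiency(chunk_bytes: float, line_rate_bps: float, switch_time_s: float) -> float:
    """Fraction of time spent transmitting rather than reconfiguring."""
    t = chunk_duration_s(chunk_bytes, line_rate_bps)
    return t / (t + switch_time_s)

def min_chunk_bytes(target_efficiency: float, line_rate_bps: float, switch_time_s: float) -> float:
    """Smallest chunk reaching the target efficiency.

    Solving t / (t + ts) = eta for t gives t = ts * eta / (1 - eta).
    """
    t = switch_time_s * target_efficiency / (1 - target_efficiency)
    return t * line_rate_bps / 8

RATE = 100e9  # assumed 100 Gb/s optical line rate
for label, ts in [("MEMS-like, 10 ms", 10e-3), ("photonic-like, 10 ns", 10e-9)]:
    eff = switching_efficiency(1e6, RATE, ts)            # 1 MB chunk
    need = min_chunk_bytes(0.9, RATE, ts)
    print(f"{label}: efficiency for 1 MB chunk {eff:.4f}, "
          f"chunk for 90% efficiency {need:.3g} bytes")
```

Under these assumptions a millisecond-scale switch needs chunks of roughly a gigabyte to keep 90% of link time busy, whereas a nanosecond-scale switch reaches that with about a hundred bytes, which illustrates why chunk size and fabric speed must be co-designed.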
    <item>
      <title>Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models</title>
      <link>https://arxiv.org/abs/2605.04830</link>
      <description>arXiv:2605.04830v1 Announce Type: new 
Abstract: Diffusion models undergo a phase transition in a critical time window during generation dynamics, with two complementary diagnoses of criticality. The symmetry breaking picture views the critical window as when trajectories bifurcate into different semantic minima of the energy landscape, whereas the nonlocality picture views the critical window as when local denoising fails. We study whether these two notions of phase transition are concurrent in modern diffusion transformers. By evaluating the dynamics and outcomes of generation trajectories, we observe a near-simultaneous occurrence of the nonlocality and symmetry breaking critical times. Our work is the first to unify the two notions of phase transitions in practice: it provides a concrete diagnostic for when and why diffusion models rely on conditioning and global denoising, enabling principled evaluation of model efficiency and guiding the design of architectures and sampling schemes that avoid unnecessary computation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04830v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yifan F. Zhang, Fangjun Hu, Guangkuo Liu, Mert Okyay, Xun Gao</dc:creator>
    </item>
    <item>
      <title>StoryAlign: Evaluating and Training Reward Models for Story Generation</title>
      <link>https://arxiv.org/abs/2605.04831</link>
      <description>arXiv:2605.04831v1 Announce Type: new 
Abstract: Story generation aims to automatically produce coherent, structured, and engaging narratives. Although large language models (LLMs) have significantly advanced text generation, stories generated by LLMs still diverge from human-authored works regarding complex narrative structure and human-aligned preferences. A key reason is the absence of effective modeling of human story preferences, which are inherently subjective and under-explored. In this work, we systematically evaluate the modeling of human story preferences and introduce StoryRMB, the first benchmark for assessing reward models on story preferences. StoryRMB contains 1,133 high-quality, human-verified instances, each consisting of a prompt, one chosen story, and three rejected stories. We find existing reward models struggle to select human-preferred stories, with the best model achieving only 66.3% accuracy. To address this limitation, we construct roughly 100,000 high-quality story preference pairs across diverse domains and develop StoryReward, an advanced reward model for story preference trained on this dataset. StoryReward achieves state-of-the-art (SoTA) performance on StoryRMB, outperforming much larger models. We also adopt StoryReward in downstream test-time scaling applications for best-of-n (BoN) story selection and find that it generally chooses stories better aligned with human preferences. We will release our dataset, model, and code to facilitate future research. Related code and data are available at https://github.com/THU-KEG/StoryReward.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04831v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haotian Xia, Hao Peng, Yunjia Qi, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li</dc:creator>
    </item>
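The StoryRMB protocol described above (one chosen and three rejected stories per prompt) suggests a simple accuracy metric: an instance counts as correct when the reward model scores the chosen story strictly above every rejected one, so random scoring would land near 25%. The sketch below assumes that strict-win rule (the paper's tie handling is not stated) and pairs it with best-of-n selection; the length-based scorer is a deliberately naive placeholder, not StoryReward.

```python
from typing import Callable, Sequence, Tuple

Scorer = Callable[[str, str], float]           # (prompt, story) -> reward
Instance = Tuple[str, str, Sequence[str]]      # (prompt, chosen, rejected stories)

def instance_correct(score: Scorer, prompt: str, chosen: str, rejected: Sequence[str]) -> bool:
    """True when the chosen story strictly outscores every rejected story."""
    chosen_score = score(prompt, chosen)
    return all(chosen_score > score(prompt, r) for r in rejected)

def benchmark_accuracy(score: Scorer, instances: Sequence[Instance]) -> float:
    """Fraction of instances where the reward model ranks the chosen story first."""
    hits = sum(instance_correct(score, p, c, rs) for p, c, rs in instances)
    return hits / len(instances)

def best_of_n(score: Scorer, prompt: str, candidates: Sequence[str]) -> str:
    """Test-time scaling: keep the highest-reward candidate."""
    return max(candidates, key=lambda story: score(prompt, story))

def length_scorer(prompt: str, story: str) -> float:
    """Naive placeholder reward: longer stories score higher."""
    return float(len(story))

instances = [("p1", "a long chosen story", ["a", "bb", "ccc"]),
             ("p2", "x", ["yy", "z", "w"])]
print(benchmark_accuracy(length_scorer, instances))
print(best_of_n(length_scorer, "p3", ["a", "abc", "ab"]))
```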
    <item>
      <title>Replay-Based Continual Learning for Physics-Informed Neural Operators</title>
      <link>https://arxiv.org/abs/2605.04832</link>
      <description>arXiv:2605.04832v1 Announce Type: new 
Abstract: Neural operators generally demonstrate strong predictive performance on in-distribution (ID) problems. However, a critical limitation of existing methods is their significant performance degradation when encountering out-of-distribution (OOD) data. To address this issue, this work introduces continual learning into physics-informed neural operators, with particular emphasis on neural operators built upon the Transolver architecture, and proposes a simple yet effective replay-based continual learning strategy. The proposed method is fully physics-informed and does not require labeled data, relying solely on input fields together with physical constraints for training. When new OOD data become available, a small number of past samples are incorporated through a distillation-based constraint to preserve previously acquired knowledge and alleviate catastrophic forgetting. Meanwhile, low-rank adaptation (LoRA) is employed for transfer learning, enabling rapid adaptation to the new data. The proposed framework is systematically validated on three representative physical problems: the Darcy flow problem in fluid mechanics, a two-dimensional hyperelastic brain tumor problem in biomechanics, and a three-dimensional linear elastic Triply Periodic Minimal Surfaces problem in solid mechanics. The results demonstrate that the proposed method effectively mitigates catastrophic forgetting on previously learned data while maintaining fast adaptability to new data. Compared with conventional joint training strategies, the proposed method significantly improves training efficiency while reducing additional memory usage and computational cost.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04832v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yizheng Wang, Mohammad Sadegh Eshaghi, Xiaoying Zhuang, Timon Rabczuk, Yinghua Liu</dc:creator>
    </item>
    <item>
      <title>Bridging Input Feature Spaces Towards Graph Foundation Models</title>
      <link>https://arxiv.org/abs/2605.04834</link>
      <description>arXiv:2605.04834v1 Announce Type: new 
Abstract: Unlike vision and language domains, graph learning lacks a shared input space, as input features differ across graph datasets not only in semantics, but also in value ranges and dimensionality. This misalignment prevents graph models from generalizing across datasets, limiting their use as foundation models. In this work, we propose ALL-IN, a simple and theoretically grounded method that enables transferability across datasets with different input features. Our approach projects node features into a shared random space and constructs representations via covariance-based statistics, thus eliminating dependence on the original feature space. We show that the computed node-covariance operators and the resulting node representations are invariant in distribution to permutations of the input features. We further demonstrate that the expected operator exhibits invariance to general orthogonal transformations of the input features. Empirically, ALL-IN achieves strong performance across diverse node- and graph-level tasks on unseen datasets with new input features, without requiring architecture changes or retraining. These results point to a promising direction for input-agnostic, transferable graph models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04834v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Moshe Eliasof, Krishna Sri Ipsit Mantri, Beatrice Bevilacqua, Bruno Ribeiro, Carola-Bibiane Schönlieb</dc:creator>
    </item>
    <item>
      <title>Patterns of Developer Adoption of LLM-Generated Code Refactoring Suggestions</title>
      <link>https://arxiv.org/abs/2605.04835</link>
      <description>arXiv:2605.04835v1 Announce Type: new 
Abstract: Large language models (LLMs) have gained widespread popularity and have steadily improved over time, enabling software developers to use them for various code-related tasks. One common task is code refactoring, where the LLM suggests changes for the developer to apply to their code to improve quality attributes such as readability or maintainability. While current research focuses on evaluating LLM-generated refactoring suggestions, there is a limited understanding of how developers apply these suggestions in practice. To explore this, we analyze 169 GitHub commits where developers refactor their code based on a ChatGPT conversation linked in the commit message. We found that developers mostly accept and use the suggestions without modifications. When changes are made, they are mostly major and fall into five different patterns that depend on the refactoring activity, the developer's prompt, and the validity of the response from ChatGPT.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04835v1</guid>
      <category>cs.SE</category>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>David Schön, Faiza Amjad, Tehreem Asif, Ranim Khojah, Mazen Mohamad, Francisco Gomes de Oliveira Neto, Philipp Leitner</dc:creator>
    </item>
    <item>
      <title>Shedding Light onto Safety Integrity Level and Basic Software Constraints in a Real-World Automotive Application: Case Study with Driverator Framework</title>
      <link>https://arxiv.org/abs/2605.04837</link>
      <description>arXiv:2605.04837v1 Announce Type: new 
Abstract: Automotive electronic control units (ECUs) are intricate systems with hundreds of individual functions, numerous software components, and multiple interdependent tasks. A prevalent structural pattern in these systems is the so-called cause-effect chain. While significant research effort has been dedicated to the temporal analysis and optimization of these chains, particularly minimizing data age and function response times, other crucial non-functional properties remain relatively underexplored. In particular, the safety integrity level (SIL) classification substantially influences system design by determining task colocation strategies. Improperly sharing functions or interweaving tasks with different safety levels can compromise the integrity of critical functions. Additionally, AUTOSAR basic software (BSW) (e.g., the OS, runtime environment, communication stacks, or diagnostics) introduces complexity that varies with task characteristics and SIL categories. Furthermore, memory requirements present another critical challenge, given the diversity of memory architectures and the SIL-specific dependencies that strongly constrain task allocation. This paper thoroughly characterizes a real-world automotive application, describing its SIL constraints, the impact of basic software, and its memory requirements. In this context, we introduce the Driverator configuration framework for scalable system analysis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04837v1</guid>
      <category>cs.SE</category>
      <category>cs.OS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tobias Denzinger (CARIAD SE), Matthias Becker (KTH Royal Institute of Technology), Peter Ulbrich (TU Dortmund University)</dc:creator>
    </item>
    <item>
      <title>Hearing the Ocean: Bio-inspired Gammatone-CNN framework for Robust Underwater Acoustic Target Classification</title>
      <link>https://arxiv.org/abs/2605.04839</link>
      <description>arXiv:2605.04839v1 Announce Type: new 
Abstract: This study presents a bio-inspired signal processing framework for robust Underwater Acoustic Target Recognition (UATR). Current state-of-the-art methods often fail to resolve the dense low-frequency harmonic structures of vessel propulsion signals under high-noise conditions. The proposed framework addresses this with a biologically inspired Gammatone filter bank that emulates the nonlinear frequency selectivity of the cochlea. By distributing filters according to the Equivalent Rectangular Bandwidth (ERB) scale, the framework achieves a high-fidelity representation of engine-radiated tonals while effectively suppressing isotropic ambient interference. The resulting Cochleagram features are processed by a lightweight, custom-designed Convolutional Neural Network (CNN) that leverages large receptive fields to integrate spectral-temporal continuities. Experimental results on the VTUAD dataset demonstrate a state-of-the-art classification accuracy of 98.41%, outperforming Continuous Wavelet Transform and Mel-Frequency Cepstral Coefficient baselines by 3.5% and 7.7%, respectively. Furthermore, the framework achieves an inference latency of only 0.77 ms and a Cohen's kappa of 0.971, indicating its suitability for real-time deployment on autonomous, low-power sonar hardware.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04839v1</guid>
      <category>cs.SD</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Rajeshwar Tripathi, Sandeep Kumar, Monika Aggarwal, Neel Kanth Kundu</dc:creator>
    </item>
    <item>
      <title>Communication Offloading on SmartNIC DPUs: A Quantitative Approach</title>
      <link>https://arxiv.org/abs/2605.04842</link>
      <description>arXiv:2605.04842v1 Announce Type: new 
Abstract: SmartNIC Data Processing Units (DPUs) offer a promising way to save high-end CPU resources by offloading tasks to programmable cores near the network interface. In this work, we explore the feasibility of using SmartNIC DPUs to support an asynchronous communication model called "fire-and-forget", particularly its core message routing service. We design a communication offloading engine called Buddy that decouples communication tasks from the application process. Buddy runs flexibly on both SmartNIC DPUs, such as the Nvidia BlueField-3 DPU, and generic x86 CPUs. Our evaluation on five applications identifies the memory-to-communication ratio as a key predictor of offloading performance. Host-dominated workloads, such as Quicksilver and Sparse Matrix Transpose, achieve up to 1.55x speedup with communication offloaded to the DPU. We further identify a 625x increase in DRAM traffic due to the absence of Direct Cache Access support on the DPU, highlighting a critical need for such support in future SmartNIC designs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04842v1</guid>
      <category>cs.DC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jacob Wahlgren, Andong Hu, Roger Pearce, Maya Gokhale, Ivy Peng</dc:creator>
    </item>
    <item>
      <title>Convergence analysis of Schwarz-like methods for degenerate elliptic-parabolic equations</title>
      <link>https://arxiv.org/abs/2605.04843</link>
      <description>arXiv:2605.04843v1 Announce Type: new 
Abstract: Convergence is proven for Schwarz-like methods applied to degenerate elliptic-parabolic equations with a $p$-structure. This family of PDEs arises, e.g., when modelling nonlinear diffusion processes. The Schwarz-like approximation methods are based on decomposing the space-time domain into overlapping subdomains, which enables parallel implementations. The methods are derived by introducing a pseudo-time component and applying time integrators of splitting type, which are stepped towards infinity in pseudo-time. This approach to decomposing the space-time domain is related to Schwarz waveform relaxation methods, but the methods considered here have the advantage that they can be proven to converge when applied to nonlinear parabolic, or even degenerate elliptic-parabolic, PDEs. We prove convergence by deriving a nonlinear framework based on the abstract theory of monotone operators and the existence theory for degenerate elliptic-parabolic equations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04843v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Monika Eisenmann, Eskil Hansen</dc:creator>
    </item>
    <item>
      <title>QuadBox: Accelerating 3D Gaussian Splatting with Geometry-Aware Boxes</title>
      <link>https://arxiv.org/abs/2605.04844</link>
      <description>arXiv:2605.04844v1 Announce Type: new 
Abstract: 3D Gaussian Splatting (3DGS) has emerged as an advanced technique for real-time novel view synthesis, representing scene geometry and appearance with differentiable Gaussian primitives. However, efficiently computing precise Gaussian-tile intersections remains a critical step in the rasterization pipeline. To this end, we propose QuadBox, a method that leverages four axis-aligned bounding boxes to tightly enclose projected Gaussians in a discrete manner. First, we derive a geometry-aware stretching factor that enables the construction of a tile-aligned QuadBox, which covers the elliptical projection while excluding most irrelevant tiles. Second, we introduce QPass, a single-pass tile traversal algorithm that fully exploits the discrete nature of QuadBox, so that the tile intersection check reduces to simple interval tests. Experiments on public datasets show that our method accelerates 3DGS rendering by 1.85$\times$. Code is available at \href{https://github.com/Powertony102/QuadBox}{https://github.com/Powertony102/QuadBox}.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04844v1</guid>
      <category>cs.CV</category>
      <category>cs.GR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xinze Li, Bohan Yang, Pengxu Chen, Yiyuan Wang, Hongcheng Luo, Wentao Cheng, Weifeng Su</dc:creator>
    </item>
    <item>
      <title>Agentic Repository Mining: A Multi-Task Evaluation</title>
      <link>https://arxiv.org/abs/2605.04845</link>
      <description>arXiv:2605.04845v1 Announce Type: new 
Abstract: Mining software repositories often requires classifying artifacts like commits, reviews, code lines, or entire repositories into categories. Human labeling is expensive and error-prone; limited context frequently leads to misclassifications or uncertainty in labels. We investigate whether LLM agents that dynamically explore repositories through standard bash commands can match the classification quality of simple LLMs that receive pre-engineered context. Across four tasks, eight approach configurations, and 4943 classifications, agents achieve competitive accuracy despite retrieving their own context. The primary advantage is robustness: agents avoid context-window overflows and scale independently of artifact size. A manual diagnosis of 100 cases where approaches disagree with the ground truth reveals specification ambiguities and labels produced under limited context, suggesting that accuracy against such ground truth may underestimate approaches with broader context access.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04845v1</guid>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Johannes Härtel</dc:creator>
    </item>
    <item>
      <title>Quantile-Free Uncertainty Quantification in Graph Neural Networks</title>
      <link>https://arxiv.org/abs/2605.04847</link>
      <description>arXiv:2605.04847v1 Announce Type: new 
Abstract: Uncertainty quantification (UQ) in graph neural networks (GNNs) is crucial in high-stakes domains but remains a significant challenge. In graph settings, message passing often relies on strong assumptions such as exchangeability, which are rarely satisfied in practice. Moreover, achieving reliable UQ typically requires costly resampling or post-hoc calibration. To address these issues, we introduce Quantile-free Prediction Interval GNN (QpiGNN), a framework that builds on quantile regression (QR) to enable GNN-based UQ by directly optimizing coverage and interval width without requiring quantile inputs or post-processing. QpiGNN employs a dual-head architecture that decouples prediction and uncertainty, and is trained with label-only supervision through a quantile-free joint loss. This design allows efficient training and yields robust prediction intervals, with theoretical guarantees of asymptotic coverage and near-optimal width under mild assumptions. Experiments on 19 synthetic and real-world benchmarks show that QpiGNN achieves on average 22\% higher coverage and 50\% narrower intervals than baselines, while remaining efficient and robust to noise and structural shifts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04847v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Soyoung park, Hwanjun Song, Sungsu Lim</dc:creator>
    </item>
    <item>
      <title>RTMS: A Real-Time Multimodal Scaffolding System for Improving Debugging in Computing Education</title>
      <link>https://arxiv.org/abs/2605.04848</link>
      <description>arXiv:2605.04848v1 Announce Type: new 
Abstract: Debugging is a demanding aspect of programming, yet guidance on how to teach it effectively remains limited. Novices often struggle to recognize impasses, regulate their problem solving, and manage cognitive load and stress. This study investigates whether real-time multimodal feedback, triggered by indicators of cognitive load and physiological stress, can improve debugging performance, narrow expert-novice gaps, and reduce the influence of prior programming experience on success. We conducted a between-subjects experiment with 120 undergraduate computer science students who debugged a medium-sized Python program. Participants were assigned to one of four conditions: no feedback, cognitive-load-triggered feedback, stress-triggered feedback, or combined-trigger feedback. Eye-tracking and heart rate variability data were used to detect moments of struggle and automatically deliver brief, context-sensitive hints. All three feedback conditions significantly improved debugging success and efficiency compared with the control group. Cognitive-load-triggered feedback produced stronger gains than stress-triggered feedback, and the combined-trigger condition yielded the largest improvements. Programming expertise predicted performance only in the control condition; in all feedback conditions, the novice-expert gap was markedly reduced. Adaptive feedback that responds to learners' cognitive and affective states can help manage debugging demands and reduce performance differences linked to prior experience, highlighting opportunities for physiologically aware adaptive learning environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04848v1</guid>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Anahita Golrang, Kshitij Sharma</dc:creator>
    </item>
    <item>
      <title>Hybrid Iterative Neural Low-Regularity Integrator for Nonlinear Dispersive Equations</title>
      <link>https://arxiv.org/abs/2605.04853</link>
      <description>arXiv:2605.04853v1 Announce Type: new 
Abstract: We propose HIN-LRI, a hybrid framework that augments a classical numerical solver with a neural operator trained to correct the solver's structured truncation error. A base low-regularity integrator provides a consistent first-order approximation to nonlinear dispersive PDEs, while a lightweight neural network, operating on a low-dimensional latent manifold, learns the residual defect that analytical methods cannot close. An explicit time-step scaling of the neural correction ensures that its Lipschitz contribution remains $\mathcal{O}(\tau)$, yielding a Gronwall stability factor bounded uniformly in the step size and independent of the spatial resolution. The network is trained end-to-end through a solver-in-the-loop objective that unrolls the full iteration and penalises trajectory error in a Bourgain-type norm, aligning learning with multi-step solver dynamics rather than isolated one-step targets. Under stated assumptions, the global error is bounded by $C(\varepsilon_{net}+\delta)\,\tau^\gamma\ln(1/\tau)$, where $\varepsilon_{net}$ measures the network approximation quality and $\delta$ the training shortfall. Experiments on three dispersive benchmarks with rough data show that HIN-LRI improves accuracy over analytical integrators, splitting methods, and neural PDE surrogates, with stable spatial refinement, effective out-of-distribution transfer, and modest online overhead.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04853v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhangyong Liang</dc:creator>
    </item>
    <item>
      <title>3D Ultrasound-Derived Pseudo-CT Synthesis Using a Transformer-Augmented Residual Network for Real-Time Operator Guidance</title>
      <link>https://arxiv.org/abs/2605.04856</link>
      <description>arXiv:2605.04856v1 Announce Type: new 
Abstract: Computed tomography (CT) is indispensable for clinical diagnosis and image-guided interventions but exposes patients to ionizing radiation, motivating the development of safer imaging alternatives. Ultrasound (US) is non-ionizing and widely accessible; however, it is highly operator-dependent and lacks quantitative tissue characterization, often leading to diagnostic uncertainty and unnecessary CT examinations. This work presents a 3D ultrasound-derived pseudo-CT (UD-pCT) framework that generates CT-like anatomical reference volumes inferred from US, without aiming to reproduce physically accurate Hounsfield Units. Paired 3D kidney US and CT volumes from the TRUSTED dataset are first spatially aligned using a landmark-based multimodal registration pipeline, creating high-quality paired inputs for supervised training of an adversarial framework. The proposed Bottleneck Transformer Residual U-Net3D (BT-ResUNet3D) model employs a 3D residual encoder-decoder generator augmented with a transformer bottleneck, enabling effective modeling of both fine-grained local anatomical structures and long-range volumetric dependencies, while a 3D Conditional PatchGAN discriminator enforces local structural realism in the synthesized pseudo-CT volumes. Quantitative evaluation using PSNR and SSIM demonstrates that the proposed method outperforms established baselines in structural fidelity and perceptual image quality. The UD-pCT volumes provide a real-time anatomical reference for operator guidance, potentially reducing acquisition variability and unnecessary CT use. A limitation of this study is the relatively small paired dataset, which may limit the generalizability of the proposed model.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04856v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sapna Sachan, Amulya Kumar Mahto</dc:creator>
    </item>
    <item>
      <title>Assessing Cognitive Effort in L2 Idiomatic Processing: An Eye-Tracking Dataset</title>
      <link>https://arxiv.org/abs/2605.04857</link>
      <description>arXiv:2605.04857v1 Announce Type: new 
Abstract: This paper presents the development and validation of an eye-tracking dataset designed to investigate how second-language (L2) learners process idiomatic expressions. While native speakers often rely on direct retrieval of figurative meanings, L2 speakers frequently adopt a literal-first approach, which incurs measurable cognitive costs. This resource captures these costs through ocular metrics recorded from Portuguese L1 speakers of English across all CEFR proficiency levels (A1-C2). Although the study uses entry-level 60 Hz hardware (Tobii Pro Spark), we demonstrate that this sampling rate provides sufficient data density to detect macro-cognitive events such as fixations and regressions in reading. Preliminary analysis validates the dataset by revealing a strong inverse correlation between language proficiency and regressive eye movements. Integrated into the MIA (Modeling Idiomaticity in Human and Artificial Language Processing) initiative, this dataset serves as a cognitively grounded benchmark for evaluating both human processing models and the alignment of large language models with human-like figurative understanding.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04857v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Eduardo Santos, Juliana Carvalho, César Rennó-Costa</dc:creator>
    </item>
    <item>
      <title>A Pragmatic Comparison of Cryptographic Computation Technologies for Machine Learning</title>
      <link>https://arxiv.org/abs/2605.04858</link>
      <description>arXiv:2605.04858v1 Announce Type: new 
Abstract: As security demands increase, the importance of secure computation technologies grows, yet these technologies can often seem overwhelming to practitioners. Furthermore, many approaches focus on only a single technology, potentially overlooking superior alternatives. This work addresses the problem of selecting the right technology for secure computation by presenting a comparative analysis of two highly relevant cryptographic methods and their software implementations, with a particular focus on machine learning. First, we provide a theoretical summary and comparison of two secure computation paradigms: secure multi-party computation (SMPC) and fully homomorphic encryption (FHE). We outline the advantages and limitations of the protocols, as well as the relevant open-source software implementations. Second, we present the results of extensive benchmarking of the main software frameworks identified for machine learning operations and models. With the current state of the art in FHE, we observe that it outperforms SMPC for regression models. Additionally, FHE may be faster for simple dense networks when using GPUs or hybrid models. Conversely, SMPC shows superior performance for complex models such as CNNs. Our results should pave the way for more technology-agnostic benchmarking of secure computation technologies for machine learning, providing guidance for practitioners looking to adopt these technologies.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04858v1</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.5220/0014567700004061</arxiv:DOI>
      <dc:creator>Marcus Taubert, Adam Skuta, Thomas Loruenser</dc:creator>
    </item>
    <item>
      <title>Update-Magnitude State Redistribution (UM-SRD): A Shut-off Extension of Weighted SRD for Cut-Cell Methods</title>
      <link>https://arxiv.org/abs/2605.04863</link>
      <description>arXiv:2605.04863v1 Announce Type: new 
Abstract: Berger &amp; Giuliani (2024) developed a provably stable weighted state redistribution (SRD) algorithm for cut-cell meshes. A key limitation of their method is that, although flux redistribution naturally vanishes when updates are small, SRD continuously applies redistribution even when the flux balance is zero, preventing exact steady-state preservation and potentially introducing unnecessary dissipation in smooth regions. This work introduces Update-Magnitude State Redistribution (UM-SRD), which blends the SRD operator with the identity operator via a smooth, locally-defined scalar indicator of the finite-volume update magnitude. UM-SRD preserves conservation and reduces exactly to the base scheme when the finite-volume update is exactly zero in a small-cell neighborhood. For a one-dimensional model problem with a single small cut cell, we prove UM-SRD is total variation diminishing under the same CFL condition as the base upwind scheme, show the local truncation error modification is higher-order in smooth regions with the unnormalized indicator, and show that the normalized implementation preserves first-order accuracy. Numerical experiments demonstrate convergence toward first order on smooth 1D and 2D advection tests, confirm shut-off behaviour, verify non-oscillatory properties, provide numerical evidence that UM-SRD stabilizes the base scheme near a small cut cell where the base scheme diverges, and confirm exact steady-state preservation. The algorithm reuses existing weighted SRD infrastructure, adding only a local blending mechanism, making it practical for cut-cell finite-volume codes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04863v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Justo E. Karell</dc:creator>
    </item>
    <item>
      <title>Phased Ultra Massive Array (PUMA)</title>
      <link>https://arxiv.org/abs/2605.04866</link>
      <description>arXiv:2605.04866v1 Announce Type: new 
Abstract: This paper proposes a novel multiple-access framework, termed the phased ultra massive antenna array (PUMA), which exploits the distinctive spatial flexibility of fluid antenna systems (FAS) at the user equipment (UE). Building upon fluid antenna multiple access (FAMA) and compact ultra-massive antenna array (CUMA), PUMA incorporates a phased array for signal aggregation. This architecture enables the UE to inherently mitigate co-user interference within the spatial domain without necessitating channel state information (CSI) for precoding at the base station (BS) or complex interference cancellation at each UE. A primary advantage of PUMA lies in its hardware efficiency: by implementing phase shifting and signal combining in the analog domain, it achieves high antenna gain while requiring only a minimal number of radio-frequency (RF) chains, potentially a single RF chain. A comprehensive theoretical analysis of the achievable data rate is provided, complemented by extensive simulations that validate the framework. The results demonstrate that PUMA markedly outperforms FAMA and CUMA architectures, particularly for UEs with a single RF chain, offering a robust and scalable solution for interference-insensitive massive connectivity in sixth-generation (6G) systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04866v1</guid>
      <category>cs.IT</category>
      <category>eess.SP</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hanjiang Hong, Kai-Kit Wong, Xusheng Zhu, Chenguang Rao, Dazhi He, Hyundong Shin</dc:creator>
    </item>
    <item>
      <title>Not All Scaffolds Are Equal: How Initiation Mode Determines EMME Effectiveness in Debugging</title>
      <link>https://arxiv.org/abs/2605.04868</link>
      <description>arXiv:2605.04868v1 Announce Type: new 
Abstract: Adaptive learning technologies increasingly rely on real-time physiological analytics to trigger instructional support automatically, yet how system-driven decisions interact with learners' ongoing problem-solving processes remains poorly understood. Eye Movement Modeling Examples (EMME) have shown promise as attention guidance tools but have been studied predominantly as static instructional materials rather than as adaptive scaffolds whose timing and initiation control can vary. This study investigates whether scaffold initiation mode shapes EMME effectiveness in novice programmers' debugging and, specifically, whether automated triggering based on a single physiological indicator of low mental effort is a viable basis for adaptive scaffold delivery. A between-subjects experiment was conducted with 120 undergraduate computer science students randomly assigned to one of four conditions: teacher-initiated, learner-initiated, automated, or no-scaffold control. Participants completed ten Python debugging tasks while eye-tracking data, video interaction logs, and performance scores were recorded. All EMME conditions outperformed the control. However, human-mediated initiation, whether by the teacher or the learner, consistently produced higher performance than automated triggering, as well as more integrative engagement with the EMME material. Automated triggering based on sustained low pupillary activity was associated with disruptive behavioral patterns, suggesting mistimed delivery. EMME also eliminated the performance advantage of prior programming knowledge across all initiation modes. These findings establish scaffold initiation timing and control as critical design variables for EMME and for adaptive learning technologies more broadly, and demonstrate that a single low-effort physiological threshold is insufficient as a trigger criterion for complex problem-solving support.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04868v1</guid>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Anahita Golrang, Kshitij Sharma, Halszka Jarodzka, Senne Van Hoecke</dc:creator>
    </item>
    <item>
      <title>VTAgent: Agentic Keyframe Anchoring for Evidence-Aware Video TextVQA</title>
      <link>https://arxiv.org/abs/2605.04870</link>
      <description>arXiv:2605.04870v1 Announce Type: new 
Abstract: Video text-based visual question answering (Video TextVQA) aims to answer questions by reasoning over visual textual content appearing in videos. Despite the strong multimodal video understanding capabilities of recent Video-LLMs, their performance on existing Video TextVQA benchmarks remains limited. To better understand this gap, we conduct an upper-bound analysis through frame-wise question answering, in which a sample counts as correct if any frame yields the right answer; this upper bound significantly outperforms direct video-based inference, revealing a substantial performance gap. The results suggest that the primary bottleneck lies in localizing key question-relevant evidence rather than in reasoning capacity itself. Building on this insight, we propose a question-guided agent framework that explicitly anchors the relevant keyframes before answering. The approach operates effectively in a training-free setting and consistently surpasses direct video inference. With additional supervised fine-tuning (SFT) and reinforcement learning (RL), it achieves an average improvement of +12.12 in accuracy and +11.15 in ANLS across benchmarks, establishing new state-of-the-art results. Our study underscores the critical role of explicit keyframe anchoring for advancing Video TextVQA. The code will be publicly released.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04870v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haibin He, Maoyuan Ye, Jing Zhang, Juhua Liu, Bo Du</dc:creator>
    </item>
    <item>
      <title>Measuring Psychological States Through Semantic Projection: A Theory-Driven Approach to Language-Based Assessment</title>
      <link>https://arxiv.org/abs/2605.04873</link>
      <description>arXiv:2605.04873v1 Announce Type: new 
Abstract: Recent advances in natural language processing have enabled increasingly accurate estimation of psychological traits from language. However, most existing approaches rely on supervised models trained to predict questionnaire scores, limiting interpretability and generalizability across contexts. The present study introduces a theory-driven and fully unsupervised framework for measuring psychological states directly from natural language using semantic projection. Psychological constructs were operationalized as interpretable semantic axes derived from lexical anchors and items from validated clinical scales assessing depression, anxiety, and worry. Participants' textual responses were embedded using Sentence-BERT and projected onto these axes to generate continuous psychological scores across multiple response formats, including selected words, generated words, phrases, and free-text responses. Projection scores were evaluated through correlations with standardized clinical measures, split-half reliability analyses, attenuation corrections, distributional similarity using Wasserstein distance, and comparisons with lexicon-based sentiment analysis (VADER). Results showed strong associations between projection scores and clinical measures, particularly for structured formats such as selected words, written words, and phrases. Free-text responses produced weaker results when analyzed as whole texts, but performance improved substantially when sentence-level aggregation strategies were applied. These findings support semantic projection as an interpretable and scalable alternative to supervised language models for psychological assessment and highlight the importance of response format and text-processing strategies in language-based mental health measurement.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04873v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Maria Luongo, Davide Marocco, Nicola Milano</dc:creator>
    </item>
    <item>
      <title>Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models</title>
      <link>https://arxiv.org/abs/2605.04874</link>
      <description>arXiv:2605.04874v1 Announce Type: new 
Abstract: Direct Preference Optimization (DPO) has proven to be an effective solution for mitigating hallucination in Multimodal Large Language Models (MLLMs) by learning from preference pairs. A key challenge is how to translate sequence-level preferences into fine-grained supervision on visual fidelity. To safeguard vision-related tokens that are prone to hallucination, existing methods typically allocate training emphasis according to the model's self-assessed visual sensitivity signals. However, such sensitivity, estimated by a model still under training, introduces self-referential bias: it reinforces already well-learned visual cues while neglecting hard-to-perceive but critical details, thereby limiting deeper alignment. In this work, we propose an Uncertainty-aware Exploratory Direct Preference Optimization (UE-DPO) method for MLLMs, which enables the model to uncover its cognitive deficiencies and actively explore for self-correction, guided by token-level epistemic uncertainty. Specifically, we first quantify the uncertainty arising from the model's failure to ground token predictions in the given image. Then, based on an uncertainty-aware exploration intensity, we place greater learning pressure on visually deficient tokens in preferred samples and alleviate the over-penalization of beneficial knowledge in dispreferred samples. Further, we provide a theoretical justification for our method, and extensive experiments demonstrate its effectiveness and robustness.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04874v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Huatian Zhang, Zhendong Mao, Lei Zhang, Yongdong Zhang</dc:creator>
    </item>
    <item>
      <title>Anticipating Innovation Using Large Language Models</title>
      <link>https://arxiv.org/abs/2605.04875</link>
      <description>arXiv:2605.04875v1 Announce Type: new 
Abstract: Forecasting innovation, understood as the emergence of new technological combinations, is a fundamental challenge for science and policy. We show that forthcoming combinations leave an early trace in the collective language of patents, with predictive signals detectable even decades in advance. We further show that this signal is not attributable to any single inventor, but emerges as a collective shift in how technologies are described across thousands of patents. To capture this signal, we introduce TechToken, a transformer-based model that treats technologies, classified by International Patent Classification codes, as words in its vocabulary, learning the language of technologies by embedding these codes during fine-tuning. We define context similarity between code embeddings as a measure of linguistic convergence and show that it accurately predicts the first appearance of technological combinations. TechToken also improves general representation quality, outperforming state-of-the-art models across different patent-related tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04875v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Enrico Maria Fenoaltea, Filippo Santoro, Giordano De Marzo, Segun Taofeek Aroyehun, Andrea Tacchella</dc:creator>
    </item>
    <item>
      <title>To Fuse or to Drop? Dual-Path Learning for Resolving Modality Conflicts in Multimodal Emotion Recognition</title>
      <link>https://arxiv.org/abs/2605.04877</link>
      <description>arXiv:2605.04877v1 Announce Type: new 
Abstract: Multimodal emotion recognition (MER) benefits from combining text, audio, and vision, yet standard fusion often fails when modalities conflict. Crucially, conflicts differ in resolvability: benign conflicts stem from missing, weak, or ambiguous cues and can be mitigated by cross-modal calibration, while severe conflicts arise from intrinsically contradictory (e.g., sarcasm) or misleading signals, for which forced fusion may amplify errors. Recognizing this, we propose Dual-Path Conflict Resolution (DCR), a unified framework that learns when to fuse and when to drop modalities. Path I (Affective Fusion Distiller, AFD) performs reverse distillation from audio/visual teachers to a textual student using temporally weighted class evidence, thereby enhancing representation-level calibration and improving fusion when alignment is beneficial. Path II (Affective Discernment Agent, ADA) formulates MER as a contextual bandit that selects among fusion and unimodal predictions based on a dual-view state and a calibration-aware reward, enabling decision-level arbitration under irreconcilable conflicts without requiring per-modality reliability labels. By taking into account the full multimodal context and coupling soft calibration with hard arbitration, DCR reconciles conflicts that can be aligned while bypassing misleading modalities when fusion is harmful. Across five benchmarks covering both dialogue-level and clip-level MER, DCR consistently outperforms competitive baselines or achieves highly competitive results. Further ablations, conflict-specific subset evaluation, and modality-selection analysis verify that AFD and ADA are complementary and jointly improve robust conflict-aware emotion recognition.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04877v1</guid>
      <category>cs.MM</category>
      <category>cs.HC</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yangchen Yu, Qian Chen, Jia Li, Zhenzhen Hu, Jinpeng Hu, Lizi Liao, Erik Cambria, Richang Hong</dc:creator>
    </item>
    <item>
      <title>A Harmonic Mean Formulation of Average Reward Reinforcement Learning in SMDPs</title>
      <link>https://arxiv.org/abs/2605.04880</link>
      <description>arXiv:2605.04880v1 Announce Type: new 
Abstract: Recent research has revived and amplified interest in algorithms for undiscounted average reward reinforcement learning in infinite-horizon, non-episodic (continuing) tasks. Semi-Markov decision processes (SMDPs) are of particular interest. In SMDPs, discrete actions stochastically generate both rewards and durations, and the objective is to optimize the average reward rate. Existing algorithms approach this by optimizing the ratio of rewards to durations. However, when rewards and durations are non-stationary over the infinite horizon, this ratio can misestimate the reward rate. This paper presents a novel modified harmonic mean operator that correctly computes reward rates even under such conditions. This yields model-free learning algorithms for SMDPs that remain robust to non-stationary reward and duration distributions over time. We prove theoretical properties of the modified harmonic mean operator and empirically demonstrate its efficacy in comparison to existing algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04880v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Erel Shtossel, Alicia Vidler, Uri Shaham, Gal A. Kaminka</dc:creator>
    </item>
    <item>
      <title>From Classical to Quantum-Mechanical Data Assimilation: A Comparison between DATO and QMDA</title>
      <link>https://arxiv.org/abs/2605.04881</link>
      <description>arXiv:2605.04881v1 Announce Type: new 
Abstract: Data assimilation provides a systematic framework for combining dynamical models with partial and noisy observations to infer the evolving state of a system. In this work, we undertake a comparative study of Data Assimilation with Transfer Operators (DATO) and Quantum Mechanical Data Assimilation (QMDA), focusing on their mathematical formulation, algorithmic structure, and empirical performance. Both methods are first cast within a common operator-theoretic framework, which makes it possible to compare, on a unified basis, their representations of uncertainty, forecast propagation, and assimilation updates. We then analyse their principal similarities and differences with respect to state-space structure, update mechanisms, structural preservation properties, and computational cost. To complement the theoretical analysis, we assess both approaches on benchmark dynamical systems across a range of observational settings, including noisy, sparse, and partially observed regimes. Our results show that, despite their shared operator-theoretic motivation, DATO and QMDA embody substantially different assimilation paradigms, leading to distinct advantages and limitations in terms of interpretability, robustness, and scalability. The present study helps delineate the regimes in which each framework is most effective and offers broader insight into the design of operator-based methodologies for data assimilation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04881v1</guid>
      <category>cs.CE</category>
      <category>math.DS</category>
      <category>physics.ao-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Emanuele Donno, Giovanni Conti, Paolo Oddo, Silvio Gualdi, Luca Mainetti, Giovanni Aloisio</dc:creator>
    </item>
    <item>
      <title>FairEnc: A Fair Vision-Language Model with Fair Vision and Text Encoders for Glaucoma Detection</title>
      <link>https://arxiv.org/abs/2605.04882</link>
      <description>arXiv:2605.04882v1 Announce Type: new 
Abstract: Automated glaucoma detection is critical for preventing irreversible vision loss and reducing the burden on healthcare systems. However, ensuring fairness across diverse patient populations remains a significant challenge. In this paper, we propose FairEnc, a fair pretraining method for vision-language models (VLMs) that enables simultaneous debiasing across multiple sensitive attributes. FairEnc jointly mitigates biases in both textual and visual modalities with respect to multiple sensitive attributes, including race, gender, ethnicity, and language. Specifically, for the textual encoder, we leverage a large language model to generate synthetic clinical descriptions with varied sensitive attributes while preserving disease semantics, and employ a contrastive alignment objective to encourage demographic-invariant representations. For the visual encoder, we propose a dual-level fairness strategy that combines mutual information regularization, which reduces statistical dependence between learned features and demographic groups, with multi-discriminator adversarial debiasing. Comprehensive experiments on the publicly available Harvard-FairVLMed dataset demonstrate that FairEnc effectively reduces demographic disparity, as measured by demographic parity difference (DPD) and difference in equalized odds (DEOdds), while achieving strong diagnostic performance under both zero-shot and linear probing evaluations. Additional experiments on the private FairFundus dataset show that FairEnc consistently preserves fairness advantages under cross-domain and cross-modality settings and maintains diagnostic performance within a competitive range. These results highlight FairEnc's ability to generalize fairness under distribution shifts, supporting its potential for more equitable deployment in real-world clinical settings. Our codebase and synthetic clinical notes are available at https://github.com/Mohamed-Elhabebe/FairEnc</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04882v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>eess.IV</category>
      <category>q-bio.QM</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Mohamed Elhabebe, Ayman El-Baz, Qing Liu</dc:creator>
    </item>
    <item>
      <title>A Comparative Study of PyCaret AutoML and CNN-BiLSTM for Binary Hate Speech Detection in Indonesian Twitter</title>
      <link>https://arxiv.org/abs/2605.04885</link>
      <description>arXiv:2605.04885v1 Announce Type: new 
Abstract: This paper compares a PyCaret AutoML branch and a CNN-BiLSTM branch for binary hate speech detection on Indonesian Twitter using the HS label from the corpus of Ibrohim and Budi. Both branches share the same preprocessing pipeline so that the comparison reflects modelling differences rather than inconsistent data preparation. The conventional branch uses TF-IDF with a lexicon-based abusive-word count, whereas the neural branch learns dense token representations and captures both local phrase patterns and bidirectional context. The benchmark is built from the released 13,130-row annotation table, whose HS label yields a 58:42 class ratio. On the held-out split, CNN-BiLSTM achieves the best result with 83.8% accuracy, 79.8% precision, 82.7% recall, and 81.2% F1-score. Within the PyCaret branch, Random Forest is the strongest conventional model with 77.2% accuracy and 77.0% F1-score. The neural branch therefore improves accuracy by 6.6 points and F1-score by 4.2 points. Exploratory corpus analysis, learning curves, and confusion matrices show that the dataset consists of short texts, is moderately imbalanced, and remains difficult because many decisions depend on local lexical cues combined with short-range contextual composition. The study concludes that PyCaret AutoML is an effective conventional benchmarking framework, whereas CNN-BiLSTM is the stronger end model for the reported benchmark setting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04885v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tanty Widiyastuti, Mayada, Adisty Syawalda Ariyanto, Luluk Muthoharoh, Ardika Satria, Martin Clinton Tosima Manullang</dc:creator>
    </item>
    <item>
      <title>BenCSSmark: Making the Social Sciences Count in LLM Research</title>
      <link>https://arxiv.org/abs/2605.04886</link>
      <description>arXiv:2605.04886v1 Announce Type: new 
Abstract: This position paper argues that the under-representation of social science tasks in contemporary LLM benchmarks limits advances in both LLM evaluation and social scientific inquiry. Benchmarks -- standardized tools for assessing computational systems -- are pivotal in the development of artificial intelligence (AI), including large language models (LLMs). Benchmarks do more than measure progress -- they actively structure it, shaping reputations, research agendas, and commercial outcomes. Despite this central role, the social sciences are largely absent from mainstream evaluation frameworks, even though scholars in these fields generate dozens of rigorously annotated, context-sensitive datasets each year. Integrating this work into benchmark design could significantly improve the generalization and robustness of AI models. In turn, models trained on social scientific tasks would likely yield better performance on classic and contemporary tasks in disciplines as diverse as history, sociology, political science or economics. This is all the more pressing as these disciplines are quickly turning to LLMs for assistance. To address this gap, we introduce BenCSSmark, a benchmark composed of datasets annotated by computational social scientists. By integrating social scientific perspectives into benchmarking, BenCSSmark seeks to promote more robust, transparent, and socially relevant AI systems and to foster efficient collaboration.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04886v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Arnault Chatelain, Étienne Ollion, Qianwen Guan, Diandra Fabre, Lorraine Goeuriot, Emile Chapuis, Abdelkrim Beloued, Marie Candito, Nicolas Hervé, Didier Schwab</dc:creator>
    </item>
    <item>
      <title>Sentiment Analysis and Customer Satisfaction Prediction on E-Commerce Platforms Based on YouTube Comments Using the XGBoost Algorithm</title>
      <link>https://arxiv.org/abs/2605.04887</link>
      <description>arXiv:2605.04887v1 Announce Type: new 
Abstract: The exponential expansion of digital commerce in Indonesia has significantly shifted consumer interactions toward video-centric social networks, particularly YouTube. Consequently, the sheer volume of unstructured, multi-contextual comments poses a tremendous challenge for manual sentiment tracking. This study develops and evaluates a predictive model for customer satisfaction that combines the Extreme Gradient Boosting (XGBoost) algorithm with Term Frequency-Inverse Document Frequency (TF-IDF) vectorization. The model is built on a secondary dataset of YouTube comments retrieved from e-commerce review videos, whose raw text underwent rigorous preprocessing to generate normalized numerical features. The experimental results demonstrate that the PyCaret-optimized machine learning framework delivers superior classification resilience. Beyond standard performance metrics, lexical evaluations and feature-importance mapping uncover a notable phenomenon: e-commerce discourse is heavily infiltrated by socio-political terminology, which ultimately influences the polarity of audience satisfaction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04887v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ridho Benedictus Togi Manik, Muhammad Aqil Ramadhan, Ihsan Maulana Yusuf, Luluk Muthoharoh, Ardika Satria, Martin Clinton Tosima Manullang</dc:creator>
    </item>
    <item>
      <title>A Comparative Analysis of Machine Learning and Deep Learning Models for Tweet Sentiment Classification: A Case Study on the Sentiment140 Dataset</title>
      <link>https://arxiv.org/abs/2605.04888</link>
      <description>arXiv:2605.04888v1 Announce Type: new 
Abstract: The exponential growth of social media has created an urgent need for automated systems to analyze unstructured public sentiment in real time. This study compares a traditional Logistic Regression model using TF-IDF features with a deep learning Bidirectional Long Short-Term Memory (BiLSTM) architecture on a 10,000-tweet subset of the Sentiment140 dataset. Experimental results show that Logistic Regression outperformed BiLSTM, achieving an accuracy of 73.5% compared with 69.17%, while the deep learning model exhibited mild overfitting. These findings suggest that for medium-scale informal text data, classical machine learning with robust feature extraction can outperform more complex deep learning approaches. Finally, the trained models were integrated into an interactive web application using Streamlit and deployed on Hugging Face Spaces for public access.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04888v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vita Anggraini, Cintya Bella, Bastian, Luluk Muthoharoh, Ardika Satria, Martin C. T. Manullang</dc:creator>
    </item>
    <item>
      <title>ADMM-based decomposed DNN+RLT Relaxations for Completely Positive Models in Electricity Market Clearing</title>
      <link>https://arxiv.org/abs/2605.04891</link>
      <description>arXiv:2605.04891v1 Announce Type: new 
Abstract: The day-ahead electricity market clearing with nonconvex order types can be formulated as a mixed-integer linear program (MILP), but its LP relaxation may provide weak bounds, and exact solutions can become computationally intractable in large-scale or extended market settings. We study a welfare-maximizing clearing model with elementary hourly orders, block orders with logical acceptance constraints, and flexible hourly orders. Starting from a compact MILP formulation, we derive an equivalent completely positive programming (CPP) reformulation via matrix lifting and propose relaxed CPP variants that further reduce the modeling burden while maintaining strong bounds. We then develop tractable doubly nonnegative (DNN) relaxations, including decomposed formulations that exploit the problem structure by using smaller positive semidefinite matrices. To further strengthen these bounds, we introduce reformulation-linearization technique (RLT) inequalities tailored to the decomposed structure. To solve large-scale DNN relaxations efficiently, we design an alternating direction method of multipliers (ADMM) with adaptive penalty updates and rigorous dual lower bounds, enabling certified early termination. Computational experiments on synthetic instances show that the proposed DNN+RLT relaxations substantially tighten LP bounds, while decomposition and first-order methods significantly reduce computational effort.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04891v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shudian Zhao, Mohammad Reza Karimi Gharigh, Jan Kronqvist, Mohammad Reza Hesamzadeh</dc:creator>
    </item>
    <item>
      <title>Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics</title>
      <link>https://arxiv.org/abs/2605.04893</link>
      <description>arXiv:2605.04893v1 Announce Type: new 
Abstract: Large language models hallucinate in predictable ways: attention routing fails by over-concentrating on a narrow set of positions, or by spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport capacity; we prove that every transpose-invariant spectral diagnostic of this operator is structurally orientation-blind (it cannot distinguish an operator from its transpose, and therefore cannot detect information-flow direction), with a quantitative converse establishing the asymmetry coefficient $G$ as the unique control parameter for direction.
  Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an $n$-independent floor $\phi \ge 1/5$ with worst cut at $t^\ast/n \approx 0.32$, while window attention falls below this floor, scaling as $O(w/n)$; the failure modes differ in shape, not just in value. The resulting two-axis diagnostic ($\phi$ for capacity, $G$ for direction) yields a falsifiable polarity prediction: bottleneck- and diffuse-dominated benchmarks should exhibit opposite polarity. Under length-controlled evaluation, transport features retain interpretable signal (LC-AUROC ranging from 0.62 to 0.84) on tested models up to 8B parameters, with polarity reversing as predicted between HaluEval and MedHallu.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04893v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dominik Dahlem, Diego Maniloff, Mac Misiura</dc:creator>
    </item>
    <item>
      <title>SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs</title>
      <link>https://arxiv.org/abs/2605.04894</link>
      <description>arXiv:2605.04894v1 Announce Type: new 
Abstract: Enterprises want AI code completion that is both high-quality and private, but they face a tension: proprietary models yield better results yet risk exposing proprietary code, while self-hosting large models is expensive and hard to maintain. As a lighter alternative, small CodeLLMs (1B-3B) can run on a developer's workstation accelerator with code never leaving the machine, but they fail on harder tasks. A practical solution is to use the small model for most requests and selectively route difficult ones to a larger self-hosted model. In this study, we evaluate 29 code-specialized LLMs (0.5B-480B) from 12 families on execution-based fill-in-the-middle (FIM) code completion benchmarks across Python, Java, and C++, and find that model family and code-specialized training matter more than size: a 3B model matches a 32B model despite being 10x smaller. Analyzing the 3B model's failures, we discover that 46% of its incorrect completions are not valid code. To enable efficient code completion, we propose SynConfRoute, a training-free method that combines token confidence with syntax validation to automatically decide, per request, whether to keep the local completion or escalate to a larger self-hosted model. SynConfRoute improves pass@1 by 6.4% over confidence-only routing on routine completions and by up to 31% on harder multi-language tasks, and the resulting pipeline achieves 78.9% on routine completions, 7.4% higher than always using the 480B model alone, while reducing accelerator usage by 58%. SynConfRoute generalizes across Python, Java, and C++, improving over confidence-only routing on all three languages without ever rejecting a correct local completion. The pipeline uses off-the-shelf models with no custom training, making it immediately deployable in practice.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04894v1</guid>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kishanthan Thangarajah, Boyuan Chen, Ahmed E. Hassan</dc:creator>
    </item>
    <item>
      <title>Regime-Conditioned Evaluation in Multi-Context Bayesian Optimization</title>
      <link>https://arxiv.org/abs/2605.04895</link>
      <description>arXiv:2605.04895v1 Announce Type: new 
Abstract: Published transfer-BO comparisons often estimate an average treatment effect of acquisition choice over hidden regime variables, while practitioners need the conditional effect for their specific prior quality, budget ratio, and metric. An audit of 40 transfer-BO papers from NeurIPS, ICML, ICLR, AISTATS, UAI, TMLR, JMLR, and AutoML-Conf (2022-2025) finds that 98% never vary B/|A| as a controlled axis. On the same GDSC2 benchmark, changing only the budget reverses the ranking: at B=50, Greedy outperforms UCB by 0.050 Hit@1, while at B=100, UCB outperforms Greedy by 0.035. We capture this transition with the Portable Regime Score PRS=(B/|A|)(1-rho), where rho is the prior rank correlation and can be estimated from pilot contexts before the main comparison. Across 79 conditions spanning chemistry, drug-response biology, and HPO, a hierarchical model gives beta=0.50 (p=1.1e-9), and 19% of conditions fall in an equivalence zone where |advantage|&lt;0.01 Hit@1. In five published reversal cases, PRS predicts the winner from pre-comparison observables. A No-Free-Leaderboard proposition explains why unconditional rankings are unstable: when CATE changes sign across regimes, the reported ATE becomes a function of benchmark mixture. RegimePlanner, which estimates rho online and switches acquisition accordingly, wins all 16 HPO-B search spaces at B=100 and exceeds the matched {Greedy,UCB} per-context oracle on GDSC2 by 18%. Pre-registered predictions achieve 27/40=67.5% overall accuracy and above 90% within EMA prior families. The practical protocol is simple: report B/|A|, rho, K, and metric alongside any claimed acquisition advantage.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04895v1</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Noel Thomas</dc:creator>
    </item>
    <item>
      <title>Storage Is Not Memory: A Retrieval-Centered Architecture for Agent Recall</title>
      <link>https://arxiv.org/abs/2605.04897</link>
      <description>arXiv:2605.04897v1 Announce Type: new 
Abstract: Extraction at ingestion is the wrong primitive for agent memory: content discarded before the query is known cannot be recovered at retrieval time. We propose True Memory, a six-layer architecture that shifts the center of the system from a storage schema to a multi-stage retrieval pipeline operating over events preserved verbatim. The full system runs as a single SQLite file on commodity CPU with no external database, vector index, graph store, or GPU. On LoCoMo (1,540 questions across 10 multi-session conversations), True Memory Pro reaches 93.0% accuracy (3-run mean) against 61.4% for Mem0, 65.4% for Supermemory, approximately 71% for Zep, and 94.5% for EverMemOS under a matched gpt-4.1-mini answer model. On LongMemEval (500 questions), True Memory Pro reaches 87.8% (3-run mean). On BEAM-1M (700 questions at the 1-million-token scale), True Memory Pro reaches 76.6% (3-run mean), above the prior published result of 73.9% for Hindsight. A 56-configuration ablation shows a 1.3-percentage-point spread within the top-performing configuration family.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04897v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.IR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Joshua Adler, Guy Zehavi</dc:creator>
    </item>
    <item>
      <title>A geometric relation of the error introduced by sampling a language model's output distribution to its internal state</title>
      <link>https://arxiv.org/abs/2605.04899</link>
      <description>arXiv:2605.04899v1 Announce Type: new 
Abstract: GPT-style language models are sensitive to single-token changes at generation points where the predicted probability distribution is spread across multiple tokens. Viewing this sensitivity as a geometric property, we derive an $\mathfrak{so}(n)$-valued 1-form that depends only on the geometry of the token embeddings. Despite this purely geometric origin, we show that its curvature is semantically meaningful: On chess reasoning tasks, the curvature couples to the world model of an off-the-shelf instruction-tuned model, with transformations clustering by board region and respecting piece importance. Our findings suggest that token space geometry directly reflects how models internally represent problems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04899v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Albert F. Modenbach</dc:creator>
    </item>
    <item>
      <title>On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference</title>
      <link>https://arxiv.org/abs/2605.04901</link>
      <description>arXiv:2605.04901v1 Announce Type: new 
Abstract: For Transformer models, cryptographically secure inference ensures that the client learns only the final output, while the server learns nothing about the client's input. However, securely computing nonlinear layers remains a major efficiency bottleneck due to the substantial communication rounds and data transmission required. To address this issue, prior works reveal intermediate activations to the client, allowing nonlinear operations to be computed in plaintext. Although this approach significantly improves efficiency, exposing activations enables adversaries to extract model weights. To mitigate this risk, existing works employ a shuffling defense that reveals only randomly permuted activations to the client. In this work, we show that the shuffling defense is not as robust as previously claimed. We propose an attack that aligns differently shuffled activations to a common permutation and subsequently exploits them to extract model weights. Experiments on Pythia-70m and GPT-2 demonstrate that the proposed attack can align shuffled activations with mean squared errors ranging from $10^{-9}$ to $10^{-6}$. With a query cost of approximately \$1, the adversary can recover model weights with L1-norm differences ranging from $10^{-4}$ to $10^{-2}$ compared to the oracle weights.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04901v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhengyi Li, Yakai Wang, Kang Yang, Yu Yu, Jiaping Gui, Yu Feng, Ning Liu, Minyi Guo, Jingwen Leng</dc:creator>
    </item>
    <item>
      <title>A Hierarchical Agent System with Reinforcement Learning for Multivariate Time Series Data Cleaning</title>
      <link>https://arxiv.org/abs/2605.04902</link>
      <description>arXiv:2605.04902v1 Announce Type: new 
Abstract: Multivariate time series (MTS) are frequently affected by co-occurring quality issues, such as missing values, outliers, and constraint violations, which significantly undermine downstream analytics. Existing cleaning approaches fix only a limited set of such issues, making them ill-suited for scenarios where multiple quality problems arise simultaneously. Furthermore, these methods commonly depend on the availability of ground truth data or domain-specific rules, both of which are rarely accessible in real-world applications.
  In this paper, we introduce \sys, an agent system with reinforcement learning designed to clean multiple data quality issues in MTS. We cast the cleaning process as a joint optimization problem that simultaneously handles quality issue order and cleaning model selection, allowing efficient navigation of the large space of possible cleaning pipelines. Our framework relies on a hierarchical agent architecture, where a high-level agent determines the order in which data quality issues should be processed, while a low-level agent identifies the most suitable cleaning method for each issue. To guide the agent toward an optimal cleaning pipeline, we propose a dual-stage reward mechanism that couples upstream (cleaning) and downstream performance, enabling effective optimization without relying on ground truth. Our experimental results show that \sys consistently outperforms existing methods, achieving up to 96% improvement in data cleaning quality and 27% improvement in downstream performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04902v1</guid>
      <category>cs.DB</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuhan Shi, Yuanyuan Yao, Lu Chen, Mourad Khayati, Tianyi Li</dc:creator>
    </item>
    <item>
      <title>Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs</title>
      <link>https://arxiv.org/abs/2605.04903</link>
      <description>arXiv:2605.04903v1 Announce Type: new 
Abstract: Large language models (LLMs) show strong potential for neural architecture generation, yet existing approaches produce complete model implementations from scratch -- computationally expensive and yielding verbose code. We propose Delta-Code Generation, where fine-tuned LLMs generate compact unified diffs (deltas) to refine baseline architectures rather than synthesizing entire models. Our pipeline iteratively fine-tunes the LLM via LoRA on curated architectures from the LEMUR dataset, with MinHash-Jaccard novelty filtering for structural diversity. We evaluate three 7B-class LLMs -- DeepSeek-Coder-7B, Qwen2.5-Coder-7B, and Mistral-7B -- across six datasets (CIFAR-10, CIFAR-100, MNIST, SVHN, ImageNette, CelebA) using a 22-cycle protocol (1,100 candidates per LLM). All three substantially surpass the full-generation baseline (50.6% valid rate, 42.3% mean first-epoch accuracy): DeepSeek-Coder reaches 75.3% valid rate and 65.8% mean accuracy; Qwen2.5-Coder 72.1%/64.6%; Mistral 66.6%/66.1%. On CIFAR-10, best first-epoch accuracies reach 85.5% (Mistral), 85.2% (DeepSeek), 80.6% (Qwen) -- well above 63.98% full generation and 71.5% for the concurrent approach of Gu et al. Output lengths are 30-50 lines versus 200+ for full generation (75-85% reduction). A 50-epoch study confirms the 1-epoch proxy preserves rankings (Mistral: Spearman $\rho$ = 0.926). Delta-based generation is a token-efficient, multi-domain, LLM-agnostic alternative to full-model synthesis for LLM-driven NAS.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04903v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Santosh Premi Adhikari, Radu Timofte, Dmitry Ignatov</dc:creator>
    </item>
    <item>
      <title>Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification</title>
      <link>https://arxiv.org/abs/2605.04904</link>
      <description>arXiv:2605.04904v1 Announce Type: new 
Abstract: In this paper, we explore deep learning techniques for individual identification of animals based on their skin patterns. Individual identification is crucial in biodiversity monitoring, since it enables analysis of population decline or growth, as well as of intra-species interactions within populations. Models trained for individual identification often focus not on the animals' skin patterns but on background or body-shape details. These characteristics are not specific to individuals, or can change drastically over time. We focus on techniques that make machine learning models more responsive to skin pattern structure when extracting individual visual embeddings from images. To this end, we explore inpainting of task-specific masks as an auxiliary task to enhance ML-based individual identification from animal skin patterns. We compare four models as encoder backbones for the individual identification task. We focus on the case study of zebrafish, a widely recognized biological model organism that exhibits individually identifying skin patterns. To evaluate encoder backbone performance, we report standard classification accuracy metrics, embedding clustering metrics, and GradCAM visualizations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04904v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Jens van Bijsterveld, Daniele Avitabile, Fons J. Verbeek, Rita Pucci</dc:creator>
    </item>
    <item>
      <title>Cross-Model Consistency of Feature Importance in Electrospinning: Separating Robust from Model-Dependent Features</title>
      <link>https://arxiv.org/abs/2605.04905</link>
      <description>arXiv:2605.04905v1 Announce Type: new 
Abstract: Electrospinning is a highly sensitive fabrication process in which small variations in operating parameters can significantly influence fiber morphology and material performance. Machine learning (ML) methods are increasingly employed to model these process-structure relationships and to identify the relative importance of processing variables. However, most existing studies rely on a single ML model, implicitly assuming that the resulting feature importance is robust and reproducible. In this study, the consistency of feature importance across multiple ML model families was systematically evaluated using a curated dataset of 96 polyvinyl alcohol (PVA) electrospinning experiments. Twenty-one ML models representing linear, tree-based, kernel-based, neural network, and instance-based approaches were trained and compared. To provide a unified interpretability framework, SHAP (SHapley Additive exPlanations) values were used to calculate feature importance consistently across all models. A rank-based statistical analysis was then performed to quantify inter-model agreement and assess the robustness of parameter rankings. The results demonstrate that predictive performance and interpretive reliability are fundamentally distinct properties. Although several models achieved comparable predictive accuracy, substantial differences were observed in their feature importance rankings. Solution concentration emerged as the most robust and consistently influential parameter (variability = 0), whereas flow rate and applied voltage exhibited high ranking variability (variability &gt; 0.9), indicating strong model dependence. These findings suggest that feature importance derived from a single ML model may be unreliable, particularly for small experimental datasets, and highlight the importance of cross-model validation for achieving trustworthy interpretation in ML-assisted electrospinning research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04905v1</guid>
      <category>cs.LG</category>
      <category>cs.DB</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Mehrab Mahdian, Ferenc Ender, Tamas Pardy</dc:creator>
    </item>
    <item>
      <title>Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games</title>
      <link>https://arxiv.org/abs/2605.04906</link>
      <description>arXiv:2605.04906v1 Announce Type: new 
Abstract: While Large Language Models (LLMs) excel in certain reasoning tasks, they struggle in multi-agent games where the final outcome depends on the joint strategies of all agents. In multi-agent games, the non-stationarity of other agents poses significant challenges for evaluating the reasoning process and for assigning credit over multiple reasoning steps. Existing single-agent reinforcement learning (RL) approaches and their multi-agent extensions fail to address these challenges because they do not incorporate other agents into the reasoning process. In this work, we propose Strat-Reasoner, a novel RL-based framework that improves LLMs' strategic reasoning ability in multi-agent games. We introduce a novel recursive reasoning paradigm in which an agent's reasoning also integrates other agents' reasoning processes. To provide effective reward signals for the intermediate reasoning sequences, we employ a centralized Chain-of-Thought (CoT) comparison module to evaluate the reasoning quality. Finally, we compute an accurate hybrid advantage and develop a group-relative RL approach to optimize the LLM policy. Experimental results show that Strat-Reasoner substantially improves the strategic abilities of the underlying LLMs, achieving an average performance improvement of 22.1% across various multi-agent games.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04906v1</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yidong He, Yutao Lai, Pengxu Yang, Jiarui Gan, Jiexin Wang, Yi Cai, Mengchen Zhao</dc:creator>
    </item>
    <item>
      <title>Curated AI beats frontier LLMs at pharma asset discovery</title>
      <link>https://arxiv.org/abs/2605.04908</link>
      <description>arXiv:2605.04908v1 Announce Type: new 
Abstract: General-purpose LLMs with web search are increasingly used to scout the competitive landscape of pharmaceutical pipelines. We benchmark Gosset -- an AI platform with a chat interface backed by curated target-, modality-, and indication-level drug-asset annotations -- against four frontier systems with web access (Claude Opus 4.7, GPT 5.5, Gemini 3.1 Pro, Perplexity sonar-pro) on ten niche oncology/immunology targets where most of the pipeline lives in the long tail of preclinical and Asian-developed assets. All five systems receive the same natural-language query and the same JSON output schema. Across 10 targets Gosset returns 3.2x more verified drugs per query than the best frontier system, at perfect precision and 100% recall against the cross-system union of verified drugs. The same curated index is exposed as a Gosset MCP server that any frontier model can call as a tool, suggesting that each of these systems can close most of the recall gap by swapping generic web search for a curated index behind the same chat interface.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04908v1</guid>
      <category>cs.AI</category>
      <category>q-bio.QM</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Łukasz Kidziński, Kevin Thomas</dc:creator>
    </item>
    <item>
      <title>Breaking the Quality-Privacy Tradeoff in Tabular Data Generation via In-Context Learning</title>
      <link>https://arxiv.org/abs/2605.04911</link>
      <description>arXiv:2605.04911v1 Announce Type: new 
Abstract: Tabular data synthesis aims to generate high-quality data while preserving privacy. However, we find that existing tabular generative models exhibit a clear tradeoff in the small-data regime: improving data quality typically comes at the cost of increased memorization of training samples, thereby weakening privacy protection. This tradeoff arises because small training sets make it difficult for dataset-specific generative models to distinguish generalizable structure from sample-specific patterns. To address this, we propose DiffICL, which formulates tabular data generation as an in-context learning problem. Instead of fitting each dataset from scratch, DiffICL leverages pretrained structural priors learned from a large collection of datasets, enabling it to infer data distributions from limited context rather than memorizing individual samples. We evaluate DiffICL on 14 real-world datasets. Results show that DiffICL improves both data quality and privacy and generates synthetic data that provides effective data augmentation. Our findings suggest that the quality-privacy tradeoff can be improved through better training paradigms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04911v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xinyan Han, Yan Lu, Xiaoyu Lin, Yuanyuan Jiang, Yuanrui Wang, Xuanyue Li, Wenchao Zou, Xingxuan Zhang</dc:creator>
    </item>
    <item>
      <title>Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training</title>
      <link>https://arxiv.org/abs/2605.04913</link>
      <description>arXiv:2605.04913v1 Announce Type: new 
Abstract: LLM post-training typically propagates task gradients through the full depth of the model. Although this end-to-end structure is simple and general, it couples task adaptation to full-depth activation storage, long-range backward dependencies and direct task-gradient access to pretrained representations. We argue that this full-depth backward coupling can be unnecessarily expensive and intrusive, particularly when post-training supervision is much narrower than pre-training. To this end, we propose LoPT: Local-Learning Post-Training, a simple post-training strategy that makes gradient reach an explicit design choice. LoPT places a single gradient boundary at the transformer midpoint: the second-half block learns from the task objective, while the first-half block is updated by a lightweight feature-reconstruction objective to preserve useful representations and maintain interface compatibility. LoPT shortens the task-induced backward path while limiting direct interference from narrow task gradients on early-layer representations. Extensive experiments demonstrate that LoPT achieves competitive performance with lower memory cost, higher training efficiency and better retention of pretrained capabilities. Our code is available at: https://github.com/HumyuShi/LoPT</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04913v1</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hengyu Shi, Tianyang Han, Peizhe Wang, Zhiling Wang, Xu Yang, Junhao Su</dc:creator>
    </item>
    <item>
      <title>A Foundation Model for Zero-Shot Logical Rule Induction</title>
      <link>https://arxiv.org/abs/2605.04916</link>
      <description>arXiv:2605.04916v1 Announce Type: new 
Abstract: Inductive Logic Programming (ILP) learns interpretable logical rules from data. Existing methods are transductive: their learned parameters are bound to specific predicates and require retraining for each new task. We introduce Neural Rule Inducer (NRI), a pretrained model for zero-shot rule induction. Rather than encoding literal identities, NRI represents literals using domain-agnostic statistical properties such as class-conditional rates, entropy, and co-occurrence, which generalize across variable identities and counts without retraining. The model consists of a statistical encoder and a parallel slot-based decoder. Parallel decoding preserves the permutation invariance of logical disjunction; an autoregressive decoder would instead impose an arbitrary clause order. Product T-norm relaxation makes rule execution differentiable, allowing end-to-end training on prediction accuracy alone. We evaluate NRI on rule recovery, robustness to label noise and spurious correlations, and zero-shot transfer to real-world benchmarks, and we believe this work opens up the possibility of foundation models for symbolic reasoning. Code and the reference checkpoint are available at https://github.com/phuayj/neural-rule-inducer.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04916v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.SC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Yin Jun Phua</dc:creator>
    </item>
    <item>
      <title>Koopman Identification of Nonlinear Systems via Reservoir Liftings</title>
      <link>https://arxiv.org/abs/2605.04917</link>
      <description>arXiv:2605.04917v1 Announce Type: new 
Abstract: Learning tractable linear representations of nonlinear dynamical systems via Koopman operator theory is often hindered by dictionary selection, temporal memory encoding, and numerical ill-conditioning. Inspired by the Reservoir Computing (RC) paradigm, this paper introduces the RC-Koopman framework, which interprets the reservoir as a stateful, finite-dimensional Koopman dictionary whose temporal depth is explicitly controlled by its spectral radius. We show that the Echo State Property (ESP) guarantees well-posedness and favorable numerical conditioning of the lifted Koopman approximation. A correlation-based spectral radius selection algorithm aligns reservoir memory with dominant system timescales. Analysis reveals how the finite memory of the reservoir determines which Koopman eigenfunctions remain observable from the lifted features. Evaluation on synthetic benchmarks demonstrates that RC-Koopman achieves a favorable balance between reconstruction accuracy of the underlying nonlinear dynamics and dynamical stability, compared to Extended Dynamic Mode Decomposition (EDMD) and Hankel-based lifting approaches. Code is available at: https://github.com/NEAR-the-future/RC-Koopman.git</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04917v1</guid>
      <category>cs.LG</category>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Weibin Gu, Chen Yang, Lu Shi</dc:creator>
    </item>
    <item>
      <title>Reinforcement Learning for Compositional Generalization with Outcome-Level Optimization</title>
      <link>https://arxiv.org/abs/2605.04920</link>
      <description>arXiv:2605.04920v1 Announce Type: new 
Abstract: Compositional generalization, the ability to correctly interpret novel combinations of known primitives, remains a major challenge. Existing approaches often rely on supervised fine-tuning, which encourages models to imitate target outputs. This token-level training paradigm fails to capture the global compositional structure required for generalizing to unseen combinations. In this work, we investigate whether compositional generalization can instead be improved through outcome-level reinforcement learning. We adopt Group Relative Policy Optimization to optimize models based on feedback on their final outputs. Within this framework, we explore both a simple binary outcome reward and a composite reward that provides additional composition feedback. Experiments on multiple compositional benchmarks show that reinforcement learning improves compositional generalization compared to supervised fine-tuning. Further analysis reveals that supervised models tend to overfit frequent training compositions, whereas reinforcement learning improves compositional generalization by reshaping the output distribution, particularly for more complex composition types.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04920v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiyan Fu, Wei Liu</dc:creator>
    </item>
    <item>
      <title>Evolving Idea Graphs with Learnable Edits-and-Commits for Multi-Agent Scientific Ideation</title>
      <link>https://arxiv.org/abs/2605.04922</link>
      <description>arXiv:2605.04922v1 Announce Type: new 
Abstract: LLM-empowered multi-agent systems offer new potential to accelerate scientific discovery by generating novel research ideas. However, existing methods typically coordinate agents through transient text, such as drafts or chat logs, which makes it difficult to pinpoint weaknesses in the generated ideas and to trace how the agents refine them. To this end, we introduce Evolving Idea Graphs (EIG), a graph-based multi-agent scientific ideation framework that generates research ideas scoring highly on various benchmark-native metrics, such as novelty, feasibility, and clarity. Instead of coordinating solely through text, EIG represents a partially formed proposal as an evolving idea graph, where nodes capture scientific claims and edges encode relations (e.g., support and conflict), so that unresolved weaknesses remain identifiable throughout the idea evolution process. Specifically, a learned two-head controller operates over the evolving graph to guide the ideation: one head selects graph edits for agents to execute, while the other decides when the graph is ready to be committed for final proposal synthesis. On AI Idea Bench 2025 and LiveIdeaBench, EIG outperforms all compared systems on both automatic benchmark scores and blind expert ratings. Ablations further show that explicit graph state provides the main performance gains, and learned edit-and-commit control adds consistent improvements.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04922v1</guid>
      <category>cs.MA</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiangwen Dong, Bo Li, Wanyu Lin</dc:creator>
    </item>
    <item>
      <title>Unintended Negative Impacts of Promotional Language in Patent Evaluation</title>
      <link>https://arxiv.org/abs/2605.04926</link>
      <description>arXiv:2605.04926v1 Announce Type: new 
Abstract: Promotional language has been increasingly used to aid the communication of innovative ideas in science. Yet, less is known about its role in the context of technological innovation. Here, we use a validated and domain-diagnosed lexicon of 135 promotional words to study the association between promotional language and patent evaluation outcomes among 2.7 million USPTO patent applications. Our large-scale study reveals three unexpected findings. First, in contrast to scientific evaluation, we find that a higher frequency of promotional words is negatively associated with the probability of an application being (i) granted a patent, (ii) transferred ownership, and (iii) successfully appealed. This promotional penalty holds even after accounting for a range of confounding factors and is largely robust across different technological areas. Among matched samples, the difference in the success rate between the lowest and highest promotional density quintile is 5.5, 5.9, and 5.3 percentage points for patentability, transferability, and rejection reversal. Second, contrary to institutional skepticism, we show that promotional language is not a mask of weak technology, but objectively reflects the degree of combinatorial novelty and future citation impact. Third, digging into the mechanisms, we find that the tolerance to promotional framing is strongly moderated by human factors, with men and experienced examiners showing a higher acceptance of promotional narratives than women and novice examiners. By revealing an emerging paradox in the patent system, our study offers theoretical and practical implications for improving patent evaluation through more objective scrutiny of linguistic patterns in patent filings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04926v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bingkun Zhao, Chenwei Zhang, Hao Peng</dc:creator>
    </item>
    <item>
      <title>When Does Gene Regulatory Network Inference Break? A Controlled Diagnostic Study of Causal and Correlational Methods on Single-Cell Data</title>
      <link>https://arxiv.org/abs/2605.04930</link>
      <description>arXiv:2605.04930v1 Announce Type: new 
Abstract: Despite their theoretical advantages, causal methods for Gene Regulatory Network (GRN) inference from single-cell RNA-seq data consistently fail to match, let alone outperform, correlation-based baselines across many realistic benchmarks, a persistent puzzle that casts doubt on the value of causality for this task. We argue that existing benchmarks are insufficiently controlled to answer this question because they evaluate on real or semi-real data in which multiple pathologies co-occur, confounding failure modes and obscuring the specific conditions under which different inference methods excel or fail. To address this gap, we introduce a controlled diagnostic framework that isolates seven biologically motivated pathologies (dropout, latent confounders, cell-type mixing, feedback loops, network density, sample size, and pseudotime drift), and we measure how six representative methods spanning three inference paradigms degrade as each pathology intensifies. Across 6,120 controlled experiments, we find that causal methods genuinely dominate in clean and structurally favorable regimes, but that specific pathologies (notably dropout and latent confounders) selectively neutralize their advantages. We further introduce an error-type decomposition, which reveals that methods with similar aggregate accuracy commit qualitatively different errors. To probe whether single-pathology effects persist when multiple stressors co-occur, we perform an interaction sweep over the three most impactful pathologies and find that their joint effects are sub-additive; the sweep also exposes density-conditional cross-overs that are invisible to single-dial analysis. Our findings offer a nuanced understanding of when and why different methods succeed or fail at GRN inference, providing actionable insights for method development and practical guidance for practitioners.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04930v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>q-bio.GN</category>
      <category>q-bio.QM</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Miguel Fernandez-de-Retana, Ruben Sanchez-Corcuera, Unai Zulaika, Aritz Bilbao-Jayo, Aitor Almeida</dc:creator>
    </item>
    <item>
      <title>Interaction Tree Semantics for RISC-V: Bridging Compiler and Hardware Verification</title>
      <link>https://arxiv.org/abs/2605.04933</link>
      <description>arXiv:2605.04933v1 Announce Type: new 
Abstract: The Instruction Set Architecture (ISA) is the contract between compilers and processors; proving this contract formally requires connecting the ISA specification to existing mechanized compilers and hardware implementations. RISC-V is an open, modular ISA gaining adoption across embedded, mobile, and cloud platforms, which makes a formally verified RISC-V specification particularly valuable. However, existing formal RISC-V specifications focus on hardware tooling rather than cross-level verification: they provide no machine-checked instruction-level properties and lack support for verifying this contract across levels.
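  To make the notion of an instruction-level property concrete, here is a toy executable semantics for a few RV32I R-type instructions, written in Python purely for illustration. It is not the authors' Rocq/ITree formalization. The function names are hypothetical. The example property (SUB undoes ADD under modulo-2^32 wraparound) is only checked by random testing here, whereas the paper's properties are machine-checked.

```python
import random

XLEN = 32
MOD = 2 ** XLEN  # register values are modeled as unsigned XLEN-bit integers


def to_signed(v):
    """Interpret an unsigned XLEN-bit value as a two's-complement integer."""
    return v - MOD if v >= 2 ** (XLEN - 1) else v


def exec_rtype(op, rs1, rs2):
    """Toy semantics of a few RV32I R-type instructions: returns the value written to rd."""
    if op == "ADD":
        return (rs1 + rs2) % MOD  # wraps around; RISC-V does not trap on overflow
    if op == "SUB":
        return (rs1 - rs2) % MOD
    if op == "XOR":
        return rs1 ^ rs2
    if op == "OR":
        return rs1 | rs2
    if op == "SLT":
        # rd = 1 if rs1 is less than rs2 when both are read as signed integers, else 0
        return 1 if to_signed(rs2) > to_signed(rs1) else 0
    raise ValueError(f"unsupported op: {op}")


# Instruction-level property, sampled here rather than proved: SUB undoes ADD.
rng = random.Random(0)
for _ in range(10_000):
    a, b = rng.randrange(MOD), rng.randrange(MOD)
    assert exec_rtype("SUB", exec_rtype("ADD", a, b), b) == a

print(exec_rtype("ADD", MOD - 1, 1))  # 0: addition wraps modulo 2**32
print(exec_rtype("SLT", MOD - 1, 0))  # 1: 0xFFFFFFFF is -1 when read as signed
```

  In a proof assistant such as Rocq, the same property would be stated as a lemma over all 32-bit values and proved, instead of being sampled.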
  We address these limitations with a formal semantics of the RISC-V ISA in Rocq, built on Interaction Trees (ITrees). By leveraging ITree bisimulation and refinement, our semantics enables cross-level verification from compiler IR to hardware within a single framework. Our formalization covers a wide range of RISC-V extensions, and the correctness of individual instruction semantics is backed by machine-checked lemmas in Rocq. We further validate the semantics by extracting an executable simulator that passes all standard RISC-V test suites. Three case studies demonstrate the effectiveness of our semantics for cross-level verification. First, using Vellvm (the ITree-based semantics of LLVM IR), we prove semantic equivalence via bisimulation between LLVM IR and RISC-V code for an array-access pattern. Second, we apply translation validation to a specific instruction reordering for macro-operation fusion, distinguishing safe reorderings from those that break program-counter-relative addressing. Third, we prove that a Kôika hardware ALU correctly implements all R-type integer operations (e.g., ADD, SUB, AND) against our ISA contract.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04933v1</guid>
      <category>cs.PL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shuanglong Kan, Sebastian Ertel</dc:creator>
    </item>
    <item>
      <title>Modular Reinforcement Learning For Cooperative Swarms</title>
      <link>https://arxiv.org/abs/2605.04939</link>
      <description>arXiv:2605.04939v1 Announce Type: new 
Abstract: A cooperative robot swarm is a collective of computationally limited robots that share a common goal. Each robot can interact with only a small subset of its peers, without knowing how these interactions affect the collective utility. Recent advances in distributed multi-agent reinforcement learning have demonstrated that robots can learn to interact effectively with others, in a manner aligned with the common goal, even though each robot learns independently of the others. However, this requires each robot to represent a potentially combinatorial number of interaction states, which strains the robots' limited memory. This paper proposes an alternative approach for representing spatial interaction states for multi-robot reinforcement learning in swarms. It uses a modular (decomposed) representation in which each feature of the state is handled by a separate learning procedure and the results are aggregated. We demonstrate the efficacy of the approach in numerous experiments with simulated robot swarms carrying out foraging.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04939v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Erel Shtossel, Gal A. Kaminka</dc:creator>
    </item>
    <item>
      <title>UFAL-CUNI at SemEval-2026 Task 11: An Efficient Modular Neuro-symbolic Method for Syllogistic Reasoning</title>
      <link>https://arxiv.org/abs/2605.04941</link>
      <description>arXiv:2605.04941v1 Announce Type: new 
Abstract: This paper describes our system submitted to SemEval-2026 Task 11: Disentangling Content and Formal Reasoning in Large Language Models. We present an efficient modular neuro-symbolic approach that combines a symbolic prover with small reasoning LLMs (4B parameters). The system consists of an LLM-based parser that translates natural-language syllogisms into a first-order logic (FOL) representation, an automated theorem prover, and two optional modules: machine translation for multilingual inputs and a symbolic retrieval component for identifying relevant premises. The system achieves competitive accuracy and a relatively low content effect on most subtasks. Our ablations show that this approach outperforms LLM-based zero-shot baselines in this parameter-size range, but they also reveal the limited multilingual capabilities of small LLMs. Finally, we discuss the task's main ranking metric and analyze its limitations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04941v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ivan Kartáč, Kristýna Onderková, Jan Bronec, Zdeněk Kasner, Mateusz Lango, Ondřej Dušek</dc:creator>
    </item>
    <item>
      <title>DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring</title>
      <link>https://arxiv.org/abs/2605.04943</link>
      <description>arXiv:2605.04943v1 Announce Type: new 
Abstract: The condition monitoring (CM) of synthetic fibre ropes (SFRs) used in offshore, maritime, and industrial settings demands more than a classifier: inspectors need continuous severity estimates, maintenance recommendations, anomaly flags, deterioration timelines, and automated reports, all from a single inspection image. We present DART (Damage Assessment via Rope Transformer), a vision-language foundation model that addresses the full rope inspection workflow through a unified multi-task architecture. DART extends the Joint-Embedding Predictive Architecture (JEPA) to the cross-modal domain by coupling a Vision Transformer (ViT-H/14) with Llama-3.2-3B-Instruct via a Severity-Conditioned Cross-Modal Fusion (SC-CMF) module. Three architectural innovations drive the model's versatility: (1) HD-MASK, a saliency-guided masking strategy that focuses self-supervised reconstruction on damage-dense patches; (2) per-class learnable severity gates that adaptively weight language grounding by damage category; and (3) a Contrastive Damage Disentanglement (CDD) loss that shapes the embedding space to simultaneously encode damage type, severity ordering, and cross-modal semantics. Trained once on 4,270 images spanning 14 fine-grained rope damage classes, the frozen DART backbone supports downstream tasks without any task-specific fine-tuning: damage classification (93.22 % accuracy, 91.04 % macro-F1, +38.5 pp over a vision-only baseline), continuous severity regression (Spearman rho = 0.94, within-1-ordinal accuracy 99.6 %), and few-shot recognition (89.2 % macro-F1 at 20 shots). These results demonstrate that DART functions as a general-purpose CM backbone that goes well beyond classification, providing actionable inspection intelligence from a single shared representation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04943v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anju Rani, Daniel Ortiz-Arroyo, Petar Durdevic</dc:creator>
    </item>
    <item>
      <title>Training-Time Batch Normalization Reshapes Local Partition Geometry in Piecewise-Affine Networks</title>
      <link>https://arxiv.org/abs/2605.04946</link>
      <description>arXiv:2605.04946v1 Announce Type: new 
Abstract: Batch normalization (BN) is central to modern deep networks, but its effect on the realized function during training remains less understood than its optimization benefits. We study training-time BN in continuous piecewise-affine (CPA) networks through the geometry of switching hyperplanes and the induced affine-region partition. Conditioned on a mini-batch, we show that BN defines for each neuron a reference hyperplane through the batch centroid, and that breakpoint-switching hyperplanes are parallel translates whose offsets are expressed in batch-standardized coordinates and are independent of the raw bias. This yields an exact criterion for when a switching hyperplane intersects a local $\ell_\infty$ window and motivates a local region-density functional based on exact affine-region counts. Under explicit sufficient conditions, we show that BN increases expected local partition refinement in ReLU and more general piecewise-affine networks, and that this mechanism transfers locally through depth inside parent affine regions where the upstream representation map is an affine embedding. These results provide a function-level geometric account of training-time BN as a batch-conditional recentering mechanism near the data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04946v1</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xuan Qi, Yi Wei, Fanqi Yu, Furao Shen, Vittorio Murino, Cigdem Beyan</dc:creator>
    </item>
    <item>
      <title>Conflict Essences for Transformation Rules with Nested Application Conditions -- Long Version</title>
      <link>https://arxiv.org/abs/2605.04947</link>
      <description>arXiv:2605.04947v1 Announce Type: new 
Abstract: Conflict and dependency analysis is an important static analysis tool that provides an overview of the potential interactions of (graph) transformation rules. This analysis is based on critical pairs and initial conflicts, which represent conflicting transformations in a minimal context. However, the crucial information about a conflicting transformation pair is contained in much smaller structures, called disabling/conflict essences in existing research. Recently, we introduced disabling essences for rules with application conditions; these capture how an application condition can be violated by another rule. In this paper, we extend the notion of disabling essences to support not only application conditions in Alternating Quantifier Normal Form but also arbitrary nested conditions. We introduce (symbolic) conflict essences, which are constructed from disabling essences and capture the interaction between two rules. We show that a transformation pair is parallel dependent if and only if a symbolic conflict essence can be embedded into it, and we relate symbolic conflict essences to initial conflicts for transformation rules with application conditions. We present our results for adhesive HLR categories, which include several types of graph-like structures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04947v1</guid>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alexander Lauer, Jens Kosiol, Leen Lambers, Gabriele Taentzer</dc:creator>
    </item>
    <item>
      <title>Adapting Large Language Models to a Low-Resource Agglutinative Language: A Comparative Study of LoRA and QLoRA for Bashkir</title>
      <link>https://arxiv.org/abs/2605.04948</link>
      <description>arXiv:2605.04948v1 Announce Type: new 
Abstract: This paper presents a comparative study of parameter-efficient fine-tuning (PEFT) methods, including LoRA and QLoRA, for adapting large language models to Bashkir, a low-resource agglutinative language of the Turkic family. Experiments are conducted on a Bashkir text corpus of 71k documents (46.9M tokens) using models of various architectures: DistilGPT2, GPT-2 (base, medium), Phi-2, Qwen2.5-7B, DeepSeek-7B, and Mistral-7B. To improve the reliability of the results, each configuration was trained with three different random seeds.
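  As background for the comparison, LoRA freezes a pretrained weight matrix W0 and learns a low-rank update. A layer therefore computes W0 x + (alpha / r) B A x, with trainable A (r x k) and B (d x r). QLoRA additionally stores the frozen W0 in 4-bit precision, which this sketch does not model. The NumPy sketch below shows the forward pass and the per-layer count of trainable parameters. The function names and sizes are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def lora_forward(x, W0, A, B, alpha):
    """LoRA forward pass for one linear layer: W0 @ x + (alpha / r) * B @ (A @ x).

    W0: (d, k) frozen pretrained weight; A: (r, k) and B: (d, r) are trainable.
    """
    r = A.shape[0]
    return W0 @ x + (alpha / r) * (B @ (A @ x))


def trainable_params(d, k, r):
    """Trainable parameters for one d x k matrix: (full fine-tuning, LoRA with rank r)."""
    return d * k, r * (d + k)


rng = np.random.default_rng(0)
d, k, r, alpha = 16, 16, 2, 4.0
W0 = rng.normal(size=(d, k))
A = rng.normal(scale=0.01, size=(r, k))
B = np.zeros((d, r))  # zero init: the adapted layer starts identical to the base layer
x = rng.normal(size=k)
assert np.allclose(lora_forward(x, W0, A, B, alpha), W0 @ x)

full, lora = trainable_params(4096, 4096, 8)
print(full, lora, full // lora)  # 16777216 65536 256
```

  The 4096 x 4096, rank-8 figures are purely illustrative. The overall reduction for a whole model depends on which layers are adapted and on the chosen rank.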
  The lowest perplexity on the test set was obtained for GPT-2 medium with full fine-tuning (3.34). Meanwhile, QLoRA applied to Mistral-7B (3.79) and Phi-2 (3.81) achieved comparable quality with over 40 times fewer trainable parameters. However, we also observed cases of significant quality degradation when using PEFT for certain architectures (e.g., DeepSeek-7B with rank 8, perplexity = 129.55), indicating that the outcome depends critically on the choice of the base model and its tokenizer.
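  Under the standard definition, the perplexity figures above are the exponential of the mean per-token negative log-likelihood on held-out text. The following is a minimal Python sketch. It assumes that per-token natural-log probabilities are already available, and the function name is hypothetical.

```python
import math


def perplexity(token_log_probs):
    """exp(-mean(log p(token | context))) over the evaluated tokens (natural log)."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)


# A model that assigns probability 0.25 to every observed token has perplexity of about 4.
print(perplexity([math.log(0.25)] * 10))
```

  Because the mean is taken per token, the resulting value also depends on how the text is tokenized.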
  Additionally, a qualitative analysis of texts generated from Bashkir prompts revealed that the models with the best perplexity do not necessarily produce the most coherent outputs: QLoRA-tuned models generated monolingual Bashkir continuations, whereas the fully fine-tuned model with the lowest perplexity frequently switched to English. The results suggest that QLoRA on 7B-scale models offers an effective compromise between quality and computational cost for Bashkir. To ensure reproducibility, the data, code, and trained adapters will be openly released upon acceptance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04948v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mullosharaf K. Arabov, Svetlana S. Khaybullina</dc:creator>
    </item>
    <item>
      <title>AllSERP: Exhaustive Per-Element Enrichment of the Versatile AdSERP Dataset</title>
      <link>https://arxiv.org/abs/2605.04949</link>
      <description>arXiv:2605.04949v1 Announce Type: new 
Abstract: We release AllSERP, a typed area-of-interest (AOI) and per-element behavioral enrichment of the AdSERP commercial-intent SERP corpus [4]. AdSERP ships 2,776 trials of full-page screenshots, captured SERP HTML, 150 Hz Gazepoint eye tracking, evtrack mouse telemetry, scroll, and pupil signals recorded on real Google SERPs collected before AI Overviews; however, its bounding boxes cover only ad surfaces (15.5 % of attributable clicks). AllSERP adds pixel-accurate bounding boxes for organic results and widgets via screenshot-anchored computer vision, semantic types for thirteen element types via an HTML parser, an inter-result gap-fill flavor (typed_gapfill), and X+Y click attribution that covers 91.7 % of the corpus while flagging the remainder at the trial level. The Phase C ad-vs-non-ad partition is internally consistent with the shipped ad rectangles (0 disagreements across 38,250 classifications). We ship the pipeline, per-trial JSONs, a corpus CSV, and a browser-based replay viewer; everything is reproducible from the AdSERP Zenodo volume. The release enables per-element click, fixation, regression, and above-the-fold analyses that the shipped ads-vs-organic split could not resolve.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04949v1</guid>
      <category>cs.IR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>K. Andrew Edmonds</dc:creator>
    </item>
    <item>
      <title>Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts</title>
      <link>https://arxiv.org/abs/2605.04952</link>
      <description>arXiv:2605.04952v1 Announce Type: new 
Abstract: Mixture-of-experts (MoE) models enable scalable transformer architectures by activating only a subset of experts per token. Recent evidence suggests that performance improves with increasingly granular experts, i.e., many small experts instead of a few large ones. However, this regime substantially increases routing cost, which can come to dominate computation. We introduce adaptive inverted-index routing for MoE (AIR-MoE), an inverted-index-inspired routing architecture based on vector quantization (VQ). In the first stage, AIR-MoE performs coarse shortlisting by assigning tokens to VQ codewords to construct a candidate set of experts. In the second stage, fine scoring computes exact routing scores restricted to this shortlist. This two-stage procedure approximates true top-k routing while avoiding scoring every expert and, in contrast to prior work, without imposing structural constraints on expert parameters. AIR-MoE serves as a drop-in replacement for standard routers and requires no modifications to the model architecture or loss function. We further provide a lower bound on the mass recall achieved by AIR-MoE that yields insights into its inner workings. Empirically, we demonstrate that AIR-MoE outperforms existing routing approaches in granular MoE settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04952v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Klaus-Rudolf Kladny, Maximilian Mordig, Bernhard Schölkopf, Michael Muehlebach</dc:creator>
    </item>
    <item>
      <title>On the Influence of the Feature Computation Budget on Per-Instance Algorithm Selection for Black-Box Optimization</title>
      <link>https://arxiv.org/abs/2605.04954</link>
      <description>arXiv:2605.04954v1 Announce Type: new 
Abstract: Per-instance algorithm selection (PIAS) exploits the complementarity within a set of algorithms by deciding which algorithm to run on a given instance. This decision is based on features of the instances, which, in the context of black-box optimization (BBO), consume part of the optimization budget to compute. This raises two questions: (a) for which fractions of the budget spent on feature computation PIAS is worthwhile for BBO, and (b) which fraction of the budget optimizes the tradeoff between feature accuracy and PIAS performance. To this end, we perform a broad study in which PIAS with varying sampling budgets for feature computation is compared to the single best algorithm across many algorithm selection (AS) scenarios. These scenarios cover two portfolio sizes, three problem sets, four dimensionalities, and ten target budgets. We find that PIAS is viable in the majority of tested scenarios, even when as much as a quarter of the total budget is spent on feature computation. The fraction of the budget spent on feature computation that maximizes the benefit of PIAS is highly dependent on the specific AS scenario. Further, on average 20 percent of the loss of PIAS relative to the virtual best solver is explained by the budget spent on feature computation, highlighting the importance of properly accounting for the feature budget.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04954v1</guid>
      <category>cs.NE</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Koen van der Blom, Diederick Vermetten</dc:creator>
    </item>
    <item>
      <title>Order-based Rehearsal Learning</title>
      <link>https://arxiv.org/abs/2605.04955</link>
      <description>arXiv:2605.04955v1 Announce Type: new 
Abstract: When a machine learning (ML) model forecasts an undesired event, one often seeks a decision to avoid it, a task known as the avoiding undesired future (AUF) problem. Many rehearsal learning methods have been proposed for AUF, but they rely on an underlying graph structure; learning such a graph from observational data is challenging and can incur substantial estimation error. In this work, we demonstrate that the order structure can be sufficient for AUF decision-making, and we propose the first order-based rehearsal learning method. Although an order is less informative than a graph, it can be sufficient to identify the influence of decisions from observational data, suggesting that learning the entire graph is not always necessary. To learn the order, we develop an information-theoretic method that imposes no restrictions on the form of the structural functions or the type of noise distributions. For AUF decision-making, we construct an order-based sampler to approximate the influence of decisions and, by combining it with a surrogate objective for maximizing the post-decision success probability, reduce the AUF task to a differentiable optimization problem. Experiments show that our order learning method outperforms existing methods, and that our AUF approach not only surpasses methods relying on learned graphs or learned orders but also matches or even exceeds oracle baselines that are given the true graph.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04955v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yu-Xuan Tao, Tian-Zuo Wang, Zhi-Hua Zhou</dc:creator>
    </item>
    <item>
      <title>KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels</title>
      <link>https://arxiv.org/abs/2605.04956</link>
      <description>arXiv:2605.04956v1 Announce Type: new 
Abstract: LLM-based Triton kernel generation has attracted significant interest, yet a fundamental empirical question remains unanswered: where does this capability break down, and why? We present KernelBench-X, a benchmark designed to answer this question through category-aware evaluation of correctness and hardware efficiency across 176 tasks in 15 categories. Our systematic comparison of five representative methods yields three main findings. First, task structure determines correctness more than method design does. Category explains nearly three times as much variance in semantic correctness as method (9.4% vs 3.3% explained deviance), and 72% of Fusion tasks fail across all five methods, while Math tasks are solved consistently. Second, iterative refinement improves correctness but not performance. Across GEAK iterations, the compile rate rises from 52.3% to 68.8% while the average speedup declines from $1.58\times$ to $1.44\times$; newly rescued kernels consistently underperform persistently correct ones ($1.16\times$ vs $1.58\times$ speedup from round 0 to round 1). Third, correctness does not imply efficiency: 46.6% of correct kernels are slower than the PyTorch eager baseline, and cross-hardware speedup variance reaches $21.4\times$. In addition, quantization remains completely unsolved (0/30 successes) despite non-trivial compilation rates, pointing to a systematic misunderstanding of numerical computation contracts rather than surface-level syntax errors. These findings suggest that future progress depends on handling global coordination, explicitly modeling numerical precision, and incorporating hardware efficiency into generation. The code is available at https://github.com/BonnieW05/KernelBenchX</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04956v1</guid>
      <category>cs.LG</category>
      <category>cs.PF</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Han Wang, Jintao Zhang, Kai Jiang, Haoxu Wang, Jianfei Chen, Jun Zhu</dc:creator>
    </item>
    <item>
      <title>Delving into Non-Exchangeability for Conformal Prediction in Graph-Structured Multivariate Time Series</title>
      <link>https://arxiv.org/abs/2605.04957</link>
      <description>arXiv:2605.04957v1 Announce Type: new 
Abstract: Point forecasting for graph-structured multivariate time series is a fundamental problem, but rigorous uncertainty quantification for such predictions remains underexplored. Conformal prediction (CP) provides uncertainty estimates with a distribution-free coverage guarantee under the exchangeability assumption, which requires the joint data distribution to be invariant under permutation. However, in graph-structured time series, inherent cross-node coupling can violate exchangeability, making direct application of CP unreliable. From the perspective of spectral graph theory, such coupling resides in global trends and can be characterized by low-frequency components, whereas high-frequency components are nearly exchangeable. We therefore propose a novel concept, Spectral Graph Conditional Exchangeability (SGCE), which conditions the exchangeable high-frequency components on the low-frequency ones to preserve global trends and enable effective CP in the spectral domain. Based on SGCE, we further propose Spectral Conformal prediction via wAveLEt transform (SCALE). SCALE uses graph wavelets to separate low- and high-frequency components and conformalizes high-frequency residuals via adaptive gating over a low-frequency embedding. Experimental results on real-world traffic datasets show that SCALE not only achieves valid coverage but also consistently improves the coverage-efficiency trade-off over state-of-the-art CP methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04957v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ruichao Guo, Xingyao Han, Luo Wenshui, Zhe Liu, Chen Gong, Hesheng Wang</dc:creator>
    </item>
    <item>
      <title>EP-GRPO: Entropy-Progress Aligned Group Relative Policy Optimization with Implicit Process Guidance</title>
      <link>https://arxiv.org/abs/2605.04960</link>
      <description>arXiv:2605.04960v1 Announce Type: new 
Abstract: Reinforcement learning with verifiable rewards (RLVR), particularly Group Relative Policy Optimization (GRPO), has advanced LLM reasoning. However, GRPO suffers from three credit-assignment failures: uniform token-level granularity, which ignores the heterogeneous informational value of tokens; uniform polarity, which penalizes correct steps and rewards incorrect ones; and zero-variance collapse, which erases outcome-driven gradients. We systematically quantify these failures, revealing highly non-uniform token informativeness, widespread step-level polarity misalignment, and substantial training waste.
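  For context, standard GRPO samples a group of responses per prompt and normalizes each response's scalar reward within the group. Every token of a response then receives that response's advantage. The Python sketch below makes two of the failure modes concrete: the advantage is uniform across tokens, and it collapses to zero when all rewards in a group are equal. The function names are hypothetical. The use of the population standard deviation and the epsilon value are assumptions, and implementations differ on both.

```python
import statistics


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (r_i - mean(r)) / (std(r) + eps)."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def token_advantages(rewards, lengths):
    """Broadcast each response's advantage to every one of its tokens (uniform granularity)."""
    return [[a] * n for a, n in zip(grpo_advantages(rewards), lengths)]


# Mixed outcomes: correct responses get about +1, incorrect ones about -1.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
# Zero-variance group (all correct): every advantage is 0, so there is no gradient signal.
print(grpo_advantages([1.0, 1.0, 1.0, 1.0]))
# Two responses of 3 and 2 tokens: each token inherits its response's advantage.
print(token_advantages([1.0, 0.0], [3, 2]))
```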
  To address these limitations, we propose Entropy-Progress Aligned GRPO (EP-GRPO), a framework that mines the model's intrinsic information flow for dense, self-supervised guidance. EP-GRPO integrates three components: entropy-gated modulation, which prioritizes high-entropy decision pivots; implicit process signals derived from policy divergence and anchored to outcome advantages, which provide directional token-level feedback without external reward models; and cumulative entropy mapping, which enables progress-aligned advantage normalization and thereby maintains gradient flow under zero reward variance.
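  The abstract does not specify how the entropy gate is computed. As a rough illustration only, the sketch below computes the Shannon entropy of each decoding step's softmax distribution and flags the highest-entropy steps as candidate decision pivots. The top-fraction rule and the function names are hypothetical assumptions, not EP-GRPO's actual gating.

```python
import math


def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) for a single decoding step."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return -sum((e / total) * math.log(e / total) for e in exps if e > 0)


def high_entropy_mask(step_logits, top_fraction=0.2):
    """Hypothetical gate: flag the top `top_fraction` of steps by entropy (ties included)."""
    ents = [token_entropy(step) for step in step_logits]
    k = max(1, int(len(ents) * top_fraction))
    threshold = sorted(ents, reverse=True)[k - 1]
    return [e >= threshold for e in ents]


# Steps 2 and 4 have nearly uniform distributions, so they carry the most entropy.
steps = [[10.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 0.0, 0.0], [0.0, 0.1, 0.0]]
print(high_entropy_mask(steps, top_fraction=0.5))  # [False, True, False, True]
```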
  Extensive experiments on mathematical reasoning benchmarks demonstrate that EP-GRPO achieves superior accuracy and efficiency compared to GRPO and its variants. The code will be made publicly available.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04960v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Song Yu, Li Li, Wenwen Zhao, Zhisheng Yang</dc:creator>
    </item>
    <item>
      <title>TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding</title>
      <link>https://arxiv.org/abs/2605.04962</link>
      <description>arXiv:2605.04962v1 Announce Type: new 
Abstract: Foundation models have established unified representations for natural language processing, yet this paradigm remains largely unexplored for tabular data. Existing methods face fundamental limitations: LLM-based approaches lack retrieval-compatible vector outputs, whereas text embedding models often fail to capture tabular structure and numerical semantics. To bridge this gap, we first introduce the Tabular Embedding Benchmark (TabBench), a comprehensive suite designed to evaluate the tabular understanding capability of embedding models. We then propose TabEmbed, the first generalist embedding model that unifies tabular classification and retrieval within a shared embedding space. By reformulating diverse tabular tasks as semantic matching problems, TabEmbed leverages large-scale contrastive learning with positive-aware hard negative mining to discern fine-grained structural and numerical nuances. Experimental results on TabBench demonstrate that TabEmbed significantly outperforms state-of-the-art text embedding models, establishing a new baseline for universal tabular representation learning. Code and datasets are publicly available at https://github.com/qiangminjie27/TabEmbed and https://huggingface.co/datasets/qiangminjie27/TabBench.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04962v1</guid>
      <category>cs.CL</category>
      <category>cs.IR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Minjie Qiang, Mingming Zhang, Xiaoyi Bao, Xing Fu, Yu Cheng, Weiqiang Wang, Zhongqing Wang, Ningtao Wang</dc:creator>
    </item>
    <item>
      <title>Reliable Modeling of Distribution Shifts via Displacement-Reshaped Optimal Transport</title>
      <link>https://arxiv.org/abs/2605.04965</link>
      <description>arXiv:2605.04965v1 Announce Type: new 
Abstract: Optimal transport (OT) is a central framework for modeling distribution shifts. Because OT compares distributions directly in input space, a well-designed ground metric between observations is essential to ensure that the optimizer does not violate the true geometry of change. We propose Displacement-Reshaped Optimal Transport (ReshapeOT), a method that reshapes the ground metric by integrating observed sample displacements as an additional source of knowledge. Technically, ReshapeOT replaces the Euclidean metric with a Mahalanobis distance estimated from displacement second moments. This effectively carves expressways through the input space, inviting transport solutions that better align with observed displacements. Our method is computationally lightweight, integrates seamlessly into any OT solver that operates on a cost matrix, and can be kernelized for further flexibility. Experiments on synthetic and real-world data show that ReshapeOT achieves substantial gains in transport reliability. We further demonstrate our method's usefulness in two practical use cases.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04965v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Philip Naumann, Jacob Kauffmann, Klaus-Robert Müller, Grégoire Montavon</dc:creator>
    </item>
    <item>
      <title>Adaptive Contention-based Random Access for Uplink Reporting in 3GPP Ambient IoT Networks</title>
      <link>https://arxiv.org/abs/2605.04966</link>
      <description>arXiv:2605.04966v1 Announce Type: new 
Abstract: Ambient Internet of Things (A-IoT) targets energy harvesting (EH), battery-less devices as a simple connectivity solution for extensive ultra-low-power deployments. These devices typically face intermittent energy availability, making uplink reports increasingly susceptible to access collisions and energy outages. In this paper, we build upon the cellular standardization of A-IoT and examine the paging-triggered contention-based random access (CBRA) framework for uplink reporting. We analyze the effects of energy availability and collisions on these systems and introduce an EH-aware access control mechanism. In this mechanism, the reader broadcasts an access probability in the paging message, which helps regulate the number of devices attempting random access. Results show that, unlike the baselines, the proposed method scales well under dense deployments by keeping collisions nearly constant, improving access efficiency, and substantially reducing the number of paging rounds required for successful reporting. These results highlight the importance of lightweight reader-side access control for reliable and resource-efficient reporting in A-IoT environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04966v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>David E. Ruiz-Guirola, Samer Nasser, Bikramjit Singh, Henrique Duarte Moura, Andrey Belogaev, Jeroen Famaey, Efstathios Katranaras, Mahdi Shahabi, Onel L. A. Lopez</dc:creator>
    </item>
    <item>
      <title>Skill Neologisms: Towards Skill-based Continual Learning</title>
      <link>https://arxiv.org/abs/2605.04970</link>
      <description>arXiv:2605.04970v1 Announce Type: new 
Abstract: Modern LLMs show mastery over an ever-growing range of skills, as well as the ability to compose them flexibly. However, extending model capabilities to new skills in a scalable manner is an open problem: fine-tuning and parameter-efficient variants risk catastrophic forgetting, while context-based approaches have limited expressiveness and are constrained by the model's effective context. We explore skill neologisms--i.e., soft tokens integrated in the model's vocabulary and optimized to improve capabilities over a specific skill--as a way to selectively extend model capabilities to new skills without weight updates. We first observe that off-the-shelf pre-trained LLMs already exhibit tokens associated with procedural knowledge. We then show that skill neologisms can be learned to improve model capabilities on specific skills while being composable with out-of-distribution skills, and that independently trained skill neologisms can be composed zero-shot. These results suggest that skill neologisms may provide a scalable path towards skill-based continual learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04970v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Antonin Berthon, Nicolas Astorga, Mihaela van der Schaar</dc:creator>
    </item>
    <item>
      <title>Why Geometric Continuity Emerges in Deep Neural Networks: Residual Connections and Rotational Symmetry Breaking</title>
      <link>https://arxiv.org/abs/2605.04971</link>
      <description>arXiv:2605.04971v1 Announce Type: new 
Abstract: Weight matrices in deep networks exhibit geometric continuity -- principal singular vectors of adjacent layers point in similar directions. While this property has been widely observed, its origin remains unexplained. Through experiments on toy MLPs and small transformers, we identify two mechanisms: residual connections create cross-layer gradient coherence that aligns weight updates across layers, and symmetry-breaking nonlinearities constrain all layers to a shared coordinate frame, preventing the rotation drift that would otherwise destabilize weight structure. Crucially, a nonlinear but rotation-preserving activation fails to retain continuity, isolating symmetry breaking -- not nonlinearity itself -- as the active ingredient. Activation and normalization play distinct roles: activation concentrates continuity in the leading singular direction, while normalization distributes it across multiple directions. In transformers, continuity is projection-specific: Q, K, Gate, and Up (which read from the residual stream) develop input-space ($\mathbf{v}_1$) continuity; O and Down (which write to it) develop output-space ($\mathbf{u}_1$) continuity; V alone, lacking an adjacent nonlinearity, develops only low continuity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04971v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kyungwon Jeong, Won-Gi Paeng, Honggyo Suh</dc:creator>
    </item>
    <item>
      <title>Why Expert Alignment Is Hard: Evidence from Subjective Evaluation</title>
      <link>https://arxiv.org/abs/2605.04972</link>
      <description>arXiv:2605.04972v1 Announce Type: new 
Abstract: Aligning large language models with expert judgment is especially difficult in subjective evaluation tasks, where experts may disagree, rely on tacit criteria, and change their judgments over time. In this paper, we study expert alignment as a way to understand this difficulty. Using expert evaluations and follow-up questionnaires, we examine how different forms of expert information affect alignment and what this reveals about subjective judgment. Our findings show four consistent patterns. First, alignment difficulty varies substantially across experts, suggesting that expert evaluation styles differ widely in their distance from a model's prior behavior. Second, explicit criteria and reasoning do not always improve alignment, indicating that expert judgment is not fully captured by verbalized rules. Third, editing is sensitive to both the number and the identity of examples, with small numbers of edits providing useful but unstable gains. Fourth, alignment difficulty differs across evaluation dimensions: dimensions grounded more directly in proposal content are easier to align, while dimensions requiring external knowledge or value-based judgment remain harder. Taken together, these results suggest that expert alignment is difficult not only because of model limitations, but also because subjective evaluation is inherently heterogeneous, partly tacit, dimension-dependent, and temporally unstable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04972v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tzu-Mi Lin, Wataru Hirota, Tatsuya Ishigaki, Lung-Hao Lee, Chung-Chi Chen</dc:creator>
    </item>
    <item>
      <title>Architectural Constraints Alignment in AI-assisted, Platform-based Service Development</title>
      <link>https://arxiv.org/abs/2605.04973</link>
      <description>arXiv:2605.04973v1 Announce Type: new 
Abstract: AI-assisted development tools enable rapid prototyping of services but often lack awareness of architectural constraints, infrastructure dependencies, and organizational standards required in production environments. Consequently, generated artifacts may exhibit brittle behavior and limited deployability. We propose a retrieval-augmented scaffolding approach that combines platform-based code generation with agentic clarification loops to expose and resolve architectural constraint ambiguities. By combining template retrieval with structured interaction, the method embeds production-relevant considerations during service scaffolding. Evaluation indicates improved architectural consistency and deployability compared to general-purpose AI code generation workflows, suggesting that constraint-aware retrieval is essential for aligning AI-assisted service development with production software engineering practices.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04973v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Julius Irion, Moritz Leugers, Paul Hartwig, Simon Kling, Tachmyrat Annayev, Alexander Schwind, Maria C. Borges, Sebastian Werner</dc:creator>
    </item>
    <item>
      <title>Probabilistic Atomic Swaps for Bitcoin and Friends</title>
      <link>https://arxiv.org/abs/2605.04975</link>
      <description>arXiv:2605.04975v1 Announce Type: new 
Abstract: Atomic swaps are a fundamental primitive for the trustless exchange of digital assets across blockchains: they guarantee that either both parties receive the agreed assets or neither party transfers. While this all-or-nothing guarantee is powerful, it also imposes an inherent determinism that rules out exchanges whose intended outcome is probabilistic. As a result, existing atomic swaps cannot realize trustless exchanges in which one party pays for a fixed chance of receiving a larger asset or reward, as in lotteries, randomized allocation mechanisms, and probabilistic cross-chain trades.
  We introduce probabilistic swaps, a new cryptographic primitive that extends atomic swaps to the probabilistic setting. In a probabilistic swap, one party's transfer is executed with a fixed, publicly specified probability embedded in the protocol and cannot be biased by either party. This yields a trustless mechanism for randomized exchange with verifiable odds and no trusted intermediary.
  Our construction combines adaptor signatures with oblivious pseudorandom functions (OPRFs) to realize the desired probabilistic outcome while ensuring that neither party can predict or bias it in advance. Along the way, we introduce a new mechanism for the atomic exchange of OPRF evaluations for payments, which may be of independent interest. A key feature of our approach is that it preserves the minimal on-chain footprint of modern atomic-swap protocols. The protocol relies only on standard Bitcoin scripts, such as digital signatures and timelocks, and is deployable on any blockchain that already supports atomic swaps. Consequently, probabilistic swaps are indistinguishable from ordinary on-chain transactions, which helps preserve privacy and fungibility. We provide formal security foundations and demonstrate practicality through a probabilistic swap in the Bitcoin testnet and in the Lightning Network.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04975v1</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Paul Gerhart, Jay Taylor, Sri Aravinda Krishnan Thyagarajan</dc:creator>
    </item>
    <item>
      <title>ICPR 2026 Competition on Privacy-Preserving Person Re-Identification from Top-View RGB-Depth Camera (TVRID)</title>
      <link>https://arxiv.org/abs/2605.04977</link>
      <description>arXiv:2605.04977v1 Announce Type: new 
Abstract: This companion paper reports the ICPR 2026 TVRID competition on privacy-aware top-view person re-identification. We present the competition setting, the released RGB-Depth dataset, and a summary of final results with descriptions of the top entries. TVRID contains 86 identities captured by four synchronized overhead Intel RealSense D455 cameras, with paired RGB/Depth streams and structured geometric variation across flat, ascent, descent, and oblique viewpoints. The evaluation protocol includes three tracks: RGB Re-ID, Depth Re-ID, and RGB$\leftrightarrow$Depth cross-modal retrieval. Submissions are ranked using mAP and CMC-1 under a unified server-side evaluation. The final results show a clear difficulty ordering (RGB $&gt;$ Depth $&gt;$ Cross-Modal), highlighting both the challenge of modality-constrained retrieval and the feasibility of strong performance with modality-invariant learning. By releasing the dataset at https://zenodo.org/records/17909410, the evaluation scripts at https://github.com/RaphaelDel/ICPR-TVRID, and the accompanying documentation, TVRID establishes a reproducible benchmark for top-view, depth-based, and cross-modal person re-id.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04977v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Raphaël Delécluse, Hazem Wannous, Laurent Guimas</dc:creator>
    </item>
    <item>
      <title>Exhaustive Symbolic Integration: Integration by Differentiation and the Landscape of Symbolic Integrability</title>
      <link>https://arxiv.org/abs/2605.04978</link>
      <description>arXiv:2605.04978v1 Announce Type: new 
Abstract: We introduce Exhaustive Symbolic Integration (ESI), a method that enumerates all symbolic functions up to a given complexity $k$ within a specified operator basis and determines which admit closed-form antiderivatives within the same class. This allows us to compute the "integrability fraction" $\rho(k)$ (the fraction of functions whose derivatives lie within the same class), which we do for five operator bases including combinations of rational functions, powers, exponentials, logarithms and trigonometric functions. We find that $\rho(k)$ declines at high complexity and that the operator basis has a dramatic effect -- in particular, adding the logarithm boosts $\rho(k)$ by a factor of $\sim$3 and produces or exacerbates a clear peak at $k=6$. We also deploy ESI as a novel integration algorithm, identifying three integrals that resist SymPy, Mathematica, RUBI, FriCAS, Maxima and Giac under all tested strategies. When an antiderivative can be found by multiple methods, ESI often returns the simplest form. These results reveal that the landscape of symbolic integrability is shaped primarily by the choice of operators, and that exhaustive enumeration can systematically discover integrable forms -- including novel ones -- that elude computer algebra systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04978v1</guid>
      <category>cs.SC</category>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Harry Desmond</dc:creator>
    </item>
    <item>
      <title>On-line Learning in Tree MDPs by Treating Policies as Bandit Arms</title>
      <link>https://arxiv.org/abs/2605.04979</link>
      <description>arXiv:2605.04979v1 Announce Type: new 
Abstract: A Tree Markov Decision Problem (T-MDP) is a finite-horizon MDP with a starting state $s_{1}$, in which every state is reachable from $s_{1}$ through exactly one state-action trajectory. T-MDPs arise naturally as abstractions of decision making in sequential games with perfect recall, against stationary opponents. We consider the problem of on-line learning in T-MDPs, both in the PAC and the regret-minimisation regimes. We show that well-known bandit algorithms -- LUCB and UCB -- can be applied on T-MDPs by treating each policy as an arm. The apparent technical challenge in this approach is that the number of policies is exponential in the number of states. Our main innovation is in the design of confidence bounds based on data shared by the policies, so that the bandit algorithms can still be implemented with polynomial memory and per-step computation. We obtain instance-dependent upper bounds on sample complexity and regret that sum a "gap term" from every terminal state, rather than every policy. Empirically, our algorithms consistently outperform available alternatives on a suite of hidden-information games.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04979v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anvay Shah, Ramsundar Anandanarayanan, Sharayu Moharir, Shivaram Kalyanakrishnan</dc:creator>
    </item>
    <item>
      <title>Conceptors for Semantic Steering</title>
      <link>https://arxiv.org/abs/2605.04980</link>
      <description>arXiv:2605.04980v1 Announce Type: new 
Abstract: Activation-based steering provides control of LLM behavior at inference time, but the dominant paradigm reduces each concept to a single direction whose geometry is left largely unexamined. Rather than selecting a single steering direction, we use conceptors: soft projection matrices estimated from activations pooled across both poles of a bipolar concept, which preserve the concept's full multidimensional subspace. A geometric analysis shows the bipolar subspace strictly subsumes the single-vector baseline. We further show that the conceptor quota provides a parameter-free layer-selection diagnostic, predicting concept separability with Pearson correlations up to r=0.96 across three instruction-tuned models and three semantic dimensions. Beyond selection, conceptors admit a closed-form Boolean algebra (AND, OR, NOT): we evaluate conceptor compositionality on thematically related sub-concepts. Across a systematic five-axis design-space evaluation, conceptors match or outperform additive baselines at layers where concept subspaces are multi-dimensional while producing substantially fewer degenerate outputs. Conceptor steering is a geometrically principled, compositional, and practically safer alternative to single-direction steering from a limited number of contrastive pairs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04980v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ilias Triantafyllopoulos, Young-Min Cho, Ren Tao, Miranda Muqing Miao, Sunny Rai, Lyle Ungar, Sharath Chandra Guntuku, Neville Ryant, João Sedoc</dc:creator>
    </item>
    <item>
      <title>Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers</title>
      <link>https://arxiv.org/abs/2605.04984</link>
      <description>arXiv:2605.04984v1 Announce Type: new 
Abstract: Long-horizon LLM agents depend on intermediate information-gathering turns, yet training feedback is usually observed only at the final answer, because process-level rewards require high-quality human annotation. Existing turn-level shaping methods reward turns that increase the likelihood of a gold answer, but they require answer supervision or stable task-specific verifiers. Conversely, label-free RL methods extract self-signals from output distributions, but mainly at the answer or trajectory level and therefore cannot assign credit to intermediate turns. We propose Self-Induced Outcome Potential (SIOP), which treats semantic clusters of final answers as latent future outcome states for potential-based turn-level credit assignment. For each query, SIOP samples multiple rollouts, clusters final answers into semantic outcome modes, and builds a reliability-aware target distribution over these states. It then rewards turns for increasing posterior support for reliable future states using a tractable cluster-level approximation. The objective generalizes information-potential shaping from gold-answer supervision to settings without task-specific gold verifiers while avoiding the broadcasted rollout-level advantages used by standard GRPO. We formalize the framework, characterize its supervised gold-answer limit, and show that SIOP improves average performance over verifier-free outcome-level baselines on seven search-augmented agentic reasoning benchmarks while approaching a gold-supervised outcome baseline. Code is available at https://github.com/dl-m9/SIOP.git.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04984v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Senkang Hu, Yong Dai, Xudong Han, Zhengru Fang, Yuzhi Zhao, Sam Tak Wu Kwong, Yuguang Fang</dc:creator>
    </item>
    <item>
      <title>Attention-Based Chaotic Self-Supervision for Medical Image Classification</title>
      <link>https://arxiv.org/abs/2605.04985</link>
      <description>arXiv:2605.04985v1 Announce Type: new 
Abstract: Deep learning models for medical image classification usually achieve promising results but typically rely on large, annotated datasets or standard transfer learning from ImageNet. Self-Supervised Learning (SSL) has emerged as a powerful alternative, yet common methods like masked autoencoders (MAEs) may inadvertently destroy fine-grained diagnostic features by using random masking. In this paper, we propose a novel SSL pre-training strategy, the Chaotic Denoising Autoencoder (CDAE). Instead of masking, we apply a chaotic transformation to the input image, tasking an autoencoder to reconstruct the original. We hypothesize this forces the encoder to learn robust, domain-specific features by "inverting the chaos". Furthermore, we propose an attentive fusion mechanism that combines features from our CDAE-trained encoder with a standard encoder, leveraging the strengths of both general and domain-specific representations. Our method is evaluated on two public medical datasets: ISIC 2018 (skin lesions) and APTOS 2019 (diabetic retinopathy). The proposed model achieves high performance, with an accuracy of 0.9221 and an F1-macro of 0.8530 on ISIC 2018, and an accuracy of 0.8644 and F1-macro of 0.7433 on APTOS 2019, demonstrating the efficacy of our approach.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04985v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Joao Batista Florindo, Amanda Pontes de Oliveira Ornelas</dc:creator>
    </item>
    <item>
      <title>Low-Rank Adaptation of Geospatial Foundation Models for Wildfire Mapping Using Sentinel-2 Data</title>
      <link>https://arxiv.org/abs/2605.04989</link>
      <description>arXiv:2605.04989v1 Announce Type: new 
Abstract: Wildfire burned-area mapping is essential for damage assessment, emissions modeling, and understanding fire-climate interactions across diverse ecological regions. Recent geospatial foundation models provide strong general-purpose representations for satellite imagery, yet there is still no clear understanding of how to efficiently adapt these models for downstream Earth observation tasks, particularly under geographic and temporal domain shift. This study evaluates three state-of-the-art Geospatial Foundation Models (GFMs) - Terramind, DINOv3, and Prithvi-v2 - for burned-area mapping across the United States and Canada using Sentinel-2 data. Leveraging 3,820 wildfire events from 2017-2023, we conduct spatial and temporal generalization tests across diverse biomes. We systematically compare full fine-tuning, decoder-only fine-tuning, and Low-Rank Adaptation (LoRA) for adapting each model. Across all experiments, LoRA provides the strongest cross-domain generalization while updating less than 1% of parameters, demonstrating a favorable trade-off between accuracy and efficiency. Prithvi-v2 with LoRA achieves the highest overall accuracy and the largest improvement compared to full fine-tuning. These findings indicate that geospatial foundation models, when adapted using lightweight parameter-efficient methods such as LoRA, offer a robust and scalable solution for large-scale burned-area mapping. Code is available at https://github.com/alishibli97/wildfire-lora-gfm.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04989v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ali Shibli, Andrea Nascetti, Yifang Ban</dc:creator>
    </item>
    <item>
      <title>You Snooze, You Lose: Automatic Safety Alignment Restoration through Neural Weight Translation</title>
      <link>https://arxiv.org/abs/2605.04992</link>
      <description>arXiv:2605.04992v1 Announce Type: new 
Abstract: The open-source ecosystem has accelerated the democratization of Large Language Models (LLMs) through the public distribution of specialized Low-Rank Adaptation (LoRA) modules. However, integrating these third-party adapters often induces catastrophic forgetting of the base model's foundational safety alignment. Restoring these guardrails via fine-tuning on safety data introduces an opposing failure mode: the severe degradation of the specialized domain knowledge the adapter was originally designed to provide. To overcome this zero-resource challenge, we propose Neural Weight Translation (NeWTral), a framework that directly maps unsafe, domain-specific adapters onto a safe alignment manifold while rigorously preserving their core expertise. NeWTral operates as a non-linear translation module pre-trained on a diverse corpus of unsafe-to-safe adapter pairs. By executing this mapping entirely within the parameter space, NeWTral utilizes an adaptive Mixture of Experts (MoE) routing strategy to autonomously blend high-fidelity surgical translators and aggressive alignment experts. We evaluate our framework across four architectural families (Llama, Mistral, Qwen, and Gemma) at scales up to 72B parameters across eight diverse scientific and professional domains. Our results demonstrate that the MoE variant achieves a radical reduction in the average Attack Success Rate (ASR), dropping from 70% in unsafe experts to just 13%, while maintaining an exceptional 90% average knowledge fidelity. Much like the crowdsourced adapters it remedies, the NeWTral module is designed as a standalone, downloadable asset that allows practitioners to restore safety alignment instantly without requiring access to original training data or hardware-intensive retraining.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04992v1</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Marco Arazzi, Vignesh Kumar Kembu, Antonino Nocera, Stjepan Picek, Saraga Sakthidharan</dc:creator>
    </item>
    <item>
      <title>Federated Learning for Early Prediction of EV Charging Demand</title>
      <link>https://arxiv.org/abs/2605.04993</link>
      <description>arXiv:2605.04993v1 Announce Type: new 
Abstract: Accurate forecasting of electric vehicle (EV) charging demand is critical for grid stability, infrastructure planning, and real-time charging optimization. In this work, we study the problem of early prediction of charging demand, where the total energy of a session is estimated using only information available at plug-in time and during the first minutes of charging. This enables actionable decisions while the session is still in progress, which is of direct importance for EV network operators. We construct a session-level dataset from the Adaptive Charging Network (ACN), combining session metadata with early-window charging measurements, and derive tabular features capturing user intent, temporal patterns, and initial charging behavior. We focus on a single operational depot, Caltech, and model intra-depot heterogeneity through station-level client partitions while evaluating multiple model families in a federated learning (FL) setting. Our results show that federated models can approach centralized predictive performance while keeping data in-depot, enabling privacy-enhanced training across distributed charging infrastructures. Overall, we demonstrate that reliable demand estimates can be obtained early in the session with minimal data, and that FL provides a practical pathway toward scalable and privacy-aware analytics for EV charging networks. Code is available at https://github.com/Indigma-Innovations/federated-learning-ev-charging-demand.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04993v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vasilis Perifanis, Foteini Nikolaidou, Nikolaos Pavlidis, Panagiotis Thomakos, Andreas Sendros</dc:creator>
    </item>
    <item>
      <title>Adaptivity Under Realizability Constraints: Comparing In-Context and Agentic Learning</title>
      <link>https://arxiv.org/abs/2605.04995</link>
      <description>arXiv:2605.04995v1 Announce Type: new 
Abstract: We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrary functions, and a realizable regime, where we require these operations to be implemented by ReLU neural networks. In both settings, adaptivity never hinders approximation performance. However, whether adaptivity yields a strict advantage can change when one passes from the unrestricted regime to the realizable regime. We identify four distinct approximation scenarios, each witnessed by an explicit task family: (a) no advantage of adaptivity; (b) an advantage in the unrestricted regime that persists under ReLU realizability; (c) an advantage that arises only under realizability; and (d) an advantage that disappears under realizability. This demonstrates that representational constraints interact profoundly with the effect of adaptivity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04995v1</guid>
      <category>cs.LG</category>
      <category>math.ST</category>
      <category>stat.ML</category>
      <category>stat.TH</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anastasis Kratsios, A. Martina Neuman, Philipp Petersen</dc:creator>
    </item>
    <item>
      <title>Tailoring Scaffolding to Diagnostic Strategies: Theory-Informed LLM-Based Agents</title>
      <link>https://arxiv.org/abs/2605.04996</link>
      <description>arXiv:2605.04996v1 Announce Type: new 
Abstract: Learning analytics systems increasingly integrate large language models (LLMs) to provide adaptive scaffolding in complex learning environments, yet personalization is often driven by global instructional choices rather than principled alignment with learning theory, limiting effectiveness and pedagogical grounding. In prior work, we examined how structuring and problematizing scaffolding approaches can be instantiated through LLM agents in a scenario-based learning environment for diagnostic reasoning. While both approaches supported learning, we observed systematic differences in learner interaction patterns and clear tendencies indicating that different diagnostic strategies benefited from distinct forms of scaffolding. Building on these findings, we propose a theory-informed scaffolding design grounded in the Knowledge Learning Instruction (KLI) framework, as different diagnostic strategies target different types of knowledge and require different instructional mechanisms. We use KLI to guide the alignment between strategy demands and scaffolding approaches and introduce a KLI-informed hybrid LLM agent that adapts its pedagogical support according to the diagnostic strategy being practiced, rather than applying a single global scaffolding approach. We hypothesize that this design could enable better learning gains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04996v1</guid>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Fatma Betul Gures, Tanya Nazaretsky, Tanja Kaser</dc:creator>
    </item>
    <item>
      <title>DualTCN: A Physics-Constrained Temporal Convolutional Network for 2 Time-Domain Marine CSEM Inversion</title>
      <link>https://arxiv.org/abs/2605.04997</link>
      <description>arXiv:2605.04997v1 Announce Type: new 
Abstract: DualTCN is the first deep-learning framework for inverting time-domain marine controlled-source electromagnetic (MCSEM) transient data. Moving away from traditional subsurface discretization, the framework regresses four earth-model parameters -- $\sigma_1$, $\sigma_2$, $d_1$, $d_2$ -- and reconstructs conductivity-depth profiles using a differentiable soft-step decoder. The optimized architecture (379K parameters) features a Temporal Convolutional Network (TCN) encoder paired with a late-time branch and an auxiliary seafloor-depth head. This design achieves a 25.3\% loss reduction over baseline models, with high predictive accuracy ($R^2 = 0.898$ for $\sigma_2$) and an inversion speed of 3.5~ms per sample on an A100 GPU.
  The framework demonstrates high robustness to noise through curriculum-based amplitude augmentation, maintaining a mean $\bar{R}^2$ of 0.858 at $\pm2\%$ random amplitude error, compared to $0.363$ without augmentation. DualTCN generalizes effectively to three-layer extensions (seawater/resistive layer/basement), accurately resolving basement conductivity ($R^2 \approx 0.88$), though thin-layer resolution remains a physical limitation ($R^2 \approx 0.23$).
  In comparative benchmarks, DualTCN significantly outperforms traditional local optimization methods like Levenberg-Marquardt and L-BFGS-B, yielding a mean $\bar{R}^2 = 0.877$ versus 0.129-0.439 for multi-start baselines, while operating at up to 21,000$\times$ lower computational cost. Finally, the framework incorporates uncertainty quantification via Monte Carlo (MC) Dropout. While well-calibrated for $\sigma_1$ (PICP90 = 0.944), inherent signal limitations at short offsets (200m) lead to under-coverage for $d_2$ (PICP90 = 0.572), which can be mitigated through post-hoc temperature scaling or split conformal prediction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04997v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Khaled Ahmed, Ghada Omar</dc:creator>
    </item>
    <item>
      <title>Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation</title>
      <link>https://arxiv.org/abs/2605.04998</link>
      <description>arXiv:2605.04998v1 Announce Type: new 
Abstract: Chord progression generation is practically important but understudied. Most large-scale symbolic music systems target melody, multi-track arrangement, or audio synthesis, and chord-only models tend to be relegated to conditioning components inside larger pipelines. This paper treats chord generation as a standalone task and addresses a question that arises whenever such a model is adapted across genres: how much old-domain data must be retained during fine-tuning to acquire a new domain without forgetting the old? I study jazz fine-tuning starting from a pop-pretrained 25M-parameter Music Transformer (84.24% top-1 chord accuracy on a held-out pop test set). The available jazz corpus is an order of magnitude smaller than the pop corpus, so every fine-tune run uses all 1,513 jazz training sequences. The swept variable is the volume of pop "rehearsal" data mixed alongside, taking values in {0, 1K, 2.5K, 5K, 10K}. Every fine-tuned model gains 7 to 9 points of jazz top-1. Pop accuracy collapses by 2.14 points under jazz-only fine-tuning, recovers to baseline at approximately 2.5K rehearsal samples (1.65x the jazz volume), and saturates beyond that point. A complementary observation: the metric-best run (F3, 2.5K mix) is not always the perceptually preferred one. The pop-leaning (10K) and jazz-leaning (1K) endpoints carry more committed stylistic identities that the author more often selects as finished output in informal listening. I discuss what this suggests for music co-creation tools but make no perceptual claim, since no formal listening study has been conducted. All six checkpoints are released on the HuggingFace Hub at https://huggingface.co/PearlLeeStudio.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04998v1</guid>
      <category>cs.SD</category>
      <category>cs.IR</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Jinju Lee</dc:creator>
    </item>
    <item>
      <title>Agentic Vulnerability Reasoning on Windows COM Binaries</title>
      <link>https://arxiv.org/abs/2605.05000</link>
      <description>arXiv:2605.05000v1 Announce Type: new 
Abstract: Windows Component Object Model (COM) services run with elevated privileges and are widely accessible to authenticated users, making race conditions in these binaries a critical surface for local privilege escalation. We present SLYP, an end-to-end agentic pipeline that discovers race condition vulnerabilities in COM binaries and generates debugger-verified proof-of-concept (PoC) code. SLYP exposes binary exploration, COM inspection, and dynamic debugging as reusable tool interfaces, giving agents the static context, COM activation metadata, and debugger feedback needed to move from vulnerability discovery to verified PoC generation. On a benchmark of 20 COM objects covering 40 vulnerability cases, SLYP achieves 0.973 F1, outperforming production coding agents by up to 0.208 F1 and the state-of-the-art static analyzer by 3.3x in bug discovery. For PoC generation, production coding agents in their default setup (without our COM inspection and dynamic debugging tools) verify essentially no cases on either frontier model, whereas SLYP's interactive toolsets enable it to autonomously synthesize working PoCs for 67.5% of cases on the strongest configuration. Deployed on production Windows services, SLYP discovers 28 previously unknown vulnerabilities across nine COM services, all confirmed by the Microsoft Security Response Center (MSRC) with 16 CVEs assigned and $140,000 in bounties. Furthermore, SLYP is designed with generalizable binary analysis and debugging interfaces, making it readily applicable to other commercial off-the-shelf (COTS) binaries beyond Windows COM services.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05000v1</guid>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hwiwon Lee, Jongseong Kim, Lingming Zhang</dc:creator>
    </item>
    <item>
      <title>Unlocking Embodied Probabilistic Computational Features in Motor Drives</title>
      <link>https://arxiv.org/abs/2605.05001</link>
      <description>arXiv:2605.05001v1 Announce Type: new 
Abstract: Artificial intelligence (AI)-driven fault diagnosis in motor drives often requires significant computational effort and time for re-training, and it offers limited insight into the model and into the suitability of its training and learning mechanisms. This work addresses these gaps by proposing a structured mechanism for transforming untapped labeled fault data into AI parameters to leverage probabilistic data-driven learning. This novel AI reservoir modeling framework for power electronics not only eliminates the exogenous effort of learning data patterns and of the associated optimization, but also provides power electronics engineers with intuitive guidelines for sizing AI models. This alignment between data and system physics makes the proposed model transparent and interpretable, bridging practical understanding with data-driven learning. Experimental data demonstrate its computational efficiency and show that structured, physics-aware reservoirs achieve higher diagnostic accuracy and clearer explanations than conventional black-box AI methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05001v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Subham Sahoo, Huai Wang, Frede Blaabjerg</dc:creator>
    </item>
    <item>
      <title>Misaligned by Reward: Socially Undesirable Preferences in LLMs</title>
      <link>https://arxiv.org/abs/2605.05003</link>
      <description>arXiv:2605.05003v1 Announce Type: new 
Abstract: Reward models are a key component of large language model alignment, serving as proxies for human preferences during training. However, existing evaluations focus primarily on broad instruction-following benchmarks, providing limited insight into whether these models capture socially desirable preferences. As a result, important failures in social alignment can remain hidden.
  We extend reward-model benchmarking to four socially consequential domains: bias, safety, morality, and ethical reasoning. We introduce a framework that converts social evaluation datasets into pairwise preference data, leveraging gold labels where available and directional bias indicators otherwise. This enables us to test whether reward models prefer socially undesirable responses, and whether their preferences produce systematically biased distributions over selected outputs.
  Across five publicly available reward models and two instruction-tuned models used as reward proxies, we find substantial variation across domains, with no single model performing best overall. The models fall well short of strong social intelligence: they often prefer socially undesirable options, and their preferences produce systematically biased distributions. Moreover, stronger bias avoidance can reduce sensitivity to context, revealing a key alignment trade-off between avoiding biased outcomes and preserving contextual faithfulness. These findings show that standard reward benchmarks are insufficient for assessing social alignment and highlight the need for evaluations that directly measure the social preferences encoded in reward models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05003v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Gayane Ghazaryan, Esra Dönmez</dc:creator>
    </item>
    <item>
      <title>Uno-Orchestra: Parsimonious Agent Routing via Selective Delegation</title>
      <link>https://arxiv.org/abs/2605.05007</link>
      <description>arXiv:2605.05007v1 Announce Type: new 
Abstract: Large language model (LLM) multi-agent systems typically rely on rigid orchestration, committing either to flat per-query routing or to hand-engineered task decomposition, so decomposition depth, worker choice, and inference budget are not jointly optimized under one objective. We introduce Uno-Orchestra, a unified orchestration policy that selectively decomposes a task and dispatches each subtask to an admissible (model, primitive) pair, with both decisions learned together from curated RL trajectories grounded in real worker interactions. Against 22 baselines on a 13-benchmark suite spanning math, code, knowledge, long-context, and agentic tool-use, Uno-Orchestra reaches 77.0% macro pass@1, roughly 16% above the strongest workflow baseline, at roughly an order of magnitude lower per-query cost, advancing the accuracy-efficiency frontier of selective delegation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05007v1</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhiqing Cui, Haotong Xie, Jiahao Yuan, Cheng Yang, Hanqing Wang, Yuxin Wu, Yifan Wu, Siru Zhong, Tao Yu, Yifu Guo, Siyu Zhang, Xinlei Yu, Qibing Ren, Usman Naseem</dc:creator>
    </item>
    <item>
      <title>Learned Neighbor Trust for Collaborative Deployment in Model-Agnostic Decentralized Learning</title>
      <link>https://arxiv.org/abs/2605.05009</link>
      <description>arXiv:2605.05009v1 Announce Type: new 
Abstract: Many decentralized distillation methods are designed around training-time coordination, yet deploy each node in isolation even when more capable neighbors remain available at inference time. This is an incomplete objective for settings such as IoT, where devices are heterogeneous, data is scarce and skewed, and a node's strongest neighbors may far exceed its own local capacity. We study how nodes should train so that their predictions compose well at deployment, and how each node should learn whom to trust. Under a server-free, model-agnostic protocol where nodes exchange only queries and soft predictions, we propose Learned Neighbor Trust (LNTrust) wherein each node learns a compact trust function over its neighborhood from local validation evidence. This trust function gates auxiliary distillation during training and defines a deployment ensemble at inference, so that collaboration learned during training transfers directly to deployment. Across datasets and topologies, LNTrust improves deployed accuracy over the strongest output-only baseline by large margins while using significantly less communication than previous methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05009v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Michael Lanier, Luise Ge, Sastry Kompella, Yevgeniy Vorobeychik</dc:creator>
    </item>
    <item>
      <title>Chaotic Contrastive Learning for Robust Texture Classification</title>
      <link>https://arxiv.org/abs/2605.05012</link>
      <description>arXiv:2605.05012v1 Announce Type: new 
Abstract: Texture classification is a pivotal task in computer vision, presenting unique challenges due to high inter-class similarity and the sensitivity of structural patterns to scale and illumination changes. While Convolutional Neural Networks (CNNs) and recent Vision Transformers have set performance benchmarks, they often require extensive labeled datasets or struggle to generalize across domains due to an over-reliance on color and shape features. This paper introduces a novel framework that synergizes Self-Supervised Learning (SSL) with deterministic chaotic dynamics. We propose a chaotic contrastive pre-training strategy, where pixel-wise chaotic maps, specifically Logistic, Tent, and Sine maps, act as non-linear data augmentation techniques. These chaotic perturbations, grounded in ergodic theory, force the network to learn topologically robust features by mimicking complex environmental noise and reflectance variations. Furthermore, we introduce an attention-based feature ensemble that fuses high-level semantic representations from a supervised large backbone with low-frequency structural features from a chaos-pretrained tiny encoder. Experimental results on six texture benchmarks (FMD, UMD, KTH-TIPS2-b, DTD, GTOS, and 1200Tex) demonstrate the superiority of the proposed method, outperforming state-of-the-art approaches and achieving promising accuracies on all the analyzed datasets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05012v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Joao B Florindo</dc:creator>
    </item>
    <item>
      <title>CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography</title>
      <link>https://arxiv.org/abs/2605.05014</link>
      <description>arXiv:2605.05014v1 Announce Type: new 
Abstract: Autonomous driving must operate across diverse surfaces to enable safe mobility. However, most driving datasets are captured on well-paved flat roads. Moreover, recent driving datasets primarily provide sparse LiDAR ground truth for images, which is insufficient for assessing fine-grained geometry in depth estimation and completion. To address these gaps, we introduce CARD, a multi-modal driving dataset that delivers quasi-dense 3D ground truth across continuous sequences rich in speed bumps, potholes, irregular surfaces and off-road segments. Our sensor suite includes synchronized global-shutter stereo cameras, front and rear LiDARs, 6-DoF poses from LiDAR-inertial odometry, per-wheel motion traces, and full calibration. Notably, our multi-LiDAR fusion yields ~500K valid depth pixels per frame, about 6.5x more than KITTI Depth Completion and 10x more on average than other public driving datasets. The dataset spans ~110 km and 4.7 hours across Germany and Italy. In addition, CARD provides 2D bounding boxes targeting road-topography irregularities, enabling accurate benchmarking for both geometry and perception tasks. Furthermore, we establish a standardized evaluation protocol for road surface irregularities on CARD and benchmark state-of-the-art depth estimation models to provide strong baselines. The CARD dataset is hosted on https://huggingface.co/CARD-Data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05014v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Gasser Elazab, Frank Neuhaus, Tilman Koß, Malte Splietker, Aditya Date, Michael Unterreiner, Maximilian Jansen, Olaf Hellwich</dc:creator>
    </item>
    <item>
      <title>Goedel Logics: On the Elimination of The Absoluteness Operator</title>
      <link>https://arxiv.org/abs/2605.05016</link>
      <description>arXiv:2605.05016v1 Announce Type: new 
Abstract: We investigate the eliminability of the absoluteness operator Delta in Goedel logics. While Delta is not definable from the standard connectives and disrupts important proof-theoretic properties, we show that it becomes eliminable at the propositional level under a restricted semantics in which all propositional atoms (except the truth constant 'True') are interpreted strictly below 1. Under this semantics, every formula containing Delta is equivalent to a disjunction of chain formulas, yielding a Delta-free normal form (standard and restricted semantics coincide w.r.t. valid formulas without Delta). We further analyze the situation in the first-order setting, where Delta-elimination fails in general due to recursion-theoretic and topological constraints, but can be recovered under witnessed semantics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05016v1</guid>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Matthias Baaz, Mariami Gamsakhurdia</dc:creator>
    </item>
    <item>
      <title>Position: Embodied AI Requires a Privacy-Utility Trade-off</title>
      <link>https://arxiv.org/abs/2605.05017</link>
      <description>arXiv:2605.05017v1 Announce Type: new 
Abstract: Embodied AI (EAI) systems are rapidly transitioning from simulations into real-world domestic and other sensitive environments. However, recent EAI solutions have largely demonstrated advancements within isolated stages such as instruction, perception, planning and interaction, without considering their coupled privacy implications in high-frequency deployments where privacy leakage is often irreversible. This position paper argues that optimizing these components independently creates a systemic privacy crisis when deployed in sensitive settings, thereby advancing the position that privacy in EAI is a life cycle-level architectural constraint rather than a stage-local feature. To address these challenges, we propose Secure Privacy Integration in Next-generation Embodied AI (SPINE), a unified privacy-aware framework that treats privacy as a dynamic control signal governing cross-stage coupling throughout the entire EAI life cycle. SPINE decomposes the EAI pipeline into various stages and establishes a multi-criterion privacy classification matrix to orchestrate contextual sensitivity across stage boundaries. We conduct preliminary simulation and real-world case studies to conceptually validate how privacy constraints propagate downstream to reshape system behavior, illustrating the insufficiency of fragmented privacy patches and motivating future research directions into secure yet functional embodied AI systems. We detail the SPINE framework and case studies at https://github.com/rminshen03/EAI_Privacy_Position.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05017v1</guid>
      <category>cs.AI</category>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiaoliang Fan, Jiarui Chen, Zhuodong Liu, Ziqi Yang, Peixuan Xu, Ruimin Shen, Junhui Liu, Jianzhong Qi, Cheng Wang</dc:creator>
    </item>
    <item>
      <title>Graph-SND: Sparse Aggregation for Behavioral Diversity in Multi-Agent Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2605.05020</link>
      <description>arXiv:2605.05020v1 Announce Type: new 
Abstract: System Neural Diversity (SND) measures behavioral heterogeneity in multi-agent reinforcement learning by averaging pairwise distances over all $\binom{n}{2}$ agent pairs, making each call quadratic in team size. We introduce Graph-SND, which replaces this complete-graph average with a weighted average over the edges of an arbitrary graph $G$. Three regimes follow: $G=K_n$ recovers SND exactly; a fixed sparse $G$ defines a localized diversity measure at $O(|E|)$ cost; and random edge samples yield an unbiased Horvitz-Thompson estimator and a normalized sample mean with $O(1/\sqrt{m})$ concentration in the sampled edge count $m$. For fixed sparse graphs we prove forwarding-index distortion bounds for expanders and a spectral refinement under low-rank distance structure; for random $d$-regular graphs we prove an unconditional probabilistic $\widetilde{\mathcal{O}}(D_{\max}/\sqrt{n})$ bound. On VMAS we verify recovery, unbiasedness, concentration, and wall-clock scaling, with a PettingZoo TVD panel checking non-Gaussian transfer. In a 500-iteration $n=100$ PPO run, Bernoulli-$0.1$ Graph-SND tracks full SND while reducing per-call metric time by about $10\times$, and frozen-policy GPU timing up to $n=500$ follows the predicted $\binom{n}{2}/|E|$ speedup. Random $d$-regular expanders empirically achieve $\mathrm{SND}_{G}^{\mathrm{u}}/\mathrm{SND} \in [0.9987, 1.0013]$ at $\Theta(n \log n)$ edges. In DiCo diversity control at $n=50$, Bernoulli-$0.1$ Graph-SND preserves set-point tracking with paired reward differences indistinguishable from zero across nine matched cells while cutting per-call metric cost by ${\sim}9.5\times$. Together, these results show that the SND aggregation bottleneck can be removed without changing the metric's semantics, yielding a drop-in sparse alternative that scales beyond complete-graph SND and supports both passive measurement and closed-loop diversity control.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05020v1</guid>
      <category>cs.LG</category>
      <category>cs.MA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shawn Ray</dc:creator>
    </item>
    <item>
      <title>CuBridge: An LLM-Based Framework for Understanding and Reconstructing High-Performance Attention Kernels</title>
      <link>https://arxiv.org/abs/2605.05023</link>
      <description>arXiv:2605.05023v1 Announce Type: new 
Abstract: Efficient CUDA implementations of attention mechanisms are critical to modern deep learning systems, yet supporting diverse and evolving attention variants remains challenging. Existing frameworks and compilers trade performance for flexibility, while expert-written kernels achieve high efficiency but are difficult to adapt. Recent work explores large language models (LLMs) for GPU kernel generation, but prior studies report unstable correctness and significant performance gaps for complex operators such as attention.
  We present CuBridge, an LLM-based framework that adapts expert-written attention kernels through a structured lift-transfer-lower workflow. CuBridge starts from expert-written CUDA attention kernels and lifts them into an executable intermediate representation that makes execution orchestration explicit while abstracting low-level CUDA syntax. Given a user-provided PyTorch specification, CuBridge generates and verifies a target IR program, then reconstructs optimized CUDA code via reference-guided lowering. Across diverse attention variants and GPU platforms, CuBridge consistently produces correct kernels and substantially outperforms general frameworks, compiler-based approaches, and prior LLM-based methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05023v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xing Ma, Yangjie Zhou, Wu Sun, Zihan Liu, Jingwen Leng, Yun Lin, Shixuan Sun, Minyi Guo, Jin Song Dong</dc:creator>
    </item>
    <item>
      <title>Detecting Hallucinations in Large Language Models via Internal Attention Divergence Signals</title>
      <link>https://arxiv.org/abs/2605.05025</link>
      <description>arXiv:2605.05025v1 Announce Type: new 
Abstract: We propose a lightweight and single-pass uncertainty quantification method for detecting hallucinations in Large Language Models. The method uses attention matrices to estimate uncertainty without requiring repeated sampling or external models. Specifically, we measure the Kullback-Leibler divergence between each attention head's distribution and a uniform reference distribution, and use these features in a logistic regression probe. Across multiple datasets, task types, and model families, attention divergence is highly predictive of answer correctness and performs competitively with existing uncertainty estimation methods. We find that this signal is concentrated in middle layers and on factual tokens such as named entities and numbers, suggesting that attention dynamics provide an efficient and interpretable white-box signal of model uncertainty.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05025v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Gijs van Dijk</dc:creator>
    </item>
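For the attention-divergence abstract above, the per-head feature can be sketched as follows. This assumes each head's attention over n key positions is a probability vector and that the reference is uniform (1/n per position); the function names and the one-row-per-head pooling are hypothetical, not the author's code. Against a uniform reference, the KL divergence reduces to log(n) minus the entropy of the head's distribution.

```python
import math


def kl_to_uniform(attn_row):
    """KL(p || U) for one head's attention distribution p over n key positions.

    With U uniform (1/n each), KL(p || U) = sum_i p_i * log(p_i * n) = log(n) - H(p).
    Zero-probability entries contribute nothing (the usual 0 * log 0 = 0 convention).
    """
    n = len(attn_row)
    return sum(p * math.log(p * n) for p in attn_row if p > 0)


def head_features(attn_rows_per_head):
    """One divergence feature per head; these would be the inputs to a probe.

    Hypothetical simplification: each head is summarized by a single query row.
    """
    return [kl_to_uniform(row) for row in attn_rows_per_head]


# A uniform head scores 0; a peaked head scores higher (log(n) when one-hot).
features = head_features([[0.25, 0.25, 0.25, 0.25], [0.97, 0.01, 0.01, 0.01]])
print([round(f, 3) for f in features])
```

In the described method, such features (one per head, across layers) are fed to a logistic regression probe that predicts answer correctness; the probe itself is not shown here.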
    <item>
      <title>Local Intrinsic Dimension Unveils Hallucinations in Diffusion Models</title>
      <link>https://arxiv.org/abs/2605.05026</link>
      <description>arXiv:2605.05026v1 Announce Type: new 
Abstract: Diffusion models are prone to generating structural hallucinations: samples that match the statistical properties of the training data yet defy underlying structural rules, resulting in anomalies like hands with more than five fingers. Recent research has studied this failure mode from several viewpoints, offering partial explanations for its occurrence, such as mode interpolation. In this work, we propose a complementary perspective that treats hallucinations as instabilities on the model-induced manifold. We begin by showing that a hallucination filter based on such instabilities matches or exceeds the performance of the recently proposed temporal filter. By tracing the source of these instabilities, we identify local intrinsic dimension (LID) as their primary driver and propose Intrinsic Quenching (IQ), a direct corrective mechanism that deflates LID to alleviate hallucinations. IQ consistently outperforms standard hallucination reduction baselines across a wide array of benchmarks and offers a highly promising solution for enforcing anatomical consistency in downstream medical imaging tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05026v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bartlomiej Sobieski, Matthew Tivnan, Dawid Płudowski, Michał Jan Włodarczyk, Pengfei Jin, Przemyslaw Biecek, Quanzheng Li</dc:creator>
    </item>
    <item>
      <title>Prompt-Anchored Vision-Text Distillation for Lifelong Person Re-identification</title>
      <link>https://arxiv.org/abs/2605.05027</link>
      <description>arXiv:2605.05027v1 Announce Type: new 
Abstract: Lifelong person re-identification (LReID) aims to train a generalizable model with sequentially collected data. However, such models often suffer from semantic drift, limited adaptability, and catastrophic forgetting as new domains emerge. Existing exemplar-free approaches largely rely on visual-only distillation or parameter regularization, while overlooking the potential of auxiliary modalities, such as text, to preserve semantic stability and enable incremental plasticity. We observe that the frozen text encoder in pretrained vision-language models can serve as a stable semantic anchor across domains. To decouple the roles of vision and text, we propose Prompt-Anchored vision-text Distillation (PAD), an asymmetric vision-text framework for semantic alignment and cross-domain generalization. On the textual side, we distill prompts to preserve vision-text alignment under a fixed semantic space, acting as a global semantic reference rather than a dominant learning signal. On the visual side, an EMA-based teacher with an adaptive prompt pool enables domain-wise adaptation by allocating new slots while freezing past ones. Extensive experiments show that PAD substantially outperforms state-of-the-art methods across seen and unseen domains, achieving a strong balance between stability and plasticity. The project page is available at https://github.com/zu-zi/PAD.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05027v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wen Wen, Hao Chen, Shiliang Zhang</dc:creator>
    </item>
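The visual side of the PAD abstract above combines two generic ingredients: an exponential-moving-average (EMA) teacher and a prompt pool that allocates a new slot per domain while freezing earlier ones. The sketch below shows only those two mechanisms under stated assumptions: the `PromptPool` class, its zero initialization, and the momentum value are hypothetical illustrations, not the authors' implementation.

```python
class PromptPool:
    """Hypothetical prompt pool: one prompt slot per domain, earlier slots frozen."""

    def __init__(self, dim):
        self.dim = dim
        self.slots = []   # one prompt vector per domain seen so far
        self.frozen = []  # parallel flags; True means the slot is no longer trained

    def start_domain(self):
        """Freeze every existing slot and allocate a fresh trainable one."""
        self.frozen = [True] * len(self.slots)
        self.slots.append([0.0] * self.dim)  # zero init is purely illustrative
        self.frozen.append(False)
        return len(self.slots) - 1

    def trainable_indices(self):
        return [i for i, is_frozen in enumerate(self.frozen) if not is_frozen]


def ema_update(teacher, student, momentum=0.999):
    """Elementwise EMA: teacher = momentum * teacher + (1 - momentum) * student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


pool = PromptPool(dim=4)
pool.start_domain()  # domain 1 -> slot 0
pool.start_domain()  # domain 2 -> slot 1, slot 0 now frozen
print(pool.trainable_indices())
print(ema_update([1.0, 2.0], [0.0, 0.0], momentum=0.9))
```

A high momentum makes the teacher change slowly, which is the usual reason an EMA teacher is used as a stable distillation target.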
    <item>
      <title>The Predictive-Causal Gap: An Impossibility Theorem and Large-Scale Neural Evidence</title>
      <link>https://arxiv.org/abs/2605.05029</link>
      <description>arXiv:2605.05029v1 Announce Type: new 
Abstract: We report a systematic failure mode in predictive representation learning. Across 2695 neural network configurations trained to predict linear-Gaussian dynamics, the optimal encoder tracks the environment rather than the system it is meant to model. The mean causal fidelity -- the fraction of encoder sensitivity allocated to system degrees of freedom -- is 0.49, and only 2.5% of configurations exceed 0.70. The failure intensifies with dimension: at N=100, the optimal encoder becomes causally blind (fidelity ~10^{-8}) while achieving 92% lower prediction error than the causal representation. We prove this is not an optimization artifact but a structural property of the predictive objective: when environment modes are slower or less noisy than system modes, every minimizer of the population risk encodes the former. The set of dynamics exhibiting this predictive-causal gap is open and of positive measure in parameter space. In a nonlinear Duffing-GRU sweep, unconstrained predictors learn environment-dominant representations in 55% of tasks (95% CI 41--68%) versus 24% under operational grounding (p=2.3e-3); the median out-of-distribution MSE inflation under environment shift is 1.82x versus 1.00x. Operational grounding -- restricting the loss to system observables -- partially suppresses the gap, but causal fidelity is never recovered without an explicit system-environment boundary. The results identify the predictive-causal gap as a structural limit of learning, with implications for self-supervised representation learning, world models, and the scaling paradigm.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05029v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kejun Liu</dc:creator>
    </item>
    <item>
      <title>Computer-Aided Design Generation by Cascaded Discrete Diffusion Model</title>
      <link>https://arxiv.org/abs/2605.05031</link>
      <description>arXiv:2605.05031v1 Announce Type: new 
Abstract: Recent deep learning approaches seek to automate CAD creation by representing a model as a sequence of discrete commands and parameters, and then generating them using autoregressive models or continuous diffusion operating in Euclidean embedding space. However, continuous diffusion perturbs representations in a continuous Euclidean domain that does not reflect the inherently discrete and heterogeneous nature of CAD tokens, often producing perturbed representations that map to semantically invalid symbols. To overcome this limitation, we propose a cascaded discrete diffusion framework for CAD generation, which consists of a command diffusion for generating CAD commands and a parameter diffusion conditioned on CAD commands. Unlike isotropic Gaussian perturbation, the forward process of our approach operates directly over categorical token distributions using carefully designed transition matrices. For commands, we adopt an absorbing-state transition matrix that progressively corrupts tokens to a designated symbol; for parameters, we introduce specific transition matrices tailored to heterogeneous attributes: a Gaussian kernel for coordinate continuity, a scale-invariant kernel for dimensional values, and a prior-preserving kernel for boolean attributes. The reverse process is achieved by two denoising networks: a Transformer-based encoder for command recovery, and a parameter network with extra local self-attention for command-level interaction and cross-attention for conditional injection. Experiments on the DeepCAD dataset show that the proposed approach surpasses existing autoregressive and continuous diffusion models on unconditional generation metrics, while qualitative results validate effective controllability in conditional generation tasks. Source code will be released.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05031v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Honghu Pan, Xiaoling Luo, Yongyong Chen, Zhenyu He, Pengyang Wang</dc:creator>
    </item>
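The absorbing-state forward process described for commands in the CAD abstract above can be sketched generically: at each step a token either stays as it is or jumps to a designated MASK symbol, which it never leaves. This is a minimal sketch under assumptions: the linear schedule, the MASK string, and the command names in the example are illustrative, and the parameter kernels (Gaussian, scale-invariant, prior-preserving) are not modeled.

```python
import random

MASK = "[MASK]"  # the designated absorbing symbol


def absorbing_transition_matrix(num_tokens, beta):
    """One-step matrix Q_t over num_tokens real tokens plus a final MASK index.

    Row i is the distribution of x_t given x_{t-1} = i: stay with prob 1 - beta,
    jump to MASK with prob beta. The MASK row is absorbing (MASK stays MASK).
    """
    size = num_tokens + 1
    mask = num_tokens
    q = [[0.0] * size for _ in range(size)]
    for i in range(num_tokens):
        q[i][i] = 1.0 - beta
        q[i][mask] = beta
    q[mask][mask] = 1.0
    return q


def corrupt(tokens, alpha_bar, rng):
    """Sample q(x_t | x_0): keep each token with prob alpha_bar, else MASK.

    alpha_bar is the cumulative product of (1 - beta_s) over steps 1..t.
    """
    return [tok if alpha_bar > rng.random() else MASK for tok in tokens]


def linear_alpha_bar(t, num_steps):
    """Hypothetical linear schedule: alpha_bar goes from 1 at t = 0 to 0 at t = T."""
    return 1.0 - t / num_steps


commands = ["SOL", "Line", "Line", "Arc", "Extrude", "EOS"]  # illustrative sequence
rng = random.Random(0)
for t in (0, 5, 10):
    print(t, corrupt(commands, linear_alpha_bar(t, 10), rng))
```

At t = 0 the sequence is untouched and at t = T every token is masked; the reverse (denoising) network is trained to undo this corruption.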
    <item>
      <title>Quantized Probabilistic AI for Gear Fault Diagnosis in Motor Drives</title>
      <link>https://arxiv.org/abs/2605.05032</link>
      <description>arXiv:2605.05032v1 Announce Type: new 
Abstract: Deploying large artificial intelligence (AI) models in power electronics often demands high computational resources. Driven by the quantization paradigm, this digest proposes a quantization-aware training (QAT) principle that substantially reduces the number of bits required while maximizing the accuracy of computations in pre-trained AI models. Taking a pre-trained probabilistic Bayesian Neural Network (BNN) for gear fault diagnosis in motor drives as an example, we quantize its weights and activations from floating-point FP32 to low-precision INT8 values, which improves computational efficiency by 30-45% (depending on the model version) without compromising accuracy or uncertainty estimates. This substantiates a sustainable mechanism for deploying quantized lightweight AI models on low-cost edge processors for power electronic applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05032v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Subham Sahoo, Huai Wang, Frede Blaabjerg</dc:creator>
    </item>
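The FP32-to-INT8 step in the gear-fault-diagnosis abstract above can be illustrated with symmetric per-tensor fake quantization, the quantize-then-dequantize operation commonly inserted into the forward pass during QAT. The per-tensor symmetric scheme and the example weights are assumptions; the digest does not state which scheme it uses. Activations would be handled the same way with their own scale.

```python
def symmetric_scale(values, num_bits=8):
    """Per-tensor symmetric scale mapping max |v| onto the largest signed code."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    return (max_abs / qmax if max_abs > 0 else 1.0), qmax


def quantize(values, scale, qmax):
    """Round to the nearest integer code and clamp to [-qmax, qmax]."""
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(codes, scale):
    return [c * scale for c in codes]


def fake_quantize(values, num_bits=8):
    """Quantize-then-dequantize, as inserted in the forward pass during QAT.

    In QAT the rounding is usually treated as identity in the backward pass
    (straight-through estimator); gradients are not modeled here.
    """
    scale, qmax = symmetric_scale(values, num_bits)
    return dequantize(quantize(values, scale, qmax), scale)


weights = [0.82, -1.27, 0.004, 0.5, -0.33]  # hypothetical FP32 weights
scale, qmax = symmetric_scale(weights)
print(quantize(weights, scale, qmax))
print(fake_quantize(weights))
```

Each dequantized value differs from the original by at most half a quantization step (scale / 2), which is the rounding error QAT teaches the network to tolerate.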
    <item>
      <title>Few-Shot Learning Pipeline for Monkeypox Skin Disease Classification Using CNN Feature Extractors</title>
      <link>https://arxiv.org/abs/2605.05034</link>
      <description>arXiv:2605.05034v1 Announce Type: new 
Abstract: Despite the strong performance of Convolutional Neural Networks (CNNs) in disease classification, their effectiveness often depends on access to large annotated datasets, which is an impractical requirement for emerging or rare conditions such as Monkeypox. To overcome this limitation, we propose a few-shot learning (FSL) framework that employs SimpleShot, a lightweight, non-parametric, inductive classifier, for Monkeypox and pox-like skin disease recognition from limited labeled examples. The proposed pipeline passes the skin lesion images through a frozen, pretrained CNN backbone to obtain feature embeddings, which are then classified via SimpleShot using nearest-centroid comparisons in a normalized embedding space. We systematically benchmark six widely used CNN backbones as feature extractors under consistent experimental settings, enabling fair comparison. Experiments on three publicly available datasets (MSLD v1.0, MSID, and MSLD v2.0) are conducted across 2-way, 4-way, and 6-way tasks with 1-shot, 5-shot, and 10-shot configurations. Among all models, MobileNetV2_100 consistently achieves the highest accuracy. In addition, we present a cross-dataset evaluation for Monkeypox classification, revealing that binary Mpox-vs-Others transfer remains comparatively stable while multi-class performance degrades significantly under domain shift. Together, these results demonstrate the practical utility of combining inductive FSL methods with lightweight CNN backbones and highlight the importance of domain robustness for reliable real-world clinical deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05034v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Md. Safirur Rashid, Sabbir Ahmed, Muhammad Usama Islam, Sumona Hoque Mumu, Md. Hasanul Kabir</dc:creator>
    </item>
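The classification step of the Monkeypox few-shot pipeline above, nearest-centroid comparison in a normalized embedding space, can be sketched as follows. This is a minimal illustration, not the authors' code: the toy 2-D embeddings stand in for features from a frozen CNN backbone, and the optional centering follows the CL2N variant of SimpleShot (subtract a mean feature, then L2-normalize), which may or may not match the paper's exact configuration.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # leave an all-zero vector as is
    return [x / norm for x in v]


def simpleshot_predict(support, query, mean=None):
    """Nearest-centroid classification in a normalized embedding space.

    support: dict mapping each class label to a list of embeddings from the
    frozen backbone. mean: optional embedding mean subtracted before
    normalization (centering, as in SimpleShot's CL2N variant).
    """
    def transform(v):
        if mean is not None:
            v = [x - m for x, m in zip(v, mean)]
        return l2_normalize(v)

    centroids = {}
    for label, embeddings in support.items():
        normed = [transform(e) for e in embeddings]
        centroids[label] = [sum(col) / len(normed) for col in zip(*normed)]

    q = transform(query)

    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(q, centroids[label]))

    return min(centroids, key=sq_dist)


# Hypothetical 2-way 2-shot episode with toy 2-D "backbone" embeddings.
support = {
    "mpox": [[1.0, 0.1], [0.9, 0.0]],
    "other": [[0.0, 1.0], [0.1, 0.9]],
}
print(simpleshot_predict(support, [0.8, 0.2]))
```

Because nothing is trained at test time, swapping the backbone only changes the embeddings, which is what makes the benchmark of six CNN feature extractors straightforward.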
    <item>
      <title>Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization</title>
      <link>https://arxiv.org/abs/2605.05040</link>
      <description>arXiv:2605.05040v1 Announce Type: new 
Abstract: On-policy distillation is an efficient alternative to reinforcement learning, offering dense token-level training signals. However, its reliance on a stronger external teacher has driven recent work on on-policy self-distillation, where the same model serves as both teacher and student under different prompt contexts. Yet, existing self-distillation methods largely reduce learning to KL matching toward the context-augmented teacher model. This approach often suffers from training instability and can degrade reasoning performance over time. Moreover, self-distillation from the same model with prompt augmentation lacks the exploratory diversity provided by a genuine external teacher. To address these limitations, we move beyond fixed-teacher KL matching and propose Preference-Based Self-Distillation (PBSD), which revisits on-policy self-distillation through a reward-regularized perspective. Instead of directly matching the teacher distribution, we derive a reward-regularized objective whose analytic optimum is a reward-reweighted teacher distribution, yielding a target policy provably superior to the original teacher under this objective. Practically, PBSD optimizes preference gaps between teacher and student samples while maintaining on-policy student sampling. We support this framework with a statistical analysis of the induced preference-learning problem, formally establishing when on-policy self-distillation is preferable to learning from an external teacher in our setting. Experiments on mathematical reasoning and tool-use benchmarks across multiple model scales demonstrate that PBSD consistently achieves the strongest average performance among comparable baselines, showing improved training stability over prior self-distillation baselines while preserving token efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05040v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xin Yu, Liuchen Liao, Yiwen Zhang, Yingchen Yu, Lingzhou Xue, Qinzhen Guo</dc:creator>
    </item>
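The PBSD abstract above does not spell out its objective, but a standard reward-regularized form with exactly this kind of analytic optimum is J(π) = E_π[r] − β·KL(π ‖ π_T), maximized by the reward-reweighted teacher π*(y) ∝ π_T(y)·exp(r(y)/β). The sketch below assumes that form over a small finite set of candidate responses with made-up probabilities and rewards. It illustrates only the "reweighted teacher is never worse than the teacher under this objective" claim; PBSD itself optimizes preference gaps and is not reproduced here.

```python
import math


def reweighted_teacher(teacher_probs, rewards, beta):
    """pi*(y) proportional to pi_T(y) * exp(r(y) / beta), computed stably in log space."""
    logits = [math.log(p) + r / beta for p, r in zip(teacher_probs, rewards)]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def regularized_objective(policy, teacher_probs, rewards, beta):
    """J(pi) = E_pi[r] - beta * KL(pi || pi_T) over a finite set of candidates."""
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    kl = sum(p * math.log(p / q) for p, q in zip(policy, teacher_probs) if p > 0)
    return expected_reward - beta * kl


teacher = [0.5, 0.3, 0.2]  # hypothetical teacher probabilities over 3 responses
rewards = [0.0, 1.0, 0.5]  # hypothetical rewards
beta = 0.5
target = reweighted_teacher(teacher, rewards, beta)
print([round(p, 3) for p in target])
print(regularized_objective(target, teacher, rewards, beta)
      >= regularized_objective(teacher, teacher, rewards, beta))
```

Under this assumed objective the optimum attains J(π*) = β·log Σ_y π_T(y)·exp(r(y)/β), which by Jensen's inequality is at least E_{π_T}[r] = J(π_T).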
    <item>
      <title>Finding accurate eigenvalues and eigenvectors of positive semi-definite matrices given a subspace</title>
      <link>https://arxiv.org/abs/2605.05043</link>
      <description>arXiv:2605.05043v1 Announce Type: new 
Abstract: We revisit a classical problem in numerical linear algebra: given a $k$-dimensional subspace $\mathcal{Q}$ that approximates the leading eigenspace of an $n\times n$ positive semi-definite matrix $A$, the goal is to extract high-accuracy eigenvalues. The Rayleigh-Ritz (RR) method is the standard algorithm for this task and has been shown to be optimal in several senses for symmetric $A$ that is not necessarily positive semi-definite. In this paper, we show that when $A \succeq 0$, alternative methods can outperform RR at the same computational complexity: the main cost is computing $AQ$, plus an $O(nk^2)$ term. In particular, we advocate the use of Nyström's method, showing that its approximate eigenvalues are always more accurate than those of RR, and that the improvement can be arbitrarily large. The difference is especially significant when $A$ has a fast-decaying spectrum. A similar improvement is observed numerically when approximating the leading eigenvectors. In contrast, when the target eigenvalues are the trailing ones, the situation is reversed and the Nyström method performs poorly; we suggest a remedy for this situation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05043v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuji Nakatsukasa, Zheng Tang</dc:creator>
    </item>
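The eigenvalue comparison in the Nyström abstract above can be seen in its simplest instance, a one-dimensional subspace spanned by a unit vector q. This special case (k = 1) and the diagonal test matrix are illustrative choices, not taken from the paper. RR returns qᵀAq, while the rank-1 Nyström approximation (Aq)(Aq)ᵀ/(qᵀAq) has the single nonzero eigenvalue ‖Aq‖²/(qᵀAq). For PSD A, Cauchy–Schwarz gives (qᵀAq)² ≤ ‖Aq‖², so Nyström is never below RR, and the Nyström value is a Rayleigh quotient of A at A^{1/2}q, so it never exceeds λ_max. Both estimates need only the single product Aq.

```python
import math


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def rayleigh_ritz_1d(a, q):
    """RR estimate for a unit vector q: the Rayleigh quotient q^T A q."""
    return dot(q, matvec(a, q))


def nystrom_1d(a, q):
    """Nyström estimate for a unit vector q.

    The rank-1 Nyström approximation (A q)(A q)^T / (q^T A q) has the single
    nonzero eigenvalue ||A q||^2 / (q^T A q).
    """
    y = matvec(a, q)  # the one product with A that both estimates need
    return dot(y, y) / dot(q, y)


# PSD test matrix with a fast-decaying spectrum; its largest eigenvalue is 10.
A = [[10.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.1]]
q = [1 / math.sqrt(3)] * 3  # a deliberately poor approximation of the top eigenvector
print(rayleigh_ritz_1d(A, q), nystrom_1d(A, q))
```

Here RR gives 3.7 while Nyström gives 101.01 / 11.1 = 9.1, much closer to the true value 10, consistent with the abstract's point that the gain is largest for fast-decaying spectra.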
    <item>
      <title>Efficient Cost-Based Rewrite in a Bottom-Up Optimizer</title>
      <link>https://arxiv.org/abs/2605.05044</link>
      <description>arXiv:2605.05044v1 Announce Type: new 
Abstract: The query optimizer in a Database Management System (DBMS) translates declarative queries into efficient execution plans. Conventional bottom-up optimization consists of two main stages: Query Rewrite (QRW) and Cost-Based Optimization (CBO). However, applying a rewrite rule during QRW may not always be beneficial; the best choice may depend on the (estimated) execution cost of the original and rewritten expressions. Fully exploiting such cost-dependent rules necessitates interleaving QRW with frequent CBO invocations, thereby incurring substantial overhead and often impractical optimization times. To mitigate this inefficiency, we introduce a novel cost-based rewrite framework for bottom-up optimizers. The core of our approach is a multi-level caching mechanism for intermediate CBO results aimed at eliminating redundant computation. Furthermore, we establish and exploit upper cost bounds to intelligently prune the search space during optimization. We also contribute methodological solutions for caching and reusing intermediate plan results within a bottom-up optimizer architecture. The framework has been implemented in the GaussDB optimizer. Experiments show that it significantly reduces overall optimization time, demonstrating the effectiveness of our approach.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05044v1</guid>
      <category>cs.DB</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Qi Cheng, Yang Sun, Weidong Yu, Danny Chen, Weicheng Wang, Chong Chen, Per-Ake Larson</dc:creator>
    </item>
    <item>
      <title>When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and Noise</title>
      <link>https://arxiv.org/abs/2605.05045</link>
      <description>arXiv:2605.05045v1 Announce Type: new 
Abstract: Vision-language models (VLMs) achieve strong multimodal performance but remain prone to relation hallucination, i.e., errors on tasks that require accurate reasoning over inter-object interactions. We study the impact of visual perturbations, specifically rotation and noise, and show that even mild distortions significantly degrade relational reasoning across models and datasets. We further evaluate prompt-based augmentation and preprocessing strategies (orientation correction and denoising), finding that while they offer partial improvements, they do not fully resolve hallucinations. Our results reveal a gap between perceptual robustness and relational understanding, highlighting the need for more robust, geometry-aware VLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05045v1</guid>
      <category>cs.CV</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Philip Wootaek Shin, Ajay Narayanan Sridhar, Sivani Devarapalli, Rui Zhang, Jack Sampson, Vijaykrishnan Narayanan</dc:creator>
    </item>
    <item>
      <title>Sampling Simultaneous Edge-Colorings</title>
      <link>https://arxiv.org/abs/2605.05046</link>
      <description>arXiv:2605.05046v1 Announce Type: new 
Abstract: We study the sampling problem for simultaneous edge colorings. Given a pair of graphs $G_1=(V,E_1)$ and $G_2=(V,E_2)$ on the same vertex set $V$, a simultaneous edge coloring is an edge coloring of $G_1\cup G_2$ such that each of the individual graphs is properly colored. When each of $G_1$ and $G_2$ has maximum degree $\Delta$, it is conjectured that $\Delta+2$ colors suffice, and recent work establishes the conjecture asymptotically.
  We study Markov chains for randomly sampling from the uniform distribution over simultaneous edge colorings with $k$ colors. Straightforward applications of Jerrum's classical coupling argument establish rapid mixing of the Glauber dynamics on the corresponding line graph when $k&gt;8\Delta$. We present a simple weighted Hamming distance for which Jerrum's coupling yields optimal mixing time (up to constant factors) of $O(m\log{n})$ when $k&gt;(6+\delta)\Delta$ for any fixed $\delta&gt;0$. Moreover, with our new metric, we obtain $O(m\log{n})$ mixing of the flip dynamics under a local choice of flip parameters that flips only bounded-size components, when $k\geq 5.95\Delta$. The proof adapts previous coupling analyses for the flip dynamics to the setting of simultaneous edge colorings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05046v1</guid>
      <category>cs.DM</category>
      <category>math.PR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ezra Furtado-Tiwari, Eric Vigoda</dc:creator>
    </item>
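The Glauber dynamics mentioned in the simultaneous edge-coloring abstract above can be sketched in its heat-bath form: pick a uniformly random edge of G1 ∪ G2 and recolor it with a uniformly random color not used by any conflicting edge, where two edges conflict if they share an endpoint and both lie in E1, or both lie in E2. This is a minimal illustration of the chain only; the coupling, the weighted Hamming metric, and the flip dynamics from the paper are not shown, and the example graphs are hypothetical.

```python
import random


def conflict_graph(edges1, edges2):
    """For each edge of G1 ∪ G2, the set of edges it must not share a color with.

    Two edges conflict if they share an endpoint and both lie in E1, or both lie
    in E2. This is the line-graph structure on which the Glauber dynamics runs.
    """
    nbrs = {e: set() for e in edges1 | edges2}
    for layer in (edges1, edges2):
        for e in layer:
            for f in layer:
                if e != f and not set(e).isdisjoint(f):
                    nbrs[e].add(f)
    return nbrs


def greedy_coloring(nbrs, k):
    """A proper starting coloring (exists whenever k exceeds the max conflict degree)."""
    coloring = {}
    for e in sorted(nbrs):
        used = {coloring[f] for f in nbrs[e] if f in coloring}
        coloring[e] = next(c for c in range(k) if c not in used)
    return coloring


def glauber_step(coloring, nbrs, edge_list, k, rng):
    """Heat-bath update: recolor a uniform random edge with a uniform allowed color."""
    e = rng.choice(edge_list)
    blocked = {coloring[f] for f in nbrs[e]}
    coloring[e] = rng.choice([c for c in range(k) if c not in blocked])


def is_proper(coloring, nbrs):
    return all(coloring[e] != coloring[f] for e in nbrs for f in nbrs[e])


# Hypothetical example on vertices 0..3; edge (0, 1) lies in both graphs.
E1 = {(0, 1), (1, 2), (2, 3)}
E2 = {(0, 1), (0, 2), (1, 3)}
k = 17  # satisfies k > 8 * Delta with Delta = 2
nbrs = conflict_graph(E1, E2)
edge_list = sorted(nbrs)
coloring = greedy_coloring(nbrs, k)
rng = random.Random(42)
for _ in range(1000):
    glauber_step(coloring, nbrs, edge_list, k, rng)
print(is_proper(coloring, nbrs), coloring)
```

Each step keeps the coloring proper because the new color is drawn only from colors absent on all conflicting edges; each conflict set has at most 4Δ − 4 edges, so the allowed list is never empty for k this large.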
    <item>
      <title>Local Homophily on Bicolored Graphs is $\mathbf{P}$-complete</title>
      <link>https://arxiv.org/abs/2605.05047</link>
      <description>arXiv:2605.05047v1 Announce Type: new 
Abstract: We propose a local transformation on bicolored graphs, which we call local homophily, inspired by adaptive networks and based on majority dynamics and homophily. In this transformation, a vertex updates its color to match the majority of its neighbors, while neighbors of the same color become connected and neighbors of the opposite color become disconnected.
  We show how to simulate Boolean circuits using local homophily and establish that determining whether a given pair of vertices becomes connected under iterative applications of local homophily is $\mathbf{P}$-complete under logspace reductions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05047v1</guid>
      <category>cs.CC</category>
      <category>cs.DM</category>
      <category>math.CO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pablo Concha-Vega</dc:creator>
    </item>
    <item>
      <title>Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism</title>
      <link>https://arxiv.org/abs/2605.05049</link>
      <description>arXiv:2605.05049v1 Announce Type: new 
Abstract: Frontier models increasingly adopt Mixture-of-Experts (MoE) architectures to achieve large-model performance at reduced cost. However, training MoE models on HPC platforms is hindered by large memory footprints, frequent large-scale communication across heterogeneous networks, and severe workload imbalance. To characterize these challenges, we develop a mathematical model that quantifies memory, compute, and communication requirements for MoE configurations under various parallelization schemes, verified through micro-benchmarking, code instrumentation, and hardware profiling. Our analysis identifies performance bottlenecks: all-to-all latency at scale from expert parallelism, insufficient compute-communication overlap, low GPU utilization from imbalanced skinny GEMMs, and the absence of platform-aware hybrid parallelization strategies. To address these, we introduce Piper, a framework that leverages resource modeling to identify efficient training strategies for MoE models on target HPC platforms, applying pipeline parallelism with optimized schedules. Piper achieves 2-3.5X higher MFU than state-of-the-art frameworks such as X-MoE, and a novel all-to-all algorithm delivers 1.2-9X higher bandwidth than the vendor implementation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05049v1</guid>
      <category>cs.DC</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sajal Dash, Feiyi Wang</dc:creator>
    </item>
    <item>
      <title>Kinematic Discriminants of Deceleration Behavior Modes in Car-Following: Evidence from NGSIM Trajectory Data</title>
      <link>https://arxiv.org/abs/2605.05050</link>
      <description>arXiv:2605.05050v1 Announce Type: new 
Abstract: Gap-closing rate and visual looming swap discriminative dominance depending on deceleration intensity, a finding that reconciles a long-standing conflict in the car-following literature and challenges spacing-centered assumptions in traditional driver behavior models. This study presents a two-stage analytical framework that distinguishes between information availability (kinematic variables measurable in the environment) and information utilization (variables that demonstrably separate driver behavioral patterns), applied to 1,060,119 valid car-following observations from the NGSIM trajectory dataset (2,932 vehicles). Six kinematic features are extracted, and deceleration events are detected under two threshold conditions (-0.5 m/s^2 and -0.3 m/s^2). K-means clustering identifies behavioral modes, and one-way ANOVA with eta-squared effect sizes ranks each feature's discriminative power. Three key findings emerge: (1) threshold selection fundamentally shapes behavioral inference: the stricter threshold yields three interpretable modes while the permissive threshold collapses these to two; (2) hard braking prioritizes gap-closing rate (eta^2 = 0.715) while moderate braking emphasizes visual looming (eta^2 = 0.574); and (3) spacing headway is negligible (eta^2 &lt;= 0.014) across both thresholds. These findings provide empirically grounded candidates for perceptual cue prioritization and have direct implications for ADAS warning system design and autonomous vehicle control.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05050v1</guid>
      <category>eess.SY</category>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Eni Solomon Laughter</dc:creator>
    </item>
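The eta-squared ranking in the car-following abstract above is the one-way ANOVA effect size η² = SS_between / SS_total: the share of a feature's total variance explained by cluster membership. The sketch below computes it for one feature across clusters; the cluster values are made-up numbers for illustration, not NGSIM data.

```python
def eta_squared(groups):
    """One-way ANOVA effect size: eta^2 = SS_between / SS_total.

    groups: one list of feature values per cluster (e.g. per behavior mode).
    """
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total


# Hypothetical gap-closing rates (m/s) for events in three clusters.
clusters = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(round(eta_squared(clusters), 3))
```

Here SS_total = 60 and SS_between = 54, so η² = 0.9; a feature whose cluster means coincide, as reported for spacing headway, scores near 0.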
    <item>
      <title>Reduced-order Neural Modeling with Differentiable Simulation for High-Detail Tactile Perception</title>
      <link>https://arxiv.org/abs/2605.05053</link>
      <description>arXiv:2605.05053v1 Announce Type: new 
Abstract: Tactile perception is key to dexterous manipulation, yet simulating high-resolution elastomer deformation remains computationally prohibitive. Finite element methods (FEM) deliver high fidelity but demand costly remeshing, while Material Point Methods (MPM) suffer from heavy particle-memory tradeoffs. We propose a reduced-order neural simulation framework that couples coarse-grained MPM dynamics with an implicit neural decoder to reconstruct sub-particle tactile details from compact latent states. The framework learns a continuous deformation manifold from paired high- and low-resolution simulations, enabling physically consistent, differentiable inference. Compared to TacIPC, our method achieves over 65% faster simulation and 40% lower memory usage while maintaining better geometric fidelity. In tactile rendering and 3D surface reconstruction, our method further improves accuracy by 25% and produces realistic depth images and surface meshes at faster inference speed. These results demonstrate that the proposed reduced-order neural model enables high-detail, physically grounded tactile simulation with substantial efficiency gains for robotic interaction and optimization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05053v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuhu Guo, Zhikai Shen, Jiasheng Qu, Chenghao Qian, Yuming Huang, Bin Chen, Guoxing Fang</dc:creator>
    </item>
    <item>
      <title>Direct Product Flow Matching: Decoupling Radial and Angular Dynamics for Few-Shot Adaptation</title>
      <link>https://arxiv.org/abs/2605.05054</link>
      <description>arXiv:2605.05054v1 Announce Type: new 
Abstract: Recent flow matching (FM) methods improve the few-shot adaptation of vision-language models by modeling cross-modal alignment as a continuous multi-step flow. In this paper, we argue that existing FM methods are inherently constrained by incompatible geometric priors on pre-trained cross-modal features, resulting in suboptimal adaptation performance. We first analyze these methods from a polar decomposition perspective (i.e., radial and angular sub-manifolds). Under this new geometric view, we identify three overlooked limitations in them: 1) Angular dynamics distortion: The radial-angular coupling induces non-uniform speed on the angular sub-manifold, leading to regression training difficulty and extra truncation errors. 2) Radial dynamics neglect: Feature normalization discards modality confidence, failing to distinguish out-of-distribution from in-distribution data, and abandoning crucial radial dynamics. 3) Context-agnostic unconditional flow: Dataset-specific information loss during pre-trained cross-modal feature extraction remains unrecovered. To resolve these issues, we propose warped product flow matching (WP-FM), a unified Riemannian framework that reformulates alignment on a warped product manifold. Within this framework, we derive direct product flow matching (DP-FM) by introducing a constant-warping metric, which yields a decoupled cylindrical manifold (i.e., direct product manifold). DP-FM enables independent radial evolution and constant-speed angular geodesic transport, effectively eliminating angular dynamics distortion while preserving radial consistency. Meanwhile, we incorporate classifier-free guidance by conditioning the flow on the pre-trained VLMs' hidden states to inject missing dataset-specific information. Extensive results across 11 benchmarks demonstrate that DP-FM achieves a new state-of-the-art for multi-step few-shot adaptation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05054v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hongxu Chen, Yanghao Wang, Bowei Zhu, Hongxiang Li, Zhen Wang, Ziqi Jiang, Lin Li, Rui Liu, Long Chen</dc:creator>
    </item>
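The decoupled geometry in the DP-FM abstract above (independent radial evolution plus constant-speed angular geodesic transport) can be sketched as an interpolation path that moves the radius and the direction separately. The straight-line radial path is an assumption, since the abstract only says radial evolution is independent; the function names and example vectors are hypothetical, and the flow network, guidance, and training loss are not shown.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def slerp(u0, u1, t):
    """Constant-speed great-circle path between unit vectors (assumes not antipodal)."""
    cos_theta = max(-1.0, min(1.0, sum(a * b for a, b in zip(u0, u1))))
    theta = math.acos(cos_theta)
    if math.isclose(theta, 0.0, abs_tol=1e-9):
        return list(u0)
    w0 = math.sin((1 - t) * theta) / math.sin(theta)
    w1 = math.sin(t * theta) / math.sin(theta)
    return [w0 * a + w1 * b for a, b in zip(u0, u1)]


def product_path(x0, x1, t):
    """Point at time t on a decoupled (radius, direction) path from x0 to x1.

    The radius moves linearly; the direction moves along the sphere geodesic at
    constant angular speed. Neither coordinate depends on the other.
    """
    r0, r1 = norm(x0), norm(x1)
    u0 = [x / r0 for x in x0]
    u1 = [x / r1 for x in x1]
    r_t = (1 - t) * r0 + t * r1
    return [r_t * d for d in slerp(u0, u1, t)]


x0, x1 = [2.0, 0.0], [0.0, 4.0]  # hypothetical feature pair, 90 degrees apart
mid = product_path(x0, x1, 0.5)
print(mid, norm(mid))
```

At t = 0.5 the point has radius 3 (the average of 2 and 4) and sits at 45 degrees, so the angle advances uniformly in t regardless of how the radius changes, which is the decoupling the abstract contrasts with coupled radial-angular paths.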
    <item>
      <title>Adaptive Learning Strategies for AoA-Based Outdoor Localization: A Comprehensive Framework</title>
      <link>https://arxiv.org/abs/2605.05055</link>
      <description>arXiv:2605.05055v1 Announce Type: new 
Abstract: Localization in 5G and 6G networks is essential for important use cases such as intelligent transportation, smart factories, and smart cities. Although deep learning has improved localization accuracy, the training process for localization models can vary significantly depending on the deployment scenario and the effort required for dataset collection campaigns on a given infrastructure. Furthermore, with respect to feature selection, recent works have demonstrated the robustness of angle-of-arrival (AoA) based localization. In view of these two points, we propose an adaptive framework for AoA-based localization that consists of two alternative learning strategies, suited to large and small training datasets, respectively. The proposed framework is evaluated on a real, massive multiple-input multiple-output (mMIMO) orthogonal frequency division multiplexing (OFDM) outdoor channel state information (CSI) dataset. First, we investigate offline learning when large training datasets are available; we propose a hierarchical framework that first distinguishes between line-of-sight (LoS) and non-line-of-sight (NLoS) regions and then performs finer-grained localization within the respective region. This approach provides high-performance localization through accumulated batch retraining and an integrated hyperparameter optimization mechanism. Second, when only a small training dataset is available, we propose an online learning framework that uses incremental tree-based and ensemble-based models to handle streaming data and continuously update the model, as well as an online few-shot learning model that rapidly initializes new classes from a limited labeled support set. These results show that highly accurate and robust localization can be achieved incrementally during network operation by exploiting online learning, alleviating the need for large dataset collection campaigns.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05055v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>eess.SP</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bac Trinh-Nguyen, Sara Berri, Sin G. Teo, Tram Truong-Huu, Arsenia Chorti</dc:creator>
    </item>
    <item>
      <title>ScriptHOI: Learning Scripted State Transitions for Open-Vocabulary Human-Object Interaction Detection</title>
      <link>https://arxiv.org/abs/2605.05057</link>
      <description>arXiv:2605.05057v1 Announce Type: new 
Abstract: Open-vocabulary human-object interaction (HOI) detection requires recognizing interaction phrases that may not appear as annotated categories during training. Recent vision-language HOI detectors improve semantic transfer by matching human-object features with text embeddings, but their predictions are often dominated by object affordance and phrase-level co-occurrence. As a result, a model may predict \textit{cut cake} from the presence of a knife and a cake without verifying whether the hand, tool, target, contact pattern, and object state jointly support the action. We propose \textbf{ScriptHOI}, a structured framework that represents each interaction phrase as a soft scripted state transition. Rather than treating a phrase as a single class token, ScriptHOI decomposes it into body-role, contact, geometry, affordance, motion, and object-state slots. A visual state tokenizer parses each detected human-object pair into corresponding state tokens, and a slot-wise matcher estimates both script coverage and script conflict. These two quantities calibrate HOI logits, expose missing visual evidence, and provide training constraints for incomplete annotations. To avoid suppressing valid but unannotated interactions, we further introduce interval partial-label learning, which constrains unannotated candidates with script-derived lower and upper probability bounds instead of assigning closed-world negatives. A counterfactual script contrast loss swaps individual script slots to discourage object-only shortcuts. Experiments on HICO-DET, V-COCO, and open-vocabulary HOI splits show that ScriptHOI improves rare and unseen interaction recognition while substantially reducing affordance-conflict false positives.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05057v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Minh Anh Nguyen, Quang Huy Tran, Bao Ngoc Le, SuiYang Guang, Tuan Kiet Pham, Linh Chi Vo</dc:creator>
    </item>
    <item>
      <title>SoK: Robustness in Large Language Models against Jailbreak Attacks</title>
      <link>https://arxiv.org/abs/2605.05058</link>
      <description>arXiv:2605.05058v1 Announce Type: new 
Abstract: Large Language Models (LLMs) have achieved remarkable success but remain highly susceptible to jailbreak attacks, in which adversarial prompts coerce models into generating harmful, unethical, or policy-violating outputs. Such attacks pose real-world risks, eroding safety, trust, and regulatory compliance in high-stakes applications. Although a variety of attack and defense methods have been proposed, existing evaluation practices are inadequate, often relying on narrow metrics like attack success rate that fail to capture the multidimensional nature of LLM security. In this paper, we present a systematic taxonomy of jailbreak attacks and defenses and introduce Security Cube, a unified, multi-dimensional framework for comprehensive evaluation of these techniques. We provide detailed comparison tables of existing attacks and defenses, highlighting key insights and open challenges across the literature. Leveraging Security Cube, we conduct benchmark studies on 13 representative attacks and 5 defenses, establishing a clear view of the current landscape encompassing jailbreak attacks, defenses, automated judges, and LLM vulnerabilities. Based on these evaluations, we distill critical findings, identify unresolved problems, and outline promising research directions for enhancing LLM robustness against jailbreak attacks. Our analysis aims to pave the way towards more robust, interpretable, and trustworthy LLM systems. Our code is publicly available.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05058v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Feiyue Xu, Hongsheng Hu, Chaoxiang He, Sheng Hang, Hanqing Hu, Xiuming Liu, Yubo Zhao, Zhengyan Zhou, Bin Benjamin Zhu, Shi-Feng Sun, Dawu Gu, Shuo Wang</dc:creator>
    </item>
    <item>
      <title>A Comparison Between Co-Located and Distributed MIMO Deployments in OFDM-ISAC Networks</title>
      <link>https://arxiv.org/abs/2605.05059</link>
      <description>arXiv:2605.05059v1 Announce Type: new 
Abstract: This paper investigates network-level integrated sensing and communication (ISAC) under two fundamentally different topology configurations: cell-free massive MIMO (CF-mMIMO) and multi-cell massive MIMO (MC-mMIMO). A unified OFDM-based waveform is adopted for both architectures as the key enabler for ISAC functionalities. The CF system exploits distributed access points (APs) and a scalable user-target-centric operation, whereas the MC system relies on co-located transmit-receive arrays with conventional cell-centric deployment. For both architectures, we derive a GLRT-based sensing detector and the corresponding sensing SNR expressions. We then examine a series of case studies investigating how the number of OFDM subcarriers, the transceiver allocation strategy, and the antenna/node distribution across the network affect the sensing performance. The results consistently demonstrate that CF-mMIMO provides higher and more robust sensing performance in most tested scenarios, particularly when transmit resources or antenna elements are spatially distributed. These findings highlight the inherent advantages of CF deployments for next-generation ISAC networks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05059v1</guid>
      <category>cs.IT</category>
      <category>eess.SP</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Maryam Darabi, Sergi Liesegang, Emanuele Grossi, Stefano Buzzi</dc:creator>
    </item>
    <item>
      <title>Full-chip CMP modelling based on Fully Convolutional Network leveraging White Light Interferometry</title>
      <link>https://arxiv.org/abs/2605.05062</link>
      <description>arXiv:2605.05062v1 Announce Type: new 
Abstract: As time-to-market is crucial in the Integrated Circuit (IC) industry, speeding up layout manufacturability verification is essential. Chemical-Mechanical Polishing (CMP) plays a vital role in IC fabrication but is significantly influenced by Layout-Dependent Effects (LDE). An accurate and efficient CMP model enables design teams to correct surface unevenness before fabrication, reducing costs and accelerating the design phase. However, existing models often rely on Density Step Height (DSH) modeling, which is time-consuming for calibration and requires substantial hardware resources for fine-grained predictions. In this paper, we propose combining the advantages of two surface analysis techniques, White Light Interferometry (WLI) and Atomic Force Microscopy (AFM), to train a deep learning model. This model aims to predict full-chip post-CMP nanotopography with nanometer-scale accuracy. Our deep learning model is based on a Convolutional Neural Network (CNN) and follows a two-step pipeline. The model is trained on each technique separately, resulting in a detailed full-chip CMP model.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05062v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Jules Exbrayat, Renan Bouis, Elie Sezestre, Viorel Balan, Arnaud Cornelis, Damien Hebras, Catherine Euvrard</dc:creator>
    </item>
    <item>
      <title>The Impossibility Triangle of Long-Context Modeling</title>
      <link>https://arxiv.org/abs/2605.05066</link>
      <description>arXiv:2605.05066v1 Announce Type: new 
Abstract: We identify and prove a fundamental trade-off governing long-sequence models: no model can simultaneously achieve (i) per-step computation independent of sequence length (Efficiency), (ii) state size independent of sequence length (Compactness), and (iii) the ability to recall a number of historical facts proportional to sequence length (Recall). We formalize this trade-off within an Online Sequence Processor abstraction that unifies Transformers, state space models, linear recurrent networks, and their hybrids. Using the Data Processing Inequality and Fano's Inequality, we prove that any model satisfying Efficiency and Compactness can recall at most O(poly(d)/log V) key-value pairs from a sequence of arbitrary length, where d is the model dimension and V is the vocabulary size. We classify 52 architectures published before March 2026 into the triangle, showing that each achieves at most two of the three properties and that hybrid architectures trace continuous trajectories in the interior. Experiments on synthetic associative recall tasks with five representative architectures validate the theoretical bound: empirical recall capacity lies strictly below the information-theoretic limit, and no architecture escapes the triangle.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05066v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yan Zhou</dc:creator>
    </item>
    <item>
      <title>Look Once, Beam Twice: Camera-Primed Real-Time Double-Directional mmWave Beam Management for Vehicular Connectivity</title>
      <link>https://arxiv.org/abs/2605.05071</link>
      <description>arXiv:2605.05071v1 Announce Type: new 
Abstract: Millimeter-wave (mmWave) frequencies promise multi-gigabit connectivity for vehicle-to-everything (V2X) networks, but face challenges in terms of severe path loss and mobility-related beam misalignment. Reliable V2X connectivity requires fast, double-directional beam alignment. However, existing methods suffer from high training overhead and limited generalization to unseen scenarios. This paper presents VIsion-based BEamforming (VIBE), a hybrid, model-based, closed-loop learning architecture for real-time double-directional mmWave beam management primed by camera sensing. VIBE fuses machine learning, model-based reasoning, and closed-loop RF feedback to balance beam-pair establishment latency with link quality. VIBE bypasses exhaustive training overhead and accelerates link establishment by leveraging camera observations to reduce the beam-search space. Lightweight beam refinement and offset tracking mechanisms adaptively refine beams in response to dynamic application requirements. VIBE is implemented and evaluated across online indoor/outdoor testbeds, public datasets, and real-time vehicular experiments, demonstrating strong generalization capabilities that make it suitable for real-time V2X communication. Comparisons with 5G NR hierarchical beamforming show that VIBE consistently maintains lower outage rates. Furthermore, VIBE outperforms state-of-the-art end-to-end ML models for beam selection when evaluated on public datasets and achieves outage rates as low as 1.1-1.4%. The results show that a hybrid, model-based, closed-loop learning architecture is better suited for real-world mmWave vehicular connectivity than end-to-end trained ML models. For reproducibility, we publish our code at https://github.com/UNL-CPN-Lab/Look-Once-Beam-Twice.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05071v1</guid>
      <category>cs.NI</category>
      <category>cs.AI</category>
      <category>cs.CE</category>
      <category>cs.CV</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Avhishek Biswas, Apala Pramanik, Eylem Ekici, Mehmet C. Vuran</dc:creator>
    </item>
    <item>
      <title>Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy</title>
      <link>https://arxiv.org/abs/2605.05072</link>
      <description>arXiv:2605.05072v1 Announce Type: new 
Abstract: 3D occupancy prediction aims to infer dense, voxel-wise scene semantics from sensor observations, where the 2D-to-3D view transformation serves as a crucial step in bridging image features and volumetric representations. Most previous methods rely on a fixed projection space, where 3D reference points are uniformly sampled along pillars. However, such sampling struggles to capture the sparsity and height variations of real-world scenes, leading to ambiguous correspondences and unreliable feature aggregation. To address these challenges, we propose HiPR, a camera-LiDAR occupancy framework with Height-Guided Projection Reparameterization. HiPR first encodes LiDAR into a BEV height map to capture the maximum height of the point cloud. HiPR then adjusts the sampling range of each pillar using the height prior, enabling adaptive reparameterization of the projection space. As a result, the projected points are redistributed into geometrically meaningful regions rather than fixed ranges. Meanwhile, we mask out the invalid parts of the height map to avoid misleading the feature aggregation. In addition, to alleviate the training instability caused by noisy LiDAR-derived heights, we introduce a training-time Progressive Height Conditioning strategy, which gradually transitions the conditioning signal from ground-truth heights to LiDAR heights. Extensive experiments demonstrate that HiPR consistently outperforms existing state-of-the-art methods while maintaining real-time inference. The code and pretrained models can be found at https://github.com/Rayn-Wu/HiPR.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05072v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuan Wu, Zhiqiang Yan, Jiawei Lian, Zhengxue Wang, Jian Yang</dc:creator>
    </item>
    <item>
      <title>FlowDIS: Language-Guided Dichotomous Image Segmentation with Flow Matching</title>
      <link>https://arxiv.org/abs/2605.05077</link>
      <description>arXiv:2605.05077v1 Announce Type: new 
Abstract: Accurate image segmentation is essential for modern computer vision applications such as image editing, autonomous driving, and medical image analysis. In recent years, Dichotomous Image Segmentation (DIS) has become a standard task for training and evaluating highly accurate segmentation models. Existing DIS approaches often fail to preserve fine-grained details or fully capture the semantic structure of the foreground. To address these challenges, we present FlowDIS, a novel dichotomous image segmentation method built on the flow matching framework, which learns a time-dependent vector field to transport the image distribution to the corresponding mask distribution, optionally conditioned on a text prompt. Moreover, with our Position-Aware Instance Pairing (PAIP) training strategy, FlowDIS offers strong controllability through text prompts, enabling precise, pixel-level object segmentation. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches both with and without language guidance. Compared with the best prior DIS method, FlowDIS achieves a 5.5% higher $F_{\beta}^{\omega}$ measure and 43% lower MAE ($\mathcal{M}$) on the DIS-TE test set. The code is available at: https://github.com/Picsart-AI-Research/FlowDIS</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05077v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Andranik Sargsyan, Shant Navasardyan</dc:creator>
    </item>
    <item>
      <title>A unified Benchmark for Multi-Frame Image Restoration under Severe Refractive Warping</title>
      <link>https://arxiv.org/abs/2605.05079</link>
      <description>arXiv:2605.05079v1 Announce Type: new 
Abstract: Video sequences captured through dynamic refractive media, such as turbulent air or a water surface, often suffer from severe geometric distortions and temporal instability. While recent advances address mild atmospheric turbulence, no existing benchmark systematically evaluates restoration methods under strong and highly nonuniform refractive conditions. We present a comprehensive benchmark for geometric distortion removal in video, covering a range from turbulence-like mild warping to strong discontinuous refractive deformations. The benchmark includes both laboratory-captured real data and synthetic sequences generated for static scenes via physics-based light refraction modeling across four distortion levels and multiple surface wave types. We evaluate a spectrum of methods, from simple baselines and classical registration algorithms to advanced learning-based approaches, including DATUM and our proposed diffusion-based V-cache for the high and extreme distortion regimes. Evaluation uses both pixel-level (PSNR, SSIM) and perceptual (LPIPS, DINO, CLIP) metrics, providing the first large-scale analysis of geometric distortion removal. Our benchmark establishes a new foundation for developing and evaluating algorithms capable of reconstructing video from highly distorted optical environments. Our code and datasets are available at https://github.com/iafoss/refractive-mfir-benchmark.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05079v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Maxim V. Shugaev, Md Reshad Ul Hoque, Bridget Kennedy, Joseph T. Riley, Fiona Hwang, Justin Hagen, Harvir Ghuman, Ethan Garcia-O'Donnell, Syed Noor Qadri, Freddie Santiago, Mun Wai Lee</dc:creator>
    </item>
    <item>
      <title>The Pinocchio Dimension: Phenomenality of Experience as the Primary Axis of LLM Psychometric Differences</title>
      <link>https://arxiv.org/abs/2605.05080</link>
      <description>arXiv:2605.05080v1 Announce Type: new 
Abstract: We administer 45 validated psychometric questionnaires to 50 large language models (LLMs) to identify the dimensions along which LLMs differ psychometrically. Using Supervised Semantic Differential (SSD), we find that the primary axis of between-model variance separates items describing phenomenally rich experience, including embodied sensation, felt affect, inner speech, imagery, and empathy, from items describing stimulus-driven behavioral reactivity ($R^2_{adj}=.037$, $p&lt;.0001$). To test this interpretation at the item level, we introduce the Pinocchio score ($\pi_i$), the ratio of inter-model response variance under neutral prompting to that under a human-simulation prompt, as an annotation-free measure of each item's experiential demand. $\pi_i$ predicts condition-induced shifts in primary factor loading magnitudes ($\rho=-.215$, $p&lt;.0001$, $n=1292$--$1310$ items), confirming that between-model divergence on experiential items is structured rather than noisy. Applying PCA to per-model EFA scores across all questionnaires reveals one dominant dimension, the Pinocchio Axis ($\Pi$): the degree to which a model presents itself as a locus of phenomenal experience rather than a system of behavioral responses. This axis captures 47.1% of cross-questionnaire between-model variance in primary factor scores and converges with item-level Pinocchio scores ($r=.864$). Marked within-provider divergence across closely related model variants is consistent with post-training fine-tuning as a key contributor, supporting the interpretation that $\Pi$ reflects a training-shaped self-representational tendency governing how a model treats experiential language as self-applicable. The dominant axis of between-model psychometric variation is therefore not a conventional personality trait but a self-representational stance toward one's own nature as an experiencer.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05080v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hubert Plisiecki, Sabina Siudaj, Kacper Dudzic, Anna Sterna, Maciej Gorski, Karolina Drozdz, Marcin Moskalewicz</dc:creator>
    </item>
    <item>
      <title>Provable imitation learning for control of instability in partially-observed Vlasov--Poisson equations</title>
      <link>https://arxiv.org/abs/2605.05081</link>
      <description>arXiv:2605.05081v1 Announce Type: new 
Abstract: We consider the stabilization of Vlasov--Poisson plasma dynamics, a central control problem in nuclear fusion. Our focus is the gap between what an ideal controller would use and what experiments can actually observe: while the optimal policy may rely on the full phase-space state, practical feedback is typically limited to sparse macroscopic diagnostics. We therefore study imitation learning methods that distill a fully observed expert policy into controllers operating only on macroscopic measurements. We establish stability guarantees for the learned policy, with an error floor that depends on the minimal behavior cloning loss achievable under the observation constraints. We further characterize this minimal loss in terms of a notion of entropy that quantifies the complexity of the initial distribution. Our results demonstrate the theoretical feasibility of learning stabilizing feedback policies for kinetic plasma dynamics from macroscopic observations, and exhibit the adaptivity of the learning approach to low-complexity structures. Through extensive numerical experiments, we validate our theory and show that the learned policies can stabilize the system using only macroscopic observations over a significantly longer time horizon than non-adaptive baseline controllers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05081v1</guid>
      <category>cs.LG</category>
      <category>math.AP</category>
      <category>math.OC</category>
      <category>physics.plasm-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaofan Xia, Qin Li, Wenlong Mou</dc:creator>
    </item>
    <item>
      <title>Order Matters: Improving Domain Adaptation by Reordering Data</title>
      <link>https://arxiv.org/abs/2605.05084</link>
      <description>arXiv:2605.05084v1 Announce Type: new 
Abstract: Domain shift remains a key challenge in deploying machine learning models to the real world. Unsupervised domain adaptation (UDA) aims to address this by minimising domain discrepancy during training, but the discrepancy estimates suffer from high variance in stochastic settings, which can stifle the theoretical benefits of the method. This paper proposes Optimal Reordering of Data for Error-Reduced Estimation of Discrepancy (ORDERED), a novel unbiased stochastic variance reduction technique which reduces the discrepancy estimation error by optimising the order in which the training data are sampled. We consider two specific domain discrepancy losses (correlation alignment and the maximum mean discrepancy), formulate their stochastic estimation error as a function of the data sampling order, and propose a practical optimisation algorithm. Our simulations demonstrate reduced variance compared to related methods, and experiments on two domain shift image classification benchmarks show improved target domain accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05084v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Andrea Napoli, Paul White</dc:creator>
    </item>
    <item>
      <title>Gated Multimodal Learning for Interpretable Property Energy Performance Prediction and Retrofit Scenario Analysis</title>
      <link>https://arxiv.org/abs/2605.05088</link>
      <description>arXiv:2605.05088v1 Announce Type: new 
Abstract: Achieving resilient and sustainable cities requires scalable approaches to decarbonising residential buildings, which account for about 20% of UK greenhouse gas emissions and 25% of energy-related emissions in the European Union. Energy Performance Certificates (EPCs) support regulation and retrofit planning, but their reliance on on-site inspections limits timely city-scale assessment. This study introduces a gated multimodal model to predict Standard Assessment Procedure (SAP) energy efficiency and Environmental Impact (EI) scores by integrating EPC tabular variables, assessor-written free text, and Geographic Information System (GIS)-derived spatial features describing footprint geometry, height, area, and orientation. Sample-wise gating learns property-specific modality weights, while an auxiliary band classification head stabilises training. In a Westminster, London case study, the model predicts SAP and EI scores with MAEs of 4.03 and 4.76 points and R2 values of 0.757 and 0.748, respectively, achieving a mean MAE of 4.39. Ablation results show that full multimodal fusion outperforms unimodal and bimodal baselines for both score prediction and band-level classification. Interpretability analyses provide decision-relevant evidence: gating weights indicate strong reliance on assessor text; SHAP highlights main fuel, built form, and construction age band; text occlusion prioritises roof and wall fields; and spatial attribution is dominated by height and footprint area, with sensitivity to footprint shape. The validated framework is further applied to retrofit scenarios for wall insulation, roof insulation, and window glazing upgrades, indicating projected improvements in SAP, EI, annual energy cost, and equivalent CO2 emissions. Overall, the framework provides scalable property-level evidence for retrofit screening, intervention prioritisation, and net-zero housing transitions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05088v1</guid>
      <category>cs.LG</category>
      <category>physics.soc-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yunfei Bai, Aaron Tesfa Tsion, Raul Rosales, Barbara Shollock, Wei He</dc:creator>
    </item>
    <item>
      <title>Automatically Finding and Validating Unexpected Side-Effects of Interventions on Language Models</title>
      <link>https://arxiv.org/abs/2605.05090</link>
      <description>arXiv:2605.05090v1 Announce Type: new 
Abstract: We present an automated, contrastive evaluation pipeline for auditing the behavioral impact of interventions on large language models. Given a base model $M_1$ and an intervention model $M_2$, our method compares their free-form, multi-token generations across aligned prompt contexts and produces human-readable, statistically validated natural-language hypotheses describing how the models differ, along with recurring themes that summarize patterns across validated hypotheses.
  We evaluate the approach in a synthetic setting by injecting known behavioral changes and showing that the pipeline reliably recovers them. We then apply it to three real-world interventions (reasoning distillation, knowledge editing, and unlearning), demonstrating that the method surfaces both intended and unexpected behavioral shifts, distinguishes large from subtle interventions, and does not hallucinate differences when effects are absent or misaligned with the prompt bank. Overall, the pipeline provides a statistically grounded and interpretable tool for post-hoc auditing of intervention-induced changes in model behavior.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05090v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Quintin Pope, Ajay Hayagreeve Balaji, Jacques Thibodeau, Xiaoli Fern</dc:creator>
    </item>
    <item>
      <title>Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout</title>
      <link>https://arxiv.org/abs/2605.05092</link>
      <description>arXiv:2605.05092v1 Announce Type: new 
Abstract: Safe L2/L3 driving automation requires anticipating human-in-the-loop reactions during shared-control transitions. While most driving world models forecast the external environment, in-cabin intelligence remains strictly recognition-oriented and lacks multi-step rollout capabilities for driver dynamics. We introduce Driver-WM, a driver-centric latent world model that rolls out in-cabin dynamics causally conditioned on out-cabin traffic context. This formulation unifies physical kinematics forecasting with auxiliary behavioral and emotional semantic recognition. Operating in a compact latent space constructed from frozen vision-language features, Driver-WM adopts a dual-stream architecture to separately encode external traffic and internal driver states. These streams are directionally coupled via a gated causal injection mechanism, which uses a learned vector gate to modulate external contextual perturbations while strictly enforcing temporal causality. Evaluations on a multi-task assistive driving benchmark demonstrate that Driver-WM yields robust long-horizon geometric forecasting for reactive high-motion maneuvers and improves semantic alignment for both driver and traffic states. Finally, the explicit external-to-internal conditioning allows for controlled test-time interventions to systematically analyze mechanism responses.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05092v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haozhuang Chi, Daosheng Qiu, Hao Su, Haochen Liu, Zirui Li, Haoruo Zhang, Chen Lv</dc:creator>
    </item>
    <item>
      <title>A Bayesian Approach for Task-Specific Next-Best-View Selection with Uncertain Geometry</title>
      <link>https://arxiv.org/abs/2605.05095</link>
      <description>arXiv:2605.05095v1 Announce Type: new 
Abstract: We develop a framework for task-specific active next-best-view selection in 3D reconstruction from point clouds by casting the problem in the language of Bayesian decision theory. Our framework works by (a) placing a prior distribution over the space of implicit surfaces, (b) using recently developed stochastic surface reconstruction methods to calculate the resulting posterior distribution, and then (c) using the posterior distribution to carefully reason about which view to scan next. This enables us to perform camera selection in a manner that is directly optimized for the intended use of the reconstructed data; that is, we reduce uncertainty only in those regions that make a difference for the task at hand, as opposed to prior approaches that reduce it uniformly across space. We evaluate our method across three distinct downstream tasks: semantic classification, segmentation, and PDE-guided physics simulation. Experimental results demonstrate that our framework achieves superior task performance with fewer views compared to commonly used baselines and prior general uncertainty-reduction techniques.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05095v1</guid>
      <category>cs.GR</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>ACM SIGGRAPH 2026</arxiv:journal_reference>
      <dc:creator>Jingsen Zhu, Silvia Sellán, Alexander Terenin</dc:creator>
    </item>
    <item>
      <title>CapsID: Soft-Routed Variable-Length Semantic IDs for Generative Recommendation</title>
      <link>https://arxiv.org/abs/2605.05096</link>
      <description>arXiv:2605.05096v1 Announce Type: new 
Abstract: Generative recommendation maps each item to a sequence of Semantic IDs (SIDs) and recasts retrieval as autoregressive token generation. In this paradigm the main bottleneck is the tokenizer rather than the Transformer: residual vector quantization with a hard nearest-neighbor assignment at every layer collapses multi-faceted item semantics at cluster boundaries and propagates early errors to later SID positions. A common workaround is to append a dense vector or attribute prefix to the SID, but this dual-representation design inflates inference cost and gives up the simplicity of a generative interface. We address the bottleneck at the tokenizer itself. CAPSID replaces hard residual quantization with capsule routing: at each layer an item probabilistically routes to several semantic capsules, the residual is updated by the routed reconstruction rather than by a single winning code, and the SID terminates once the active capsule's confidence is high enough. On top of CAPSID, SEMANTICBPE composes adjacent SID tokens into reusable subwords by combining their co-occurrence with their embedding compatibility. On Amazon Beauty, Sports, Toys, and a 35M-item proprietary industrial catalog, CAPSID+SEMANTICBPE improves Recall at 10 by 9.6% on average over ReSID, the strongest single-representation baseline, and matches or exceeds a COBRA-style sparse-dense system on every public benchmark while running at 51% of its inference latency. Ablations show that soft routing, iterative agreement, and confidence-driven length each contribute independently, and the gains are largest on tail items where boundary semantics dominate.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05096v1</guid>
      <category>cs.IR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wenzhuo Cheng, Menghang Gong, Qixin Guo, Hang Zheng, Zhaobin Yang, Jianguo Lou, Zhengwei Zheng</dc:creator>
    </item>
    <item>
      <title>Continual Knowledge Updating in LLM Systems: Learning Through Multi-Timescale Memory Dynamics</title>
      <link>https://arxiv.org/abs/2605.05097</link>
      <description>arXiv:2605.05097v1 Announce Type: new 
Abstract: LLMs are trained once, then deployed into a world that never stops changing. External memory compensates for this, but most systems manage it explicitly rather than letting it adapt on its own. Biological memory works differently: coupled multi-timescale dynamics make new associations immediately usable, strengthen what repetition confirms, and let the rest fade. We argue that external memory should follow a similar principle. In Memini, this view takes the form of an associative memory that organizes knowledge as a directed graph. Each edge carries two coupled internal variables, one fast and one slow, following the Benna-Fusi model of synaptic consolidation. From this coupling, episodic sensitivity, gradual consolidation, and selective forgetting emerge as facets of a single mechanism, reframing external memory as a learning substrate that reorganizes through its own dynamics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05097v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Andreas Pattichis, Constantine Dovrolis</dc:creator>
    </item>
    <item>
      <title>Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2605.05102</link>
      <description>arXiv:2605.05102v1 Announce Type: new 
Abstract: We study the distribution of regret in stochastic multi-armed bandits and episodic reinforcement learning through a unified framework. We formalize a distributional regret bound as a probabilistic guarantee that holds uniformly over all confidence levels $\delta \in (0,1]$, thereby characterizing the regret distribution across the full range of $\delta$. We present a simple UCBVI-style algorithm with exploration bonus $\min\{c_{1,k}/N, c_{2,k}/\sqrt{N}\}$, where $N$ denotes the visit count and $(c_{1,k},c_{2,k})$ are user-specified parameters. For arbitrary parameter sequences, we derive general gap-independent and gap-dependent distributional regret bounds, yielding a principled characterization of how the parameters control the trade-off between expected performance, tail risk, and instance-dependent behavior. In particular, our bounds achieve optimal trade-offs between expected and distributional regret in both minimax and instance-dependent regimes. As a special case, for multi-armed bandits with $A$ arms and horizon $T$, we obtain a distributional regret bound of order $\mathcal{O}(\sqrt{AT}\log(1/\delta))$, confirming the conjecture of Lattimore &amp; Szepesvári (2020, Section 17.1) for the first time.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05102v1</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Harin Lee, Min-hwan Oh</dc:creator>
    </item>
    <item>
      <title>Text Corpora as Concept Fields: Black-Box Hallucination and Novelty Measurement</title>
      <link>https://arxiv.org/abs/2605.05103</link>
      <description>arXiv:2605.05103v1 Announce Type: new 
Abstract: We introduce the **Concept Field** of a text corpus: a local drift field with pointwise uncertainty, estimated in sentence-embedding space from the deltas between consecutive sentences. Given a candidate sentence transition, we score its agreement with the field by $\zeta$, the mean absolute z-distance between the observed delta and the field's local Gaussian estimate. The score is black-box (no model internals), corpus-attributable (every score traces to nearby corpus sentences), and admits a direct probabilistic reading. We support the computation with a **Vector Sequence Database (VSDB)** that stores embeddings together with sequence-position and next-delta metadata. We evaluate this approach in two large-scale settings: hallucination-style groundedness detection over the U.S. Code of Federal Regulations, and novelty detection over Project Gutenberg. Using controlled LLM-generated rewrites, Concept Fields achieve strong selective classification performance under a grounded / ungrounded / unsure triage policy and, unlike retrieval-centric baselines, exhibit similar coverage-risk behavior across both domains, supporting a probability-based interpretation that transfers across domains. We also sketch how divergence and curl of the Concept Field, computed on dense clusters, surface qualitatively meaningful semantic patterns (logic sources, sinks, and implicit topics), which we offer as hypothesis-generating rather than as a quantitative result. Concept Fields provide a fast, lightweight, and interpretable signal for groundedness and novelty, complementary to LLM-as-judge and white-box detectors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05103v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nicholas S. Kersting, Vittorio Castelli, Chieh Ting Yeh, Xinzhu Wang, Saad Taame</dc:creator>
    </item>
    <item>
      <title>Minimizing the Expected Cost of Synchronization in Lossless Power Networks</title>
      <link>https://arxiv.org/abs/2605.05105</link>
      <description>arXiv:2605.05105v1 Announce Type: new 
Abstract: The reliable operation of large-scale electric power networks is increasingly challenging, particularly with the integration of stochastic renewable generation. In this work, we address the problem of minimizing network transients by optimally modifying the underlying network. We formulate the problem in terms of graph Laplacian matrices and show that, under certain assumptions, the problem is convex. We derive a linear matrix inequality whose feasibility guarantees the existence and uniqueness of phase-cohesive steady-state angles; this condition can be incorporated directly as a convex constraint in the optimization framework, and we provide several geometric interpretations of the optimization problem. The proposed method is validated on the IEEE 30-bus test system, where results demonstrate that our approach effectively identifies critical links in the network. Dynamic simulations show a significant reduction in network transients and overall improvements across several performance metrics. We explore the sparsity-optimality trade-off using a reweighted $\ell_1$ heuristic.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05105v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>math.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Gerald Ogbonna, David Bindel, Lindsay C. Anderson</dc:creator>
    </item>
    <item>
      <title>Input-Output Specifications and Dynamic Droop Coefficients: Stability and Performance Conditions for Grid-Forming IBRs</title>
      <link>https://arxiv.org/abs/2605.05107</link>
      <description>arXiv:2605.05107v1 Announce Type: new 
Abstract: This paper proposes dynamic stability and performance conditions for grid-connected inverter-based resources (IBRs). To this end, we extend the notion of steady-state droop coefficients to dynamic droop coefficients to capture the small-signal dynamics of IBRs and synchronous generators (SGs). Notably, the dynamic droop coefficients can be obtained from input-output data collected at the unit's (e.g., IBR or SG) point of interconnection without requiring prior knowledge of IBR internals or control structure. To obtain frequency stability conditions, this IBR model is combined with a lightweight dynamic transmission network model that accounts for uncertainty in line dynamics. The resulting stability conditions are highly scalable and, given a few key network parameters, can be verified at the unit level. To make the conditions practical and offer intuitive interpretations, we map the frequency stability conditions to bounds on the Bode plot of the dynamic droop coefficient for two broad types of IBR responses. Moreover, our specifications on the dynamic droop coefficient (i) translate basic frequency control ancillary services into verifiable requirements, and (ii) provide insights into the much-debated question of how to certify an IBR as grid-forming (GFM). The results are illustrated using dynamic droop coefficients obtained from detailed simulations of GFM and grid-following (GFL) IBRs as well as SGs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05107v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jennifer T. Bui, Dominic Groß</dc:creator>
    </item>
    <item>
      <title>LineRides: Line-Guided Reinforcement Learning for Bicycle Robot Stunts</title>
      <link>https://arxiv.org/abs/2605.05110</link>
      <description>arXiv:2605.05110v1 Announce Type: new 
Abstract: Designing reward functions for agile robotic maneuvers in reinforcement learning remains difficult, and demonstration-based approaches often require reference motions that are unavailable for novel platforms or extreme stunts. We present LineRides, a line-guided learning framework that enables a custom bicycle robot to acquire diverse, commandable stunt behaviors from a user-provided spatial guideline and sparse key-orientations, without demonstrations or explicit timing. LineRides handles physically infeasible guidelines using a tracking margin that permits controlled deviation, resolves temporal ambiguity by measuring progress via traveled distance along the guideline, and disambiguates motion details through position- and sequence-based key-orientations. We evaluate LineRides on the Ultra Mobility Vehicle (UMV) and show that the policy trained with our methods supports seamless transitions between normal driving and stunt execution, enabling five distinct stunts on command: MiniHop, LargeHop, ThreePointTurn, Backflip, and DriftTurn.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05110v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Seungeun Rho, Shamel Fahmi, Jeonghwan Kim, Arianna Ilvonen, Sehoon Ha, Gabriel Nelson</dc:creator>
    </item>
    <item>
      <title>Rollout Pass-Rate Control: Steering Binary-Reward RL Toward Its Most Informative Regime</title>
      <link>https://arxiv.org/abs/2605.05112</link>
      <description>arXiv:2605.05112v1 Announce Type: new 
Abstract: SWE-bench-style agentic reinforcement learning relies on expensive stateful trajectories, yet substantial compute is wasted on sampled rollout groups with skewed pass rates, where binary rewards provide a weak contrastive signal. We frame this inefficiency as a pass-rate control problem and show that a 50% pass rate is the most informative operating point: it maximizes reward entropy, the probability of surviving group filtering, RLOO advantage energy under GRPO, and success–failure contrastive structure. Guided by this principle, we propose Prefix Sampling (PS), which replays trajectory prefixes to steer skewed groups toward this regime: successful prefixes serve as head starts for mostly failing groups, while failing prefixes serve as handicaps for mostly passing groups. In stateful agent environments, prefix states are reconstructed through replay while replayed tokens are excluded from the loss, restricting optimization to continuations generated by the current policy. On SWE-bench-style agentic RL, PS delivers end-to-end wall-clock speedups of 2.01x on Qwen3-14B and 1.55x on Qwen3-32B while preserving or improving final verified performance. For 14B, the SWE-bench Verified peak rises from the baseline peak of 0.273 to 0.295 under PS. Additional mathematical reasoning experiments on AIME 2025 show the same pass-rate control pattern and decompose the gains into replay, bidirectional coverage, and adaptive control.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05112v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tianshu Zhu, Wenyu Zhang, Xiaoying Zuo, Lun Tian, Haotian Zhao, Yucheng Zeng, Jingnan Gu, Daxiang Dong, Jianmin Wu, Dawei Yin, Dou Shen</dc:creator>
    </item>
    <item>
      <title>How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences</title>
      <link>https://arxiv.org/abs/2605.05113</link>
      <description>arXiv:2605.05113v1 Announce Type: new 
Abstract: We study signal propagation in linear recurrent models at finite width. While existing signal propagation theory relies predominantly on the infinite-width limit, it remains unclear for how long that approximation remains accurate when recurrent depth $t$ grows jointly with width $n$. This question is especially relevant for modern recurrent sequence models, whose natural operating regime involves long input sequences, i.e., large $t$. We derive exact finite-width formulas for the hidden state signal energies in linear recurrences under complex Gaussian initialization. Using these formulas, we identify the joint depth-width scaling regimes that govern signal propagation: (i) a subcritical regime $t=o(\sqrt n)$, in which the infinite-width approximation remains valid; (ii) a critical regime $t\sim c\sqrt n$, in which non-negligible deviations from infinite-width predictions appear and a nontrivial joint scaling limit emerges; and (iii) a supercritical regime $t\gg \sqrt n$, in which finite-width effects dominate. Thus, our results pinpoint the precise recurrent depth scale at which infinite-width theory breaks down in long-range linear recurrences. In turn, this shows when standard initialization schemes, such as Glorot, become unstable. More broadly, our results demonstrate that finite-width effects accumulate more rapidly with depth in recurrent models than in feedforward ones, leading to qualitatively different signal propagation behavior.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05113v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mariia Seleznova</dc:creator>
    </item>
    <item>
      <title>Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior</title>
      <link>https://arxiv.org/abs/2605.05115</link>
      <description>arXiv:2605.05115v1 Announce Type: new 
Abstract: Neural representations carry rich geometric structure, but does that structure causally shape behavior? To address this question, we intervene along paths through activation space defined by different geometries and measure the behavioral trajectories they induce. In particular, we test whether interventions that respect the geometry of activation space yield behaviors close to those the model exhibits naturally. Concretely, we first fit an activation manifold $M_h$ to representations and a behavior manifold $M_y$ to output probability distributions. We then test the link $M_h \leftrightarrow M_y$ via interventions: we find that steering along $M_h$, which we term manifold steering, yields behavioral trajectories that follow $M_y$, while linear steering (which assumes a Euclidean geometry) cuts through off-manifold regions and hence produces unnatural outputs. Moreover, optimizing interventions in activation space to produce paths along $M_y$ recovers activation trajectories that trace the curvature of $M_h$. We demonstrate this bidirectional relationship between the geometry of representation and behavior across tasks and modalities. In language models, we use reasoning tasks with cyclic and sequential geometries as well as in-context learning tasks with more complex graph geometries. In a video world model, we use a task with geometry corresponding to physical dynamics. Overall, our work shows that geometry in neural representation is not merely incidental, but is in fact the proper object for enabling principled control via intervention on internals. This recasts the core problem of steering from finding the right direction to finding the right geometry.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05115v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Daniel Wurgaft, Can Rager, Matthew Kowal, Vasudev Shyam, Sheridan Feucht, Usha Bhalla, Tal Haklay, Eric Bigelow, Raphael Sarfati, Thomas McGrath, Owen Lewis, Jack Merullo, Noah Goodman, Thomas Fel, Atticus Geiger, Ekdeep Singh Lubana</dc:creator>
    </item>
    <item>
      <title>On the Hardness of Junking LLMs</title>
      <link>https://arxiv.org/abs/2605.05116</link>
      <description>arXiv:2605.05116v1 Announce Type: new 
Abstract: Large language models (LLMs) are known to be vulnerable to jailbreak attacks, which typically rely on carefully designed prompts containing explicit semantic structure. These attacks generally operate by fixing an adversarial instruction and optimizing small adversarial components (e.g., suffixes or prefixes). In this setting, prompt structure is fundamental for performance, and recent results show that even simple random search can achieve strong performance when combined with sophisticated prompt design. Recently, it has been observed that harmful behaviors can be elicited even without the adversarial prompt, relying solely on optimized token sequences. This suggests the existence of natural backdoors, i.e., token sequences that emerged naturally during LLM training and trigger unsafe outputs without any meaningful instruction. However, despite these observations, this setting remains largely unexplored, and in particular the hardness of finding natural backdoors has not yet been assessed. In this work, we provide a first proof-of-concept study investigating the hardness of this task, which we refer to as the junking problem. We formalize it as the problem of finding token sequences that maximize the probability of generating a target prefix of harmful responses, and propose a greedy random-search method to assess whether such sequences can be discovered easily. Our results show that this problem is harder than standard jailbreak attacks, confirming the importance of semantic information in prompt design. At the same time, we find that our simple strategy is sufficient to solve it with a high success rate, suggesting that natural backdoors are present and easily recoverable. Finally, through perplexity analysis, we observe that the discovered token sequences lie in low-probability regions of the model distribution, supporting the hypothesis that they emerged implicitly from the training process.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05116v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Marco Rando, Samuel Vaiter</dc:creator>
    </item>
    <item>
      <title>On the Wasserstein Gradient Flow Interpretation of Drifting Models</title>
      <link>https://arxiv.org/abs/2605.05118</link>
      <description>arXiv:2605.05118v1 Announce Type: new 
Abstract: Recently, Deng et al. (2026) proposed Generative Modeling via Drifting (GMD), a novel framework for generative tasks. This note presents an analysis of GMD through the lens of Wasserstein Gradient Flows (WGF), i.e., the path of steepest descent for a functional in the space of probability measures, equipped with the geometry of optimal transport. Unlike previous WGF-based contributions, GMD can be thought of as directly targeting a fixed point of a specific WGF. We demonstrate three main results: first, that one algorithm proposed by Deng et al. (2026) corresponds to finding the limiting point of a WGF on the KL divergence, with Parzen smoothing on the densities. Second, that the algorithm actually implemented by Deng et al. (2026) corresponds to a different procedure, which bears some resemblance to the fixed point of a WGF on the Sinkhorn divergence, but lacks certain desirable properties of the latter. Third, that the same idea can be extended to the limiting point of other WGFs, including the Maximum Mean Discrepancy (MMD), the sliced Wasserstein distance, and GAN critic functions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05118v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Arthur Gretton, Li Kevin Wenliang, Alexandre Galashov, James Thornton, Valentin De Bortoli, Arnaud Doucet</dc:creator>
    </item>
    <item>
      <title>MCFlash: Bulk Bitwise Processing in 3D NAND with Dynamic Sensing and Multi-level Encoding</title>
      <link>https://arxiv.org/abs/2605.05119</link>
      <description>arXiv:2605.05119v1 Announce Type: new 
Abstract: This paper presents MCFlash, a practical and immediately deployable technique for executing bulk bitwise operations directly within commercial off-the-shelf (COTS) 3D NAND flash chips. MCFlash relies solely on standard user-mode instructions, combining Multi-Level Cell (MLC) data encodings with dynamically tuned read reference voltages to execute in-place bitwise operations. We evaluate MCFlash across diverse NAND flash chips, both floating-gate and charge-trap variants, from different generations. Our results represent the first demonstration of error-free, on-chip bitwise operations, sustaining over one billion operations on fresh blocks and maintaining bit-error rates below 0.015% even after 10,000 program/erase (P/E) cycles.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05119v1</guid>
      <category>cs.AR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Habib Ur Rahman, Tharini Suresh, Sudeep Pasricha, Biswajit Ray</dc:creator>
    </item>
    <item>
      <title>Physiologically Grounded Driver Behavior Classification: SHAP-Driven Elite Feature Selection and Hybrid Gradient Boosting for Multimodal Physiological Signals</title>
      <link>https://arxiv.org/abs/2605.05120</link>
      <description>arXiv:2605.05120v1 Announce Type: new 
Abstract: An interpretable and scalable framework for decoding driving behaviors from multimodal physiological signals is proposed in this study. We use a large-scale multimodal physiological driving-behavior dataset comprising synchronized electroencephalogram (EEG), electromyography (EMG), and galvanic skin response (GSR) signals. Our approach involves rigorous preprocessing followed by a domain-specific feature extraction pipeline targeting time-domain, frequency-domain, and derived physiological indices. To address high dimensionality, we employ SHAP-based elite feature selection, retaining the top 250 features to reduce computational overhead while preserving predictive power. Hyperparameter optimization for extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) models is conducted using Bayesian optimization via Optuna. Finally, a weighted soft-voting ensemble is constructed to leverage the complementary strengths of both gradient boosting frameworks. The results demonstrate that the proposed ensemble achieves a test accuracy of 80.91% and a macro-F1 score of 0.79, significantly outperforming single-modality baselines and traditional machine learning models. Ablation studies confirm an 8% performance gain over the best single modality (EEG), validating the necessity of multimodal fusion. SHAP analysis further supports the physiological plausibility of the model, revealing that EEG contributes the majority of predictive weight, while GSR and EMG features provide critical discriminatory signals for high-arousal and motor-intensive maneuvers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05120v1</guid>
      <category>cs.LG</category>
      <category>eess.SP</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sahar Askari, Mohammad Mahdi Mirza Ali Mohammadi, Fatemeh Ensafdoust, Amin Golnari, Saeid Sanei</dc:creator>
    </item>
    <item>
      <title>Beyond Semantics: An Evidential Reasoning-Aware Multi-View Learning Framework for Trustworthy Mental Health Prediction</title>
      <link>https://arxiv.org/abs/2605.05121</link>
      <description>arXiv:2605.05121v1 Announce Type: new 
Abstract: Automated mental health prediction using textual data has shown promising results with deep learning and large language models. However, deploying these models in high-stakes real-world settings remains challenging, as existing approaches largely rely on semantic representations and often produce overconfident predictions under ambiguous, noisy, or shifted data. Moreover, most methods lack reliable uncertainty estimation, undermining trust in risk-sensitive mental health applications. To address these limitations, we formulate the task as a multi-view learning problem that integrates semantic information from encoder-only models with higher-level reasoning information from decoder-only models, where reasoning-aware representations and uncertainty modeling are obtained in a trustworthy manner. To ensure reliable fusion, we adopt an evidential learning framework based on Subjective Logic to explicitly model uncertainty and introduce an evidential fusion strategy that balances complementary views while discounting unreliable evidence. On three real-world benchmark datasets (Dreaddit, SDCNL, and DepSeverity), the framework achieves accuracies of 0.835, 0.731, and 0.751, respectively, demonstrating its potential for reliable mental health prediction. Additional experiments on robustness to noise and case studies for interpretability confirm that our proposed framework not only improves predictive performance but also provides trustworthy uncertainty estimates and human-understandable reasoning signals, making it suitable for risk-sensitive applications in mental health assessment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05121v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yucheng Ruan, Ling Huang, Qika Lin, Kai He, Mengling Feng</dc:creator>
    </item>
    <item>
      <title>Adaptive Policy Selection and Fine-Tuning under Interaction Budgets for Offline-to-Online Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2605.05123</link>
      <description>arXiv:2605.05123v1 Announce Type: new 
Abstract: In offline-to-online reinforcement learning (O2O-RL), policies are first safely trained offline using previously collected datasets and then further fine-tuned for tasks via limited online interactions. In a typical O2O-RL pipeline, candidate policies trained with offline RL are evaluated via either off-policy evaluation (OPE) or online evaluation (OE). The policy with the highest estimated value is then deployed and continually fine-tuned. However, this setup has two main issues. First, OPE can be unreliable, making it risky to deploy a policy based solely on its estimates, whereas OE may identify a viable policy but consumes substantial online interaction that could otherwise have been used for fine-tuning. Second, and more importantly, it is often impossible to determine a priori whether a pretrained policy will improve with post-deployment fine-tuning, especially in non-stationary environments. As a result, procedures that commit to a single deployed policy are impractical in many real-world settings. Moreover, a naive remedy that exhaustively fine-tunes all candidates would violate interaction budget constraints and is likewise infeasible. In this paper, we propose a novel adaptive approach for policy selection and fine-tuning under online interaction budgets in O2O-RL. Following the standard pipeline, we first train a set of candidate policies with different offline RL algorithms and hyperparameters; we then perform OPE to obtain initial performance estimates. We next adaptively select and fine-tune the policies based on their predicted performance via an upper-confidence-bound approach, thereby making efficient use of online interactions. We demonstrate that our approach improves upon O2O-RL baselines on various benchmarks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05123v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alper Kamil Bozkurt, Xiaoan Xu, Shangtong Zhang, Miroslav Pajic, Yuichi Motai</dc:creator>
    </item>
    <item>
      <title>Conditional outlier detection for clinical alerting</title>
      <link>https://arxiv.org/abs/2605.05124</link>
      <description>arXiv:2605.05124v1 Announce Type: new 
Abstract: We develop and evaluate a data-driven approach for detecting unusual (anomalous) patient-management actions using past patient cases stored in an electronic health record (EHR) system. Our hypothesis is that patient-management actions that are unusual with respect to past patients may be due to a potential error and that it is worthwhile to raise an alert when such a condition is encountered. We evaluate this hypothesis using data obtained from the electronic health records of 4,486 post-cardiac surgical patients. We base the evaluation on the opinions of a panel of experts. The results support the conclusion that anomaly-based alerting can achieve reasonably low false-alert rates and that stronger anomalies are correlated with higher alert rates.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05124v1</guid>
      <category>cs.LG</category>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Milos Hauskrecht, Michal Valko, Shyam Visweswaran, Iyad Batal, Gilles Clermont, Gregory Cooper</dc:creator>
    </item>
    <item>
      <title>Joint Treatment Effect Estimation from Incomplete Healthcare Data: Temporal Causal Normalizing Flows with LLM-driven Evolutionary MNAR Imputation</title>
      <link>https://arxiv.org/abs/2605.05125</link>
      <description>arXiv:2605.05125v1 Announce Type: new 
Abstract: Target trial emulation (TTE) enables causal questions to be studied with observational data when randomized controlled trials (RCTs) are infeasible. Yet treatment-effect methods often address causal estimation, missingness, and temporal structure separately, limiting their robustness in electronic health records (EHRs), where time-varying confounding is common and missingness in missing-not-at-random (MNAR) biomarkers can reach 50%--80%. We propose a two-stage pipeline for treatment effect estimation from incomplete longitudinal EHRs. First, CausalFlow-T, a directed acyclic graph (DAG)-constrained normalizing flow with long short-term memory (LSTM)-encoded patient history, performs exact invertible counterfactual inference, avoiding approximation errors from variational inference and separating confounding through explicit causal structure. Ablations on four synthetic benchmarks and one semi-synthetic benchmark with known counterfactuals show that DAG constraints and exact inference address distinct failure modes: neither compensates for the other. Second, because CausalFlow-T requires completed inputs, we introduce an LLM-driven evolutionary imputer that proposes executable imputation operators rather than individual entries, and evaluate it with three large language model (LLM) backends, including two open-source models. Across 30%--80% MNAR missingness, this imputer achieves the best pooled rank over biomarker and causal metrics, leading in point-wise accuracy and temporal extrapolation while preserving average treatment effect (ATE) recovery as statistical baselines degrade. On Swiss primary-care EHRs from adults with type 2 diabetes initiating a GLP-1 receptor agonist or SGLT-2 inhibitor, the pipeline estimates a per-protocol weight-loss difference of -0.98 kg [95% CI -1.01, -0.96] favoring GLP-1 receptor agonists, consistent with randomized evidence and obtained from realistically incomplete real-world EHRs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05125v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Olivia Jullian Parra, Sara Zoccheddu, David Catalan Cerezo, Tom Forzy, Franziska Ulrich, William Sutcliffe, Jakob Martin Burgstaller, Oliver Senn, Patrick Owen, Nicola Serra</dc:creator>
    </item>
    <item>
      <title>ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation</title>
      <link>https://arxiv.org/abs/2605.05126</link>
      <description>arXiv:2605.05126v1 Announce Type: new 
Abstract: Current Vision-Language-Action (VLA) models primarily focus on mapping 2D observations to actions, but exhibit notable limitations in spatiotemporal perception and reasoning: 1) spatial representations often rely on additional sensors, introducing substantial computational overhead; 2) visual reasoning is typically limited to future-frame prediction, lacking alignment with the instruction-grounded scene and thus compromising spatiotemporal consistency. To address these challenges, we propose ConsisVLA-4D, a unified and efficient framework that enhances spatiotemporal consistency in 3D perception and 4D reasoning. Specifically, we design: 1) CV-Aligner, which ensures cross-view object semantic consistency by filtering instruction-relevant regions and aligning object identities across multiple viewpoints; 2) CO-Fuser, which guarantees cross-object spatial geometric consistency by eliminating spatial relation ambiguities between objects across views using compact latent representations. Building upon these, we introduce 3) CS-Thinker to achieve cross-scene spatiotemporal consistency as actions unfold. It learns implicit knowledge of local dynamics from object-semantic tokens of CV-Aligner and global depth from geometric tokens of CO-Fuser, thereby enhancing efficient visual reasoning under scene variations. Extensive experiments demonstrate that, benefiting from its efficient spatiotemporal consistency design, ConsisVLA-4D achieves 21.6% and 41.5% performance improvements, along with 2.3-fold and 2.4-fold inference speedups compared to OpenVLA on the LIBERO benchmark and real-world platforms, respectively. ConsisVLA-4D is open-sourced and publicly available at</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05126v1</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wei Li, Jizhihui Liu, Li Yixing, Junwen Tong, Rui Shao, Liqiang Nie</dc:creator>
    </item>
    <item>
      <title>BDF2-type integrator for Landau-Lifshitz-Gilbert equation in micromagnetics: a-priori error estimates</title>
      <link>https://arxiv.org/abs/2605.05129</link>
      <description>arXiv:2605.05129v1 Announce Type: new 
Abstract: We consider the Landau-Lifshitz-Gilbert equation (LLG), which models time-dependent micromagnetic phenomena. We analyze a fully discrete scheme that combines first-order finite elements in space with a BDF2 method in time. The method requires the solution of only one linear system of equations per time step and does not enforce the pointwise unit-length constraint of the magnetization. While unconditional weak convergence has been analyzed in an earlier work, we now prove optimal-order convergence rates under sufficient regularity assumptions on the exact solution and the external field. In combination with our previous work, this establishes the first higher-order-in-time and linear integrator that converges both to weak and strong solutions of LLG. Numerical experiments confirm first-order convergence in space and second-order convergence in time.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05129v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Michele Aldé, Dirk Praetorius, Michael Feischl</dc:creator>
    </item>
    <item>
      <title>Transformed Latent Variable Multi-Output Gaussian Processes</title>
      <link>https://arxiv.org/abs/2605.05133</link>
      <description>arXiv:2605.05133v1 Announce Type: new 
Abstract: Multi-Output Gaussian Processes (MOGPs) provide a principled probabilistic framework for modelling correlated outputs but face scalability bottlenecks when applied to datasets with high-dimensional output spaces. To maintain tractability, existing methods typically resort to restrictive assumptions, such as employing low-rank or sum-of-separable kernels, which can limit expressiveness. We propose the Transformed Latent Variable MOGP (T-LVMOGP), a novel framework that scales MOGPs to a massive number of outputs while preserving the capacity to capture meaningful inter-output dependencies. T-LVMOGP constructs a flexible multi-output deep kernel by mapping inputs and output-specific latent variables into an embedding space using a Lipschitz-regularised neural network. Combined with stochastic variational inference, our model effectively scales to high-dimensional output settings. Across diverse benchmarks, including climate modelling with over 10,000 outputs and zero-inflated spatial transcriptomics data, T-LVMOGP outperforms baselines in both predictive accuracy and computational efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05133v1</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiaoyu Jiang, Xinxing Shi, Sokratia Georgaka, Magnus Rattray, Mauricio A Álvarez</dc:creator>
    </item>
    <item>
      <title>Low-Cost Black-Box Detection of LLM Hallucinations via Dynamical System Prediction</title>
      <link>https://arxiv.org/abs/2605.05134</link>
      <description>arXiv:2605.05134v1 Announce Type: new 
Abstract: Large Language Models (LLMs) frequently generate plausible but non-factual content, a phenomenon known as hallucination. While existing detection methods typically rely on computationally expensive sampling-based consistency checks or external knowledge retrieval, we propose a new method that treats the LLM as a black-box dynamical system. By projecting LLM responses into a high-dimensional manifold via an embedding model, we characterize the resulting vector sequences as observable realizations of the model's latent state-space dynamics. Leveraging Koopman operator theory, we fit the transition operators for both factual and hallucinated regimes and define a differential residual score based on their respective prediction errors. To accommodate varying user requirements and domain-specific sensitivities, we introduce a preference-aware calibration mechanism that optimizes the classification threshold based on a small set of demonstrations. This approach enables low-cost hallucination detection in a single-sample pass, avoiding the need for secondary sampling or external grounding. Extensive testing across three benchmarks demonstrates that our method achieves state-of-the-art performance with reduced resource overhead.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05134v1</guid>
      <category>cs.LG</category>
      <category>math.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dan Wilson, Mohamed Akrout</dc:creator>
    </item>
    <item>
      <title>CPCANet: Deep Unfolding Common Principal Component Analysis for Domain Generalization</title>
      <link>https://arxiv.org/abs/2605.05136</link>
      <description>arXiv:2605.05136v1 Announce Type: new 
Abstract: Domain Generalization (DG) aims to learn representations that remain robust under out-of-distribution (OOD) shifts and generalize effectively to unseen target domains. While recent invariant learning strategies and architectural advances have achieved strong performance, explicitly discovering a structured domain-invariant subspace through second-order statistics remains underexplored. In this work, we propose CPCANet, a novel framework grounded in Common Principal Component Analysis (CPCA), which unrolls the iterative Flury-Gautschi (FG) algorithm into fully differentiable neural layers. This approach integrates the statistical properties of CPCA into an end-to-end trainable framework, enforcing the discovery of a shared subspace across diverse domains while preserving interpretability. Experiments on four standard DG benchmarks demonstrate that CPCANet achieves state-of-the-art (SOTA) performance in zero-shot transfer. Moreover, CPCANet is architecture-agnostic and requires no dataset-specific tuning, providing a simple and efficient approach to learning robust representations under distribution shift. Code is available at https://github.com/wish44165/CPCANet.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05136v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yu-Hsi Chen, Abd-Krim Seghouane</dc:creator>
    </item>
    <item>
      <title>Executable World Models for ARC-AGI-3 in the Era of Coding Agents</title>
      <link>https://arxiv.org/abs/2605.05138</link>
      <description>arXiv:2605.05138v1 Announce Type: new 
Abstract: We evaluate an initial coding-agent system for ARC-AGI-3 in which the agent maintains an executable Python world model, verifies it against previous observations, refactors it toward simpler abstractions as a practical proxy for an MDL-like simplicity bias, and plans through the model before acting. The system is intentionally direct: it uses a scripted controller, predefined world-model interfaces, verifier programs, and a plan executor, but no hand-coded game-specific logic. We report results on the 25 public ARC-AGI-3 games. Each recorded playthrough uses a fresh agent instance with no access to previous playthrough-specific files or conversation state. Most games have a single recorded playthrough; for a few games, we report multiple independent fresh-agent playthroughs to expose run-to-run variability. The agent fully solved 7 games, achieved a Relative Human Action Efficiency (RHAE) greater than 75% on 6 games, and obtained a mean per-game RHAE of 32.58%. Because the system uses no game-specific code, it can serve as a game-general baseline for ARC-AGI-3. Performance on the private validation set remains to be tested. Overall, the results provide preliminary evidence that verifier-driven executable world models are a promising approach for ARC-AGI-3 agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05138v1</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sergey Rodionov</dc:creator>
    </item>
    <item>
      <title>Human-AI Co-Mentorship in Project-Based Learning: A Case Study in Financial Forecasting</title>
      <link>https://arxiv.org/abs/2605.05144</link>
      <description>arXiv:2605.05144v1 Announce Type: new 
Abstract: This paper reflects on an AI research project carried out by a team of high-school and early-undergraduate students under the mentorship of graduate researchers and ably assisted by AI tools. We share our experience not only of the learning process for the high-school students, but also of how AI tools accelerated the work, enabling the students to focus on higher-order problem formulation and solution. Although the participants entered the project with limited background in both AI and finance, they showed strong enthusiasm for technical market analysis and ETF price prediction. Traditional learning settings would first teach the necessary methods in a classroom and only later let students apply them. In contrast, our project emphasized workflow design: students identified the sequence of steps needed to address the problem and then used AI-driven tools to execute each step.
  We note that the high-school students developed the necessary code by iterating with the AI tools, and we used our daily stand-ups to debug and answer conceptual questions. Each student was able to dig deeper into their area of interest, whether computer science or finance, while the team collaboratively made a significant advance over the summer of 2025. This project was an important pedagogical exercise in how AI tools can be used for mentoring high-school students, allowing them to pursue their specific interests while the daily stand-ups focused on problem definition and conceptual understanding. Despite their limited technical qualifications, the students were able to leverage AI tools to build meaningful models with real-world applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05144v1</guid>
      <category>cs.LG</category>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Freyaa Chawla, Ahan Chawla, Rishi Singh, Joe Germino, Grigorii Khvatskii</dc:creator>
    </item>
    <item>
      <title>Toward a Risk Assessment Framework for Institutional DeFi: A Nine-Dimension Approach</title>
      <link>https://arxiv.org/abs/2605.05145</link>
      <description>arXiv:2605.05145v1 Announce Type: new 
Abstract: Decentralized finance (DeFi) protocols now intermediate over USD 100 billion in value, including regulated stablecoins and tokenized assets deployed as collateral, yet no widely adopted framework operationalizes risk assessment with the rigor that institutional adoption demands. Existing approaches emphasize protocol-specific parameter optimization or conceptual taxonomies without providing explainable, composability-aware, and structurally independent assessment methodologies.
  We propose a nine-dimension DeFi risk assessment framework extending the six-dimension taxonomy introduced by Moody's Analytics and Gauntlet with three novel dimensions: composability risk, comprehension debt, and temporal risk dynamics. We additionally introduce a transparency confidence modifier separating assessment reliability from risk severity.
  The framework is grounded in structural analysis of protocol dependencies conducted through an ontology-based protocol intelligence infrastructure covering more than 8,000 DeFi protocols. We retrospectively analyze 12 major DeFi-related incidents from 2024-2026 representing approximately USD 2.5 billion in direct losses. Five of the 12 incidents require at least one novel dimension for complete root-cause characterization, including the two highest-systemic-impact events in the dataset.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05145v1</guid>
      <category>cs.DC</category>
      <category>cs.CR</category>
      <category>cs.CY</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Eva Oberholzer (ZWING Intelligence AG, Zug, Switzerland), Valeriy Zamaraiev (ZWING Intelligence AG, Zug, Switzerland)</dc:creator>
    </item>
    <item>
      <title>What Matters in Practical Learned Image Compression</title>
      <link>https://arxiv.org/abs/2605.05148</link>
      <description>arXiv:2605.05148v1 Announce Type: new 
Abstract: One of the major differentiators unlocked by learned codecs relative to their hard-coded traditional counterparts is their ability to be optimized directly to appeal to the human visual system. Despite this potential, a perceptual yet practical image codec has yet to be proposed.
  In this work, we aim to close this gap. We conduct a comprehensive study of the key modeling choices that govern the design of a practical learned image codec, jointly optimized for perceptual quality and runtime, and include several novel techniques among the ablations. We then perform performance-aware neural architecture search over millions of backbone configurations to identify models that achieve the target on-device runtime while maximizing compression performance as captured by perceptual metrics.
  We combine the various optimizations to construct a new codec that achieves a significantly improved tradeoff between speed and perceptual quality. Based on rigorous subjective user studies, it provides 2.3-3x bitrate savings against AV1, AV2, VVC, ECM and JPEG-AI, and 20-40% bitrate savings against the best learned codec alternatives. At the same time, on an iPhone 17 Pro Max, it encodes 12MP images in as little as 230 ms and decodes them in 150 ms, faster than most top ML-based codecs run on a V100 GPU.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05148v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kedar Tatwawadi, Parisa Rahimzadeh, Zhanghao Sun, Zhiqi Chen, Ziyun Yang, Sanjay Nair, Divija Hasteer, Oren Rippel</dc:creator>
    </item>
    <item>
      <title>Superposition Is Not Necessary: A Mechanistic Interpretability Analysis of Transformer Representations for Time Series Forecasting</title>
      <link>https://arxiv.org/abs/2605.05151</link>
      <description>arXiv:2605.05151v1 Announce Type: new 
Abstract: Transformer architectures have been widely adopted for time series forecasting, yet whether the representational mechanisms that make them powerful in NLP actually engage on time series data remains unexplored. The persistent competitiveness of simple linear models such as DLinear has fueled ongoing debate, but no mechanistic explanation for this phenomenon has been offered. We address this gap by applying sparse autoencoders (SAEs), a tool from mechanistic interpretability, to probe the internal representations of PatchTST. We first establish that a single-layer, narrow-dimensional transformer matches the forecasting performance of deeper configurations across commonly used benchmarks. We then train SAEs on the post-GELU intermediate FFN activations with dictionary sizes ranging from 0.5x to 4.0x the native dimensionality. Expanding the dictionary yields negligible downstream performance change (average 0.214%), with large portions of overcomplete dictionaries remaining inactive. Targeted causal interventions on dominant latent features produce minimal forecast perturbation. Across all evaluated settings, we observe no empirical evidence that the analyzed FFN representations rely on strong superposition. Instead, the representations remain sparse, stable under aggressive dictionary expansion, and largely insensitive to latent interventions. These results demonstrate that superposition is not necessary for competitive performance on standard forecasting benchmarks, suggesting that these benchmarks may not demand the rich compositional representations that drive transformer success in language modeling, and helping explain the persistent competitiveness of simple linear models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05151v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alper Yıldırım</dc:creator>
    </item>
    <item>
      <title>Age of Gossip in Ring Networks With Non-Poisson Updates</title>
      <link>https://arxiv.org/abs/2605.05152</link>
      <description>arXiv:2605.05152v1 Announce Type: new 
Abstract: We consider a network consisting of $n$ nodes connected in a ring formation and a source that generates updates according to a renewal process and disseminates them to the ring network according to a Poisson process. The nodes in the network gossip with each other according to a push-based gossiping protocol, and disseminate version updates. Gossip between two neighbors happens at the arrivals of renewal processes with finite mean and variance. All renewal processes and Poisson processes in the network are independent but not identically distributed. We consider both uni-directional ring networks and bi-directional ring networks. We use version age of information to quantify the freshness of information at each node. Prior work has used the stochastic hybrid systems (SHS) approach or a first passage percolation (FPP) approach to analyze ring networks with edges following identical Poisson processes. In this work, we use a sample-path backtracking approach to characterize the probabilistic scaling of the version age of information of an arbitrary node in the gossip network, where each edge follows an independent but not identically distributed renewal process. We show that the version age of information of any node in the network is stochastically equivalent to $\sqrt{n}$ at any time instant after the node has received its first update from the source.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05152v1</guid>
      <category>cs.IT</category>
      <category>cs.NI</category>
      <category>cs.SI</category>
      <category>eess.SP</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Arunabh Srivastava, Sennur Ulukus</dc:creator>
    </item>
    <item>
      <title>Aes3D: Aesthetic Assessment in 3D Gaussian Splatting</title>
      <link>https://arxiv.org/abs/2605.05155</link>
      <description>arXiv:2605.05155v1 Announce Type: new 
Abstract: As 3D Gaussian Splatting (3DGS) gains attention in immersive media and digital content creation, assessing the aesthetics of 3D scenes becomes important in helping creators build more visually compelling 3D content. However, existing evaluation methods for 3D scenes primarily emphasize reconstruction fidelity and perceptual realism, largely overlooking higher-level aesthetic attributes such as composition, harmony, and visual appeal. This limitation comes from two key challenges: (1) the absence of general 3DGS datasets with aesthetic annotations, and (2) the intrinsic nature of 3DGS as a low-level primitive representation, which makes it difficult to capture high-level aesthetic features. To address these challenges, we propose Aes3D, the first systematic framework for assessing the aesthetics of 3D neural rendering scenes. Aes3D includes Aesthetic3D, the first dataset dedicated to 3D scene aesthetic assessment, built on our proposed annotation strategy for 3D scene aesthetics. In addition, we present Aes3DGSNet, a lightweight model that directly predicts scene-level aesthetic scores from 3DGS representations. Notably, our model operates solely on 3D Gaussian primitives, eliminating the need for rendering multi-view images and thus reducing computational cost and hardware requirements. Through aesthetics-supervised learning on multi-view 3DGS scene representations, Aes3DGSNet effectively captures high-level aesthetic cues and accurately regresses aesthetic scores. Experimental results demonstrate that our approach achieves strong performance while maintaining a lightweight design, establishing a new benchmark for 3D scene aesthetic assessment. Code and datasets will be made available in a future version.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05155v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Chuanzhi Xu, Boyu Wei, Haoxian Zhou, Xuanhua Yin, Zihan Deng, Haodong Chen, Qiang Qu, Weidong Cai</dc:creator>
    </item>
    <item>
      <title>PSK at SemEval-2026 Task 9: Multilingual Polarization Detection Using Ensemble Gemma Models with Synthetic Data Augmentation</title>
      <link>https://arxiv.org/abs/2605.05159</link>
      <description>arXiv:2605.05159v1 Announce Type: new 
Abstract: We present our system for SemEval-2026 Task 9: Multilingual Polarization Detection, a binary classification task spanning 22 languages. Our approach fine-tunes separate Gemma 3 models (12B and 27B parameters) per language using Low-Rank Adaptation (LoRA), augmented with synthetic data generated by a large language model (LLM). We employ three synthetic data strategies (direct generation, paraphrasing, and contrastive pair creation) using GPT-4o-mini, with a multi-stage quality filtering pipeline including embedding-based deduplication. We find that per-language threshold tuning on the development set yields F1 improvements of 2 to 4% without retraining. We also use weighted ensembles of 12B and 27B model predictions with per-language strategy selection. Our final system achieves a mean macro-F1 of 0.811 across all 22 languages, ranking 2nd overall among the participating teams, with 1st-place finishes in 3 languages and top-3 finishes in 8 languages. We also find that alternative architectures (XLM-RoBERTa, Qwen3) that showed strong development set performance suffered F1 drops of 30 to 50% on the test set, highlighting the importance of generalization.</description>
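The per-language threshold tuning described in this abstract can be illustrated with a minimal sketch, assuming each language's dev-set predictions are available as positive-class probabilities with binary gold labels. The candidate grid and the helper names `macro_f1` and `tune_threshold` are hypothetical illustrations, not the authors' code.

```python
def macro_f1(labels, preds):
    """Macro-averaged F1 over the two classes {0, 1}."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
        fp = sum(1 for y, p in zip(labels, preds) if y != cls and p == cls)
        fn = sum(1 for y, p in zip(labels, preds) if y == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def tune_threshold(probs, labels, grid=None):
    """Return (threshold, macro-F1) for the dev-set threshold that maximises macro-F1."""
    if grid is None:
        grid = [i / 100 for i in range(5, 96)]  # candidate thresholds 0.05 ... 0.95
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        preds = [1 if p >= t else 0 for p in probs]
        f1 = macro_f1(labels, preds)
        if f1 > best_f1:  # keep the first (lowest) threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

In this sketch, one would call `tune_threshold` once per language and store the resulting thresholds in a lookup table; the model weights are untouched, which is why no retraining is needed.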
      <guid isPermaLink="false">oai:arXiv.org:2605.05159v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Srikar Kashyap Pulipaka</dc:creator>
    </item>
    <item>
      <title>Private Structured-Subset Retrieval</title>
      <link>https://arxiv.org/abs/2605.05160</link>
      <description>arXiv:2605.05160v1 Announce Type: new 
Abstract: We introduce the \emph{Private Structured-Subset Retrieval (PSSR)} problem, where a user retrieves $D$ messages from a database of $K$ messages replicated across $N$ non-colluding servers, and the demand is restricted to a known structured family of $D$-subsets. This formulation generalizes classical Private Information Retrieval (PIR) and multi-message PIR (MPIR), and captures settings where the demand space is constrained by application-specific structure. Focusing on balanced ${\{0,1\}}$-linear schemes, we derive converse bounds on the maximum retrieval rate and minimum subpacketization level, and develop an optimization-based framework for constructing schemes for general structured demand families. Our results show that, for certain families, the PSSR rate converse bound can exceed the best-known MPIR rate upper bound; when this PSSR bound is achievable, MPIR rate-optimal schemes become suboptimal for those families. By exploiting demand structure, our PSSR schemes achieve higher retrieval rates for many families and never underperform the best-known balanced ${\{0,1\}}$-linear MPIR schemes. Our results also show that demand structure can reduce the required subpacketization even when the optimal rate is unchanged. Our parallel work on contiguous-demand families further illustrates the scope of this framework by yielding rate-optimal schemes with substantially smaller subpacketization and no field-size restrictions, improving upon MPIR-based schemes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05160v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Maha Issa, Anoosheh Heidarzadeh</dc:creator>
    </item>
    <item>
      <title>Wasserstein-Aligned Localisation for VLM-Based Distributional OOD Detection in Medical Imaging</title>
      <link>https://arxiv.org/abs/2605.05161</link>
      <description>arXiv:2605.05161v1 Announce Type: new 
Abstract: Zero-shot anomaly localisation via vision-language models (VLMs) offers a compelling approach for rare pathology detection, yet its performance is fundamentally limited by the absence of healthy anatomical context. We reformulate zero-shot localisation as a comparative inference problem in which anomalies are identified through structured comparison against reference distributions of normal anatomy. We introduce WALDO, a training-free framework grounded in optimal transport theory that enables comparative reasoning through: (i) entropy-weighted Sliced Wasserstein distances for anatomically aware reference selection from DINOv2 patch distributions, (ii) Goldilocks zone sampling exploiting the non-monotonic relationship between reference similarity and localisation accuracy, and (iii) self-consistency aggregation via weighted non-maximum suppression. We theoretically analyse the Goldilocks effect through distributional divergence, and show that references with moderate similarity optimally balance a bias-variance trade-off in comparative visual reasoning. On the NOVA brain MRI benchmark, WALDO with Qwen2.5-VL-72B achieves $43.5_{\pm1.6}\%$ mAP@30 (95% CI: [40.4, 46.7]), representing a 19% relative improvement over zero-shot baselines. Cross-model evaluation shows consistent gains: GPT-4o achieves $32.0_{\pm6.5}\%$ and Qwen3-VL-32B achieves $32.0_{\pm6.6}\%$ mAP@30. Paired McNemar tests confirm statistical significance ($p&lt;0.01$). Source code is available at https://github.com/bkainz/WALDO_MICCAI26_demo .</description>
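To make the Sliced Wasserstein ingredient concrete, here is a minimal Monte Carlo sketch of the plain (unweighted) sliced Wasserstein-1 distance between two equal-size point sets, such as two sets of patch embeddings. It does not reproduce WALDO's entropy weighting or its DINOv2 feature extraction; the function name, `n_proj`, and the equal-size restriction are illustrative assumptions.

```python
import math
import random


def sliced_wasserstein(x, y, n_proj=64, seed=0):
    """Monte Carlo sliced Wasserstein-1 distance between two equal-size point sets.

    x, y: lists of d-dimensional points (lists of floats) with the same length.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal-size point sets")
    d = len(x[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_proj):
        # Draw a random direction uniformly on the unit sphere.
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # Project both sets onto theta; in 1-D the optimal coupling pairs sorted values.
        px = sorted(sum(a * b for a, b in zip(p, theta)) for p in x)
        py = sorted(sum(a * b for a, b in zip(q, theta)) for q in y)
        total += sum(abs(a - b) for a, b in zip(px, py)) / len(px)
    return total / n_proj
```

Sorting suffices because, in one dimension, the optimal transport plan between equal-size empirical distributions matches points in sorted order, which is what makes the sliced variant cheap compared with solving a full transport problem.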
      <guid isPermaLink="false">oai:arXiv.org:2605.05161v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bernhard Kainz, Johanna P Mueller, Matthew Baugh, Cosmin Bercea</dc:creator>
    </item>
    <item>
      <title>PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World</title>
      <link>https://arxiv.org/abs/2605.05163</link>
      <description>arXiv:2605.05163v1 Announce Type: new 
Abstract: Synthesizing physics-grounded 3D assets is a critical bottleneck for interactive virtual worlds and embodied AI. Existing methods predominantly focus on static geometry, overlooking the functional properties essential for interaction. We propose that interactive asset generation must be rooted in functional logic and hierarchical physics. To bridge this gap, we introduce PhysForge, a decoupled two-stage framework supported by PhysDB, a large-scale dataset of 150,000 assets with four-tier physical annotations. First, a VLM acts as a "physical architect" to plan a "Hierarchical Physical Blueprint" defining material, functional, and kinematic constraints. Second, a physics-grounded diffusion model realizes this blueprint by synthesizing high-fidelity geometry alongside precise kinematic parameters via a novel KineVoxel Injection (KVI) mechanism. Experiments demonstrate that PhysForge produces functionally plausible, simulation-ready assets, providing a robust data engine for interactive 3D content and embodied agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05163v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yunhan Yang, Chunshi Wang, Junliang Ye, Yang Li, Zanxin Chen, Zehuan Huang, Yao Mu, Zhuo Chen, Chunchao Guo, Xihui Liu</dc:creator>
    </item>
    <item>
      <title>Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation</title>
      <link>https://arxiv.org/abs/2605.05164</link>
      <description>arXiv:2605.05164v1 Announce Type: new 
Abstract: Accurate analysis of histopathological images is critical for disease diagnosis and treatment planning. Whole-slide images (WSIs), which digitize tissue specimens at gigapixel resolution, are fundamental to this process but require aggregating thousands of patches for slide-level predictions. Multiple Instance Learning (MIL) tackles this challenge with a two-stage paradigm, decoupling tile-level embedding and slide-level prediction. However, most existing methods implicitly embed patch representations in homogeneous Euclidean spaces, overlooking the hierarchical organization and regional heterogeneity of pathological tissues. This limits current models' ability to capture global tissue architecture and fine-grained cellular morphology. To address this limitation, we introduce a hybrid hyperbolic-Euclidean representation that embeds WSI features in dual geometric spaces, enabling complementary modeling of hierarchical tissue structures and local morphological details. Building on this formulation, we develop BatMIL, a WSI classification framework that leverages both geometric spaces. To model long-range dependencies among thousands of patches, we employ a structured state space sequence model (S4) backbone that encodes patch sequences with linear computational complexity. Furthermore, to account for regional heterogeneity, we introduce a chunk-level mixture-of-experts (MoE) module that groups patches into regions and dynamically routes them to specialized subnetworks, improving representational capacity while reducing redundant computation. Extensive experiments on seven WSI datasets spanning six cancer types demonstrate that BatMIL consistently outperforms state-of-the-art MIL approaches in slide-level classification tasks. These results indicate that geometry-aware representation learning offers a promising direction for next-generation computational pathology.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05164v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Enhui Chai, Sicheng Chen, Tianyi Zhang, Chad Wong, Kecheng Huang, Zeyu Liu, Fei Xia</dc:creator>
    </item>
    <item>
      <title>Interests Burn-down Diffusion Process for Personalized Collaborative Filtering</title>
      <link>https://arxiv.org/abs/2605.05165</link>
      <description>arXiv:2605.05165v1 Announce Type: new 
Abstract: Generative methods have gained widespread attention in Collaborative Filtering (CF) tasks for their ability to produce high-quality personalized samples aligned with users' interests. Among them, diffusion generative models have attracted increasing attention in the recommendation field. Although pioneering efforts have applied the conventional diffusion process to model diffusive user interests, the incongruity between Gaussian noise and the subtle nature of users' personalized interaction behavior has led to sub-optimal results. To this end, we introduce a diffusion scheme specifically tailored to interaction systems, namely the interests burn-down process. The interests burn-down process delineates the decay of user interests towards candidate items, complemented by its reverse burn-up process, which yields personalized recommendations for users. The inherent burn-down nature of this process adeptly models diffusive user interests, aligning well with the requirements of CF tasks. We present a novel recommendation method, StageCF, to illustrate the superiority of this newly proposed diffusion process. Experimental results demonstrate the effectiveness of StageCF against existing generative and diffusion-based baseline methods. Furthermore, comprehensive studies validate the functionality of the interests burn-down process, shedding light on its capacity to generate personalized interactions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05165v1</guid>
      <category>cs.IR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yifang Qin, Zhaobin Li, Arisa Watanabe, Wei Ju, Zhiping Xiao, Ming Zhang</dc:creator>
    </item>
    <item>
      <title>The First Token Knows: Single-Decode Confidence for Hallucination Detection</title>
      <link>https://arxiv.org/abs/2605.05166</link>
      <description>arXiv:2605.05166v1 Announce Type: new 
Abstract: Self-consistency detects hallucinations by generating multiple sampled answers to a question and measuring agreement, but this requires repeated decoding and can be sensitive to lexical variation. Semantic self-consistency improves this by clustering sampled answers by meaning using natural language inference, but it adds both sampling cost and external inference overhead. We show that first-token confidence, phi_first, computed from the normalized entropy of the top-K logits at the first content-bearing answer token of a single greedy decode, matches or modestly exceeds semantic self-consistency on closed-book short-answer factual question answering. Across three 7-8B instruction-tuned models and two benchmarks, phi_first achieves a mean AUROC of 0.820, compared with 0.793 for semantic agreement and 0.791 for standard surface-form self-consistency. A subsumption test shows that phi_first is moderately to strongly correlated with semantic agreement, and combining the two signals yields only a small AUROC improvement over phi_first alone. These results suggest that much of the uncertainty information captured by multi-sample agreement is already available in the model's initial token distribution. We argue that phi_first should be reported as a default low-cost baseline before invoking sampling-based uncertainty estimation.</description>
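A minimal sketch of a first-token confidence score, assuming the caller has already located the first content-bearing answer token of the greedy decode and extracted its logit vector. The abstract states only that phi_first is computed from the normalized entropy of the top-K logits; the mapping phi = 1 - H / log K used here, the default `k=10`, and the function name are assumptions.

```python
import math


def first_token_confidence(logits, k=10):
    """Confidence from the normalised entropy of the top-k logits at one token position.

    Assumed mapping: phi = 1 - H(p) / log(k), where p is the softmax over the k
    largest logits. Returns a value in [0, 1]; higher means more confident.
    """
    top = sorted(logits, reverse=True)[:k]
    k = len(top)
    if k == 1:
        return 1.0  # a single candidate carries no uncertainty
    m = max(top)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(z - m) for z in top]
    total = sum(exps)
    probs = [e / total for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(k)
```

Because the score needs only one forward pass over the already-decoded answer, it costs a single greedy decode, in contrast to the multiple samples required by self-consistency.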
      <guid isPermaLink="false">oai:arXiv.org:2605.05166v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mina Gabriel</dc:creator>
    </item>
    <item>
      <title>Deterministic identification for Bernoulli channels and related channels with continuous input</title>
      <link>https://arxiv.org/abs/2605.05168</link>
      <description>arXiv:2605.05168v1 Announce Type: new 
Abstract: For memoryless channels with continuous input alphabets, deterministic identification (DI) typically exhibits a linearithmic ($n\log n$) message growth. However, the exact DI capacity has long remained open due to a persistent gap between the best known achievability and converse bounds. This gap was recently closed for AWGN channels via a novel code construction optimising the "galaxy" codes. Here, we extend this approach to the Bernoulli channel and subsequently to any channel $W$ whose image contains a continuous curve of output probability distributions, and hence admits a reduction to the Bernoulli channel restricted to a subinterval of inputs. As a consequence, we prove that the converse bound is tight and establish $\dot{C}_{\text{DI}}(W) = \frac 12$ for this broad class of channels, thereby closing the long-standing capacity gap. A similar gap was also observed for the DI rate-reliability tradeoff. We analyse the tradeoff between rate and error of the proposed code and derive improved lower bounds on the reliability function, approaching the converse at leading order in the regime of small error exponents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05168v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pau Colomer, Christian Deppe, Holger Boche, Andreas Winter</dc:creator>
    </item>
    <item>
      <title>Private Contiguous-Block Retrieval</title>
      <link>https://arxiv.org/abs/2605.05169</link>
      <description>arXiv:2605.05169v1 Announce Type: new 
Abstract: We introduce the \emph{Private Contiguous-Block Retrieval (PCBR)} problem, where a user retrieves a block of $D$ messages with contiguous indices from $K$ replicated messages stored across $N$ non-colluding servers, while hiding the identity of the requested block from each server. This problem is motivated by storage and streaming systems where files are split into ordered segments. Unlike multi-message Private Information Retrieval (MPIR), where any $D$-subset may be requested, PCBR restricts the demand family to contiguous blocks. This relaxation raises a natural question: Can this structure be exploited to improve retrieval efficiency? We answer this question for balanced $\{0,1\}$-linear schemes. We establish an upper bound on the achievable retrieval rate for all problem parameters, derive a lower bound on the subpacketization level required by any scheme achieving the rate upper bound, and construct a rate-optimal scheme whose subpacketization level matches the lower bound for a broad range of problem parameters. Although the optimal PCBR rate coincides with the best-known MPIR rate converse bound, existing MPIR schemes can be suboptimal for PCBR and can require a much larger subpacketization level. In contrast, our scheme exploits the contiguous-block structure to achieve the optimal rate with reduced subpacketization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05169v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Maha Issa, Anoosheh Heidarzadeh</dc:creator>
    </item>
    <item>
      <title>Design Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hours</title>
      <link>https://arxiv.org/abs/2605.05170</link>
      <description>arXiv:2605.05170v1 Announce Type: new 
Abstract: Driven by a rapid co-evolution of both harness and underlying models, LLM agents are improving at a dizzying pace. In our prior work (performed in Dec. 2025), we introduced "Design Conductor" (or just "Conductor"), a system capable of building a 5-stage Linux-capable RISC-V CPU in 12 hours. In this work, we introduce an updated multi-agent harness powered by frontier models released in April 2026, which is able to handle 80x larger tasks, at higher quality, fully autonomously. Following a brief introduction, we examine 4 designs that the system produced autonomously, including "VerTQ", an LLM inference accelerator which hard-wires support for TurboQuant in a 240-cycle pipeline, starting from the TurboQuant arXiv paper. VerTQ includes heavy compute processing, with 5129 FP16/32 units; the design was mapped to an FPGA at 125 MHz and consumes 5.7 mm^2 in TSMC 16FF (8 attention pipes). We review the key new characteristics that enabled these results. Finally, we analyze Design Conductor's token usage and other empirical characteristics, including its limitations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05170v1</guid>
      <category>cs.AR</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>The Verkor Team, Ravi Krishna, Suresh Krishna, David Chin</dc:creator>
    </item>
    <item>
      <title>When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2605.05172</link>
      <description>arXiv:2605.05172v1 Announce Type: new 
Abstract: Behavior Cloning (BC) has emerged as a highly effective paradigm for robot learning. However, BC lacks a self-guided mechanism for online improvement after demonstrations have been collected. Existing offline-to-online learning methods often cause policies to overwrite previously learned good actions due to a distribution mismatch between offline data and online learning. In this work, we propose Q2RL, Q-Estimation and Q-Gating from BC for Reinforcement Learning, an algorithm for efficient offline-to-online learning. Our method consists of two parts: (1) Q-Estimation extracts a Q-function from a BC policy using a few interaction steps with the environment, followed by online RL with (2) Q-Gating, which switches between BC and RL policy actions based on their respective Q-values to collect samples for RL policy training. Across manipulation tasks from D4RL and robomimic benchmarks, Q2RL outperforms SOTA offline-to-online learning baselines on success rate and time to convergence. Q2RL is efficient enough to be applied in an on-robot RL setting, learning robust policies for contact-rich and high-precision manipulation tasks such as pipe assembly and kitting in 1-2 hours of online interaction, achieving success rates of up to 100% and up to a 3.75x improvement over the original BC policy. Code and video are available at https://pages.rai-inst.com/q2rl_website/</description>
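The Q-Gating step can be sketched as an arg-max over the two candidate actions, assuming callables for both policies and for their Q-value estimates. Whether Q2RL uses one shared critic or separate estimates per policy, and how it breaks ties, is not stated in the abstract; this hypothetical sketch accepts one Q-function per policy (the same critic may be passed twice) and keeps the BC action on ties.

```python
def q_gated_action(state, bc_policy, rl_policy, q_bc, q_rl):
    """Return (action, source) for whichever policy's action has the higher Q-value.

    bc_policy, rl_policy: callables mapping a state to an action.
    q_bc, q_rl: callables (state, action) -> estimated Q-value for each policy.
    """
    a_bc = bc_policy(state)
    a_rl = rl_policy(state)
    # Ties keep the BC action, so demonstrated behaviour is only overridden
    # when the RL action is estimated to be strictly better.
    if q_rl(state, a_rl) > q_bc(state, a_bc):
        return a_rl, "rl"
    return a_bc, "bc"


if __name__ == "__main__":
    # Toy 1-D example: a hypothetical critic that prefers actions near 0.7.
    critic = lambda s, a: -(a - 0.7) ** 2
    action, source = q_gated_action(0.0, lambda s: 0.0, lambda s: 1.0, critic, critic)
    print(action, source)  # the RL action 1.0 is closer to 0.7, so it is chosen
```

The executed action (from either policy) would then be stored as an RL training sample, which is how gating lets the RL policy learn from BC behaviour without discarding it.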
      <guid isPermaLink="false">oai:arXiv.org:2605.05172v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lakshita Dodeja, Ondrej Biza, Shivam Vats, Stephen Hart, Stefanie Tellex, Robin Walters, Karl Schmeckpeper, Thomas Weng</dc:creator>
    </item>
    <item>
      <title>Understanding In-Context Learning for Nonlinear Regression with Transformers: Attention as Featurizer</title>
      <link>https://arxiv.org/abs/2605.05176</link>
      <description>arXiv:2605.05176v1 Announce Type: new 
Abstract: Pre-trained transformers are able to learn from examples provided as part of the prompt without any weight updates, a remarkable ability known as in-context learning (ICL). Despite its demonstrated efficacy across various domains, the theoretical understanding of ICL is still developing. Whereas most existing theory has focused on linear models, we study ICL in the nonlinear regression setting. Through the interaction mechanism in attention, we explicitly construct transformer networks to realize nonlinear features, such as polynomial or spline bases, which span a wide class of functions. Based on this construction, we establish a framework to analyze end-to-end in-context nonlinear regression with the constructed features. Our theory provides finite-sample generalization error bounds in terms of context length and training set size. We numerically validate the theory on synthetic regression tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05176v1</guid>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Alexander Hsu, Zhaiming Shen, Wenjing Liao, Rongjie Lai</dc:creator>
    </item>
    <item>
      <title>Explicit Two-Sided Eigenvalue Bounds for Schrödinger Operators with Singular Potentials via Finite Element Method</title>
      <link>https://arxiv.org/abs/2605.05177</link>
      <description>arXiv:2605.05177v1 Announce Type: new 
Abstract: We present, to the best of our knowledge, the first numerical algorithm for explicit, computable two-sided eigenvalue bounds for Schrödinger operators H = -Delta + V on R^N, N = 2,3, in the presence of both an unbounded potential and an unbounded domain. "Explicit" here means that all constants and ingredients are derived in closed form from the mesh, the potential, and a small set of explicit inequalities (Payne-Weinberger, Hardy, and explicit bounded-domain Sobolev embeddings); the conversion to fully verified (IEEE-754-safe, interval-arithmetic) enclosures is a separate verification step and is left for future work. In particular, singular attractive potentials of Coulomb type, V(x) = -Z/|x|, which model the hydrogen atom and the H_2^+ molecular ion, are covered by the theory. The method combines domain truncation to a bounded domain D(R) containing {|x| &lt;= R} with an extension of Liu's Composite Enriched Crouzeix-Raviart (CECR) finite element method to sign-indefinite potentials. Upper bounds come from the standard conforming Galerkin method; lower bounds come from the CECR construction, whose gap to the exact eigenvalue closes as the mesh is refined. Numerical experiments on the 2D single- and two-centred Coulomb potentials and on the 3D hydrogen atom and H_2^+ molecular ion illustrate the algorithm and confirm the predicted convergence.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05177v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xuefeng Liu</dc:creator>
    </item>
    <item>
      <title>Estimating the expected output of wide random MLPs more efficiently than sampling</title>
      <link>https://arxiv.org/abs/2605.05179</link>
      <description>arXiv:2605.05179v1 Announce Type: new 
Abstract: By far the most common way to estimate an expected loss in machine learning is to draw samples, compute the loss on each one, and take the empirical average. However, sampling is not necessarily optimal. Given an MLP at initialization, we show how to estimate its expected output over Gaussian inputs without running samples through the network at all. Instead, we produce approximate representations of the distributions of activations at each layer, leveraging tools such as cumulants and Hermite expansions. We show both theoretically and empirically that for sufficiently wide networks, our estimator achieves a target mean squared error using substantially fewer FLOPs than Monte Carlo sampling. We find moreover that our methods perform particularly well at estimating the probabilities of rare events, and additionally demonstrate how they can be used for model training. Together, these findings suggest a path to producing models with a greatly reduced probability of catastrophic tail risks.</description>
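The idea of estimating an expected output without pushing samples through the network is easiest to see in the simplest case: a single hidden ReLU layer with standard Gaussian input, where every pre-activation is exactly Gaussian and E[max(z, 0)] has a closed form. This is a hedged illustration only. It is exact for one layer and does not implement the cumulant or Hermite-expansion machinery the paper uses for deeper networks, where pre-activations are no longer exactly Gaussian; all function names are hypothetical.

```python
import math
import random


def relu_gaussian_mean(mu, sigma):
    """E[max(z, 0)] for z ~ N(mu, sigma^2), in closed form."""
    if sigma == 0.0:
        return max(mu, 0.0)
    r = mu / sigma
    pdf = math.exp(-0.5 * r * r) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(r / math.sqrt(2.0)))
    return mu * cdf + sigma * pdf


def expected_output_one_layer(W, b, v):
    """E[v . relu(W x + b)] for x ~ N(0, I), computed without any sampling.

    For Gaussian x, each pre-activation w_i . x + b_i is exactly N(b_i, ||w_i||^2).
    """
    total = 0.0
    for w_i, b_i, v_i in zip(W, b, v):
        sigma = math.sqrt(sum(w * w for w in w_i))
        total += v_i * relu_gaussian_mean(b_i, sigma)
    return total


def monte_carlo_output(W, b, v, n_samples, seed=0):
    """Baseline estimator: average the network output over sampled Gaussian inputs."""
    rng = random.Random(seed)
    d = len(W[0])
    acc = 0.0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        acc += sum(
            v_i * max(sum(w * xj for w, xj in zip(w_i, x)) + b_i, 0.0)
            for w_i, b_i, v_i in zip(W, b, v)
        )
    return acc / n_samples
```

The closed-form estimate costs one pass over the weights, whereas the Monte Carlo error shrinks only as one over the square root of the number of samples; extending the analytic route to many layers is where approximations such as cumulant and Hermite expansions become necessary.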
      <guid isPermaLink="false">oai:arXiv.org:2605.05179v1</guid>
      <category>cs.LG</category>
      <category>cond-mat.dis-nn</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wilson Wu, Victor Lecomte, Michael Winer, George Robinson, Jacob Hilton, Paul Christiano</dc:creator>
    </item>
    <item>
      <title>A Closed-Form Dual-Barrier CBF Safety Filter for Holonomic Robots on Incrementally Built Occupancy Grid Maps</title>
      <link>https://arxiv.org/abs/2605.05182</link>
      <description>arXiv:2605.05182v1 Announce Type: new 
Abstract: We present a dual-barrier control barrier function (CBF) safety filter for real-time, safety-critical velocity control of holonomic robots operating in incrementally built occupancy grid maps. As a robot explores an unknown environment, unmapped regions introduce irreducible uncertainty, since obstacle geometry beyond the explored frontier is unknown, making entry into such regions a source of collision risk, especially with front-facing sensors. To address this, we enforce two constraints: avoidance of mapped obstacles and restriction from unexplored regions. Both constraints are derived analytically from the occupancy grid's signed distance field, yielding a closed-form safety filter that requires only a small linear system solve per cycle. On resource-constrained platforms such as the Raspberry Pi, where SLAM and planning already consume significant compute, the low overhead of the proposed filter preserves resources. An adaptive gain schedule relaxes the frontier constraint in information-rich regions and tightens it in well-mapped areas, improving exploration efficiency while maintaining safety. The filter operates in velocity space as a minimally invasive correction and composes with arbitrary nominal controllers, including learning-based methods. Hardware flight experiments on a PX4-controlled quadrotor demonstrate zero collisions across multiple indoor runs.</description>
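For intuition about why a CBF safety filter can be closed form, here is a minimal sketch for a single barrier on a holonomic single integrator (velocity input u, dynamics x' = u). It solves min ||u - u_nom||^2 subject to grad_h · u + alpha · h ≥ 0 by projecting the nominal command onto a half-space. The barrier value `h` and its gradient `grad_h` (for example, read from a signed distance field) are assumed given, and `alpha` is a hypothetical constant gain; the paper's dual-barrier filter with an adaptive frontier gain, which solves a small linear system, is not reproduced here.

```python
def cbf_filter(u_nom, grad_h, h, alpha=1.0):
    """Minimally invasive velocity correction for one control barrier function.

    Solves  min ||u - u_nom||^2  s.t.  a . u + c >= 0  in closed form,
    with a = grad_h and c = alpha * h (single-integrator dynamics x' = u).
    """
    a = grad_h
    c = alpha * h
    slack = sum(ai * ui for ai, ui in zip(a, u_nom)) + c
    if slack >= 0.0:
        return list(u_nom)  # nominal command already satisfies the constraint
    a_sq = sum(ai * ai for ai in a)
    if a_sq == 0.0:
        raise ValueError("zero barrier gradient: constraint cannot be enforced")
    # Project u_nom onto the boundary of the half-space a . u + c >= 0.
    scale = -slack / a_sq
    return [ui + scale * ai for ui, ai in zip(u_nom, a)]
```

For example, with a wall at x = 1 and h = 1 - x evaluated at x = 0.5 (so grad_h = [-1, 0], h = 0.5), the nominal command [2.0, 1.0] is corrected to [0.5, 1.0]: only the component driving toward the wall is reduced, and motion along the wall is left untouched.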
      <guid isPermaLink="false">oai:arXiv.org:2605.05182v1</guid>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Himanshu Paudel, Basanta Joshi, Dhirendra Raj Madai, Alina Bartaula, Biman Rimal, Sanjay Neupane</dc:creator>
    </item>
    <item>
      <title>OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents</title>
      <link>https://arxiv.org/abs/2605.05185</link>
      <description>arXiv:2605.05185v1 Announce Type: new 
Abstract: Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely due to the absence of open high-quality training data, transparent trajectory synthesis pipelines, and detailed training recipes. To this end, we introduce OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning. First, we build a dedicated pipeline to construct high-quality training data through Wikipedia path sampling, fuzzy entity rewriting, and source-anchor visual grounding, which jointly reduce shortcuts and one-step retrieval collapse. Based on this pipeline, we curate two training datasets, SearchVL-SFT-36k for SFT and SearchVL-RL-8k for RL. In addition, we design a diverse tool environment that unifies text search, image search, OCR, cropping, sharpening, super-resolution, and perspective correction, enabling agents to combine active perception with external knowledge acquisition. Finally, we propose a multi-turn fatal-aware GRPO training algorithm that handles cascading tool failures by masking post-failure tokens while preserving useful pre-failure reasoning through one-sided advantage clamping. Built on this recipe, OpenSearch-VL delivers substantial performance gains, with average improvements of over 10 points across seven benchmarks, and achieves results comparable to proprietary commercial models on several tasks. We will release all data, code, and models to support open research on multimodal deep search agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05185v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shuang Chen, Kaituo Feng, Hangting Chen, Wenxuan Huang, Dasen Dai, Quanxin Shou, Yunlong Lin, Xiangyu Yue, Shenghua Gao, Tianyu Pang</dc:creator>
    </item>
    <item>
      <title>LoViF 2026 The First Challenge on Holistic Quality Assessment for 4D World Model (PhyScore)</title>
      <link>https://arxiv.org/abs/2605.05187</link>
      <description>arXiv:2605.05187v1 Announce Type: new 
Abstract: This paper reports on the LoViF 2026 PhyScore challenge, a competition on holistic quality assessment of world-model-generated videos across both 2D and 4D generation settings. The challenge is motivated by a central gap in current evaluation practice: perceptual quality alone is insufficient to judge whether generated dynamics are physically plausible, temporally coherent, and consistent with the input conditions. Participants are required to build a metric that jointly predicts four dimensions, namely Video Quality, Physical Realism, Condition-Video Alignment, and Temporal Consistency. Beyond these scores, participants must also localize the timestamps of physical anomalies for fine-grained diagnosis.
  The benchmark dataset contains 1,554 videos generated by seven representative world generative models, organized into three tracks (text-2D, image-to-4D, and video-to-4D) and spanning 26 categories. These categories explicitly cover physics-relevant scenarios, including dynamics, optics, and thermodynamics, together with diverse real-world and creative content. To ensure label reliability, scores and anomaly timestamps are produced by trained human annotators and then checked in an additional automated quality-control pass.
  Evaluation is based on both score prediction and anomaly localization, with a composite protocol that combines TimeStamp_IOU and SRCC/PLCC. This report summarizes the challenge design and provides method-level insights from submitted solutions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05187v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wei Luo, Yiting Lu, Xin Li, Haoran Li, Fengbin Guan, Chen Gao, Xin Jin, Yong Li, Zhibo Chen, Sijing Wu, Kang Fu, Yunhao Li, Ziang Xiao, Huiyu Duan, Jing Liu, Qiang Hu, Xiongkuo Min, Guangtao Zhai, Manxi Sun, Zixuan Guo, Yun Li, Ziyang Chen, Manabu Tsukada, Zhengyang Li, Zhenglin Du, Yi Wen, Licheng Jiao, Fang Liu, Lingling Li, Yiwen Ren, Zhilong Song, Dubing Chen, Yucheng Zhou, Tianyi Yan, Huan Zheng</dc:creator>
    </item>
    <item>
      <title>SILC: Lookahead Caching for Short-form Video Delivery Systems</title>
      <link>https://arxiv.org/abs/2605.05188</link>
      <description>arXiv:2605.05188v1 Announce Type: new 
Abstract: Short video platforms like TikTok, Instagram Reels, and YouTube Shorts have gained immense popularity in the last few years and are responsible for a large and growing fraction of Internet traffic. We identify two unique opportunities for improving short video delivery using their existing interactions with content delivery networks (CDNs). First, short videos use a push-based recommendation system, in which the user is presented with a sequence of videos recommended by the algorithm rather than explicitly picking content to watch (as on YouTube). Such push-based short video systems offer a unique opportunity for system design by providing visibility into upcoming requests. Second, the popularity of these videos follows a highly skewed Pareto distribution, leading to geographical and temporal overlap among the videos being served. We leverage these opportunities to build SILC, a lookahead-aware caching system aimed at (i) reducing CDN cache miss rates and (ii) reducing midgress bandwidth between the CDN and the origin server. Our evaluation of SILC uses traces that we collected from real users through (i) an in-person user study and (ii) a data donation program involving 100 TikTok users across the world. Using a combination of these traces, we simulate traffic from 10,000 simultaneous users. Our evaluation shows that, compared to 10 state-of-the-art heuristic and learning-based cache eviction policies, SILC reduces a CDN's midgress costs by 11.1% to 111%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05188v1</guid>
      <category>cs.NI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Maleeha Masood, Shreya Kannan, Om Chabra, Deepak Vasisht, Indranil Gupta</dc:creator>
    </item>
    <item>
      <title>LongSeeker: Elastic Context Orchestration for Long-Horizon Search Agents</title>
      <link>https://arxiv.org/abs/2605.05191</link>
      <description>arXiv:2605.05191v1 Announce Type: new 
Abstract: Long-horizon search agents must manage a rapidly growing working context as they reason, call tools, and observe information. Naively accumulating all intermediate content can overwhelm the agent, increasing costs and the risk of errors. We propose that effective context management should be adaptive: parts of the agent's trajectory are maintained at different levels of detail depending on their current relevance to the task. To operationalize this principle, we introduce Context-ReAct, a general agentic paradigm for elastic context orchestration that integrates reasoning, context management, and tool use in a unified loop. Context-ReAct provides five atomic operations: Skip, Compress, Rollback, Snippet and Delete, which allow the agent to dynamically reshape its working context, preserving important evidence, summarizing resolved information, discarding unhelpful branches, and controlling context size. We prove that the Compress operator is expressively complete, while the other specialized operators provide efficiency and fidelity guarantees that reduce generation cost and hallucination risk. Building on this paradigm, we develop LongSeeker, a long-horizon search agent fine-tuned from Qwen3-30B-A3B on 10k synthesized trajectories. Across four representative search benchmarks, LongSeeker achieves 61.5% on BrowseComp and 62.5% on BrowseComp-ZH, substantially outperforming Tongyi DeepResearch (43.2% and 46.7%) and AgentFold (36.2% and 47.3%). These results highlight the potential of adaptive context management, showing that agents can achieve more reliable and efficient long-horizon reasoning by actively shaping their working memory.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05191v1</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yijun Lu, Rui Ye, Yuwen Du, Jiajun Wang, Songhua Liu, Siheng Chen</dc:creator>
    </item>
    <item>
      <title>Implicit Representations of Grammaticality in Language Models</title>
      <link>https://arxiv.org/abs/2605.05197</link>
      <description>arXiv:2605.05197v1 Announce Type: new 
Abstract: Grammaticality and likelihood are distinct notions in human language. Pretrained language models (LMs), which are probabilistic models of language fitted to maximize corpus likelihood, generate grammatically well-formed text and discriminate well between grammatical and ungrammatical sentences in tightly controlled minimal pairs. However, their string probabilities do not sharply discriminate between grammatical and ungrammatical sentences overall. Do LMs nonetheless implicitly acquire a grammaticality distinction separate from string probability? We explore this question by studying the internal representations of LMs, training a linear probe on a dataset of grammatical and (synthetic) ungrammatical sentences obtained by applying perturbations to a naturalistic text corpus. We find that this simple grammaticality probe generalizes to human-curated grammaticality judgment benchmarks and outperforms LM probability-based grammaticality judgments. On semantic plausibility benchmarks, however, in which both members of a minimal pair are grammatical and differ only in plausibility, the probe performs worse than string probability. The English-trained probe also exhibits nontrivial cross-lingual generalization, outperforming string probabilities on grammaticality benchmarks in numerous other languages. Additionally, probe scores correlate only weakly with string probabilities. Collectively, these results suggest that LMs acquire, to some extent, an implicit grammaticality distinction within their hidden layers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05197v1</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yingshan Susan Wang, Linlu Qiu, Zhaofeng Wu, Roger P. Levy, Yoon Kim</dc:creator>
    </item>
    <item>
      <title>Optimizing Bit-Labeling of Voronoi Constellations</title>
      <link>https://arxiv.org/abs/2605.05202</link>
      <description>arXiv:2605.05202v1 Announce Type: new 
Abstract: We define a novel search method and performance metric for optimizing the bit-to-symbol map of the $D_4$ and $E_8$ root lattices with respect to bit error rate (BER). We hold other sources of lattice gain constant by fixing the lattice constellation, and consider basis matrices that permute the integer labelings of the lattice points. After searching the possible basis matrices for $D_4$ and $E_8$, we found gains of 0.1 dB for $D_4$ and 0.5 dB for $E_8$ at a BER of $10^{-4}$, compared to the standard bases commonly used in the literature.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05202v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Carilyn Rumrill, David Muzzey, Connor Davis, Stephen Mackes, Dan Chew</dc:creator>
    </item>
    <item>
      <title>D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models</title>
      <link>https://arxiv.org/abs/2605.05204</link>
      <description>arXiv:2605.05204v1 Announce Type: new 
Abstract: The landscape of high-performance image generation models is currently shifting from inefficient multi-step models to efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models are difficult to fine-tune directly and continuously with supervision: for example, applying commonly used fine-tuning techniques would compromise their inherent few-step inference capability. To address this, we propose D-OPSD, a novel training paradigm for step-distilled diffusion models that enables on-policy learning during supervised fine-tuning. We first find that modern diffusion models that use an LLM/VLM as the encoder can inherit the encoder's in-context capabilities. This allows us to cast training as an on-policy self-distillation process. Specifically, during training, the model acts as both the teacher and the student under different contexts: the student is conditioned only on the text feature, while the teacher is conditioned on the multimodal feature of both the text prompt and the target image. Training minimizes the divergence between the two predicted distributions over the student's own roll-outs. By optimizing on the model's own trajectories under its own supervision, D-OPSD enables the model to learn new concepts, styles, etc. without sacrificing its original few-step capability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05204v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dengyang Jiang, Xin Jin, Dongyang Liu, Zanyi Wang, Mingzhe Zheng, Ruoyi Du, Xiangpeng Yang, Qilong Wu, Zhen Li, Peng Gao, Harry Yang, Steven Hoi</dc:creator>
    </item>
    <item>
      <title>Taming Outlier Tokens in Diffusion Transformers</title>
      <link>https://arxiv.org/abs/2605.05206</link>
      <description>arXiv:2605.05206v1 Announce Type: new 
Abstract: We study outlier tokens in Diffusion Transformers (DiTs) for image generation. Prior work has shown that Vision Transformers (ViTs) can produce a small number of high-norm tokens that attract disproportionate attention while carrying limited local information, but their role in generative models remains underexplored. We show that this phenomenon appears in both the encoder and the denoiser of modern Representation Autoencoder (RAE)-DiT pipelines: pretrained ViT encoders can produce outlier representations, and DiTs themselves can develop internal outlier tokens, especially in intermediate layers. Moreover, simply masking high-norm tokens does not improve performance, indicating that the problem is not caused solely by a few extreme values but is more closely related to corrupted local patch semantics. To address this issue, we introduce Dual-Stage Registers (DSR), a register-based intervention for both components: for the encoder, trained registers when available and recursive test-time registers otherwise; for the denoiser, diffusion registers. Across ImageNet and large-scale text-to-image generation, these interventions consistently reduce outlier artifacts and improve generation quality. Our results highlight outlier-token control as an important ingredient in building stronger DiTs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05206v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiaoyu Wu, Yifei Wang, Tsu-Jui Fu, Liang-Chieh Chen, Zhe Gan, Chen Wei</dc:creator>
    </item>
    <item>
      <title>Syn4D: A Multiview Synthetic 4D Dataset</title>
      <link>https://arxiv.org/abs/2605.05207</link>
      <description>arXiv:2605.05207v1 Announce Type: new 
Abstract: Dense 3D reconstruction and tracking of dynamic scenes from monocular video remains an important open challenge in computer vision. Progress in this area has been constrained by the scarcity of high-quality datasets with dense, complete, and accurate geometric annotations. To address this limitation, we introduce Syn4D, a multiview synthetic dataset of dynamic scenes that includes ground-truth camera motion, depth maps, dense tracking, and parametric human pose annotations. A key feature of Syn4D is the ability to unproject any pixel into 3D and map it to any time and any camera. We conduct extensive evaluations across multiple downstream tasks, including 4D scene reconstruction, 3D point tracking, geometry-aware camera retargeting, and human pose estimation, to demonstrate the utility and effectiveness of the proposed dataset. The experimental results highlight Syn4D's potential to facilitate research in dynamic scene understanding and spatiotemporal modeling.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05207v1</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zeren Jiang, Yushi Lan, Yihang Luo, Yufan Deng, Zihang Lai, Edgar Sucar, Christian Rupprecht, Iro Laina, Diane Larlus, Chuanxia Zheng, Andrea Vedaldi</dc:creator>
    </item>
    <item>
      <title>Analogy between Boltzmann machines and Feynman path integrals</title>
      <link>https://arxiv.org/abs/2301.06217</link>
      <description>arXiv:2301.06217v1 Announce Type: cross 
Abstract: We provide a detailed exposition of the connections between Boltzmann machines, commonly used in machine learning problems, and ideas already well known in quantum statistical mechanics through Feynman's description of the same. We find that this equivalence allows the interpretation that the hidden layers in Boltzmann machines and other neural network formalisms are in fact discrete versions of path elements that are present within the Feynman path-integral formalism. Since Feynman paths are the natural and elegant depiction of interference phenomena germane to quantum mechanics, it appears that in machine learning, the goal is to find an appropriate combination of "paths", along with accumulated path-weights, through a network that cumulatively capture the correct $x \rightarrow y$ map for a given mathematical problem. As a direct consequence of this analysis, we are able to provide general quantum circuit models that are applicable to both Boltzmann machines and Feynman path integral descriptions. Connections are also made to inverse quantum scattering problems, which provide a robust way to define "interpretable" hidden layers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2301.06217v1</guid>
      <category>quant-ph</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1021/acs.jctc.3c00187</arxiv:DOI>
      <arxiv:journal_reference>Journal of Chemical Theory and Computation 2023 19 (9), 2446-2454</arxiv:journal_reference>
      <dc:creator>Srinivasan S. Iyengar, Sabre Kais</dc:creator>
    </item>
    <item>
      <title>A large language model-type architecture for high-dimensional molecular potential energy surfaces</title>
      <link>https://arxiv.org/abs/2412.03831</link>
      <description>arXiv:2412.03831v2 Announce Type: cross 
Abstract: Computing high-dimensional potential energy surfaces for molecular systems and materials is considered a great challenge in computational chemistry, with potential impact in a range of areas, including the fundamental prediction of reaction rates. In this paper, we design and discuss an algorithm that has similarities to large language models in generative AI and natural language processing. Specifically, we represent a molecular system as a graph that contains a set of nodes, edges, faces, etc. Interactions between these sets, which in our case represent molecular subsystems, are used to construct the potential energy surface for a reasonably sized chemical system with 51 nuclear dimensions. For this purpose, we use a family of neural networks, each associated with one of the graph-theoretically obtained subsystems, to model this 51-dimensional system. We then ask whether this same family of lower-dimensional graph-based neural networks can be transformed to provide accurate predictions for a 186-dimensional potential energy surface. We find that our algorithm does provide accurate results for this higher-dimensional problem, with sub-kcal/mol accuracy. As a result of these developments, we present a first step towards a full-dimensional potential energy surface for the protonated 21-water cluster (186 nuclear dimensions) at CCSD-level accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2412.03831v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>physics.atm-clus</category>
      <category>physics.chem-ph</category>
      <category>physics.comp-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <arxiv:journal_reference>Phys. Rev. X, 2026</arxiv:journal_reference>
      <dc:creator>Xiao Zhu, Srinivasan S. Iyengar</dc:creator>
    </item>
    <item>
      <title>System-of-systems Modeling and Optimization: An Integrated Framework for Intermodal Mobility</title>
      <link>https://arxiv.org/abs/2507.08715</link>
      <description>arXiv:2507.08715v1 Announce Type: cross 
Abstract: In developing innovative system architectures, modeling and optimization techniques have been central to framing the architecting process and defining the optimization and modeling problems. In this context, for systems of systems, efficient dedicated approaches (often physics-based simulations) are highly recommended to reduce the computational complexity of the targeted applications. However, exploring novel architectures with such dedicated approaches can pose challenges for optimization algorithms, including increased evaluation costs and potential failures. To address these challenges, surrogate-based optimization algorithms, such as Bayesian optimization with Gaussian process models, have emerged.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.08715v1</guid>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:journal_reference>ODAS 2024: 24th joint ONERA-DLR Aerospace Symposium, DLR, Jun 2024, Braunschweig, Germany</arxiv:journal_reference>
      <dc:creator>Paul Saves, Jasper Bussemaker, Rémi Lafage, Thierry Lefebvre, Nathalie Bartoli, Youssef Diouane, Joseph Morlier</dc:creator>
    </item>
    <item>
      <title>Segmenting proto-halos with vision transformers</title>
      <link>https://arxiv.org/abs/2508.00049</link>
      <description>arXiv:2508.00049v2 Announce Type: cross 
Abstract: The formation of dark-matter halos from small cosmological perturbations generated in the early universe is a highly non-linear process typically modeled through N-body simulations. In this work, we explore the use of deep learning to segment and classify proto-halo regions in the initial density field according to their final halo mass at redshift z=0. We compare two architectures: a fully convolutional neural network (CNN) based on the V-Net design and a U-Net transformer. We find that the transformer-based network significantly outperforms the CNN across all metrics, achieving sub-percent error in the total segmented mass per halo class. Both networks deliver much higher accuracy than the perturbation-theory-based model PINOCCHIO, especially at low halo masses and in the detailed reconstruction of proto-halo boundaries. We also investigate the impact of different input features by training models on the density field, the tidal shear, and their combination. Finally, we use Grad-CAM to generate class-activation heatmaps for the CNN, providing preliminary yet suggestive insights into how the network exploits the input fields.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.00049v2</guid>
      <category>astro-ph.CO</category>
      <category>astro-ph.IM</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1088/1475-7516/2025/11/083</arxiv:DOI>
      <arxiv:journal_reference>JCAP 11 (2025) 083</arxiv:journal_reference>
      <dc:creator>Toka Alokda, Cristiano Porciani</dc:creator>
    </item>
    <item>
      <title>Learning Reconstructive Embeddings in Reproducing Kernel Hilbert Spaces via the Representer Theorem</title>
      <link>https://arxiv.org/abs/2601.05811</link>
      <description>arXiv:2601.05811v1 Announce Type: cross 
Abstract: Motivated by the growing interest in representation learning approaches that uncover the latent structure of high-dimensional data, this work proposes new algorithms for reconstruction-based manifold learning within Reproducing Kernel Hilbert Spaces (RKHS). Each observation is first reconstructed as a linear combination of the other samples in the RKHS by optimizing a vector form of the Representer Theorem for their autorepresentation property. A separable operator-valued kernel extends the formulation to vector-valued data while retaining the simplicity of a single scalar similarity function. A subsequent kernel-alignment task projects the data into a lower-dimensional latent space whose Gram matrix aims to match the high-dimensional reconstruction kernel, thus transferring the auto-reconstruction geometry of the RKHS to the embedding. The proposed algorithms therefore extend the use of the autorepresentation property, which many natural datasets exhibit, by using and adapting well-known results of kernel learning theory. Numerical experiments on both simulated (concentric circles and swiss-roll) and real (cancer molecular activity and IoT network intrusions) datasets provide empirical evidence of the practical effectiveness of the proposed approach.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.05811v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <arxiv:DOI>10.1109/OJCS.2026.3682462</arxiv:DOI>
      <dc:creator>Enrique Feito-Casares, Francisco M. Melgarejo-Meseguer, José-Luis Rojo-Álvarez</dc:creator>
    </item>
    <item>
      <title>Interpreting Manifolds and Graph Neural Embeddings from Internet of Things Traffic Flows</title>
      <link>https://arxiv.org/abs/2602.05817</link>
      <description>arXiv:2602.05817v2 Announce Type: cross 
Abstract: The rapid expansion of Internet of Things (IoT) ecosystems has led to increasingly complex and heterogeneous network topologies. Traditional network monitoring and visualization tools rely on aggregated metrics or static representations, which fail to capture the evolving relationships and structural dependencies between devices. Although Graph Neural Networks (GNNs) offer a powerful way to learn from relational data, their internal representations often remain opaque and difficult to interpret for security-critical operations. To address these limitations, this work introduces an interpretable pipeline that generates directly visualizable low-dimensional representations by mapping high-dimensional embeddings onto a latent manifold. This projection enables interpretable monitoring and interoperability of evolving network states, while integrated feature attribution techniques decode the specific characteristics shaping the manifold structure. The framework achieves a classification F1-score of 0.830 for intrusion detection while also highlighting phenomena such as concept drift. Ultimately, the presented approach bridges the gap between high-dimensional GNN embeddings and human-understandable network behavior, offering new insights for network administrators and security analysts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.05817v2</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.NI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Enrique Feito-Casares, Francisco M. Melgarejo-Meseguer, Elena Casiraghi, Giorgio Valentini, José-Luis Rojo-Álvarez</dc:creator>
    </item>
    <item>
      <title>The Resurrection of Spectrum Spreading for 6G and Beyond: From Sinusoids to Chirps</title>
      <link>https://arxiv.org/abs/2605.00249</link>
      <description>arXiv:2605.00249v1 Announce Type: cross 
Abstract: Orthogonal frequency-division multiplexing (OFDM) and its static sinusoidal subcarriers have underpinned the 4G and 5G eras, delivering high spectral efficiency and resilience to multipath fading through an efficient multicarrier architecture. However, as future systems move toward doubly dispersive environments driven by high-mobility applications and migration to mmWave/sub-THz bands, the time-invariance assumption underlying OFDM becomes increasingly difficult to maintain, and Doppler-induced degradation becomes prominent. While enhancements such as MIMO, advanced coding, and scheduling provide incremental remedies, they introduce additional overhead, because the sinusoidal subcarrier itself offers no inherent waveform-level robustness to Doppler impairments. Accordingly, two time-frequency spreading philosophies have emerged to improve Doppler resilience by distributing each symbol's energy across both dimensions of the time-frequency plane: (i) 2D isotropic spreading via the delay-Doppler (DD) domain, exemplified by the orthogonal time frequency space (OTFS) family, and (ii) sheared spreading via parameterizable chirps, exemplified by the affine frequency-division multiplexing (AFDM) family. In this article, we examine key considerations for future waveform design across these paradigms and argue that transitioning from the sinusoidal subcarriers of OFDM to chirp-based subcarriers offers a viable design direction for improving Doppler robustness while retaining much of the mature OFDM infrastructure. This perspective also highlights the suitability of chirp-based waveforms for integrated sensing and communications (ISAC) and their extensibility to emerging physical-layer techniques. Overall, we contend that the transition from sinusoids to chirps is a technically motivated, compelling evolutionary direction for future wireless physical layer design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.00249v1</guid>
      <category>eess.SP</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hyeon Seok Rou, Giuseppe Thadeu Freitas de Abreu, Emil Björnson, Sunwoo Kim, Marios Kountouris</dc:creator>
    </item>
    <item>
      <title>Modeling Subjective Urban Perception with Human Gaze</title>
      <link>https://arxiv.org/abs/2605.00764</link>
      <description>arXiv:2605.00764v1 Announce Type: cross 
Abstract: Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computational approaches primarily model urban perception directly from street view images, but largely ignore the human perceptual process through which such judgments are formed. In this paper, we introduce Place Pulse-Gaze, an urban perception dataset that augments street view images with synchronized eye-tracking recordings and individual perception labels. Based on this dataset, we propose a Gaze-Guided Urban Perception Framework to study how gaze behavior contributes to the modeling of subjective urban perception. The framework systematically investigates three complementary settings: gaze-only modeling, gaze fusion with explicit semantic scene representations, and gaze fusion with richer implicit visual representations. Experiments show that gaze alone already carries useful predictive signals for subjective urban perception, and that integrating gaze with scene representations further improves prediction under both semantic and richer visual representations. Overall, our findings highlight the importance of incorporating human perceptual processes into urban scene understanding and open a direction for gaze-guided multimodal urban computing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.00764v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lin Che, Xi Wang, Marc Pollefeys, Konrad Schindler, Martin Raubal, Peter Kiefer</dc:creator>
    </item>
    <item>
      <title>The Reasoning Trap: An Information-Theoretic Bound on Closed-System Multi-Step LLM Reasoning</title>
      <link>https://arxiv.org/abs/2605.01704</link>
      <description>arXiv:2605.01704v2 Announce Type: cross 
Abstract: When copies of the same language model are prompted to debate, they produce diverse phrasings of one perspective rather than diverse perspectives. Multi-agent debate (MAD), and more broadly closed-system reasoning where agents iteratively transform each other's outputs, tends to preserve answer accuracy while degrading the reasoning behind those answers. We name the multi-agent case the Debate Trap and the broader phenomenon the Reasoning Trap, offering a programmatic theory of evidence-grounded reasoning failure. The framework has three parts: (i) SFS (Supported Faithfulness Score), a claim-level metric verifying decomposed atomic claims against provided evidence (decomposer-invariant rankings: Spearman rho=1.0); (ii) EGSR (Evidence-Grounded Socratic Reasoning), replacing adversarial argumentation with evidence-grounded inquiry; (iii) Theorem 1 (DPI Bound): under standard MAD, the chain E -&gt; O^0 -&gt; O^1 -&gt; ... is Markov, and the Data Processing Inequality implies E[I(E;O^{t+1})] &lt;= E[I(E;O^t)]. Three companion results -- open-system recovery (Theorem 2), EGSR accumulation (Lemma 2), and vote-aggregation floor (Proposition 1) -- partition multi-step LLM reasoning by its information-theoretic relationship to E. Across 16 conditions on SciFact (300 claims) and FEVER (1,000 claims), DebateCV (C13) preserves 88% of baseline accuracy while SFS drops 43%; majority-vote MAD (C15) reduces SFS to 1.7% of baseline (p &lt; 10^{-6}, d = -0.96); EGSR recovers 98%. An R6 cohort study (Korean n=10x30 FEVER; English n=3x200 SciFact) finds inter-rater Fleiss kappa &lt;= +0.018 with 0.8-1.4 Likert intra-rater shifts across language and domain -- the human agreement that faithfulness metrics have been calibrated against is not itself stable. We offer one falsifiable conjecture: any closed-system reasoning protocol preserving Theorem 1's Markov structure is, in expectation, subject to the same DPI bound.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01704v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kwan Soo Shin</dc:creator>
    </item>
    <item>
      <title>Permutation Routing on Ramanujan Hypergraphs with Applications to Neutral Atom Quantum Architectures</title>
      <link>https://arxiv.org/abs/2605.02498</link>
      <description>arXiv:2605.02498v2 Announce Type: cross 
Abstract: We consider the routing of neutral atoms on a reconfigurable lattice in terms of hypergraph transformations. We prove that the routing number of a Ramanujan $(d,r)$-regular hypergraph on $N$ vertices satisfies $\mathrm{rt}(H) = \Theta(\log N)$, where routing is via matchings in the clique expansion graph $G_{\mathrm{cl}}(H)$. Hypergraphs reframe the qubit routing problem by replacing Nenadov's two-sided spectral gap hypothesis with a one-sided condition based on eigenvalue centering. Song--Fan--Miao (SFM) coverings scale for Ramanujan families of every uniformity. A virtual overlay theorem establishes a capacity--depth tradeoff for 3D acousto-optic lens (AOL) architectures, with multi-layer stacking achieving $\Theta(\log N)$ routing with $L = O(\log N)$ independent overlay layers. An abelian Alon--Boppana barrier shows that fixed-degree Cayley graphs on $\mathbb{Z}_n^2$ cannot be Ramanujan, and affine derandomization on such graphs achieves 15--30% congestion reduction. Towers of $k$-fold Ramanujan coverings yield $\mathrm{rt}(H_L) = O(\log N)$ via a recursive routing lift. Entanglement-assisted routing by pre-distributed Bell pairs achieves $O(\log N)$ teleportation depth with a stable crossover at $\sim\!4$ routing rounds. A displacement-energy analysis of greedy adaptive routing identifies stalling and a hybrid greedy--Valiant protocol achieving $\sim\!3\times$ speedup at practical scales. Hierarchical multi-scale routing achieves $O(\log^2 N / \log b)$ depth with boundary-only transfers at capacity $k = O(\sqrt{N} \log N)$, and $O(\log N)$ depth with optimal block size $b = \Theta(\sqrt{n})$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02498v2</guid>
      <category>quant-ph</category>
      <category>cs.DS</category>
      <category>math-ph</category>
      <category>math.MP</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Joshua M. Courtney</dc:creator>
    </item>
    <item>
      <title>Reproducing Complex Set-Compositional Information Retrieval</title>
      <link>https://arxiv.org/abs/2605.03824</link>
      <description>arXiv:2605.03824v1 Announce Type: cross 
Abstract: Complex information needs may involve set-compositional queries using conjunction, disjunction, and exclusion, yet it remains unclear whether current retrieval paradigms genuinely satisfy such constraints or exploit 'semantic shortcuts'. We conduct a reproducibility study to benchmark major retrieval families and reasoning-targeted methods on QUEST and QUEST+Variants, and introduce LIMIT+, a controlled benchmark where relevance depends on arbitrary attribute predicates and constraint satisfaction, and less on pretrained knowledge. Our findings show that (i) on QUEST, the best neural retrievers achieve more than double the effectiveness of BM25 (Recall@100 ${&gt;}$0.41 vs.\ 0.20), but reasoning-targeted methods like ReasonIR and Search-R1 do not uniformly outperform general-purpose retrievers; (ii) on LIMIT+, these gains fail to transfer: the strongest QUEST method collapses from Recall@100${\approx}$0.42 to below 0.02, while classic lexical retrieval rises to ${\sim}$0.96. Lastly, (iii) stratifying by compositional depth reveals consistent degradation across all methods, with algebraic sparse and lexical methods remaining more stable while dense approaches collapse. We release code and LIMIT+ data generation scripts to support future reproducibility and controlled evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03824v1</guid>
      <category>cs.CL</category>
      <category>cs.IR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vincent Degenhart, Dewi Timman, Arjen P. de Vries, Faegheh Hasibi, Mohanna Hoveyda</dc:creator>
    </item>
    <item>
      <title>A Consistency-Centric Approach to Set-Based Optimization with Multiple Models of Unranked Fidelity</title>
      <link>https://arxiv.org/abs/2605.04051</link>
      <description>arXiv:2605.04051v1 Announce Type: cross 
Abstract: In complex real-world settings, optimization is challenged by the presence of diverse models of differing fidelity. In many optimization problems, a single model is treated as the most accurate representation of the underlying system, while other models are evaluated primarily by their agreement with this presumed most accurate model. Yet in real-world applications, model accuracy is rarely known a priori and assuming a single most accurate model can be misleading. This paper addresses this gap by proposing a flexible set-based optimization methodology called Set-Based Optimization with Multiple Models (S-BOMM) that works with multiple models without the assumption of a most accurate high-fidelity model. Unlike traditional optimization approaches that focus on finding an optimal solution according to the high-fidelity model, our methodology utilizes consistency between models to identify good solutions across multiple models. A probabilistic analysis of the consistency method is provided that bounds the likelihood of the methodology producing correct or incorrect results. Empirical results demonstrate the effectiveness of S-BOMM on test problems. By focusing on the consistency across models rather than relying on a single best solution, this set-based approach offers a practical alternative for optimization problems where multiple models must be considered without assuming a single most accurate high-fidelity model.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04051v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Danielle F. Morey, Giulia Pedrielli, Cherry Y. Wakayama, Zelda B. Zabinsky</dc:creator>
    </item>
    <item>
      <title>BOOOM: Loss-Function-Agnostic Black-Box Optimization over Orthonormal Manifolds for Machine Learning and Statistical Inference</title>
      <link>https://arxiv.org/abs/2605.04087</link>
      <description>arXiv:2605.04087v1 Announce Type: cross 
Abstract: Optimization over the Stiefel manifold $\mathrm{St}(p,d)$, the set of $p \times d$ column-orthonormal matrices, is fundamental in statistics, machine learning, and scientific computing, yet remains challenging in the presence of non-convex, non-smooth, or black-box objectives. Existing methods largely rely on either convex relaxations or gradient-based Riemannian optimization, limiting applicability in derivative-free and highly multimodal settings. We propose \textsc{BOOOM} (Black-box Optimization Over Orthonormal Manifolds), a general-purpose framework for loss-function-agnostic optimization on $\mathrm{St}(p,d)$. The key idea is a global Givens rotation-based parametrization that maps the manifold to an unconstrained Euclidean angle space while preserving feasibility exactly. Building on this representation, BOOOM employs a structured, parallelizable, derivative-free search based on Recursive Modified Pattern Search, enabling systematic exploration through plane-wise rotations without requiring gradient information and facilitating escape from poor local optima. We establish a unified theoretical framework showing equivalence between angle-space and manifold optimization, transfer of stationarity, and global convergence in probability under mild conditions. Empirical results across diverse problems, including heterogeneous quadratic optimization, low-rank and sparse matrix decomposition, independent component analysis, and orthogonal joint diagonalization, among other widely studied settings, demonstrate strong performance relative to state-of-the-art methods, particularly in non-smooth and highly multimodal regimes. We further illustrate its practical utility through a novel supervised PCA formulation applied to metabolomics data in colorectal cancer.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04087v1</guid>
      <category>math.OC</category>
      <category>cs.LG</category>
      <category>stat.CO</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Beomchang Kim, Subhrajyoty Roy, Priyam Das</dc:creator>
    </item>
    <item>
      <title>CTM-AI: A Blueprint for General AI Inspired by a Model of Consciousness</title>
      <link>https://arxiv.org/abs/2605.04097</link>
      <description>arXiv:2605.04097v1 Announce Type: cross 
Abstract: Despite remarkable advances, today's AI systems remain narrow in scope, falling short of the flexible, adaptive, and multisensory intelligence that characterizes human capabilities. This gap has fueled longstanding debates about whether AI might one day achieve human-like generality or even consciousness, and whether theories of consciousness can inspire new architectures for AI. This paper presents an early blueprint for implementing a general AI system, CTM-AI, combining the Conscious Turing Machine (CTM), a formal machine model of consciousness, with today's foundation models. CTM-AI contains an enormous number of powerful processors ranging from specialized experts (e.g., vision-language models and APIs) to unspecialized general-purpose learners poised to develop their own expertise. Crucially, for whatever problem must be dealt with, information from many processors is selected, integrated, and exchanged appropriately to solve the task. CTM-AI achieves state-of-the-art accuracy on MUStARD (72.28) and UR-FUNNY (72.13), outperforming multimodal and multi-agent frameworks. On tool-using and agentic tasks, CTM-AI achieves 10+ points of improvement on StableToolBench and WebArena-Lite. Overall, CTM-AI offers a principled, testable blueprint for general AI inspired by a model of consciousness.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04097v1</guid>
      <category>q-bio.NC</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haofei Yu, Yining Zhao, Lenore Blum, Manuel Blum, Paul Pu Liang</dc:creator>
    </item>
    <item>
      <title>Meta-LegNet: A Transferable and Interpretable Framework for Surface Adsorption Prediction via Self-Defined Adsorption-Environment Learning</title>
      <link>https://arxiv.org/abs/2605.04102</link>
      <description>arXiv:2605.04102v1 Announce Type: cross 
Abstract: A central challenge in computational catalysis is the identification of low-energy and chemically plausible adsorption configurations, as these directly affect adsorption energies, reaction pathways, and catalytic performance. Existing approaches generally rely on enumerating candidate adsorption sites followed by iterative refinement through density functional theory calculations or machine-learning-based relaxations. However, such workflows remain computationally expensive and are difficult to scale to complex surfaces or multi-adsorbate systems. Here, we introduce Meta-LegNet, a graph learning framework that combines SE(3)-equivariant atom-level message passing with voxel-based multiscale aggregation and cross-domain meta-learning to learn transferable representations of local adsorption environments across diverse catalyst--adsorbate systems. Rather than following a conventional regression-only paradigm, Meta-LegNet encodes local chemical environments using invariant radial features and equivariant directional information, and further incorporates broader structural context through coordinate-frame voxel pooling, assignment-based upsampling, and gated feature fusion. The resulting local-global decomposition produces atom-resolved attribution maps, which are processed to identify adsorption-relevant local environments in an interpretable manner. Based on the learned representations, we further construct an adsorption-environment database and develop a template-matching strategy to propose likely adsorption sites on previously unexplored surfaces without exhaustive site enumeration. Overall, our results suggest that learning transferable adsorption environments provides an accurate, interpretable, and practical route for accelerating catalyst screening.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04102v1</guid>
      <category>cond-mat.mtrl-sci</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yifan Li, Arravind Subramanian, Xiaoqing Liu, Qiujie Lyu, Sergey Kozlov, Lei Shen</dc:creator>
    </item>
    <item>
      <title>ProtDBench: A Unified Benchmark of Protein Binder Design and Evaluation</title>
      <link>https://arxiv.org/abs/2605.04118</link>
      <description>arXiv:2605.04118v1 Announce Type: cross 
Abstract: Recent advances in de novo protein binder design have enabled increasing experimental validation, yet reported in silico metrics remain difficult to interpret or compare across studies due to non-standardized evaluation protocols. We introduce ProtDBench, a standardized and throughput-aware evaluation framework for protein binder design. ProtDBench defines unified benchmark tasks, evaluation protocols, and success criteria, enabling systematic analysis of how evaluation design influences observed performance. Using a large wet-lab annotated dataset, we analyze commonly used structure prediction models as evaluation verifiers, revealing substantial verifier-dependent bias and limited agreement under identical filtering protocols. We then benchmark representative open-source generative binder design methods across ten diverse protein targets under a fixed evaluation protocol. Beyond per-sequence success rates, ProtDBench incorporates throughput-aware metrics based on a fixed 24-hour budget, as well as cluster-level success criteria to account for structural diversity. Together, these results expose systematic differences induced by filtering rules, success definitions, and throughput-aware evaluation, as well as trade-offs between computational efficiency, success rate, and structural diversity. Overall, ProtDBench provides a fair and reproducible evaluation pipeline that supports systematic and controlled comparison of protein binder design methods under realistic evaluation settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04118v1</guid>
      <category>q-bio.QM</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Cong Liu, Milong Ren, Jiaqi Guan, Chengyue Gong, Jinyuan Sun, Xinshi Chen, Wenzhi Xiao</dc:creator>
    </item>
    <item>
      <title>Tree-Conditioned Edit Flows for Ancestral Sequence Reconstruction</title>
      <link>https://arxiv.org/abs/2605.04119</link>
      <description>arXiv:2605.04119v1 Announce Type: cross 
Abstract: Ancestral sequence reconstruction (ASR) aims to infer extinct protein sequences at internal nodes of a phylogenetic tree. Classical ASR methods are typically based on continuous-time Markov substitution models, but they treat sites largely independently and handle insertions and deletions only weakly or not at all. We introduce a tree-conditioned edit-flow model for variable-length ASR. Given two descendant sequences and their branch distances to a shared ancestor, the model reconstructs the ancestor through paired bidirectional edit trajectories constrained to agree on a common ancestral state. On a benchmark of experimentally evolved sequences with only context-independent substitutions, the model does not match the accuracy of the best classical method, yet still achieves reasonable performance despite being trained on natural sequences that include insertions, deletions, and substitutions. On a benchmark of natural homologous sequences with abundant insertions and deletions, the model most accurately localizes inferred evolutionary change.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04119v1</guid>
      <category>q-bio.QM</category>
      <category>cs.LG</category>
      <category>q-bio.PE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Emil Sharafutdinov, Ingemar André</dc:creator>
    </item>
    <item>
      <title>A Dialogue-Based Framework for Correcting Multimodal Errors in AI-Assisted STEM Education</title>
      <link>https://arxiv.org/abs/2605.04131</link>
      <description>arXiv:2605.04131v1 Announce Type: cross 
Abstract: Large Language Models (LLMs) are democratizing access to personalized tutoring; however, their effectiveness is hindered by challenges in processing multimodal content, which limits AI's potential to provide equitable, high-quality STEM support. This study evaluates LLM performance on multimodal physics problems, identifies specific failure modes through an empirical error taxonomy, and tests practical interventions designed to overcome multimodal processing limitations. We assessed three publicly available LLMs (Claude, Gemini, and ChatGPT) on multimodal physics problems from the OpenStax database and compared the results with text-only performance. An empirically derived error taxonomy was developed through pilot testing, followed by evaluation of a structured multimodal dialogue intervention. All three models achieved near-ceiling accuracy (96%) on text-only physics problems. Performance declined substantially on multimodal problems, consistent with what we term the Multimodal Interference Effect. Error analysis identified four failure modes: visual processing errors, context misinterpretation, mathematical computational errors, and hybrid errors, with visual processing errors being the most prevalent. The structured dialogue intervention corrected 82% of errors overall; visual processing errors were corrected at 100% across all models. Educators and students can implement these interventions immediately, requiring no model retraining, to improve AI tutoring reliability on image-rich STEM content, advancing equitable access to high-quality learning support.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04131v1</guid>
      <category>physics.ed-ph</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Akshay Syal, Lawrence Swaminathan Xavier Prince, Evin Gultepe, Nik Bear Brown, Srinivas Sridhar</dc:creator>
    </item>
    <item>
      <title>Error analysis for learning fractional stochastic differential equations with applications in neural approximations</title>
      <link>https://arxiv.org/abs/2605.04168</link>
      <description>arXiv:2605.04168v1 Announce Type: cross 
Abstract: This paper develops a framework for the error analysis in nonparametric model fitting of fractional stochastic differential equations based on discrete observations. We identify and quantify the main error sources -- time discretization, coefficient approximation, and model fitting error -- within a unified framework. Through Sobolev-type norms, we derive convergence rates that incorporate the regularity of trajectories, thereby capturing the interaction of these error components. To demonstrate the applicability of the theory, we introduce a training scheme for coefficient function estimation based on shallow neural networks and a recurrent architecture. Numerical experiments validate the theoretical findings and illustrate the effectiveness of the approach.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04168v1</guid>
      <category>math.PR</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mahdi Dehshiri, Kerlyns Martinez, Lauri Viitasaari</dc:creator>
    </item>
    <item>
      <title>Heterogeneous Ordinal Structure Learning with Bayesian Nonparametric Complexity Discovery</title>
      <link>https://arxiv.org/abs/2605.04191</link>
      <description>arXiv:2605.04191v1 Announce Type: cross 
Abstract: Public attitudes toward artificial intelligence are heterogeneous, ordinally measured, and poorly captured by any single dependency graph. Existing ordinal structure learners assume a shared directed acyclic graph (DAG) across all respondents; recent heterogeneous ordinal graphical-model approaches focus on subgroup discovery rather than confirmatory cluster-specific DAG estimation; and latent profile analyses discard dependency structure entirely. We introduce a heterogeneous ordinal structure-learning framework combining monotone Gaussian score embedding, Bayesian nonparametric (BNP) complexity discovery via a truncated stick-breaking prior, and confirmatory fixed-K estimation with cluster-specific sparse DAG learning. The key methodological insight is a discovery-to-confirmation workflow: the nonparametric stage calibrates plausible archetype complexity, while inner-validated confirmatory refitting yields stable, interpretable structural estimates. On the 2024 Pew American Trends Panel AI attitudes survey (Wave 152, W152; N = 4,788, 8 ordinal items), the confirmatory K*=5 model reduces holdout transformed-score mean squared error (MSE) by 25.8% relative to a single-graph baseline and by 4.6% relative to mixture-only clustering. A controlled tiered semi-synthetic benchmark calibrated to W152 structure validates recovery across difficulty regimes and transparently reveals failure modes under stress conditions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04191v1</guid>
      <category>stat.ML</category>
      <category>cs.CY</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Amir Rafe, Subasish Das</dc:creator>
    </item>
    <item>
      <title>Calculating Domain of Attraction Boundary of Power Systems Based on the Gentlest Ascent Dynamics</title>
      <link>https://arxiv.org/abs/2605.04197</link>
      <description>arXiv:2605.04197v1 Announce Type: cross 
Abstract: The power system, a fundamental public utility, is increasingly important due to growing global electricity demand. Recent large-scale blackouts (e.g., Iberian Peninsula, UK) have raised concerns about transient stability under impact faults. Transient stability is determined by the post-disturbance synchronizing capability of synchronous generators, formulated as identifying the domain of attraction (DOA) boundary of the asymptotically stable equilibrium. Using a benchmark model of synchronous-generator-dominated power systems, this report employs a gentlest ascent dynamics (GAD) method for 1-saddle points, an adjoint operator method for periodic orbits, and stable manifold algorithms to compute the DOA boundary. These algorithms transform DOA boundary determination into constructing unstable critical elements (saddle points and periodic orbits) and their stable manifolds. Theoretically, under certain assumptions we prove that the DOA boundary is the closure of the union of stable manifolds of index-1 critical elements, and establish a stability theory for a perturbed GAD system. Numerical experiments on two-machine and three-machine systems (with only saddle points or with periodic orbits) validate the algorithms' effectiveness and accuracy. Results show the algorithms accurately capture the geometric structure of the DOA boundary, providing a new numerical tool for transient stability analysis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04197v1</guid>
      <category>math.DS</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sixu Wu, Chenmin Zhang, Aiqing Zhu, Yang Liu, Jianxi Lin, Yifa Tang</dc:creator>
    </item>
    <item>
      <title>Conflict-Aware Seat Assignment in Classroom Environments</title>
      <link>https://arxiv.org/abs/2605.04235</link>
      <description>arXiv:2605.04235v1 Announce Type: cross 
Abstract: Classroom dynamics depend on various elements that influence teaching performance and learning activities. A key challenge is determining the most effective seating plan, that is, where students should sit in a specific classroom setting to achieve the best learning environment. This paper introduces the Student Seat Allocation Problem (SSAP) for strategically organizing student seating in traditional classrooms to minimize interpersonal conflicts. We propose a mathematical model and an Iterated Local Search (ILS) heuristic to solve the SSAP. Computational experiments demonstrated that, in more complex scenarios, ILS outperformed a commercial solver applied to the proposed mathematical model. ILS was particularly efficient in real and artificial instances that exhibited a higher number of conflicts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04235v1</guid>
      <category>math.CO</category>
      <category>cs.CY</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bruna Cristina Braga Charytitsch, Mariá Cristina Vasconcelos Nascimento</dc:creator>
    </item>
    <item>
      <title>Globally Solving Unbalanced Optimal Transport and Density Control for Gaussian Distributions</title>
      <link>https://arxiv.org/abs/2605.04246</link>
      <description>arXiv:2605.04246v1 Announce Type: cross 
Abstract: In this article, we study unbalanced optimal transport (UOT) and establish a control-theoretic dynamical extension, which we call the unbalanced density control (UDC), for a class of Gaussian reference measures. In the static setting, we consider UOT with quadratic transport cost and Kullback--Leibler penalties on the marginals relative to prescribed Gaussian measures. We show that the infinite-dimensional variational problem admits an exact Gaussian reduction, yielding a finite-dimensional optimization over masses, means, and covariances, together with a closed-form expression for the optimal transported mass. We then formulate UDC for discrete-time linear systems, where the initial and terminal state measures are imposed softly through KL penalties and the intermediate evolution is governed by controlled linear dynamics with quadratic control cost. For this problem, we prove that any feasible solution can be replaced, without loss of optimality, by a Gaussian initial measure and an affine-Gaussian control policy. This leads to an exact finite-dimensional reformulation and, after a standard covariance-steering lifting, to an SDP-based optimization for fixed mass, again coupled with a closed-form mass update. We further establish existence of optimal solutions and identify a sufficient condition under which the affine-Gaussian UDC policy is deterministic. These results provide globally optimal solution methods for both Gaussian UOT and Gaussian UDC. Finally, we illustrate our results with several numerical examples.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04246v1</guid>
      <category>math.OC</category>
      <category>cs.LG</category>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haruto Nakashima, Siddhartha Ganguly, Kenji Kashima</dc:creator>
    </item>
    <item>
      <title>Entropic Riemannian Neural Optimal Transport</title>
      <link>https://arxiv.org/abs/2605.04255</link>
      <description>arXiv:2605.04255v1 Announce Type: cross 
Abstract: Many machine learning problems involve data supported on curved spaces such as spheres, rotation groups, hyperbolic spaces, and general Riemannian manifolds, where Euclidean geometry can distort distances, averages, and the resulting optimal transport (OT) problem. Existing manifold OT methods have pursued amortized out-of-sample maps, and entropic regularization has made discrete OT more scalable, but these advantages have remained largely disjoint. We propose Entropic Riemannian Neural Optimal Transport (Entropic RNOT), a unified framework that combines intrinsic entropic OT with amortized out-of-sample evaluation on Riemannian manifolds. Our method learns a single target-side Schrödinger potential through a neural pullback parameterization, recovers the induced Gibbs coupling, and uses the resulting conditional laws to construct intrinsic transport surrogates. These include barycentric projections on Cartan-Hadamard manifolds and heat-smoothed conditional surrogates on stochastically complete manifolds, the latter turning possibly atomic target laws into absolutely continuous ones. For fixed regularization $\varepsilon&gt;0$, we prove that the proposed hypothesis class recovers the entropic optimal coupling in strong probabilistic metrics. As consequences, barycentric surrogates converge in $L^2$, while heat-smoothed surrogates are stable at fixed heat time and asymptotically unbiased as the heat time vanishes. The guarantees hold for compactly supported data on possibly noncompact manifolds. Empirically, our method matches or improves over Euclidean, tangent-space, and log-Euclidean baselines on benchmarks over $\mathbb{S}^2$, $\mathrm{SO}(3)$, $\mathrm{SPD}(3)$, $\mathrm{SE}(3)$, and $\mathbb{H}^2$, scales favorably relative to discrete manifold Sinkhorn, and in a protein-ligand docking application, refines poses on $\mathrm{SE}(3)$ without retraining or per-instance optimization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04255v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>stat.ME</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alessandro Micheli, Silvia Sapora, Anthea Monod, Samir Bhatt</dc:creator>
    </item>
    <item>
      <title>Adapt or Forget: Provable Tradeoffs Between Adam and SGD in Nonstationary Optimization</title>
      <link>https://arxiv.org/abs/2605.04269</link>
      <description>arXiv:2605.04269v1 Announce Type: cross 
Abstract: We provide a theoretical analysis of Adam under non-stationary stochastic objectives, separating two regimes: Euclidean tracking under adaptive strong monotonicity of the Adam-preconditioned mean-gradient operator, and high-probability projected stationarity guarantees under general $L$-smooth objectives. In the tracking regime, we derive finite-time expected and high-probability bounds that decompose sharply into four components: initialization, objective drift, a first-moment tracking error governed by $\beta_1$, and a preconditioner perturbation governed by $\beta_2$. We characterize the burn-in time to reach Adam's irreducible tracking floor under constant and step-decay schedules. We also prove a high-probability bound on the average projected stationarity gap for Adam under distribution shift. Across both analyses, our bounds reveal a noise–drift tradeoff: in noise-dominated regimes, first-moment averaging and adaptive preconditioning can improve the high-probability error, whereas in drift-dominated regimes, stale first-moment information and preconditioner perturbations can compound the cost of nonstationarity, allowing vanilla SGD to achieve a smaller tracking floor. Our explicit $(\beta_1,\beta_2,\epsilon)$-dependent bounds delineate when adaptive step-sizing is beneficial versus harmful, and provide a theoretical mechanism for Adam's empirical instability and stabilization under distribution shift.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04269v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sharan Sahu, Abir Sarkar, Cameron J. Hogan, Martin T. Wells</dc:creator>
    </item>
    <item>
      <title>Quantum Compression for Distributed Entanglement</title>
      <link>https://arxiv.org/abs/2605.04271</link>
      <description>arXiv:2605.04271v1 Announce Type: cross 
Abstract: We study compression strategies for multipartite entanglement distribution under uncertainty in the partitioning of the quantum state. When the partition is not known at the time of state preparation, we show that a joint design of the resource state and a family of compression schemes can increase the entanglement across partitions under a fixed transmission budget. We formulate this as a source coding problem and derive non-asymptotic upper and lower bounds on the achievable average entanglement subject to an average coding rate. We furthermore design an efficient method for jointly optimizing states and lossless compression maps by exploiting the inherent symmetry of weighted Dicke states. In the bipartite case, we propose practical constructions that closely approach the derived upper bound, and more generally we provide practical constructions for multipartite settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04271v1</guid>
      <category>quant-ph</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jan Østergaard, Shashi Raj Pandey, Christophe Biscio, Torben Bach Pedersen, Petar Popovski</dc:creator>
    </item>
    <item>
      <title>Thinned Quantile Shares are Universally Feasible</title>
      <link>https://arxiv.org/abs/2605.04300</link>
      <description>arXiv:2605.04300v1 Announce Type: cross 
Abstract: Quantile shares, introduced by Babichenko, Feldman, Holzman, and Narayan [STOC 2024], offer an ordinal, self-maximizing, and interpretable benchmark for fair division of indivisible goods, but their universal feasibility is known only conditional on the rainbow Erdős matching conjecture (EMC). Specifically, Babichenko et al. showed that assuming the rainbow EMC in the near-perfect matching regime, the $(1/2e)$-quantile share is universally feasible. In contrast, a simple argument shows that the $q$-quantile share can be infeasible for any $q &gt; 1/e$. We introduce a one-parameter refinement of quantile shares, the $c$-thinned quantile share, obtained by thinning the inclusion probability in the random benchmark bundle by a factor of $c$ for a fixed constant $c\in(0,1]$. Our main result is that there exists a universal constant $c &gt;0$ for which the $c$-thinned $e^{-c}$-quantile share is unconditionally universally feasible; this is best possible in the sense that for any $c \in (0,1]$, the $c$-thinned $q$-quantile share can be infeasible for any $q &gt; e^{-c}$. Prior to this work, the only nontrivial share known to be universally feasible was Feige's residual maximin share. The thinning viewpoint also lets us remove the factor-two loss in the conditional result for the original quantile share: assuming the rainbow EMC, the $(1/e)$-quantile share is universally feasible.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04300v1</guid>
      <category>math.ST</category>
      <category>cs.DM</category>
      <category>cs.GT</category>
      <category>math.CO</category>
      <category>stat.TH</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vishesh Jain, Clayton Mizgerd, Shyam Ravichandran</dc:creator>
    </item>
    <item>
      <title>A foundation model of vision, audition, and language for in-silico neuroscience</title>
      <link>https://arxiv.org/abs/2605.04326</link>
      <description>arXiv:2605.04326v1 Announce Type: cross 
Abstract: Cognitive neuroscience is fragmented into specialized models, each tailored to specific experimental paradigms, hence preventing a unified model of cognition in the human brain. Here, we introduce TRIBE v2, a tri-modal (video, audio and language) foundation model capable of predicting human brain activity in a variety of naturalistic and experimental conditions. Leveraging a unified dataset of over 1,000 hours of fMRI across 720 subjects, we demonstrate that our model accurately predicts high-resolution brain responses for novel stimuli, tasks and subjects, outperforming traditional linear encoding models with several-fold improvements in accuracy. Critically, TRIBE v2 enables in silico experimentation: tested on seminal visual and neuro-linguistic paradigms, it recovers a variety of results established by decades of empirical research. Finally, by extracting interpretable latent features, TRIBE v2 reveals the fine-grained topography of multisensory integration. These results establish artificial intelligence as a unifying framework for exploring the functional organization of the human brain.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04326v1</guid>
      <category>q-bio.NC</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Stéphane d'Ascoli, Jérémy Rapin, Yohann Benchetrit, Teon Brooks, Katelyn Begany, Joséphine Raugel, Hubert Banville, Jean-Rémi King</dc:creator>
    </item>
    <item>
      <title>GPU-Accelerated Simulations of Problems with Moving Boundaries and Fluid-Structure Interaction at Extreme Scales</title>
      <link>https://arxiv.org/abs/2605.04335</link>
      <description>arXiv:2605.04335v1 Announce Type: cross 
Abstract: Computational fluid dynamics and fluid-structure interaction simulations involving moving and deforming bodies are extremely challenging. In this work, we present a graphics processing unit (GPU) optimized implementation of the sharp-interface immersed boundary method. The method enables simulations around complex stationary as well as moving bodies on a Cartesian grid. We base our implementation on the ViCar3D framework and make use of OpenACC, CUDA, NCCL and MPI. We test the implementation across grid sizes ranging from O(10 million) to O(1 billion) points and achieve a 20X speedup compared to the existing CPU implementation. We next present our multi-GPU implementation, which utilizes CUDA streams and NCCL communicators. This enables us to obtain strong and weak scaling efficiencies of &gt;90%. Finally, we demonstrate the capability of the developed software to simulate turbulent fluid flow and coupled fluid-structure interaction of a flapping bat wing in flight at Re=5000.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04335v1</guid>
      <category>physics.comp-ph</category>
      <category>cs.DC</category>
      <category>physics.flu-dyn</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Sushrut Kumar, Joshua Romero, Jung-Hee Seo, Massimiliano Fatica, Rajat Mittal</dc:creator>
    </item>
    <item>
      <title>The Adversarial Discount - AI, Signal Correlation, and the Cybersecurity Arms Race</title>
      <link>https://arxiv.org/abs/2605.04336</link>
      <description>arXiv:2605.04336v1 Announce Type: cross 
Abstract: We study a contest-theoretic model of adversarial investment in which an attacker and a defender allocate resources to AI-augmented capabilities across multiple attack surfaces. The attacker's investment operates through two channels: it amplifies offensive potency unconditionally and erodes defensive effectiveness conditionally, generating an adversarial discount that deepens endogenously with the defender's own investment. We derive a closed-form arms race ratio decomposing the relative marginal effectiveness of offensive and defensive investment into six structural primitives and establish equilibrium uniqueness and global convergence under a continuous best-response dynamic. The central result concerns signal cross-correlation, the degree to which threat intelligence on one surface informs detection on another. With full cross-correlation, the arms race ratio is independent of the number of attack surfaces: the attacker's structural advantage from surface proliferation is completely neutralized. Under the benchmark full-dilution case, without cross-correlation, per-surface defense effectiveness vanishes as the attack surface grows. Extending the analysis to heterogeneous defenders facing an attacker who targets by expected value, we argue that the model points to a dual inefficiency: overinvestment in private defense (a zero-sum redirective externality) and underinvestment in shared signal correlation (a public good). These formal results, together with public-good reasoning outside the base model, characterize when collective information aggregation can dominate private capability investment as the decisive margin in adversarial contests.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04336v1</guid>
      <category>econ.TH</category>
      <category>cs.CR</category>
      <category>cs.GT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>James W. Bono</dc:creator>
    </item>
    <item>
      <title>Perturbation is All You Need for Extrapolating Language Models</title>
      <link>https://arxiv.org/abs/2605.04344</link>
      <description>arXiv:2605.04344v1 Announce Type: cross 
Abstract: We introduce a simple yet powerful framework for training large language models. In contrast to the standard autoregressive next-token prediction based on an exact prefix, we propose a perturbation-based procedure that first transforms the prefix into a semantic neighbor and then conditions on this perturbed variant for next-token prediction. This yields a hierarchical model with a pre-post-additive noise structure. Within this framework, we develop a rigorous theory of extrapolability, namely, the capacity of a model class to make reliable predictions for token sequences that lie outside the empirical support of the training corpus. We evaluate the finite-sample performance of the proposed procedure using both synthetic and real-world language data. Results show that the proposed method consistently improves out-of-support prediction while maintaining competitive in-support performance, demonstrating that perturbation offers a practical route to language modeling.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04344v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>math.ST</category>
      <category>stat.TH</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zetai Cen, Jin Zhu, Xinwei Shen, Chengchun Shi</dc:creator>
    </item>
    <item>
      <title>More on the Erdős–Kleitman problem on matchings in set families</title>
      <link>https://arxiv.org/abs/2605.04379</link>
      <description>arXiv:2605.04379v1 Announce Type: cross 
Abstract: Let $e(n,s)$ denote the maximum size of a family $\mathcal{F}$ of subsets of an $n$-element set that contains no $s$ pairwise disjoint members. In 1968, answering a question of Erdős, Kleitman determined $e(sm-1,s)$ and $e(sm,s)$ for all integers $m,s\ge 1$. Half a century later, Frankl and Kupavskii determined $e(s(m+1)-\ell, s)$ for $\ell \leq \frac{s-3}{m+3}$. They showed that the corresponding extremal example is closely connected with the extremal example for the Erdős Matching Conjecture, and conjectured that the same remains true for all $\ell \leq s/2$. In this paper, we prove an approximate version of their conjecture for $s\ge s_0(m)$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04379v1</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Andrey Kupavskii, Georgy Sokolov</dc:creator>
    </item>
    <item>
      <title>Causal discovery under mean independence and linearity</title>
      <link>https://arxiv.org/abs/2605.04381</link>
      <description>arXiv:2605.04381v1 Announce Type: cross 
Abstract: Causal discovery methods such as LiNGAM identify causal structure from observational data by assuming mutually independent disturbances. This assumption is fragile: shared volatility, common scale effects, or other forms of dependence can cause the methods to recover the wrong causal order, even with infinite data. We introduce the Linear Mean-Independent Acyclic Model (LiMIAM), which replaces full independence with weaker one-sided mean-independence restrictions on the disturbances. Under finite-order consequences of these restrictions, source nodes are generically identifiable, and hence a compatible causal order can be recovered recursively. Our proof is constructive and leads to DirectLiMIAM, a sequential residual-based algorithm for causal discovery under dependent noise. In simulations with mean-independent but dependent disturbances, DirectLiMIAM outperforms LiNGAM methods. A large-scale empirical application to the oil market highlights the implausibility of the independence assumption and the ability of DirectLiMIAM to recover a realistic causal ordering, from policy to production and from prices to inflation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04381v1</guid>
      <category>stat.ME</category>
      <category>cs.LG</category>
      <category>math.ST</category>
      <category>stat.ML</category>
      <category>stat.TH</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Geert Mesters, Alvaro Ribot, Anna Seigal, Piotr Zwiernik</dc:creator>
    </item>
    <item>
      <title>SpinTune: Improving the Reliability of Quantum Sensor Networks for Practical Quantum-Classical Utility</title>
      <link>https://arxiv.org/abs/2605.04416</link>
      <description>arXiv:2605.04416v1 Announce Type: cross 
Abstract: Emerging quantum sensors are increasingly envisioned as components of hybrid quantum-classical high-performance computing, enabling new capabilities in scientific, cyber-physical, and machine-learning pipelines. However, their practical utility is limited by environmental decoherence, which degrades sensing reliability. While dynamical decoupling (DD) pulse sequences can mitigate this, standard methods are often suboptimal in the presence of realistic noise. We present SpinTune, a reinforcement learning software approach that autonomously discovers adaptive, piecewise DD sequences tailored to specific environments. Using a simulation model of a carbon-13 spin bath, we show that SpinTune significantly outperforms standard DD sequences in preserving coherence.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04416v1</guid>
      <category>quant-ph</category>
      <category>cs.ET</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jason Ludmir, Nicholas S. DiBrita, Jason Han, Tirthak Patel</dc:creator>
    </item>
    <item>
      <title>Dissociating spatial frequency reliance from adversarial robustness advantages in neurally guided deep convolutional neural networks</title>
      <link>https://arxiv.org/abs/2605.04443</link>
      <description>arXiv:2605.04443v1 Announce Type: cross 
Abstract: Deep convolutional neural networks (DCNNs) have rivaled humans on many visual tasks, yet they remain vulnerable to near-imperceptible perturbations generated by adversarial attacks. Recent work shows that aligning DCNN representations with human visual cortex activity improves adversarial robustness, but the mechanisms driving this advantage are unclear. One hypothesis suggests that neural alignment confers robustness by biasing models away from brittle high-frequency details and towards the low spatial frequencies (LSF). However, recent work shows that human object recognition critically depends on a narrow, mid-frequency "human channel". Interestingly, this band was partially preserved in prior LSF-focused studies. Here, we investigate whether a spectral bias towards the LSF or the human channel is the primary driver of the adversarial robustness observed in neurally aligned DCNNs. We first show that DCNNs aligned to higher-order regions of the human ventral visual stream systematically increase reliance on both LSF and the human channel. However, directly steering DCNNs towards these bands revealed a clear dissociation. Biasing models towards the human channel, either alone or together with LSF, does not improve robustness and even impairs it. LSF bias produced some robustness gains, but such improvements are modest despite inducing much larger shifts in spatial-frequency reliance than neurally aligned models. Spatial-frequency-biased models overall show little, if any, increase in similarity to human neural representational geometry. Together, our results suggest that altered spatial-frequency reliance is likely an emergent property of learning more human-like representations rather than the primary mechanism by which neural alignment confers adversarial robustness, and motivate the need for future research examining representational properties beyond spatial-frequency profiles.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04443v1</guid>
      <category>q-bio.NC</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhenan Shao, Tianyu Ren, Chengxiao Wang, Leyla Isik, Diane M. Beck</dc:creator>
    </item>
    <item>
      <title>The unique, universal entropy for complex systems</title>
      <link>https://arxiv.org/abs/2605.04493</link>
      <description>arXiv:2605.04493v1 Announce Type: cross 
Abstract: An axiomatic foundation regarding the entropy for complex systems is established. Missing from decades of research was the requirement that entropy must measure the uncertainty at the informational scale of the maximizing distribution, where the log-log slope equals $-1$. Additionally, entropy must be extensive across the full universality scaling classes defined by Hanel-Thurner. The coupled entropy, maximized by the coupled stretched exponential distributions, is proven to be the unique, universal entropy that satisfies these requirements. The non-additivity of the entropy is equal to the long-range dependence or nonlinear statistical coupling. The entropy-matched extensivity is a function of the coupling, stretching parameter, and dimensions. Evidence is provided that the Tsallis $q$-statistics creates misalignment in the physical modeling of complex systems. Information thermodynamic applications are reviewed, including measuring complexity, a zeroth law of temperature, the thermodynamic consistency of the coupled free energy, and a model of intelligence in non-equilibrium.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04493v1</guid>
      <category>cond-mat.stat-mech</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Kenric P. Nelson</dc:creator>
    </item>
    <item>
      <title>JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions</title>
      <link>https://arxiv.org/abs/2605.04505</link>
      <description>arXiv:2605.04505v1 Announce Type: cross 
Abstract: The rapid advancement of generative audio models has outpaced the development of robust evaluation methodologies. Existing objective metrics and general multimodal large language models (MLLMs) often struggle with domain generalization, zero-shot capabilities, and instructional flexibility. To address these bottlenecks, we propose JASTIN, a generalizable, instruction-driven audio evaluation framework that formulates audio assessment as a self-instructed reasoning task. JASTIN bridges a frozen high-performance audio encoder with a fine-tuned LLM backbone via a trainable audio adapter. To ensure robust zero-shot generalization, we introduce a comprehensive instruction-following data preparation pipeline, incorporating Multi-Source, Multi-Task, Multi-Calibration, and Multi-Description data. Experimental results demonstrate that JASTIN achieves state-of-the-art Pearson and Spearman correlations with human subjective ratings. It consistently outperforms general MLLMs across speech, sound, music, and out-of-domain evaluation tasks without the need for task-specific retraining.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04505v1</guid>
      <category>eess.AS</category>
      <category>cs.AI</category>
      <category>cs.SD</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Leying Zhang, Bowen Shi, Haibin Wu, Bach Viet Do, Yanmin Qian</dc:creator>
    </item>
    <item>
      <title>Predictive and Prescriptive AI toward Optimizing Wildfire Suppression</title>
      <link>https://arxiv.org/abs/2605.04510</link>
      <description>arXiv:2605.04510v1 Announce Type: cross 
Abstract: Intense wildfire seasons require critical prioritization decisions to allocate scarce suppression resources over a dispersed geographical area. This paper develops a predictive and prescriptive approach to jointly optimize crew assignments and wildfire suppression. The problem features a discrete resource-allocation structure with endogenous wildfire demand and non-linear wildfire dynamics. We formulate an integer optimization model with crew assignments on a time-space-rest network, wildfire dynamics on a time-state network, and linking constraints between them. We develop a two-sided branch-and-price-and-cut algorithm based on: (i) a two-sided column generation scheme that generates fire suppression plans and crew routes iteratively; (ii) a new family of cuts exploiting the knapsack structure of the linking constraints; and (iii) novel branching rules to accommodate non-linear wildfire dynamics. We also propose a data-driven double machine learning approach to estimate wildfire spread as a function of covariate information and suppression efforts, mitigating observed confounding between historical crew assignments and wildfire growth. Extensive computational experiments show that the optimization algorithm scales to otherwise intractable real-world instances, and that the methodology can enhance suppression effectiveness in practice, resulting in significant reductions in area burned over a wildfire season and guiding resource sharing across wildfire jurisdictions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04510v1</guid>
      <category>math.OC</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Leonard Boussioux, Alexandre Jacquillat, Ryne Reger, Jacob Wachspress</dc:creator>
    </item>
    <item>
      <title>Online Riemannian Gradient Descent for Quantum State Tomography with Matrix Product Operators</title>
      <link>https://arxiv.org/abs/2605.04533</link>
      <description>arXiv:2605.04533v1 Announce Type: cross 
Abstract: Matrix product operators (MPOs) provide a scalable approach for quantum state tomography (QST) by offering a compact representation of many-body mixed states with limited entanglement, using only a number of parameters that scales polynomially with the system size. In this paper, we study QST for quantum density matrices that can be represented by MPOs. We first derive an equivalent characterization of Hermiticity in terms of the MPO core tensors and show that the coefficient tensor of an MPO under the Pauli or generalized Gell-Mann basis admits a real-valued low tensor-train (TT) rank structure. This establishes an explicit connection between MPO-based QST and noisy low-rank tensor completion. Motivated by this formulation, we develop an online Riemannian gradient descent (oRGD) algorithm that sequentially incorporates measurement data during the reconstruction process. With a proper initialization, we prove that oRGD converges linearly to the target MPO and succeeds with a number of distinct measurement settings that scales quadratically with the system size. As a byproduct, our analysis also yields a significantly improved sample complexity bound for the low TT rank tensor completion task. Furthermore, we propose a tailored spectral initialization method and establish its theoretical guarantee. Numerical experiments on several classes of quantum states validate the effectiveness and scalability of the proposed method.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04533v1</guid>
      <category>quant-ph</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jian-Feng Cai, Jingyang Li, Xiaoqun Zhang, Yuanwei Zhang</dc:creator>
    </item>
    <item>
      <title>Fundamental Limitations of Post-Quantum Cryptographic Architectures</title>
      <link>https://arxiv.org/abs/2605.04582</link>
      <description>arXiv:2605.04582v1 Announce Type: cross 
Abstract: Modern lattice-based cryptography, particularly the learning with errors (LWE) paradigm, relies on injecting artificial noise to secure data against quantum adversaries. This study systematically examines the theoretical and physical boundaries of this noise-reliant model across four interconnected domains: computational complexity, information-theoretic thermodynamics, quantum error correction, and quantum learning theory. Starting from the algorithmic foundation, our analysis notes that these frameworks rely on provisional complexity-theoretic assumptions that remain vulnerable to future quantum algorithmic advancements. Furthermore, by translating this cryptographic mechanism into physical thermodynamics, we illustrate that intentionally injected discrete Gaussian noise does not equate to the permanent erasure of information. Because the structural integrity of the cryptographic secret remains preserved within the ciphertext, advanced quantum error correction protocols and quantum learning models can efficiently extract the underlying mathematical kernel. Ultimately, we suggest that while lattice-based cryptography provides a robust transitional alternative, definitively classifying these frameworks as unconditionally post-quantum is premature, since such a classification relies on transient physical bottlenecks rather than impenetrable theoretical boundaries.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04582v1</guid>
      <category>quant-ph</category>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiho Jung, Donghwa Ji, Mingyu Lee, Kabgyun Jeong</dc:creator>
    </item>
    <item>
      <title>Multiscale Euclidean Network Trajectories: Second-Moment Geometry, Attribution, and Change Points</title>
      <link>https://arxiv.org/abs/2605.04589</link>
      <description>arXiv:2605.04589v1 Announce Type: cross 
Abstract: A central challenge in dynamic network analysis is to represent temporal evolution in a way that is both geometrically meaningful and statistically identifiable. One approach embeds a sequence of network snapshots as trajectories in a Euclidean space and relates these trajectories to node embeddings. In multilayer and unfolded spectral constructions, however, node embeddings and their underlying latent positions are identifiable only up to general linear transformations. Although this ambiguity preserves edge probabilities, it can distort geometry and invalidate distance-based temporal comparisons at both the trajectory and node levels.
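To make the identifiability issue concrete, the following NumPy sketch uses hypothetical random latent positions X and Y (not data or code from this work). It shows that an invertible W leaves the score matrix X Y^T unchanged while distorting pairwise distances between rows, whereas an orthogonal Q preserves both. It also applies one assumed form of isotropic normalization, whitening X so that X^T X / n = I; once that constraint holds, any W keeping XW isotropic must satisfy W^T W = I, i.e. be orthogonal.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3
X = rng.normal(size=(n, d))  # hypothetical anchor latent positions
Y = rng.normal(size=(n, d))  # hypothetical second factor
P = X @ Y.T                  # edge-probability (score) matrix


def pairwise(Z):
    """Euclidean distances between all pairs of rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# General invertible W: X Y^T is unchanged, but row geometry is distorted.
W = rng.normal(size=(d, d)) + 3 * np.eye(d)
XW, YW = X @ W, Y @ np.linalg.inv(W).T
print(np.allclose(XW @ YW.T, P), np.allclose(pairwise(XW), pairwise(X)))

# Orthogonal Q: both X Y^T and all pairwise distances are preserved.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
XQ, YQ = X @ Q, Y @ Q
print(np.allclose(XQ @ YQ.T, P), np.allclose(pairwise(XQ), pairwise(X)))

# Assumed isotropic normalization: whiten X so that X^T X / n = I.
L = np.linalg.cholesky(X.T @ X / n)
X_iso = X @ np.linalg.inv(L).T
print(np.allclose(X_iso.T @ X_iso / n, np.eye(d)))
```

For a generic W, the product check on the first printed line holds while the distance check fails; every check on the second and third lines holds up to floating-point tolerance.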
  We develop Multiscale Euclidean Network Trajectories (MENT), a framework for multiscale temporal trajectories based on second-moment geometry. By imposing an isotropic normalization on the anchor latent positions, we reduce the relevant ambiguity to orthogonal transformations and prevent distortion of the second-moment geometry. In this canonical representation, we define a trace variation distance and mode-wise variation distances along orthogonal directions, and use multidimensional scaling to obtain low-dimensional trajectories of time points at both global and mode-wise levels. The resulting trajectories support interpretation and inference. They admit mode-wise decompositions, support attribution of global and mode-wise temporal changes to nodes, and enable change point detection through 1D trajectories. We prove consistency of the proposed unfolded spectral embedding and of the induced temporal trajectories. Experiments on two synthetic and two real dynamic networks illustrate stable and interpretable recovery of temporal structure and show strong performance against existing change point detection baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04589v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>math.ST</category>
      <category>stat.TH</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haruka Ezoe, Ryohei Hisano</dc:creator>
    </item>
    <item>
      <title>Generative Quantum-inspired Kolmogorov-Arnold Eigensolver</title>
      <link>https://arxiv.org/abs/2605.04604</link>
      <description>arXiv:2605.04604v1 Announce Type: cross 
Abstract: High-performance computing (HPC) is increasingly important for scalable quantum chemistry workflows that couple classical generative models, quantum circuit simulation, and selected configuration interaction postprocessing. We present the generative quantum-inspired Kolmogorov-Arnold eigensolver (GQKAE), a parameter-efficient extension of the generative quantum eigensolver (GQE) for quantum chemistry. GQKAE replaces the parameter-heavy feed-forward network components in GPT-style generative eigensolvers with hybrid quantum-inspired Kolmogorov-Arnold network modules, forming a compact HQKANsformer backbone. The method preserves autoregressive operator selection and the quantum-selected configuration interaction evaluation pipeline, while using single-qubit DatA Re-Uploading ActivatioN modules to provide expressive nonlinear mappings. Numerical benchmarks on H4, N2, LiH, C2H6, H2O, and the H2O dimer show that GQKAE achieves chemical accuracy comparable to the GPT-based GQE architecture, while reducing trainable parameters and memory by approximately 66% and improving wall-time performance. For strongly correlated systems such as N2 and LiH, GQKAE also improves convergence behavior and final energy errors. These results indicate that quantum-inspired Kolmogorov-Arnold networks can reduce classical-side overhead while preserving circuit-generation quality, offering a scalable route for HPC-quantum co-design on near-term quantum platforms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04604v1</guid>
      <category>quant-ph</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Yu-Cheng Lin, Yu-Chao Hsu, I-Shan Tsai, Chun-Hua Lin, Kuo-Chung Peng, Jiun-Cheng Jiang, Yun-Yuan Wang, Tzung-Chi Huang, Tai-Yue Li, Kuan-Cheng Chen, Samuel Yen-Chi Chen, Nan-Yow Chen</dc:creator>
    </item>
    <item>
      <title>Continuations and Completeness in Proof-theoretic Semantics</title>
      <link>https://arxiv.org/abs/2605.04689</link>
      <description>arXiv:2605.04689v1 Announce Type: cross 
Abstract: This is a short paper about the relationship between logic and computation. More specifically, it concerns a relationship between the completeness proof for intuitionistic propositional logic in base-extension semantics, a form of proof-theoretic semantics, and continuation-passing semantics, a fundamental idea from the theory of computation. The latter is explained herein both in terms of reduction in natural deduction and the lambda calculus and in terms of proof-search. The relationship between completeness and continuations is explored through an analysis of Sandqvist's proof of the completeness theorem as seen from the mathematical perspective of Kripke's and Heyting's semantics. Our analysis can be seen to reveal how syntactic representations of continuations embody intensional semantical intuitions about the relationship between their meaning and use. These intuitions are made precise using the tools of proof-theoretic semantics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04689v1</guid>
      <category>math.LO</category>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tao Gu, David Pym, Eike Ritter, Edmund Robinson</dc:creator>
    </item>
    <item>
      <title>Hamilton decompositions of all directed tori at odd modulus</title>
      <link>https://arxiv.org/abs/2605.04734</link>
      <description>arXiv:2605.04734v1 Announce Type: cross 
Abstract: Let $D_d(m) = \operatorname{Cay}((\mathbb{Z}/m\mathbb{Z})^d, \{e_0, \ldots, e_{d-1}\})$ be the directed Cartesian product of $d$ directed $m$-cycles. We prove that $D_d(m)$ admits a directed Hamilton decomposition for every dimension $d \geq 2$ and every odd modulus $m \geq 3$. The proof combines two new closure mechanisms with a small set of base dimensions. The high-modulus count branch handles every odd $d \geq 5$ and every odd $m \geq d$ via triangular prefix coordinates and a primitivity criterion controlled by gcd conditions on symbol counts. The base-tail modular-trade branch handles the complementary range $m &lt; d$ by decomposing a base multigraph into cylinders and scheduling active tail residues by local symbol trades; it yields the successor closure $b \mapsto 2b+1$ for $b \geq 5$. Together with multiplicative product closure, these reduce the all-dimensions theorem to the four base dimensions $d \in \{2, 3, 5, 7\}$. Dimensions $2$ and $3$ are proved here; dimensions $5$ and $7$ are imported from companion arXiv preprints.
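As a small illustration of what a directed Hamilton decomposition of $D_d(m)$ requires, the following Python sketch checks a candidate arc coloring: at every vertex, each of the $d$ out-arcs $v \to v + e_i$ is assigned to exactly one of $d$ cycles, and each cycle must then visit all $m^d$ vertices before closing. The helper names and the diagonal coloring for $d = 2$ are hypothetical and are not the construction used in this work; they only exhibit one easy valid instance, in which cycle 0 takes $e_0$ on the diagonal $x + y \equiv 0 \pmod m$ and $e_1$ elsewhere, and cycle 1 takes the remaining arc.

```python
from itertools import product


def step(v, i, m):
    """Follow generator e_i from vertex v of (Z/mZ)^d."""
    w = list(v)
    w[i] = (w[i] + 1) % m
    return tuple(w)


def is_hamilton_decomposition(d, m, assign):
    """assign(v) returns a tuple whose j-th entry is the generator used by
    cycle j at v. Returns True when every arc lies in exactly one cycle and
    every cycle is a directed Hamilton cycle of D_d(m)."""
    vertices = list(product(range(m), repeat=d))
    n = len(vertices)
    # Each vertex must split its d out-arcs among the d cycles.
    if any(sorted(assign(v)) != list(range(d)) for v in vertices):
        return False
    start = vertices[0]
    for j in range(d):
        v, seen = start, set()
        for _ in range(n):
            seen.add(v)
            v = step(v, assign(v)[j], m)
        # Hamiltonian: n distinct vertices visited, then back at the start.
        if v != start or len(seen) != n:
            return False
    return True


def diagonal_assignment(v, m):
    """Hypothetical d = 2 coloring based on the diagonal x + y (mod m)."""
    return (0, 1) if sum(v) % m == 0 else (1, 0)


for m in (3, 5, 7):
    # prints "3 True", "5 True", "7 True", one per line
    print(m, is_hamilton_decomposition(2, m, lambda v: diagonal_assignment(v, m)))
```

The diagonal coloring works because every step advances $x + y$ by one, so after $m$ steps cycle 0 is displaced by $(1, -1)$ and cycle 1 by $(-1, 1)$; each displacement generates the $m$ vertices of a diagonal, so both cycles have length $m^2$.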
  A Lean 4 formalization records the same all-dimensions endpoint. As an independent consequence, the dimensions $2$ and $3$ alone solve every odd $d \geq 29$, by a dyadic-triadic interval-hitting argument.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04734v1</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>SangHyun Park</dc:creator>
    </item>
    <item>
      <title>Data anonymization in the presence of outliers via invariant coordinate selection</title>
      <link>https://arxiv.org/abs/2605.04833</link>
      <description>arXiv:2605.04833v1 Announce Type: cross 
Abstract: Protecting confidential data while preserving utility is particularly challenging when data sets contain outlying observations. Existing latent space anonymization methods, such as spectral anonymization (SA), rely on principal component analysis (PCA) and may therefore be vulnerable to contamination. We investigate anonymization in the presence of outliers and propose ICSA, a robust alternative to SA based on invariant coordinate selection (ICS). By replacing the PCA transformation with ICS, the robustness of the anonymization procedure can be regulated through the choice of scatter matrices. Alongside the methodological development, we derive a theoretical result showing that SA fails under sufficiently influential outliers. To assess the practical implications of this result, we compare the privacy-utility trade-off of ICSA and SA through simulation experiments under varying contamination settings and outlier severities. Our findings indicate that implementations of ICSA based on robust scatter matrices achieve stronger privacy protection than SA, while typically maintaining comparable, and in some cases improved, utility. We further examine the empirical performance of the proposed method using a benchmark clinical data set, where ICSA demonstrates superior overall privacy-utility efficiency relative to SA. These results suggest that explicitly accounting for outliers can materially improve anonymization performance and that robust latent space transformations offer a promising direction for privacy-preserving statistical data release.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04833v1</guid>
      <category>stat.ME</category>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Katariina Perkonoja, Joni Virta</dc:creator>
    </item>
    <item>
      <title>PAIR-CI: Calibrated Conditional Independence Testing for Causal Discovery with Incomplete Data</title>
      <link>https://arxiv.org/abs/2605.04838</link>
      <description>arXiv:2605.04838v1 Announce Type: cross 
Abstract: The standard constraint-based paradigm for causal discovery with incomplete data -- impute first, test second -- is frequently miscalibrated: any consistent conditional independence (CI) test rejects a true null with probability approaching 1 when imputation error induces spurious conditional dependence. We introduce PAIR-CI, a nonparametric CI test that restores calibration by integrating multiple imputation directly into the inferential procedure via a paired permutation design. PAIR-CI compares cross-validated models that include and exclude the candidate variable while receiving the same imputed conditioning set, forcing imputation error to cancel in their loss difference rather than contaminate the test statistic. A provably consistent variance estimator jointly accounts for uncertainty arising from cross-validation and multiple imputation -- to our knowledge, the first formal unification of these two inferential frameworks. In simulations, existing imputation-based CI tests exhibit false positive rates of 28--45% when data are missing not at random (MNAR), whereas PAIR-CI averages below the nominal 5% level across data-generating processes and missingness mechanisms. These gains are largest in nonlinear settings and grow with causal graph size: when integrated into the PC algorithm, PAIR-CI reduces structural Hamming distance by 8% on 10-variable nonlinear graphs, 15% on 30-variable equivalents, and up to 44% on the 56-variable HAILFINDER network, with stable performance in all settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04838v1</guid>
      <category>stat.ME</category>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Thomas S. Robinson, Ranjit Lall</dc:creator>
    </item>
    <item>
      <title>W-state graphs: Structure and Algorithms</title>
      <link>https://arxiv.org/abs/2605.04855</link>
      <description>arXiv:2605.04855v1 Announce Type: cross 
Abstract: We study the class of edge-coloured graphs arising from the graph-theoretic representation of quantum photonic experiments that generate multipartite W-states. Abstracting away physical amplitudes and phases, we introduce W-state graphs: matching-covered graphs equipped with a half-edge 2-colouring such that every perfect matching contains exactly one bichromatic edge and every vertex is incident with a red half-edge. Our main contribution is a complete structural characterization of W-state graphs. We show that a graph is a W-state graph if and only if each of its 3-connected components is a W-cone, a simple and rigid building block defined by a universal vertex and a factor-critical base. This characterization implies that no W-state graph is simple and yields a recognition algorithm running as fast as verifying whether a graph is matching-covered. We also show that the natural generalization to Dicke states encounters a complexity barrier: verifying one of the two Dicke state conditions is itself coNP-complete, resolving an open problem of Vardi and Zhang [IJCAI 2023]. Our results place W-state graphs firmly within classical matching theory and precisely delineate the combinatorial structures capable of realizing idealized W-states in the experiment-graph framework.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04855v1</guid>
      <category>quant-ph</category>
      <category>cs.DM</category>
      <category>math.CO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Rishikesh Gajjala, Saurabh Ray, Dimitrios M. Thilikos</dc:creator>
    </item>
    <item>
      <title>Optimal Error Exponents for Composite Sequential Quantum Hypothesis Testing</title>
      <link>https://arxiv.org/abs/2605.04915</link>
      <description>arXiv:2605.04915v1 Announce Type: cross 
Abstract: We study the composite sequential quantum hypothesis testing (SQHT) problem, where the objective is to distinguish a null quantum state from a compact, convex set of alternative quantum states. We propose a mixture-sequential quantum probability ratio test that adaptively selects measurements based on the current mixture estimate of the alternative set, and stops upon the first threshold crossing of the mixture log-likelihood ratio. Under an expected sample size constraint, we show that our proposed adaptive strategy simultaneously achieves the optimal Type-I and (worst-case) Type-II error exponents. These exponents are characterized by the minimal measured relative entropies between the null state and the alternative set. We further establish a matching converse, thereby characterizing the optimal error exponent region. Finally, our results show that achieving vanishing error probabilities in composite SQHT requires an expected sample complexity at least as large as that of sequential testing between two fixed quantum states.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04915v1</guid>
      <category>quant-ph</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <category>math.ST</category>
      <category>stat.TH</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jacob Paul Simpson, Efstratios Palias, Sharu Theresa Jose</dc:creator>
    </item>
    <item>
      <title>Neural Discovery of Strichartz Extremizers</title>
      <link>https://arxiv.org/abs/2605.04918</link>
      <description>arXiv:2605.04918v1 Announce Type: cross 
Abstract: Strichartz inequalities are a cornerstone of the modern theory of dispersive PDEs, but their extremizers are known explicitly only in a handful of sharp cases. The non-convexity of the underlying functional makes the problem hard, and to our knowledge no systematic numerical attack has been attempted. We propose a simple neural-network-based pipeline that searches for extremizers as critical points of the Strichartz ratio, and apply it in three settings. First, on the Schrödinger group we recover the Gaussian extremizers of Foschi and Hundertmark--Zharnitsky in dimensions $d=1,2$ to within $10^{-3}$ relative error, with no analytical prior. Second, on $59$ further admissible pairs in $d=1$ where the answer is conjectural, the method consistently finds Gaussians, supporting the conjecture that Gaussians are the universal extremizers in the admissible range. Third, on the critical Airy--Strichartz inequality at $\gamma=1/q$, where existence is open, the optimization does not converge to any $L^2$ profile: instead, the iterates organize themselves as mKdV breathers $B(0,\cdot;\alpha,1,0,0)$ with growing internal frequency $\alpha$, and the discovered ratio approaches the Frank--Sabin universal lower bound $\widetilde A_{q,r}$ from below with a power-law gap $\sim\alpha^{-0.9}$. We confirm the same picture with an independent Hermite-basis ansatz. We propose a precise conjecture: the supremum equals $\widetilde A_{q,r}$ and is approached, but not attained, along the breather family. The pipeline thus serves both as a validator on known cases and as a discovery tool when no extremizer exists.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04918v1</guid>
      <category>math.AP</category>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Nicolás Valenzuela, Ricardo Freire, Claudio Muñoz</dc:creator>
    </item>
    <item>
      <title>423.7 + 426.5 Tb/s GMI Bi-Directional HCF Transmission</title>
      <link>https://arxiv.org/abs/2605.04924</link>
      <description>arXiv:2605.04924v1 Announce Type: cross 
Abstract: We demonstrate OESCL-band same-wavelength bi-directional transmission over 60 km of hollow-core fiber (HCF) with 42.5 THz bandwidth, achieving generalized mutual information (GMI) values comparable with the highest unidirectional single-mode fiber (SMF) data rates in both directions, with an aggregate of 423.7 + 426.5 Tb/s.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04924v1</guid>
      <category>eess.SP</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiaqian Yang, Romulo Aparecido, Eric Sillekens, Ronit Sohanpal, Mindaugas Jarmolovičius, Zelin Gan, Yang Hong, Morteza Kamalian-Kopae, Abdallah Ali, Shahab Bakhtiari Gorajoobi, Ruben S. Luís, Daniele Orsuti, Aleksandr Donodin, Vitaly Mikhailov, Jiawei Luo, David J. DiGiovanni, Nicolas Fontaine, Lauren Dallachiesa, Mikael Mazur, Roland Ryf, Haoshuo Chen, David Neilson, Ian D. Phillips, Wladek Forysiak, Sergei K. Turitsyn, Hideaki Furukawa, Jamie Gaudette, David J. Richardson, Benjamin J. Puttnam, Robert I. Killey, Polina Bayvel</dc:creator>
    </item>
    <item>
      <title>Jacobian-Velocity Bounds for Deployment Risk Under Covariate Drift</title>
      <link>https://arxiv.org/abs/2605.04932</link>
      <description>arXiv:2605.04932v1 Announce Type: cross 
Abstract: We study long-horizon deployment of a frozen predictor under dynamic covariate shift. A time-domain Poincaré inequality reduces temporal risk volatility to derivative energy, and a Jacobian-velocity theorem identifies directional tangent energy along the deployment path as the governing quantity under explicit along-path regularity and domination assumptions. Under low-rank drift, that quantity reduces to directional Jacobian energy in the drift subspace, motivating drift-aligned tangent regularization (DTR) and a matched monitoring proxy. Rather than smoothing the network isotropically, DTR penalizes sensitivity only along estimated drift directions. We validate the theorem-to-method pipeline in four experiments: a synthetic benchmark for the time-domain inequality, a controlled synthetic comparison against isotropic Jacobian regularization, and two frozen-deployment studies on the UCI Air Quality and Tetouan power-consumption datasets. DTR reduces risk volatility and directional gain in the controlled low-rank regime, beats isotropic smoothing there, and gives validation-selected deployment gains on both real datasets when the Air Quality drift subspace is estimated from target-orthogonal sensor motion. Moderate drift-subspace misspecification is tolerable, while orthogonal misspecification largely removes the benefit.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04932v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jonathan R. Landers</dc:creator>
    </item>
    <item>
      <title>Fast Full-Wave Simulation of Indoor RSS Maps for Pre-Measurement Validation in Device-Free Localization</title>
      <link>https://arxiv.org/abs/2605.04958</link>
      <description>arXiv:2605.04958v1 Announce Type: cross 
Abstract: Human localization is gaining momentum in security, healthcare, logistics, and smart-space applications. While global navigation systems are unreliable indoors, device-free (a.k.a. passive) localization methods that exploit human-induced perturbations of radio propagation can be used effectively. This paper investigates the use of a compact full-wave electromagnetic (EM) setup as a fast and reliable tool to simulate indoor Wi-Fi propagation for human sensing. The goal is to provide a practical baseline for validating simplified propagation models, such as diffraction-based descriptions, and to reduce the need for costly measurement campaigns. Two-dimensional attenuation maps from received signal strength are generated and compared in controlled environments, focusing on attenuation statistics and interference patterns. The simulations reproduce the main spatial features, though discrepancies remain due to simplified material characterization. Diffraction-aware refinements are proposed to mitigate these effects. Overall, the approach provides an efficient pre-measurement reference to support device-free system design and to guide experimental planning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04958v1</guid>
      <category>eess.SP</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Federica Fieramosca, Anastasia Maiolli, Alexander H. Paulus, Stefano Savazzi, Michele D'Amico</dc:creator>
    </item>
    <item>
      <title>Matchings in permutations</title>
      <link>https://arxiv.org/abs/2605.04987</link>
      <description>arXiv:2605.04987v1 Announce Type: cross 
Abstract: We say that two permutations $[n]\to [n]$ intersect if they map some element $x$ to the same element $y$. A matching in a family of permutations is a collection of pairwise disjoint permutations. In this paper, we study families of permutations with no matchings of size $s$. In particular, we obtain a characterization of the largest $s$-matching-free families and a Hilton--Milner type result. We also obtain results for families of derangements.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04987v1</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Eduard Inozemtsev, Dmitrii Kolupaev, Andrey Kupavskii</dc:creator>
    </item>
    <item>
      <title>Scalable inference of spatial regions and temporal signatures from time series</title>
      <link>https://arxiv.org/abs/2605.05008</link>
      <description>arXiv:2605.05008v1 Announce Type: cross 
Abstract: Regionalization aims to partition a spatial domain into contiguous regions that share similar characteristics, enabling more effective spatial analysis, policy making, and resource management. Existing approaches for spatial regionalization typically rely on static spatial snapshots rather than evolving time series. Meanwhile, most time series clustering methods ignore spatial structure or enforce spatial continuity through ad hoc regularization, constraining the number of inferred regions a priori either explicitly or implicitly. Utilizing the minimum description length principle from information theory, here we propose an efficient and fully nonparametric framework for the regionalization of spatial time series. Our method jointly infers a spatial partition along with a set of representative time series archetypes ("drivers") that best compress a spatiotemporal dataset, with a runtime log-linear in the number of time series. We demonstrate that this method can accurately recover planted regional structure and drivers in synthetic time series, and can extract meaningful structural regularities in large-scale empirical air quality and vegetation index records. Our method provides a principled and scalable framework for spatially contiguous partitioning, allowing interpretable temporal patterns and homogeneous regions to emerge directly from the data itself.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05008v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>cs.SI</category>
      <category>physics.soc-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiayu Weng, Alec Kirkley</dc:creator>
    </item>
    <item>
      <title>Hypergraph Generation via Structured Stochastic Diffusion</title>
      <link>https://arxiv.org/abs/2605.05024</link>
      <description>arXiv:2605.05024v1 Announce Type: cross 
Abstract: Hypergraphs model higher-order interactions, but realistic hypergraph generation remains difficult because incidence, hyperedge-size heterogeneity, and overlap structure are not faithfully captured by pairwise reductions. We propose HEDGE, a generative model defined directly on relaxed incidence matrices via a structured stochastic diffusion. The forward process combines a hypergraph-specific two-sided heat operator with an Ornstein--Uhlenbeck component, preserving structure-aware noising near the data while yielding an explicit Gaussian terminal law. Conditional on an observed hypergraph, this forward process is linear-Gaussian, so conditional means, covariances, scores, and reverse-drift targets are available in closed form. We therefore learn a permutation-equivariant state-only reverse-drift field in incidence space by regressing onto exact conditional targets, and generate samples by simulating a learned reverse-time SDE from the Gaussian base law. We establish exactness in the ideal state-only setting together with finite-horizon stability guarantees, and empirically show improved hypergraph generation quality relative to strong baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05024v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>stat.CO</category>
      <category>stat.ME</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Christopher Nemeth</dc:creator>
    </item>
    <item>
      <title>Block Permutation Routing on Ramanujan Hypergraphs for Fault-Tolerant Quantum Computing</title>
      <link>https://arxiv.org/abs/2605.05036</link>
      <description>arXiv:2605.05036v1 Announce Type: cross 
Abstract: We analyze permutation routing of rigid blocks representing surface code patches of $d_C^2$ atoms on a reconfigurable lattice with hypergraph transformations. For a hypergraph $H$, code distance $d_C$, $s=d_C^2$, number of blocks $N_L$, and guard distance $g$, we show the block routing number $\mathrm{rt}_B(H, s, g) = \Theta(d_C \log N_L)$. A spectral analysis of the quotient graph $Q(G_{\mathrm{cl}}(H), B)$ (blocks as supervertices) shows that the spectral ratio $\beta_Q &lt; 1$ is preserved in the high-connectivity regime. Negative association of block permutations and congestion bounds are used for random intermediate configurations. Serialization establishes that each quotient routing phase requires $O(d_C)$ physical sub-steps due to the block footprint width. A lower bound $\mathrm{rt}_B = \Omega(d_C \log N_L)$ follows from combining the spectral lower bound on quotient phases with the traversal cost per phase. We include error model analysis grounded in recent experimental results, syndrome extraction protocols (stop-and-correct, rolling active fault-tolerant (AFT) measurement, and adaptive deformation), and integration with lattice surgery compilation via the Litinski protocol. Composition with the correlated-decoding scheme reduces syndrome-extraction overhead from $O(d_C)$ to $O(1)$ per correction window, leaving routing as the leading-order contributor to the integrated $O(d_C \log N_L)$ depth. Spectral inheritance is organized in a hierarchy: exact (Haemers interlacing on equitable partitions), perturbative (Weyl bounds for near-equitable partitions, a practically relevant case for surface-code patches), and universal (higher-order Cheeger). Methods extend directly to QCCD trapped-ion architectures under the same regime condition, with junction crossings replacing AOD transports as the elementary single-hop translation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05036v1</guid>
      <category>quant-ph</category>
      <category>cs.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Joshua M. Courtney</dc:creator>
    </item>
    <item>
      <title>External Validation of Deep Learning Models for BI-RADS Breast Density Prediction from Ultrasound Images</title>
      <link>https://arxiv.org/abs/2605.05082</link>
      <description>arXiv:2605.05082v1 Announce Type: cross 
Abstract: We externally validated three deep learning models (DenseNet121, ViT-B/32, and ResNet50) for predicting mammographic breast density from breast ultrasound exams on an independent cohort. The external validation set comprised 2,000 ultrasound exams, including 500 cancer cases defined by an initial negative exam (BI-RADS 1 or 2) followed by a cancer diagnosis within 6 months to 10 years, and 1,500 negative controls matched by manufacturer and study year. Performance was measured using patient-level AUROC across four density categories: A (fatty), B (scattered), C (heterogeneous), and D (extremely dense). As a downstream assessment, we also evaluated 10-year risk prediction by incorporating age and AI-derived density into the Tyrer-Cuzick model and comparing performance against a reference model using age and mammography-reported density. All three models performed best in extremely dense breasts (AUROC 0.868-0.899), with strong performance in fatty (0.814-0.838) and scattered density (0.764-0.799), and lower performance in heterogeneously dense breasts (0.699-0.729). DenseNet121 achieved the highest overall performance (micro-averaged AUROC 0.885), and performance across categories was comparable between internal and external testing. For risk modeling, age combined with AI-derived density yielded a numerically lower AUROC than age combined with mammography-reported density (0.541 vs. 0.570), but the difference was not statistically significant (p = 0.23). These findings indicate that deep learning models for breast density assessment generalize well to external data with a different racial composition. While performance is strongest in extremely dense breasts, the heterogeneously dense category remains more challenging, highlighting the need for targeted optimization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05082v1</guid>
      <category>eess.IV</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuxuan Chen, Arianna Bunnell, Yanqi Xu, Haoyan Yang, Thomas K. Wolfgruber, John A. Shepherd, Yiqiu Shen</dc:creator>
    </item>
    <item>
      <title>Think-Aloud Reshapes Automated Cognitive Model Discovery Beyond Behavior</title>
      <link>https://arxiv.org/abs/2605.05091</link>
      <description>arXiv:2605.05091v1 Announce Type: cross 
Abstract: Computational cognitive models discovered using large language models have so far relied solely on behavioral data. However, it is well known that models derived from behavioral trajectories alone are typically under-determined. In this work, we explore the use of think-aloud traces as an additional data constraint during automated model discovery. When applied to the domain of risky decision-making, models discovered with think-aloud data achieve significantly better predictive performance on held-out data. Additionally, for the majority of participants (69.4\%), the discovered models belong to different structural classes than those discovered from behavior alone; specifically, they shift from the Explicit comparator class toward the Integrated utility class. These results suggest that process-level language data not only improve model fit but also systematically reshape the structure of the discovered cognitive models, enabling the identification of mechanisms that are not recoverable from behavior alone.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05091v1</guid>
      <category>q-bio.NC</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hanbo Xie, Akshay K. Jagadish, Lan Pan, Robert C. Wilson</dc:creator>
    </item>
    <item>
      <title>Proximal Projection for Doubly Sparse Regularized Models</title>
      <link>https://arxiv.org/abs/2605.05093</link>
      <description>arXiv:2605.05093v1 Announce Type: cross 
Abstract: Regularization is often used in high-dimensional regression settings to generate a sparse model, which can save substantial computing resources and identify the predictors most strongly associated with the response. When the predictors can be represented by a Gaussian graphical model, the structure of the predictor graph can be exploited during regularization. Our proposed model exploits this underlying predictor graph structure by decomposing the estimated coefficient vector into a sum of latent variables, one per node, each representing that node's contribution to the coefficient vector. Regularization is then performed on the latent variables rather than on the coefficient vector directly. We use a penalty function that permits a clear user-defined trade-off between the L1 and L2 penalties and propose a novel proximal projection for the optimization. Further, our implementation computes the projection operator for the intersection of selected groups, which conserves more computing resources than predictor duplication methods, especially for high-dimensional data. Through simulation, we evaluate the performance of our approach under different graph structures and node counts, and we present results on real-world data. Results suggest that our method exhibits stable performance relative to other singly or doubly sparse graphical regression models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05093v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>stat.CO</category>
      <category>stat.ME</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jia Wei He, R. Ayesha Ali, Gerarda Darlington</dc:creator>
    </item>
    <item>
      <title>Building informative materials datasets beyond targeted objectives</title>
      <link>https://arxiv.org/abs/2605.05104</link>
      <description>arXiv:2605.05104v1 Announce Type: cross 
Abstract: Materials science data collection can be expensive, making the reuse and long-term utility of datasets critically important for future discovery campaigns. In practice, researchers prioritize a subset of properties according to their research interests. However, ignoring some outcomes during data collection can produce datasets that are poorly suited for future learning tasks. Here, we present a framework for dataset construction that maximizes informativeness for target properties of interest while preserving performance on untargeted ones. Our approach uses diversity-aware selection to ensure broad coverage of the materials space. In noisy experimental dataset construction, we find that without our diversity-aware framework, prediction performance on untargeted properties can degrade by up to 40% relative to random sampling, whereas applying our framework yields improvements of up to 10%. For targeted properties, performance without diversity can degrade by up to 12.5% relative to random sampling, while our framework achieves gains of up to 25%. Incorporating diversity into dataset construction not only preserves informativeness for the targeted properties but also improves materials coverage for potential future objectives. As a result, the constructed datasets remain broadly informative across considered and unconsidered outcomes, providing unbiased, high-quality entries and mitigating cold-start limitations in subsequent modeling and discovery campaigns.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05104v1</guid>
      <category>cond-mat.mtrl-sci</category>
      <category>cs.AI</category>
      <category>cs.DB</category>
      <category>cs.LG</category>
      <category>stat.AP</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Rafael Espinosa Castañeda, Ashley Dale, Hongchen Wang, Yonatan Kurniawan, Hao Wan, Runze Zhang, Adji Bousso Dieng, Kangming Li, Jason Hattrick-Simpers</dc:creator>
    </item>
    <item>
      <title>A Factor-Graph Formulation of CSS Syndrome Decoding: Joint BP and Four-State BP</title>
      <link>https://arxiv.org/abs/2605.05132</link>
      <description>arXiv:2605.05132v1 Announce Type: cross 
Abstract: For CSS syndrome decoding, the two check matrices impose binary parity-check constraints on the two Pauli error components. The posterior can therefore be written as a binary factor graph with two Tanner graphs coupled by the local joint prior at each qubit. We call the sum-product algorithm on this factorization joint belief propagation (joint BP). Joint BP retains the local channel correlation between the two Pauli components. This note compares joint BP with the four-state Pauli-label factor graph used for four-state BP. The two algorithms are shown to have the same posterior weights, messages, and beliefs after relabeling the four local Pauli states and marginalizing the irrelevant binary component.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05132v1</guid>
      <category>quant-ph</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kenta Kasai</dc:creator>
    </item>
    <item>
      <title>MRI-Eval: A Tiered Benchmark for Evaluating LLM Performance on MRI Physics and GE Scanner Operations Knowledge</title>
      <link>https://arxiv.org/abs/2605.05175</link>
      <description>arXiv:2605.05175v1 Announce Type: cross 
Abstract: Background: Existing MRI LLM benchmarks rely mainly on review-book multiple-choice questions, where top proprietary models already score highly, limiting discrimination. No systematic benchmark has evaluated the vendor-specific scanner operational knowledge central to research MRI practice. Purpose: We developed MRI-Eval, a tiered benchmark for relative model comparison on MRI physics and GE scanner operations knowledge using primary multiple-choice questions (MCQ), with stem-only and primed diagnostic conditions as complementary analyses. Methods: MRI-Eval includes 1365 scored items across nine categories and three difficulty tiers, drawn from textbooks, GE scanner manuals, programming course materials, and expert-generated questions. Five models from four families were evaluated (GPT-5.4, Claude Opus 4.6, Claude Sonnet 4.6, Gemini 2.5 Pro, Llama 3.3 70B). MCQ was primary; stem-only removed options and used an independent LLM judge; primed stem-only tested responses to incorrect user claims. Results: Overall MCQ accuracy was 93.2% to 97.1%. GE scanner operations was the lowest category for every model (88.2% to 94.6%). In the stem-only condition, frontier-model accuracy fell to 58.4% to 61.1%, and Llama 3.3 70B fell to 37.1%; GE scanner operations stem-only accuracy was 13.8% to 29.8%. Conclusion: High MCQ performance can mask weak free-text recall, especially for vendor-specific operational knowledge. MRI-Eval is most informative as a relative comparison benchmark rather than an absolute competency measure, and its results counsel caution in using raw LLM outputs for GE-specific protocol guidance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05175v1</guid>
      <category>eess.IV</category>
      <category>cs.CL</category>
      <category>physics.med-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Perry E. Radau</dc:creator>
    </item>
    <item>
      <title>Numerical study of the 2D Kaup-Broer-Kuperschmidt Boussinesq system</title>
      <link>https://arxiv.org/abs/2605.05183</link>
      <description>arXiv:2605.05183v1 Announce Type: cross 
Abstract: In this work we consider the well-posed version of the Kaup-Broer-Kuperschmidt system in two dimensions. We numerically construct soliton-type solutions and show that they are unstable with respect to both dispersion and singularity formation. Further, we study line solitons and their stability, as well as general localised initial data. In all cases we fail to find stable structures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05183v1</guid>
      <category>math.AP</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Théo Gaudry, Christian Klein, Jean-Claude Saut, Nikola Stoilov</dc:creator>
    </item>
    <item>
      <title>Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval</title>
      <link>https://arxiv.org/abs/2605.05189</link>
      <description>arXiv:2605.05189v1 Announce Type: cross 
Abstract: How many key-value associations can a $d\times d$ linear memory store? We show that the answer depends not only on the $d^2$ degrees of freedom in the memory matrix, but also on the retrieval criterion. In an isotropic Gaussian model for the stored pairs, we show that top-1 retrieval, where every signal must beat its largest distractor, requires the logarithmic model-size scale $d^2\asymp n\log n$. We prove that the correlation matrix memory construction, which stores associations by superposing key-target outer products, achieves this scale through a sharp phase transition, and that the same scaling is necessary for any linear memory. Thus the logarithm is the intrinsic extreme-value price of winner-take-all decoding.
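  As a minimal numerical sketch of the correlation matrix memory and the top-1 criterion, assuming keys and targets with i.i.d. N(0, 1/d) entries (one reading of the isotropic Gaussian model) and hypothetical helper names, the code below stores the pairs as a sum of outer products and checks whether each stored target outscores every distractor. It illustrates the two regimes only; it does not reproduce the sharp threshold.

```python
import numpy as np


def correlation_matrix_memory(keys, targets):
    """Store n associations by superposing outer products: M = sum_i y_i x_i^T.

    keys, targets: arrays of shape (n, d) holding x_i and y_i as rows.
    """
    return targets.T @ keys


def top1_accuracy(keys, targets):
    """Fraction of keys whose own target beats every distractor target."""
    memory = correlation_matrix_memory(keys, targets)
    # scores[j, i] = y_j^T M x_i: score of candidate target j for query key i
    scores = targets @ memory @ keys.T
    winners = np.argmax(scores, axis=0)
    return float(np.mean(winners == np.arange(keys.shape[0])))


def isotropic_pairs(n, d, seed=0):
    """Draw n key-target pairs with i.i.d. N(0, 1/d) entries (assumed model)."""
    rng = np.random.default_rng(seed)
    keys = rng.standard_normal((n, d)) / np.sqrt(d)
    targets = rng.standard_normal((n, d)) / np.sqrt(d)
    return keys, targets


if __name__ == "__main__":
    # A lightly loaded memory versus one with far more pairs than d^2 / log n.
    for n, d in [(10, 256), (200, 8)]:
        keys, targets = isotropic_pairs(n, d)
        print(f"n={n}, d={d}: top-1 accuracy {top1_accuracy(keys, targets):.2f}")
```

  With n = 10 and d = 256 essentially every key retrieves its own target, while with n = 200 and d = 8 the largest distractor usually wins, which is the winner-take-all failure the logarithmic scale quantifies.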
  We next consider listwise retrieval, where the correct target need not be the unique top-scoring item but should remain among the strongest candidates. To formalize this regime, we propose the Tail-Average Margin (TAM), a convex upper-tail criterion that certifies inclusion of the correct target in a controlled candidate list. Under this listwise retrieval criterion, the capacity follows the quadratic scale $d^2\asymp n$. At load $n/d^2\to\alpha$, we develop an exact asymptotic theory for the TAM empirical-risk minimizer through a two-parameter scalar variational principle. The theory has a rich phenomenology: in the ridgeless limit it yields a closed-form critical load separating satisfiable and unsatisfiable phases, and it predicts the limiting laws of true scores, competitor scores, margins, and percentile profiles. Finally, a small-tail extrapolation further leads to the conjectural sharp top-1 threshold $d^2\sim 2n\log n$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05189v1</guid>
      <category>stat.ML</category>
      <category>cs.IT</category>
      <category>cs.LG</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nicholas Barnfield, Juno Kim, Eshaan Nichani, Jason D. Lee, Yue M. Lu</dc:creator>
    </item>
    <item>
      <title>Almost-Orthogonality in Lp Spaces: A Case Study with Grok</title>
      <link>https://arxiv.org/abs/2605.05192</link>
      <description>arXiv:2605.05192v1 Announce Type: cross 
Abstract: Carbery proposed the following sharpened form of the triangle inequality for sums of many functions: for any $p\ge 2$ and any finite sequence $(f_j)_j\subset L^p$ we have \[ \Big\|\sum_j f_j\Big\|_p \ \le\ \left(\sup_{j} \sum_{k} \alpha_{jk}^{\,c}\right)^{1/p'} \Big(\sum_j \|f_j\|_p^p\Big)^{1/p}, \] where $c=2$, $1/p+1/p'=1$, and $\alpha_{jk}=\sqrt{\frac{\|f_{j}f_{k}\|_{p/2}}{\|f_{j}\|_{p}\|f_{k}\|_{p}}}$. In the first part of this paper we construct a counterexample showing that this inequality fails for every $p&gt;2$. We then prove that if an estimate of the above form holds, the exponent must satisfy $c\le p'$. Finally, at the critical exponent $c=p'$, we establish the inequality for all integer values $p\ge 2$.
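  To make the quantities concrete, the following sketch evaluates both sides of the inequality for functions on a finite set with counting measure (so each $f_j$ is a nonzero vector). The helper names are hypothetical, and the code only computes the two sides; it does not reproduce the counterexample. Two extreme cases give equality: disjointly supported $f_j$ (every off-diagonal $\alpha_{jk}$ vanishes) and identical $f_j$ (every $\alpha_{jk}$ equals 1).

```python
import numpy as np


def lp_norm(f, p):
    """L^p norm of a vector, i.e. of a function on a finite set with counting measure."""
    return float(np.sum(np.abs(f) ** p) ** (1.0 / p))


def carbery_sides(fs, p, c):
    """Return (lhs, rhs) of the proposed inequality for nonzero vectors fs.

    alpha_jk = sqrt(||f_j f_k||_{p/2} / (||f_j||_p ||f_k||_p)), with 1/p + 1/p' = 1.
    """
    fs = [np.asarray(f, dtype=float) for f in fs]
    p_conj = p / (p - 1.0)
    norms = np.array([lp_norm(f, p) for f in fs])
    m = len(fs)
    alpha = np.empty((m, m))
    for j in range(m):
        for k in range(m):
            ratio = lp_norm(fs[j] * fs[k], p / 2.0) / (norms[j] * norms[k])
            alpha[j, k] = np.sqrt(ratio)
    lhs = lp_norm(np.sum(fs, axis=0), p)
    row_sums = np.sum(alpha ** c, axis=1)  # sum over k, one value per j
    rhs = float(np.max(row_sums) ** (1.0 / p_conj) * np.sum(norms ** p) ** (1.0 / p))
    return lhs, rhs


if __name__ == "__main__":
    p = 4.0
    c = p / (p - 1.0)  # critical exponent c = p'
    print(carbery_sides([[1, 0, 0], [0, 2, 0], [0, 0, 3]], p, c))  # disjoint supports
    f = [1.0, -2.0, 0.5]
    print(carbery_sides([f, f], p, c))  # identical functions
```

  Comparing the two sides on such vectors is a quick sanity check for candidate exponents, though a numerical search is of course no substitute for the proofs described above.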
  In the second part of the paper we obtain a sharp three-function bound \[ \Big\|\sum_{j=1}^{3} f_j\Big\|_p \ \le\ \left(1+2\Gamma^{c(p)}\right)^{1/p'} \Big(\sum_{j=1}^{3} \|f_j\|_p^p\Big)^{1/p}, \] where $p \geq 3$, $c(p) = \frac{2\ln(2)}{(p-2)\ln(3)+2\ln(2)}$ and $\Gamma=\Gamma(f_1,f_2,f_3)\in[0,1]$ quantifies the degree of orthogonality among $f_1,f_2,f_3$. The exponent $c(p)$ is optimal, and improves upon the power $r(p) = \frac{6}{5p-4}$ obtained previously by Carlen, Frank, and Lieb. Some intermediate lemmas and inequalities appearing in this work were explored with the assistance of the large language model Grok.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05192v1</guid>
      <category>math.CA</category>
      <category>cs.AI</category>
      <category>math.CO</category>
      <category>math.PR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ziang Chen, Jaume de Dios Pont, Paata Ivanisvili, Jose Madrid, Haozhu Wang</dc:creator>
    </item>
    <item>
      <title>Grokability in five inequalities</title>
      <link>https://arxiv.org/abs/2605.05193</link>
      <description>arXiv:2605.05193v1 Announce Type: cross 
Abstract: In this note, we report five mathematical discoveries made in collaboration with Grok, all of which have been subsequently verified by the authors. These include an improved lower bound on the maximal Gaussian perimeter of convex sets in $\mathbb{R}^n$, sharper $L_2$-$L_1$ moment comparison inequalities on the Hamming cube $\{-1,1\}^n$, a strengthened autoconvolution inequality, improved asymptotic bounds on the size of the largest $g$-Sidon sets in $\{1,\dots,n\}$, and an optimal balanced version of Szarek's inequality.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05193v1</guid>
      <category>math.PR</category>
      <category>cs.AI</category>
      <category>math.AP</category>
      <category>math.CA</category>
      <category>math.FA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Paata Ivanisvili, Xinyuan Xie</dc:creator>
    </item>
    <item>
      <title>S-LCG: Structured Linear Congruential Generator-Based Deterministic Algorithm for Search and Optimization</title>
      <link>https://arxiv.org/abs/2605.05198</link>
      <description>arXiv:2605.05198v1 Announce Type: cross 
Abstract: This study presents a novel deterministic optimization algorithm based on a special variant of the Linear Congruential Generator (LCG). While conventional algorithms generally operate directly within the search space, the introduced technique follows a two-level architecture: an external loop adaptively balances exploration and exploitation, while an internal loop evaluates solutions. Because the method is motivated by the intrinsic structure of the generator, we name it the Structured Linear Congruential Generator (S-LCG). It has several distinctive characteristics: 1) a memoryless scheme, which ensures non-overlapping sequences for distinct seeds and thus no evaluation redundancy; 2) a bit-splitting representation, which converts LCG states into multi-dimensional points to overcome the Marsaglia lattice effect; 3) adaptive exploration-exploitation of the generator space, which leads to implicit optimization of a smooth surrogate of the objective function; and 4) a constant information-gathering speed, which avoids premature convergence. Extensive testing on 26 benchmark functions across dimensions d = 2 to 30 shows that S-LCG comes within 1% of the global optimum in 83.3% of 138 cases (100% at d = 2, 81.2% at d = 30), while the nearest competitor, GA, achieved 75.4%. Statistical validation shows that S-LCG outperforms eight cutting-edge binary algorithms. Furthermore, its practical value is confirmed by validation on three constrained engineering design problems. Overall, S-LCG offers an optimization framework that is strictly reproducible and requires tuning only one sensitive parameter.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05198v1</guid>
      <category>math.OC</category>
      <category>cs.NE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ahmed Qasim Mohammed, Haider Banka, Anamika Singh</dc:creator>
    </item>
    <item>
      <title>Syntax and Semantics of Linear Dependent Types</title>
      <link>https://arxiv.org/abs/1405.0033</link>
      <description>arXiv:1405.0033v5 Announce Type: replace 
Abstract: A type theory is presented that combines (intuitionistic) linear types with type dependency, thus properly generalising both intuitionistic dependent type theory and full linear logic. A syntax and complete categorical semantics are developed, the latter in terms of (strict) indexed symmetric monoidal categories with comprehension. Various optional type formers are treated in a modular way. In particular, we will see that the historically much-debated multiplicative quantifiers and identity types arise naturally from categorical considerations. These new multiplicative connectives are further characterised by several identities relating them to the usual connectives from dependent type theory and linear logic. Finally, one important class of models, given by families with values in some symmetric monoidal category, is investigated in detail.</description>
      <guid isPermaLink="false">oai:arXiv.org:1405.0033v5</guid>
      <category>cs.LO</category>
      <category>cs.PL</category>
      <category>math.CT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Matthijs Vákár</dc:creator>
    </item>
    <item>
      <title>A Categorical Semantics for Linear Logical Frameworks</title>
      <link>https://arxiv.org/abs/1501.05016</link>
      <description>arXiv:1501.05016v2 Announce Type: replace 
Abstract: A type theory is presented that combines (intuitionistic) linear types with type dependency, thus properly generalising both intuitionistic dependent type theory and full linear logic. A syntax and complete categorical semantics are developed, the latter in terms of (strict) indexed symmetric monoidal categories with comprehension. Various optional type formers are treated in a modular way. In particular, we see that the historically much-debated multiplicative quantifiers and identity types arise naturally from categorical considerations. These new multiplicative connectives are further characterised by several identities relating them to the usual connectives from dependent type theory and linear logic. Finally, one important class of models, given by families with values in some symmetric monoidal category, is investigated in detail.</description>
      <guid isPermaLink="false">oai:arXiv.org:1501.05016v2</guid>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Matthijs Vákár</dc:creator>
    </item>
    <item>
      <title>A block Recycled GMRES method with investigations into aspects of solver performance</title>
      <link>https://arxiv.org/abs/1604.01713</link>
      <description>arXiv:1604.01713v3 Announce Type: replace 
Abstract: We propose a block Krylov subspace version of the GCRO-DR method proposed in [Parks et al.; SISC 2005], which is an iterative method allowing for the efficient minimization of the residual over an augmented Krylov subspace. We offer a clean derivation of our proposed method and discuss strategies for selecting recycling subspaces at restart, as well as implementation decisions in the context of high-performance computing. Numerical experiments are split into those demonstrating convergence properties and those demonstrating the data movement and cache efficiencies of the dominant operations of the method, measured using processor monitoring code from Intel.</description>
      <guid isPermaLink="false">oai:arXiv.org:1604.01713v3</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Michael L. Parks, Kirk M. Soodhalter, Daniel B. Szyld</dc:creator>
    </item>
    <item>
      <title>Deterministic Mincut in Almost-Linear Time</title>
      <link>https://arxiv.org/abs/2106.05513</link>
      <description>arXiv:2106.05513v2 Announce Type: replace 
Abstract: We present a deterministic (global) mincut algorithm for weighted, undirected graphs that runs in $m^{1+o(1)}$ time, answering an open question of Karger from the 1990s. To obtain our result, we de-randomize the construction of the \emph{skeleton} graph in Karger's near-linear time mincut algorithm, which is its only randomized component. In particular, we partially de-randomize the well-known Benczur-Karger graph sparsification technique by random sampling, which we accomplish by the method of pessimistic estimators. Our main technical component is designing an efficient pessimistic estimator to capture the cuts of a graph, which involves harnessing the expander decomposition framework introduced in recent work by Goranci et al. (SODA 2021). As a side-effect, we obtain a structural representation of all approximate mincuts in a graph, which may have future applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2106.05513v2</guid>
      <category>cs.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jason Li</dc:creator>
    </item>
    <item>
      <title>Quantifying Harm</title>
      <link>https://arxiv.org/abs/2209.15111</link>
      <description>arXiv:2209.15111v3 Announce Type: replace 
Abstract: In earlier work we defined a qualitative notion of harm: either harm is caused, or it is not. For practical applications, we often need to quantify harm; for example, we may want to choose the least harmful of a set of possible interventions. In this work, which is an expanded version of an earlier conference paper, we develop a quantitative notion of harm. We first present a quantitative definition of harm in a deterministic context involving a single individual, then we consider the issues involved in dealing with uncertainty regarding the context and going from a notion of harm for a single individual to a notion of "societal harm", which involves aggregating the harm to individuals. We show that the "obvious" way of doing this (just taking the expected harm for an individual and then summing the expected harm over all individuals) can lead to counterintuitive or inappropriate answers, and discuss alternatives, drawing on work from the decision-theory literature. Finally, we connect our work to a recent debate over harm within the context of precision medicine.</description>
      <guid isPermaLink="false">oai:arXiv.org:2209.15111v3</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sander Beckers, Hana Chockler, Joseph Y. Halpern</dc:creator>
    </item>
    <item>
      <title>Algebraic Semantics of Datalog with Equality</title>
      <link>https://arxiv.org/abs/2302.03167</link>
      <description>arXiv:2302.03167v3 Announce Type: replace 
Abstract: We discuss the syntax and semantics of relational Horn logic (RHL) and partial Horn logic (PHL). RHL is an extension of the Datalog programming language that allows introducing and equating variables in conclusions. PHL is a syntactic extension of RHL by partial functions and one of the many equivalent notions of essentially algebraic theory.
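  As a toy illustration of a rule whose conclusion equates variables, here is a naive bottom-up fixpoint in which equalities are tracked by a union-find structure and facts are rewritten to canonical representatives after each merge. This is an assumption-laden sketch, not the Eqlog implementation: the relation name le and its two rules (transitivity, and antisymmetry concluding x = y) are illustrative, and conclusions that introduce new elements are omitted.

```python
class UnionFind:
    """Equivalence classes of elements; find() returns a canonical representative."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the classes of a and b; return True if they were distinct."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def saturate(le_facts):
    """Naive fixpoint for two rules over a binary relation le:
    transitivity  le(x, y), le(y, z) |- le(x, z)
    antisymmetry  le(x, y), le(y, x) |- x = y   (a conclusion that equates variables)
    """
    uf = UnionFind()
    le = set(le_facts)
    changed = True
    while changed:
        changed = False
        # Rewrite every fact in terms of canonical representatives.
        le = {(uf.find(x), uf.find(y)) for x, y in le}
        derived = {(x, z) for x, y in le for y2, z in le if y == y2}
        if not derived.issubset(le):
            le |= derived
            changed = True
        for x, y in list(le):
            if (y, x) in le and uf.union(x, y):
                changed = True
    return uf, le


if __name__ == "__main__":
    uf, le = saturate([("a", "b"), ("b", "c"), ("c", "a")])
    print({v: uf.find(v) for v in "abc"}, le)  # a cycle collapses to one element
```

  On the cyclic input every element ends in one equivalence class, so the saturated relation is a single reflexive fact; on an acyclic chain the loop only adds transitive facts.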
  Our main contribution is a new construction of free models. We associate to RHL and PHL sequents classifying morphisms, which enable us to characterize logical satisfaction using lifting properties. We then obtain free and weakly free models using the small object argument. The small object argument can be understood as an abstract generalization of Datalog evaluation. It underpins the implementation of the Eqlog Datalog engine, which computes free models of PHL theories.</description>
      <guid isPermaLink="false">oai:arXiv.org:2302.03167v3</guid>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Martin E. Bidlingmaier</dc:creator>
    </item>
    <item>
      <title>RealLiFe: Real-Time Light Field Reconstruction via Hierarchical Sparse Gradient Descent</title>
      <link>https://arxiv.org/abs/2307.03017</link>
      <description>arXiv:2307.03017v5 Announce Type: replace 
Abstract: With the rise of Extended Reality (XR) technology, there is a growing need for real-time light field reconstruction from sparse view inputs. Existing methods can be classified into offline techniques, which can generate high-quality novel views but at the cost of long inference/training time, and online methods, which either lack generalizability or produce unsatisfactory results. However, we have observed that the intrinsic sparse manifold of Multi-plane Images (MPI) enables a significant acceleration of light field reconstruction while maintaining rendering quality. Based on this insight, we introduce \textbf{RealLiFe}, a novel light field optimization method, which leverages the proposed Hierarchical Sparse Gradient Descent (HSGD) to produce high-quality light fields from sparse input images in real time. Technically, a coarse MPI of the scene is first generated using a 3D CNN and then refined in a few iterations using only the sparse MPI gradients aligned with the scene content. Extensive experiments demonstrate that our method achieves comparable visual quality while being 100x faster on average than state-of-the-art offline methods, and delivers better performance (about 2 dB higher in PSNR) than other online approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2307.03017v5</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yijie Deng, Lei Han, Tianpeng Lin, Lin Li, Jinzhi Zhang, Lu Fang</dc:creator>
    </item>
    <item>
      <title>Three Hardness Results for Graph Similarity Problems</title>
      <link>https://arxiv.org/abs/2309.03810</link>
      <description>arXiv:2309.03810v2 Announce Type: replace 
Abstract: Notions of graph similarity provide an alternative perspective on the graph isomorphism problem, and vice versa. In this paper, we consider measures of similarity arising from mismatch norms as studied in Gervens and Grohe: the edit distance $\delta_{\mathcal{E}}$, and the metrics arising from $\ell_p$-operator norms, which we denote by $\delta_p$ and $\delta_{|p|}$. We address the following question: can these measures of similarity be used to design polynomial-time approximation algorithms for graph isomorphism? We show that computing an optimal value of $\delta_{\mathcal{E}}$ is \NP-hard on pairs of graphs with the same number of edges. In addition, we show that computing optimal values of $\delta_p$ and $\delta_{|p|}$ is \NP-hard even on pairs of $1$-planar graphs with the same degree sequence and bounded degree. These two results improve on previously known ones, which did not examine the restricted case where the pairs of graphs are required to have the same number of edges.
  Finally, we study similarity problems on strongly regular graphs and prove some near optimal inequalities with interesting consequences on the computational complexity of graph and group isomorphism.</description>
      <guid isPermaLink="false">oai:arXiv.org:2309.03810v2</guid>
      <category>cs.DM</category>
      <category>cs.CC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>He Sun, Danny Vagnozzi</dc:creator>
    </item>
    <item>
      <title>MalPurifier: Enhancing Android Malware Detection with Adversarial Purification against Evasion Attacks</title>
      <link>https://arxiv.org/abs/2312.06423</link>
      <description>arXiv:2312.06423v3 Announce Type: replace 
Abstract: Machine learning (ML) has gained significant adoption in Android malware detection to address the escalating threats posed by the rapid proliferation of malware attacks. However, recent studies have revealed the inherent vulnerabilities of ML-based detection systems to evasion attacks. While efforts have been made to address this critical issue, many of the existing defensive methods encounter challenges such as lower effectiveness or reduced generalization capabilities. In this paper, we introduce MalPurifier, a novel adversarial purification framework specifically engineered for Android malware detection. Specifically, MalPurifier integrates three key innovations: a diversified adversarial perturbation mechanism for robustness and generalizability, a protective noise injection strategy for benign data integrity, and a Denoising AutoEncoder (DAE) with a dual-objective loss for accurate purification and classification. Extensive experiments on two large-scale datasets demonstrate that MalPurifier significantly outperforms state-of-the-art defenses. It robustly defends against a comprehensive set of 37 perturbation-based evasion attacks, consistently achieving robust accuracies above 90.91%. As a lightweight, model-agnostic, and plug-and-play module, MalPurifier offers a practical and effective solution to bolster the security of ML-based Android malware detectors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2312.06423v3</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuyang Zhou, Guang Cheng, Zongyao Chen, Shui Yu</dc:creator>
    </item>
    <item>
      <title>Inevitability of Polarization in Geometric Opinion Exchange</title>
      <link>https://arxiv.org/abs/2402.08446</link>
      <description>arXiv:2402.08446v2 Announce Type: replace 
Abstract: Polarization and unexpected correlations between opinions on diverse topics (including in politics, culture and consumer choices) are an object of sustained attention. However, numerous theoretical models do not seem to convincingly explain these phenomena.
  This paper is motivated by a recent line of work, studying models where polarization can be explained in terms of biased assimilation and geometric interplay between opinions on various topics. The agent opinions are represented as unit vectors on a multidimensional sphere and updated according to geometric rules. In contrast to previous work, we focus on the classical opinion exchange setting, where the agents update their opinions in discrete time steps, with a pair of agents interacting randomly at every step. The opinions are updated according to an update rule belonging to a general class.
  Our findings are twofold. First, polarization appears to be ubiquitous in the class of models we study, requiring only relatively modest assumptions reflecting biased assimilation. Second, there is a qualitative difference between two-dimensional dynamics on the one hand, and three or more dimensions on the other. Accordingly, we prove almost sure polarization for a large class of update rules in two dimensions. Then, we prove polarization in three and more dimensions in more limited cases and try to shed light on central difficulties that are absent in two dimensions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2402.08446v2</guid>
      <category>cs.SI</category>
      <category>econ.TH</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Abdou Majeed Alidou, J\'ulia Balig\'acs, Max Hahn-Klimroth, Jan H\k{a}z{\l}a, Lukas Hintze, Olga Scheftelowitsch</dc:creator>
    </item>
    <item>
      <title>Aggressive or Imperceptible, or Both: Network Pruning Assisted Hybrid Byzantines in Federated Learning</title>
      <link>https://arxiv.org/abs/2404.06230</link>
      <description>arXiv:2404.06230v3 Announce Type: replace 
Abstract: In federated learning (FL), profiling and verifying each client is inherently difficult, which introduces a significant security vulnerability: malicious clients, commonly referred to as Byzantines, can degrade the accuracy of the global model by submitting poisoned updates during training. To mitigate this, the aggregation process at the parameter server must be robust against such adversarial behaviour. Most existing defences approach the Byzantine problem from an outlier detection perspective, treating malicious updates as statistical anomalies and ignoring the internal structure of the trained neural network (NN). Motivated by this gap, we highlight the potential of leveraging side information tied to the NN architecture to design stronger, more targeted attacks. In particular, inspired by insights from sparse NNs, we introduce a hybrid sparse Byzantine attack. The attack consists of two coordinated components: (i) a sparse attack component that selectively manipulates parameters with higher sensitivity in the NN, aiming to cause maximum disruption with minimal visibility; and (ii) a slow-accumulating attack component that silently poisons parameters over multiple rounds to evade detection. Together, these components create a strong but imperceptible attack strategy that can bypass common defences. We evaluate the proposed attack through extensive simulations and demonstrate its effectiveness against eight state-of-the-art defence mechanisms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2404.06230v3</guid>
      <category>cs.LG</category>
      <category>cs.CR</category>
      <category>cs.DC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Emre Ozfatura, Kerem Ozfatura, Baturalp Buyukates, Mert Coskuner, Alptekin Kupcu, Deniz Gunduz</dc:creator>
    </item>
    <item>
      <title>Automated versus Human Engagement: Mapping Cognitive Bias Triggers in Online Discourse</title>
      <link>https://arxiv.org/abs/2406.07293</link>
      <description>arXiv:2406.07293v2 Announce Type: replace 
Abstract: In the digital environment, human attention is frequently guided by cognitive heuristics rather than deliberate evaluation. Since low-credibility narratives often lack substantive factual evidence, their diffusion relies disproportionately on activating these mental shortcuts to simulate credibility and capture attention. This study presents a computational framework that detects triggers for eight distinct cognitive biases through observable data proxies, applied to 3.5 million posts about contested COVID-19 narratives. We demonstrate that automated accounts (bots) embed these triggers more frequently than human users, yielding distinctly source-dependent associations with audience interaction. In bot-authored posts, affective and cognitive dissonance (stance-shifting) triggers are strongly associated with higher engagement, while the deployment of authority and availability (repetition) cues correlates with reduced audience interaction. Furthermore, we identify limits to heuristic compounding: positive engagement correlations for bot-authored content decline when multiple biases are stacked within a single post, whereas human-authored communication remains structurally resilient to high trigger density. By operationalizing psychological heuristics into scalable, measurable data, this work bridges computational social science and cognitive psychology to reveal how source identity (bot/human) shapes the mechanics of information diffusion in digital networks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2406.07293v2</guid>
      <category>cs.SI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Lynnette Hui Xian Ng, Wenqi Zhou, Kathleen M. Carley</dc:creator>
    </item>
    <item>
      <title>From AoI to QVAoI: Query-Based Semantics-Aware Scheduling for Energy-Harvesting IoT Systems</title>
      <link>https://arxiv.org/abs/2407.08587</link>
      <description>arXiv:2407.08587v3 Announce Type: replace 
Abstract: In this work, we study the freshness and significance of information in an IoT status update system in which an Energy Harvesting (EH) device samples an information source and forwards update packets to a destination node via a direct channel. We introduce and optimize a semantics-aware metric, Query Version Age of Information (QVAoI), alongside other metrics: Query Age of Information (QAoI), Version Age of Information (VAoI), and Age of Information (AoI). We formulate the optimization problem as a Markov Decision Process to determine the optimal transmission policy at the device, which decides the time slots for transmitting updates, subject to the device's battery energy limitations and the energy arrivals. Furthermore, we derive closed-form expressions for the average update rate and the QVAoI for a unit-capacity battery, serving as analytical benchmarks. We compare the performance of QVAoI-Optimal, QAoI-Optimal, VAoI-Optimal, and AoI-Optimal policies with a baseline greedy policy. All semantics-aware policies achieve better performance than the greedy policy. The QVAoI-Optimal policy, in particular, demonstrates a significant performance improvement, either by providing fresher, more relevant, and more valuable updates with the same energy arrivals or by reducing the number of transmissions in the system while maintaining the same level of freshness and information significance as the QAoI-Optimal and other policies.</description>
      <guid isPermaLink="false">oai:arXiv.org:2407.08587v3</guid>
      <category>cs.IT</category>
      <category>cs.NI</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Erfan Delfani, Nikolaos Pappas</dc:creator>
    </item>
    <item>
      <title>Data Augmentation of Contrastive Learning is Estimating Positive-incentive Noise</title>
      <link>https://arxiv.org/abs/2408.09929</link>
      <description>arXiv:2408.09929v2 Announce Type: replace 
Abstract: Inspired by the idea of Positive-incentive Noise (Pi-Noise or $\pi$-Noise), which aims to learn reliable noise that is beneficial to tasks, we investigate the connection between contrastive learning and $\pi$-noise in this paper. By converting the contrastive loss to an auxiliary Gaussian distribution that quantitatively measures the difficulty of a specific contrastive model within an information-theoretic framework, we properly define the task entropy of contrastive learning, the core concept of $\pi$-noise. We further prove that the predefined data augmentation in the standard contrastive learning paradigm can be regarded as a kind of point estimation of $\pi$-noise. Inspired by this theoretical study, we propose a framework that develops a $\pi$-noise generator to learn the beneficial noise (instead of estimating it) as data augmentations for contrast. The designed framework can be applied to diverse types of data and is fully compatible with existing contrastive models. Surprisingly, visualizations show that the proposed method successfully learns effective augmentations. Our code is available at https://github.com/hyzhang98/PiNDA.</description>
      <guid isPermaLink="false">oai:arXiv.org:2408.09929v2</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hongyuan Zhang, Yanchen Xu, Sida Huang, Xuelong Li</dc:creator>
    </item>
    <item>
      <title>Quantum-inspired Reinforcement Learning for Synthesizable Drug Design</title>
      <link>https://arxiv.org/abs/2409.09183</link>
      <description>arXiv:2409.09183v2 Announce Type: replace 
Abstract: Synthesizable molecular design (also known as synthesizable molecular optimization) is a fundamental problem in drug discovery, and involves designing novel molecular structures to improve their properties according to drug-relevant oracle functions (i.e., objectives) while ensuring synthetic feasibility. However, existing methods are mostly based on random search. To address this issue, in this paper, we introduce a novel approach that uses reinforcement learning with a quantum-inspired simulated annealing policy neural network to navigate the vast discrete space of chemical structures intelligently. Specifically, we employ a deterministic REINFORCE algorithm in which policy neural networks output transition probabilities to guide state transitions, and a genetic-algorithm-based local search refines solutions to a local optimum within each iteration. Our methods are evaluated with the Practical Molecular Optimization (PMO) benchmark framework with a 10K query budget. We further showcase the competitive performance of our method by comparing it against the state-of-the-art genetic-algorithm-based method.</description>
      <guid isPermaLink="false">oai:arXiv.org:2409.09183v2</guid>
      <category>cs.LG</category>
      <category>q-bio.BM</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dannong Wang, Jintai Chen, Yingzhou Lu, Minjie Shen, Lulu Chen, Zhiding Liang, Tianfan Fu, Xiao-Yang Liu</dc:creator>
    </item>
    <item>
      <title>Subjective and Objective Quality-of-Experience Evaluation Study for Live Video Streaming</title>
      <link>https://arxiv.org/abs/2409.17596</link>
      <description>arXiv:2409.17596v3 Announce Type: replace 
Abstract: In recent years, live video streaming has gained widespread popularity across various social media platforms. Quality of experience (QoE), which reflects end-users' satisfaction and overall experience, plays a critical role in helping media service providers optimize large-scale live compression and transmission strategies to achieve a perceptually optimal rate-distortion trade-off. Although many QoE metrics for video-on-demand (VoD) have been proposed, there remain significant challenges in developing QoE metrics for live video streaming. To bridge this gap, we conduct a comprehensive study of subjective and objective QoE evaluations for live video streaming. For the subjective QoE study, we introduce the first live video streaming QoE dataset, TaoLive QoE, which consists of $42$ source videos collected from real live broadcasts and $1,155$ corresponding distorted versions degraded by a variety of streaming distortions, including conventional streaming distortions such as compression and stalling, as well as live streaming-specific distortions such as frame skipping and variable frame rate. Subsequently, a human study was conducted to derive subjective QoE scores for the videos in the TaoLive QoE dataset. For the objective QoE study, we benchmark existing QoE models on the TaoLive QoE dataset as well as on publicly available QoE datasets for VoD scenarios, showing that current models struggle to accurately assess video QoE, particularly for live content. Hence, we propose an end-to-end QoE evaluation model, Tao-QoE, which integrates multi-scale semantic features and optical flow-based motion features to predict a retrospective QoE score, eliminating reliance on statistical quality of service (QoS) features.</description>
      <guid isPermaLink="false">oai:arXiv.org:2409.17596v3</guid>
      <category>cs.MM</category>
      <category>cs.AI</category>
      <category>eess.IV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <arxiv:DOI>10.1109/TCSVT.2026.3689180</arxiv:DOI>
      <dc:creator>Zehao Zhu, Wei Sun, Jun Jia, Wei Wu, Sibin Deng, Kai Li, Ying Chen, Xiongkuo Min, Jia Wang, Guangtao Zhai</dc:creator>
    </item>
    <item>
      <title>A Unified FPT Framework for Crossing Number Problems</title>
      <link>https://arxiv.org/abs/2410.00206</link>
      <description>arXiv:2410.00206v4 Announce Type: replace 
Abstract: The basic (and traditional) crossing number problem is to determine the minimum number of crossings in a topological drawing of an input graph in the plane. We develop a unified framework yielding fixed-parameter tractable (FPT) algorithms for many generalized crossing number problems.
  Our framework takes the following form. We fix a surface S and a class D of "allowed" topological drawings of graphs in S (e.g., some class of drawings with at most t crossings). We assume that testing membership in D can be done algorithmically, and that restricting a drawing in D, extending it without adding any crossing, or transforming it with a self-homeomorphism of S yields a drawing that is also in D. Then deciding whether an input graph G has a drawing in D, and computing one if it is the case, is fixed-parameter tractable in (essentially) the genus of S and the maximum number of crossings in a drawing in D. More generally, we may take as input an edge-colored graph and distinguish crossings by the colors of the involved edges; and we may allow a bounded number of edge removals and vertex splits on G before drawing it. The proof is a reduction to the embeddability of a graph on a two-dimensional simplicial complex.
  This implies, in a unified way, linear or quadratic FPT algorithms for many topological crossing number variants established in the graph drawing community. Some of these variants already had previously published FPT algorithms, mostly relying on Courcelle's metatheorem, but our algorithms have a better runtime. Moreover, our framework extends to these crossing number variants in any fixed surface, and also allows us to fix the rotation system of the drawing of a graph in some variants.</description>
      <guid isPermaLink="false">oai:arXiv.org:2410.00206v4</guid>
      <category>cs.CG</category>
      <category>math.CO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>\'Eric Colin de Verdi\`ere, Petr Hlin\v{e}n\'y</dc:creator>
    </item>
    <item>
      <title>Dataset-Driven Channel Masks in Transformers for Multivariate Time Series</title>
      <link>https://arxiv.org/abs/2410.23222</link>
      <description>arXiv:2410.23222v3 Announce Type: replace 
Abstract: Recent advancements in foundation models have been successfully extended to the time series (TS) domain, facilitated by the emergence of large-scale TS datasets. Capturing channel dependency (CD) is essential for modeling multivariate TS, and attention-based methods have been widely employed for this purpose. Nonetheless, these methods primarily focus on modifying the architecture, often neglecting the importance of dataset-specific characteristics. In this work, we introduce the concept of partial channel dependence (PCD) to enhance CD modeling in Transformer-based models by leveraging dataset-specific information to refine the CD captured by the model. To achieve PCD, we propose channel masks (CMs), which are integrated into the attention matrices of Transformers via element-wise multiplication. CMs consist of two components: 1) a similarity matrix that captures relationships between the channels, and 2) dataset-specific and learnable domain parameters that refine the similarity matrix. We validate the effectiveness of PCD across diverse tasks and datasets with various backbones. Code is available at this repository: https://github.com/YonseiML/pcd.</description>
      <guid isPermaLink="false">oai:arXiv.org:2410.23222v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Seunghan Lee, Taeyoung Park, Kibok Lee</dc:creator>
    </item>
    <item>
      <title>ModelPredictiveControl.jl: advanced process control made easy in Julia</title>
      <link>https://arxiv.org/abs/2411.09764</link>
      <description>arXiv:2411.09764v4 Announce Type: replace 
Abstract: Proprietary closed-source software is still the norm in advanced process control. Transparency and reproducibility are key aspects of scientific research. Free and open-source toolkits can contribute to the development, sharing and advancement of new and efficient control approaches, and the industrial sector will certainly benefit from them. This paper presents ModelPredictiveControl.jl, an open-source software package for designing model predictive controllers in the Julia programming language. It is designed to be easy to use and modular, while providing advanced features like nonlinear control and moving horizon estimation. It relies on powerful control system, mathematical optimization and automatic differentiation frameworks to simplify the construction and testing of state estimators and predictive controllers. It also integrates with the standard plotting library to quickly visualize closed-loop data. The paper presents the main functionalities and illustrates them with two case studies in simulation. The first example is a continuously stirred tank reactor described by linear dynamics. The second one implements nonlinear, economic, and successive-linearization model predictive controllers for an inverted pendulum. The solving times are benchmarked against equivalent implementations in MATLAB to show the efficiency of the package.</description>
      <guid isPermaLink="false">oai:arXiv.org:2411.09764v4</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Francis Gagnon, Alex Thivierge, Andr\'e Desbiens, Fredrik Bagge Carlson</dc:creator>
    </item>
    <item>
      <title>Fast Switching in Mixed-Integer Model Predictive Control</title>
      <link>https://arxiv.org/abs/2411.19300</link>
      <description>arXiv:2411.19300v5 Announce Type: replace 
Abstract: We deduce stability results for finite control set and mixed-integer model predictive control with a downstream oversampling phase. The presentation rests upon the inherent robustness of model predictive control with stabilizing terminal conditions and on techniques for solving mixed-integer optimal control problems by continuous optimization. Partial outer convexification and binary relaxation transform mixed-integer problems into common optimal control problems. We deduce nominal asymptotic stability for the resulting relaxed system formulation and implement sum-up rounding to efficiently restore integer feasibility on an oversampling time grid. If fast control switching is technically possible and inexpensive, we can approximate the relaxed system behavior in the state space arbitrarily closely. We integrate input-perturbed model predictive control with practical asymptotic stability. Numerical experiments illustrate the practical relevance of fast control switching.</description>
      <guid isPermaLink="false">oai:arXiv.org:2411.19300v5</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Artemi Makarow, Christian Kirches</dc:creator>
    </item>
    <item>
      <title>RoDyGS: Robust Dynamic Gaussian Splatting for Casual Videos</title>
      <link>https://arxiv.org/abs/2412.03077</link>
      <description>arXiv:2412.03077v2 Announce Type: replace 
Abstract: 4D reconstruction from casually captured monocular videos is challenging due to the inherent ambiguity in reconstructing dynamic 3D geometry. To address this challenge, we introduce Robust Dynamic Gaussian Splatting (RoDyGS), a method that reconstructs a dynamic scene representation from casual monocular videos. RoDyGS explicitly separates static and dynamic scene elements, and applies spatiotemporal regularization to enforce physically plausible geometry and temporally consistent motion. Furthermore, we propose a comprehensive benchmark, Kubric-MRig, which provides extensive camera and object motion along with simultaneous multi-view capture, features that are absent in previous benchmarks. Experiments demonstrate that RoDyGS significantly outperforms previous pose-free dynamic novel view synthesis approaches and achieves competitive rendering quality compared to existing pose-free static novel view synthesis approaches. Our project page is available at https://rodygs.github.io</description>
      <guid isPermaLink="false">oai:arXiv.org:2412.03077v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Junmyeong Lee, Hoseung Choi, Yoonwoo Jeong, Minsu Cho</dc:creator>
    </item>
    <item>
      <title>Optimal Control with Natural Images: Efficient Reinforcement Learning using Overcomplete Sparse Codes</title>
      <link>https://arxiv.org/abs/2412.08893</link>
      <description>arXiv:2412.08893v3 Announce Type: replace 
Abstract: Optimal control and sequential decision making are widely used in many complex tasks. Optimal control over a sequence of natural images is a first step towards understanding the role of vision in control. Here, we formalize this problem as a reinforcement learning task, and derive general conditions under which an image includes enough information to implement an optimal policy. Reinforcement learning is shown to provide a computationally efficient method for finding optimal policies when natural images are encoded into "efficient" image representations. This is demonstrated by introducing a new reinforcement learning benchmark that easily scales to large numbers of states and long horizons. In particular, by representing each image as an overcomplete sparse code, we are able to efficiently solve an optimal control task that is orders of magnitude larger than those tasks solvable using complete codes. Theoretical justification for this behaviour is provided. This work also demonstrates that deep learning is not necessary for efficient optimal control with natural images.</description>
      <guid isPermaLink="false">oai:arXiv.org:2412.08893v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Peter N. Loxley</dc:creator>
    </item>
    <item>
      <title>The Geometry of Statistical Data and Information: A Large Deviation Perspective</title>
      <link>https://arxiv.org/abs/2501.01556</link>
      <description>arXiv:2501.01556v3 Announce Type: replace 
Abstract: The manifold of empirical mean values of statistical data ad infinitum has a geometric shape that depends on the probability measure that governs the generating model. Large deviation theory produces entropy functions that depend on both the probability measure and the statistical data; we use entropy to study the geometry of the data space rather than that of the space of probability distributions. It is well known, since Rao's work, that the Fisher-Rao metric makes the probability simplex into a sphere. From our perspective, that result translates to the space of empirical singleton counting frequencies under an i.i.d. assumption. Following our ideas and going beyond i.i.d., the choice of measure curves the space. When we study the pairwise statistics, the spherical geometry breaks down entirely. We show that the information projection, defined in information geometry as divergence minimization, coincides with the information projection in Kolmogorov's probability theory. This identification holds under both i.i.d. and Markovian assumptions and connects information geometry to the foundations of probability theory.</description>
      <guid isPermaLink="false">oai:arXiv.org:2501.01556v3</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Viswa Virinchi Muppirala, Hong Qian</dc:creator>
    </item>
    <item>
      <title>Materialist: Physically Based Editing Using Single-Image Inverse Rendering</title>
      <link>https://arxiv.org/abs/2501.03717</link>
      <description>arXiv:2501.03717v3 Announce Type: replace 
Abstract: Achieving physically consistent image editing remains a significant challenge in computer vision. Existing image editing methods typically rely on neural networks, which struggle to accurately handle shadows and refractions. Conversely, physics-based inverse rendering often requires multi-view optimization, limiting its practicality in single-image scenarios. In this paper, we propose Materialist, a neural-initialized physically based rendering pipeline for single-image inverse rendering. Unlike previous hybrid methods that use physics to guide neural generation, our method leverages neural networks to predict initial material properties, which are then rigorously optimized via progressive differentiable rendering. Our approach enables a range of applications, including material editing, object insertion, and relighting, while also introducing an effective method for editing material transparency via ray-traced refraction without requiring full scene geometry. Our environment map (envmap) estimation method also achieves competitive performance, further improving the accuracy of image editing tasks. Experiments demonstrate strong performance across synthetic and real-world datasets, excelling even on challenging out-of-domain images.</description>
      <guid isPermaLink="false">oai:arXiv.org:2501.03717v3</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.GR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1007/s11263-026-02833-z</arxiv:DOI>
      <arxiv:journal_reference>International Journal of Computer Vision (IJCV), 2026</arxiv:journal_reference>
      <dc:creator>Lezhong Wang, Duc Minh Tran, Ruiqi Cui, Thomson TG, Anders Bjorholm Dahl, Siavash Arjomand Bigdeli, Jeppe Revall Frisvad, Manmohan Chandraker</dc:creator>
    </item>
    <item>
      <title>MAD-BA: 3D LiDAR Bundle Adjustment -- from Uncertainty Modelling to Structure Optimization</title>
      <link>https://arxiv.org/abs/2501.03972</link>
      <description>arXiv:2501.03972v2 Announce Type: replace 
Abstract: The joint optimization of sensor poses and 3D structure is fundamental for state estimation in robotics and related fields. Current LiDAR systems often prioritize pose optimization, with structure refinement either omitted or treated separately using implicit representations. This paper introduces a framework for simultaneous optimization of sensor poses and 3D map, represented as surfels. A generalized LiDAR uncertainty model is proposed to address less reliable measurements in varying scenarios. Experimental results on public datasets demonstrate improved performance over most comparable state-of-the-art methods. The system is provided as open-source software to support further research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2501.03972v2</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1109/LRA.2025.3573628</arxiv:DOI>
      <dc:creator>Krzysztof Ćwian, Luca Di Giammarino, Simone Ferrari, Thomas Ciarfuglia, Giorgio Grisetti, Piotr Skrzypczyński</dc:creator>
    </item>
    <item>
      <title>An O(log n)-Approximation Algorithm for (p,q)-Flexible Graph Connectivity via Independent Rounding</title>
      <link>https://arxiv.org/abs/2501.12549</link>
      <description>arXiv:2501.12549v2 Announce Type: replace 
Abstract: In the Flexible Graph Connectivity (FGC) problem, we are given an undirected multigraph on $n$ vertices with nonnegative edge costs, where each edge is classified as either safe or unsafe. Given integer parameters $p$ and $q$, the goal in $(p,q)$-FGC is to purchase a minimum-cost set of edges such that the resulting spanning subgraph remains $p$-edge-connected after the removal of any set of up to $q$ unsafe edges.
  Our main contribution is an $O(\log n)$-approximation algorithm based on independent rounding, improving the previous best approximation ratio of $O(q \log n)$. Central to our approach is a new linear programming formulation of feasible solutions that encodes knapsack cover inequalities as cut-capacity constraints. Unlike prior work, the capacity of an edge in a cut may depend on the partially purchased solution for this cut. We show that the resulting linear program admits a polynomial-time separation oracle. Scaling the fractional solution by $\Theta(\log n)$ and applying independent rounding yields a feasible integral solution with constant probability; here, we leverage the knapsack cover inequalities to obtain strong concentration bounds for the rounded solution relative to any given partial solution. A key ingredient in both separation and rounding is the use of Karger's bound on the number of near-minimum cuts.
  We also extend the $(p,q)$-FGC problem to model more than two safety tiers and show that our results and techniques extend naturally to this setting, albeit with increased approximation ratios and running times that scale with the number of tiers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2501.12549v2</guid>
      <category>cs.DM</category>
      <category>cs.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sharat Ibrahimpur, László A. Végh</dc:creator>
    </item>
    <item>
      <title>Complexity of Constructing Minimal Faithful Permutation Representations for Fitting-free Groups</title>
      <link>https://arxiv.org/abs/2501.16039</link>
      <description>arXiv:2501.16039v4 Announce Type: replace 
Abstract: In this paper, we investigate the complexity of computing minimal faithful permutation representations for groups without nontrivial abelian normal subgroups (a.k.a. Fitting-free groups). When our groups are given as quotients of permutation groups, we exhibit a polynomial-time algorithm for constructing such representations. Furthermore, in the setting of permutation groups, we obtain an $\textsf{NC}$ procedure for computing the minimal faithful permutation degree, and a randomized $\textsf{NC}$ ($\textsf{RNC}$) algorithm for computing a minimal faithful permutation representation. This improves upon the work of Das and Thakkar (STOC 2024, SIAM J. Comput. 2026), who established a Las Vegas polynomial-time algorithm for computing the minimal faithful permutation degree for this class in the setting of permutation groups.</description>
      <guid isPermaLink="false">oai:arXiv.org:2501.16039v4</guid>
      <category>cs.DS</category>
      <category>cs.CC</category>
      <category>math.GR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Michael Levet, Pranjal Srivastava, Dhara Thakkar</dc:creator>
    </item>
    <item>
      <title>Bayesian Parameter Shift Rule in Variational Quantum Eigensolvers</title>
      <link>https://arxiv.org/abs/2502.02625</link>
      <description>arXiv:2502.02625v2 Announce Type: replace 
Abstract: Parameter shift rules (PSRs) are key techniques for efficient gradient estimation in variational quantum eigensolvers (VQEs). In this paper, we propose a Bayesian variant, where Gaussian processes with appropriate kernels are used to estimate the gradient of the VQE objective. Our Bayesian PSR offers flexible gradient estimation from observations at arbitrary locations with uncertainty information and reduces to the generalized PSR in special cases. In stochastic gradient descent (SGD), the flexibility of Bayesian PSR allows the reuse of observations from previous steps, which accelerates the optimization process. Furthermore, access to the posterior uncertainty, along with our proposed notion of gradient confident region (GradCoRe), enables us to minimize the observation costs in each SGD step. Our numerical experiments show that the VQE optimization with Bayesian PSR and GradCoRe significantly accelerates SGD and outperforms the state-of-the-art methods, including sequential minimal optimization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.02625v2</guid>
      <category>cs.LG</category>
      <category>quant-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Samuele Pedrielli, Christopher J. Anders, Lena Funcke, Karl Jansen, Kim A. Nicoli, Shinichi Nakajima</dc:creator>
    </item>
    <item>
      <title>"Security vs. Interoperability" Arguments: An Analytical Framework</title>
      <link>https://arxiv.org/abs/2502.04538</link>
      <description>arXiv:2502.04538v5 Announce Type: replace 
Abstract: Concerns about big tech's monopoly power have featured prominently in recent media and policy discourse, as regulators across the European Union (EU), the United States (US) and beyond have ramped up efforts to promote healthier market competition. One favored approach is to require certain kinds of interoperation between platforms, to mitigate the current concentration of power in the biggest companies. Unsurprisingly, interoperability initiatives have generally been met with resistance by big tech companies. Perhaps more surprisingly, a significant part of that pushback has been in the name of security -- that is, arguing against interoperation on the basis that it will undermine security. We conduct a systematic examination of "security vs. interoperability" (SvI) discourse in the context of EU antitrust and competition proceedings. Our resulting contributions are threefold. First, we propose a taxonomy of SvI concerns in three categories: engineering, vetting, and hybrid. Second, we present an analytical framework for assessing real-world SvI concerns, and illustrate its utility by analyzing several case studies spanning our three taxonomy categories. Third, we undertake a comparative analysis that highlights key considerations around the interplay of economic incentives, market power, and security across our diverse case study contexts, identifying common patterns in each taxonomy category. Our contributions provide valuable analytical tools for experts and non-experts alike to critically assess SvI discourse in today's fast-paced regulatory landscape.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.04538v5</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Daji Landis, Elettra Bietti, Sunoo Park</dc:creator>
    </item>
    <item>
      <title>Round and Resilience-Optimal Approximate Agreement on Trees and Block Graphs</title>
      <link>https://arxiv.org/abs/2502.05591</link>
      <description>arXiv:2502.05591v4 Announce Type: replace 
Abstract: Approximate Agreement ($\mathcal{AA}$) is a fundamental primitive that, even in the presence of Byzantine faults, allows honest parties to obtain close (but not necessarily identical) outputs that lie within the range of their inputs. While the optimal round complexity of synchronous $\mathcal{AA}$ on real values is well understood, its extension to other input spaces has remained open, with fundamental questions regarding achievable resilience and round efficiency still unresolved. In this work, we investigate the optimal round complexity of synchronous $\mathcal{AA}$ on trees under Byzantine failures. In this setting, parties hold as inputs vertices of a publicly known labeled tree $T$ and must output $1$-close vertices lying in the convex hull of the honest inputs. We present a synchronous protocol with optimal resilience and round complexity $O\left(\frac{\log D(T)}{\log \log D(T)}\right)$, where $D(T)$ denotes the diameter of the input space tree. Complementing this result, we extend impossibility results for real-valued $\mathcal{AA}$ to any graph $G$ by proving a lower bound of $\Omega\left(\frac{\log D(G)}{\log \log D(G) + \log \frac{n+t}{t}}\right)$ rounds, where $n$ is the number of parties and $t$ the number of Byzantine faults. Together, these results establish the asymptotic optimality of our protocol whenever $t \in \Theta(n)$. We further extend our techniques to block graphs by leveraging their clique tree structure. This yields protocols for $\mathcal{AA}$ on block graphs with optimal resilience in both the synchronous and asynchronous models, and with optimal round complexity in the synchronous model.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.05591v4</guid>
      <category>cs.DC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Marc Fuchs, Diana Ghinea, Zahra Parsaeian, Joel Rybicki</dc:creator>
    </item>
    <item>
      <title>Positional Encoding in Transformer-Based Time Series Models: A Survey</title>
      <link>https://arxiv.org/abs/2502.12370</link>
      <description>arXiv:2502.12370v3 Announce Type: replace 
Abstract: Recent advancements in transformer-based models have greatly improved time series analysis, providing robust solutions for tasks such as forecasting, anomaly detection, and classification. A crucial element of these models is positional encoding, which allows transformers to capture the intrinsic sequential nature of time series data. This survey systematically examines existing techniques for positional encoding in transformer-based time series models. We investigate a variety of methods, including fixed, learnable, relative, and hybrid approaches, and evaluate their effectiveness in different time series classification tasks. Our findings indicate that data characteristics like sequence length, signal complexity, and dimensionality significantly influence method effectiveness. Advanced positional encoding methods exhibit performance gains in terms of prediction accuracy; however, they come at the cost of increased computational complexity. Furthermore, we outline key challenges and suggest potential research directions to enhance positional encoding strategies. By delivering a comprehensive overview and quantitative benchmarking, this survey intends to assist researchers and practitioners in selecting and designing effective positional encoding methods for transformer-based time series models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.12370v3</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Habib Irani, Vangelis Metsis</dc:creator>
    </item>
    <item>
      <title>Collision-Aware Object-Goal Visual Navigation via Two-Stage Deep Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2502.13498</link>
      <description>arXiv:2502.13498v2 Announce Type: replace 
Abstract: Object-goal visual navigation aims to reach a specific target object using egocentric visual observations. Recent deep reinforcement learning (DRL) approaches have achieved promising success rates but often neglect collisions during evaluation, limiting real-world deployment. To address this issue, this letter introduces a collision-aware evaluation metric, namely collision-free success rate (CF-SR), to explicitly measure navigation performance under collision constraints. In addition, collision-free success weighted by path length (CF-SPL) is adopted to further evaluate navigation efficiency. Furthermore, a two-stage DRL training framework with collision prediction is proposed to improve collision-free navigation performance. In the first stage, a collision prediction module is trained by supervising the agent's collision states during exploration. In the second stage, leveraging the trained collision prediction, the agent learns to navigate toward target objects while avoiding collision. Extensive experiments across multiple navigation models in the AI2-THOR environment demonstrate consistent improvements in both CF-SR and CF-SPL. Real-world experiments further validate the effectiveness and generalization capability of the proposed framework.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.13498v2</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hongwu Wang, Shiwei Lian, Feitian Zhang</dc:creator>
    </item>
    <item>
      <title>RouteFormer: A Transformer-Based Routing Framework for Autonomous Vehicles</title>
      <link>https://arxiv.org/abs/2504.05407</link>
      <description>arXiv:2504.05407v2 Announce Type: replace 
Abstract: Autonomous surveillance missions in Internet of Things (IoT) networks often involve solving NP-hard combinatorial optimization problems to ensure efficient resource utilization. To address the limitations of conventional heuristics in dynamic environments, we propose RouteFormer, a novel framework for single-agent routing in graph-based terrains. RouteFormer creates a synergy between the global context awareness of the transformer self-attention mechanism and the adaptive decision-making capabilities of Reinforcement Learning (RL). This architecture allows the system to output optimized routing decisions that adapt to complex task dependencies and resource availability without requiring labeled training datasets. We evaluated RouteFormer on varying graph sizes designed to resemble realistic reconnaissance missions. The results indicate that our model effectively handles the complexity of missions requiring multiple action profiles, outperforming baseline approaches in terms of both time and distance. Specifically, RouteFormer achieved 10% and 7% reductions in distance compared with the solutions obtained from well-established solvers such as Concorde and Lin-Kernighan-Helsgaun-3 (LKH-3). This improvement was achieved by effectively incorporating mission-specific constraints that traditional solvers overlook. The proposed framework serves as a modular, scalable pipeline for diverse autonomous scheduling and routing tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2504.05407v2</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yazan Youssef, Paulo Ricardo Marques de Araujo, Aboelmagd Noureldin, Sidney Givigi</dc:creator>
    </item>
    <item>
      <title>Solar-charge your car: EV charging can be aligned with renewables by providing pro-environmental information on a smartboard</title>
      <link>https://arxiv.org/abs/2504.09542</link>
      <description>arXiv:2504.09542v2 Announce Type: replace 
Abstract: Integrating electric vehicle (EV) charging with renewable energy production is essential for reducing the transport sector's carbon footprint, but effective and scalable strategies to align individual charging behavior with renewable supply remain underexplored. This quasi-experimental field study tests whether real-time prescriptive informational cues can influence EV drivers to charge during periods of high renewable energy availability. A smartboard displaying dynamic "charge green now" and "charge later" signals based on local photovoltaic forecasts and market prices was installed at a semi-residential charging facility in Ghent, Belgium. Hourly charging data (N = 619 days) were analyzed using a difference-in-differences design comparing lamp states in control and intervention garages before and during the trial. Results are consistent with a behavioral effect of the "charge green now" smartboard lamps: an increase in the number of charging operations and in the total kWh charged during renewable-rich periods, without financial incentives. Emission modelling estimates that charging at the hours observed during the trial was associated with approximately 20% lower CO2-equivalent emissions compared with baseline charging patterns. These findings suggest that non-financial interventions, i.e., providing salient, real-time prescriptive information, may meaningfully contribute to demand-side flexibility in EV charging. The study offers practical insights for designing non-financial demand response mechanisms and a scalable, cost-effective method for reducing greenhouse gas emissions of EVs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2504.09542v2</guid>
      <category>cs.ET</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Celina Kacperski, Melanie Vogel, Florian Kutzner, Mona Bielig, Soroush Karimi Madahi, Lieven Demolder, Matthias Strobbe, Sonja Klingert</dc:creator>
    </item>
    <item>
      <title>Privacy-Preserving Empathy Detection in Video Interactions</title>
      <link>https://arxiv.org/abs/2504.10808</link>
      <description>arXiv:2504.10808v3 Announce Type: replace 
Abstract: Detecting empathy from video interactions has emerging applications, yet raw videos that could be used for training AI models are rarely available due to privacy and ethical constraints. Public benchmarks are consequently released only as pre-extracted features, creating a privacy-constrained learning regime whose privacy-utility trade-off is poorly characterised. We formalise three levels of privacy for video-based behavioural prediction -- no privacy (raw video), partial privacy (temporal visual features such as facial landmarks, action units and eye gaze) and strong privacy (summary statistics of those features) -- and ask whether strong, subject-generalisable empathy detection is achievable at the strong-privacy level. We propose TFMPathy, instantiated with two recent Tabular Foundation Models (TFMs) (TabPFN v2 and TabICL), under both in-context learning and fine-tuning paradigms. On a public human-robot interaction benchmark, TFMPathy achieves strong utility under strong privacy, outperforming established baselines by a substantial margin. To assess robustness and facilitate fair, safe deployment, we introduce a cross-subject evaluation protocol that was previously lacking in this benchmark. Under this protocol, TFM fine-tuning improves generalisation capacity substantially (accuracy: $0.590 \rightarrow 0.730$; AUC: $0.564 \rightarrow 0.669$). Aggregating temporal features into summary statistics also suppresses subject-specific and demographic cues, aligning TFMPathy with data-minimisation principles. TFMPathy, therefore, offers a practical route to building AI systems that depend on human-centred video when governance, consent or institutional policies restrict the sharing of raw video. Code will be released upon acceptance at https://github.com/hasan-rakibul/TFMPathy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2504.10808v3</guid>
      <category>cs.CV</category>
      <category>cs.HC</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Md Rakibul Hasan, Md Zakir Hossain, Aneesh Krishna, Shafin Rahman, Tom Gedeon</dc:creator>
    </item>
    <item>
      <title>Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR</title>
      <link>https://arxiv.org/abs/2504.11101</link>
      <description>arXiv:2504.11101v4 Announce Type: replace 
Abstract: Optical Character Recognition (OCR) is fundamental to Vision-Language Models (VLMs) and high-quality data generation for LLM training. Yet, despite progress in average OCR accuracy, state-of-the-art VLMs still struggle with detecting sample-level errors and lack effective unsupervised quality control. We introduce Consensus Entropy (CE), a training-free, model-agnostic metric that estimates output reliability by measuring inter-model agreement entropy. The core insight is that correct predictions converge in output space, while errors diverge. Based on CE, we develop CE-OCR, a lightweight multi-model framework that verifies outputs by ensemble agreement, selects the best outputs, and further improves efficiency through adaptive routing. Experiments demonstrate that CE is robust for quality verification, improving F1 scores by 42.1% over VLM-as-Judge. CE-OCR achieves consistent OCR gains, outperforming self-consistency and single-model baselines at the same cost. Notably, CE requires no training or supervision, enabling plug-and-play integration. Code: https://github.com/Aslan-yulong/consensus-entropy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2504.11101v4</guid>
      <category>cs.CV</category>
      <category>cs.MM</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yulong Zhang, Tianyi Liang, Xinyue Huang, Erfei Cui, Guoqing Wang, Xu Guo, Chenhui Li, Gongshen Liu</dc:creator>
    </item>
    <item>
      <title>Beyond Public Access in LLM Pre-Training Data</title>
      <link>https://arxiv.org/abs/2505.00020</link>
      <description>arXiv:2505.00020v2 Announce Type: replace 
Abstract: Using a legally obtained dataset of 34 copyrighted O'Reilly Media books, we apply the DE-COP membership inference attack method to investigate whether OpenAI's large language models show recognition of copyrighted content. Our results based on this small sample suggest that GPT-4o, OpenAI's more recent and capable model, exhibits patterns consistent with recognition of pay-walled book content, with an AUROC score of 0.82 (95% bootstrapped CI: 0.60-0.96), though this wide confidence interval reflects substantial uncertainty due to the limited number of books tested. GPT-4o Mini, as a much smaller model, shows little recognition of any O'Reilly Media content with an AUROC score of 0.56 (0.28-0.83) for non-public data. Testing multiple models with the same cutoff date provides a partial control for potential language shifts over time that might bias our findings, though differences in model size, architecture, and potentially training data composition limit the strength of this control. These preliminary results underscore the importance of increased corporate transparency regarding pre-training data sources and the development of formal licensing frameworks for AI content training. Our principal contribution is our examination of public and non-public data separately.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.00020v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sruly Rosenblat, Tim O'Reilly, Ilan Strauss</dc:creator>
    </item>
    <item>
      <title>TNStream: Applying Tightest Neighbors to Micro-Clusters to Define Multi-Density Clusters in Streaming Data</title>
      <link>https://arxiv.org/abs/2505.00359</link>
      <description>arXiv:2505.00359v2 Announce Type: replace 
Abstract: In data stream clustering, systematic theory of stream clustering algorithms remains scarce. Recently, density-based methods have gained attention. However, existing algorithms struggle to simultaneously handle arbitrarily shaped, multi-density, high-dimensional data while maintaining strong outlier resistance. Clustering quality deteriorates significantly when data density varies in complex ways. This paper proposes a clustering algorithm based on the novel concept of Tightest Neighbors and introduces a data stream clustering theory based on the Skeleton Set. Based on these theories, this paper develops a new method, TNStream, a fully online algorithm. The algorithm adaptively determines the clustering radius based on local similarity, summarizing the evolution of multi-density data streams in micro-clusters. It then applies a Tightest Neighbors-based clustering algorithm to form final clusters. To improve efficiency in high-dimensional cases, Locality-Sensitive Hashing (LSH) is employed to structure micro-clusters, addressing the challenge of storing k-nearest neighbors. TNStream is evaluated on various synthetic and real-world datasets using different clustering metrics. Experimental results demonstrate its effectiveness in improving clustering quality for multi-density data and validate the proposed data stream clustering theory.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.00359v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.NE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qifen Zeng, Haomin Bao, Yuanzhuo Hu, Zirui Zhang, Yuheng Zheng, Luosheng Wen</dc:creator>
    </item>
    <item>
      <title>LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey</title>
      <link>https://arxiv.org/abs/2505.00753</link>
      <description>arXiv:2505.00753v5 Announce Type: replace 
Abstract: Recent advances in large language models (LLMs) have sparked growing interest in building fully autonomous agents. However, fully autonomous LLM-based agents still face significant challenges, including limited reliability due to hallucinations, difficulty in handling complex tasks, and substantial safety and ethical risks, all of which limit their feasibility and trustworthiness in real-world applications. To overcome these limitations, LLM-based human-agent systems (LLM-HAS) incorporate human-provided information, feedback, or control into the agent system to enhance system performance, reliability, and safety. These human-agent collaboration systems enable humans and LLM-based agents to collaborate effectively by leveraging their complementary strengths. This paper provides the first comprehensive and structured survey of LLM-HAS. It clarifies fundamental concepts, systematically presents core components shaping these systems, including environment and profiling, human feedback, interaction types, orchestration, and communication, explores emerging applications, and discusses unique challenges and opportunities arising from human-AI collaboration. By consolidating current knowledge and offering a structured overview, we aim to foster further research and innovation in this rapidly evolving interdisciplinary field. Paper lists and resources are available at https://github.com/HenryPengZou/Awesome-Human-Agent-Collaboration-Interaction-Systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.00753v5</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Jizhou Guo, Yankai Chen, Chunyu Miao, Hoang Nguyen, Yue Zhou, Weizhi Zhang, Liancheng Fang, Hanrong Zhang, Fangxin Wang, Pengfei Zhang, Huacan Wang, Langzhou He, Yangning Li, Dongyuan Li, Renhe Jiang, Xue Liu, Philip S. Yu</dc:creator>
    </item>
    <item>
      <title>Fixed-Length Dense Fingerprint Representation with Alignment and Robust Enhancement</title>
      <link>https://arxiv.org/abs/2505.03597</link>
      <description>arXiv:2505.03597v2 Announce Type: replace 
Abstract: Fixed-length fingerprint representations, which map each fingerprint to a compact and fixed-size feature vector, are computationally efficient and well-suited for large-scale matching. However, designing a robust representation that effectively handles diverse fingerprint modalities, pose variations, and noise interference remains a significant challenge. In this work, we propose a fixed-length dense descriptor of fingerprints and introduce FLARE, a fingerprint matching framework that integrates the Fixed-Length dense descriptor with pose-based Alignment and Robust Enhancement. This fixed-length representation employs a three-dimensional dense descriptor to effectively capture spatial relationships among fingerprint ridge structures, enabling robust and locally discriminative representations. To ensure consistency within this dense feature space, FLARE incorporates pose-based alignment using complementary estimation methods, along with dual enhancement strategies that refine ridge clarity while preserving the original fingerprint modality. The proposed dense descriptor supports fixed-length representation while maintaining spatial correspondence, enabling fast and accurate similarity computation. Extensive experiments demonstrate that FLARE achieves superior performance across rolled, plain, latent, and contactless fingerprints, significantly outperforming existing methods in cross-modality and low-quality scenarios. Further analysis validates the effectiveness of the dense descriptor design, as well as the impact of alignment and enhancement modules on the accuracy of dense descriptor matching. Experimental results highlight the effectiveness and generalizability of FLARE as a unified and scalable solution for robust fingerprint representation and matching. The implementation and code will be publicly available at https://github.com/Yu-Yy/FLARE.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.03597v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Zhiyu Pan, Xiongjun Guan, Yongjie Duan, Jianjiang Feng, Jie Zhou</dc:creator>
    </item>
    <item>
      <title>Combining Abstract Argumentation and Machine Learning for Efficiently Analyzing Low-Level Process Event Streams</title>
      <link>https://arxiv.org/abs/2505.05880</link>
      <description>arXiv:2505.05880v2 Announce Type: replace 
Abstract: Monitoring and analyzing process traces is a critical task for modern companies and organizations. In scenarios where there is a gap between trace events and reference business activities, this entails an interpretation problem, amounting to translating each event of any ongoing trace into the corresponding step of the activity instance. Building on a recent approach that frames the interpretation problem as an acceptance problem within an Abstract Argumentation Framework (AAF), one can elegantly analyze plausible event interpretations (possibly in an aggregated form), as well as offer explanations for those that conflict with prior process knowledge. Since, in settings where the event-to-activity mapping is highly uncertain (or simply under-specified), this reasoning-based approach may yield poorly informative results and require heavy computation, one can think of discovering a sequence-tagging model, trained to suggest highly probable candidate event interpretations in a context-aware way. However, training such a model optimally may require a large amount of manually annotated example traces. We therefore propose a data-efficient neuro-symbolic approach to the problem, where the candidate interpretations returned by the example-driven sequence tagger are refined by the AAF-based reasoner. This also allows us to leverage prior knowledge to compensate for the scarcity of example data, as confirmed by experimental results.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.05880v2</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bettina Fazzinga, Sergio Flesca, Filippo Furfaro, Luigi Pontieri, Francesco Scala</dc:creator>
    </item>
    <item>
      <title>Provable Distributional Value Iteration under Partial Observability</title>
      <link>https://arxiv.org/abs/2505.06518</link>
      <description>arXiv:2505.06518v3 Announce Type: replace 
Abstract: In many real-world planning tasks, agents must tackle uncertainty about the environment's state and variability in the outcomes induced by stochastic dynamics and rewards. Motivated by recent progress in world model approaches, where latent models approximate beliefs and support planning, we extend Distributional Reinforcement Learning (DistRL), which models the entire return distribution for fully observable domains, to Partially Observable Markov Decision Processes (POMDPs). Concretely, we introduce new distributional Bellman operators for partial observability and prove their convergence under the supremum p-Wasserstein metric. We also propose a finite representation of these return distributions via psi-vectors, generalizing the classical alpha-vectors in POMDP solvers. Building on this, we develop Distributional Point-Based Value Iteration (DPBVI), which integrates psi-vectors into a standard point-based backup procedure, bridging DistRL and POMDP planning. Our experiments demonstrate that DPBVI recovers classical Point-Based Value Iteration (PBVI) in the risk-neutral case, validating the distributional extension.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.06518v3</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Larry Preuett III, Qiuyi Zhang, Muhammad Aurangzeb Ahmad</dc:creator>
    </item>
    <item>
      <title>Deterministic Suffix-reading Automata</title>
      <link>https://arxiv.org/abs/2505.09353</link>
      <description>arXiv:2505.09353v5 Announce Type: replace 
Abstract: We introduce deterministic suffix-reading automata (DSA), a new automaton model over finite words. Transitions in a DSA are labeled with words. From a state, a DSA triggers an outgoing transition on seeing a word ending with the transition's label. Therefore, rather than moving along an input word letter by letter, a DSA can jump along blocks of letters, with each block ending in a suitable suffix. This feature allows DSAs to recognize regular languages more concisely, compared to DFAs. In this work, we focus on questions around finding a minimal DSA for a regular language. In this context, the number of states is not a faithful measure of the size of a DSA, since the transition-labels contain strings of arbitrary length. Hence, we consider total-size (number of states + number of edges + total length of transition-labels) as the size measure of DSAs.
  We start by formally defining the model and providing a DSA-to-DFA conversion that allows us to compare the expressiveness and succinctness of DSA with related automata models. Our main technical contribution is a method to derive DSAs from a given DFA: a DFA-to-DSA conversion. We make the surprising observation that the smallest DSA derived from the canonical DFA of a regular language L need not be a minimal DSA for L. This observation points to a fundamental bottleneck in deriving a minimal DSA for a regular language. In fact, we prove that, given a DFA and a number k, the problem of deciding whether there exists an equivalent DSA of total-size at most k is NP-complete.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.09353v5</guid>
      <category>cs.FL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>R Keerthan, B Srivathsan, R Venkatesh, Sagar Verma</dc:creator>
    </item>
    <item>
      <title>UniMoCo: Unified Modality Completion for Robust Multi-Modal Embeddings</title>
      <link>https://arxiv.org/abs/2505.11815</link>
      <description>arXiv:2505.11815v2 Announce Type: replace 
Abstract: Current vision-language models have been explored for multi-modal embedding tasks like information retrieval. However, they face significant challenges in real-world queries and targets involving diverse modality combinations, as existing approaches often fail to align all modality combinations within a unified embedding space during training, leading to degraded performance on rare modality patterns during inference. To address this fundamental limitation, we propose UniMoCo, a novel architecture featuring a modality-completion module that generates visual features from text, thereby ensuring modality completeness for both queries and targets. Additionally, UniMoCo incorporates a specialized training strategy that aligns embeddings from both original and modality-completed inputs, thus ensuring consistent and robust embeddings for diverse modality combinations. Comprehensive experiments demonstrate that UniMoCo outperforms previous methods while exhibiting consistent robustness across diverse settings. Furthermore, we identify and quantify the inherent bias in conventional approaches caused by imbalanced modality combinations in training data, showing that our modality-completion paradigm effectively mitigates this limitation. The code is available at https://github.com/HobbitQia/UniMoCo.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.11815v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiajun Qin, Yuan Pu, Zhuolun He, Seunggeun Kim, David Z. Pan, Bei Yu</dc:creator>
    </item>
    <item>
      <title>Joint Relational Database Generation via Graph-Conditional Diffusion Models</title>
      <link>https://arxiv.org/abs/2505.16527</link>
      <description>arXiv:2505.16527v3 Announce Type: replace 
Abstract: Building generative models for relational databases (RDBs) is important for many applications, such as privacy-preserving data release and augmenting real datasets. However, most prior works either focus on single-table generation or adapt single-table models to the multi-table setting by relying on autoregressive factorizations and sequential generation. These approaches limit parallelism, restrict flexibility in downstream applications, and compound errors due to commonly made conditional independence assumptions. In this paper, we propose a fundamentally different approach: jointly modeling all tables in an RDB without imposing any table order. By using a natural graph representation of RDBs, we propose the Graph-Conditional Relational Diffusion Model (GRDM), which leverages a graph neural network to jointly denoise row attributes and capture complex inter-table dependencies. Extensive experiments on six real-world RDBs demonstrate that our approach substantially outperforms autoregressive baselines in modeling multi-hop inter-table correlations and achieves state-of-the-art performance on single-table fidelity metrics. Our code is available at https://github.com/ketatam/rdb-diffusion.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.16527v3</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mohamed Amine Ketata, David Lüdke, Leo Schwinn, Stephan Günnemann</dc:creator>
    </item>
    <item>
      <title>Emergent Hierarchical Structure in Large Language Models: An Information-Theoretic Framework for Multi-Scale Representation</title>
      <link>https://arxiv.org/abs/2505.18244</link>
      <description>arXiv:2505.18244v3 Announce Type: replace 
Abstract: Why do language models from different architecture families respond so differently to the same perturbation? We argue that the answer is not scale, but \emph{how architecture shapes information compression}. Analyzing eight Transformer models (7B--70B parameters) from the Llama and Qwen families, we show that every model spontaneously develops discrete functional boundaries dividing its layers into Local, Intermediate, and Global processing segments -- yet boundary locations and per-segment brittleness are determined overwhelmingly by architecture family rather than model size or training configuration. We formalize this regularity as the \textbf{Multi-Scale Probabilistic Generation Theory} (MSPGT), which models an autoregressive Transformer as a Hierarchical Variational Information Bottleneck system and derives a tiered set of falsifiable predictions. Three predictions are strongly confirmed: all eight models exhibit two prominent phase-transition boundaries (P1.1); Llama boundary positions are stable across a $10{\times}$ parameter range ($\mathrm{CV}{=}0.067$--$0.095$) while Qwen positions vary widely ($\mathrm{CV}{=}0.465$--$0.726$), precisely matching our strong- and weak-dominance conditions; and cross-architecture local-segment brittleness spans \textbf{three orders of magnitude} ($493{\times}$ ratio) -- a gap that architecture family alone predicts and that dwarfs any within-family or scale-driven variation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.18244v3</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yukin Zhang, Qi Dong, Kemu Xu</dc:creator>
    </item>
    <item>
      <title>Purdah and Patriarchy: Evaluating and Mitigating South Asian Biases in Open-Ended Multilingual LLM Generations</title>
      <link>https://arxiv.org/abs/2505.18466</link>
      <description>arXiv:2505.18466v2 Announce Type: replace 
Abstract: Evaluations of Large Language Models (LLMs) often overlook intersectional and culturally specific biases, particularly in underrepresented multilingual regions like South Asia. This work addresses these gaps by conducting a multilingual and intersectional analysis of LLM outputs across 10 Indo-Aryan and Dravidian languages, identifying how cultural stigmas influenced by purdah and patriarchy are reinforced in generative tasks. We construct a culturally grounded bias lexicon capturing previously unexplored intersectional dimensions including gender, religion, marital status, and number of children. We use our lexicon to quantify intersectional bias and the effectiveness of self-debiasing in open-ended generations (e.g., storytelling, hobbies, and to-do lists), where bias manifests subtly and remains largely unexamined in multilingual contexts. Finally, we evaluate two self-debiasing strategies (simple and complex prompts) to measure their effectiveness in reducing culturally specific bias in Indo-Aryan and Dravidian languages. Our approach offers a nuanced lens into cultural bias by introducing a novel bias lexicon and evaluation framework that extends beyond Eurocentric or small-scale multilingual settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.18466v2</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mamnuya Rinki, Chahat Raj, Anjishnu Mukherjee, Ziwei Zhu</dc:creator>
    </item>
    <item>
      <title>Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap</title>
      <link>https://arxiv.org/abs/2505.19625</link>
      <description>arXiv:2505.19625v3 Announce Type: replace 
Abstract: Search-based software engineering (SBSE), which integrates metaheuristic search techniques with software engineering, has been an active area of research for about 25 years. It has been applied to solve numerous problems across the entire software engineering lifecycle and has demonstrated its versatility in multiple domains. With recent advances in Artificial Intelligence (AI), particularly the emergence of foundation models (FMs) such as large language models (LLMs), the evolution of SBSE alongside these models remains undetermined. In this window of opportunity, we present a research roadmap that articulates the current landscape of SBSE in relation to FMs, identifies open challenges, and outlines potential research directions to advance SBSE through its synergy with FMs. Specifically, we analyze three core aspects: utilizing FMs to enhance SBSE, applying SBSE to advance FMs, and exploring the integration of SBSE and FMs. Furthermore, we present a forward-thinking perspective that envisions the future of SBSE in the era of FMs, highlighting promising research opportunities to address challenges in emerging domains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.19625v3</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hassan Sartaj, Shaukat Ali, Paolo Arcaini, Andrea Arcuri</dc:creator>
    </item>
    <item>
      <title>Software Engineering for Self-Adaptive Robotics: A Research Agenda</title>
      <link>https://arxiv.org/abs/2505.19629</link>
      <description>arXiv:2505.19629v3 Announce Type: replace 
Abstract: Self-adaptive robotic systems operate autonomously in dynamic and uncertain environments, requiring robust real-time monitoring and adaptive behaviour. Unlike traditional robotic software with predefined logic, self-adaptive robots exploit artificial intelligence (AI), machine learning, and model-driven engineering to adapt continuously to changing conditions, thereby ensuring reliability, safety, and optimal performance. This paper presents a research agenda for software engineering in self-adaptive robotics, structured along two dimensions. The first concerns the software engineering lifecycle (requirements, design, development, testing, and operations), tailored to the challenges of self-adaptive robotics. The second focuses on enabling technologies such as digital twins and AI-driven adaptation, which support runtime monitoring, fault detection, and automated decision-making. We identify open challenges, including verifying adaptive behaviours under uncertainty, balancing trade-offs between adaptability, performance, and safety, and integrating self-adaptation frameworks like MAPE-K/MAPLE-K. By consolidating these challenges into a roadmap toward 2030, this work contributes to the foundations of trustworthy and efficient self-adaptive robotic systems capable of meeting the complexities of real-world deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.19629v3</guid>
      <category>cs.SE</category>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hassan Sartaj, Shaukat Ali, Ana Cavalcanti, Lukas Esterle, Cláudio Gomes, Peter Gorm Larsen, Anastasios Tefas, Jim Woodcock, Houxiang Zhang</dc:creator>
    </item>
    <item>
      <title>Beyond Exponential Decay: Rethinking Error Accumulation in Large Language Models</title>
      <link>https://arxiv.org/abs/2505.24187</link>
      <description>arXiv:2505.24187v2 Announce Type: replace 
Abstract: The prevailing assumption of an exponential decay in large language model (LLM) reliability with sequence length, predicated on independent per-token error probabilities, posits an inherent limitation for long autoregressive outputs. Our research fundamentally challenges this view by synthesizing emerging evidence that LLM errors are not uniformly distributed but are concentrated at sparse "key tokens" ($5-10\%$ of total tokens) representing critical decision junctions. By distinguishing these high-impact tokens from the increasingly predictable majority, we introduce a new reliability formula explaining the sustained coherence of modern LLMs over thousands of tokens. Converging research streams reveal that long-context performance primarily depends on accurately navigating a few crucial semantic decision points rather than on uniform token-level accuracy, enabling targeted strategies that significantly outperform brute-force approaches. We thus propose a framework for next-generation systems centered on selective preservation of semantically vital tokens, dynamic computational allocation at uncertain decision boundaries, multi-path exploration at ambiguities, and architectures aligned with natural semantic domains. This marks a fundamental shift from raw scaling to strategic reasoning, promising breakthrough performance without proportionate computational scaling and offering a more nuanced understanding that supersedes the exponential decay hypothesis, thereby opening pathways toward substantially more powerful and efficient language systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.24187v2</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mikhail L. Arbuzov, Sisong Bei, Ziwei Dong, Dmitri Kalaev, Alexey A. Shvets</dc:creator>
    </item>
    <item>
      <title>SysLLMatic: Large Language Models are Software System Optimizers</title>
      <link>https://arxiv.org/abs/2506.01249</link>
      <description>arXiv:2506.01249v3 Announce Type: replace 
Abstract: Automatic software system optimization can improve software speed, reduce operating costs, and save energy. Traditional approaches to optimization rely on manual tuning and compiler heuristics, limiting their ability to generalize across diverse codebases and system contexts. Recent methods using Large Language Models (LLMs) introduce automation on simple programs, but they do not scale effectively to the complexity and size of real-world software systems. We present SysLLMatic, a system that integrates LLMs with performance diagnostics and a curated catalog of 43 optimization patterns to automatically optimize software systems. By leveraging profiling to identify performance hotspots, our approach enables LLMs to optimize real-world software beyond isolated code snippets. We evaluate it on three benchmark suites: HumanEval_CPP (competitive programming in C++), SciMark2 (scientific kernels in Java), and DaCapo (large-scale software systems in Java). Results show that SysLLMatic can improve software system performance, including latency, throughput, energy efficiency, memory usage, and CPU utilization. It consistently outperforms state-of-the-art LLM baselines on microbenchmarks. On large-scale application codes, to which prior LLM approaches have not scaled, it surpasses compiler optimizations, achieving average relative improvements of 1.54x in latency (vs. 1.01x for the compiler) and 1.24x in energy (vs. 1.08x for the compiler). Our findings demonstrate that LLMs, guided by performance knowledge through the optimization pattern catalog and appropriate performance diagnostics, can serve as viable software system optimizers. We further identify limitations of our approach and the challenges involved in handling complex applications. This work provides a foundation for generating optimized code across various languages, benchmarks, and program sizes in a principled manner.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.01249v3</guid>
      <category>cs.SE</category>
      <category>cs.PF</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Huiyun Peng, Arjun Gupte, Ryan Hasler, Nicholas John Eliopoulos, Chien-Chou Ho, Rishi Mantri, Leo Deng, Konstantin Läufer, George K. Thiruvathukal, James C. Davis</dc:creator>
    </item>
    <item>
      <title>Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning</title>
      <link>https://arxiv.org/abs/2506.06856</link>
      <description>arXiv:2506.06856v3 Announce Type: replace 
Abstract: Visual reasoning is crucial for understanding complex multimodal data and advancing Artificial General Intelligence. Existing methods enhance the reasoning capability of Multimodal Large Language Models (MLLMs) through Reinforcement Learning (RL) fine-tuning (e.g., GRPO). However, current RL approaches sample action groups solely from the policy model itself, which limits the upper boundary of the model's reasoning capability and leads to inefficient training. To address these limitations, this paper proposes a novel RL framework called \textbf{Vision-EKIPL}. The core of this framework lies in introducing high-quality actions generated by external auxiliary models during the RL training process to guide the optimization of the policy model. The policy learning with knowledge infusion from external models significantly expands the model's exploration space, effectively improves the reasoning boundary, and substantially accelerates training convergence speed and efficiency. Experimental results demonstrate that our proposed Vision-EKIPL achieved up to a 5\% performance improvement on the Reason-RFT-CoT Benchmark compared to the state-of-the-art (SOTA). It reveals that Vision-EKIPL can overcome the limitations of traditional RL methods, significantly enhance the visual reasoning performance of MLLMs, and provide a new effective paradigm for research in this field.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.06856v3</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chaoyang Wang, Zeyu Zhang, Meng Meng, Xu Zhou, Haiyun Jiang</dc:creator>
    </item>
    <item>
      <title>Enhanced Consistency Bi-directional GAN (CBiGAN) for Malware Anomaly Detection</title>
      <link>https://arxiv.org/abs/2506.07372</link>
      <description>arXiv:2506.07372v2 Announce Type: replace 
Abstract: Static malware analysis remains a core technique in cybersecurity due to its ability to assess potentially malicious software without execution. Nevertheless, many existing static approaches rely on handcrafted features or curated datasets that may not generalize well to evolving malware distributions. In this work, we investigate an alternative representation that operates directly on raw binary content. Executable files are transformed into visual encodings that preserve local structural relationships, enabling the use of deep learning models without requiring semantic disassembly or dynamic behavior profiling. This study explores the use of a Consistency Bi-directional Generative Adversarial Network (CBi-GAN) as an anomaly detection framework rather than as a generative model. The method enforces consistency between latent encodings and reconstructions, allowing deviations from learned benign structure to be quantified through reconstruction discrepancies. Importantly, the approach does not introduce a new generative architecture; instead, it evaluates how consistency-based generative modeling can be applied at scale to heterogeneous malware data. The proposed framework is evaluated across multiple datasets comprising both Portable Executable (PE) and Object Linking and Embedding (OLE) files, including a large self-collected corpus spanning 214 malware families. Results demonstrate stable detection performance in terms of Area Under the Curve (AUC) while maintaining a unified and computationally lightweight processing pipeline. These findings suggest that consistency-based generative modeling provides a practical and scalable direction for malware anomaly detection across diverse file formats and threat families.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.07372v2</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Thesath Wijayasiri, Kar Wai Fok, Vrizlynn L. L. Thing</dc:creator>
    </item>
    <item>
      <title>Overcoming Environmental Meta-Stationarity in MARL via Adaptive Curriculum and Counterfactual Group Advantage</title>
      <link>https://arxiv.org/abs/2506.07548</link>
      <description>arXiv:2506.07548v2 Announce Type: replace 
Abstract: Multi-agent reinforcement learning (MARL) has reached competitive performance on cooperative tasks against scripted adversaries, yet most methods train agents at a single fixed difficulty throughout the entire run. We term this static-difficulty regime environmental meta-stationarity and show that it caps policy generalization and steers learning toward shallow local optima. To break this regime, we propose CL-MARL, a dynamic curriculum learning framework that adapts opponent strength online from win-rate signals, advancing or regressing the task as agents master it. Its scheduler, FlexDiff, fuses momentum-based trend estimation with sliding-window dual-curve monitoring of training and evaluation returns, yielding stable difficulty transitions without manual tuning. Because a moving curriculum amplifies non-stationarity and sparsifies global rewards, we introduce the Counterfactual Group Relative Policy Advantage (CGRPA), which extends GRPO-style group-relative optimization with counterfactual baselines to disentangle each agent's contribution under shifting team dynamics. On the StarCraft Multi-Agent Challenge (SMAC), CL-MARL attains a 40% mean win rate on the super-hard maps with an average episode return of 17.85, exceeding the QMIX, OW-QMIX, DER, EMC, and MARR baselines by +2.94 on average, while reaching its peak win rate roughly 1.28× faster on 8m_vs_9m and 1.42× faster on 3s5z_vs_3s6z than the strongest baseline. The implementation is publicly available at https://github.com/NICE-HKU/CL2MARL-SMAC.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.07548v2</guid>
      <category>cs.AI</category>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Weiqiang Jin, Yang Liu, Shixiang Tang, Jinhu Qi, Wentao Zhang, Junli Wang, Biao Zhao, Hongyang Du</dc:creator>
    </item>
    <item>
      <title>Diverse Committees with Incomplete or Inaccurate Approval Ballots</title>
      <link>https://arxiv.org/abs/2506.10843</link>
      <description>arXiv:2506.10843v3 Announce Type: replace 
Abstract: We study diversity in approval-based committee elections with incomplete or inaccurate information. We define diversity according to the Maximum Coverage problem, which is known to be $\mathsf{NP}$-complete, with a best attainable polynomial time approximation ratio of $1-1/e$. In the incomplete information setting, voters vote only on a small portion of the candidates, and we prove that getting arbitrarily close to the optimal approximation ratio w.h.p. requires $\Omega(m^2)$ non-adaptive queries, where $m$ is the number of candidates. This motivates studying adaptive querying algorithms that can adapt their querying strategy to information obtained from previous query outcomes. In that setting, we lower this bound to only $\Omega(m)$ queries. We propose a greedy algorithm that matches this lower bound up to logarithmic factors. We prove the same $\tilde\Theta(m)$ bound for the generalized problem of Maximum Coverage over a matroid constraint, using a local search algorithm. Specifying a matroid of valid committees lets us impose extra structural requirements on the committee, such as quotas. In the inaccurate information setting, voters' responses are corrupted with a small probability. We prove $\tilde\Theta(nm)$ queries are required to attain a $(1-1/e)$-approximation with high probability, where $n$ is the number of voters. While the proven bounds show that all our algorithms are viable asymptotically, they also show that some of them would still require large numbers of queries in instances of practical relevance. Using real data from Polis as well as synthetic data, we observe that our algorithms perform well also on smaller instances, both with incomplete and inaccurate information.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.10843v3</guid>
      <category>cs.GT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Feline Lindeboom, Martijn Brehm, Davide Grossi, Pradeep Murukannaiah</dc:creator>
    </item>
    <item>
      <title>Shape2Animal: Creative Animal Generation from Natural Silhouettes</title>
      <link>https://arxiv.org/abs/2506.20616</link>
      <description>arXiv:2506.20616v3 Announce Type: replace 
Abstract: Humans possess a unique ability to perceive meaningful patterns in ambiguous stimuli, a cognitive phenomenon known as pareidolia. This paper introduces Shape2Animal, a framework that mimics this imaginative capacity by reinterpreting natural object silhouettes, such as clouds, stones, or flames, as plausible animal forms. Our automated framework first performs open-vocabulary segmentation to extract object silhouettes and interprets semantically appropriate animal concepts using vision-language models. It then synthesizes an animal image that conforms to the input shape, leveraging a text-to-image diffusion model, and seamlessly blends it into the original scene to generate visually coherent and spatially consistent compositions. We evaluated Shape2Animal on a diverse set of real-world inputs, demonstrating its robustness and creative potential. Shape2Animal can offer new opportunities for visual storytelling, educational content, digital art, and interactive media design. Project page: https://shape2image.github.io</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.20616v3</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Quoc-Duy Tran, Anh-Tuan Vo, Dinh-Khoi Vo, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le</dc:creator>
    </item>
    <item>
      <title>FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing</title>
      <link>https://arxiv.org/abs/2506.20911</link>
      <description>arXiv:2506.20911v2 Announce Type: replace 
Abstract: We develop a cost-efficient neurosymbolic agent to address challenging multi-turn image editing tasks such as ``Detect the bench in the image while recoloring it to pink. Also, remove the cat for a clearer view and recolor the wall to yellow.'' It combines fast, high-level subtask planning by large language models (LLMs) with slow but accurate tool use via local A$^*$ search per subtask to find a cost-efficient toolpath -- a sequence of calls to AI tools. To save the cost of A$^*$ on similar subtasks, we perform inductive reasoning on previously successful toolpaths via LLMs to continuously extract/refine frequently used subroutines and reuse them as new tools for future tasks in an adaptive fast-slow planning, where the higher-level subroutines are explored first, and only when they fail is the low-level A$^*$ search activated. The reusable symbolic subroutines considerably save exploration cost on the same types of subtasks applied to similar images, yielding a human-like fast-slow toolpath agent ``FaSTA$^*$'': LLMs first attempt fast subtask planning followed by rule-based subroutine selection per subtask, which is expected to cover most tasks, while slow A$^*$ search is triggered only for novel and challenging subtasks. By comparing with recent image editing approaches, we demonstrate FaSTA$^*$ is significantly more computationally efficient while remaining competitive with the state-of-the-art baseline in terms of success rate. Our code and data can be accessed at https://github.com/tianyi-lab/FaSTAR.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.20911v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Advait Gupta, Rishie Raj, Dang Nguyen, Tianyi Zhou</dc:creator>
    </item>
    <item>
      <title>Experience converting a large mathematical software package written in C++ to C++20 modules</title>
      <link>https://arxiv.org/abs/2506.21654</link>
      <description>arXiv:2506.21654v2 Announce Type: replace 
Abstract: Mathematical software has traditionally been built in the form of "packages" that build on each other. A substantial fraction of these packages is written in C++ and, as a consequence, the interface of a package is described in the form of header files that downstream packages and applications can then #include. C++ has inherited this approach towards exporting interfaces from C, but the approach is clunky, unreliable, and slow. As a consequence, C++20 has introduced a "module" system in which packages explicitly export declarations and code that compilers then store in machine-readable form and that downstream users can "import" -- a system in line with what many other programming languages have used for decades.
  Herein, I explore how one can convert large mathematical software packages written in C++ to this system, using the deal.II finite element library with its around 800,000 lines of code as an example. I describe an approach that allows providing both header-based and module-based interfaces from the same code base, discuss the challenges one encounters, and evaluate how modules actually work in practice across a variety of technical and human metrics. The results show that with a non-trivial, but also not prohibitive effort, the conversion to modules is possible, resulting in a reduction in compile time for the converted library itself; on the other hand, for downstream projects, compile times show no clear trend. I end with thoughts about long-term strategies for converting the entire ecosystem of mathematical software over the coming years or decades.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.21654v2</guid>
      <category>cs.SE</category>
      <category>cs.MS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wolfgang Bangerth</dc:creator>
    </item>
    <item>
      <title>AnyPos: Automated Task-Agnostic Actions for Bimanual Manipulation</title>
      <link>https://arxiv.org/abs/2507.12768</link>
      <description>arXiv:2507.12768v2 Announce Type: replace 
Abstract: Learning generalizable manipulation policies hinges on data, yet robot manipulation data is scarce and often entangled with specific embodiments, making both cross-task and cross-platform transfer difficult. We tackle this challenge with task-agnostic embodiment modeling, which learns embodiment dynamics directly from task-agnostic action data and decouples them from high-level policy learning. Because it is collected by exploring all feasible actions of the embodiment to capture what is physically feasible and consistent, task-agnostic data takes the form of independent image-action pairs with the potential to cover the entire embodiment workspace, unlike task-specific data, which is sequential and tied to concrete tasks. This data-driven perspective bypasses the limitations of traditional dynamics-based modeling and enables scalable reuse of action data across different tasks. Building on this principle, we introduce AnyPos, a unified pipeline that integrates large-scale automated task-agnostic exploration with robust embodiment modeling through inverse dynamics learning. AnyPos generates diverse yet safe trajectories at scale, then learns embodiment representations by decoupling arm and end-effector motions and employing a direction-aware decoder to stabilize predictions under distribution shift, which can be seamlessly coupled with diverse high-level policy models. In comparison to the standard baseline, AnyPos achieves a 51% improvement in test accuracy. On manipulation tasks such as operating a microwave, toasting bread, folding clothes, watering plants, and scrubbing plates, AnyPos raises success rates by 30-40% over strong baselines. These results highlight data-driven embodiment modeling as a practical route to overcoming data scarcity and achieving generalization across tasks and platforms in visuomotor control. Project page: https://embodiedfoundation.github.io/vidar_anypos.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.12768v2</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hengkai Tan, Yao Feng, Xinyi Mao, Shuhe Huang, Guodong Liu, Zhongkai Hao, Hang Su, Jun Zhu</dc:creator>
    </item>
    <item>
      <title>The Tsetlin Machine Goes Deep: Logical Learning and Reasoning With Graphs</title>
      <link>https://arxiv.org/abs/2507.14874</link>
      <description>arXiv:2507.14874v2 Announce Type: replace 
Abstract: Pattern recognition with concise and flat AND-rules makes the Tsetlin Machine (TM) both interpretable and efficient, while the power of Tsetlin automata enables accuracy comparable to deep learning on an increasing number of datasets. We introduce the Graph Tsetlin Machine (GraphTM) for learning interpretable deep clauses from graph-structured input. Moving beyond flat, fixed-length input, the GraphTM becomes more versatile, supporting sequences, grids, relations, and multimodality. Through message passing, the GraphTM builds nested deep clauses to recognize sub-graph patterns with exponentially fewer clauses, increasing both interpretability and data utilization. For image classification, GraphTM preserves interpretability and achieves 3.86 percentage points higher accuracy on CIFAR-10 than a convolutional TM. For tracking action coreference, faced with increasingly challenging tasks, GraphTM outperforms other reinforcement learning methods by up to 20.6 percentage points. In recommendation systems, it tolerates increasing noise to a great extent, similar to a GCN. Finally, for viral genome sequence data, GraphTM is competitive with BiLSTM-CNN and GCN accuracy-wise, training ~2.5x faster than GCN. The GraphTM's application to these varied fields demonstrates how graph representation learning and deep clauses bring new possibilities for TM learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.14874v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ole-Christoffer Granmo, Youmna Abdelwahab, Per-Arne Andersen, Karl Audun K. Borgersen, Paul F. A. Clarke, Kunal Dumbre, Ylva Grønningsæter, Vojtech Halenka, Runar Helin, Lei Jiao, Ahmed Khalid, Rebekka Omslandseter, Rupsa Saha, Mayur Shende, Xuan Zhang</dc:creator>
    </item>
    <item>
      <title>IDRBench: Understanding the Capability of Large Language Models on Interdisciplinary Research</title>
      <link>https://arxiv.org/abs/2507.15736</link>
      <description>arXiv:2507.15736v2 Announce Type: replace 
Abstract: Innovation is a key driving force of human civilization. As the body of knowledge has grown considerably, bridging knowledge across different disciplines, where significant innovation often emerges, has become increasingly challenging. Recent advances in machine learning models, particularly Large Language Models (LLMs), have provided effective access to extensive knowledge sources and shown impressive abilities in reasoning, creating significant opportunities for interdisciplinary discovery. Our research aims to understand the capabilities of state-of-the-art LLMs in integrating knowledge from different fields for interdisciplinary research (IDR). To address this fundamental problem, we introduce IDRBench, a pioneering framework that includes both datasets and evaluation tasks: (1) IDR Paper Identification, (2) IDR Idea Integration, and (3) IDR Idea Recommendation. Our study on ten mainstream LLMs provides a comprehensive analysis of their behavior and establishes benchmarks and baselines for future research. To the best of our knowledge, IDRBench is the first to provide a comprehensive investigation of LLMs' IDR capability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.15736v2</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuanhao Shen, Daniel Xavier de Sousa, Ricardo Marçal, Hongyu Guo, Xiaodan Zhu</dc:creator>
    </item>
    <item>
      <title>Optimizing Split Learning Latency in TinyML-Based IoT Systems</title>
      <link>https://arxiv.org/abs/2507.16594</link>
      <description>arXiv:2507.16594v2 Announce Type: replace 
Abstract: Split learning (SL) addresses the limitation of running deep learning inference directly on low-power edge/IoT nodes by executing part of the inference process on the sensor and offloading the remainder to a companion device. Despite its promise, the inference latency of SL on constrained hardware under realistic low-power wireless protocols remains unexplored. This paper presents the first experimental latency benchmark of TinyML-based SL on ESP32-S3 boards, comparing four wireless communication protocol solutions (UDP, TCP, ESP-NOW, BLE). We also analyze how the choice of split point across different models (MobileNet-V2 and ResNet50) affects communication and computation overhead, as a way to minimize the end-to-end inference latency. We propose a Beam Search-based algorithm for split point optimization that minimizes end-to-end latency, and compare it with other methods, including Greedy Search, First-Fit, Random-Fit, and Brute Force. ESP-NOW achieves the best RTT (3.6 s) and serves as the base protocol for the algorithm, which delivers near-optimal latency with a processing time of 0.1 s for 5 devices.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.16594v2</guid>
      <category>cs.NI</category>
      <category>cs.AI</category>
      <category>cs.DC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zied Jenhani, Mounir Bensalem, Jasenka Dizdarević, Admela Jukan</dc:creator>
    </item>
    <item>
      <title>Higher-order Kripke models for intuitionistic and non-classical modal logics</title>
      <link>https://arxiv.org/abs/2507.18798</link>
      <description>arXiv:2507.18798v4 Announce Type: replace 
Abstract: This paper introduces higher-order (``nested'') Kripke models, a generalization of Kripke models that is remarkably close to Kripke's original idea -- both mathematically and conceptually. Standard models are now $0$-ary models, whereas $n$-ary models for $n &gt; 0$ are models whose set of objects (``possible worlds'') contains only $(n-1)$-ary models. A key idea is the use of worlds as fixed points for modal definitions, in the sense that what is necessary or possible in a world of a frame depends only on what is true in the same world on the accessible frames. This paper mainly deals with the paradigmatic cases of intuitionistic modal logics $IK$ and $MK$, from which the generalisation to other non-classical logics arises naturally. The association between conditions on accessibility relations and modal axioms also carries over to this framework, so modal logics stronger than $K$ can be obtained by imposing requirements on the relations between frames. Just as Kripke models define a concept of ``alternative'' for classical models, the $n$-ary models (for $n &gt; 0$) define the same concept for any interpretation of the $(n-1)$-ary models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.18798v4</guid>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Victor Barroso-Nascimento</dc:creator>
    </item>
    <item>
      <title>When Engineering Outruns Intelligence: Rethinking Instruction-Guided Navigation</title>
      <link>https://arxiv.org/abs/2507.20021</link>
      <description>arXiv:2507.20021v3 Announce Type: replace 
Abstract: Recent ObjectNav systems credit large language models (LLMs) for sizable zero-shot gains, yet it remains unclear how much comes from language versus geometry. We revisit this question by re-evaluating an instruction-guided pipeline, InstructNav, under a detector-controlled setting and introducing two training-free variants that only alter the action value map: a geometry-only Frontier Proximity Explorer (FPE) and a lightweight Semantic-Heuristic Frontier (SHF) that polls the LLM with simple frontier votes. Across HM3D and MP3D, FPE matches or exceeds the detector-controlled instruction follower while using no API calls and running faster; SHF attains comparable accuracy with a smaller, localized language prior. These results suggest that carefully engineered frontier geometry accounts for much of the reported progress, and that language is most reliable as a light heuristic rather than an end-to-end planner. Code available at: https://github.com/matinaghaei/instructnav-scrutinized</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.20021v3</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Matin Aghaei, Lingfeng Zhang, Mohammad Ali Alomrani, Mahdi Biparva, Yingxue Zhang</dc:creator>
    </item>
    <item>
      <title>SGEMM-cube: Precision-Recovery FP32 GEMM Approximation on Ascend NPUs with FP16 Matrix Engines</title>
      <link>https://arxiv.org/abs/2507.23387</link>
      <description>arXiv:2507.23387v4 Announce Type: replace 
Abstract: Modern AI accelerators provide high-throughput low-precision matrix engines, but their support for FP32 GEMM is often limited or inefficient. This work presents SGEMM-cube, a precision-recovery FP32 GEMM approximation on Ascend NPUs using FP16 Cube units. Rather than claiming bit-exact FP32 approximation, SGEMM-cube targets near-FP32 accuracy for inputs whose magnitudes are representable within the FP16 dynamic range. The method follows a two-component FP32-to-FP16 splitting strategy related to Ozaki-style and Ootomo-style schemes: each FP32 operand is represented by an FP16 high component and a scaled FP16 residual component, and the matrix product is reconstructed from the dominant high-high and high-low terms while omitting the low-low term. The main contribution of this paper is not a new splitting paradigm, but an architecture-specific realization and analysis of this precision-recovery scheme on Ascend NPUs. We analyze the effects of round-to-nearest conversion, underflow, residual scaling, and accumulation order under the Ascend execution model, and clarify the range and accuracy limitations of the approach. We further adapt standard high-performance GEMM techniques, including L1-aware blocking and double-buffered pipelining, to the software-managed memory hierarchy of Ascend NPUs. Experiments on Ascend 910A show that SGEMM-cube recovers substantially higher accuracy than native FP16 GEMM and approaches FP32 SGEMM accuracy for moderate-range inputs, while achieving up to 65.3 TFLOP/s, corresponding to 77\% of the FP32-equivalent peak defined by the three-GEMM decomposition cost. These results demonstrate that FP32-accuracy GEMM approximation can be made practical on FP16-only NPU matrix engines, provided that its range, error, and implementation constraints are explicitly managed.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.23387v4</guid>
      <category>cs.DC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Weicheng Xue, Baisong Xu, Kai Yang, Yongxiang Liu, Dengdeng Fan, Pengxiang Xu, Yonghong Tian</dc:creator>
    </item>
    <item>
      <title>Adaptive Ensemble Aggregation for Actor-Critics</title>
      <link>https://arxiv.org/abs/2507.23501</link>
      <description>arXiv:2507.23501v2 Announce Type: replace 
Abstract: Ensembles are ubiquitous in off-policy actor-critic learning, yet their efficacy depends critically on how they are aggregated. Current methods typically rely on static rules or task-specific hyperparameters to balance overestimation bias and variance, leaving the challenge of a truly adaptive approach open. We introduce Adaptive Ensemble Aggregation (AEA), an algorithm that dynamically constructs ensemble-based targets for both critic and actor updates directly from training dynamics. We prove that AEA converges to a unique equilibrium where the aggregation parameter minimizes value estimation error within a defined stability region. Theoretically, we establish that AEA achieves a shrinkage property where the estimation bias vanishes as the total ensemble size grows. Unlike subset-based methods like REDQ, which hit an information bottleneck determined by a fixed variance floor regardless of the ensemble size, AEA exploits the full ensemble to achieve optimal variance reduction (scaling inversely with the total number of models) and maximal Fisher information. Furthermore, we provide a formal guarantee for monotonic policy improvement under this adaptive regime. Extensive evaluations on various continuous control tasks demonstrate that AEA outperforms state-of-the-art baselines on the majority of tasks, providing a robust and self-calibrating framework for ensemble-based reinforcement learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.23501v2</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Nicklas Werge, Yi-Shan Wu, Manuel Haussmann, Bahareh Tasdighi, Melih Kandemir</dc:creator>
    </item>
    <item>
      <title>Coward: Collision-based OOD Watermarking for Practical Proactive Federated Backdoor Detection</title>
      <link>https://arxiv.org/abs/2508.02115</link>
      <description>arXiv:2508.02115v4 Announce Type: replace 
Abstract: Backdoor detection is currently the mainstream defense against backdoor attacks in federated learning (FL), where a small number of malicious clients can upload poisoned updates to compromise the federated global model. Existing backdoor detection techniques fall into two categories, passive and proactive, depending on whether the server proactively intervenes in the training process. However, both of them have practical limitations: passive detection methods are disrupted by common non-i.i.d. data distributions and random participation of FL clients, whereas current proactive detection methods are misled by an inevitable out-of-distribution (OOD) bias because they rely on backdoor coexistence effects. To address these issues, we introduce a novel proactive detection method dubbed Coward, inspired by our discovery of multi-backdoor collision effects, in which consecutively planted, distinct backdoors significantly suppress earlier ones. Correspondingly, we modify the federated global model by injecting a carefully designed backdoor-collided watermark, implemented via regulated dual-mapping learning on OOD data. This design not only enables an inverted detection paradigm compared to existing proactive methods, thereby naturally counteracting the adverse impact of OOD prediction bias, but also introduces a low-disruptive training intervention that inherently limits the strength of OOD bias, leading to significantly fewer misjudgments. Extensive experiments on benchmark datasets show that Coward achieves state-of-the-art performance and effectively alleviates OOD bias.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.02115v4</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wenjie Li, Siying Gu, Yiming Li, Shuxin Li, Zhili Chen, Tianwei Zhang, Shu-Tao Xia</dc:creator>
    </item>
    <item>
      <title>ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments</title>
      <link>https://arxiv.org/abs/2508.04204</link>
      <description>arXiv:2508.04204v2 Announce Type: replace 
Abstract: Large Reasoning Models (LRMs) have demonstrated impressive performance in reasoning-intensive tasks, but they remain vulnerable to harmful content generation, particularly in the mid-to-late steps of their reasoning processes. Current defense methods, however, depend on costly fine-tuning and additional expert knowledge, which limits their scalability. In this work, we propose ReasoningGuard, an inference-time safeguard for LRMs. It injects timely safety aha moments during the reasoning process to guide the model towards harmless yet helpful reasoning. Our approach leverages the internal attention mechanisms of the LRM to accurately identify key points in the reasoning path, triggering safety-oriented reflections. To safeguard both the subsequent reasoning steps and the final answers, we implement a scaling sampling strategy during decoding to select the optimal reasoning path. With minimal additional inference cost, ReasoningGuard effectively mitigates four types of jailbreak attacks, including recent ones targeting the reasoning process of LRMs. Our approach outperforms nine existing safeguards, providing state-of-the-art defenses while avoiding the common problem of exaggerated safety.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.04204v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuquan Wang, Mi Zhang, Yining Wang, Geng Hong, Mi Wen, Xiaoyu You, Min Yang</dc:creator>
    </item>
    <item>
      <title>The domain-of-dependence stabilization for cut-cell meshes is fully discretely stable</title>
      <link>https://arxiv.org/abs/2508.05372</link>
      <description>arXiv:2508.05372v2 Announce Type: replace 
Abstract: We present a fully discrete stability analysis of the domain-of-dependence stabilization for hyperbolic problems. The method aims to address issues caused by small cut cells by redistributing mass around the neighborhood of a small cut cell at a semi-discrete level. Our analysis is conducted for the linear advection model problem in one spatial dimension. Using an operator norm estimate, we demonstrate that fully discrete stability can be achieved under a time step restriction that does not depend on the arbitrarily small cells. Additionally, this analysis offers a detailed understanding of the stability mechanism and highlights some challenges associated with higher-order polynomials. We also propose a way to mitigate these issues to derive a feasible CFL-like condition. The analytical findings, as well as the proposed solution, are verified numerically in one- and two-dimensional simulations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.05372v2</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.5802/smai-jcm.147</arxiv:DOI>
      <arxiv:journal_reference>The SMAI Journal of computational mathematics, Volume 12 (2026), pp. 187-218</arxiv:journal_reference>
      <dc:creator>Louis Petri, Gunnar Birke, Christian Engwer, Hendrik Ranocha</dc:creator>
    </item>
    <item>
      <title>PureSample: Neural Materials Learned by Sampling Microgeometry</title>
      <link>https://arxiv.org/abs/2508.07240</link>
      <description>arXiv:2508.07240v3 Announce Type: replace 
Abstract: Traditional physically-based material models rely on analytically derived bidirectional reflectance distribution functions (BRDFs), typically by considering statistics of micro-primitives such as facets, flakes, or spheres, sometimes combined with multi-bounce interactions such as layering and multiple scattering. These derivations are often complex and model-specific. Once an analytic BRDF evaluation is defined, one still needs to design an importance sampling method for it and evaluate the probability density function (pdf) of that sampling distribution, requiring further model-specific derivations. We present PureSample: a novel neural BRDF representation that allows learning a material's appearance purely by sampling forward random walks on the microgeometry, which is usually straightforward to implement. Our representation allows for efficient BRDF evaluation, importance sampling, and pdf evaluation, for homogeneous as well as spatially varying materials. We achieve this by two learnable components: first, the sampling distribution is modeled using a flow matching neural network, which allows both importance sampling and pdf evaluation; second, we introduce a view-dependent albedo term, captured by a lightweight neural network, which allows for converting a pdf value to a BRDF value for any pair of view and light directions. We demonstrate PureSample on challenging materials, including various microgeometries, multi-layered materials, and multiple-scattering microfacet materials.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.07240v3</guid>
      <category>cs.GR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1145/3799902.3811156</arxiv:DOI>
      <arxiv:journal_reference>ACM SIGGRAPH Conference Papers, 2026</arxiv:journal_reference>
      <dc:creator>Zixuan Li, Zixiong Wang, Jian Yang, Miloš Hašan, Beibei Wang</dc:creator>
    </item>
    <item>
      <title>Understanding Transformers through the Lens of Pavlovian Conditioning</title>
      <link>https://arxiv.org/abs/2508.08289</link>
      <description>arXiv:2508.08289v2 Announce Type: replace 
Abstract: Transformer architectures have revolutionized artificial intelligence (AI) through their attention mechanisms, yet the computational principles underlying their success remain opaque. We present a novel theoretical framework that reinterprets the core computation of attention as Pavlovian conditioning. Our model finds a direct mathematical analogue in linear attention, which simplifies the analysis of the underlying associative process. We demonstrate that attention's queries, keys, and values can be mapped to the three elements of classical conditioning: test stimuli that probe associations, conditional stimuli (CS) that serve as retrieval cues, and unconditional stimuli (US) that contain response information. Through this lens, we suggest that each attention operation constructs a transient associative memory via a Hebbian rule, where CS-US pairs form dynamic associations that test stimuli can later retrieve. Our framework yields several theoretical insights grounded in this linearized model: (1) a capacity theorem showing that attention heads can store $O(\sqrt{d_k})$ associations for worst-case, error-free retrieval, while average-case retrieval fidelity scales robustly as $O(d_k)$; (2) an error propagation analysis revealing fundamental architectural trade-offs of balancing model depth, width, and head redundancy to maintain reliability; and (3) an understanding of how biologically plausible learning rules could enhance transformer architectures. By establishing this deep connection, we suggest that the success of modern AI may stem not from architectural novelty alone, but from implementing computational principles analogous to those optimized by biology over millions of years of evolution.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.08289v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>q-bio.NC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mu Qiao</dc:creator>
    </item>
    <item>
      <title>UAV-VL-R1: Generalizing Vision-Language Models via Supervised Fine-Tuning and Multi-Stage GRPO for UAV Visual Reasoning</title>
      <link>https://arxiv.org/abs/2508.11196</link>
      <description>arXiv:2508.11196v2 Announce Type: replace 
Abstract: Recent advances in vision-language models (VLMs) have demonstrated strong generalization in natural image tasks. However, their performance often degrades on unmanned aerial vehicle (UAV)-based aerial imagery, which features high resolution, complex spatial semantics, and strict real-time constraints. These challenges limit the applicability of general-purpose VLMs to structured aerial reasoning tasks. To address these challenges, we propose UAV-VL-R1, a lightweight VLM explicitly designed for aerial visual reasoning. It is trained using a hybrid method that combines supervised fine-tuning (SFT) and multi-stage reinforcement learning (RL). We leverage the group relative policy optimization (GRPO) algorithm to promote structured and interpretable reasoning through rule-guided rewards and intra-group policy alignment. To support model training and evaluation, we introduce a high-resolution visual question answering dataset named HRVQA-VL, which consists of 50,019 annotated samples covering eight UAV-relevant reasoning tasks, including object counting, transportation recognition, and spatial scene inference. Experimental results show that UAV-VL-R1 achieves a 48.17% higher zero-shot accuracy than the Qwen2-VL-2B-Instruct baseline and even outperforms its 72B-scale variant, which is 36x larger, on multiple tasks. Ablation studies reveal that while SFT improves semantic alignment, it may reduce reasoning diversity in mathematical tasks. GRPO-based RL compensates for this limitation by enhancing logical flexibility and the robustness of inference. Additionally, UAV-VL-R1 requires only 3.9GB of memory under FP16 inference and can be quantized to 2.5GB with INT8, supporting real-time deployment on resource-constrained UAV platforms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.11196v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1007/s11704-026-52082-z</arxiv:DOI>
      <dc:creator>Jiajin Guan (Research Institute of Electronic Science and Technology, University of Electronic Science and Technology of China, Chengdu, China), Haibo Mei (School of Aeronautics and Astronautics, University of Electronic Science and Technology of China, Chengdu, China), Bonan Zhang (Research Institute of Electronic Science and Technology, University of Electronic Science and Technology of China, Chengdu, China), Dan Liu (Research Institute of Electronic Science and Technology, University of Electronic Science and Technology of China, Chengdu, China), Yuanshuang Fu (Research Institute of Electronic Science and Technology, University of Electronic Science and Technology of China, Chengdu, China), Yue Zhang (School of Aeronautics and Astronautics, University of Electronic Science and Technology of China, Chengdu, China)</dc:creator>
    </item>
    <item>
      <title>Investigating Advanced Reasoning of Large Language Models via Black-Box Environment Interaction</title>
      <link>https://arxiv.org/abs/2508.19035</link>
      <description>arXiv:2508.19035v2 Announce Type: replace 
Abstract: Existing tasks fall short in evaluating the reasoning ability of Large Language Models (LLMs) in an interactive, unknown environment. This deficiency leads to isolated assessment of deductive, inductive, and abductive reasoning, neglecting the integrated reasoning process that is indispensable for human-like discovery learning. We introduce a novel evaluation paradigm, black-box environment interaction, to tackle this challenge. A black-box environment is defined by a hidden function that maps a specific set of inputs to outputs. LLMs must uncover the hidden function behind the black-box environment by interacting with it within a given number of exploration turns and reasoning over the observed input-output pairs. Leveraging this idea, we build the Oracle benchmark, which comprises 6 types of black-box tasks with 96 black-box environments, and benchmark 19 modern LLMs. o3, a leading LLM from OpenAI, ranks first in 5 of the 6 tasks, achieving over 70% accuracy on most easy black-box environments. However, it still struggles with some hard black-box tasks, where the average performance drops below 40%. Further analysis reveals a universal difficulty among LLMs: they lack the high-level planning capability to develop efficient and adaptive exploration strategies for hypothesis refinement. Code is available at https://github.com/lemonsis/Oracle_Benchmark.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.19035v2</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Congchi Yin, Tianyi Wu, Yankai Shu, Alex Gu, Yunhan Wang, Jun Shao, Xun Jiang, Piji Li</dc:creator>
    </item>
    <item>
      <title>Scalable Object Detection in the Car Interior With Vision Foundation Models</title>
      <link>https://arxiv.org/abs/2508.19651</link>
      <description>arXiv:2508.19651v2 Announce Type: replace 
Abstract: AI tasks in the car interior, such as identifying and localizing externally introduced objects, are crucial for the response quality of personal assistants. However, the computational resources of on-board systems remain highly constrained, restricting the deployment of such solutions directly within the vehicle. To address this limitation, we propose the novel Object Detection and Localization (ODAL) framework for interior scene understanding. Our approach leverages vision foundation models through a distributed architecture, splitting computational tasks between the on-board system and the cloud. This design overcomes the resource constraints of running foundation models directly in the car. To benchmark model performance, we introduce ODALbench, a new metric for comprehensive assessment of detection and localization. Our analysis demonstrates the framework's potential to establish new standards in this domain. We compare the state-of-the-art GPT-4o vision foundation model with the lightweight LLaVA 1.5 7B model and explore how fine-tuning enhances the lightweight model's performance. Remarkably, our fine-tuned ODAL-LLaVA model achieves an ODAL$_{score}$ of 89%, representing a 71% improvement over its baseline performance and outperforming GPT-4o by nearly 20%. Furthermore, the fine-tuned model maintains high detection accuracy while significantly reducing hallucinations, achieving an ODAL$_{SNR}$ three times higher than GPT-4o.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.19651v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bálint Mészáros, Ahmet Firintepe, Sebastian Schmidt, Stephan Günnemann</dc:creator>
    </item>
    <item>
      <title>When Darwin met Ianus: dichotomies of expressivity</title>
      <link>https://arxiv.org/abs/2509.04347</link>
      <description>arXiv:2509.04347v4 Announce Type: replace 
Abstract: The classifications of temporal and phylogeny constraint languages stand among the most seminal complexity classifications within infinite-domain Constraint Satisfaction Problems (CSPs), yet remain the most mysterious in terms of algorithms and algebraic invariants for the tractable cases. We show that those languages which do not pp-construct EVERYTHING (and thus by the classifications are solvable in polynomial time) have, in fact, very limited expressive power as measured by the graphs and hypergraphs they can pp-interpret. This limitation yields many previously unknown algebraic consequences, while also providing new, uniform proofs for known invariance properties. In particular, we show that such temporal and phylogeny constraint languages admit $4$-ary pseudo-Siggers polymorphisms -- a result that sustains the possibility that the existence of such polymorphisms extends to the much broader context of the Bodirsky-Pinsker conjecture. Although temporal and phylogeny constraint languages appear to follow fundamentally different algorithmic principles, our proofs reveal a common core and proceed along strikingly similar lines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.04347v4</guid>
      <category>cs.LO</category>
      <category>math.LO</category>
      <category>math.RA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Johanna Brunar, Michael Pinsker, Moritz Schöbi</dc:creator>
    </item>
    <item>
      <title>Discovering New Theorems via LLMs with In-Context Proof Learning in Lean</title>
      <link>https://arxiv.org/abs/2509.14274</link>
      <description>arXiv:2509.14274v2 Announce Type: replace 
Abstract: Large Language Models (LLMs) have demonstrated significant promise in formal theorem proving. In this study, we investigate the ability of LLMs to discover novel theorems and produce verified proofs. We propose a pipeline called Conjecturing-Proving Loop (CPL), which iteratively generates mathematical conjectures and attempts to prove them in Lean 4. A key feature of CPL is that each iteration conditions the LLM on previously generated theorems and their formal proofs, enabling parameter-free improvement of proof strategies via in-context learning. We provide both theoretical and experimental evidence that CPL increases the discovery rate of hard-to-prove theorems compared to frameworks that generate statements and proofs simultaneously. Moreover, our experiments show that reusing the LLM's own formally verified outputs as context consistently improves subsequent proof success, demonstrating the effectiveness of self-generated in-context learning for neural theorem proving. The source code is available at https://github.com/auto-res/ConjecturingProvingLoop.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.14274v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kazumi Kasaura, Naoto Onda, Yuta Oriike, Masaya Taniguchi, Akiyoshi Sannai, Sho Sonoda</dc:creator>
    </item>
    <item>
      <title>DyWPE: Signal-Aware Dynamic Wavelet Positional Encoding for Time Series Transformers</title>
      <link>https://arxiv.org/abs/2509.14640</link>
      <description>arXiv:2509.14640v2 Announce Type: replace 
Abstract: Existing positional encoding methods in transformers are fundamentally signal-agnostic, deriving positional information solely from sequence indices while ignoring the underlying signal characteristics. This limitation is particularly problematic for time series analysis, where signals exhibit complex, non-stationary dynamics across multiple temporal scales. We introduce Dynamic Wavelet Positional Encoding (DyWPE), a novel signal-aware framework that generates positional embeddings directly from input time series using the Discrete Wavelet Transform (DWT). Comprehensive experiments on ten diverse time series datasets demonstrate that DyWPE consistently outperforms state-of-the-art positional encoding methods, with particularly significant improvements on longer sequences and complex biomedical signals.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.14640v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Habib Irani, Vangelis Metsis</dc:creator>
    </item>
    <item>
      <title>Scalable Multi Agent Diffusion Policies for Coverage Control</title>
      <link>https://arxiv.org/abs/2509.17244</link>
      <description>arXiv:2509.17244v2 Announce Type: replace 
Abstract: We propose MADP, a novel diffusion-model-based approach for collaboration in decentralized robot swarms. MADP leverages diffusion models to generate samples from complex and high-dimensional action distributions that capture the interdependencies between agents' actions. Each robot conditions policy sampling on a fused representation of its own observations and perceptual embeddings received from peers. To evaluate this approach, we task a team of holonomic robots piloted by MADP with coverage control, a canonical multi-agent navigation problem. The policy is trained via imitation learning from a clairvoyant expert on the coverage control problem, with the diffusion process parameterized by a spatial transformer architecture to enable decentralized inference. We evaluate the system under varying numbers, locations, and variances of importance density functions, capturing the robustness demands of real-world coverage tasks. Experiments demonstrate that our model inherits valuable properties from diffusion models: it generalizes across agent densities and environments and consistently outperforms state-of-the-art baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.17244v2</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Frederic Vatnsdal, Romina Garcia Camargo, Saurav Agarwal, Alejandro Ribeiro</dc:creator>
    </item>
    <item>
      <title>Perceive, Verify and Understand Long Video: Multi-Granular Perception and Active Verification via Interactive Agents</title>
      <link>https://arxiv.org/abs/2509.24943</link>
      <description>arXiv:2509.24943v2 Announce Type: replace 
Abstract: Long videos, characterized by temporal complexity and sparse task-relevant information, pose significant reasoning challenges for AI systems. Although existing Large Language Model (LLM)-based approaches have advanced long video understanding, they remain bottlenecked by task-agnostic, fixed-granularity perception pipelines and suffer from vision-language hallucinations. Inspired by human adaptive perception and active verification, we propose CogniGPT, a framework leveraging an interactive loop between a Multi-Granular Perception Agent (MPA) and an Active Verification Agent (AVA). Specifically, instead of predetermined heuristics, MPA adaptively determines the optimal perception granularity and strategy based on the evolving context, while AVA actively mines multi-perspective visual evidence to cross-verify key observations and eliminate hallucinations. This interaction allows CogniGPT to efficiently identify a minimal set of reliable task-related clues. Extensive experiments on EgoSchema, Video-MME, NExT-QA, and MovieChat demonstrate its superiority in accuracy and efficiency. Notably, on EgoSchema, it surpasses existing training-free methods using only 11.2 frames and achieves performance comparable to Gemini 1.5-Pro.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.24943v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiahua Li, Zhanhe Zhang, Chenghao Xu, Zhe Xu, Kun Wei, Xu Yang, Cheng Deng</dc:creator>
    </item>
    <item>
      <title>GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference</title>
      <link>https://arxiv.org/abs/2509.25041</link>
      <description>arXiv:2509.25041v4 Announce Type: replace 
Abstract: Sparse Mixture of Experts (SMoE) enables scalable parameter growth in large language models (LLMs) by selectively activating a subset of experts, and its large parameter count necessitates distributed deployment for inference. However, distributed inference faces a critical dilemma: although communication overhead constitutes the primary bottleneck, reducing it often exacerbates computational load imbalance, leading to resource waste. In this paper, we present GRACE-MoE, which stands for Grouping and Replication with Locality-Aware Routing for SMoE inference. GRACE-MoE is a lossless co-optimization framework that integrates expert grouping to reduce communication and dynamic replication to correct load skew, together with locality-aware routing to resolve replica selection. To underpin this coordinated optimization in multi-node settings, GRACE-MoE adopts a hierarchical sparse communication design that reduces cross-node traffic while implicitly aligning execution across nodes, thereby mitigating synchronization overhead. Experiments on diverse models and multi-node, multi-GPU environments demonstrate that GRACE-MoE efficiently reduces end-to-end inference latency, achieving up to 4.66x speedup over existing systems, and the code will be released upon acceptance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.25041v4</guid>
      <category>cs.DC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yu Han, Lehan Pan, Jie Peng, Ziyang Tao, Hanqi Zhu, Wuyang Zhang, Yanyong Zhang</dc:creator>
    </item>
    <item>
      <title>Feature Identification via the Empirical NTK</title>
      <link>https://arxiv.org/abs/2510.00468</link>
      <description>arXiv:2510.00468v4 Announce Type: replace 
Abstract: We provide evidence that eigenanalysis of the empirical neural tangent kernel (eNTK) can surface feature directions in trained neural networks. Across three increasingly realistic settings -- a 1-layer MLP trained on modular addition, a 1-layer Transformer trained on modular addition and the pretrained language model Gemma-3-270M -- we show that top eigenspaces of the eNTK align with ground-truth or interpretable features. In the modular arithmetic examples, top eNTK eigenspaces align with the Fourier features used by the MLP and the Fourier features at seed-dependent frequencies used by the Transformer to implement known ground-truth algorithms. Moreover, the alignment of the relevant subspaces evolves over training, with its first derivative peaking near the onset of grokking. For Gemma-3-270M, we compute top eNTK eigendirections on a dataset of TinyStories context windows and check their alignment with an automatically-generated set of parts-of-speech and other grammatical feature directions. We find that the alignment of eNTK eigendirections with grammar features outperforms a same-budget baseline of PCA on model activations. These results suggest that eNTK eigenanalysis may provide a new handle towards identifying features in trained models for mechanistic interpretability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.00468v4</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jennifer Lin</dc:creator>
    </item>
    <item>
      <title>FideDiff: Efficient Diffusion Model for High-Fidelity Image Motion Deblurring</title>
      <link>https://arxiv.org/abs/2510.01641</link>
      <description>arXiv:2510.01641v3 Announce Type: replace 
Abstract: Image motion deblurring has made significant progress, driven by CNNs and transformers. Large-scale pre-trained diffusion models, which are rich in real-world modeling, have shown great promise for high-quality image restoration tasks such as deblurring, demonstrating stronger generative capabilities than CNN and transformer-based methods. However, challenges such as prohibitive inference time and compromised fidelity still limit the full potential of diffusion models. To address this, we introduce FideDiff, a novel single-step diffusion model designed for high-fidelity deblurring. We reformulate motion deblurring as a diffusion-like process where each timestep represents a progressively blurred image, and we train a consistency model that aligns all timesteps to the same clean image. By reconstructing training data with matched blur trajectories, the model learns temporal consistency, enabling accurate one-step deblurring. We further enhance model performance by integrating Kernel ControlNet for blur kernel estimation and introducing adaptive timestep prediction. Our model achieves superior performance on full-reference metrics, surpassing previous diffusion-based methods and matching the performance of other state-of-the-art models. FideDiff offers a new direction for applying pre-trained diffusion models to high-fidelity image restoration tasks, establishing a robust baseline for further advancing diffusion models in real-world industrial applications. Our dataset and code will be available at https://github.com/xyLiu339/FideDiff.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.01641v3</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiaoyang Liu, Zhengyan Zhou, Zihang Xu, Jiezhang Cao, Zheng Chen, Yulun Zhang</dc:creator>
    </item>
    <item>
      <title>Forensic Similarity for Speech Deepfakes</title>
      <link>https://arxiv.org/abs/2510.02864</link>
      <description>arXiv:2510.02864v2 Announce Type: replace 
Abstract: In this paper, we introduce the concept of forensic similarity in the speech deepfake detection domain, which aims to determine whether two audio segments share the same underlying forensic traces. Our approach is inspired by prior work in the image domain. To transfer this idea to the audio domain, we propose a two-stage deep learning framework consisting of a Siamese-based feature extractor and a core decision module, referred to as the similarity network. The system's goal is to assess whether two speech samples originate from the same source by comparing their forensic characteristics. In practice, the model maps pairs of audio segments to a similarity score indicating whether they contain identical or different forensic traces. We evaluate the proposed method on the emerging task of source verification, demonstrating its ability to determine whether two speech samples were generated by the same model. In addition, we explore its applicability to audio splicing detection as a complementary use case. Experimental results show that the proposed approach generalizes well to previously unseen forensic traces, highlighting its robustness, flexibility, and practical relevance for digital audio forensics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.02864v2</guid>
      <category>cs.SD</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Viola Negroni, Davide Salvi, Daniele Ugo Leonzio, Paolo Bestagini, Stefano Tubaro</dc:creator>
    </item>
    <item>
      <title>Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency</title>
      <link>https://arxiv.org/abs/2510.08431</link>
      <description>arXiv:2510.08431v3 Announce Type: replace 
Abstract: Although continuous-time consistency models (e.g., sCM, MeanFlow) are theoretically principled and empirically powerful for fast academic-scale diffusion, their applicability to large-scale text-to-image and video tasks remains unclear due to infrastructure challenges in Jacobian-vector product (JVP) computation and the limitations of evaluation benchmarks like FID. This work represents the first effort to scale up continuous-time consistency to general application-level image and video diffusion models, and to make JVP-based distillation effective at large scale. We first develop a parallelism-compatible FlashAttention-2 JVP kernel, enabling sCM training on models with over 10 billion parameters and high-dimensional video tasks. Our investigation reveals fundamental quality limitations of sCM in fine-detail generation, which we attribute to error accumulation and the "mode-covering" nature of its forward-divergence objective. To remedy this, we propose the score-regularized continuous-time consistency model (rCM), which incorporates score distillation as a long-skip regularizer. This integration complements sCM with the "mode-seeking" reverse divergence, effectively improving visual quality while maintaining high generation diversity. Validated on large-scale models (Cosmos-Predict2, Wan2.1) up to 14B parameters and 5-second videos, rCM generally matches the state-of-the-art distillation method DMD2 on quality metrics while mitigating mode collapse and offering notable advantages in diversity, all without GAN tuning or extensive hyperparameter searches. The distilled models generate high-fidelity samples in only $1\sim4$ steps, accelerating diffusion sampling by $15\times\sim50\times$. These results position rCM as a practical and theoretically grounded framework for advancing large-scale diffusion distillation. Code is available at https://github.com/NVlabs/rcm.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.08431v3</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kaiwen Zheng, Yuji Wang, Qianli Ma, Huayu Chen, Jintao Zhang, Yogesh Balaji, Jianfei Chen, Ming-Yu Liu, Jun Zhu, Qinsheng Zhang</dc:creator>
    </item>
    <item>
      <title>LTGS: Long-Term Gaussian Scene Chronology From Sparse View Updates</title>
      <link>https://arxiv.org/abs/2510.09881</link>
      <description>arXiv:2510.09881v3 Announce Type: replace 
Abstract: Recent advances in novel-view synthesis can create photo-realistic visualizations of real-world environments from conventional camera captures. However, everyday environments undergo frequent scene changes, which require dense observations, both spatially and temporally, that an ordinary setup cannot cover. We propose long-term Gaussian scene chronology from sparse-view updates, coined LTGS, an efficient scene representation that can embrace everyday changes from highly under-constrained casual captures. Given an incomplete and unstructured 3D Gaussian Splatting (3DGS) representation obtained from an initial set of input images, we robustly model the long-term chronology of the scene despite abrupt movements and subtle environmental variations. We construct objects as template Gaussians, which serve as structural, reusable priors for shared object tracks. Then, the object templates undergo a further refinement pipeline that modulates the priors to adapt to temporally varying environments given few-shot observations. Once trained, our framework is generalizable across multiple time steps through simple transformations, significantly enhancing the scalability for the temporal evolution of 3D environments. As existing datasets do not explicitly represent long-term real-world changes with a sparse capture setup, we collect real-world datasets to evaluate the practicality of our pipeline. Experiments demonstrate that our framework achieves superior reconstruction quality compared to other baselines while enabling fast and lightweight updates. The project page is available at: https://mkjjang3598.github.io/LTGS.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.09881v3</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Minkwan Kim, Seungmin Lee, Junho Kim, Young Min Kim</dc:creator>
    </item>
    <item>
      <title>Diffusion-Inspired Masked Fine-Tuning for Knowledge Injection in Autoregressive LLMs</title>
      <link>https://arxiv.org/abs/2510.09885</link>
      <description>arXiv:2510.09885v5 Announce Type: replace 
Abstract: Large language models (LLMs) are often used in environments where facts evolve, yet factual knowledge updates via fine-tuning on unstructured text often suffer from 1) reliance on compute-heavy paraphrasing augmentation and 2) the reversal curse. Recent studies show diffusion large language models (dLLMs) require fewer training samples to achieve lower loss in pre-training and are more resistant to the reversal curse, suggesting dLLMs may learn new knowledge more easily than autoregressive LLMs (arLLMs). We test this hypothesis in controlled knowledge fine-tuning experiments and find that while arLLMs rely on paraphrase augmentation to generalize knowledge text into question-answering (QA) capability, dLLMs do not require paraphrases to achieve high QA accuracy. To further investigate whether the demasking objective alone can induce such a knowledge injection advantage in dLLMs regardless of their diffusion denoising paradigm, we propose masked fine-tuning for arLLMs, which prompts an arLLM to reconstruct the original text given a masked version in context. Masked fine-tuning substantially improves the efficacy of knowledge injection in arLLMs, requiring no paraphrases and resisting the reversal curse, thereby closing the gap between arLLMs and dLLMs. We also demonstrate broader applicability: on a large-scale knowledge-intensive dataset (1.2M samples), masked SFT achieves the best downstream accuracy on GPQA-diamond among all fine-tuning variants. The demasking objective also improves SFT on math tasks, suggesting broad utility beyond factual knowledge injection.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.09885v5</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Xu Pan, Ely Hahami, Jingxuan Fan, Ziqian Xie, Haim Sompolinsky</dc:creator>
    </item>
    <item>
      <title>Soft-Decoding Reverse Reconciliation in Discrete-Modulation CV-QKD</title>
      <link>https://arxiv.org/abs/2510.10674</link>
      <description>arXiv:2510.10674v2 Announce Type: replace 
Abstract: In continuous-variable quantum key distribution, information reconciliation is required to extract a shared secret key from correlated random variables obtained through the quantum channel. Reverse reconciliation (RR) is generally preferred, since the eavesdropper has less information about Bob's measurements than about Alice's transmitted symbols. When discrete modulation formats are employed, however, soft information is available only at Bob's side, while Alice has access only to hard information (her transmitted sequence). This forces her to rely on hard-decision decoding to recover Bob's key.
  In this work, we introduce a novel RR technique for PAM (and QAM) in which Bob discloses a carefully designed soft metric to help Alice recover Bob's key, while leaking no additional information about the key to an eavesdropper. We assess the performance of the proposed technique in terms of achievable secret key rate (SKR) and its bounds, showing that the achievable SKR closely approaches the upper bound, with a significant gain over hard-decision RR. Finally, we implement the scheme at the coded level using binary LDPC codes with belief-propagation decoding, assess its bit-error rate through numerical simulations, compare the observed gain with theoretical predictions from the achievable SKR, and discuss the residual gap.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.10674v2</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Marco Origlia, Emanuele Parente, Marco Secondini</dc:creator>
    </item>
    <item>
      <title>Efficient Model-Based Reinforcement Learning for Robot Control via Online Optimization</title>
      <link>https://arxiv.org/abs/2510.18518</link>
      <description>arXiv:2510.18518v2 Announce Type: replace 
Abstract: We present an online model-based reinforcement learning algorithm suitable for controlling complex robotic systems directly in the real world. Unlike prevailing sim-to-real pipelines that rely on extensive offline simulation and model-free policy optimization, our method builds a dynamics model from real-time interaction data and performs policy updates guided by the learned dynamics model. This model-based scheme significantly reduces the number of samples needed to train control policies, enabling direct training on real-world rollout data. Training on real-world data in turn reduces the influence of bias in simulated data and facilitates the search for high-performance control policies. We adopt online optimization analysis to derive sublinear regret bounds under stochastic online optimization assumptions, providing formal guarantees on performance improvement as more interaction data are collected. Experimental evaluations on a hydraulic excavator arm and a soft robot arm show that the algorithm achieves strong sample efficiency compared to model-free reinforcement learning methods, reaching comparable performance within hours. Robust adaptation to shifting dynamics was also observed when the payload condition was randomized. Our approach paves the way toward efficient and reliable on-robot learning for a broad class of challenging control tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.18518v2</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Fang Nan, Hao Ma, Qinghua Guan, Josie Hughes, Michael Muehlebach, Marco Hutter</dc:creator>
    </item>
    <item>
      <title>What Can Be Recovered Under Sparse Adversarial Corruption? Assumption-Free Theory for Linear Measurements</title>
      <link>https://arxiv.org/abs/2510.24215</link>
      <description>arXiv:2510.24215v4 Announce Type: replace 
Abstract: Recovery from linear measurements under sparse adversarial corruption is typically formulated as an exact-recovery problem: one seeks structural conditions on $A$ (e.g., the restricted isometry property) that guarantee unique recovery of $x^\star$ from $y = A x^\star + e$ with $\left\lVert e \right\rVert_0 \leq q$. However, in practice, these conditions are rarely met and are hard to verify, and so the existing guarantees provide no guidance once exact recovery fails. This limitation obscures even simple robustness phenomena -- for instance, repeated rows in $A$ can preserve nontrivial information about $x^\star$ under sparse corruption.
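  A minimal illustration of the repeated-rows remark above, under assumed toy values (the names robust_row_estimate, a_dot_x, and q below are hypothetical): if one row of A appears 2q + 1 times, at most q of those copies can be corrupted, so at least q + 1 copies still equal the clean measurement and their median recovers it. The median is a swapped-in decoder chosen for simplicity, not the l0-minimization approach described in this abstract.

```python
import statistics


def robust_row_estimate(repeated_measurements):
    """Median of the 2q + 1 copies of one repeated measurement.

    Hypothetical helper: with at most q corrupted copies, at least q + 1
    copies hold the clean value, so the middle element after sorting is clean.
    """
    return statistics.median(repeated_measurements)


a_dot_x = 3.0                    # assumed clean value of the repeated row times x*
q = 1                            # assumed corruption budget: at most q nonzero entries of e
copies = [a_dot_x] * (2 * q + 1)  # the same row measured 2q + 1 times
copies[0] += 100.0               # the adversary spends its budget on one copy
print(robust_row_estimate(copies))  # 3.0
```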
  In this paper, we address the more general question: for arbitrary $A \in \mathbb{R}^{m \times n}$, what information about $x^\star$ remains robust in $y$ despite any $q$-sparse adversarial corruption $e$? We show that the robust information is precisely $x^\star + \ker(U)$, where $U$ is the orthogonal projection onto the intersection of rowspaces of all submatrices of $A$ obtained by deleting $2q$ rows. This characterization clarifies, for each sparsity level $q$, how the row structure of $A$ determines whether a $q$-sparse $e$ allows exact, partial, or only trivial recovery, thereby extending the standard exact-recovery framework. We further prove that every $x$ that minimizes $\left\lVert y - A x \right\rVert_0$ belongs to $x^\star + \ker(U)$, yielding a constructive approach to recover this set. For i.i.d. Gaussian $A$, we show a sharp phase transition: depending on $m$, $n$, and $q$, either exact recovery holds or no nontrivial recovery is possible. We sketch two applications: robust network tomography and signal reconstruction from oversampled DCT measurements.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.24215v4</guid>
      <category>cs.IT</category>
      <category>cs.LG</category>
      <category>eess.SP</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Vishal Halder, Alexandre Reiffers-Masson, Abdeldjalil A\"issa-El-Bey, Gugan Thoppe</dc:creator>
    </item>
    <item>
      <title>ProMediate: A Socio-cognitive framework for evaluating proactive agents in multi-party negotiation</title>
      <link>https://arxiv.org/abs/2510.25224</link>
      <description>arXiv:2510.25224v3 Announce Type: replace 
Abstract: While Large Language Models (LLMs) are increasingly used in agentic frameworks to assist individual users, there is a growing need for agents that can proactively manage complex, multi-party collaboration. Systematic evaluation methods for such proactive agents remain scarce, limiting progress in developing AI that can effectively support multiple people together. Negotiation offers a demanding testbed for this challenge, requiring socio-cognitive intelligence to navigate conflicting interests between multiple participants and multiple topics and to build consensus. Here, we present ProMediate, the first framework for evaluating proactive AI mediator agents in complex, multi-topic, multi-party negotiations. ProMediate consists of two core components: (i) a simulation testbed based on realistic negotiation cases and theory-driven difficulty levels (ProMediate-Easy, ProMediate-Medium, and ProMediate-Hard), with a plug-and-play proactive AI mediator grounded in socio-cognitive mediation theories, capable of flexibly deciding when and how to intervene; and (ii) a socio-cognitive evaluation framework with a new suite of metrics to measure consensus changes, intervention latency, mediator effectiveness, and intelligence. Together, these components establish a systematic framework for assessing the socio-cognitive intelligence of proactive AI agents in multi-party settings. Our results show that a socially intelligent mediator agent outperforms a generic baseline via faster, better-targeted interventions. In the ProMediate-Hard setting, our social mediator increases consensus change by 3.6 percentage points compared to the generic baseline (10.65\% vs. 7.01\%) while responding 77\% faster (3.71s vs. 15.98s). In conclusion, ProMediate provides a rigorous, theory-grounded testbed to advance the development of proactive, socially intelligent agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.25224v3</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ziyi Liu, Bahar Sarrafzadeh, Pei Zhou, Longqi Yang, Jieyu Zhao, Ashish Sharma</dc:creator>
    </item>
    <item>
      <title>Online Continual Learning on Intel Loihi 2 via a Co-designed Spiking Neural Network</title>
      <link>https://arxiv.org/abs/2511.01553</link>
      <description>arXiv:2511.01553v2 Announce Type: replace 
Abstract: AI systems on edge devices require online continual learning (adapting to non-stationary streams and unfamiliar classes without catastrophic forgetting) under strict power constraints. We present CLP-SNN, a spiking neural network with a self-normalizing local learning rule and a spike-driven neural state machine for autonomous on-chip learning, implemented on Intel's Loihi 2 neuromorphic processor. On OpenLORIS few-shot experiments, CLP-SNN matches the accuracy of replay-based methods without rehearsal. On Loihi 2, CLP-SNN achieves 113x lower latency (0.33 ms vs. 37.3 ms) and 6,600x lower energy (0.05 mJ vs. 333 mJ) than the strongest edge-GPU baseline. This gain decomposes into algorithmic efficiency (~14.5x latency, ~22.6x energy on the same GPU) and neuromorphic hardware co-design (~7.8x latency, ~295x energy) exploiting event-driven learning and sparse graded-spike communication. We show that co-designed brain-inspired algorithms and neuromorphic hardware can break traditional accuracy-efficiency trade-offs in edge AI.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.01553v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.DC</category>
      <category>cs.NE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Elvin Hajizada, Danielle Rager, Timothy Shea, Leobardo Campos-Macias, Andreas Wild, Eyke H\"ullermeier, Yulia Sandamirskaya, Mike Davies</dc:creator>
    </item>
    <item>
      <title>ContextPilot: Fast Long-Context Inference via Context Reuse</title>
      <link>https://arxiv.org/abs/2511.03475</link>
      <description>arXiv:2511.03475v4 Announce Type: replace 
Abstract: AI applications increasingly depend on long-context inference, where LLMs consume substantial context to support stronger reasoning. Common examples include retrieval-augmented generation, agent memory layers, and multi-agent orchestration. As input contexts get longer, prefill latency becomes the main bottleneck. Yet today's prefill acceleration techniques face a trade-off: they either preserve reasoning quality but deliver little KV-cache reuse, or improve reuse at the cost of degraded reasoning quality.
  We present ContextPilot, a system that accelerates prefill by introducing context reuse as a new mechanism for faster long-context inference. ContextPilot introduces a context index to identify overlapping context blocks across LLM interactions (e.g., across users and turns). It further proposes context ordering and de-duplication techniques to maximize KV-cache reuse. To preserve reasoning quality under reuse, it introduces succinct context annotations that prevent quality degradation. Finally, ContextPilot is built around a modular architecture with a clean interface that integrates with existing inference engines. Extensive evaluation shows that ContextPilot reduces LLM prefill latency by up to $3\times{}$ compared to state-of-the-art methods while preserving reasoning quality. At longer context lengths, it can even improve reasoning quality. ContextPilot is open-sourced at: https://github.com/EfficientContext/ContextPilot.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.03475v4</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Yinsicheng Jiang, Yeqi Huang, Liang Cheng, Cheng Deng, Xuan Sun, Luo Mai</dc:creator>
    </item>
    <item>
      <title>Making Knowledge Accessible: Divergent Readability-Accuracy Strategies of Mistral and QWen in Biomedical Text Simplification</title>
      <link>https://arxiv.org/abs/2511.05080</link>
      <description>arXiv:2511.05080v4 Announce Type: replace 
Abstract: The growing public demand for accessible biomedical information calls for scalable text simplification. While large language models (LLMs) offer solutions, they too struggle to balance improved readability against preservation of meaning. This report empirically compares how two LLMs (instruction-tuned Mistral-Small 3 24B and reasoning-augmented QWen2.5 32B) navigate this trade-off in biomedical text simplification, benchmarked against human performance. Our analysis highlights how each model applies distinct operational strategies when simplifying biomedical text. Mistral exhibits a tempered lexical simplification approach that consistently enhances readability across multiple metrics while preserving discourse fidelity (BERTScore: 0.91, statistically comparable to that of humans). In comparison, QWen also attains enhanced readability and a reasonable BERTScore of 0.89, but shows a disconnect between readability and accuracy. Additionally, a comprehensive correlation analysis of a suite of 21 metrics confirms strong functional redundancies among the metrics and informs adaptation requirements.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.05080v4</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>P. Bilha Githinji, Aikaterini Melliou, Zeming Liang, Lian Zhang, Peiwu Qin</dc:creator>
    </item>
    <item>
      <title>Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning</title>
      <link>https://arxiv.org/abs/2511.06371</link>
      <description>arXiv:2511.06371v3 Announce Type: replace 
Abstract: Humanoid robots hold promise for learning a diverse set of human-like locomotion behaviors, including standing up, walking, running, and jumping. However, existing methods predominantly require training independent policies for each skill, yielding behavior-specific controllers that exhibit limited generalization and brittle performance when deployed on irregular terrains and in diverse situations. To address this challenge, we propose Adaptive Humanoid Control (AHC), which adopts a two-stage framework to learn an adaptive humanoid locomotion controller across different skills and terrains. Specifically, we first train several primary locomotion policies and perform a multi-behavior distillation process to obtain a basic multi-behavior controller, facilitating adaptive behavior switching based on the environment. Then, we perform reinforced fine-tuning by collecting online feedback while performing adaptive behaviors on more diverse terrains, enhancing the controller's terrain adaptability. We conduct experiments both in simulation and in the real world on Unitree G1 robots. The results show that our method exhibits strong adaptability across various situations and terrains. Project website: https://ahc-humanoid.github.io.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.06371v3</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yingnan Zhao, Xinmiao Wang, Dewei Wang, Xinzhe Liu, Dan Lu, Qilong Han, Peng Liu, Chenjia Bai</dc:creator>
    </item>
    <item>
      <title>MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains</title>
      <link>https://arxiv.org/abs/2511.06452</link>
      <description>arXiv:2511.06452v3 Announce Type: replace 
Abstract: Although multimodal fusion has made significant progress, its advancement is severely hindered by the lack of adequate evaluation benchmarks. Current fusion methods are typically evaluated on a small selection of public datasets, a limited scope that inadequately represents the complexity and diversity of real-world scenarios, potentially leading to biased evaluations. This issue presents a twofold challenge. On one hand, models may overfit to the biases of specific datasets, hindering their generalization to broader practical applications. On the other hand, the absence of a unified evaluation standard makes fair and objective comparisons between different fusion methods difficult. Consequently, a truly universal and high-performance fusion model has yet to emerge. To address these challenges, we have developed a large-scale, domain-adaptive benchmark for multimodal evaluation. This benchmark integrates over 30 datasets, encompassing 15 modalities and 20 predictive tasks across key application domains. To complement this, we have also developed an open-source, unified, and automated evaluation pipeline that includes standardized implementations of state-of-the-art models and diverse fusion paradigms. Leveraging this platform, we have conducted large-scale experiments, successfully establishing new performance baselines across multiple tasks. This work provides the academic community with a crucial platform for rigorous and reproducible assessment of multimodal models, aiming to propel the field of multimodal artificial intelligence to new heights.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.06452v3</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Leyan Xue, Changqing Zhang, Kecheng Xue, Xiaohong Liu, Guangyu Wang, Zongbo Han</dc:creator>
    </item>
    <item>
      <title>SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation</title>
      <link>https://arxiv.org/abs/2511.06754</link>
      <description>arXiv:2511.06754v3 Announce Type: replace 
Abstract: Inspired by how humans reason over discrete objects and their relationships, we explore whether compact object-centric and object-relation representations can form a foundation for multitask robotic manipulation. Most existing robotic multitask models rely on dense embeddings that entangle both object and background cues, raising concerns about both efficiency and interpretability. In contrast, we study object-relation-centric representations as a pathway to more structured, efficient, and explainable visuomotor control. Our contributions are two-fold. First, we introduce LIBERO+, a fine-grained benchmark dataset designed to enable and evaluate object-relation reasoning in robotic manipulation. Unlike prior datasets, LIBERO+ provides object-centric annotations that enrich demonstrations with box- and mask-level labels as well as instance-level temporal tracking, supporting compact and interpretable visuomotor representations. Second, we propose SlotVLA, a slot-attention-based framework that captures both objects and their relations for action decoding. It uses a slot-based visual tokenizer to maintain consistent temporal object representations, a relation-centric decoder to produce task-relevant embeddings, and an LLM-driven module that translates these embeddings into executable actions. Experiments on LIBERO+ demonstrate that object-centric slot and object-relation slot representations drastically reduce the number of required visual tokens, while providing competitive generalization. Together, LIBERO+ and SlotVLA provide a compact, interpretable, and effective foundation for advancing object-relation-centric robotic manipulation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.06754v3</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Taisei Hanyu, Nhat Chung, Huy Le, Toan Nguyen, Yuki Ikebe, Anthony Gunderman, Duy Nguyen Ho Minh, Khoa Vo, Tung Kieu, Kashu Yamazaki, Chase Rainwater, Anh Nguyen, Ngan Le</dc:creator>
    </item>
    <item>
      <title>UI2Code^N: UI-to-Code Generation as Interactive Visual Optimization</title>
      <link>https://arxiv.org/abs/2511.08195</link>
      <description>arXiv:2511.08195v3 Announce Type: replace 
Abstract: UI-to-code aims to translate UI screenshots into executable front-end code. Despite progress with vision-language models (VLMs), most existing methods formulate UI-to-code as a single-pass generation, which mismatches real-world UI development that is inherently iterative and feedback-driven. We reformulate UI-to-code as an interactive visual optimization problem, where code generation is embedded in a closed-loop process of execution, visual inspection, and iterative refinement driven by rendered visual feedback. To address the non-differentiability of visual objectives and the noise of absolute visual evaluators, we propose Relative Visual Policy Optimization (RVPO), a preference-based reinforcement learning method that optimizes relative visual rankings among rendered candidates under execution feedback. We instantiate this paradigm in UI2Code^N, an open-source 9B model trained via continual pre-training, supervised fine-tuning, and reinforcement learning. Experiments demonstrate state-of-the-art performance on UI drafting, UI polishing, and UI editing benchmarks, even outperforming larger models, with performance consistently improving through iterative visual optimization. Our code and models are available at https://github.com/zai-org/UI2Code_N.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.08195v3</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhen Yang, Wenyi Hong, Mingde Xu, Xinyue Fan, Weihan Wang, Jiale Cheng, Xiaotao Gu, Jie Tang</dc:creator>
    </item>
    <item>
      <title>NSL-MT: Linguistically Informed Negative Samples for Efficient Machine Translation in Low-Resource Languages</title>
      <link>https://arxiv.org/abs/2511.09537</link>
      <description>arXiv:2511.09537v2 Announce Type: replace 
Abstract: We introduce negative space learning machine translation (NSL-MT), a training method for under-resourced languages that augments limited parallel data with synthetically generated violations of the target language's grammar and explicitly penalizes the model when it assigns high probability to these linguistically invalid outputs. NSL-MT delivers improvements across all baselines we tested, including 3-12% BLEU gains for well-performing models and 56-89% gains for models lacking decent initial support. Furthermore, NSL-MT provides a 5x data efficiency multiplier: training with 1,000 examples matches or exceeds normal training with 5,000 examples. NSL-MT thus provides a data-efficient alternative training method for settings where parallel data is limited.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.09537v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mamadou K. Keita, Christopher Homan, Huy Le</dc:creator>
    </item>
    <item>
      <title>VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping</title>
      <link>https://arxiv.org/abs/2511.13587</link>
      <description>arXiv:2511.13587v3 Announce Type: replace 
Abstract: Visual autoregressive (AR) generation models have demonstrated strong potential for image generation, yet their next-token-prediction paradigm introduces considerable inference latency. Although speculative decoding (SD) has proven effective for accelerating visual AR models, its "draft one step, then verify one step" paradigm prevents a direct reduction in the number of forward passes, limiting its acceleration potential. Motivated by the interchangeability of visual tokens, we explore verification skipping in the SD process for the first time to explicitly cut the number of target model forward passes, thereby reducing inference latency. By analyzing the characteristics of the drafting stage, we observe that verification redundancy and stale feature reusability are key factors in maintaining generation quality while improving speed for verification-free steps. Inspired by these two observations, we propose a novel SD framework, VVS, that accelerates visual AR models via partial verification skipping and integrates three complementary modules: (1) a verification-free token selector with dynamic truncation, (2) token-level feature caching and reuse, and (3) fine-grained skipped step scheduling. Consequently, VVS reduces the number of target model forward passes by $2.8\times$ relative to vanilla AR decoding while maintaining competitive generation quality, offering a superior speed-quality trade-off over conventional SD frameworks and revealing strong potential to reshape the SD paradigm. Our code is available at https://github.com/HyattDD/VVS.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.13587v3</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haotian Dong, Ye Li, Rongwei Lu, Chen Tang, Shu-Tao Xia, Zhi Wang</dc:creator>
    </item>
    <item>
      <title>Second-Order MPC-Based Distributed Q-Learning</title>
      <link>https://arxiv.org/abs/2511.16424</link>
      <description>arXiv:2511.16424v2 Announce Type: replace 
Abstract: The state of the art for model predictive control (MPC)-based distributed Q-learning is limited to first-order gradient updates of the MPC parameterization. In general, using second-order information can significantly improve the speed of convergence for learning, allowing the use of higher learning rates without introducing instability. This work presents a second-order extension to MPC-based Q-learning with updates distributed across local agents, relying only on locally available information and neighbor-to-neighbor communication. In simulation, the approach is shown to significantly outperform first-order distributed Q-learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.16424v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Samuel Mallick, Filippo Airaldi, Azita Dabiri, Bart De Schutter</dc:creator>
    </item>
    <item>
      <title>POMA-3D: The Point Map Way to 3D Scene Understanding</title>
      <link>https://arxiv.org/abs/2511.16567</link>
      <description>arXiv:2511.16567v3 Announce Type: replace 
Abstract: In this paper, we introduce POMA-3D, the first self-supervised 3D representation model learned from point maps. Point maps encode explicit 3D coordinates on a structured 2D grid, preserving global 3D geometry while remaining compatible with the input format of 2D foundation models. To transfer rich 2D priors into POMA-3D, a view-to-scene alignment strategy is designed. Moreover, as point maps are view-dependent with respect to a canonical space, we introduce POMA-JEPA, a joint embedding-predictive architecture that enforces geometrically consistent point map features across multiple views. Additionally, we introduce ScenePoint, a point map dataset constructed from 6.5K room-level RGB-D scenes and 1M 2D image scenes to facilitate large-scale POMA-3D pretraining. Experiments show that POMA-3D serves as a strong backbone for both specialist and generalist 3D understanding. It benefits diverse tasks, including 3D question answering, embodied navigation, scene retrieval, and embodied localization, all achieved using only geometric inputs (i.e., 3D coordinates). Overall, our POMA-3D explores a point map way to 3D scene understanding, addressing the scarcity of pretrained priors and limited data in 3D representation learning. Project Page: https://matchlab-imperial.github.io/poma3d/</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.16567v3</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ye Mao, Weixun Luo, Ranran Huang, Junpeng Jing, Krystian Mikolajczyk</dc:creator>
    </item>
    <item>
      <title>Row-stochastic matrices can provably outperform doubly stochastic matrices in decentralized learning</title>
      <link>https://arxiv.org/abs/2511.19513</link>
      <description>arXiv:2511.19513v2 Announce Type: replace 
Abstract: Decentralized learning often involves a weighted global loss with heterogeneous node weights $\lambda$. We revisit two natural strategies for incorporating these weights: (i) embedding them into the local losses to retain a uniform weight (and thus a doubly stochastic matrix), and (ii) keeping the original losses while employing a $\lambda$-induced row-stochastic matrix. Although prior work shows that both strategies yield the same expected descent direction for the global loss, it remains unclear whether the Euclidean-space guarantees are tight and what fundamentally differentiates their behaviors. To clarify this, we develop a weighted Hilbert-space framework $L^2(\lambda;\mathbb{R}^d)$ and obtain convergence rates that are strictly tighter than those from Euclidean analysis. In this geometry, the row-stochastic matrix becomes self-adjoint whereas the doubly stochastic one does not, creating additional penalty terms that amplify consensus error, thereby slowing convergence. Consequently, the difference in convergence arises not only from spectral gaps but also from these penalty terms. We then derive sufficient conditions under which the row-stochastic design converges faster even with a smaller spectral gap. Finally, by using a Rayleigh-quotient and Loewner-order eigenvalue comparison, we further obtain topology conditions that guarantee this advantage and yield practical topology-design guidelines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.19513v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bing Liu, Boao Kong, Limin Lu, Kun Yuan, Chengcheng Zhao</dc:creator>
    </item>
    <item>
      <title>Model Predictive Control and Moving Horizon Estimation using Statistically Weighted Data-Based Ensemble Models</title>
      <link>https://arxiv.org/abs/2511.21343</link>
      <description>arXiv:2511.21343v2 Announce Type: replace 
Abstract: This paper presents a model predictive control (MPC) framework leveraging an ensemble of data-based models to optimally control complex systems under multiple operating conditions. A novel combination rule for ensemble models is proposed, based on the statistical Mahalanobis distance, enabling the ensemble weights to suitably vary across the prediction window based on the system input. In addition, a novel state observer for ensemble models is developed using moving horizon estimation (MHE). The effectiveness of the proposed methodology is demonstrated on a benchmark energy system operating under multiple conditions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.21343v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Laura Boca de Giuli, Samuel Mallick, Alessio La Bella, Azita Dabiri, Bart De Schutter, Riccardo Scattolini</dc:creator>
    </item>
    <item>
      <title>Model-free practical PI-Lead control design by ultimate sensitivity principle</title>
      <link>https://arxiv.org/abs/2511.21641</link>
      <description>arXiv:2511.21641v2 Announce Type: replace 
Abstract: Practical design and tuning of feedback controllers often has to get by without a model of the dynamic process at hand. Typically, only some general assumptions about the system dynamics (in this work, that the system is type-one stable) are available to engineers, for instance in motion control and many other applications. This paper proposes a practical procedure, simple to realize, for designing a robust PI-Lead controller without modeling. The developed method derives from the ultimate sensitivity principle, known from empirical Ziegler-Nichols tuning of PID controllers, and makes use of some general characteristics of loop shaping. A three-step procedure is proposed to determine the integration time constant, control gain, and Lead element so as to guarantee a sufficient phase margin, with each step relying only on experimental monitoring of the output value. The proposed method is demonstrated and discussed with experiments on a noise-perturbed electro-mechanical actuator system.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.21641v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Michael Ruderman</dc:creator>
    </item>
    <item>
      <title>CAEC: Confidential, Attestable, and Efficient Inter-CVM Communication with Arm CCA</title>
      <link>https://arxiv.org/abs/2512.01594</link>
      <description>arXiv:2512.01594v5 Announce Type: replace 
Abstract: Confidential Virtual Machines (CVMs) are increasingly adopted to protect sensitive workloads from privileged adversaries such as the hypervisor. While they provide strong isolation guarantees, existing CVM architectures lack first-class mechanisms for inter-CVM data sharing due to their disjoint memory model, making inter-CVM data exchange a performance bottleneck in compartmentalized or collaborative multi-CVM systems. Under this model, a CVM's accessible memory is either shared with the hypervisor or protected from both the hypervisor and all other CVMs. This design simplifies reasoning about memory ownership; however, it fundamentally precludes plaintext data sharing between CVMs because all inter-CVM communication must pass through hypervisor-accessible memory, requiring costly encryption and decryption to preserve confidentiality and integrity. In this paper, we introduce CAEC, a system that enables protected memory sharing between CVMs. CAEC builds on Arm Confidential Compute Architecture (CCA) and extends its firmware to support Confidential Shared Memory (CSM), a memory region securely shared between multiple CVMs while remaining inaccessible to the hypervisor and all non-participating CVMs. CAEC's design is fully compatible with CCA hardware and introduces only a modest increase (6%) in CCA firmware code size. CAEC delivers substantial performance benefits across a range of workloads. For instance, inter-CVM communication over CAEC achieves up to 209x reduction in CPU cycles compared to encryption-based mechanisms over hypervisor-accessible shared memory. By combining high performance, strong isolation guarantees, and attestable sharing semantics, CAEC provides a practical and scalable foundation for the next generation of trusted multi-CVM services across both edge and cloud environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.01594v5</guid>
      <category>cs.CR</category>
      <category>cs.OS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sina Abdollahi, Amir Al Sadi, David Kotz, Marios Kogias, Hamed Haddadi</dc:creator>
    </item>
    <item>
      <title>Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits</title>
      <link>https://arxiv.org/abs/2512.03465</link>
      <description>arXiv:2512.03465v4 Announce Type: replace 
Abstract: In this study, we more rigorously evaluated our attack script $\textit{TraceTarnish}$, which leverages adversarial stylometry principles to anonymize the authorship of text-based messages. To ensure the efficacy and utility of our attack, we sourced, processed, and analyzed Reddit comments -- comments that were later alchemized into $\textit{TraceTarnish}$ data -- to gain valuable insights. The transformed $\textit{TraceTarnish}$ data was then further augmented by $\textit{StyloMetrix}$ to manufacture stylometric features -- features that were culled using the Information Gain criterion, leaving only the most informative, predictive, and discriminative ones. Our results show that function words and function word types ($L\_FUNC\_A$ $\&amp;$ $L\_FUNC\_T$); content words and content word types ($L\_CONT\_A$ $\&amp;$ $L\_CONT\_T$); and the Type-Token Ratio ($ST\_TYPE\_TOKEN\_RATIO\_LEMMAS$) yielded significant Information-Gain readings. The identified stylometric cues -- function-word frequencies, content-word distributions, and the Type-Token Ratio -- serve as reliable indicators of compromise (IoCs), revealing when a text has been deliberately altered to mask its true author. Similarly, these features could function as forensic beacons, alerting defenders to the presence of an adversarial stylometry attack; however, in the absence of the original message, this signal may go largely unnoticed, as it appears to depend on a pre- and post-transformation comparison. "In trying to erase a trace, you often imprint a larger one." Armed with this understanding, we framed $\textit{TraceTarnish}$'s operations and outputs around these five isolated features, using them to conceptualize and implement enhancements that further strengthen the attack.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.03465v4</guid>
      <category>cs.CR</category>
      <category>cs.CL</category>
      <category>cs.IR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Robert Dilworth</dc:creator>
    </item>
    <item>
      <title>DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training</title>
      <link>https://arxiv.org/abs/2512.03847</link>
      <description>arXiv:2512.03847v3 Announce Type: replace 
Abstract: Reinforcement learning (RL) has shown strong performance in LLM post-training, but real-world deployment often involves noisy or incomplete supervision. In such settings, complex and unreliable supervision signals can destabilize training and harm generalization. While existing approaches such as worst-case optimization (e.g., RFQI, CQL) and mean-based methods (e.g., PPO, GRPO) can improve stability, they often overlook generalization and may produce overly conservative policies, leading to uneven performance across diverse real scenarios. To this end, we introduce DVPO (Distributional Value Modeling with Risk-aware Policy Optimization), a new RL framework that combines conditional risk theory with distributional value modeling to better balance robustness and generalization. DVPO learns token-level value distributions to provide fine-grained supervision, and applies an asymmetric risk regularization to shape the distribution tails: it contracts the lower tail to dampen noisy negative deviations, while expanding the upper tail to preserve exploratory diversity. Across extensive experiments and analyses in multi-turn dialogue, math reasoning, and scientific QA, DVPO consistently outperforms PPO, GRPO, and robust Bellman-based PPO under noisy supervision, showing its potential for LLM post-training in real-world settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.03847v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dingwei Zhu, Zhiheng Xi, Shihan Dou, Yuhui Wang, Sixian Li, Junjie Ye, Honglin Guo, Shichun Liu, Chenhao Huang, Yajie Yang, Junlin Shang, Senjie Jin, Ming Zhang, Jiazheng Zhang, Caishuang Huang, Yunke Zhang, Yuran Wang, Tao Gui</dc:creator>
    </item>
    <item>
      <title>Learning to Orchestrate Agents in Natural Language with the Conductor</title>
      <link>https://arxiv.org/abs/2512.04388</link>
      <description>arXiv:2512.04388v5 Announce Type: replace 
Abstract: Powerful large language models (LLMs) from different providers have been expensively trained and finetuned to specialize across varying domains. In this work, we introduce a new kind of Conductor model trained with reinforcement learning to automatically discover powerful coordination strategies among LLMs. Our Conductor learns not only to design targeted communication topologies for effective agent-to-agent collaboration, but also to engineer focused prompts for the LLMs that maximally leverage their individual capabilities. We show that, by learning optimal coordination strategies over pools of powerful worker LLMs, a 7B Conductor achieves significant performance gains beyond any individual worker, attaining state-of-the-art results on challenging reasoning benchmarks such as LiveCodeBench and GPQA. By training with randomized agent pools, our Conductor effectively adapts to arbitrary sets of open- and closed-source agents, meeting any user requirements. Furthermore, allowing the Conductor to select itself as a worker gives rise to recursive topologies, elevating performance with a new form of dynamic test-time scaling through online iterative adaptation. More broadly, ours is among the first works demonstrating that language model coordination can be unlocked through RL, where powerful coordination strategies emerge naturally in LLMs through pure end-to-end reward maximization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.04388v5</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Stefan Nielsen, Edoardo Cetin, Peter Schwendeman, Qi Sun, Jinglue Xu, Yujin Tang</dc:creator>
    </item>
    <item>
      <title>On Tight FPT Time Approximation Algorithms for k-Clustering Problems</title>
      <link>https://arxiv.org/abs/2512.04614</link>
      <description>arXiv:2512.04614v2 Announce Type: replace 
Abstract: Following recent advances in combining approximation algorithms with fixed-parameter tractability (FPT), we study FPT-time approximation algorithms for minimum-norm $k$-clustering problems, parameterized by the number $k$ of open facilities.
  For the capacitated setting, we give a tight $(3+\epsilon)$-approximation for the general-norm capacitated $k$-clustering problem in FPT time, parameterized by $k$ and $\epsilon$. Prior to our work, such a result was known only for the capacitated $k$-median problem [CL, ICALP, 2019]. As a special case, our result yields an FPT-time $3$-approximation for capacitated $k$-center. This problem had not previously been studied in the FPT-time setting; the previous best known polynomial-time approximation ratio was 9 [ABCG, MP, 2015].
  In the uncapacitated setting, we consider the $top$-$cn$ norm $k$-clustering problem, where the goal is to minimize the $top$-$cn$ norm of the connection distance vector. Our main result is a tight $\big(1 + \frac 2{ec} + \epsilon\big)$-approximation algorithm for the problem with $c \in \big(\frac1e, 1\big]$. (For the case $c \leq \frac1e$, there is a simple tight $(3+\epsilon)$-approximation.) Our framework can be easily extended to give a tight $\left(3, 1+\frac2e + \epsilon\right)$-bicriteria approximation for the ($k$-center, $k$-median) problem in FPT time, improving the previous best polynomial-time $(4, 8)$ guarantee [AB, WAOA, 2017].
  All results are based on a unified framework: computing a $(1+\epsilon)$-approximate solution using $O\left(\frac{k\log n}{\epsilon}\right)$ facilities $S$ via LP rounding, sampling a few client representatives $R$ based on the solution $S$, guessing a few pivots from $S \cup R$ and some radius information on the pivots, and solving the problem using the guesses. We believe this framework can lead to further results on $k$-clustering problems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.04614v2</guid>
      <category>cs.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Han Dai, Shi Li, Sijin Peng</dc:creator>
    </item>
    <item>
      <title>Variance Matters: Improving Domain Adaptation via Stratified Sampling</title>
      <link>https://arxiv.org/abs/2512.05226</link>
      <description>arXiv:2512.05226v2 Announce Type: replace 
Abstract: Domain shift remains a key challenge in deploying machine learning models to the real world. Unsupervised domain adaptation (UDA) aims to address this by minimising domain discrepancy during training, but the discrepancy estimates suffer from high variance in stochastic settings, which can stifle the theoretical benefits of the method. This paper proposes Variance-Reduced Domain Adaptation via Stratified Sampling (VaRDASS), the first specialised stochastic variance reduction technique for UDA. We consider two specific discrepancy measures -- correlation alignment and the maximum mean discrepancy (MMD) -- and derive ad hoc stratification objectives for these terms. We then present expected and worst-case error bounds, and prove that our proposed objective for the MMD is theoretically optimal (i.e., minimises the variance) under certain assumptions. Finally, a practical k-means style optimisation algorithm is introduced and analysed. Experiments on four domain shift datasets demonstrate improved discrepancy estimation accuracy and target domain performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.05226v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:journal_reference>Transactions on Machine Learning Research, 2835-8856 (2026)</arxiv:journal_reference>
      <dc:creator>Andrea Napoli, Paul White</dc:creator>
    </item>
    <item>
      <title>NEAT: Neighborhood-Guided, Efficient, Autoregressive Set Transformer for 3D Molecular Generation</title>
      <link>https://arxiv.org/abs/2512.05844</link>
      <description>arXiv:2512.05844v3 Announce Type: replace 
Abstract: Transformer-based autoregressive models offer an efficient alternative to diffusion- and flow-matching-based approaches for generating 3D molecules. One challenge remains: standard transformer architectures require a sequential ordering of tokens, which is not inherently defined for the atoms in a molecule. Prior works have addressed this by using canonical atom orderings. However, these approaches are not permutation-invariant with respect to atoms and bias next-token prediction towards ordering conventions. We overcome this limitation by introducing a novel neighborhood-guided training strategy. Our model, NEAT (Neighborhood-Guided, Efficient, Autoregressive Set Transformer), treats molecular graphs as sets of atoms and learns an order-agnostic distribution over admissible tokens at the graph boundary, thereby ensuring atom-level permutation invariance. NEAT achieves state-of-the-art generation quality on the QM9 and GEOM-Drugs datasets while offering a significant speed advantage over existing baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.05844v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Daniel Rose, Roxane Axel Jacob, Johannes Kirchmair, Thierry Langer</dc:creator>
    </item>
    <item>
      <title>Conflict-Aware Fusion: Mitigating Logic Inertia in Large Language Models via Structured Cognitive Priors</title>
      <link>https://arxiv.org/abs/2512.06393</link>
      <description>arXiv:2512.06393v5 Announce Type: replace 
Abstract: Large language models (LLMs) achieve high accuracy on many reasoning benchmarks but remain brittle under structural perturbations of rule-based systems. We introduce a diagnostic framework with four stress tests -- redundant vs. essential rule deletion, contradictory-rule injection, logic-preserving rewrites, and multi-law stacking -- and use it to expose Logic Inertia: the tendency of generative LLMs (Qwen2/3, TinyLlama, GPT-4o, Gemma-3-4B-IT) and the encoder-only BERT baseline to persist along learned deductive trajectories under inconsistent premises. The collapse is sharp: untreated baselines fall from accuracy 1.00 on the base task to 0.00 on contradiction injection (instance-level exact match), and GPT-4o resolves only 56.0% of contradiction cases. We propose Conflict-Aware Fusion, a four-stage training pipeline that enforces verification-before-deduction as a learned structural prior: (i) SFT establishes the verification preamble; (ii) DPO sharpens the halt-on-contradiction decision boundary; (iii) Logical Invariance REgularisation (LIRE) penalises divergence between logically equivalent rule formulations via symmetric KL; (iv) Reinforcement Learning from Verification Feedback (RLVF) uses a symbolic forward-chaining engine as a deterministic oracle reward, jointly optimising invariance and sensitivity. The pipeline saturates all four primary stress tests for both 1.5B and 8B backbones. We further validate a Phase 2 extension that replaces the propositional oracle with a Lean 4 kernel, attaining 99.0% kernel agreement on the 105 classically-derivable (T) questions within a stratified 187-question Lean-translated sample (overall 71.7% across both polarities), providing a sound upgrade path to formally verified RL training. Code and benchmark: https://github.com/14H034160212/lemo</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.06393v5</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Qiming Bao, Xiaoxuan Fu, Michael Witbrock</dc:creator>
    </item>
    <item>
      <title>Theoretical Studies of Sub-THz Active Split-Ring Resonators for Near-Field Imaging</title>
      <link>https://arxiv.org/abs/2512.08265</link>
      <description>arXiv:2512.08265v2 Announce Type: replace 
Abstract: This paper develops a theoretical framework for the design of Active Split-Ring Resonators (ASRRs). An ASRR is a Split-Ring Resonator (SRR) equipped with a tunable negative resistor, enabling both switchability and quality factor boosting and tuning. These properties make ASRRs well-suited for integration into dense arrays on silicon chips, where pixelated near-fields are generated and leveraged for high-resolution 2D imaging of samples. Such imagers pave the way for real-time, non-invasive, and low-cost imaging of human body tissue. The paper investigates ASRR coupling to host transmission lines, nonlinear effects, signal flow, and the influence of various noise sources on detection performance. Verified through simulations, these studies provide design guidelines for optimizing the Signal-to-Noise Ratio (SNR) and power consumption of a single pixel, while adhering to the constraints of a scalable array.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.08265v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1109/TCSI.2026.3685973</arxiv:DOI>
      <dc:creator>Ali Ameri, Jun-Chau Chien, Ali M. Niknejad</dc:creator>
    </item>
    <item>
      <title>Can the GPC standard eliminate consent banners in the EU?</title>
      <link>https://arxiv.org/abs/2512.08856</link>
      <description>arXiv:2512.08856v3 Announce Type: replace 
Abstract: In the EU, the General Data Protection Regulation and the ePrivacy Directive mandate consent for the use of personal data for the purpose of behavioural advertising and tracking technologies. However, the ubiquity of consent banners has led to widespread consent fatigue and questions about the effectiveness of these mechanisms in protecting data subjects' data. To simplify digital laws and make the EU more competitive, the EU Commission recently proposed the Digital Omnibus, introducing a new Article 88b GDPR that allows data subjects to express their choices by technical means. While the Digital Omnibus is under legislative negotiation, California residents and residents of other US states can already exercise their rights via Global Privacy Control (GPC), a privacy signal that automatically broadcasts a legally binding opt-out request to websites. In light of the Digital Omnibus, we evaluate to what extent GPC can be adapted to the EU legal framework to reduce consent banners, mitigate consent fatigue, and improve data protection for EU users.
  GPC is based on a technical specification, currently being standardised at the World Wide Web Consortium. By sending a GPC signal, data subjects can express their refusal or withdrawal of consent under the GDPR to the use of their personal data for cross-context ad targeting and, in some cases, their objection under the GDPR to the use of their data for such purposes. Our evaluation identifies friction between the GPC specification and current EU data protection law. In the longer term, the EU legislator could amend EU laws, as proposed in the current Digital Omnibus, so that internet users can use automated signals to express choices about personal data use and online tracking. In the shorter term, websites and companies that conduct online tracking can already honour GPC.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.08856v3</guid>
      <category>cs.CY</category>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1016/j.clsr.2026.106332</arxiv:DOI>
      <arxiv:journal_reference>Computer Law &amp; Security Review, 61, 106332 (2026)</arxiv:journal_reference>
      <dc:creator>Sebastian Zimmeck, Harshvardhan J. Pandit, Frederik Zuiderveen Borgesius, Cristiana Teixeira Santos, Konrad Kollnig, Robin Berjon</dc:creator>
    </item>
    <item>
      <title>A Neuro-Symbolic Framework for Accountability in Public-Sector AI</title>
      <link>https://arxiv.org/abs/2512.12109</link>
      <description>arXiv:2512.12109v4 Announce Type: replace 
Abstract: Automated eligibility systems increasingly determine access to essential public benefits, but the explanations they generate often fail to reflect the legal rules that authorize those decisions. This thesis develops a legally grounded explainability framework that links system-generated decision justifications to the statutory constraints of CalFresh, California's Supplemental Nutrition Assistance Program. The framework combines a structured ontology of eligibility requirements derived from the state's Manual of Policies and Procedures (MPP), a rule extraction pipeline that expresses statutory logic in a verifiable formal representation, and a solver-based reasoning layer to evaluate whether the explanation aligns with governing law. Case evaluations demonstrate the framework's ability to detect legally inconsistent explanations, highlight violated eligibility rules, and support procedural accountability by making the basis of automated determinations traceable and contestable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.12109v4</guid>
      <category>cs.CY</category>
      <category>cs.AI</category>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3805689.3812335</arxiv:DOI>
      <dc:creator>Allen Daniel Sunny, Ido Sivan-Sevilla</dc:creator>
    </item>
    <item>
      <title>Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation</title>
      <link>https://arxiv.org/abs/2512.14954</link>
      <description>arXiv:2512.14954v2 Announce Type: replace 
Abstract: Computing next-token likelihood ratios between two language models (LMs) is a standard task in training paradigms such as knowledge distillation. Since this requires both models to share the same probability space, it becomes challenging when the teacher and student LMs use different tokenizers, for instance, when edge-device deployment necessitates a smaller vocabulary size to lower memory overhead. This work addresses this vocabulary misalignment problem by uncovering an implicit recursive structure in the commonly deployed Byte-Pair Encoding (BPE) algorithm and utilizing it to create a probabilistic framework for cross-tokenizer likelihood scoring. Our method enables sequence likelihood evaluation for vocabularies different from the teacher model's native tokenizer, addressing two specific scenarios: when the student vocabulary is a subset of the teacher vocabulary, and the general case where it is arbitrary. In the subset regime, our framework computes exact likelihoods and provides next-token probabilities for sequential sampling with only ${O}(1)$ model evaluations per token. When used for distillation, this yields up to a $12\%$ reduction in memory footprint for the Qwen2.5-1.5B model while also improving baseline performance by up to $4\%$ on the evaluated tasks. For the general case, we introduce a rigorous lossless procedure that leverages BPE recursive structure, complemented by a fast approximation that keeps large-vocabulary settings practical. Applied to GSM8K mathematical reasoning distillation, our method improves accuracy by over $2\%$ relative to the current state of the art. Code: github.com/truongbuu/cross-tokenizer-scoring</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.14954v2</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Buu Phan, Ashish Khisti, Karen Ullrich</dc:creator>
    </item>
    <item>
      <title>Beyond Majority Voting: Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2512.15146</link>
      <description>arXiv:2512.15146v4 Announce Type: replace 
Abstract: Test-time reinforcement learning mitigates the reliance on annotated data by using majority voting results as pseudo-labels, emerging as a complementary direction to reinforcement learning with verifiable rewards (RLVR) for improving reasoning ability. However, this voting strategy often induces confirmation bias and suffers from sparse rewards, limiting overall performance. In this work, we propose subgroup-specific step-wise confidence-weighted pseudo-label estimation (SCOPE), a framework integrating model confidence and dynamic subgroup partitioning to address these issues. Specifically, SCOPE integrates the proposed step-wise confidence into pseudo-label estimation, prioritizing high-quality reasoning paths over simple frequency counts. Furthermore, it dynamically partitions the candidate output pool into independent subgroups by balancing reasoning quality against exploration diversity. By deriving local consensus via repeated sampling for each subgroup, SCOPE provides diverse supervision targets to encourage broader exploration. Experiments across various models and benchmarks show that SCOPE consistently outperforms recent baselines. Notably, SCOPE achieves relative improvements of 13.1% on the challenging AIME 2025 and 8.1% on AMC. The code is released at https://github.com/szu-tera/SCOPE.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.15146v4</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Weiqin Wang, Yile Wang, Kehao Chen, Hui Huang</dc:creator>
    </item>
    <item>
      <title>Towards Generative Location Awareness for Disaster Response: A Probabilistic Cross-view Geolocalization Approach</title>
      <link>https://arxiv.org/abs/2512.20056</link>
      <description>arXiv:2512.20056v2 Announce Type: replace 
Abstract: As Earth's climate changes, it is impacting disasters and extreme weather events across the planet. Record-breaking heat waves, drenching rainfalls, extreme wildfires, and widespread flooding during hurricanes are all becoming more frequent and more intense. Rapid and efficient response to disaster events is essential for climate resilience and sustainability. A key challenge in disaster response is to accurately and quickly identify disaster locations to support decision-making and resource allocation. In this paper, we propose a Probabilistic Cross-view Geolocalization approach, called ProbGLC, exploring new pathways towards generative location awareness for rapid disaster response. Herein, we combine probabilistic and deterministic geolocalization models into a unified framework to simultaneously enhance model explainability (via uncertainty quantification) and achieve state-of-the-art geolocalization performance. Designed for rapid disaster response, ProbGLC is able to address cross-view geolocalization across multiple disaster events and to offer the unique features of probabilistic distributions and localizability scores. To evaluate ProbGLC, we conduct extensive experiments on two cross-view disaster datasets (i.e., MultiIAN and SAGAINDisaster), consisting of diverse cross-view image pairs of multiple disaster types (e.g., hurricanes, wildfires, floods, and tornadoes). Preliminary results confirm the superior geolocalization accuracy (i.e., 0.86 in Acc@1km and 0.97 in Acc@25km) and model explainability (i.e., via probabilistic distributions and localizability scores) of the proposed ProbGLC approach, highlighting the great potential of leveraging a generative cross-view approach to facilitate location awareness for better and faster disaster response. The data and code are publicly available at https://github.com/bobleegogogo/ProbGLC</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.20056v2</guid>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1016/j.isprsjprs.2026.03.050</arxiv:DOI>
      <dc:creator>Hao Li, Fabian Deuser, Wenping Yin, Steffen Knoblauch, Wufan Zhao, Filip Biljecki, Yong Xue, Wei Huang</dc:creator>
    </item>
    <item>
      <title>DIAL: Direct Iterative Adversarial Learning for Realistic Multi-Turn Dialogue Simulation</title>
      <link>https://arxiv.org/abs/2512.20773</link>
      <description>arXiv:2512.20773v4 Announce Type: replace 
Abstract: Realistic user simulation is crucial for training and evaluating multi-turn dialogue systems, yet creating simulators that accurately replicate human behavior remains a significant challenge. An effective simulator must expose the failure modes of the systems under evaluation. This work introduces Direct Iterative Adversarial Learning (DIAL), an adversarial framework that iteratively enhances user simulator realism through a competitive dynamic between a generator (user simulator) and a discriminator. When applied to mental health support, a domain characterized by diverse failure types and a critical dependence on realistic user behavior for failure detection, DIAL restores lexical diversity diminished by supervised fine-tuning and drastically reduces discriminator accuracy. The resulting simulator exhibits a strong correlation between simulated and real failure occurrence rates while maintaining low distributional divergence of failure modes. These findings indicate that DIAL is a promising method for developing realistic user simulators in multi-turn dialogue, facilitating reliable and cost-effective system evaluation prior to deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.20773v4</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ziyi Zhu, Olivier Tieleman, Caitlin A. Stamatis, Luka Smyth, Thomas D. Hull, Daniel R. Cahn, Jinghong Chen, Matteo Malgaroli</dc:creator>
    </item>
    <item>
      <title>Low-Latency Quasi-Static Modeling of UAV Tether Aerodynamics</title>
      <link>https://arxiv.org/abs/2512.22588</link>
      <description>arXiv:2512.22588v3 Announce Type: replace 
Abstract: One of the main limitations of multirotor UAVs is their short flight time due to battery constraints. A practical solution for continuous operation is to power the drone from the ground via a tether. While this approach has been demonstrated for stationary systems, scenarios with a fast-moving base vehicle or strong wind conditions require modeling the tether forces, including aerodynamic effects. In this work, we propose two complementary approaches for low-latency quasi-static tether modeling with aerodynamics. The first is an analytical method based on catenary theory with a uniform drag assumption, achieving very fast solve times below 1 ms. The second is a numerical method that discretizes the tether into segments and lumped masses, solving the equilibrium equations using CasADi and IPOPT. By leveraging initialization strategies, such as warm starting and analytical initialization, low-latency performance was achieved with a solve time of 5 ms, while allowing for flexible force formulations. Both approaches were validated in real-world tests using a load cell to measure the tether force. The results show that the analytical method provides sufficient accuracy for most tethered UAV applications with minimal computational cost, while the numerical method offers higher flexibility and physical accuracy when required. These approaches form a lightweight and extensible framework for low-latency tether simulation, applicable to both offline optimization and online tasks such as simulation, control, and trajectory planning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.22588v3</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Max Beffert, Andreas Zell</dc:creator>
    </item>
    <item>
      <title>Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2</title>
      <link>https://arxiv.org/abs/2512.22671</link>
      <description>arXiv:2512.22671v2 Announce Type: replace 
Abstract: Structured width pruning of GLU-MLP layers, guided by the Maximum Absolute Weight (MAW) criterion, reveals a systematic dichotomy in how reducing the expansion ratio affects different model capabilities. While performance on tasks relying on parametric knowledge (e.g., MMLU, GSM8K) and perplexity metrics degrades predictably, instruction-following capabilities improve substantially (+46% to +75% in IFEval for Llama-3.2-1B and 3B models), and multi-step reasoning remains robust (MUSR). This pattern challenges the prevailing assumption that pruning induces uniform degradation. We evaluated seven expansion ratio configurations using comprehensive benchmarks assessing factual knowledge, mathematical reasoning, language comprehension, instruction-following, and truthfulness. Our analysis identifies the expansion ratio as a critical architectural parameter that selectively modulates cognitive capabilities, rather than merely serving as a compression metric. We provide the first systematic characterization of this selective preservation phenomenon. Notably, we document a robust inverse correlation (r = -0.864, p = 0.012 in Llama-3B) between factual knowledge capacity (MMLU) and truthfulness metrics (TruthfulQA-MC2): as knowledge degrades, the model's ability to discriminate misconceptions improves consistently. This connects two previously distinct research areas, demonstrating that MAW-guided width pruning acts as a selective filter, reducing parametric knowledge while preserving or enhancing behavioral alignment. Additionally, we quantify context-dependent efficiency trade-offs: pruned configurations achieve up to 23% reduction in energy consumption (J/token) but incur penalties in single-request latency, whereas batch processing workloads benefit uniformly.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.22671v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pere Martra</dc:creator>
    </item>
    <item>
      <title>Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation</title>
      <link>https://arxiv.org/abs/2512.23864</link>
      <description>arXiv:2512.23864v3 Announce Type: replace 
Abstract: Vision-Language-Action (VLA) models have shown remarkable generalization by mapping web-scale knowledge to robotic control, yet they remain blind to physical contact. Consequently, they struggle with contact-rich manipulation tasks that require reasoning about force, texture, and slip. While some approaches incorporate low-dimensional tactile signals, they fail to capture the high-resolution dynamics essential for such interactions. To address this limitation, we introduce DreamTacVLA, a framework that grounds VLA models in contact physics by learning to feel the future. Our model adopts a hierarchical perception scheme in which high-resolution tactile images serve as micro-vision inputs coupled with wrist-camera local vision and third-person macro vision. To reconcile these multi-scale sensory streams, we first train a unified policy with a Hierarchical Spatial Alignment (HSA) loss that aligns tactile tokens with their spatial counterparts in the wrist and third-person views. To further deepen the model's understanding of fine-grained contact dynamics, we fine-tune the system with a tactile world model that predicts future tactile signals. To mitigate tactile data scarcity and the wear-prone nature of tactile sensors, we construct a hybrid large-scale dataset sourced from both a high-fidelity digital twin and real-world experiments. By anticipating upcoming tactile states, DreamTacVLA acquires a rich model of contact physics and conditions its actions on both real observations and imagined consequences. Across contact-rich manipulation tasks, it outperforms state-of-the-art VLA baselines, achieving up to 95% success, highlighting the importance of understanding physical contact for robust, touch-aware robotic agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.23864v3</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Guo Ye, Zexi Zhang, Xu Zhao, Shang Wu, Haoran Lu, Shihan Lu, Han Liu</dc:creator>
    </item>
    <item>
      <title>Personalized Spiking Neural Networks with Ferroelectric Synapses for EEG Signal Processing</title>
      <link>https://arxiv.org/abs/2601.00020</link>
      <description>arXiv:2601.00020v3 Announce Type: replace 
Abstract: Electroencephalography (EEG)-based brain-computer interfaces (BCIs) are strongly affected by non-stationary neural signals that vary across sessions and individuals, limiting the generalization of subject-agnostic models and motivating adaptive and personalized learning on resource-constrained platforms. Programmable memristive hardware offers a promising substrate for such post-deployment adaptation; however, practical realization is challenged by limited weight resolution, device variability, nonlinear programming dynamics, and finite device endurance. In this work, we show that spiking neural networks (SNNs) can be deployed on ferroelectric memristive synaptic devices for adaptive EEG-based motor imagery decoding under realistic device constraints, achieving classification performance comparable to software-based SNNs. We fabricate, characterize, and model the weight update in ferroelectric synapses. We then evaluate the deployment of a convolutional-recurrent SNN architecture using two strategies. First, we adapt to SNNs a mixed-precision strategy in which gradient-based updates are accumulated digitally and converted into discrete programming events only when a threshold is exceeded. Additionally, the weight update is device-aware and accounts for the nonlinear, state-dependent programming dynamics. During learning and adaptation, this scheme helps mitigate endurance and energy constraints. Second, we evaluate the transfer of software-trained weights followed by low-overhead on-device re-tuning. We show that subject-specific transfer learning, achieved by retraining only the final network layers, improves classification accuracy. These results demonstrate that programmable ferroelectric hardware can support robust, low-overhead adaptation in spiking neural networks, opening a practical path toward personalized neuromorphic processing of neural signals.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.00020v3</guid>
      <category>cs.NE</category>
      <category>cs.AI</category>
      <category>cs.ET</category>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1063/5.0319912</arxiv:DOI>
      <dc:creator>Nikhil Garg, Anxiong Song, Niklas Plessnig, Nathan Savoia, Laura Bégon-Lours</dc:creator>
    </item>
    <item>
      <title>S1-MMAlign: A Large-Scale, Multi-Disciplinary Dataset for Scientific Figure-Text Understanding</title>
      <link>https://arxiv.org/abs/2601.00264</link>
      <description>arXiv:2601.00264v2 Announce Type: replace 
Abstract: Multimodal learning has revolutionized general-domain tasks, yet its application in scientific discovery is hindered by the profound semantic gap between complex scientific imagery and sparse textual descriptions. We present S1-MMAlign, a large-scale, multi-disciplinary multimodal dataset comprising over 15.5 million high-quality image-text pairs derived from 2.5 million open-access scientific papers. Spanning disciplines from physics and biology to engineering, the dataset captures diverse visual modalities including experimental setups, heatmaps, and microscopic imagery. To address the pervasive issue of weak alignment in raw scientific captions, we introduce an AI-ready semantic enhancement pipeline that leverages advanced multimodal large language models to recaption images by synthesizing comprehensive context from paper abstracts and the citation contexts of corresponding figures. Technical validation confirms that our enhancement pipeline markedly improves data quality, as measured by reduced SciBERT pseudo-perplexity and enhanced CLIP image-text alignment, while also significantly boosting multimodal large language model performance in zero-shot scientific captioning, multi-domain scientific reasoning, and visual instruction tuning. S1-MMAlign provides a foundational resource for cross-modal scientific understanding in the AI for Science era, supporting the development of scientific foundation models and a wide range of downstream scientific intelligence applications. The dataset is publicly available at https://huggingface.co/datasets/ScienceOne-AI/S1-MMAlign.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.00264v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>He Wang, Longteng Guo, Pengkang Huo, Xuanxu Lin, Yichen Yuan, Jie Jiang, Jing Liu</dc:creator>
    </item>
    <item>
      <title>SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting</title>
      <link>https://arxiv.org/abs/2601.00285</link>
      <description>arXiv:2601.00285v2 Announce Type: replace 
Abstract: Reconstructing a dynamic target moving over a large area is challenging. Standard approaches for dynamic object reconstruction require dense coverage in both the viewing space and the temporal dimension, typically relying on multi-view videos captured at each time step. However, such setups are only possible in constrained environments. In real-world scenarios, observations are often sparse over time and captured sparsely from diverse viewpoints (e.g., from security cameras), making dynamic reconstruction highly ill-posed. We present SV-GS, a framework that simultaneously estimates a deformation model and the object's motion over time under sparse observations. To initialize SV-GS, we leverage a rough skeleton graph and an initial static reconstruction as inputs to guide motion estimation. (Later, we show that this input requirement can be relaxed.) Our method optimizes a skeleton-driven deformation field composed of a coarse skeleton joint pose estimator and a module for fine-grained deformations. By making only the joint pose estimator time-dependent, our model enables smooth motion interpolation while preserving learned geometric details. Experiments on synthetic datasets show that our method outperforms existing approaches under sparse observations by up to 34% in PSNR, and achieves comparable performance to dense monocular video methods on real-world datasets despite using significantly fewer frames. Moreover, we demonstrate that the initial static reconstruction given as input can be replaced by a diffusion-based generative prior, making our method more practical for real-world scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.00285v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jun-Jee Chao, Volkan Isler</dc:creator>
    </item>
    <item>
      <title>In Line with Context: Repository-Level Code Generation via Context Inlining</title>
      <link>https://arxiv.org/abs/2601.00376</link>
      <description>arXiv:2601.00376v3 Announce Type: replace 
Abstract: Repository-level code generation has attracted growing attention in recent years. Unlike function-level code generation, it requires the model to understand the entire repository, reasoning over complex dependencies across functions, classes, and modules. However, existing approaches such as retrieval-augmented generation (RAG) or context-based function selection often fall short: they primarily rely on surface-level similarity and struggle to capture the rich dependencies that govern repository-level semantics. In this paper, we introduce InlineCoder, a novel framework for repository-level code generation. InlineCoder enhances the understanding of repository context by inlining the unfinished function into its call graph, thereby reframing the challenging task of repository understanding as an easier function-level coding task. Given a function signature, InlineCoder first generates a draft completion, termed an anchor, which approximates downstream dependencies and enables perplexity-based confidence estimation. This anchor drives a bidirectional inlining process: (i) Upstream Inlining, which embeds the anchor into its callers to capture diverse usage scenarios; and (ii) Downstream Retrieval, which integrates the anchor's callees into the prompt to provide precise dependency context. The enriched context, combining draft completion with upstream and downstream perspectives, equips the LLM with a comprehensive repository view.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.00376v3</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chao Hu, Wenhao Zeng, Yuling Shi, Beijun Shen, Xiaodong Gu</dc:creator>
    </item>
    <item>
      <title>Dynamic Hyperparameter Importance for Efficient Multi-Objective Optimization</title>
      <link>https://arxiv.org/abs/2601.03166</link>
      <description>arXiv:2601.03166v2 Announce Type: replace 
Abstract: Choosing a suitable ML model is a complex task that can depend on several objectives, e.g., accuracy, fairness, or energy consumption. In practice, this requires trading off multiple, often competing, objectives through multi-objective optimization (MOO). However, existing MOO methods typically treat all hyperparameters as equally important, disregarding that hyperparameter importance (HPI) can vary significantly across objectives. We propose a novel dynamic optimization approach that prioritizes the most influential hyperparameters based on varying objective trade-offs during the search, thereby accelerating empirical convergence. We advance prior work on HPI for MOO from post-analysis to direct, dynamic integration within the optimization, using the recent HPI method HyperSHAP. For this, we leverage the objective weightings naturally produced by the MOO algorithm ParEGO and reduce the configuration space by fixing the unimportant hyperparameters, allowing the search to focus on the important ones. Finally, we evaluate our method on diverse tasks from PyMOO and YAHPO-Gym. For HPO, integrating HPI yields up to 24% improvement in final Pareto front quality, while on synthetic data, integrating HPI achieves 2x better final results.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.03166v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Daphne Theodorakopoulos, Marcel Wever, Marius Lindauer</dc:creator>
    </item>
    <item>
      <title>Craig-Lyndon Interpolation for the Logic of Here and There with a Variation of Mints' Sequent System</title>
      <link>https://arxiv.org/abs/2601.04080</link>
      <description>arXiv:2601.04080v3 Announce Type: replace 
Abstract: We present a variation of Maehara's method to construct Craig-Lyndon interpolants for the three-valued propositional logic of here and there (HT), also known as Gödel's $G_3$, a superintuitionistic logic of importance in logic programming. Our method adapts a recent interpolation technique that operates on classically encoded logic programs to a variation of Mints' sequent system for HT. The approach is characterized by two stages: First, a preliminary interpolant is constructed, a formula that is an interpolant in some sense but not yet the desired HT formula. In the second stage, an actual HT interpolant is obtained from this preliminary interpolant. With the classical encoding, the preliminary interpolant is a classical Craig-Lyndon interpolant for classical encodings of the two input HT formulas. In the presented adaptation, the sequent system operates directly on HT formulas, and the preliminary interpolant is in a nonclassical logic that generalizes HT by an additional logic operator.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.04080v3</guid>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Christoph Wernhard</dc:creator>
    </item>
    <item>
      <title>Inapproximability of Counting Permutation Patterns</title>
      <link>https://arxiv.org/abs/2601.05166</link>
      <description>arXiv:2601.05166v2 Announce Type: replace 
Abstract: Detecting and counting copies of permutation patterns are fundamental algorithmic problems, with applications in the analysis of rankings, nonparametric statistics, and property testing tasks such as independence and quasirandomness testing. From an algorithmic perspective, there is a sharp difference in complexity between detecting and counting the copies of a given length-$k$ pattern in a length-$n$ permutation. The former admits a $2^{\mathcal{O}(k^2)} \cdot n$ time algorithm (Guillemot and Marx, 2014) while the latter cannot be solved in time $f(k)\cdot n^{o(k/\log k)}$ unless the Exponential Time Hypothesis (ETH) fails (Berendsohn, Kozma, and Marx, 2021). In fact, already for patterns of length 4, exact counting is unlikely to admit near-linear time algorithms under standard fine-grained complexity assumptions (Dudek and Gawrychowski, 2020).
  Recently, Ben-Eliezer, Mitrović and Sristava (2026) showed that for patterns of length up to 5, a $(1+\varepsilon)$-approximation of the pattern count can be computed in near-linear time, yielding a separation between exact and approximate counting for small patterns, and conjectured that approximate counting is asymptotically easier than exact counting in general. We strongly refute their conjecture by showing that, under ETH, no algorithm running in time $f(k)\cdot n^{o(k/\log k)}$ can approximate the number of copies of a length-$k$ pattern within a multiplicative factor $n^{(1/2-\varepsilon)k}$. The lower bound on runtime matches the conditional lower bound for exact pattern counting, and the obtained bound on the multiplicative error factor is essentially tight, as an $n^{k/2}$-approximation can be computed in $2^{\mathcal{O}(k^2)}\cdot n$ time using an algorithm for pattern detection.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.05166v2</guid>
      <category>cs.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Michal Opler</dc:creator>
    </item>
    <item>
      <title>Age of Gossip With Cellular Drone Mobility</title>
      <link>https://arxiv.org/abs/2601.05983</link>
      <description>arXiv:2601.05983v2 Announce Type: replace 
Abstract: We consider a cellular network containing $n$ nodes where nodes within a cell gossip with each other in a fully-connected fashion and a source shares updates with these nodes via a mobile drone. The drone receives source updates and shares them with nodes in the cell where it currently resides. The drone moves between cells according to an underlying continuous-time Markov chain (CTMC). We evaluate the impact of the number of cells $f(n)$, drone speed $\lambda_m(n)$, and drone dissemination rate $\lambda_d(n)$ on the information freshness of nodes in the network. We use the version age of information metric to quantify information freshness. We observe that the expected duration between two drone-to-cell service times depends on the stationary distribution of the underlying CTMC and $\lambda_d(n)$, but not on $\lambda_m(n)$. However, the instability of the version age makes high-probability analysis for a general underlying CTMC difficult. Therefore, we focus on the fully-connected drone mobility model. Under this model, by leveraging a stochastic equivalence between drone mobility and drone dissemination speed, we uncover a dual bottleneck: the version age is constrained by the slower of these two processes. If $\lambda_d(n) \gg \lambda_m(n)$, then the version age scaling of nodes is dominated by the inverse of $\lambda_m(n)$ and is independent of $\lambda_d(n)$. If $\lambda_m(n) \gg \lambda_d(n)$, then the version age scaling of nodes is dominated by the inverse of $\lambda_d(n)$ and is independent of $\lambda_m(n)$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.05983v2</guid>
      <category>cs.IT</category>
      <category>cs.NI</category>
      <category>cs.SI</category>
      <category>eess.SP</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Arunabh Srivastava, Sennur Ulukus</dc:creator>
    </item>
    <item>
      <title>PRISM: Color-Stratified Point Cloud Sampling</title>
      <link>https://arxiv.org/abs/2601.06839</link>
      <description>arXiv:2601.06839v2 Announce Type: replace 
Abstract: We present PRISM, a novel color-guided stratified sampling method for RGB-LiDAR point clouds. Our approach is motivated by the observation that unique scene features often exhibit chromatic diversity while repetitive, redundant features are homogeneous in color. Conventional downsampling methods (Random Sampling, Voxel Grid, Normal Space Sampling) enforce spatial uniformity while ignoring this photometric content. In contrast, PRISM allocates sampling density in proportion to chromatic diversity. By treating RGB color space as the stratification domain and imposing a maximum capacity k per color bin, the method preserves texture-rich regions with high color variation while substantially thinning visually homogeneous surfaces. This shifts the sampling criterion from spatial coverage to visual complexity, producing sparser point clouds that retain essential features for 3D reconstruction tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.06839v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hansol Lim, Minhyeok Im, Jongseong Brad Choi</dc:creator>
    </item>
    <item>
      <title>On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training</title>
      <link>https://arxiv.org/abs/2601.07389</link>
      <description>arXiv:2601.07389v2 Announce Type: replace 
Abstract: Post-training of large language models routinely interleaves supervised fine-tuning (SFT) with reinforcement learning (RL). These two methods have different objectives: SFT minimizes the cross-entropy loss between model outputs and expert responses, while RL maximizes reward signals derived from human preferences or rule-based verifiers. Modern reasoning models have widely adopted the practice of alternating SFT and RL training. However, there is no theoretical account of whether they can be decoupled. We prove that decoupling is impossible in either order: (1) SFT-then-RL coupling: RL increases SFT loss under both distributional (KL-based) and landscape (PL-based) analyses; and (2) RL-then-SFT coupling: SFT lowers the reward achieved by RL under analogous conditions. Under the PL condition, we further derive the optimal RL duration that balances reward improvement against SFT degradation, identify the non-decoupling threshold governing when RL can improve SFT, and bound the gradient misalignment via spectral concentration. Experiments on Qwen3-0.6B confirm the predicted degradation, verifying that SFT and RL cannot be separated without loss of prior performance in the post-training pipeline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.07389v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Xueyan Niu, Bo Bai, Wei Han, Weixi Zhang</dc:creator>
    </item>
    <item>
      <title>TMATDG: applying TDG methods to multiple scattering via T-matrix approximation</title>
      <link>https://arxiv.org/abs/2601.07704</link>
      <description>arXiv:2601.07704v2 Announce Type: replace 
Abstract: We present a MATLAB package for the solution of multiple scattering problems, coupling Trefftz Discontinuous Galerkin methods for Helmholtz scattering with the T-matrix method. We rely on the TMATROM package to numerically approximate the T-matrices and to handle the multiple scattering problem, providing a framework for scattering by polygonal obstacles.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.07704v2</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Armando Maria Monforte</dc:creator>
    </item>
    <item>
      <title>SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models</title>
      <link>https://arxiv.org/abs/2601.08623</link>
      <description>arXiv:2601.08623v2 Announce Type: replace 
Abstract: Image generation models (IGMs), while capable of producing impressive and creative content, often memorize a wide range of undesirable concepts from their training data, leading to the reproduction of unsafe content such as NSFW imagery and copyrighted artistic styles. Such behaviors pose persistent safety and compliance risks in real-world deployments and cannot be reliably mitigated by post-hoc filtering, owing to the limited robustness of such mechanisms and their lack of fine-grained semantic control. Recent unlearning methods seek to erase harmful concepts at the model level, but they require costly retraining, degrade the quality of benign generations, or fail to withstand prompt paraphrasing and adversarial attacks. To address these challenges, we introduce SafeRedir, a lightweight inference-time framework for robust unlearning via prompt embedding redirection. Without modifying the underlying IGMs, SafeRedir adaptively routes unsafe prompts toward safe semantic regions through token-level interventions in the embedding space. The framework comprises two core components: a latent-aware multi-modal safety classifier for identifying unsafe generation trajectories, and a token-level delta generator for precise semantic redirection, equipped with auxiliary predictors for token masking and adaptive scaling to localize and regulate the intervention. Empirical results across multiple representative unlearning tasks demonstrate that SafeRedir achieves effective unlearning, high semantic and perceptual preservation, robust image quality, and enhanced resistance to adversarial attacks. Furthermore, SafeRedir generalizes effectively across a variety of diffusion backbones and existing unlearned models, validating its plug-and-play compatibility and broad applicability. Code and data are available at https://github.com/ryliu68/SafeRedir.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.08623v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Renyang Liu, Kangjie Chen, Han Qiu, Jie Zhang, Kwok-Yan Lam, Tianwei Zhang, See-Kiong Ng</dc:creator>
    </item>
    <item>
      <title>ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection</title>
      <link>https://arxiv.org/abs/2601.09195</link>
      <description>arXiv:2601.09195v3 Announce Type: replace 
Abstract: Supervised fine-tuning (SFT) is a fundamental post-training strategy to align Large Language Models (LLMs) with human intent. However, traditional SFT often ignores the one-to-many nature of language by forcing alignment with a single reference answer, leading to the model overfitting to non-core expressions. Although our empirical analysis suggests that introducing multiple reference answers can mitigate this issue, the prohibitive data and computational costs necessitate a strategic shift: prioritizing the mitigation of single-reference overfitting over the costly pursuit of answer diversity. To achieve this, we reveal the intrinsic connection between token probability and semantic importance: high-probability tokens carry the core logical framework, while low-probability tokens are mostly replaceable expressions. Based on this insight, we propose ProFit, which selectively masks low-probability tokens to prevent surface-level overfitting. Extensive experiments confirm that ProFit consistently outperforms traditional SFT baselines on general reasoning and mathematical benchmarks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.09195v3</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tao Liu, Taiqiang Wu, Runming Yang, Shaoning Sun, Junjie Wang, Yujiu Yang</dc:creator>
    </item>
    <item>
      <title>Improving the local solution of the DG predictor of the ADER-DG method for solving systems of ordinary differential equations and its applicability to systems of differential-algebraic equations</title>
      <link>https://arxiv.org/abs/2601.13908</link>
      <description>arXiv:2601.13908v2 Announce Type: replace 
Abstract: An improved local numerical solution is proposed for the ADER-DG numerical method with a local DG predictor for solving the initial value problem for a first-order ODE system. The improved local numerical solution demonstrates a convergence order one higher than that of the local numerical solution of the original ADER-DG numerical method and is continuous at grid nodes. Rigorous proofs of the approximation orders of the local numerical solution and the improved local numerical solution are presented. Obtaining the proposed improved local numerical solution does not require significant changes to the structure of the ADER-DG numerical method. Therefore, all conclusions regarding the convergence orders of the numerical solution at grid nodes, the resulting superconvergence, and the high stability of the ADER-DG numerical method remain unchanged. Numerous applications of the ADER-DG numerical method to specific initial value problems for ODE systems are presented, covering a wide range of polynomial degrees. The obtained results provide strong confirmation of the developed rigorous theory. The improved local numerical solution is shown to exhibit both higher accuracy and improved smoothness and pointwise comparability. Empirical convergence orders of all individual numerical solutions were calculated for a wide range of error norms and agree well with the expected convergence orders. A rigorous proof, based on the $\epsilon$-embedding method, of the applicability of the ADER-DG numerical method with a local DG predictor to solving DAE systems is presented.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.13908v2</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <category>math.FA</category>
      <category>physics.app-ph</category>
      <category>physics.comp-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>I. S. Popov</dc:creator>
    </item>
    <item>
      <title>torch-sla: Differentiable Sparse Linear Algebra with Adjoint Solvers and Sparse Tensor Parallelism for PyTorch</title>
      <link>https://arxiv.org/abs/2601.13994</link>
      <description>arXiv:2601.13994v2 Announce Type: replace 
Abstract: Differentiable sparse linear algebra is foundational for scientific machine learning, yet PyTorch lacks a unified library for it: \texttt{torch.sparse} provides only low-level kernels and a non-differentiable, CPU-only \texttt{spsolve}, and \texttt{torch.linalg} is dense-only. We present torch-sla, an open-source library that fills this gap. It exposes a single autograd-aware API for direct, iterative, nonlinear, and eigenvalue solvers across five interchangeable backends -- SciPy and Eigen on CPU, cuDSS, CuPy, and a PyTorch-native iterative solver on GPU -- with automatic dispatch by device and problem size. The library further supports batched solves over shared or distinct sparsity patterns and distributed multi-GPU execution via domain decomposition with halo exchange. These capabilities are made scalable by an O(1)-graph adjoint differentiation framework and an autograd-compatible distributed halo-exchange layer. The library is available at https://www.torchsla.com/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.13994v2</guid>
      <category>cs.DC</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mingyuan Chi, Shizheng Wen</dc:creator>
    </item>
    <item>
      <title>Progressive $\mathcal{J}$-Invariant Self-supervised Learning for Low-Dose CT Denoising</title>
      <link>https://arxiv.org/abs/2601.14180</link>
      <description>arXiv:2601.14180v3 Announce Type: replace 
Abstract: Self-supervised learning has been increasingly investigated for low-dose computed tomography (LDCT) image denoising, as it alleviates the dependence on paired normal-dose CT (NDCT) data, which are often difficult to collect. However, many existing self-supervised blind-spot denoising methods suffer from training inefficiencies and suboptimal performance due to restricted receptive fields. To mitigate this issue, we propose a novel Progressive $\mathcal{J}$-invariant Learning method that maximizes the use of $\mathcal{J}$-invariance to enhance LDCT denoising performance. We introduce a step-wise blind-spot denoising mechanism that enforces conditional independence in a progressive manner, enabling more fine-grained learning for denoising. Furthermore, we explicitly inject a combination of controlled Gaussian and Poisson noise during training to regularize the denoising process and mitigate overfitting. Extensive experiments on the Mayo LDCT dataset demonstrate that the proposed method consistently outperforms existing self-supervised approaches and achieves performance comparable to, or better than, several representative supervised denoising methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.14180v3</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yichao Liu, Zongru Shao, Yueyang Teng, Junwen Guo</dc:creator>
    </item>
    <item>
      <title>A Universal Large Language Model -- Drone Command and Control Interface</title>
      <link>https://arxiv.org/abs/2601.15486</link>
      <description>arXiv:2601.15486v2 Announce Type: replace 
Abstract: The use of artificial intelligence (AI) for drone control can have a transformative impact on drone capabilities, especially when real-world information can be integrated with drone sensing, command, and control, part of a growing field of physical AI. Large language models (LLMs) can be advantageous if trained at scale on general knowledge, especially when the training data includes information such as detailed map, geography, and topology data for the entire planet, and when they can access real-time situational data such as weather. However, challenges remain in the interface between drones and LLMs in general, with each application requiring a tedious, labor-intensive effort to connect the LLM's trained knowledge to drone command and control. Here, we solve that problem using an interface strategy that is LLM-agnostic and drone-agnostic, providing the first universal, versatile, comprehensive, and easy-to-use drone control interface. We do this using the new Model Context Protocol (MCP) standard, an open standard that provides a universal way for AI systems to access external data, tools, and services. We develop and deploy a cloud-based Linux machine hosting an MCP server that supports the MAVLink protocol, a ubiquitous drone control language used by millions of drones, including those running the ArduPilot and PX4 frameworks. We demonstrate flight control of a real unmanned aerial vehicle. In further testing, we demonstrate extensive flight planning and control capability in a simulated drone, integrated with a Google Maps MCP server for up-to-date, real-time navigation information. This demonstrates a universal approach to integrating LLMs with drone command and control, a paradigm that brings virtually the entire modern AI industry to bear on drone technology through an easy-to-use interface that translates natural language into drone control.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.15486v2</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Javier N. Ramos-Silva, Peter J. Burke</dc:creator>
    </item>
    <item>
      <title>DSVM-UNet : Enhancing VM-UNet with Dual Self-distillation for Medical Image Segmentation</title>
      <link>https://arxiv.org/abs/2601.19690</link>
      <description>arXiv:2601.19690v2 Announce Type: replace 
Abstract: Vision Mamba models, which address the limitations of previous models by effectively managing long-range dependencies with linear-time overhead, have been extensively researched in various fields. Several studies have further designed UNet-based Vision Mamba models (VM-UNet) for medical image segmentation. These approaches primarily focus on optimizing architectural designs by creating more complex structures to enhance the model's ability to perceive semantic features. In this paper, we propose a simple yet effective approach, Dual Self-distillation for VM-UNet (DSVM-UNet), that improves the model without any complex architectural designs. To achieve this goal, we develop dual self-distillation methods to align the features at both the global and local levels. Extensive experiments conducted on the ISIC2017, ISIC2018, and Synapse benchmarks demonstrate that our approach achieves state-of-the-art performance while maintaining computational efficiency. Code is available at https://github.com/RoryShao/DSVM-UNet.git.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.19690v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1109/ICASSP55912.2026.11464398</arxiv:DOI>
      <arxiv:journal_reference>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2026)</arxiv:journal_reference>
      <dc:creator>Renrong Shao, Dongyang Li, Dong Xia, Lin Shao, Jiangdong Lu, Fen Zheng, Lulu Zhang</dc:creator>
    </item>
    <item>
      <title>A Scalable Multi-Task Model for Virtual Sensors</title>
      <link>https://arxiv.org/abs/2601.20634</link>
      <description>arXiv:2601.20634v2 Announce Type: replace 
Abstract: Virtual sensors replace expensive physical sensors in critical applications through machine learning by predicting target signals from available measurements. Existing virtual sensor approaches require application-specific models with hand-selected inputs for each sensor, cannot leverage task synergies, and lack consistent benchmarks. While emerging time series foundation models offer general-purpose, pretrained solutions in other domains, they are computationally expensive and limited to predicting their input signals, making them incompatible with virtual sensors. We introduce the first multi-task model for virtual sensors addressing both limitations. Our unified model can simultaneously predict diverse virtual sensors exploiting synergies while maintaining computational efficiency. It learns relevant input signals for each virtual sensor, eliminating expert knowledge requirements while adding explainability. In our large-scale evaluation on three standard benchmarks and an application-specific dataset with over 18 billion samples, our architecture reduces computation time by up to 415x and memory requirements by 951x, while maintaining or even improving predictive quality compared to unified baselines. Compared to existing isolated models for a single virtual sensor, our unified approach generates superior predictions at similar inference speed while scaling gracefully to hundreds of virtual sensors with nearly constant parameter count, enabling practical deployment in large-scale sensor networks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.20634v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Leon Götz, Lars Frederik Peiss, Erik Sauer, Andreas Udo Sass, Thorsten Bagdonat, Stephan Günnemann, Leo Schwinn</dc:creator>
    </item>
    <item>
      <title>Visual Disentangled Diffusion Autoencoders: Scalable Counterfactual Generation for Foundation Models</title>
      <link>https://arxiv.org/abs/2601.21851</link>
      <description>arXiv:2601.21851v2 Announce Type: replace 
Abstract: Foundation models, despite their robust zero-shot capabilities, remain vulnerable to spurious correlations and 'Clever Hans' strategies. Existing mitigation methods often rely on unavailable group labels or computationally expensive gradient-based adversarial optimization. To address these limitations, we propose Visual Disentangled Diffusion Autoencoders (DiDAE), a novel framework integrating frozen foundation models with disentangled dictionary learning for efficient, gradient-free counterfactual generation directly for the foundation model. DiDAE first edits foundation model embeddings along interpretable directions of the disentangled dictionary and then decodes them via a diffusion autoencoder. This allows the generation of multiple diverse, disentangled counterfactuals for each factual input, much faster than existing baselines, which generate single entangled counterfactuals. When paired with Counterfactual Knowledge Distillation, DiDAE-CFKD achieves state-of-the-art performance in mitigating shortcut learning, improving downstream performance on unbalanced datasets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.21851v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sidney Bender, Marco Morik</dc:creator>
    </item>
    <item>
      <title>Defining Operational Conditions for Safety-Critical AI-Based Systems from Data</title>
      <link>https://arxiv.org/abs/2601.22118</link>
      <description>arXiv:2601.22118v2 Announce Type: replace 
Abstract: Artificial Intelligence (AI) has been on the rise in many domains, including numerous safety-critical applications. However, for complex systems in the real world, defining the underlying environmental conditions in which the AI-based system must operate -- the Operational Design Domain (ODD) -- is extremely challenging. This often results in an incomplete description of the ODD, which contrasts with the requirements of many domains for certifying AI-based systems. Traditionally, the ODD is created in the early stages of the development process, drawing on sophisticated expert knowledge and related standards. This paper presents a novel Safety-by-Design method to define the ODD a posteriori from previously collected data using a multi-dimensional kernel-based representation. This approach is validated through both Monte Carlo methods and a real-world aviation use case for a future collision-avoidance system. Moreover, by defining under what conditions two ODDs are similar, the paper shows that the data-driven ODD can produce a dataset similar to the original, hidden ODD. The derivation of this deterministic, kernel-based affinity representation of ODDs is fully automated via a bounded, order-independent algorithm. Utilizing the proposed ODD representation enables future certification of data-driven, safety-critical AI-based systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.22118v2</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Johann Maximilian Christensen, Elena Hoemann, Frank Köster, Sven Hallerbach</dc:creator>
    </item>
    <item>
      <title>OpenVTON-Bench: A Large-Scale High-Resolution Benchmark for Controllable Virtual Try-On Evaluation</title>
      <link>https://arxiv.org/abs/2601.22725</link>
      <description>arXiv:2601.22725v3 Announce Type: replace 
Abstract: Recent advances in diffusion models have significantly elevated the visual fidelity of Virtual Try-On (VTON) systems, yet reliable evaluation remains a persistent bottleneck. Traditional metrics struggle to quantify fine-grained texture details and semantic consistency, while existing datasets fail to meet commercial standards in scale and diversity. We present OpenVTON-Bench, a large-scale benchmark comprising approximately 100K high-resolution image pairs (up to $1536 \times 1536$). The dataset is constructed using DINOv3-based hierarchical clustering for semantically balanced sampling and Gemini-powered dense captioning, ensuring a uniform distribution across 20 fine-grained garment categories. To support reliable evaluation, we propose a multi-modal protocol that measures VTON quality along five interpretable dimensions: background consistency, identity fidelity, texture fidelity, shape plausibility, and overall realism. The protocol integrates VLM-based semantic reasoning with a novel Multi-Scale Representation Metric based on SAM3 segmentation and morphological erosion, enabling the separation of boundary alignment errors from internal texture artifacts. Experimental results show strong agreement with human judgments (Kendall's $\tau$ of 0.833 vs. 0.611 for SSIM), establishing a robust benchmark for VTON evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.22725v3</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jin Li, Tao Chen, Shuai Jiang, Weijie Wang, Jingwen Luo, Chenhui Wu</dc:creator>
    </item>
    <item>
      <title>Syntax- and Compilation-Preserving Evasion of LLM Vulnerability Detectors</title>
      <link>https://arxiv.org/abs/2602.00305</link>
      <description>arXiv:2602.00305v2 Announce Type: replace 
Abstract: LLM-based vulnerability detectors are increasingly deployed in CI/CD security gating, yet their resilience to evasion under syntax- and compilation-preserving edits remains poorly understood. We evaluate five attack variants spanning four carrier families of behavior-preserving code transformations on a unified C/C++ benchmark ($N=5000$) and introduce Complete Resistance (CR), measuring the fraction of correctly detected vulnerabilities that withstand all attack variants. Our findings reveal a significant robustness gap: models achieving 70\%+ clean recall exhibit CR as low as 0.12\%, meaning over 87\% of detected vulnerabilities can be evaded by at least one syntax-preserving edit. Universal adversarial strings optimized on a 14B surrogate transfer effectively to black-box APIs including GPT-4o, while on-target optimization further amplifies evasion (up to 92.5\% ASR). These results indicate that clean benchmark accuracy alone is insufficient as a security guarantee for deployed vulnerability detectors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.00305v2</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Luze Sun, Alina Oprea, Eric Wong</dc:creator>
    </item>
    <item>
      <title>CLAMP: Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining</title>
      <link>https://arxiv.org/abs/2602.00937</link>
      <description>arXiv:2602.00937v3 Announce Type: replace 
Abstract: Leveraging pre-trained 2D image representations in behavior cloning policies has achieved great success and has become a standard approach for robotic manipulation. However, such representations fail to capture the 3D spatial information about objects and scenes that is essential for precise manipulation. In this work, we introduce Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining (CLAMP), a novel 3D pre-training framework that utilizes point clouds and robot actions. From the merged point cloud computed from RGB-D images and camera extrinsics, we re-render multi-view four-channel image observations with depth and 3D coordinates, including dynamic wrist views, to provide clearer views of target objects for high-precision manipulation tasks. The pre-trained encoders learn to associate the 3D geometric and positional information of objects with robot action patterns via contrastive learning on large-scale simulated robot trajectories. During encoder pre-training, we pre-train a Diffusion Policy to initialize the policy weights for fine-tuning, which is essential for improving fine-tuning sample efficiency and performance. After pre-training, we fine-tune the policy on a limited amount of task demonstrations using the learned image and action representations. We demonstrate that this pre-training and fine-tuning design substantially improves learning efficiency and policy performance on unseen tasks. Furthermore, we show that CLAMP outperforms state-of-the-art baselines across six simulated tasks and five real-world tasks. The project website and videos can be found at https://clamp3d.github.io/CLAMP/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.00937v3</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>I-Chun Arthur Liu, Krzysztof Choromanski, Sandy Huang, Connor Schenck</dc:creator>
    </item>
    <item>
      <title>Autoregressive, Yet Revisable: In Decoding Revision for Secure Code Generation</title>
      <link>https://arxiv.org/abs/2602.01187</link>
      <description>arXiv:2602.01187v2 Announce Type: replace 
Abstract: Large Language Model (LLM)-based code generation is predominantly formulated as a strictly monotonic process, appending tokens linearly to an immutable prefix. This formulation contrasts with the cognitive process of programming, which inherently interleaves forward generation with on-the-fly revision. While prior works attempt to introduce revision via post-hoc agents or external static tools, they either suffer from high latency or fail to leverage the model's intrinsic semantic reasoning. In this paper, we propose Stream of Revision, a paradigm shift that elevates code generation from a monotonic stream to a dynamic, self-correcting trajectory by leveraging the model's intrinsic capabilities. We introduce specific action tokens that enable the model to seamlessly backtrack and edit its own history within a single forward pass. By internalizing the revision loop, our framework Stream of Revision allows the model to activate its latent capabilities just-in-time without external dependencies. Empirical results on secure code generation show that Stream of Revision significantly reduces vulnerabilities with minimal inference overhead.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.01187v2</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chengran Yang, Zichao Wei, Heminghao Deng, Jinfeng Jiang, Zhensu Sun, Ting Zhang, Tianyi Wu, Ming Wen, David Lo</dc:creator>
    </item>
    <item>
      <title>Multi-Scale Wavelet Transformers for Operator Learning of Dynamical Systems</title>
      <link>https://arxiv.org/abs/2602.01486</link>
      <description>arXiv:2602.01486v2 Announce Type: replace 
Abstract: Recent years have seen a surge in data-driven surrogates for dynamical systems that can be orders of magnitude faster than numerical solvers. However, many machine learning-based models such as neural operators exhibit spectral bias, attenuating high-frequency components that often encode small-scale structure. This limitation is particularly damaging in applications such as weather forecasting, where misrepresented high frequencies can induce long-horizon instability. To address this issue, we propose multi-scale wavelet transformers (MSWTs), which learn system dynamics in a tokenized wavelet domain. The wavelet transform explicitly separates low- and high-frequency content across scales. MSWTs leverage a wavelet-preserving downsampling scheme that retains high-frequency features and employ wavelet-based attention to capture dependencies across scales and frequency bands. Experiments on chaotic dynamical systems show substantial error reductions and improved long-horizon spectral fidelity. On the ERA5 climate reanalysis, MSWTs further reduce climatological bias, demonstrating their effectiveness in a real-world forecasting setting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.01486v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xuesong Wang, Michael Groom, Rafael Oliveira, He Zhao, Terence O'Kane, Edwin V. Bonilla</dc:creator>
    </item>
    <item>
      <title>MoLF: Mixture-of-Latent-Flow for Pan-Cancer Spatial Gene Expression Prediction from Histology</title>
      <link>https://arxiv.org/abs/2602.02282</link>
      <description>arXiv:2602.02282v2 Announce Type: replace 
Abstract: Inferring spatial transcriptomics (ST) from histology enables scalable histogenomic profiling, yet current methods are largely restricted to single-tissue models. This fragmentation fails to leverage biological principles shared across cancer types and hinders application to data-scarce scenarios. While pan-cancer training offers a solution, the resulting heterogeneity challenges monolithic architectures. To bridge this gap, we introduce MoLF (Mixture-of-Latent-Flow), a generative model for pan-cancer histogenomic prediction. MoLF leverages a conditional Flow Matching objective to map noise to the gene latent manifold, parameterized by a Mixture-of-Experts (MoE) velocity field. By dynamically routing inputs to specialized sub-networks, this architecture effectively decouples the optimization of diverse tissue patterns. Our experiments demonstrate that MoLF establishes a new state of the art, consistently outperforming both specialized and foundation model baselines on pan-cancer benchmarks. Furthermore, MoLF exhibits zero-shot generalization to cross-species data, suggesting that it captures fundamental, conserved histo-molecular mechanisms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.02282v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:journal_reference>Proceedings 43rd International Conference on Machine Learning 2026</arxiv:journal_reference>
      <dc:creator>Susu Hu, Stefanie Speidel</dc:creator>
    </item>
    <item>
      <title>The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors</title>
      <link>https://arxiv.org/abs/2602.02315</link>
      <description>arXiv:2602.02315v2 Announce Type: replace 
Abstract: Large language models (LLMs) form implicit beliefs (posteriors over latent variables) from prompts, but we lack a mechanistic account of how these beliefs are encoded in representation space, how they update with new evidence, and how interventions reshape them. We study a controlled setting in which Llama-3.2 infers the parameters of a normal distribution from in-context samples. We show that parameter posteriors are encoded as curved manifolds in representation space, and trace how they evolve along the prompt. Standard linear steering moves representations off-manifold, inducing unintended, coupled changes, whereas geometry-aware methods preserve the target belief family. Our work demonstrates an example of linear field probing (LFP) as a principled approach to tile the data manifold and make interventions that respect the underlying geometry. Our results suggest that LLM beliefs are inherently geometric objects, and that globally linear representations are often inadequate abstractions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.02315v2</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Raphaël Sarfati, Eric Bigelow, Daniel Wurgaft, Siddharth Boppana, Jack Merullo, Atticus Geiger, Owen Lewis, Tom McGrath, Ekdeep Singh Lubana</dc:creator>
    </item>
    <item>
      <title>Norm Anchors Make Model Edits Last</title>
      <link>https://arxiv.org/abs/2602.02543</link>
      <description>arXiv:2602.02543v3 Announce Type: replace 
Abstract: Sequential Locate-and-Edit (L&amp;E) model editing can fail abruptly after many edits. We identify and formalize this failure as a positive norm-feedback loop, in which solved value vectors and edited MLP weights progressively amplify each other, degrading edit quality and eventually collapsing model capabilities. Our analysis shows that this feedback can yield approximately exponential norm growth under standard L&amp;E dynamics, and can remain unresolved by existing increment-level regularizers or update clamps. We propose Norm-Anchor Scaling (NAS), a plug-in stabilizer that breaks this loop by rescaling each solved value vector to an original-model reference norm. Across multiple LLM backbones, datasets, and L&amp;E editors, NAS extends the usable editing horizon by more than 4x and improves long-run editing performance by 72.2% on average, while preserving single-edit efficacy, with only a one-line modification and negligible computational overhead.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.02543v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mingda Liu, Zhenghan Zhu, Ze'an Miao, Katsuki Fujisawa</dc:creator>
    </item>
    <item>
      <title>Comparison of Trefftz-Based PINNs and Standard PINNs Focusing on Structure Preservation</title>
      <link>https://arxiv.org/abs/2602.02779</link>
      <description>arXiv:2602.02779v3 Announce Type: replace 
Abstract: In this study, we investigate the capability of physics-informed neural networks (PINNs) to preserve global physical structures by comparing standard PINNs with a Trefftz-based PINN (Trefftz-PINN). The target problem is the reproduction of magnetic field-line structures in a helical fusion reactor configuration. Using identical training data sampled from exact solutions, we perform comparisons under matched mean squared error (MSE) levels. Visualization of magnetic field lines reveals that standard PINNs may exhibit structural collapse across magnetic surfaces even when the MSE is sufficiently small, whereas Trefftz-PINNs successfully preserve the global topology of magnetic field lines. Furthermore, the proposed framework is extended to computational fluid dynamics (CFD) problems, where streamline structures of velocity fields are analyzed. Similar tendencies are observed, demonstrating that Trefftz-PINNs provide superior structure preservation compared to standard PINNs. These results indicate that minimizing numerical error alone does not guarantee physical consistency, and that constraining the solution space prior to learning is an effective strategy for physics-consistent surrogate modeling.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.02779v3</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Koji Koyamada</dc:creator>
    </item>
    <item>
      <title>How Does the Lagrangian Guide Safe Reinforcement Learning through Diffusion Models?</title>
      <link>https://arxiv.org/abs/2602.02924</link>
      <description>arXiv:2602.02924v2 Announce Type: replace 
Abstract: Diffusion policy sampling enables reinforcement learning (RL) to represent multimodal action distributions beyond suboptimal unimodal Gaussian policies. However, existing diffusion-based RL methods primarily focus on offline settings for reward maximization, with limited consideration of safety in online settings. To address this gap, we propose Augmented Lagrangian-Guided Diffusion (ALGD), a novel algorithm for off-policy safe RL. By revisiting optimization theory and energy-based models, we show that the instability of primal-dual methods arises from the non-convex Lagrangian landscape. In diffusion-based safe RL, the Lagrangian can be interpreted as an energy function guiding the denoising dynamics. Counterintuitively, using it directly destabilizes both policy generation and training. ALGD resolves this issue by introducing an augmented Lagrangian that locally convexifies the energy landscape, stabilizing policy generation and training without altering the distribution of the optimal policy. Theoretical analysis and extensive experiments demonstrate that ALGD is both theoretically grounded and empirically effective, achieving strong and stable performance across diverse environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.02924v2</guid>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiaoyuan Cheng, Wenxuan Yuan, Boyang Li, Yuanchao Xu, Yiming Yang, Hao Liang, Bei Peng, Robert Loftin, Zhuo Sun, Yukun Hu</dc:creator>
    </item>
    <item>
      <title>Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization</title>
      <link>https://arxiv.org/abs/2602.02958</link>
      <description>arXiv:2602.02958v5 Announce Type: replace 
Abstract: Despite rapid progress in autoregressive video diffusion, an emerging system-algorithm bottleneck limits both deployability and generation capability: KV cache memory. In autoregressive video generation models, the KV cache grows with generation history and quickly dominates GPU memory, often exceeding 30 GB, which prevents deployment on widely available hardware. More critically, constrained KV cache budgets restrict the effective working memory, directly degrading long-horizon consistency in identity, layout, and motion. To address this challenge, we present Quant VideoGen (QVG), a training-free KV cache quantization framework for autoregressive video diffusion models. QVG leverages video spatiotemporal redundancy through Semantic Aware Smoothing, producing low-magnitude, quantization-friendly residuals. It further introduces Progressive Residual Quantization, a coarse-to-fine multi-stage scheme that reduces quantization error while enabling a smooth quality-memory trade-off. Across LongCat Video, HY WorldPlay, and Self Forcing benchmarks, QVG establishes a new Pareto frontier between quality and memory efficiency, reducing KV cache memory by up to 7.0 times with less than 4% end-to-end latency overhead while consistently outperforming existing baselines in generation quality. Code is available at: https://github.com/svg-project/Quant-VideoGen</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.02958v5</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haocheng Xi, Shuo Yang, Yilong Zhao, Muyang Li, Han Cai, Xingyang Li, Yujun Lin, Zhuoyang Zhang, Jintao Zhang, Xiuyu Li, Zhiying Xu, Jun Wu, Chenfeng Xu, Ion Stoica, Song Han, Kurt Keutzer</dc:creator>
    </item>
    <item>
      <title>Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry</title>
      <link>https://arxiv.org/abs/2602.03204</link>
      <description>arXiv:2602.03204v2 Announce Type: replace 
Abstract: While Mixture-of-Experts (MoE) architectures define the state of the art, their theoretical success is often attributed to heuristic efficiency rather than geometric expressivity. In this work, we present the first analysis of MoE through the lens of tropical geometry, establishing that the Top-$k$ routing mechanism is algebraically isomorphic to the $k$-th elementary symmetric tropical polynomial. This isomorphism partitions the input space into the Normal Fan of a Hypersimplex, revealing that \textbf{sparsity is combinatorial depth}, which scales geometric capacity by the binomial coefficient $\binom{N}{k}$. Moving beyond ambient bounds, we introduce the concept of \textit{Effective Capacity} under the Manifold Hypothesis. We prove that while dense networks suffer from capacity collapse on low-dimensional data, MoE architectures exhibit \textit{Combinatorial Resilience}, maintaining high expressivity via the transversality of routing cones. Translating these theoretical bounds into architectural principles, we derive asymptotic capacity limits for optimal expert granularity and prove that shared experts are geometrically necessary to prevent routing collapse.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.03204v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ye Su, Huayi Tang, Zixuan Gong, Yong Liu</dc:creator>
    </item>
    <item>
      <title>Towards Distillation-Resistant Large Language Models: An Information-Theoretic Perspective</title>
      <link>https://arxiv.org/abs/2602.03396</link>
      <description>arXiv:2602.03396v3 Announce Type: replace 
Abstract: Proprietary large language models (LLMs) embody substantial economic value and are generally exposed only as black-box APIs, yet adversaries can still exploit their outputs to extract knowledge via distillation. Existing defenses focus exclusively on text-based distillation, leaving the important setting of logit-based distillation largely unexplored. In this work, we analyze this problem and present an effective solution from an information-theoretic perspective. We characterize distillation-relevant information in teacher outputs using the conditional mutual information (CMI) between teacher logits and input queries, conditioned on ground-truth labels. This quantity captures contextual information beneficial for model extraction, motivating us to defend against distillation via CMI minimization. Guided by our theoretical analysis, we propose learning a transformation matrix that purifies the original outputs to enhance distillation resistance. We further derive a CMI-inspired anti-distillation objective to optimize this transformation, which effectively removes distillation-relevant information while preserving output utility. Extensive experiments across multiple LLMs and strong distillation algorithms demonstrate that the proposed method significantly degrades distillation performance while preserving task accuracy, effectively protecting models' intellectual property.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.03396v3</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hao Fang, Tianyi Zhang, Tianqu Zhuang, Jiawei Kong, Kuofeng Gao, Bin Chen, Leqi Zheng, Shu-Tao Xia, Ke Xu</dc:creator>
    </item>
    <item>
      <title>Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing</title>
      <link>https://arxiv.org/abs/2602.03452</link>
      <description>arXiv:2602.03452v2 Announce Type: replace 
Abstract: Reinforcement learning with verifiable rewards (RLVR) is effective for training large language models on reasoning tasks with deterministic outcomes. Prior work shows that RLVR works with few prompts, but prompt selection is often based only on training-accuracy variance, leading to unstable optimization directions and weaker transfer. We revisit prompt selection from a mechanism-level view and argue that an effective minibatch should provide both (i) a reliable positive anchor and (ii) explicit negative learning signals from rare failures. Based on this principle, we propose \emph{positive--negative pairing}: at each update, we sample a hard-but-solvable prompt $q^{+}$ and an easy-but-brittle prompt $q^{-}$ (high success rate but not perfect), characterized respectively by low and high empirical success rates under multiple rollouts. We further introduce Weighted GRPO, which reweights binary outcomes at the pair level and uses group-normalized advantages to amplify rare successes on $q^{+}$ into sharp positive guidance while turning rare failures on $q^{-}$ into strong negative penalties. This bidirectional signal provides informative learning feedback for both successes and failures, improving sample efficiency without suppressing exploration. On Qwen2.5-Math-7B, a single paired minibatch per update consistently outperforms a GRPO baseline that selects two prompts via commonly used variance-based selection heuristics: AIME 2025 Pass@8 improves from 16.8 to 22.2, and AMC23 Pass@64 from 94.0 to 97.0, while remaining competitive with large-scale RLVR trained on a pool of 1209 prompts. Similar gains are observed on Qwen2.5-Math-7B-Instruct.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.03452v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yujuan Pang, Jiaxin Li, Xin Sheng, Ran Peng, Yong Ma</dc:creator>
    </item>
    <item>
      <title>KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning</title>
      <link>https://arxiv.org/abs/2602.04129</link>
      <description>arXiv:2602.04129v2 Announce Type: replace 
Abstract: Heterogeneous multi-robot systems are increasingly used in long-horizon missions requiring coordinated planning across diverse capabilities. However, existing planning approaches struggle to construct accurate symbolic representations and maintain plan consistency in dynamic environments. Classical PDDL planners require manually crafted symbolic models, while LLM-based planners often ignore agent heterogeneity and environmental uncertainty. We introduce KGLAMP, a knowledge-graph-guided LLM planning framework for heterogeneous multi-robot teams. The framework maintains a structured knowledge graph encoding object relations, spatial reachability, and robot capabilities, which guides the LLM in generating accurate PDDL problem specifications. The knowledge graph serves as a persistent, dynamically updated memory that incorporates new observations and triggers replanning upon detecting inconsistencies, enabling symbolic plans to adapt to evolving world states. Experiments on the MAT-THOR benchmark show that KGLAMP improves performance by at least 25.3% over both LLM-only and PDDL-based variants.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.04129v2</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.ET</category>
      <category>cs.MA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Chak Lam Shek, Faizan M. Tariq, Sangjae Bae, David Isele, Piyush Gupta</dc:creator>
    </item>
    <item>
      <title>DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training</title>
      <link>https://arxiv.org/abs/2602.05890</link>
      <description>arXiv:2602.05890v2 Announce Type: replace 
Abstract: Training reinforcement learning (RL) systems in real-world environments remains challenging due to noisy supervision and poor out-of-domain (OOD) generalization, especially in LLM post-training. Recent distributional RL methods improve robustness by modeling values with multiple quantile points, but they still learn each quantile independently as a scalar. This yields coarse-grained value representations that lack fine-grained conditioning on state information and struggle under complex and OOD conditions. We propose DFPO (Distributional Value Flow Policy Optimization with Conditional Risk and Consistency Control), a robust distributional RL framework that models values as continuous flows across time steps. By scaling value modeling through learning a value flow field instead of isolated quantile predictions, DFPO captures richer state information for more accurate advantage estimation. To stabilize training under noisy feedback, DFPO further integrates conditional risk control and consistency constraints along value flow trajectories. Experiments on dialogue, math reasoning, and scientific tasks show that DFPO outperforms PPO, FlowRL, and other robust baselines under noisy supervision, achieving improved training stability and generalization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.05890v2</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dingwei Zhu, Zhiheng Xi, Shihan Dou, Jiahan Li, Chenhao Huang, Junjie Ye, Sixian Li, Mingxu Chai, Yuhui Wang, Yajie Yang, Ming Zhang, Jiazheng Zhang, Shichun Liu, Caishuang Huang, Yunke Zhang, Yuran Wang, Tao Gui, Xipeng Qiu, Qi Zhang, Xuanjing Huang</dc:creator>
    </item>
    <item>
      <title>Calibrating Tabular Anomaly Detection via Optimal Transport</title>
      <link>https://arxiv.org/abs/2602.06810</link>
      <description>arXiv:2602.06810v2 Announce Type: replace 
Abstract: Tabular anomaly detection (TAD) remains challenging due to the heterogeneity of tabular data: features lack natural relationships, vary widely in distribution and scale, and exhibit diverse types. Consequently, each TAD method makes implicit assumptions about anomaly patterns that work well on some datasets but fail on others, and no method consistently outperforms the others across diverse scenarios. We present CTAD (Calibrating Tabular Anomaly Detection), a model-agnostic post-processing framework that enhances any existing TAD detector through sample-specific calibration. Our approach characterizes normal data via two complementary distributions, i.e., an empirical distribution from random sampling and a structural distribution from K-means centroids, and measures how adding a test sample disrupts their compatibility using Optimal Transport (OT) distance. Normal samples cause low disruption while anomalies cause high disruption, providing a calibration signal to amplify detection. We prove that OT distance has a lower bound proportional to the test sample's distance from the centroids, and establish that anomalies systematically receive higher calibration scores than normal samples in expectation, explaining why the method generalizes across datasets. Extensive experiments on 34 diverse tabular datasets with 7 representative detectors spanning all major TAD categories (density estimation, classification, reconstruction, and isolation-based methods) demonstrate that CTAD consistently improves performance with statistical significance. Remarkably, CTAD enhances even state-of-the-art deep learning methods and shows robust performance across diverse hyperparameter settings, requiring no additional tuning for practical deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.06810v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hangting Ye, He Zhao, Wei Fan, Xiaozhuang Song, Dandan Guo, Yi Chang, Hongyuan Zha</dc:creator>
    </item>
    <item>
      <title>Uncovering Cross-Objective Interference in Multi-Objective Alignment</title>
      <link>https://arxiv.org/abs/2602.06869</link>
      <description>arXiv:2602.06869v2 Announce Type: replace 
Abstract: We study a persistent failure mode in multi-objective alignment for large language models (LLMs): training improves performance on only a subset of objectives while causing others to degrade. We formalize this phenomenon as cross-objective interference and conduct the first systematic study across scalarization algorithms, showing that interference is pervasive and exhibits strong model dependence. To explain this phenomenon, we derive a local covariance law showing that an objective improves when its reward exhibits positive covariance with the scalarized score. We extend this analysis to the clipped surrogate objectives used in modern alignment, demonstrating that the covariance law remains valid under mild conditions despite clipping. Building on this analysis, we propose Covariance Targeted Weight Adaptation (CTWA), a plug-and-play method that maintains positive covariance between objective rewards and the training signal to effectively mitigate cross-objective interference. Finally, we complement these local improvement conditions with a global convergence analysis under the Polyak-Łojasiewicz condition, establishing when non-convex scalarized optimization achieves global convergence and how cross-objective interference depends on specific geometric properties of the model.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.06869v2</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yining Lu, Meng Jiang</dc:creator>
    </item>
    <item>
      <title>HistoMet: A Pan-Cancer Deep Learning Framework for Prognostic Prediction of Metastatic Progression and Site Tropism from Primary Tumor Histopathology</title>
      <link>https://arxiv.org/abs/2602.07608</link>
      <description>arXiv:2602.07608v2 Announce Type: replace 
Abstract: Metastatic progression remains the leading cause of cancer-related mortality, yet predicting directly from histopathology whether a primary tumor will metastasize, and where it will disseminate, remains a fundamental challenge. Although whole-slide images (WSIs) provide rich morphological information, prior computational pathology approaches typically address metastatic status or site prediction as isolated tasks, and do not explicitly model the clinically sequential decision process of metastatic risk assessment followed by downstream site-specific evaluation. To address this research gap, we present HistoMet, a decision-aware, concept-aligned multiple-instance learning (MIL) framework for prognostic metastatic outcome prediction from primary tumor WSIs. Our proposed framework adopts a two-module prediction pipeline that first estimates the likelihood of metastatic progression from the primary tumor and then conditionally predicts the metastatic site for high-risk cases. To guide representation learning and improve clinical interpretability, our framework integrates linguistically defined and data-adaptive metastatic concepts through a pretrained pathology vision-language model. We evaluate HistoMet on a multi-institutional pan-cancer cohort of 6504 patients with metastasis follow-up and site annotations. Under clinically relevant high-sensitivity screening settings (95 percent sensitivity), HistoMet significantly reduces downstream workload while maintaining high metastatic risk recall. Conditional on metastatic cases, HistoMet achieves a macro F1 of 74.6 (standard deviation 1.3) and a macro one-vs-rest AUC of 92.1. These results demonstrate that explicitly modeling clinical decision structure enables robust and deployable prognostic prediction of metastatic progression and site tropism directly from primary tumor histopathology.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.07608v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yixin Chen, Ziyu Su, Lingbin Meng, Elshad Hasanov, Wei Chen, Anil Parwani, M. Khalid Khan Niazi</dc:creator>
    </item>
    <item>
      <title>SWE Context Bench: A Benchmark for Context Learning in Coding</title>
      <link>https://arxiv.org/abs/2602.08316</link>
      <description>arXiv:2602.08316v3 Announce Type: replace 
Abstract: Large language models are increasingly used as coding agents for software engineering tasks. Current benchmarks mainly evaluate whether the agent can correctly resolve a request or fix a bug. They largely treat tasks as independent and do not assess whether agents can reuse previous experience across related problems. As a result, the efficiency gains from reusing previous experience remain difficult to measure. We introduce SWE-ContextBench, a benchmark designed to explicitly evaluate context understanding and retrieval in coding agents. SWE-ContextBench consists of 1,100 base tasks and 376 related tasks derived from real dependency and reference relationships among GitHub issues and pull requests. SWE-ContextBench groups base tasks and related tasks with shared context across 51 unique repositories and 9 programming languages. The benchmark evaluates how accurately and efficiently agents solve related issues when prior cases are available in context. Using SWE-ContextBench, we study the behavior of multiple coding agents across varying context reuse settings and retrieval strategies. Our results show that accurately summarized and retrieved previous experience can significantly improve resolution accuracy and reduce runtime and token cost, particularly on harder tasks. In contrast, unfiltered or incorrectly selected context provides limited or negative benefits. These findings highlight the importance of context management and retrieval accuracy, and position SWE-ContextBench as a principled benchmark for studying context learning in coding agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.08316v3</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiayuan Zhu, Junde Wu, Minhao Hu, Shengda Zhu, Jiazhen Pan, Weixiang Shen, Yijun Yang, Fenglin Liu, Jianye Hao, Yueming Jin, Qirong Ho, Min Xu</dc:creator>
    </item>
    <item>
      <title>Hybrid Pooling with LLMs via Relevance Context Learning</title>
      <link>https://arxiv.org/abs/2602.08457</link>
      <description>arXiv:2602.08457v2 Announce Type: replace 
Abstract: High-quality relevance judgements over large query sets are essential for evaluating Information Retrieval (IR) systems, yet manual annotation remains costly and time-consuming. Large Language Models (LLMs) have recently shown promise as automatic relevance assessors, but their reliability is still limited. Most existing approaches rely on zero-shot prompting or in-context learning (ICL) with a small number of labelled examples. However, standard ICL treats examples as independent instances and fails to explicitly capture the underlying relevance criteria of a topic, restricting its ability to generalise to unseen query-document pairs. To address this limitation, we introduce Relevance Context Learning (RCL), a novel framework that leverages human relevance judgements to explicitly model topic-specific relevance criteria. Rather than directly using labelled examples for in-context prediction, RCL first prompts an LLM (Instructor LLM) to analyse sets of judged query-document pairs and generate explicit narratives that describe what constitutes relevance for a given topic. These relevance narratives are then used as structured prompts to guide a second LLM (Assessor LLM) in producing relevance judgements. To evaluate RCL in a realistic data collection setting, we propose a hybrid pooling strategy in which a shallow depth-k pool from participating systems is judged by human assessors, while the remaining documents are labelled by LLMs. Experimental results demonstrate that RCL substantially outperforms zero-shot prompting and consistently improves over standard ICL. Overall, our findings indicate that transforming relevance examples into explicit, context-aware relevance narratives is a more effective way of exploiting human judgements for LLM-based IR dataset construction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.08457v2</guid>
      <category>cs.IR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>David Otero, Javier Parapar</dc:creator>
    </item>
    <item>
      <title>UniComp: A Unified Evaluation of Large Language Model Compression via Pruning, Quantization and Distillation</title>
      <link>https://arxiv.org/abs/2602.09130</link>
      <description>arXiv:2602.09130v4 Announce Type: replace 
Abstract: Model compression is increasingly essential for deploying large language models (LLMs), yet existing comparative studies largely focus on pruning and quantization, evaluated primarily on knowledge-centric benchmarks. We therefore introduce UniComp, a unified evaluation framework for comparing pruning, quantization, and knowledge distillation. UniComp evaluates compressed models along three dimensions (performance, reliability, and efficiency) using a diverse set of capability- and safety-oriented benchmarks together with a hardware-aware efficiency analysis. Through evaluation of six compression techniques across 40 datasets, we observe (i) a consistent knowledge bias, where factual recall is largely preserved while multi-step reasoning, multilingual, and instruction-following capabilities degrade; (ii) a decoupling between performance and reliability, indicating that retained performance does not consistently imply preserved reliability; and (iii) that task-specific calibration can yield up to a 50% relative improvement in reasoning performance for pruned models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.09130v4</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jonathan von Rad, Yong Cao, Andreas Geiger</dc:creator>
    </item>
    <item>
      <title>Cross-Dataset Linkage of Brain MRI using Image Similarity Measures</title>
      <link>https://arxiv.org/abs/2602.10043</link>
      <description>arXiv:2602.10043v2 Announce Type: replace 
Abstract: Head magnetic resonance imaging (MRI) data are routinely collected and shared for research under strict regulatory frameworks that require the removal of direct identifiers prior to data release. However, even after skull stripping, brain parenchyma may retain participant-specific features that enable linkage of scans acquired from the same individual across datasets, posing a potential privacy risk when combined with auxiliary information. Current regulatory approaches typically assess such risks using qualitative notions of reasonableness. Although prior work has suggested that brain MRI can support subject linkage, existing demonstrations have relied on training-based or computationally intensive methods.
  Here, we show that reliable linkage of skull-stripped T1-weighted brain MRI is possible using standard preprocessing pipelines followed by direct image similarity computations. Using this simple approach, we achieve near-perfect matching accuracy across datasets acquired at different time points, with varying scanner types, spatial resolutions, and acquisition protocols, and even in the presence of cognitive decline. These experiments simulate realistic scenarios of cross-database matching in large-scale neuroimaging repositories. Our findings highlight a previously underappreciated re-identification risk in shared brain MRI data and provide empirical evidence relevant to the development of informed, forward-looking data-sharing policies in neuroimaging research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.10043v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Gaurang Sharma, Harri Polonen, Juha Pajula, Jutta Suksi, Jussi Tohka</dc:creator>
    </item>
    <item>
      <title>Photons x Force: Differentiable Radiation Pressure Modeling</title>
      <link>https://arxiv.org/abs/2602.10712</link>
      <description>arXiv:2602.10712v2 Announce Type: replace 
Abstract: We propose a system to optimize parametric designs subject to radiation pressure, i.e., the effect of light on the motion of objects. This is most relevant in spacecraft design: beyond approximately 800 km altitude, radiation pressure is the dominant non-conservative force. Despite its importance, the high computational cost of high-fidelity radiation pressure modeling has limited its use in large-scale spacecraft design, optimization, and space situational awareness applications. We address this with three innovations, in simulation, representation, and optimization. First, a practical, computer-graphics-inspired Monte Carlo (MC) simulation of radiation pressure. The simulation is highly parallel, uses importance sampling and next-event estimation to reduce variance, and can simulate an entire family of designs instead of a single spacecraft as in previous work. Second, we introduce neural networks to represent forces as a function of design parameters. This neural proxy model, learned from simulations, is inherently differentiable and can query forces orders of magnitude faster than a full MC simulation. Third, and finally, we demonstrate inverse radiation-pressure design, such as finding geometry, material, or operation parameters that minimize travel time, maximize proximity to a desired end-point, minimize thruster fuel, train mission control policies, or allocate compute budget for extraterrestrial computing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.10712v2</guid>
      <category>cs.GR</category>
      <category>astro-ph.EP</category>
      <category>astro-ph.IM</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1145/3811396</arxiv:DOI>
      <arxiv:journal_reference>ACM Transactions on Graphics, Vol. 45, No. 4, Article 82 (July 2026)</arxiv:journal_reference>
      <dc:creator>Charles Constant, Elizabeth Bates, Santosh Bhattarai, Marek Ziebart, Tobias Ritschel</dc:creator>
    </item>
    <item>
      <title>Pack it in: Packing into Partially Filled Containers Through Contact</title>
      <link>https://arxiv.org/abs/2602.12095</link>
      <description>arXiv:2602.12095v3 Announce Type: replace 
Abstract: The automation of warehouse operations is crucial for improving productivity and reducing human exposure to hazardous environments. One operation frequently performed in warehouses is bin-packing, where items need to be placed into containers, either for delivery to a customer or for temporary storage in the warehouse. Whilst prior bin-packing works have largely focused on packing items into empty containers and have adopted collision-free strategies, containers are often already partially filled with items, frequently in suboptimal arrangements due to transport around the warehouse. This paper presents a contact-aware packing approach that exploits purposeful interactions with previously placed objects to create free space and enable successful placement of new items. This is achieved by using a contact-based multi-object trajectory optimizer within a model predictive controller, integrated with a physics-aware perception system that estimates object poses even during inevitable occlusions, and a method that suggests physically feasible locations to place the object inside the container.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.12095v3</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>David Russell, Zisong Xu, Maximo A. Roa, Mehmet Dogar</dc:creator>
    </item>
    <item>
      <title>SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise</title>
      <link>https://arxiv.org/abs/2602.12783</link>
      <description>arXiv:2602.12783v2 Announce Type: replace 
Abstract: Spoken query retrieval is an important interaction mode in modern information retrieval. However, existing evaluation datasets are often limited to simple queries under constrained noise conditions, making them inadequate for assessing the robustness of spoken query retrieval systems under complex acoustic perturbations. To address this limitation, we present SQuTR, a robustness benchmark for spoken query retrieval that includes a large-scale dataset and a unified evaluation protocol. SQuTR aggregates 37,317 unique queries from six commonly used English and Chinese text retrieval datasets, spanning multiple domains and diverse query types. We synthesize speech using voice profiles from 200 real speakers and mix 17 categories of real-world environmental noise under controlled SNR levels, enabling reproducible robustness evaluation from quiet to highly noisy conditions. Under the unified protocol, we conduct large-scale evaluations on representative cascaded and end-to-end retrieval systems. Experimental results show that retrieval performance decreases as noise increases, with substantially different drops across systems. Even large-scale retrieval models struggle under extreme noise, indicating that robustness remains a critical bottleneck. Overall, SQuTR provides a reproducible testbed for benchmarking and diagnostic analysis, and facilitates future research on robustness in spoken query to text retrieval.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.12783v2</guid>
      <category>cs.IR</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuejie Li, Ke Yang, Yueying Hua, Berlin Chen, Jianhao Nie, Yueping He, Caixin Kang</dc:creator>
    </item>
    <item>
      <title>BEAGLE: Behavior-Enforced Agent for Grounded Learner Emulation</title>
      <link>https://arxiv.org/abs/2602.13280</link>
      <description>arXiv:2602.13280v2 Announce Type: replace 
Abstract: Simulating student learning behaviors in open-ended problem-solving environments holds potential for education research, from training adaptive tutoring systems to stress-testing pedagogical interventions. However, collecting authentic data is challenging due to privacy concerns and the high cost of longitudinal studies. While Large Language Models (LLMs) offer a promising path to student simulation, they suffer from competency bias, optimizing for efficient correctness rather than the erratic, iterative struggle characteristic of novice learners. We present BEAGLE, a neuro-symbolic framework that addresses this bias by incorporating Self-Regulated Learning (SRL) theory into a novel architecture. BEAGLE integrates three key technical innovations: (1) a semi-Markov model that governs the timing and transitions of cognitive and metacognitive behaviors; (2) Bayesian Knowledge Tracing with explicit flaw injection to enforce realistic knowledge gaps and "unknown unknowns"; and (3) a decoupled agent design that separates high-level strategy use from code generation actions to prevent the model from silently correcting its own intentional errors. In evaluations on Python programming tasks, BEAGLE significantly outperforms state-of-the-art baselines in reproducing authentic trajectories. In a human Turing test, participants could not reliably tell BEAGLE traces apart from real student data: classification accuracy was statistically equivalent to chance (52.8%, d' = 0.15, N = 71).</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.13280v2</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hanchen David Wang, Clayton Cohn, Zifan Xu, Siyuan Guo, Gautam Biswas, Meiyi Ma</dc:creator>
    </item>
    <item>
      <title>Advancing Analytic Class-Incremental Learning through Vision-Language Calibration</title>
      <link>https://arxiv.org/abs/2602.13670</link>
      <description>arXiv:2602.13670v2 Announce Type: replace 
Abstract: Class-incremental learning (CIL) with pre-trained models (PTMs) faces a critical trade-off between efficient adaptation and long-term stability. While analytic learning enables rapid, recursive closed-form updates, its efficacy is often compromised by accumulated errors and feature incompatibility. In this paper, we first conduct a systematic study to dissect the failure modes of PTM-based analytic CIL, identifying representation rigidity as the primary bottleneck. Motivated by this insight, we propose VILA, a novel dual-branch framework that advances analytic CIL via a two-level vision-language calibration strategy. Specifically, we coherently fuse plastic, task-adapted features with a frozen, universal visual anchor at the feature level through geometric calibration, and leverage cross-modal semantic priors at the decision level to rectify prediction bias. This confluence maintains analytic learning's extreme efficiency while overcoming its inherent brittleness. Extensive experiments across eight benchmarks demonstrate that VILA consistently yields superior performance, particularly in fine-grained and long-sequence scenarios. Our framework harmonizes high-fidelity prediction with the simplicity of analytic learning. Our code is available at https://github.com/byzhaoAI/VILA.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.13670v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Binyu Zhao, Wei Zhang, Xingrui Yu, Zhaonian Zou, Ivor Tsang</dc:creator>
    </item>
    <item>
      <title>The Implicit Curriculum: Learning Dynamics in RL with Verifiable Rewards</title>
      <link>https://arxiv.org/abs/2602.14872</link>
      <description>arXiv:2602.14872v2 Announce Type: replace 
Abstract: Reinforcement learning with verifiable rewards (RLVR) has been a major driver of recent breakthroughs in large reasoning models. Yet it remains a mystery how rewards based solely on final outcomes can help overcome the long-horizon barrier to extended reasoning. To understand this, we develop a theory of the training dynamics of RLVR for transformers on compositional reasoning tasks. Our theory shows that mixed-difficulty training naturally follows an implicit curriculum: without any explicit schedule, easier problems become learnable first and shape the frontier for harder ones, creating a learning progression from easy to hard during optimization. The effectiveness of this curriculum is governed by the smoothness of the difficulty spectrum. When the spectrum is smooth, training enters a well-behaved relay regime, in which persistent gradient signals on easier problems make slightly harder ones tractable and keep training at the edge of competence. When the spectrum contains abrupt discontinuities, training undergoes grokking-type phase transitions with prolonged plateaus before progress resumes. As a technical contribution, our analysis develops and adapts techniques from Fourier analysis on finite groups to our setting. We validate the predicted mechanisms empirically via synthetic experiments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.14872v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>math.OC</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yu Huang, Zixin Wen, Yuejie Chi, Yuting Wei, Aarti Singh, Yingbin Liang, Yuxin Chen</dc:creator>
    </item>
    <item>
      <title>DistributedEstimator: Distributed Training of Quantum Neural Networks via Circuit Cutting</title>
      <link>https://arxiv.org/abs/2602.16233</link>
      <description>arXiv:2602.16233v2 Announce Type: replace 
Abstract: Circuit cutting decomposes a large quantum circuit into smaller subcircuits whose outputs are classically reconstructed to recover original expectation values. While prior work characterises cutting overhead via subcircuit counts and sampling complexity, its end-to-end impact on iterative, estimator-driven training pipelines remains insufficiently measured from a systems perspective. We propose DistributedEstimator, a cut-aware estimator execution pipeline that treats circuit cutting as a staged distributed workload, instrumenting each query across four phases: partitioning, subexperiment generation, parallel execution, and classical reconstruction. Using logged runtime traces and learning outcomes on two binary classification workloads (Iris and MNIST), we quantify cutting overheads, scaling limits, and sensitivity to injected stragglers, and evaluate whether accuracy and robustness are preserved under matched training budgets. Reconstruction dominates per-query time -- reaching a median of 53% and 95th percentile of 58% at three cuts -- bounding achievable speed-up under parallelism. Despite these overheads, test accuracy is fully preserved on Iris and maintained without systematic degradation on MNIST across all cut configurations. Robustness under Gaussian noise and FGSM perturbations is similarly preserved, with several cut configurations exhibiting comparable or improved robustness relative to the uncut baseline. Exponential growth of subexperiment counts (${O}(9^c)$ for CNOT-based decomposition) is a fundamental barrier limiting practical experimentation to small qubit counts. These results establish that practical scaling for learning workloads requires reducing and overlapping reconstruction, scheduling policies for barrier-dominated critical paths, and computationally efficient reconstruction strategies for larger qubit counts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.16233v2</guid>
      <category>cs.DC</category>
      <category>cs.LG</category>
      <category>quant-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Prabhjot Singh, Adel N. Toosi, Rajkumar Buyya</dc:creator>
    </item>
    <item>
      <title>The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems</title>
      <link>https://arxiv.org/abs/2602.17753</link>
      <description>arXiv:2602.17753v2 Announce Type: replace 
Abstract: Agentic AI systems are increasingly capable of performing professional and personal tasks with limited human involvement. However, tracking these developments is difficult because the AI agent ecosystem is complex, rapidly evolving, and inconsistently documented, posing obstacles to both researchers and policymakers. To address these challenges, this paper presents the 2025 AI Agent Index. The Index documents information regarding the origins, design, capabilities, ecosystem, and safety features of 30 state-of-the-art AI agents based on publicly available information and email correspondence with developers. In addition to documenting information about individual agents, the Index illuminates broader trends in the development of agents, their capabilities, and the level of transparency of developers. Notably, we find different transparency levels among agent developers and observe that most developers share little information about safety, evaluations, and societal impacts. The 2025 AI Agent Index is available online at https://aiagentindex.mit.edu</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.17753v2</guid>
      <category>cs.CY</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Leon Staufer, Kevin Feng, Kevin Wei, Luke Bailey, Yawen Duan, Mick Yang, A. Pinar Ozisik, Stephen Casper, Noam Kolt</dc:creator>
    </item>
    <item>
      <title>Back to Blackwell: Closing the Loop on Intransitivity in Multi-Objective Preference Fine-Tuning</title>
      <link>https://arxiv.org/abs/2602.19041</link>
      <description>arXiv:2602.19041v2 Announce Type: replace 
Abstract: A recurring challenge in preference fine-tuning (PFT) is handling $\textit{intransitive}$ (i.e., cyclic) preferences. Intransitive preferences often stem from either $\textit{(i)}$ inconsistent rankings along a single objective or $\textit{(ii)}$ scalarizing multiple objectives into a single metric. Regardless of their source, the downstream implication of intransitive preferences is the same: there is no well-defined optimal policy, breaking a core assumption of the standard PFT pipeline. In response, we propose a novel, game-theoretic solution concept, the $\textit{Maximum Entropy Blackwell Winner}$ ($\textit{MaxEntBW}$), that is well-defined under multi-objective intransitive preferences. To enable computing MaxEntBWs at scale, we derive $\texttt{PROSPER}$: a provably efficient PFT algorithm. Unlike prior self-play techniques, $\texttt{PROSPER}$ directly handles multiple objectives without requiring scalarization. We then apply $\texttt{PROSPER}$ to the problem of fine-tuning large language models (LLMs) from multi-objective LLM-as-a-Judge feedback (e.g., rubric-based judges), a setting where both sources of intransitivity arise. We find that $\texttt{PROSPER}$ outperforms all baselines considered across both instruction following and general chat benchmarks, releasing trained model checkpoints at the 7B and 3B parameter scales.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.19041v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiahao Zhang, Lujing Zhang, Keltin Grimes, Zhuohao Yu, Gokul Swamy, Zhiwei Steven Wu</dc:creator>
    </item>
    <item>
      <title>Denoising Particle Filters: Learning State Estimation with Single-Step Objectives</title>
      <link>https://arxiv.org/abs/2602.19651</link>
      <description>arXiv:2602.19651v2 Announce Type: replace 
Abstract: Learning-based methods commonly treat state estimation in robotics as a sequence modeling problem. While this paradigm can be effective at maximizing end-to-end performance, models are often difficult to interpret and expensive to train, since training requires unrolling sequences of predictions in time. As an alternative to end-to-end trained state estimation, we propose a novel particle filtering algorithm in which models are trained from individual state transitions, fully exploiting the Markov property in robotic systems. In this framework, measurement models are learned implicitly by minimizing a denoising score matching objective. At inference, the learned denoiser is used alongside a (learned) dynamics model to approximately solve the Bayesian filtering equation at each time step, effectively guiding predicted states toward the data manifold informed by measurements. We evaluate the proposed method on challenging robotic state estimation tasks in simulation, demonstrating competitive performance compared to tuned end-to-end trained baselines. Importantly, our method offers the desirable composability of classical filtering algorithms, allowing prior information and external sensor models to be incorporated without retraining.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.19651v2</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lennart Röstel, Berthold Bäuml</dc:creator>
    </item>
    <item>
      <title>Meta-Learning and Meta-Reinforcement Learning -- Tracing the Path towards DeepMind's Adaptive Agent</title>
      <link>https://arxiv.org/abs/2602.19837</link>
      <description>arXiv:2602.19837v3 Announce Type: replace 
Abstract: Humans are highly effective at utilizing prior knowledge to adapt to novel tasks, a capability that standard machine learning models struggle to replicate due to their reliance on task-specific training. Meta-learning overcomes this limitation by allowing models to acquire transferable knowledge from various tasks, enabling rapid adaptation to new challenges with minimal data. This survey provides a rigorous, task-based formalization of meta-learning and meta-reinforcement learning and uses that paradigm to chronicle the landmark algorithms that paved the way for DeepMind's Adaptive Agent, consolidating the essential concepts needed to understand the Adaptive Agent and other generalist approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.19837v3</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Björn Hoppmann, Christoph Scholz</dc:creator>
    </item>
    <item>
      <title>Manifold of Failure: Behavioral Attraction Basins in Language Models</title>
      <link>https://arxiv.org/abs/2602.22291</link>
      <description>arXiv:2602.22291v3 Announce Type: replace 
Abstract: While prior work has focused on projecting adversarial examples back onto the manifold of natural data to restore safety, we argue that a comprehensive understanding of AI safety requires characterizing the unsafe regions themselves. This paper introduces a framework for systematically mapping the Manifold of Failure in Large Language Models (LLMs). We reframe the search for vulnerabilities as a quality diversity problem, using MAP-Elites to illuminate the continuous topology of these failure regions, which we term behavioral attraction basins. Our quality metric, Alignment Deviation, guides the search towards areas where the model's behavior diverges most from its intended alignment. Across three LLMs (Llama-3-8B, GPT-OSS-20B, and GPT-5-Mini), we show that MAP-Elites achieves up to 63% behavioral coverage, discovers up to 370 distinct vulnerability niches, and reveals dramatically different model-specific topological signatures: Llama-3-8B exhibits a near-universal vulnerability plateau (mean Alignment Deviation 0.93), GPT-OSS-20B shows a fragmented landscape with spatially concentrated basins (mean 0.73), and GPT-5-Mini demonstrates strong robustness with a ceiling at 0.50. Our approach produces interpretable, global maps of each model's safety landscape that no existing attack method (GCG, PAIR, or TAP) can provide, shifting the paradigm from finding discrete failures to understanding their underlying structure.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.22291v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sarthak Munshi, Manish Bhatt, Vineeth Sai Narajala, Idan Habler, Ammar Al-Kahfah, Ken Huang, Blake Gatto</dc:creator>
    </item>
    <item>
      <title>Helmlab: A Two-Space Family of Analytical, Data-Driven Color Spaces for UI Design Systems</title>
      <link>https://arxiv.org/abs/2602.23010</link>
      <description>arXiv:2602.23010v3 Announce Type: replace 
Abstract: We present Helmlab, a family of two purpose-built color spaces for UI design systems sharing a common 11-stage analytical structure: MetricSpace, a 72-parameter space optimized for color-difference prediction, and GenSpace, a 44-parameter space optimized for gradient and palette generation. The forward transform maps CIE XYZ to a perceptually-organized Lab representation through learned matrices, per-channel power compression, Fourier hue correction, and embedded Helmholtz-Kohlrausch lightness adjustment. A post-pipeline neutral correction holds gray-axis chroma below 1e-5 on a 21-step ramp, and a rigid rotation of the chromatic plane improves hue-angle alignment without affecting the distance metric (which is invariant under isometries).
  On COMBVD (3,813 color pairs), MetricSpace v21 achieves STRESS 22.48, a 23 percent reduction from CIEDE2000 (29.20). On the held-out MacAdam 1974 dataset it scores 19.51 (CIEDE2000: 22.13; CAM16-UCS leads at 18.71). On a self-collected 3,552-judgement screen-condition set it scores 23.26 vs 62.54 for CIEDE2000. On the academic He et al. 2022 dataset (82 3D-printed pairs), MetricSpace scores 35.9 vs 32.6 for CIEDE2000, a regression we own. Averaging the three primary datasets, MetricSpace scores 21.75 vs the next-best baseline CIECAM02-UCS at 35.98.
  GenSpace v0.11.1 trades distance accuracy for generation quality: on a 90-metric, 3,038-pair gradient/palette benchmark across sRGB, P3, and Rec.2020, it wins 65 of 90 vs OKLab. The transform is invertible with round-trip errors below 1e-13. Production implementations ship on PyPI, npm, Color.js (PR 722, merged), and as a PostCSS plugin.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.23010v3</guid>
      <category>cs.GR</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Gorkem Yildiz</dc:creator>
    </item>
    <item>
      <title>France or Spain or Germany or France: A Neural Account of Non-Redundant Redundant Disjunctions</title>
      <link>https://arxiv.org/abs/2602.23547</link>
      <description>arXiv:2602.23547v2 Announce Type: replace 
Abstract: Sentences like "She will go to France or Spain, or perhaps to Germany or France." appear formally redundant, yet become acceptable in contexts such as "Mary will go to a philosophy program in France or Spain, or a mathematics program in Germany or France." While this phenomenon has typically been analyzed using symbolic formal representations, we aim to provide an account grounded in artificial neural mechanisms. We first present new behavioral evidence from humans and large language models demonstrating the robustness of this apparent non-redundancy across contexts. We then show that, in language models, redundancy avoidance arises from two interacting mechanisms: models learn to bind contextually relevant information to repeated lexical items, and Transformer induction heads selectively attend to these context-licensed representations. We argue that this neural explanation sheds light on the mechanisms underlying context-sensitive semantic interpretation, and that it complements existing symbolic analyses.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.23547v2</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sasha Boguraev, Qing Yao, Kyle Mahowald</dc:creator>
    </item>
    <item>
      <title>ArtiFixer: Enhancing and Extending 3D Reconstruction with Auto-Regressive Diffusion Models</title>
      <link>https://arxiv.org/abs/2603.00492</link>
      <description>arXiv:2603.00492v2 Announce Type: replace 
Abstract: Per-scene optimization methods such as 3D Gaussian Splatting provide state-of-the-art novel view synthesis quality but extrapolate poorly to under-observed areas. Methods that leverage generative priors to correct artifacts in these areas hold promise but currently suffer from two shortcomings. The first is scalability, as existing methods use image diffusion models or bidirectional video models that are limited in the number of views they can generate in a single pass (and thus require a costly iterative distillation process for consistency). The second is quality itself, as generators used in prior work tend to produce outputs that are inconsistent with existing scene content and fail entirely in completely unobserved regions. To address both shortcomings, we propose a two-stage pipeline that leverages two key insights. First, we train a powerful bidirectional generative model with a novel opacity mixing strategy that encourages consistency with existing observations while retaining the model's ability to extrapolate novel content in unseen areas. Second, we distill it into a causal auto-regressive model that generates hundreds of frames in a single pass. This model can directly produce novel views or serve as pseudo-supervision to improve the underlying 3D representation in a simple and highly efficient manner. We evaluate our method extensively and demonstrate that it can generate plausible reconstructions in scenarios where existing approaches fail completely. When measured on commonly benchmarked datasets, we outperform all existing baselines by a wide margin, exceeding prior state-of-the-art methods by 1-3 dB PSNR.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.00492v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.GR</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Riccardo de Lutio, Tobias Fischer, Yen-Yu Chang, Yuxuan Zhang, Jay Zhangjie Wu, Xuanchi Ren, Tianchang Shen, Katarina Tothova, Zan Gojcic, Haithem Turki</dc:creator>
    </item>
    <item>
      <title>A Reconstruction System for Industrial Pipeline Inner Walls Using Panoramic Image Stitching with Endoscopic Imaging</title>
      <link>https://arxiv.org/abs/2603.00714</link>
      <description>arXiv:2603.00714v2 Announce Type: replace 
Abstract: Visual analysis and reconstruction of pipeline inner walls remain challenging in industrial inspection scenarios. This paper presents a dedicated reconstruction system for pipeline inner walls via industrial endoscopes, which is built on panoramic image stitching technology. Equipped with a custom graphical user interface (GUI), the system extracts key frames from endoscope video footage, and integrates polar coordinate transformation with image stitching techniques to unwrap annular video frames of pipeline inner walls into planar panoramic images. Experimental results demonstrate that the proposed method enables efficient processing of industrial endoscope videos, and the generated panoramic stitched images preserve all detailed features of pipeline inner walls in their entirety. This provides intuitive and accurate visual support for defect detection and condition assessment of pipeline inner walls. In comparison with the traditional frame-by-frame video review method, the proposed approach significantly elevates the efficiency of pipeline inner wall reconstruction and exhibits considerable engineering application value.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.00714v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rui Ma, Yifeng Wang, Ziteng Yang, Jing Guo, Naomi Imali Okanda, Xinghui Li</dc:creator>
    </item>
    <item>
      <title>From OCR to Analysis: Tracking Correction Provenance in Digital Humanities Pipelines</title>
      <link>https://arxiv.org/abs/2603.00884</link>
      <description>arXiv:2603.00884v4 Announce Type: replace 
Abstract: Optical Character Recognition (OCR) is a critical but error-prone stage in digital humanities text pipelines. While OCR correction improves usability for downstream NLP tasks, common workflows often overwrite intermediate decisions, obscuring how textual transformations affect scholarly interpretation. We present a provenance-aware framework for OCR-corrected humanities corpora that records correction lineage at the span level, including edit type, correction source, confidence, and revision status. Using a pilot corpus of historical texts, we compare downstream named entity extraction across raw OCR, fully corrected text, and provenance-filtered corrections. Our results show that correction pathways can substantially alter extracted entities and document-level interpretations, while provenance signals help identify unstable outputs and prioritize human review. We argue that provenance should be treated as a first-class analytical layer in NLP for digital humanities, supporting reproducibility, source criticism, and uncertainty-aware interpretation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.00884v4</guid>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haoze Guo, Ziqi Wei</dc:creator>
    </item>
    <item>
      <title>Understanding LoRA as Knowledge Memory: An Empirical Analysis</title>
      <link>https://arxiv.org/abs/2603.01097</link>
      <description>arXiv:2603.01097v2 Announce Type: replace 
Abstract: Continuous knowledge updating for pre-trained large language models (LLMs) is increasingly necessary yet remains challenging. Although inference-time methods like In-Context Learning (ICL) and Retrieval-Augmented Generation (RAG) are popular, they face constraints in context budgets, costs, and retrieval fragmentation. Departing from these context-dependent paradigms, this work investigates a parametric approach using Low-Rank Adaptation (LoRA) as a modular knowledge memory. Although a few recent works examine this concept, the fundamental mechanics governing its capacity and composability remain largely unexplored. We bridge this gap through the first systematic empirical study mapping the design space of LoRA-based memory, ranging from characterizing storage capacity and optimizing internalization to scaling multi-module systems and evaluating long-context reasoning. Rather than proposing a single architecture, we provide practical guidance on the operational boundaries of LoRA memory. Overall, our findings position LoRA as a complementary axis of memory alongside RAG and ICL, offering distinct advantages.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.01097v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:journal_reference>Proceedings of the Forty-Third International Conference on Machine Learning (ICML), 2026</arxiv:journal_reference>
      <dc:creator>Seungju Back, Dongwoo Lee, Naun Kang, Taehee Lee, S. K. Hong, Youngjune Gwon, Sungjin Ahn</dc:creator>
    </item>
    <item>
      <title>A Graph-Native Approach to Normalization</title>
      <link>https://arxiv.org/abs/2603.02995</link>
      <description>arXiv:2603.02995v2 Announce Type: replace 
Abstract: In recent years, knowledge graphs (KGs) - in particular in the form of labeled property graphs (LPGs) - have become essential components in a broad range of applications. Although the absence of strict schemas for KGs facilitates structural issues that lead to redundancies and subsequently to inconsistencies and anomalies, the problem of KG quality has so far received little attention. Inspired by normalization using functional dependencies for relational data, a first approach exploiting dependencies within nodes has been proposed. However, real-world KGs also expose functional dependencies involving edges. In this paper, we therefore propose graph-native normalization, which considers dependencies within nodes, edges, and their combination. We define a range of graph-native normal forms and graph object functional dependencies and propose algorithms for transforming graphs accordingly. We evaluate our contributions using a broad range of synthetic and native graph datasets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.02995v2</guid>
      <category>cs.DB</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Johannes Schrott, Maxime Jakubowski, Katja Hose</dc:creator>
    </item>
    <item>
      <title>Local Safety Filters for Networked Systems via Two-Time-Scale Design</title>
      <link>https://arxiv.org/abs/2603.03632</link>
      <description>arXiv:2603.03632v3 Announce Type: replace 
Abstract: Safety filters based on Control Barrier Functions (CBFs) provide formal guarantees of forward invariance, but are often difficult to implement in networked dynamical systems because of global coupling and communication requirements. This paper develops locally implementable approximations of networked CBF safety filters that require no coordination across subsystems. The proposed approach is based on a two-time-scale dynamic implementation inspired by singular perturbation theory, where a small parameter $\epsilon$ separates fast filter dynamics from the plant dynamics; then, a local implementation is enabled via derivative estimation. Explicit bounds are derived to quantify the mismatch between trajectories of the systems with dynamic filter and with the ideal centralized safety filter. These results characterize how safety degradation depends on the time-scale parameter $\epsilon$, estimation errors, and filter activation time, thereby quantifying trade-offs between safety guarantees and local implementability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.03632v3</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Emiliano Dall'Anese</dc:creator>
    </item>
    <item>
      <title>Relational In-Context Learning via Synthetic Pre-training with Structural Prior</title>
      <link>https://arxiv.org/abs/2603.03805</link>
      <description>arXiv:2603.03805v3 Announce Type: replace 
Abstract: Relational Databases (RDBs) are the backbone of modern business, yet they lack foundation models comparable to those in text or vision. A key obstacle is that high-quality RDBs are private, scarce and structurally heterogeneous, making internet-scale pre-training infeasible. To overcome this data scarcity, we introduce $\textbf{RDB-PFN}$, the first relational foundation model trained purely via $\textbf{synthetic data}$. Inspired by Prior-Data Fitted Networks (PFNs) where synthetic data generated from Structural Causal Models (SCMs) enables reasoning on single tables, we design a $\textbf{Relational Prior Generator}$ to create an infinite stream of diverse RDBs from scratch. Pre-training on $\textbf{over 2 million}$ synthetic single-table and relational tasks, RDB-PFN learns to adapt to any new database instantly via genuine $\textbf{in-context learning}$. Experiments verify RDB-PFN achieves strong few-shot performance on 19 real-world relational prediction tasks, outperforming graph-based and single-table foundation-model baselines (given the same DFS-linearized inputs), while using a lightweight architecture and fast inference. The code is available at https://github.com/MuLabPKU/RDBPFN</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.03805v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.DB</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yanbo Wang, Jiaxuan You, Chuan Shi, Muhan Zhang</dc:creator>
    </item>
    <item>
      <title>Capacity-Aware Mixture Law Enables Efficient LLM Data Optimization</title>
      <link>https://arxiv.org/abs/2603.08022</link>
      <description>arXiv:2603.08022v2 Announce Type: replace 
Abstract: A data mixture refers to how different data sources are combined to train large language models, and selecting an effective mixture is crucial for optimal downstream performance. Existing methods either conduct costly searches directly on the target model or rely on mixture scaling laws that fail to extrapolate well to large model sizes. We address these limitations by introducing a compute-efficient pipeline for data mixture scaling. First, we propose CAMEL, a capacity-aware mixture law that models validation loss with the nonlinear interplay between model size and mixture. We also introduce a loss-to-benchmark prediction law that estimates benchmark accuracy from validation loss, enabling end-to-end performance prediction for the target model. Next, we study how to allocate a fixed compute budget across model scales to fit the law and reduce prediction error. Finally, we apply our method to Mixture-of-Experts models with up to 7B-A150M parameters to fit the law, and verify the optimal mixture derived from the law by extrapolating to a 55B-A1.2B target model. Compared to prior methods, we reduce mixture optimization costs by 50\% and improve downstream benchmark performance by up to 3\%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.08022v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jingwei Li, Xinran Gu, Jingzhao Zhang</dc:creator>
    </item>
    <item>
      <title>Avoiding Big Integers: Parallel Multimodular Algebraic Verification of Arithmetic Circuits</title>
      <link>https://arxiv.org/abs/2603.09501</link>
      <description>arXiv:2603.09501v2 Announce Type: replace 
Abstract: Word-level verification of arithmetic circuits with large operands typically relies on arbitrary-precision arithmetic, which can lead to significant computational overhead as word sizes grow. In this paper, we present a hybrid algebraic verification technique based on polynomial reasoning that combines linear and nonlinear rewriting. Our approach relies on multimodular reasoning using homomorphic images, where computations are performed in parallel modulo different primes, thereby avoiding any large-integer arithmetic. We implement the proposed method in the verification tool TalisMan2.0 and evaluate it on a suite of multiplier benchmarks. Our results show that hybrid multimodular reasoning significantly improves upon existing approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.09501v2</guid>
      <category>cs.SC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Clemens Hofstadler, Daniela Kaufmann, Chen Chen</dc:creator>
    </item>
    <item>
      <title>A Hybrid Quantum-Classical Framework for Financial Volatility Forecasting Based on Quantum Circuit Born Machines</title>
      <link>https://arxiv.org/abs/2603.09789</link>
      <description>arXiv:2603.09789v2 Announce Type: replace 
Abstract: Accurate financial volatility forecasting is crucial but challenged by the non-linear, highly correlated nature of market data. Recently, quantum computing has emerged as a promising paradigm for solving complex high-dimensional sampling problems. To harness this, we propose a novel hybrid framework combining the temporal representation power of classical neural networks with the distribution-learning capabilities of quantum models. Specifically, we integrate a Long Short-Term Memory (LSTM) network with a Quantum Circuit Born Machine (QCBM). The LSTM extracts dynamic features, while the QCBM acts as a learnable generative prior modeling complex market distributions to guide forecasting. Evaluated on 5-minute high-frequency data from the SSE Composite and CSI 300 indices, our model significantly outperforms a classical LSTM baseline across MSE, RMSE, and QLIKE metrics. Furthermore, by introducing a stochastic ``Drop-Prior'' mechanism during training, the LSTM implicitly distills structured information from the quantum prior. This establishes a pragmatic paradigm of ``quantum-assisted training with classical-efficient inference'', whereby the model retains its quantum-enhanced accuracy even when the quantum module is entirely disabled during deployment. This demonstrates a practical pathway for leveraging quantum computing to enhance classical models without real-time quantum inference latency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.09789v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>quant-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yixiong Chen</dc:creator>
    </item>
    <item>
      <title>InSpatio-WorldFM: An Open-Source Real-Time Generative Frame Model</title>
      <link>https://arxiv.org/abs/2603.11911</link>
      <description>arXiv:2603.11911v3 Announce Type: replace 
Abstract: We present InSpatio-WorldFM, an open-source real-time frame model for spatial intelligence. Unlike video-based world models that rely on sequential frame generation and incur substantial latency due to window-level processing, InSpatio-WorldFM adopts a frame-based paradigm that generates each frame independently, enabling low-latency real-time spatial inference. By enforcing multi-view spatial consistency through explicit 3D anchors and implicit spatial memory, the model preserves global scene geometry while maintaining fine-grained visual details across viewpoint changes. We further introduce a progressive three-stage training pipeline that transforms a pretrained image diffusion model into a controllable frame model and finally into a real-time generator through few-step distillation. Experimental results show that InSpatio-WorldFM achieves strong multi-view consistency while supporting interactive exploration on consumer-grade GPUs, providing an efficient alternative to traditional video-based world models for real-time world simulation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.11911v3</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>InSpatio Team, Donghui Shen, Guofeng Zhang, Haomin Liu, Haoyu Ji, Jialin Liu, Jing Guo, Nan Wang, Siji Pan, Weihong Pan, Weijian Xie, Xiaojun Xiang, Xiaoyu Zhang, Xianbin Liu, Yifu Wang, Yipeng Chen, Zhewen Le, Zhichao Ye, Ziqiang Zhao</dc:creator>
    </item>
    <item>
      <title>Reaching a Consensus in Predictive Loops</title>
      <link>https://arxiv.org/abs/2603.12137</link>
      <description>arXiv:2603.12137v2 Announce Type: replace 
Abstract: Predictions in digital platforms must adapt over time as individuals update their beliefs through social interactions. At the same time, changing predictions alter the content people are exposed to and, consequently, the very beliefs they aim to forecast. This recursive coupling between predictions and individuals complicates the analysis of the long-term societal impact of predictive systems. In this work, we propose a minimal model where predictions and opinions co-evolve, combining insights from network science with concepts from performative prediction. In our model a platform's predictions influence individual opinions, which then evolve through peer interactions and form the training data for future platform model updates. We demonstrate that this co-evolution induces a novel equilibrium that qualitatively differs from standard network equilibria. In particular, we show how standard predictive objectives can drive networks toward consensus even under conditions where classical opinion-dynamics models lead to disagreement. This emerges because predictive systems dynamically adapt to changing opinions, and learning objectives create spillover effects among individuals beyond the topology of the network. We further analyze systematic deviations from standard prediction and demonstrate amplified effects of targeted platform interventions on equilibrium outcomes, compared to classical network intervention analyses. Together, our results illustrate performativity as an important, yet so far neglected, qualifying factor in social networks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.12137v2</guid>
      <category>cs.SI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Jiduan Wu, Rediet Abebe, Celestine Mendler-Dünner</dc:creator>
    </item>
    <item>
      <title>Advancing Trustworthy AI in Healthcare Through Meta-Research: Results of an Interdisciplinary Design-Thinking Workshop</title>
      <link>https://arxiv.org/abs/2603.13286</link>
      <description>arXiv:2603.13286v2 Announce Type: replace 
Abstract: Meta-research and Trustworthy AI (TAI) share common goals, namely improving evidence, robustness, and transparency, yet there is very little interplay between the two fields. To investigate the potential benefits of closer collaboration between the domains of TAI in healthcare and meta-research, we convened an interdisciplinary workshop funded by the Volkswagen Foundation in February 2025. The workshop aimed to collaboratively examine key challenges in translating AI ethics principles into practice and to identify potential solutions informed by meta-research approaches. A Design Thinking-informed co-creation approach was followed by an inductive descriptive analysis of the outputs. Our results demonstrate how meta-research can offer concrete contributions to address pressing challenges of TAI in healthcare. These challenges include the dynamic and complex nature of TAI ethical requirements and principles, common terminology and understanding of TAI, ensuring robustness, replicability, and reproducibility, choosing adequate evaluation metrics, lack of transparency, advancing preclinical biomedical research, and validation in real-world clinical environments. We present a catalog of ideas and a roadmap for future research, which synthesize existing interconnections and identify concrete next steps and open research gaps, thereby serving as a foundation for future interdisciplinary efforts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.13286v2</guid>
      <category>cs.CY</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Valerie Bürger, Marlie Besouw, Jana Fehr, Riana Minocher, Emma Moorhead, Isabel Velarde, Louis Agha-Mir-Salim, Julia Amann, Alexandra Bannach-Brown, David B. Blumenthal, Kaitlyn Hair, Bert Heinrichs, Moritz Herrmann, Elizabeth Hofvenschiöld, Sune Holm, Anne A. H. de Hond, Sara Kijewski, Stuart McLennan, Timo Minssen, Marco S. Nobile, Nico Pfeifer, Jessica L. Rohmann, Tony Ross-Hellauer, Marija Slavkovik, Karin Tafur, Eleonora Viganò, Magnus Westerlund, Tracey Weissgerber, Vince I. Madai</dc:creator>
    </item>
    <item>
      <title>RetimeGS: Continuous-Time Reconstruction of 4D Gaussian Splatting</title>
      <link>https://arxiv.org/abs/2603.13783</link>
      <description>arXiv:2603.13783v2 Announce Type: replace 
Abstract: Temporal retiming, the ability to reconstruct and render dynamic scenes at arbitrary timestamps, is crucial for applications such as slow-motion playback, temporal editing, and post-production. However, most existing 4D Gaussian Splatting (4DGS) methods overfit at discrete frame indices but struggle to represent continuous-time frames, leading to ghosting artifacts when interpolating between timestamps. We identify this limitation as a form of temporal aliasing and propose RetimeGS, a simple yet effective 4DGS representation that explicitly defines the temporal behavior of the 3D Gaussian and mitigates temporal aliasing. To achieve smooth and consistent interpolation, we incorporate optical flow-guided initialization and supervision, triple-rendering supervision, and other targeted strategies. Together, these components enable ghost-free, temporally coherent rendering even under large motions. Experiments on datasets featuring fast motion, non-rigid deformation, and severe occlusions demonstrate that RetimeGS achieves superior quality and coherence over state-of-the-art methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.13783v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xuezhen Wang, Li Ma, Yulin Shen, Zeyu Wang, Pedro V. Sander</dc:creator>
    </item>
    <item>
      <title>Topology-Preserving Data Augmentation for Ring-Type Polygon Annotations</title>
      <link>https://arxiv.org/abs/2603.14764</link>
      <description>arXiv:2603.14764v3 Announce Type: replace 
Abstract: Geometric data augmentation is widely used in segmentation workflows, but polygon annotations are often assumed to remain valid after transformation. This assumption can fail in structured domains such as architectural floorplan analysis, where a region may contain an interior void encoded as part of a single ordered polygon chain. Cropping or clipping can remove bridge vertices in this chain, causing one semantic region to split into disconnected components. We propose a lightweight topology-preserving augmentation strategy that repairs missing adjacency relations in index space while preserving the original vertex order. The method adds minimal overhead and can be integrated into existing preprocessing workflows. Experiments show that the proposed approach achieves near-perfect Cyclic Adjacency Preservation (CAP) across common geometric transformations and improves annotation consistency in polygon-based segmentation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.14764v3</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sudip Laudari, Sang Hun Baek</dc:creator>
    </item>
    <item>
      <title>Encoding Predictability and Legibility for Style-Conditioned Diffusion Policy</title>
      <link>https://arxiv.org/abs/2603.16368</link>
      <description>arXiv:2603.16368v2 Announce Type: replace 
Abstract: Striking a balance between efficiency and transparent motion is a core challenge in human-robot collaboration, as highly expressive movements often incur unnecessary time and energy costs. In collaborative environments, legibility allows a human observer a better understanding of the robot's actions, increasing safety and trust. However, these behaviors result in sub-optimal and exaggerated trajectories that are redundant in low-ambiguity scenarios where the robot's goal is already obvious. To address this trade-off, we propose Style-Conditioned Diffusion Policy (SCDP), a modular framework that constrains the trajectory generation of a pre-trained diffusion model toward either legibility or efficiency based on the environment's configuration. Our method utilizes a post-training pipeline that freezes the base policy and trains a lightweight scene encoder and conditioning predictor to modulate the diffusion process. At inference time, an ambiguity detection module activates the appropriate conditioning, prioritizing expressive motion only for ambiguous goals and reverting to efficient paths otherwise. We evaluate SCDP on manipulation and navigation tasks, and results show that it enhances legibility in ambiguous settings while preserving optimal efficiency when legibility is unnecessary, all without retraining the base policy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.16368v2</guid>
      <category>cs.RO</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Adrien Jacquet Crétides, Mouad Abrini, Hamed Rahimi, Mohamed Chetouani</dc:creator>
    </item>
    <item>
      <title>LLMs learn scientific taste from institutional traces across the social sciences</title>
      <link>https://arxiv.org/abs/2603.16659</link>
      <description>arXiv:2603.16659v2 Announce Type: replace 
Abstract: Reinforcement-learned reasoning has powered recent AI leaps on verifiable tasks, including mathematics, code, and structure prediction. The harder bottleneck is evaluative judgment in low-verifiability domains, where no oracle anchors reward and the core question is which untested ideas deserve attention. We test whether institutional traces, the record of what fields published, where, and at which tier, can serve as a training signal for AI evaluators. Across eight social science disciplines (psychology, economics, communication, sociology, political science, management, business and finance, public administration), we built held-out four-tier research-pitch benchmarks and supervised-fine-tuned (SFT) LLMs on field-specific publication outcomes. The fine-tuned models cleared the 25 percent chance baseline and exceeded frontier-model performance by wide margins, with best single-model accuracy ranging from 55.0 percent in public administration to 85.5 percent in psychology. In management, evaluated against 48 expert gatekeepers, 174 junior researchers, and 11 frontier reasoning models, the best single fine-tuned model (Qwen3-4B) reached 59.2 percent, 17.6 percentage points above expert majority vote (41.6 percent, non-tied) and 28.1 percentage points above the frontier mean (31.1 percent). The fine-tuned models also showed calibrated confidence: confidence rose when predictions were correct and fell when wrong, mirroring how a skilled reviewer can say "I'm sure" versus "I'm guessing." Selective triage on this signal reached very high accuracy on the highest-confidence subsets in every field. Institutional traces, we conclude, encode a scalable training signal for the low-verifiability judgment on which science depends.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.16659v2</guid>
      <category>cs.AI</category>
      <category>econ.GN</category>
      <category>q-fin.EC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ziqin Gong, Ning Li, Huaikang Zhou</dc:creator>
    </item>
    <item>
      <title>Multi-Source Human-in-the-Loop Digital Twin Testbed for Connected and Autonomous Vehicles in Mixed Traffic Flow</title>
      <link>https://arxiv.org/abs/2603.17751</link>
      <description>arXiv:2603.17751v3 Announce Type: replace 
Abstract: In emerging mixed traffic environments, Connected and Autonomous Vehicles (CAVs) have to interact with surrounding human-driven vehicles (HDVs). This paper introduces MSH-MCCT (Multi-Source Human-in-the-Loop Mixed Cloud Control Testbed), a novel CAV testbed that captures complex interactions between various CAVs and HDVs. Utilizing the Mixed Digital Twin concept, which combines Mixed Reality with Digital Twin, MSH-MCCT integrates physical, virtual, and mixed platforms, along with multi-source control inputs. Bridged by the mixed platform, MSH-MCCT allows human drivers and CAV algorithms to operate both physical and virtual vehicles within multiple fields of view. In particular, this testbed facilitates the coexistence and real-time interaction of physical and virtual CAVs and HDVs, significantly enhancing experimental flexibility and scalability. Experiments on vehicle platooning in mixed traffic showcase the potential of MSH-MCCT to conduct CAV testing with multi-source real human drivers in the loop through driving simulators of diverse fidelity. Videos of the experiments are available at our project website: https://dongjh20.github.io/MSH-MCCT.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.17751v3</guid>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jianghong Dong, Chunying Yang, Mengchi Cai, Chaoyi Chen, Qing Xu, Jianqiang Wang, Jiawei Wang, Keqiang Li</dc:creator>
    </item>
    <item>
      <title>Attention Sinks Induce Gradient Sinks: Massive Activations as Gradient Regulators in Transformers</title>
      <link>https://arxiv.org/abs/2603.17771</link>
      <description>arXiv:2603.17771v2 Announce Type: replace 
Abstract: Attention sinks and massive activations are recurring and closely related phenomena in Transformer models. Existing explanations have largely focused on the forward pass, yet in pre-norm Transformers, large residual-stream norms play only an indirect forward role because sublayers operate on normalized inputs. We study this relationship from the perspective of backpropagation. Empirically and theoretically, we show that under causal masking, attention sinks can induce pronounced gradient concentration, which we term gradient sinks. Since the RMSNorm Jacobian attenuates gradients roughly in inverse proportion to input norm, massive activations can be understood as adaptive regulators of this localized gradient pressure during training. This interpretation predicts that attenuating sink-induced gradients should weaken massive activations. We test this prediction with V-scale, a modification that adjusts backpropagated gradients on the value path. In V-scale models, attention sinks are preserved, whereas massive activations are suppressed. These results identify gradient sinks as a backward-pass counterpart of attention sinks, and massive activations as an adaptive RMSNorm-mediated response that attenuates the resulting localized training pressure. Our code is available at https://anonymous.4open.science/r/GradientSinkCode-B309.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.17771v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yihong Chen, Zhouchen Lin, Quanming Yao</dc:creator>
    </item>
    <item>
      <title>Centrality-Based Pruning for Efficient Echo State Networks</title>
      <link>https://arxiv.org/abs/2603.20684</link>
      <description>arXiv:2603.20684v2 Announce Type: replace 
Abstract: Echo State Networks (ESNs) are a reservoir computing framework widely used for nonlinear time-series prediction. However, despite their effectiveness, randomly initialized reservoirs often contain redundant nodes, leading to unnecessary computational overhead and reduced efficiency. In this work, we propose a graph centrality-based pruning approach that interprets the reservoir as a weighted directed graph and removes structurally less important nodes using centrality measures. Experiments on Mackey-Glass time-series prediction and electric load forecasting demonstrate that the proposed method can significantly reduce reservoir size while maintaining, and in some cases improving, prediction accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.20684v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sudip Laudari</dc:creator>
    </item>
    <item>
      <title>Beyond Content Safety: Real-Time Monitoring for Reasoning Vulnerabilities in Large Language Models</title>
      <link>https://arxiv.org/abs/2603.25412</link>
      <description>arXiv:2603.25412v2 Announce Type: replace 
Abstract: Large language models increasingly rely on explicit chain-of-thought reasoning to solve complex tasks, yet the safety of the reasoning process itself remains largely unaddressed. Existing work focuses predominantly on content safety (i.e., detecting harmful, biased, or factually incorrect outputs), while treating the underlying reasoning chain as an opaque intermediate artifact. We argue that reasoning safety constitutes a fundamental security dimension orthogonal to content safety: the requirement that a model's reasoning trajectory be logically consistent, computationally efficient, and resistant to adversarial manipulation. In this paper, we formalize reasoning safety and introduce a systematic taxonomy of nine unsafe reasoning behaviors. We then conduct a large-scale prevalence study, annotating over 4,000 reasoning chains across benign benchmarks and four state-of-the-art reasoning attacks, empirically demonstrating that all nine error types occur in practice with mechanistically interpretable signatures. To mitigate these threats, we propose the Reasoning Safety Monitor: an external, zero-shot verification framework that runs in parallel with the target LLM. It inspects each reasoning step in real time via a taxonomy-embedded prompt and dispatches an interrupt signal upon detecting unsafe behavior. Extensive evaluations show our monitor achieves up to 87.11% step-level localization accuracy, outperforming hallucination detectors and the best process reward model baselines by a substantial margin. Crucially, the monitor maintains a low false positive rate on correct reasoning paths, operates with negligible latency overhead, and exhibits robust resilience against adaptive adversarial evasion. These findings establish reasoning safety monitoring as a highly feasible and essential component for the secure deployment of large reasoning models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.25412v2</guid>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xunguang Wang, Yuguang Zhou, Qingyue Wang, Zongjie Li, Ruixuan Huang, Zhenlan Ji, Pingchuan Ma, Shuai Wang</dc:creator>
    </item>
    <item>
      <title>ARTA: Adversarial-Robust Multivariate Time-Series Anomaly Detection via Sparsity-Constrained Perturbations</title>
      <link>https://arxiv.org/abs/2603.25956</link>
      <description>arXiv:2603.25956v2 Announce Type: replace 
Abstract: Time-series anomaly detection (TSAD) is a critical component in monitoring complex systems, yet modern deep learning-based detectors are often highly sensitive to localized input corruptions and structured noise. We propose ARTA (Adversarially Robust multivariate Time-series Anomaly detection via sparsity-constrained perturbations), a joint training framework that improves detector robustness through a principled min-max optimization objective. ARTA comprises an anomaly detector and a sparsity-constrained mask generator that are trained simultaneously. The generator identifies minimal, task-relevant temporal perturbations that maximally increase the detector's anomaly score, while the detector is optimized to remain stable under these structured perturbations. The resulting masks characterize the detector's sensitivity to adversarial temporal corruptions and can serve as explanatory signals for the detector's decisions. This adversarial training strategy exposes brittle decision pathways and encourages the detector to rely on distributed and stable temporal patterns rather than spurious localized artifacts. We conduct extensive experiments on the TSB-AD benchmark, demonstrating that ARTA consistently improves anomaly detection performance across diverse datasets and exhibits significantly more graceful degradation under increasing noise levels compared to state-of-the-art baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.25956v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hadi Hojjati, Narges Armanfard</dc:creator>
    </item>
    <item>
      <title>DPD-Cancer: Explainable Graph-Based Deep Learning for Small Molecule Anti-Cancer Activity Prediction</title>
      <link>https://arxiv.org/abs/2603.26114</link>
      <description>arXiv:2603.26114v2 Announce Type: replace 
Abstract: DPD-Cancer is a graph-attention deep learning framework for predicting small-molecule anti-cancer activity across the NCI-60 panel, trained and evaluated under a strict chemistry-aware data-partitioning scheme. On the hold-out test set, the classifier achieved an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.87 (95% CI [0.86, 0.88]) and an Area Under the Precision-Recall Curve (AUPRC) of 0.73 (95% CI [0.70, 0.76]); per-cell-line regression models for 73 cell lines produced a median Pearson's Correlation Coefficient (Pearson's R) of 0.64 and a median Root Mean Squared Error (RMSE) of 0.67 for pGI50-value prediction. Benchmarks against pdCSM-Cancer, MLASM, and ACLPred under matched data conditions yielded consistently higher Matthews Correlation Coefficient (MCC) scores; an occlusion-based attribution analysis confirmed that model explanations were quantitatively faithful to classifier decisions; and an applicability-domain analysis characterised reliability as a function of chemical distance. To facilitate widespread adoption, DPD-Cancer is available as a free, user-friendly web server for unrestricted use at https://biosig.lab.uq.edu.au/dpd_cancer/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.26114v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Magnus H. Strømme, Alex G. C. de Sá, David B. Ascher</dc:creator>
    </item>
    <item>
      <title>Geometric Evolution Graph Convolutional Networks: Enhancing Graph Representation Learning via Ricci Flow</title>
      <link>https://arxiv.org/abs/2603.26178</link>
      <description>arXiv:2603.26178v2 Announce Type: replace 
Abstract: We introduce the Geometric Evolution Graph Convolutional Network (GEGCN), a novel framework that enhances graph representation learning through explicit modeling of geometric evolution on graph structures. Specifically, GEGCN leverages a Long Short-Term Memory (LSTM) network to capture the dynamic structural sequence generated by discrete Ricci flow, and infuses the learned dynamic representations into a graph convolutional network. Extensive experiments demonstrate that GEGCN achieves excellent performance on classification tasks across various benchmark datasets, including homophilic/heterophilic graphs, filtered graphs, and large-scale graphs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.26178v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jicheng Ma, Yunyan Yang, Juan Zhao, Liang Zhao</dc:creator>
    </item>
    <item>
      <title>A Rational Account of Categorization Based on Information Theory</title>
      <link>https://arxiv.org/abs/2603.29895</link>
      <description>arXiv:2603.29895v3 Announce Type: replace 
Abstract: We present a new theory of categorization based on an information-theoretic rational analysis. To evaluate this theory, we investigate how well it can account for key findings from classic categorization experiments conducted by Hayes-Roth and Hayes-Roth (1977), Medin and Schaffer (1978), and Smith and Minda (1998). We find that it explains human categorization behavior as well as (or better than) the independent cue and context models (Medin &amp; Schaffer, 1978), the rational model of categorization (Anderson, 1991), and a hierarchical Dirichlet process model (Griffiths et al., 2007).</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.29895v3</guid>
      <category>cs.AI</category>
      <category>cs.IT</category>
      <category>cs.LG</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Christopher J. MacLellan, Karthik Singaravadivelan, Xin Lian, Zekun Wang, Pat Langley</dc:creator>
    </item>
    <item>
      <title>Diffusion-Based Feature Denoising with NNMF for Robust handwritten digit multi-class classification</title>
      <link>https://arxiv.org/abs/2603.29917</link>
      <description>arXiv:2603.29917v2 Announce Type: replace 
Abstract: This work presents a robust multi-class classification framework for handwritten digits that combines diffusion-driven feature denoising with a hybrid feature representation. Inspired by our previous work on brain tumor classification, the proposed approach operates in feature space to improve robustness to noise and adversarial attacks. This manuscript is submitted as an extended abstract rather than a full-length press-ready paper. First, the input images are converted into compact, interpretable representations using Non-negative Matrix Factorization (NNMF). In parallel, deep features are extracted using a convolutional neural network (CNN). These complementary features are combined into a unified hybrid representation. The main objective of this work is to extend our previously validated two-class framework to a multi-class handwritten digit classification scenario. To improve robustness, a forward diffusion process is applied in the feature space by gradually adding Gaussian noise. A feature denoiser network is trained to reverse this process and reconstruct clean representations from noisy inputs. The denoised features are then used for multi-class classification. The proposed method is evaluated in both baseline and adversarial settings using AutoAttack. The experimental results show that the diffusion-based hybrid model is both effective and robust, outperforming the CNN baseline models while maintaining strong classification performance. These results demonstrate the effectiveness of feature-level diffusion defense for reliable multi-class handwritten digit classification.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.29917v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Hiba Adil Al-kharsan, Róbert Rajkó</dc:creator>
    </item>
    <item>
      <title>Massively Parallel Exact Inference for Hawkes Processes</title>
      <link>https://arxiv.org/abs/2604.01342</link>
      <description>arXiv:2604.01342v2 Announce Type: replace 
Abstract: Multivariate Hawkes processes are a widely used class of self-exciting point processes, but maximum likelihood estimation naively scales as $O(N^2)$ in the number of events. The canonical linear exponential Hawkes process admits a faster $O(N)$ recurrence, but prior work evaluates this recurrence sequentially, without exploiting parallelization on modern GPUs. We show that the Hawkes process intensity can be expressed as a product of sparse transition matrices admitting a linear-time associative multiply, enabling computation via a parallel prefix scan. This yields a massively parallelizable algorithm for estimation of linear exponential Hawkes processes. Our method reduces the computational complexity to approximately $O(N/P)$ with $P$ parallel processors, and naturally yields a batching scheme to maintain constant memory usage, avoiding GPU memory constraints. Importantly, it computes the exact likelihood without any additional assumptions or approximations, preserving the simplicity and interpretability of the model. We demonstrate orders-of-magnitude speedups on simulated and real datasets, scaling to thousands of nodes and tens of millions of events, substantially beyond scales reported in prior work. We provide an open-source PyTorch library implementing our optimizations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.01342v2</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ahmer Raza, Hudson Smith</dc:creator>
    </item>
    <item>
      <title>Malliavin Calculus for Counterfactual Gradient Estimation in Adaptive Inverse Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2604.01345</link>
      <description>arXiv:2604.01345v2 Announce Type: replace 
Abstract: Inverse reinforcement learning (IRL) recovers the loss function of a forward learner from its observed responses. Adaptive IRL aims to reconstruct the loss function of a forward learner by passively observing its gradients as it performs reinforcement learning (RL). This paper proposes a novel passive Langevin-based algorithm that achieves adaptive IRL. The key difficulty in adaptive IRL is that the required gradients in the passive algorithm are counterfactual, that is, they are conditioned on events of probability zero under the forward learner's trajectory. Therefore, naive Monte Carlo estimators are prohibitively inefficient, and kernel smoothing, though common, suffers from slow convergence. We overcome this by employing Malliavin calculus to efficiently estimate the required counterfactual gradients. We reformulate the counterfactual conditioning as a ratio of unconditioned expectations involving Malliavin quantities, thus recovering standard estimation rates. We derive the necessary Malliavin derivatives and their adjoint Skorohod integral formulations for a general Langevin structure, and provide a concrete algorithmic approach which exploits these for counterfactual gradient estimation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.01345v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vikram Krishnamurthy, Luke Snow</dc:creator>
    </item>
    <item>
      <title>From SWE-ZERO to SWE-HERO: Execution-free to Execution-based Fine-tuning for Software Engineering Agents</title>
      <link>https://arxiv.org/abs/2604.01496</link>
      <description>arXiv:2604.01496v2 Announce Type: replace 
Abstract: We introduce SWE-ZERO to SWE-HERO, a two-stage SFT recipe that achieves state-of-the-art results on SWE-bench by distilling open-weight frontier LLMs. Our pipeline replaces resource-heavy dependencies with an evolutionary refinement strategy: (1) SWE-ZERO utilizes large-scale, execution-free trajectories to master code semantics and repository-level reasoning, and (2) SWE-HERO applies targeted, execution-backed refinement to transition these semantic intuitions into rigorous engineering workflows. Our empirical results set a new benchmark for open-source models of comparable size. We release a dataset of 300k SWE-ZERO and 13k SWE-HERO trajectories distilled from Qwen3-Coder-480B, alongside a suite of agents based on the Qwen2.5-Coder series. Notably, SWE-HERO-32B achieves a 62.2% resolution rate on SWE-bench Verified. Furthermore, despite being trained exclusively on Python, our agents demonstrate robust zero-shot transferability on SWE-bench Multilingual, reaching 44.1% and confirming the paradigm's generalizability across diverse languages.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.01496v2</guid>
      <category>cs.SE</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nikolai Ludwig, Wasi Uddin Ahmad, Somshubra Majumdar, Boris Ginsburg</dc:creator>
    </item>
    <item>
      <title>Stable Hermite transforms via the Golub-Welsch algorithm</title>
      <link>https://arxiv.org/abs/2604.02041</link>
      <description>arXiv:2604.02041v2 Announce Type: replace 
Abstract: We introduce an efficient, stable algorithm for transforms associated with expansions in Hermite functions interpolated at Hermite polynomial roots. The Hermite transform matrix can be factorised into a diagonal component and an orthogonal matrix, leading to a form that allows both the forward and inverse Hermite transforms to be computed stably. Our novel algorithm computes this factorisation based on the eigendecomposition of the Jacobi matrix associated with Hermite functions. Numerical experiments show that the new approach matches or improves on the accuracy of existing stabilized methods, is substantially faster in practice, and enables reliable use of large Hermite expansions in downstream PDE computations. We also provide an open-source implementation, together with reference implementations of previous methods, to facilitate adoption by the community.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.02041v2</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Marcus Webb, Georg Maierhofer</dc:creator>
    </item>
    <item>
      <title>LOCARD: An Agentic Framework for Blockchain Forensics</title>
      <link>https://arxiv.org/abs/2604.04211</link>
      <description>arXiv:2604.04211v2 Announce Type: replace 
Abstract: Blockchain forensics inherently involves dynamic and iterative investigations, while many existing approaches primarily model it through static inference pipelines. We propose a paradigm shift towards Agentic Blockchain Forensics (ABF), modeling forensic investigation as a sequential decision-making process. To instantiate this paradigm, we introduce LOCARD, the first agentic framework for blockchain forensics. LOCARD operationalizes this perspective through a Tri-Core Cognitive Architecture that decouples strategic planning, operational execution, and evaluative validation. Unlike generic LLM-based agents, it incorporates a Structured Belief State mechanism to enforce forensic rigor and guide exploration under explicit state constraints. To demonstrate the efficacy of the ABF paradigm, we apply LOCARD to the inherently complex domain of cross-chain transaction tracing. We introduce Thor25, a benchmark dataset comprising over 151k real-world cross-chain forensic records, and evaluate LOCARD on the Group-Transfer Tracing task for dismantling Sybil clusters. Validated against representative laundering sub-flows from the Bybit hack, LOCARD achieves high-fidelity tracing results, providing empirical evidence that modeling blockchain forensics as an autonomous agentic task is both viable and effective. These results establish a concrete foundation for future agentic approaches to large-scale blockchain forensic analysis. Code and dataset are publicly available at https://github.com/xhyumiracle/locard and https://github.com/xhyumiracle/thorchain-crosschain-data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.04211v2</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaohang Yu, William Knottenbelt</dc:creator>
    </item>
    <item>
      <title>What is Human in Judgment? Comparing Automation Bias and Algorithm Aversion Between the United States Military Academy and the General Public</title>
      <link>https://arxiv.org/abs/2604.04333</link>
      <description>arXiv:2604.04333v2 Announce Type: replace 
Abstract: Human judgment has always been central to conflict and escalation, but how will a world of artificial intelligence (AI) change the role of humans in war? As militaries increasingly adopt AI-enabled decision-support systems (DSS), including the United States in the war against Iran, concerns about automation bias -- over-reliance on algorithmic recommendations -- and algorithm aversion -- premature distrust of automated outputs -- raise fears that relying on AI too much could increase the risk of error, miscalculation, and accidents. Yet existing evidence on how militaries actually interact with AI remains limited. We test theories about the susceptibility of militaries to automation bias by comparing the results from a survey experiment conducted with 236 cadets at the United States Military Academy at West Point to a demographically similar cross-national public sample. Respondents completed a target identification task and then received advice from either an algorithm or a human analyst and had the opportunity to re-assess their initial identification, allowing direct measurement of automation bias and algorithm aversion. We find that West Point cadets are less prone to cognitive distortion than members of the general public, displaying better calibrated trust in algorithmic decision support systems. While the findings are limited, they suggest that military education and exposure to AI can meaningfully shape how AI influences international politics in matters of war and peace.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.04333v2</guid>
      <category>cs.CY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lauren Kahn, Michael C. Horowitz, Laura Resnick Samotin</dc:creator>
    </item>
    <item>
      <title>The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment</title>
      <link>https://arxiv.org/abs/2604.06377</link>
      <description>arXiv:2604.06377v3 Announce Type: replace 
Abstract: We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific behaviors and are transferable across models through linear alignment. Based on this hypothesis, we introduce UNLOCK, a training-free and label-free framework that extracts a capability direction by contrasting activations between capability-present and capability-absent Source variants, aligns it with a Target model through a low-rank linear transformation, and applies it at inference time to elicit the behavior. Experiments on reasoning behaviors, including Chain-of-Thought (CoT) and mathematical reasoning, demonstrate substantial improvements across model scales without training. For example, transferring CoT reasoning from Qwen1.5-14B to Qwen1.5-7B yields an accuracy gain of 12.1% on MATH, and transferring a mathematical reasoning direction from Qwen3-4B-Base to Qwen3-14B-Base improves AGIEval Math accuracy from 61.1% to 71.3%, surpassing the 67.8% achieved by the 14B post-trained model. Our analysis shows that the success of transfer depends on the capabilities learned during pre-training, and that our intervention amplifies latent capabilities by sharpening the output distribution toward successful reasoning trajectories.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.06377v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rishab Balasubramanian, Pin-Jie Lin, Rituraj Sharma, Anjie Fang, Fardin Abdi, Viktor Rozgic, Zheng Du, Mohit Bansal, Tu Vu</dc:creator>
    </item>
    <item>
      <title>ROZA Graphs: Self-Improving Near-Deterministic RAG through Evidence-Centric Feedback</title>
      <link>https://arxiv.org/abs/2604.07595</link>
      <description>arXiv:2604.07595v3 Announce Type: replace 
Abstract: Language model agents reason from scratch on every query, discarding their chain of thought after each run. The result is lower accuracy and high run-to-run variance. We introduce reasoning graphs, which persist the per-evidence chain of thought as structured edges. Unlike prior memory that retrieves distilled strategies by query similarity, reasoning graphs enable evidence-centric feedback: for every candidate item, the system traverses all incoming evaluation edges across prior runs to surface how that specific item has been judged before. We further introduce retrieval graphs, which feed a planner that prunes consistently-rejected candidates over successive runs. Together they form a ROZA graph: a self-improving feedback loop in which accuracy gains scale with gold-passage reuse (reasoning graph) and efficiency gains scale with candidate-pool overlap (retrieval graph). The base model remains frozen; all gains come from context engineering via graph traversal. We evaluate on MuSiQue and HotpotQA, plus a high-reuse deployment subset. Four findings stand out. (1) Dose-response: accuracy improves monotonically with evidence-profile coverage, reaching +10.6pp over Vanilla RAG at 50%+ coverage on the same questions (47% error reduction, $p&lt;0.0001$; per-question Spearman $\rho=+0.144$, $p&lt;10^{-6}$, $n=1{,}100$). (2) Multi-hop scaling: 4-hop accuracy improves by +11.0pp ($p=0.0001$). (3) Cross-cluster prediction: the cluster-level gain is predicted by gold-passage reuse density ($r=0.604$, $p=0.001$, $n=26$ clusters). (4) High-reuse Pareto dominance: highest or tied-for-highest accuracy alongside 46% lower cost and 46% lower latency. Per-passage decision consistency across repeated runs ($N=73$ paired probes, $K=10$ runs each, two model families, three temperatures) rises by +8 to +13pp on a fixed 20-passage context and by +12 to +21pp when the retrieval graph also prunes (all $p&lt;0.005$).</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.07595v3</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Matthew Penaroza</dc:creator>
    </item>
    <item>
      <title>VSAS-Bench: Real-Time Evaluation of Visual Streaming Assistant Models</title>
      <link>https://arxiv.org/abs/2604.07634</link>
      <description>arXiv:2604.07634v2 Announce Type: replace 
Abstract: Streaming vision-language models (VLMs) continuously generate responses given an instruction prompt and an online stream of input frames. This is a core mechanism for real-time visual assistants. Existing VLM frameworks predominantly assess models in offline settings. In contrast, the performance of a streaming VLM depends on additional metrics beyond pure video understanding, including proactiveness, which reflects the timeliness of the model's responses, and consistency, which captures the robustness of its responses over time. To address this limitation, we propose VSAS-Bench, a new framework and benchmark for Visual Streaming Assistants. In contrast to prior benchmarks that primarily employ single-turn question answering on video inputs, VSAS-Bench features temporally dense annotations with over 18,000 annotations across diverse input domains and task types. We introduce standardized synchronous and asynchronous evaluation protocols, along with metrics that isolate and measure distinct capabilities of streaming VLMs. Using this framework, we conduct large-scale evaluations of recent video and streaming VLMs, analyzing the accuracy-latency trade-off under key design factors such as memory buffer length, memory access policy, and input resolution, yielding several practical insights. Finally, we show empirically that conventional VLMs can be adapted to streaming settings without additional training, and demonstrate that these adapted models outperform recent streaming VLMs. For example, Qwen3-VL-4B surpasses Dispider, the best streaming VLM on our benchmark, by 3% under the asynchronous protocol. The benchmark and code will be available at https://github.com/apple/ml-vsas-bench.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.07634v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Pavan Kumar Anasosalu Vasu, Cem Koc, Fartash Faghri, Chun-Liang Li, Bo Feng, Zhengfeng Lai, Meng Cao, Oncel Tuzel, Hadi Pouransari</dc:creator>
    </item>
    <item>
      <title>Governed Capability Evolution for Embodied Agents: Safe Upgrade, Compatibility Checking, and Runtime Rollback for Embodied Capability Modules</title>
      <link>https://arxiv.org/abs/2604.08059</link>
      <description>arXiv:2604.08059v3 Announce Type: replace 
Abstract: Embodied agents are increasingly expected to improve over time by updating their executable capabilities rather than rewriting the agent itself. Prior work has separately studied modular capability packaging, capability evolution, and runtime governance. However, a key systems problem remains underexplored: once an embodied capability module evolves into a new version, how can the hosting system deploy it safely without breaking policy constraints, execution assumptions, or recovery guarantees?
  We formulate governed capability evolution as a first-class systems problem for embodied agents. We propose a lifecycle-aware upgrade framework in which every new capability version is treated as a governed deployment candidate rather than an immediately executable replacement. The framework introduces four upgrade compatibility checks -- interface, policy, behavioral, and recovery -- and organizes them into a staged runtime pipeline comprising candidate validation, sandbox evaluation, shadow deployment, gated activation, online monitoring, and rollback.
  We evaluate over 6 rounds of capability upgrade with 15 random seeds. Naive upgrade achieves 72.9% task success but drives unsafe activation to 60% by the final round; governed upgrade retains comparable success (67.4%) while maintaining zero unsafe activations across all rounds (Wilcoxon p=0.003). Shadow deployment reveals 40% of regressions invisible to sandbox evaluation alone, and rollback succeeds in 79.8% of post-activation drift scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.08059v3</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xue Qin, Simin Luan, John See, Cong Yang, Zhijun Li</dc:creator>
    </item>
    <item>
      <title>ETCH-X: Robustify Expressive Body Fitting to Clothed Humans with Composable Datasets</title>
      <link>https://arxiv.org/abs/2604.08548</link>
      <description>arXiv:2604.08548v3 Announce Type: replace 
Abstract: Human body fitting, which aligns parametric body models such as SMPL to raw 3D point clouds of clothed humans, serves as a crucial first step for downstream tasks like animation and texturing. An effective fitting method should be both locally expressive (capturing fine details such as hands and facial features) and globally robust to handle real-world challenges, including clothing dynamics, pose variations, and noisy or partial inputs. Existing approaches typically excel in only one aspect, lacking an all-in-one solution. We upgrade ETCH to ETCH-X, which leverages a tightness-aware fitting paradigm to filter out clothing dynamics ("undress"), extends expressiveness with SMPL-X, and replaces explicit sparse markers (which are highly sensitive to partial data) with implicit dense correspondences ("dense fit") for more robust and fine-grained body fitting. Our disentangled "undress" and "dense fit" modular stages enable separate and scalable training on composable data sources, including diverse simulated garments (CLOTH3D), large-scale full-body motions (AMASS), and fine-grained hand gestures (InterHand2.6M), improving outfit generalization and pose robustness of both bodies and hands. Our approach achieves robust and expressive fitting across diverse clothing, poses, and levels of input completeness, delivering a substantial performance improvement over ETCH on both 1) seen data, such as 4D-Dress (MPJPE-All, 33.0%) and CAPE (V2V-Hands, 35.8%), and 2) unseen data, such as BEDLAM2.0 (MPJPE-All, 80.8%; V2V-All, 80.5%). Code and models will be released at https://xiaobenli00.github.io/ETCH-X/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.08548v3</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaoben Li, Jingyi Wu, Zeyu Cai, Siyuan Yu, Boqian Li, Yuliang Xiu</dc:creator>
    </item>
    <item>
      <title>LABBench2: An Improved Benchmark for AI Systems Performing Biology Research</title>
      <link>https://arxiv.org/abs/2604.09554</link>
      <description>arXiv:2604.09554v2 Announce Type: replace 
Abstract: Optimism for accelerating scientific discovery with AI continues to grow. Current applications of AI in scientific research range from training dedicated foundation models on scientific data to agentic autonomous hypothesis generation systems to AI-driven autonomous labs. Correspondingly, measuring the progress of AI systems in scientific domains must not only accelerate but also increasingly shift its focus toward real-world capabilities: beyond rote knowledge, and even beyond reasoning alone, to the ability to perform meaningful work. Prior work introduced the Language Agent Biology Benchmark (LAB-Bench) as an initial attempt at measuring these abilities. Here we introduce an evolution of that benchmark, LABBench2, for measuring real-world capabilities of AI systems performing useful scientific tasks. LABBench2 comprises nearly 1,900 tasks and is, for the most part, a continuation of LAB-Bench, measuring similar capabilities but in more realistic contexts. We evaluate performance of current frontier models, and show that while abilities measured by LAB-Bench and LABBench2 have improved substantially, LABBench2 provides a meaningful jump in difficulty (model-specific accuracy differences range from -26% to -46% across subtasks) and underscores continued room for performance improvement. LABBench2 continues the legacy of LAB-Bench as a de facto benchmark for AI scientific research capabilities and we hope that it continues to help advance development of AI tools for these core research functions. To facilitate community use and development, we provide the task dataset at https://huggingface.co/datasets/futurehouse/labbench2 and a public eval harness at https://github.com/EdisonScientific/labbench2.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.09554v2</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Jon M Laurent, Albert Bou, Michael Pieler, Conor Igoe, Alex Andonian, Siddharth Narayanan, James Braza, Alexandros Sanchez Vassopoulos, Jacob L Steenwyk, Blake Lash, Andrew D White, Samuel G Rodriques</dc:creator>
    </item>
    <item>
      <title>CausalGaze: Unveiling Hallucinations via Counterfactual Graph Intervention in Large Language Models</title>
      <link>https://arxiv.org/abs/2604.11087</link>
      <description>arXiv:2604.11087v2 Announce Type: replace 
Abstract: Despite the groundbreaking advancements made by large language models (LLMs), hallucination remains a critical bottleneck for their deployment in high-stakes domains. Existing classification-based methods mainly rely on static, passive signals from internal states, which often capture noise and spurious correlations while overlooking the underlying causal mechanisms. To address this limitation, we shift the paradigm from passive observation to active intervention by introducing CausalGaze, a novel hallucination detection framework based on structural causal models (SCMs). CausalGaze models LLMs' internal states as dynamic causal graphs and employs counterfactual interventions to disentangle causal reasoning paths from incidental noise, thereby enhancing model interpretability. Extensive experiments across four datasets and three widely used LLMs demonstrate the effectiveness of CausalGaze, notably an AUROC improvement of over 5.2% on the TruthfulQA dataset compared to state-of-the-art baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.11087v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Linggang Kong, Lei Wu, Yunlong Zhang, Xiaofeng Zhong, Zhen Wang, Yongjie Wang, Yao Pan</dc:creator>
    </item>
    <item>
      <title>R3-VAE: Reference Vector-Guided Rating Residual Quantization VAE for Generative Recommendation</title>
      <link>https://arxiv.org/abs/2604.11440</link>
      <description>arXiv:2604.11440v3 Announce Type: replace 
Abstract: Generative Recommendation (GR) has gained traction for its superior performance and cold-start capability. Semantic Identifiers (SIDs), which play a vital role in GR, represent item semantics through discrete tokens. However, current techniques for SID generation based on vector quantization face two main challenges: (i) training instability, stemming from insufficient gradient propagation through the straight-through estimator and sensitivity to initialization; and (ii) inefficient SID quality assessment, where industrial practice still depends on costly GR training and A/B testing. To address these challenges, we propose Reference Vector-Guided Rating Residual Quantization VAE (R3-VAE). This framework incorporates three key innovations: (i) a reference vector that functions as a semantic anchor for the initial features, thereby mitigating sensitivity to initialization; (ii) a dot product-based rating mechanism designed to stabilize the training process and prevent codebook collapse; and (iii) two SID evaluation metrics, Semantic Cohesion and Preference Discrimination, serving as regularization terms during training. Empirical results on six benchmarks demonstrate that R3-VAE outperforms state-of-the-art methods, achieving an average improvement of 14.5% in Recall@10 and 15.5% in NDCG@10 across three public datasets (Beauty, Sports, and Toys). Furthermore, we perform GR training and online A/B tests on Toutiao. Our method achieves a 1.62% improvement in MRR and a 0.83% gain in StayTime/U versus baselines. Additionally, we employ R3-VAE to replace the item ID in a CTR model, improving content cold-start performance by 15.36% and corroborating its strong applicability and business value in industry-scale recommendation scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.11440v3</guid>
      <category>cs.IR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qiang Wan, Ze Yang, Dawei Yang, Ying Fan, Xin Yan, Siyang Liu, Yicong Liu, Chenwei Zhang, Wei Xu, Jiahao Qin, Ke Wang</dc:creator>
    </item>
    <item>
      <title>BEM: Training-Free Background Embedding Memory for False-Positive Suppression in Real-Time Fixed-Background Camera</title>
      <link>https://arxiv.org/abs/2604.11714</link>
      <description>arXiv:2604.11714v3 Announce Type: replace 
Abstract: Pretrained detectors perform well on benchmarks but often suffer performance degradation in real-world deployments due to distribution gaps between training data and target environments. COCO-like benchmarks emphasize category diversity rather than instance density, causing detectors trained under per-class sparsity to struggle in dense, single- or few-class scenes such as surveillance and traffic monitoring. In fixed-camera environments, the quasi-static background provides a stable, label-free prior that can be exploited at inference to suppress spurious detections. To address the issue, we propose Background Embedding Memory (BEM), a lightweight, training-free, weight-frozen module that can be attached to pretrained detectors during inference. BEM estimates clean background embeddings, maintains a prototype memory, and re-scores detection logits with an inverse-similarity, rank-weighted penalty, effectively reducing false positives while maintaining recall. Empirically, background-frame cosine similarity correlates negatively with object count and positively with Precision-Confidence AUC (P-AUC), motivating its use as a training-free control signal. Across YOLO and RT-DETR families on LLVIP and simulated surveillance streams, BEM consistently reduces false positives while preserving real-time performance. Our code is available at https://github.com/Leo-Park1214/Background-Embedding-Memory.git</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.11714v3</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Junwoo Park, Jangho Lee, Sunho Lim</dc:creator>
    </item>
    <item>
      <title>When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation</title>
      <link>https://arxiv.org/abs/2604.11840</link>
      <description>arXiv:2604.11840v2 Announce Type: replace 
Abstract: Behavioral simulation and strategic problem solving are different tasks. Large language models are increasingly explored as agents in policy-facing institutional simulations, but stronger reasoning need not improve behavioral sampling. We study this solver-sampler mismatch in three multi-agent negotiation environments: two trading-limits scenarios with different authority structures and a grid-curtailment case in emergency electricity management. Across two primary model families, native reasoning and often no reflection collapse toward authority-heavy outcomes. The sharpest case is DeepSeek native reasoning in the grid-curtailment transfer: it reaches action entropy 1.256 and a concession-arc rate of 0.933, yet still ends in an authority decision in 15 of 15 runs. A direct extension to OpenAI models shows the same pressure across providers: GPT-5.2 native reasoning ends in authority decisions in 45 of 45 runs across the three environments. Budget-matched no-reflection controls and orthogonal private-state controls remain rigid, while the negotiation-structured scaffold condition is the only condition that consistently opens negotiated outcomes. These diagnostics are failure screens within a fixed negotiation grammar, not evidence of external behavioral realism or policy-forecasting validity. The results show that neither more output space nor generic extra private state rescues solver-like sampler failure. For institutional simulation, solver strength and sampler qualification are different objectives: models should be evaluated for the behavioral role they are meant to play, not only for strategic capability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.11840v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <category>cs.MA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sandro Andric</dc:creator>
    </item>
    <item>
      <title>DEEP-GAP: Deep-learning Evaluation of Execution Parallelism in GPU Architectural Performance</title>
      <link>https://arxiv.org/abs/2604.14552</link>
      <description>arXiv:2604.14552v2 Announce Type: replace 
Abstract: Modern datacenters increasingly rely on low-power, single-slot inference accelerators to balance performance, energy efficiency, and rack density constraints. The NVIDIA T4 GPU has become widely deployed due to strong performance per watt and mature software support. Its successor, the NVIDIA L4 GPU, introduces improvements in Tensor Core throughput, cache capacity, memory bandwidth, and parallel execution capability. However, limited empirical evidence quantifies the practical inference performance gap between these two generations under controlled and reproducible conditions.
  This work introduces DEEP-GAP, a systematic evaluation extending the GDEV-AI methodology to GPU inference. Using identical configurations and workloads, we evaluate ResNet18, ResNet50, and ResNet101 across FP32, FP16, and INT8 precision modes using PyTorch and TensorRT.
  Results show that reduced precision significantly improves performance, with INT8 achieving up to 58x throughput improvement over CPU baselines. L4 achieves up to 4.4x higher throughput than T4 while reaching peak efficiency at smaller batch sizes between 16 and 32, improving latency-throughput tradeoffs for latency-sensitive workloads. T4 remains competitive for large batch workloads where cost or power efficiency is important.
  DEEP-GAP provides practical guidance for selecting precision modes, batch sizes, and GPU architectures for modern inference deployments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.14552v2</guid>
      <category>cs.PF</category>
      <category>cs.AR</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kathiravan Palaniappan</dc:creator>
    </item>
    <item>
      <title>TurboTalk: Progressive Distillation for One-Step Audio-Driven Talking Avatar Generation</title>
      <link>https://arxiv.org/abs/2604.14580</link>
      <description>arXiv:2604.14580v2 Announce Type: replace 
Abstract: Existing audio-driven video digital human generation models rely on multi-step denoising, resulting in substantial computational overhead that severely limits their deployment in real-world settings. While one-step distillation approaches can significantly accelerate inference, they often suffer from training instability. To address this challenge, we propose TurboTalk, a two-stage progressive distillation framework that effectively compresses a multi-step audio-driven video diffusion model into a single-step generator. We first adopt Distribution Matching Distillation to obtain a strong and stable 4-step student, and then progressively reduce the denoising steps from 4 to 1 through adversarial distillation. To ensure stable training under extreme step reduction, we introduce a progressive timestep sampling strategy and a self-compare adversarial objective that provides an intermediate adversarial reference that stabilizes progressive distillation. Our method achieves single-step talking-avatar video generation, speeding up inference by 120 times while maintaining high generation quality.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.14580v2</guid>
      <category>cs.CV</category>
      <category>cs.MM</category>
      <category>cs.SD</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiangyu Liu, Feng Gao, Xiaomei Zhang, Yong Zhang, Xiaoming Wei, Zhen Lei, Xiangyu Zhu</dc:creator>
    </item>
    <item>
      <title>Mapping High-Performance Regions in Battery Scheduling across Data Uncertainty, Battery Design, and Planning Horizons</title>
      <link>https://arxiv.org/abs/2604.15360</link>
      <description>arXiv:2604.15360v2 Announce Type: replace 
Abstract: This study presents a controlled parametric framework for analyzing energy storage planning under uncertainty in a multi-stage model predictive control setting. The framework enables a broad and systematic exploration through parametrized generation of synthetic datasets in the context of energy price arbitrage. It facilitates the study of the joint effects of battery characteristics, signal structure, forecast uncertainty, and planning horizon on revenue performance in energy storage optimization, which are rarely considered together. The analysis is driven by two objectives. First, it characterizes how these interacting factors influence operational revenue and its sensitivity to planning horizon selection, including economic losses caused by deviations from optimal horizons. This provides guidance on expected horizon ranges and their impact on revenue and computational cost. Second, it enables a compact parametrization of the relationships between battery properties, data characteristics, forecast uncertainty, and horizon-dependent performance, providing a basis for future modelling of optimal planning horizon length. Results show that the framework captures consistent structural dependencies across configurations and provides meaningful guidance for horizon selection under uncertainty. In particular, increasing forecast uncertainty systematically reduces the optimal planning horizon across battery types, reflecting the diminishing value of long-term information under increasingly unreliable forecasts. Comparison with real market data shows that the parametrization reproduces the main qualitative trends of optimal horizon behavior, suggesting its potential as a lightweight surrogate for more complex simulation-based analysis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.15360v2</guid>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Jaime de Miguel Rodriguez, Artjom Vargunin, Brigitta Robin Raudne, David Solis Martin, Yaroslava Mykhailenko, Kaarel Oja</dc:creator>
    </item>
    <item>
      <title>MambaBack: Bridging Local Features and Global Contexts in Whole Slide Image Analysis</title>
      <link>https://arxiv.org/abs/2604.15729</link>
      <description>arXiv:2604.15729v2 Announce Type: replace 
Abstract: Whole Slide Image (WSI) analysis is pivotal in computational pathology, enabling cancer diagnosis by integrating morphological and architectural cues across magnifications. Multiple Instance Learning (MIL) serves as the standard framework for WSI analysis. Recently, Mamba has become a promising backbone for MIL, overtaking Transformers due to its efficiency and global context modeling capabilities originating from Natural Language Processing (NLP). However, existing Mamba-based MIL approaches face three critical challenges: (1) disruption of 2D spatial locality during 1D sequence flattening; (2) sub-optimal modeling of fine-grained local cellular structures; and (3) high memory peaks during inference on resource-constrained edge devices. Studies like MambaOut reveal that Mamba's SSM component is redundant for local feature extraction, where Gated CNNs suffice. Recognizing that WSI analysis demands both fine-grained local feature extraction akin to natural images, and global context modeling akin to NLP, we propose MambaBack, a novel hybrid architecture that harmonizes the strengths of Mamba and MambaOut. First, we propose the Hilbert sampling strategy to preserve the 2D spatial locality of tiles within 1D sequences, enhancing the model's spatial perception. Second, we design a hierarchical structure comprising a 1D Gated CNN block based on MambaOut to capture local cellular features, and a BiMamba2 block to aggregate global context, jointly enhancing multi-scale representation. Finally, we implement an asymmetric chunking design, allowing parallel processing during training and chunking-streaming accumulation during inference, minimizing peak memory usage for deployment. Experimental results on five datasets demonstrate that MambaBack outperforms seven state-of-the-art methods. Source code and datasets are publicly available.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.15729v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sicheng Chen, Chad Wong, Tianyi Zhang, Enhui Chai, Zeyu Liu, Fei Xia</dc:creator>
    </item>
    <item>
      <title>SegMix: Shuffle-based Feedback Learning for Semantic Segmentation of Pathology Images</title>
      <link>https://arxiv.org/abs/2604.15777</link>
      <description>arXiv:2604.15777v2 Announce Type: replace 
Abstract: Segmentation is a critical task in computational pathology, as it identifies areas affected by disease or abnormal growth and is essential for diagnosis and treatment. However, acquiring high-quality pixel-level supervised segmentation data demands significant effort from experienced pathologists, limiting the application of deep learning. To overcome this challenge, relaxing the supervision to image-level classification labels makes more data usable and enables more scenarios. One approach is to leverage Class Activation Maps (CAM) to generate pseudo pixel-level annotations for semantic segmentation with only image-level labels. However, this method fails to thoroughly explore the essential characteristics of pathology images and therefore identifies only small regions, which are insufficient for generating pseudo masks. In this paper, we propose a novel shuffle-based feedback learning method inspired by curriculum learning to generate higher-quality pseudo semantic segmentation masks. Specifically, we perform patch-level shuffling of pathology images, with the model adaptively adjusting the shuffle strategy based on feedback from previous learning. Experimental results demonstrate that our proposed approach outperforms state-of-the-art methods on three different datasets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.15777v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhiling Yan, Sicheng Chen, Tianyi Zhang, Nan Ying, Yanli Lei, Guanglei Zhang</dc:creator>
    </item>
    <item>
      <title>A Practical Guide to PID Controller Implementation</title>
      <link>https://arxiv.org/abs/2604.15918</link>
      <description>arXiv:2604.15918v2 Announce Type: replace 
Abstract: How difficult can it be to implement a PID controller? The answer is twofold. Implementing the PID control law is simple and computationally inexpensive. However, this basic form will not work in practical applications. The primary reason for this is the various physical limitations of the actuator. Measurement noise, different implementations depending on the various structures (P, PI, PD or PID), bumpless transfer, and varying sampling time also result in problems rendering the basic form inoperable. PID implementation is therefore more difficult than meets the eye. This paper introduces a reference implementation of the PID controller which considers these practical issues. It includes pseudo-code, discussion of the implementation choices and simulation of carefully selected, important test cases.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.15918v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>E. Sundström, M. Bauer, J. L. Guzmán, T. Hägglund, K. Soltesz</dc:creator>
    </item>
    <item>
      <title>AutoOR: Scalably Post-training LLMs to Autoformalize Operations Research Problems</title>
      <link>https://arxiv.org/abs/2604.16804</link>
      <description>arXiv:2604.16804v2 Announce Type: replace 
Abstract: Optimization problems are central to decision-making in manufacturing, logistics, scheduling, and other industrial settings. Translating complicated descriptions of these problems into solver-ready formulations requires specialized operations research (OR) expertise, making it hard to scale. We present AutoOR, a scalable synthetic data generation and reinforcement learning pipeline that trains LLMs to autoformalize optimization problems specified in natural language across linear, mixed-integer, and non-linear categories. AutoOR generates verified training data from standard optimization forms and uses solver execution feedback as the reward signal for RL post-training. AutoOR applied to an 8B model achieves state-of-the-art or competitive results across six established OR benchmarks, matching significantly larger frontier models. For a non-linear problem class involving physical dynamics, where frontier models score near 0%, we introduce a curriculum RL strategy that bootstraps from limited initial training data to make this class tractable for post-training. We believe that methods such as AutoOR can significantly accelerate industrial decision-making with AI.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.16804v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Sumeet Ramesh Motwani, Chuan Du, Aleksander Petrov, Christopher Davis, Philip Torr, Antonio Papania-Davis, Weishi Yan</dc:creator>
    </item>
    <item>
      <title>Shepherding UAV Swarm with Action Prediction Based on Movement Constraints</title>
      <link>https://arxiv.org/abs/2604.17189</link>
      <description>arXiv:2604.17189v2 Announce Type: replace 
Abstract: In this study, we propose a new sheepdog-inspired control method for a swarm of small unmanned aerial vehicles (UAVs), which predicts the swarm behavior while explicitly accounting for the motion constraints of real robots. Sheepdog-inspired guidance control refers to a framework in which a small number of navigator agents (sheepdog agents) indirectly drive a large number of autonomous agents (a flock of sheep agents) so as to steer the group toward a target position. In conventional studies on sheepdog-inspired guidance, both types of agents have typically been modeled as point masses, and the guidance law for the navigator agents has been designed using simple interaction vectors based on the instantaneous relative positions between the agents. However, when implementing such methods on real robots such as drones, it is necessary to consider each agent's motion constraints, including upper bounds on velocity and acceleration. Moreover, we argue that guidance can be made more efficient by predicting the future behavior of the autonomous swarm that is observable to the navigator agents. To this end, we propose a three-dimensional guidance control law based on behavior prediction of autonomous agents under motion constraints, inspired by the Dynamic Window Approach (DWA). At each control cycle, the navigator agent generates a set of feasible motion candidates that satisfy its motion constraints, and predicts the short-horizon swarm evolution using an internal model of the autonomous agents maintained within the navigator agent. The motion candidates are then evaluated according to criteria such as the progress velocity toward the target, the positioning strategy with respect to the swarm, and safety margins, and the optimal motion is selected to achieve safe and efficient guidance. Numerical simulation results demonstrate the effectiveness of the proposed guidance control law.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.17189v2</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>Yusuke Tsunoda, Yusuke Goto, Takao Sato</dc:creator>
    </item>
    <item>
      <title>On Drift Induced by Local Transition Asymmetry in Combinatorial State Spaces</title>
      <link>https://arxiv.org/abs/2604.17332</link>
      <description>arXiv:2604.17332v2 Announce Type: replace 
Abstract: We study stochastic processes on combinatorial state spaces with local transition constraints, as arise in local search algorithms. We show that asymmetry in local transitions induces a systematic drift in a distance process relative to a reference configuration. This drift results from the imbalance between inward and outward transitions, translating combinatorial multiplicities into directional bias. Analyzing the random walk on the Johnson graph, we derive explicit expressions for the drift and expected hitting times. We also show that locality constraints lead to trajectory-level differences that can hinder search trajectories from reaching the target, even under identical stationary distributions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.17332v2</guid>
      <category>cs.CE</category>
      <category>math.PR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Fumio Ishizaki</dc:creator>
    </item>
    <item>
      <title>Compiling Deterministic Structure into SLM Harnesses</title>
      <link>https://arxiv.org/abs/2604.17450</link>
      <description>arXiv:2604.17450v3 Announce Type: replace 
Abstract: Enterprise SLM deployment faces epistemic asymmetry: small models cannot self-correct reasoning errors, while frontier LLMs incur prohibitive costs and data sovereignty risks at scale. We propose Semantic Gradient Descent (SGDe), a teacher-student framework that compiles agentic workflows into discrete execution plans--DAG topologies, system prompts, and deterministic code. The trailing e distinguishes this discrete, compilation-based approach from stochastic gradient descent. Operating in discrete semantic space, a frontier teacher generates natural-language critiques that serve as directional gradients to iteratively refine the SLM's workflow artefacts. We formalise SGDe under PAC learning, establishing sample-complexity bounds that enable convergence with as few as three training examples by leveraging the teacher as a statistical prior. On an adversarially synthesized GSM-Hard test set, compiled workflows achieve 91.3% accuracy at m=5 and 99.3% at m=3--a +26.3% to +34.3% absolute gain over state-of-the-art prompt optimisers. Within harness engineering, SGDe treats deterministic code placement (which subtasks to delegate to Python versus retain as LLM calls) as a trace-driven, per-node optimisation target, generalising static whole-problem offloading in PAL and PoT. The teacher compiles two deterministic structures: capability offloading (delegating subtasks to Python when the SLM is unreliable) and structural consensus (wrapping variance-sensitive steps in fan-out/fan-in subgraphs with deterministic voting).</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.17450v3</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zan Kai Chong, Hiroyuki Ohsaki, Bryan Ng</dc:creator>
    </item>
    <item>
      <title>Homogeneous Network Caching is Fixed-Parameter Tractable Parameterized by the Number of Caches</title>
      <link>https://arxiv.org/abs/2604.17546</link>
      <description>arXiv:2604.17546v2 Announce Type: replace 
Abstract: Network caching asks how to place contents in distributed caches so that future requests are served close to their users. Ganian, Mc Inerney and Tsigkari recently initiated the parameterized-complexity study of the problem and, for the homogeneous unit-size variant (HomNC), isolated an unresolved family of six parameterizations: by the number of caches $C$, the number of users $U$, $U+K$, $C+U$, $C+\lambda$, and the vertex-cover number $\text{vc}(G)$, where $K$ is the maximum cache capacity and $\lambda$ is the maximum number of contents requested with nonzero probability by any user. Their interreducibility theorem showed that these six cases stand or fall together under parameterized reductions, and they conjectured the family to be W[1]-hard. We resolve this conjecture in the opposite direction. We prove that HomNC is fixed-parameter tractable parameterized by $C$ alone, and therefore fixed-parameter tractable for all six parameterizations. Our algorithm is based on an exact $n$-fold integer programming formulation that reveals a nontrivial block structure in homogeneous network caching, with the repeated part depending only on $C$. Standard algorithms for $n$-fold integer programming then yield a running time of the form $f(C)\lvert I\rvert^{O(1)}$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.17546v2</guid>
      <category>cs.DS</category>
      <category>cs.CC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>József Pintér, Regina Stangl</dc:creator>
    </item>
    <item>
      <title>FedCRF: A Federated Cross-domain Recommendation Method with Semantic-driven Deep Knowledge Fusion</title>
      <link>https://arxiv.org/abs/2604.17681</link>
      <description>arXiv:2604.17681v2 Announce Type: replace 
Abstract: As user behavior data becomes increasingly scattered across different platforms, achieving cross-domain knowledge fusion while preserving privacy has become a critical issue in recommender systems. Existing privacy-preserving cross-domain recommendation (PPCDR) methods usually rely on overlapping users or items as a bridge, making them inapplicable to non-overlapping scenarios. They also suffer from limitations in the collaborative modeling of global and local semantics. To this end, this paper proposes a Federated Cross-domain Recommendation method with deep knowledge Fusion (FedCRF). Using textual semantics as a cross-domain bridge, FedCRF achieves cross-domain knowledge transfer via federated semantic learning in the non-overlapping scenario. Specifically, FedCRF constructs global semantic clusters on the server side to extract shared semantic information, and designs an FGSAT module on the client side to dynamically adapt to local data distributions and alleviate cross-domain distribution shift. Meanwhile, it builds a semantic graph based on textual features to learn representations that integrate both structural and semantic information, and introduces contrastive learning constraints between global and local semantic representations to enhance semantic consistency and promote deep knowledge fusion. In this framework, only item semantic representations are shared, while user interaction data remains locally stored, effectively mitigating privacy leakage risks. Experimental results on multiple real-world datasets show that FedCRF significantly outperforms existing methods in terms of Recall@20 and NDCG@20, validating its effectiveness and superiority in non-overlapping cross-domain recommendation scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.17681v2</guid>
      <category>cs.IR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1016/j.ipm.2026.104827</arxiv:DOI>
      <dc:creator>Lei Guo, Ting Yang, Xu Yu, Xiaohui Han, Guiyuan Jiang, Hui Liu</dc:creator>
    </item>
    <item>
      <title>AgenTEE: Confidential LLM Agent Execution on Edge Devices</title>
      <link>https://arxiv.org/abs/2604.18231</link>
      <description>arXiv:2604.18231v2 Announce Type: replace 
Abstract: Large Language Model (LLM) agents provide powerful automation capabilities, but they also create a substantially broader attack surface than traditional applications due to their tight integration with non-deterministic models and third-party services. While current deployments primarily rely on cloud-hosted services, emerging designs increasingly execute agents directly on edge devices to reduce latency and enhance user privacy. However, securely hosting such complex agent pipelines on edge devices remains challenging. These deployments must protect proprietary assets (e.g., system prompts and model weights) and sensitive runtime state on heterogeneous platforms that are vulnerable to software attacks and potentially controlled by malicious users. To address these challenges, we present AgenTEE, a system for deploying confidential agent pipelines on edge devices. AgenTEE places the agent runtime, inference engine, and third-party applications into independently attested confidential virtual machines (cVMs) and mediates their interaction through explicit, verifiable communication channels. Built on Arm Confidential Compute Architecture (CCA), a recent extension to Arm platforms, AgenTEE enforces strong system-level isolation of sensitive assets and runtime state. Our evaluation shows that such a multi-cVM system is practical, achieving near-native performance with less than 5.15% runtime overhead compared to commodity OS multi-process deployments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.18231v2</guid>
      <category>cs.CR</category>
      <category>cs.OS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3805621.3807660</arxiv:DOI>
      <dc:creator>Sina Abdollahi, Mohammad M Maheri, Javad Forough, Amir Al Sadi, Josh Millar, David Kotz, Marios Kogias, Hamed Haddadi</dc:creator>
    </item>
    <item>
      <title>Enhancing Glass Surface Reconstruction via Depth Prior for Robot Navigation</title>
      <link>https://arxiv.org/abs/2604.18336</link>
      <description>arXiv:2604.18336v2 Announce Type: replace 
Abstract: Indoor robot navigation is often compromised by glass surfaces, which severely corrupt depth sensor measurements. While foundation models like Depth Anything 3 provide excellent geometric priors, they lack an absolute metric scale. We propose a training-free framework that leverages depth foundation models as a structural prior, employing a robust local RANSAC-based alignment to fuse it with raw sensor depth. This naturally avoids contamination from erroneous glass measurements and recovers an accurate metric scale. Furthermore, we introduce GlassRecon, a novel RGB-D dataset with geometrically derived ground truth for glass regions. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art baselines, especially under severe sensor depth corruption. The dataset and related code will be released at https://github.com/jarvisyjw/GlassRecon.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.18336v2</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiamin Zheng, Jingwen Yu, Guangcheng Chen, Hong Zhang</dc:creator>
    </item>
    <item>
      <title>River-LLM: Large Language Model Seamless Exit Based on KV Share</title>
      <link>https://arxiv.org/abs/2604.18396</link>
      <description>arXiv:2604.18396v2 Announce Type: replace 
Abstract: Large Language Models (LLMs) have demonstrated exceptional performance across diverse domains but are increasingly constrained by high inference latency. Early Exit has emerged as a promising solution to accelerate inference by dynamically bypassing redundant layers. However, in decoder-only architectures, the efficiency of Early Exit is severely bottlenecked by the KV Cache Absence problem, where skipped layers fail to provide the necessary historical states for subsequent tokens. Existing solutions, such as recomputation or masking, either introduce significant latency overhead or incur severe precision loss, failing to bridge the gap between theoretical layer reduction and practical wall-clock speedup. In this paper, we propose River-LLM, a training-free framework that enables seamless token-level Early Exit. River-LLM introduces a lightweight KV-Shared Exit River that allows the backbone's missing KV cache to be naturally generated and preserved during the exit process, eliminating the need for costly recovery operations. Furthermore, we utilize state transition similarity within decoder blocks to predict cumulative KV errors and guide precise exit decisions. Extensive experiments on mathematical reasoning and code generation tasks demonstrate that River-LLM achieves a practical speedup of 1.71 to 2.16 times while maintaining high generation quality.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.18396v2</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yingtao Shen, An Zou</dc:creator>
    </item>
    <item>
      <title>Evaluating LLM-Driven Summarisation of Parliamentary Debates with Computational Argumentation</title>
      <link>https://arxiv.org/abs/2604.19331</link>
      <description>arXiv:2604.19331v2 Announce Type: replace 
Abstract: Understanding how policy is debated and justified in parliament is a fundamental aspect of the democratic process. However, the volume and complexity of such debates mean that outside audiences struggle to engage. Meanwhile, Large Language Models (LLMs) have been shown to enable automated summarisation at scale. While summaries of debates can make parliamentary procedures more accessible, evaluating whether these summaries faithfully communicate argumentative content remains challenging. Existing automated summarisation metrics have been shown to correlate poorly with human judgements of consistency (i.e., faithfulness or alignment between summary and source). In this work, we propose a formal framework for evaluating parliamentary debate summaries that grounds argument structures in the contested proposals up for debate. Our novel approach, driven by computational argumentation, focuses the evaluation on formal properties concerning the faithful preservation of the reasoning presented to justify or oppose policy outcomes. We demonstrate our methods using a case study of debates from the European Parliament and associated LLM-driven summaries.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.19331v2</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Eoghan Cunningham, Derek Greene, James Cross, Antonio Rago</dc:creator>
    </item>
    <item>
      <title>Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture the Flag Challenges</title>
      <link>https://arxiv.org/abs/2604.19354</link>
      <description>arXiv:2604.19354v2 Announce Type: replace 
Abstract: Large Language Model (LLM) agents are increasingly proposed for autonomous cybersecurity tasks, but their capabilities in realistic offensive settings remain poorly understood. We present DeepRed, an open-source benchmark for evaluating LLM-based agents on realistic Capture The Flag (CTF) challenges in isolated virtualized environments. DeepRed places an agent in a Kali attacker environment with terminal tools and optional web search, connected over a private network to a target challenge, and records full execution traces for analysis. To move beyond binary solved/unsolved outcomes, we introduce a partial-credit scoring method based on challenge-specific checkpoints derived from public writeups, together with an automated summarise-then-judge labelling pipeline for assigning checkpoint completion from logs. Using DeepRed, we benchmark ten commercially accessible LLMs on ten VM-based CTF challenges spanning different challenge categories. The results indicate that current agents remain limited: the best model achieves only 35% average checkpoint completion, performing strongest on common challenge types and weakest on tasks requiring non-standard discovery and longer-horizon adaptation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.19354v2</guid>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ali Al-Kaswan, Maksim Plotnikov, Maxim Hájek, Roland Vízner, Arie van Deursen, Maliheh Izadi</dc:creator>
    </item>
    <item>
      <title>Estimating Power-Law Exponent with Edge Differential Privacy</title>
      <link>https://arxiv.org/abs/2604.20274</link>
      <description>arXiv:2604.20274v2 Announce Type: replace 
Abstract: Many real-world graphs have degree distributions that are well approximated by a power law, and the corresponding scaling parameter $\alpha$ provides a compact summary of that structure which is useful for graph analysis and system optimization. When graphs contain sensitive relationship data, $\alpha$ must be estimated without revealing information about individual edges. This paper studies power-law exponent estimation under edge differential privacy. Instead of first releasing a noisy degree distribution and then fitting a power-law model, we propose privatizing only the low-dimensional sufficient statistics needed to estimate $\alpha$, thereby avoiding the high distortion introduced by traditional approaches. Using these released statistics, we support both discrete approximation and likelihood-based numerical optimization for efficient parameter estimation. We develop edge-DP algorithms for both centralized and local DP models, compare degree release and log-statistic release in the local setting, and evaluate the resulting methods on various graph datasets across multiple privacy budgets and tail-cutoff settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.20274v2</guid>
      <category>cs.DB</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3807894.3810274</arxiv:DOI>
      <dc:creator>Adam Tan, Mohamed Hefny, Keval Vora</dc:creator>
    </item>
    <item>
      <title>X-Cache: Cross-Chunk Block Caching for Few-Step Autoregressive World Models Inference</title>
      <link>https://arxiv.org/abs/2604.20289</link>
      <description>arXiv:2604.20289v2 Announce Type: replace 
Abstract: Real-time world simulation is becoming a key infrastructure for scalable evaluation and online reinforcement learning of autonomous driving systems. Recent driving world models built on autoregressive video diffusion achieve high-fidelity, controllable multi-camera generation, but their inference cost remains a bottleneck for interactive deployment. However, existing diffusion caching methods are designed for offline video generation with multiple denoising steps, and do not transfer to this scenario. Few-step distilled models have no inter-step redundancy left for these methods to reuse, and sequence-level parallelization techniques require future conditioning that closed-loop interactive generation does not provide. We present X-Cache, a training-free acceleration method that caches along a different axis: across consecutive generation chunks rather than across denoising steps. X-Cache maintains per-block residual caches that persist across chunks, and applies a dual-metric gating mechanism over a structure- and action-aware block-input fingerprint to independently decide whether each block should recompute or reuse its cached residual. To prevent approximation errors from permanently contaminating the autoregressive KV cache, X-Cache identifies KV update chunks (the forward passes that write clean keys and values into the persistent cache) and unconditionally forces full computation on these chunks, cutting off error propagation. We implement X-Cache on X-world, a production multi-camera action-conditioned driving world model built on multi-block causal DiT with few-step denoising and rolling KV cache. X-Cache achieves a 71% block skip rate and a 2.6x wall-clock speedup with minimal degradation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.20289v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Yixiao Zeng, Jianlei Zheng, Chaoda Zheng, Shijia Chen, Mingdian Liu, Tongping Liu, Tengwei Luo, Yu Zhang, Boyang Wang, Linkun Xu, Siyuan Lu, Bo Tian, Xianming Liu</dc:creator>
    </item>
    <item>
      <title>Caesar: Deep Agentic Web Exploration for Creative Answer Synthesis</title>
      <link>https://arxiv.org/abs/2604.20855</link>
      <description>arXiv:2604.20855v2 Announce Type: replace 
Abstract: To advance from passive retrieval to creative discovery of new ideas, autonomous agents must be capable of deep, associative synthesis. However, current agentic frameworks prioritize convergent search, often resulting in derivative summaries that lack creativity. Caesar is an agentic architecture designed to bridge the gap between information gathering and synthesis of new insights. Unlike existing agents that treat the web as a flat sequence of disconnected documents, Caesar performs a deep web traversal to construct a dynamic knowledge graph. This graph then serves as a navigational scaffold, guiding the agent to diverse, non-obvious information that flat retrieval would never encounter. Caesar thus consists of two components: (1) exploration driven by a dynamic context-aware policy that maximizes information coverage across the web's topological structure, and (2) synthesis through adversarial refinement that actively seeks novel perspectives rather than confirming established priors. Caesar demonstrates the ability to generate artifacts and answers characterized by high novelty and structural coherence, achieving 13% to 23% improvement over state-of-the-art deep research agents in creative synthesis challenges, with strong dominance across all output formats.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.20855v2</guid>
      <category>cs.IR</category>
      <category>cs.MA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jason Liang, Elliot Meyerson, Risto Miikkulainen</dc:creator>
    </item>
    <item>
      <title>Scale-Parameter Selection in Gaussian Kolmogorov-Arnold Networks</title>
      <link>https://arxiv.org/abs/2604.21174</link>
      <description>arXiv:2604.21174v2 Announce Type: replace 
Abstract: Kolmogorov--Arnold Networks (KANs) have recently attracted attention as edge-based neural architectures in which learnable univariate functions replace conventional fixed activation functions. A key source of flexibility in KANs is the choice of basis functions used to parameterize the learnable edge functions. In this context, Gaussian basis functions provide a simple and efficient alternative to splines. However, their performance depends strongly on the scale (shape) parameter \(\epsilon\), whose role has not been studied systematically. In this paper, we investigate how \(\epsilon\) affects Gaussian KANs through first-layer feature geometry, conditioning, and approximation behavior. Our central observation is that scale selection is governed primarily by the first layer, since it is the only layer constructed directly on the input domain and any loss of distinguishability introduced there cannot be recovered by later layers. From this viewpoint, we analyze the first-layer feature matrix and identify a practical operating interval, \[ \epsilon \in \left[\frac{1}{G-1},\frac{2}{G-1}\right], \] where \(G\) denotes the number of Gaussian centers. We interpret this interval not as a universal optimality result, but as a stable and effective design rule, and validate it through brute-force sweeps over \(\epsilon\) across function-approximation problems with different collocation densities, grid resolutions, network architectures, and input dimensions, as well as physics-informed problems. We further show that this range is useful for fixed-scale selection, variable-scale constructions, constrained training of \(\epsilon\), and efficient scale search using early training MSE. In this way, the paper positions scale selection as a practical design principle for Gaussian KANs rather than as an ad hoc hyperparameter choice.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.21174v2</guid>
      <category>cs.CE</category>
      <category>cs.AI</category>
      <category>math.AP</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Amir Noorizadegan, Sifan Wang</dc:creator>
    </item>
    <item>
      <title>CAP: Controllable Alignment Prompting for Unlearning in LLMs</title>
      <link>https://arxiv.org/abs/2604.21251</link>
      <description>arXiv:2604.21251v3 Announce Type: replace 
Abstract: Large language models (LLMs) trained on unfiltered corpora inherently risk retaining sensitive information, necessitating selective knowledge unlearning for regulatory compliance and ethical safety. However, existing parameter-modifying methods face fundamental limitations: high computational costs, uncontrollable forgetting boundaries, and strict dependency on model weight access. These constraints render them impractical for closed-source models, yet current non-invasive alternatives remain unsystematic and reliant on empirical experience. To address these challenges, we propose the Controllable Alignment Prompting for Unlearning (CAP) framework, an end-to-end prompt-driven unlearning paradigm. CAP decouples unlearning into a learnable prompt optimization process via reinforcement learning, where a prompt generator collaborates with the LLM to suppress target knowledge while preserving general capabilities selectively. This approach enables reversible knowledge restoration through prompt revocation. Extensive experiments demonstrate that CAP achieves precise, controllable unlearning without updating model parameters, establishing a dynamic alignment mechanism that overcomes the transferability limitations of prior methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.21251v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhaokun Wang, Jinyu Guo, Jingwen Pu, Hongli Pu, Meng Yang, Xunlei Chen, Jie Ou, Wenyi Li, Guangchun Luo, Wenhong Tian</dc:creator>
    </item>
    <item>
      <title>Sub-Token Routing in LoRA for Adaptation and Query-Aware KV Compression</title>
      <link>https://arxiv.org/abs/2604.21335</link>
      <description>arXiv:2604.21335v2 Announce Type: replace 
Abstract: Sub-token routing provides a finer compression axis for transformer efficiency than the coarse units used in most prior work, such as tokens, pages, heads, or layers. In this paper, we study routing within a token representation itself in LoRA-adapted transformers. We consider two settings. In the query-independent setting, we combine routed subspace LoRA with value-group routing on the KV path for compression-aware language modeling. In the query-aware setting, we use a predictor-based selector to allocate a global retention budget over context-token/value-group pairs using query-conditioned relevance. Experiments show that the query-independent design improves language-model quality under reduced KV budgets, while the query-aware design preserves downstream behavior well under KV compression. We further show that sub-token routing is most effective as a complementary compression axis to token-level query-aware selection: token-level methods decide which tokens survive globally, while sub-token routing determines how the surviving tokens are compressed internally. Their combination enables deeper KV compression at nearly unchanged task accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.21335v2</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Wei Jiang, Wei Wang</dc:creator>
    </item>
    <item>
      <title>CARE: Counselor-Aligned Response Engine for Online Mental-Health Support</title>
      <link>https://arxiv.org/abs/2604.21352</link>
      <description>arXiv:2604.21352v2 Announce Type: replace 
Abstract: Mental health challenges are increasing worldwide, straining emotional support services and leading to counselor overload. This can result in delayed responses during critical situations, such as suicidal ideation, where timely intervention is essential. While large language models (LLMs) have shown strong generative capabilities, their application in low-resource languages, especially in sensitive domains like mental health, remains underexplored. Furthermore, existing LLM-based agents often struggle to replicate the supportive language and intervention strategies used by professionals due to a lack of training on large-scale, real-world datasets.
  To address this, we propose CARE (Counselor-Aligned Response Engine), a GenAI framework that assists counselors by generating real-time, psychologically aligned response recommendations. CARE fine-tunes open-source LLMs separately for Hebrew and Arabic using curated subsets of real-world crisis conversations. The training data consists of sessions rated as highly effective by professional counselors, enabling the models to capture interaction patterns associated with successful de-escalation. By training on complete conversation histories, CARE maintains the evolving emotional context and dynamic structure of counselor-help-seeker dialogue.
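  To make "training on complete conversation histories" concrete, here is a minimal Python sketch of one way rated sessions could be turned into supervised fine-tuning pairs, where each counselor turn is a target conditioned on everything said before it. The session format, the role labels, and the min_rating cutoff are hypothetical assumptions for illustration, not CARE's actual data schema or training pipeline.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # hypothetical labels: "counselor" or "help_seeker"
    text: str


def build_examples(sessions, min_rating=4):
    """Turn highly rated sessions into (history, target) training pairs.

    Assumes each session is a dict with a numeric "rating" and a list of
    Turn objects under "turns"; min_rating is an illustrative cutoff.
    """
    examples = []
    for session in sessions:
        # Keep only sessions that counselors rated as highly effective.
        if session["rating"] >= min_rating:
            history = []
            for turn in session["turns"]:
                # Each counselor reply becomes a target, conditioned on the
                # full conversation so far rather than a truncated window.
                if turn.role == "counselor" and history:
                    examples.append((list(history), turn.text))
                history.append(turn)
    return examples


demo = [
    {"rating": 5, "turns": [
        Turn("help_seeker", "I haven't slept in days."),
        Turn("counselor", "That sounds exhausting. I'm here with you."),
        Turn("help_seeker", "I don't know what to do."),
        Turn("counselor", "Can you tell me what keeps you awake?"),
    ]},
    {"rating": 2, "turns": [Turn("help_seeker", "Hi"), Turn("counselor", "Hello")]},
]
for history, target in build_examples(demo):
    print(len(history), "prior turns:", target)
```

  In this sketch only the highly rated session contributes examples, and its second target sees all three preceding turns, which is how the evolving context would be carried into training.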
  In experimental settings, CARE demonstrates stronger semantic and strategic alignment with gold-standard counselor responses compared to non-specialized LLMs. These findings suggest that domain-specific fine-tuning on expert-validated data can significantly support counselor workflows and improve care quality in low-resource language contexts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.21352v2</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hagai Astrin, Ayal Swaid, Avi Segal, Kobi Gal</dc:creator>
    </item>
    <item>
      <title>Load constrained wind farm flow control through multi-objective multi-agent reinforcement learning</title>
      <link>https://arxiv.org/abs/2604.22795</link>
      <description>arXiv:2604.22795v2 Announce Type: replace 
Abstract: This study presents a multi-agent reinforcement learning (MARL) framework for load-constrained wind farm flow control (WFFC). While wake steering can enhance total wind farm power, it often introduces increased structural loads on downstream turbines. To address this, we integrate an Independent Soft Actor-Critic (I-SAC) architecture with a data-driven, local inflow sector-averaged surrogate model to provide real-time estimates of Damage Equivalent Loads (DELs). By incorporating these estimates into a shaped reward function, turbine-specific agents are trained to maximize power production while adhering to specific load-increase thresholds ($\Delta_{max}$) of 10%, 20%, and 30% relative to a baseline controller. The framework is implemented within the WindGym environment using the DYNAMIKS flow solver with the Dynamic Wake Meandering (DWM) model to capture non-stationary wake physics. Results indicate that the MARL agents successfully learn collaborative policies that prioritise power gain while actively retreating from high-DEL control strategies.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.22795v2</guid>
      <category>eess.SY</category>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Teodor Åstrand, Marcus Binder Nilsen, Iasonas Tsaklis, Tuhfe Göçmen, Pierre-Elouan Réthoré, Nikolay Dimitrov</dc:creator>
    </item>
    <item>
      <title>Code Broker: A Multi-Agent System for Automated Code Quality Assessment</title>
      <link>https://arxiv.org/abs/2604.23088</link>
      <description>arXiv:2604.23088v2 Announce Type: replace 
Abstract: We present Code Broker, a multi-agent system built on Google's Agent Development Kit (ADK) that analyses Python source code from individual files, local directory trees, or remote GitHub repositories and generates structured, actionable quality assessment reports. The system realises a hierarchical five-agent architecture in which a root orchestrator coordinates a sequential pipeline agent that, in turn, dispatches three specialised agents concurrently (a Correctness Assessor, a Style Assessor, and a Description Generator) before synthesising their findings through an Improvement Recommender. Reports quantify four quality dimensions (correctness, security, style, and maintainability) on a normalised scale and are rendered in both Markdown and HTML for integration into diverse developer workflows. Code Broker fuses LLM-based semantic reasoning with deterministic static analysis signals from Pylint, employs asynchronous execution with exponential-backoff retry logic to improve robustness under transient API failures, and explores lightweight session memory for retaining and querying prior assessment context across runs. We frame this paper as a technical report on system design, prompt engineering, and tool orchestration, and present a preliminary qualitative evaluation on representative Python codebases of varying scale. The results indicate that parallel specialised agents produce readable, developer-oriented feedback that complements traditional linting, while also foregrounding current limitations in evaluation depth, security tooling, large-repository handling, and the exclusive reliance on in-memory persistence. All code and reproducibility materials are publicly available: https://github.com/Samir-atra/agents_intensive_dev.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.23088v2</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.PL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Samer Attrah</dc:creator>
    </item>
    <item>
      <title>A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework</title>
      <link>https://arxiv.org/abs/2604.23338</link>
      <description>arXiv:2604.23338v2 Announce Type: replace 
Abstract: Agentic AI systems introduce a security surface that is qualitatively different from that of stateless LLMs. They persist memory, invoke external tools, coordinate with peer agents, and operate across sessions, allowing attacks to emerge not only at the prompt interface but also through architectural state, delegated authority, and long-horizon interactions. Existing security taxonomies, however, primarily organize threats by attack type, such as prompt injection or jailbreaking, and therefore obscure where in the agentic stack a threat arises and over what timescale it manifests.
  We propose the Layered Attack Surface Model (LASM), a structural taxonomy for agentic AI security. LASM decomposes the agentic stack into seven layers (Foundation, Cognitive, Memory, Tool Execution, Multi-Agent Coordination, Ecosystem, and Governance) and augments them with a four-class temporality axis covering instantaneous, session-persistent, cross-session cumulative, and sub-session-stack threats. We use this 7×4 framework to analyze 116 papers from 2021–2026. The resulting map reveals that the upper layers of the agentic stack remain sharply under-explored, especially for long-horizon and stack-propagating threats; multiple documented attack regions have no corresponding defenses; and current benchmarks provide no coverage for cross-session or sub-session-stack failure modes.
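  The 7×4 framework amounts to a grid of 28 (layer, temporality) cells. The Python sketch below is one hypothetical way to tally coded papers into that grid and list the cells no paper touches; the flat one-pair-per-threat coding format is an assumption for illustration, not the released per-paper coding schema.

```python
from collections import Counter
from itertools import product

LAYERS = ["Foundation", "Cognitive", "Memory", "Tool Execution",
          "Multi-Agent Coordination", "Ecosystem", "Governance"]
TEMPORALITIES = ["instantaneous", "session-persistent",
                 "cross-session cumulative", "sub-session-stack"]


def coverage_grid(coded_papers):
    """Count coded threats per (layer, temporality) cell of the 7x4 grid.

    coded_papers: iterable of (layer, temporality) pairs; this flat
    format is a hypothetical simplification of a real coding sheet.
    """
    counts = Counter()
    for layer, temporality in coded_papers:
        if layer not in LAYERS or temporality not in TEMPORALITIES:
            raise ValueError(f"unknown cell: {layer!r}, {temporality!r}")
        counts[(layer, temporality)] += 1
    # Return every cell, including empty ones, so gaps stay explicit.
    return {cell: counts[cell] for cell in product(LAYERS, TEMPORALITIES)}


def empty_cells(grid):
    """List the cells that no coded paper falls into."""
    return [cell for cell, n in grid.items() if n == 0]


grid = coverage_grid([
    ("Cognitive", "instantaneous"),
    ("Memory", "cross-session cumulative"),
    ("Cognitive", "instantaneous"),
])
print(grid[("Cognitive", "instantaneous")])  # 2
print(len(empty_cells(grid)))  # 26
```

  Keeping empty cells in the returned dictionary, rather than only the observed ones, is what lets a sketch like this surface unexplored regions directly.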
  We further derive a cross-layer defense taxonomy, defense recipes for canonical attack classes, and a dependency DAG that separates near-term engineering gaps from fundamental research challenges. We release the per-paper coding, robustness scripts, and a reference Agent Bill of Materials schema to support reproducible analysis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.23338v2</guid>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kexin Chu</dc:creator>
    </item>
    <item>
      <title>Supernodes and Halos: Loss-Critical Hubs in LLM Feed-Forward Layers</title>
      <link>https://arxiv.org/abs/2604.23475</link>
      <description>arXiv:2604.23475v2 Announce Type: replace 
Abstract: We study the organization of channel-level importance in transformer feed-forward networks (FFNs). Using a Fisher-style loss proxy (LP) based on activation-gradient second moments, we show that loss sensitivity is concentrated in a small set of channels within each layer. In Llama-3.1-8B, the top 1% of channels per layer accounts for a median of 58.7% of LP mass, with a range of 33.0% to 86.1%. We call these loss-critical channels supernodes. Although FFN layers also contain strong activation outliers, LP-defined supernodes overlap only weakly with activation-defined outliers and are not explained by activation power or weight norms alone. Around this core, we find a weaker but consistent halo structure: some non-supernode channels share the supernodes' write support and show stronger redundancy with the protected core. We use one-shot structured FFN pruning as a diagnostic test of this organization. At 50% FFN sparsity, baselines that prune many supernodes degrade sharply, whereas our SCAR variants explicitly protect the supernode core; the strongest variant, SCAR-Prot, reaches perplexity 54.8 compared with 989.2 for Wanda-channel. The LP-concentration pattern appears across Mistral-7B, Llama-2-7B, and Qwen2-7B, remains visible in targeted Llama-3.1-70B experiments, and increases during OLMo-2-7B pretraining. These results suggest that LLM FFNs develop a small learned core of loss-critical channels, and that preserving this core is important for reliable structured pruning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.23475v2</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Audrey Cherilyn, Houman Safaai</dc:creator>
    </item>
    <item>
      <title>Mammographic Lesion Segmentation with Lightweight Models: A Comparative Study</title>
      <link>https://arxiv.org/abs/2604.23899</link>
      <description>arXiv:2604.23899v2 Announce Type: replace 
Abstract: Breast cancer is a leading cause of cancer-related mortality among women worldwide, with mammography as the primary screening tool. While deep learning models have shown strong performance in lesion segmentation, most rely on computationally intensive architectures that limit their use in resource-constrained environments. This study evaluates the performance and efficiency of lightweight models for mammographic lesion segmentation. Architectures including MobileNetV2, EfficientNet Lite, FPN, and Fast-SCNN were compared against a U-Net baseline using the INbreast dataset with 5-fold cross-validation. Performance was assessed using Dice score, Intersection over Union (IoU), and recall, alongside model complexity. MobileNetV2 with spatial and channel Squeeze-and-Excitation (SCSE) achieved the best performance, with a Dice score of 0.5766 while using approximately 75% fewer parameters than U-Net. Cross-dataset evaluation on the DMID dataset showed reduced accuracy due to domain shift but preserved recall. These results demonstrate that lightweight architectures offer a practical balance between performance and efficiency for deployable CAD systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.23899v2</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Helder Oliveira</dc:creator>
    </item>
    <item>
      <title>Vulnerability Identification by Harnessing Inter-connected Multi-Source Information</title>
      <link>https://arxiv.org/abs/2604.24028</link>
      <description>arXiv:2604.24028v2 Announce Type: replace 
Abstract: Third-party open-source libraries are widely used in modern software development. Because of these dependency relationships, vulnerabilities in open-source libraries pose significant security threats to downstream software. However, library vulnerabilities are usually reported and patched implicitly, without explicit notification to dependent software, leaving downstream software vulnerable to potential attacks. Existing research primarily focuses on identifying vulnerability patches from bug reports, commit messages, or code changes, overlooking the rich semantic connections among these sources of information. In this paper, our main insight is that the various sources of information, including vulnerability descriptions (e.g., bug reports) and their fixing strategies (e.g., commit messages and code changes), are highly interconnected: together they express high-level semantic information about a bug's symptoms, root cause, and fix. Hence, we propose training an AI model to integrate multiple sources, improving the effectiveness of both vulnerability identification and vulnerability type classification. We introduce VPFinder, a tool that uses multi-head attention mechanisms to extract high-level semantic information from these diverse sources. Evaluation results show that VPFinder achieves an F1-score of 0.941 on the vulnerability identification task and 0.610 on the vulnerability type classification task, outperforming state-of-the-art approaches by 5.4%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.24028v2</guid>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Liyou Chen, Hailong Sun, Xiang Gao, Lin Shi, Yixin Yang, Yi Xu</dc:creator>
    </item>
    <item>
      <title>ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning</title>
      <link>https://arxiv.org/abs/2604.24300</link>
      <description>arXiv:2604.24300v2 Announce Type: replace 
Abstract: Current evaluations of spatial intelligence can be systematically invalid under modern vision-language model (VLM) settings. First, many benchmarks derive question-answer (QA) pairs from point-cloud-based 3D annotations originally curated for traditional 3D perception. When such annotations are treated as ground truth for video-based evaluation, reconstruction and annotation artifacts can miss objects that are clearly visible in the video, mislabel object identities, or corrupt geometry-dependent answers (e.g., size), yielding incorrect or ambiguous QA pairs. Second, evaluations often assume full-scene access, while many VLMs operate on sparsely sampled frames (e.g., 16-64), making many questions effectively unanswerable under the actual model inputs. We improve evaluation validity by introducing ReVSI, a benchmark and protocol that ensures each QA pair is answerable and correct under the model's actual inputs. To this end, we re-annotate objects and geometry across 381 scenes from 5 datasets to improve data quality, and regenerate all QA pairs with rigorous bias mitigation and human verification using professional 3D annotation tools. We further enhance evaluation controllability by providing variants across multiple frame budgets (16/32/64/all) and fine-grained object visibility metadata, enabling controlled diagnostic analyses. Evaluations of general and domain-specific VLMs on ReVSI reveal systematic failure modes that are obscured by prior benchmarks, yielding a more reliable and diagnostic assessment of spatial intelligence.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.24300v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yiming Zhang, Jiacheng Chen, Jiaqi Tan, Yongsen Mao, Wenhu Chen, Angel X. Chang</dc:creator>
    </item>
    <item>
      <title>Less Is More: Engineering Challenges of On-Device Small Language Model Integration in a Mobile Application</title>
      <link>https://arxiv.org/abs/2604.24636</link>
      <description>arXiv:2604.24636v2 Announce Type: replace 
Abstract: On-device Small Language Models (SLMs) promise fully offline, private AI experiences for mobile users (no cloud dependency, no data leaving the device). But is this promise achievable in practice? This paper presents a longitudinal practitioner case study documenting the engineering challenges of integrating SLMs (Gemma 4 E2B, 2.6B parameters; Qwen3 0.6B, 600M parameters) into Palabrita, a production Android word-guessing game. Over a 5-day development sprint comprising 204 commits (~90 directly AI-related), the system underwent a radical transformation: from an ambitious design where the LLM generated complete structured puzzles (word, category, difficulty, and five hints as JSON) to a pragmatic architecture where curated word lists provide the words and the LLM generates only three short hints, with a deterministic fallback if it fails. We identify five categories of failures specific to on-device SLM integration: output format violations, constraint violations, context quality degradation, latency incompatibility, and model selection instability. For each failure category, we document the observed symptoms, root causes, and the prompt engineering and architectural strategies that effectively mitigated them, including multi-layer defensive parsing, contextual retry with failure feedback, session rotation, progressive prompt hardening, and systematic responsibility reduction. Our findings demonstrate that on-device SLMs are viable for production mobile applications, but only when the developer accepts a fundamental constraint: the most reliable on-device LLM feature is one where the LLM does the least. We distill our experience into eight actionable design heuristics for practitioners integrating SLMs into mobile apps.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.24636v2</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>William Oliveira</dc:creator>
    </item>
    <item>
      <title>Knowledge Distillation Must Account for What It Loses</title>
      <link>https://arxiv.org/abs/2604.25110</link>
      <description>arXiv:2604.25110v2 Announce Type: replace 
Abstract: This position paper argues that knowledge distillation must account for what it loses: student models should be judged not only by retained task scores, but by whether they preserve the teacher capabilities that make those scores reliable. This matters because distillation is increasingly used to turn large teacher models into deployable students, yet headline metrics can obscure losses in the capabilities that make teacher behavior reliable. Conceptually, we show that current evaluation often assumes retained task scores imply retained teacher capabilities. Reframing distillation as a lossy projection exposes this flaw: students may match selected teacher observables without preserving the capabilities that make them reliable. We then synthesize existing evidence into a taxonomy of off-metric distillation losses, showing that such losses are concrete, recurring, and measurable, yet often unaccounted for when studies report what students retain rather than what they lose. To make the position actionable, we propose scenario-specific preservation targets and a Distillation Loss Statement that reports what was preserved, what was lost, and why the remaining losses are acceptable. The goal is not lossless distillation, but accountable distillation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.25110v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wenshuo Wang</dc:creator>
    </item>
    <item>
      <title>GPT-Image-2 in the Wild: A Twitter Dataset of Self-Reported AI-Generated Images from the First Week of Deployment</title>
      <link>https://arxiv.org/abs/2604.25370</link>
      <description>arXiv:2604.25370v2 Announce Type: replace 
Abstract: The release of GPT-image-2 by OpenAI marks a watershed moment in AI-generated imagery: the boundary between photographic reality and synthetic content has never been more difficult to discern. We introduce the GPT-Image-2 Twitter Dataset, the first published dataset of GPT-image-2 generated images, sourced from publicly available Twitter/X posts in the immediate aftermath of the model's April 21, 2026 release. Leveraging the Twitter API v2 and a multi-stage curation pipeline spanning multilingual text heuristics (English, Japanese, and Chinese), browser-automated Twitter "Made with AI" badge verification, and model name variant matching, we curate 10,217 confirmed GPT-image-2 images from 27,662 collected records over a six-day window.
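  The model-name variant matching step can be sketched as a case-insensitive regular expression. The spellings this Python sketch accepts (hyphens, underscores, spaces, or no separator; "2" or "two") are illustrative assumptions, not the dataset's actual curation rules. A trailing negative lookahead is used instead of a word boundary because Python's Unicode-aware matching treats Japanese and Chinese characters as word characters, so a \b check would fail when CJK text follows the name directly.

```python
import re

# Hypothetical variant pattern: "gpt", optional separators, "image",
# optional separators, then "2" or "two". The lookahead rejects names that
# continue with a digit or ASCII letter (e.g. "gpt-image-20"), while still
# allowing CJK text to follow directly.
GPT_IMAGE_2 = re.compile(r"gpt[\s_-]*image[\s_-]*(?:2|two)(?![\da-z])", re.IGNORECASE)


def mentions_gpt_image_2(text):
    """Return True if the text names GPT-image-2 in one of the assumed spellings."""
    return GPT_IMAGE_2.search(text) is not None


for post in ["Made with GPT-Image-2!", "gpt image 2で作成",
             "GPTImage2 is wild", "gpt-image-1 vs gpt-image-20"]:
    print(post, mentions_gpt_image_2(post))
```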
  We characterize the dataset across four analyses: CLIP-based zero-shot subject taxonomy, OCR text legibility (82.0% of images contain detectable text), face detection (59.2% of images, 22,583 total faces), and semantic clustering (137 CLIP ViT-L/14 clusters).
  A key negative result is that C2PA content credentials are systematically stripped by Twitter's CDN on upload, rendering cryptographic provenance verification infeasible for social-media-sourced AI images. The dataset and all curation code are released publicly.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.25370v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Kidus Zewde, Simiao Ren, Xingyu Shen, Jiaqi Wu, Yuchen Zhou, Tommy Duong, Zikang Zhang, Ethan Traister, Kewen Xie</dc:creator>
    </item>
    <item>
      <title>Prime-Field PINI: Machine-Checked Composition Theorems for Post-Quantum NTT Masking</title>
      <link>https://arxiv.org/abs/2604.25878</link>
      <description>arXiv:2604.25878v2 Announce Type: replace 
Abstract: This is Paper 6 of a series of formally verified analyses of masked NTT hardware for post-quantum cryptography; Paper 1 [1] established a structural dependency analysis of the QANARY platform, and Paper 2 [2] quantified security margins under partial NTT masking. Boolean masking composition is well understood through NI, SNI, and PINI. Arithmetic masking over $\mathbb{Z}_q$ for prime $q$, the foundation of NTT-based post-quantum cryptography, has lacked an analogous theory. We prove, to our knowledge, the first machine-checked composition theorems for arithmetic masking over prime fields. Our key insight is the renewal argument: when a fresh random mask is applied between two pipeline stages, the intermediate wire becomes perfectly uniform regardless of Stage 1's security parameter. For two PF-PINI gadgets with parameters $k_1$ and $k_2$, the composed two-stage pipeline with fresh masking satisfies PF-PINI($k_2$): Stage 1's multiplicity is completely erased from the composed output. Without fresh masking, intermediate wires have multiplicity up to $k_1$, creating a necessary condition for differential power analysis. We formalize both theorems in Lean 4 with 18 machine-checked proofs and zero 'sorry' stubs. We formally bridge the algebraic and hardware-faithful arithmetic models of Barrett reduction, and instantiate the theorems to formally diagnose Microsoft's Adams Bridge PQC accelerator: its absence of fresh inter-stage masking leaves Barrett output wires non-uniform under the first-order probing model, the same architectural flaw that two independent empirical analyses [3, 4] and our own prior structural analysis [1] identified. Computational evidence further suggests that the 1-Bit Barrier is universal across Barrett and Montgomery reductions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.25878v2</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ray Iskander, Khaled Kirah</dc:creator>
    </item>
    <item>
      <title>LLM-Guided Issue Generation from Uncovered Code Segments</title>
      <link>https://arxiv.org/abs/2604.26118</link>
      <description>arXiv:2604.26118v2 Announce Type: replace 
Abstract: Developers are increasingly overwhelmed by AI-generated issue reports that lack actionability and reproducibility, eroding trust in automated bug detection tools. In this paper, we present IssueSpecter, an automated tool that finds bugs in uncovered code segments and automatically generates prioritized, actionable issue reports. IssueSpecter combines coverage analysis with LLM-based defect identification, producing structured reports complete with severity ratings, reproduction steps, and suggested fixes. We evaluate IssueSpecter on 13 actively maintained Python projects, generating 10,467 issue reports. Manual annotation of the 130 top-ranked issues confirms that 84.6% of the LLM-generated issues are valid or warrant further investigation, with only 15.4% false positives. LLM-based ranking outperforms rule-based ranking by 50% at P@3 and 41% in MRR. The identified bugs cover a wide variety of types, from logic and boundary errors to security vulnerabilities and state consistency bugs. By prioritizing issues, IssueSpecter aims to help developers focus on the most impactful bugs first. Finally, we validate IssueSpecter through case studies reproducing real bugs surfaced from its generated issue reports, demonstrating its practical value for automatic bug discovery in open-source Python projects. Compared against CoverUp, a state-of-the-art coverage-driven test generation tool, IssueSpecter achieves a higher bug validity rate (81.0% vs. 76.2%) under identical evaluation conditions, using the same model and the same number of evaluated artifacts per project, while additionally providing structured issue reports with reproduction steps and candidate fixes that are immediately actionable without requiring developers to interpret generated test intent.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.26118v2</guid>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Diany Pressato, Honghao Tan, Mariam Elmoazen, Shin Hwei Tan</dc:creator>
    </item>
    <item>
      <title>Co-Learning Port-Hamiltonian Systems and Optimal Energy-Shaping Control</title>
      <link>https://arxiv.org/abs/2604.26172</link>
      <description>arXiv:2604.26172v2 Announce Type: replace 
Abstract: We develop a physics-informed learning framework for energy-shaping control of port-Hamiltonian (pH) systems from trajectory data. The proposed approach co-learns a pH system model and an optimal energy-balancing passivity-based controller (EB-PBC) through alternating optimization with policy-aware data collection. At each iteration, the system model is refined using trajectory data collected under the current control policy, and the controller is re-optimized on the updated model. Both components are parameterized by neural networks that embed the pH dynamics and EB-PBC structure, ensuring interpretability in terms of energy interactions. The learned controller renders the closed-loop system inherently passive and provably stable, and exploits passive plant dynamics without canceling the natural potential. A dissipation regularization enforces strict energy decay during training, thereby enhancing robustness to sim-to-real gaps. The proposed framework is validated on state-regulation and swing-up tasks for planar and torsional pendulum systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.26172v2</guid>
      <category>eess.SY</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>math.OC</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ankur Kamboj, Biswadip Dey, Vaibhav Srivastava</dc:creator>
    </item>
    <item>
      <title>StreamGuard: Exploring a 5G Architecture for Efficient, Quality of Experience-Aware Video Conferencing</title>
      <link>https://arxiv.org/abs/2604.26223</link>
      <description>arXiv:2604.26223v3 Announce Type: replace 
Abstract: Video conferencing over 5G is increasingly prevalent, yet its Quality of Experience (QoE) often degrades under limited radio resources. This has two causes: 5G networks must serve many users, while interactive traffic requires careful handling. Motivated by the insight that different subflows within an interactive session have a disproportionate effect on QoE, we present the design and implementation of StreamGuard, a practical 5G architecture for subflow-level, QoE-aware prioritization. StreamGuard forms a closed control loop with three components: (1) a monitor in the Radio Access Network (RAN) that uses deep packet inspection to infer QoE and RAN state, (2) a controller that selects prioritization actions to balance QoE and fairness, and (3) a marking module that applies these decisions by marking packets to steer subflows into appropriate priority queues. StreamGuard further shapes application behavior through mechanisms such as selective subflow dropping and probe-based rate control, aligning it with radio constraints. Implemented in a real 5G testbed, StreamGuard achieves a superior QoE-fairness tradeoff compared to vanilla 5G and prior state-of-the-art approaches, improving QoE by up to 70% at comparable background throughput or preserving up to 2x higher background throughput at similar QoE.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.26223v3</guid>
      <category>cs.NI</category>
      <category>cs.MM</category>
      <category>eess.IV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xuyang Cao, Oliver Michel, Kyle Jamieson</dc:creator>
    </item>
    <item>
      <title>3D Generation for Embodied AI and Robotic Simulation: A Survey</title>
      <link>https://arxiv.org/abs/2604.26509</link>
      <description>arXiv:2604.26509v2 Announce Type: replace 
Abstract: Embodied AI and robotic systems increasingly depend on scalable, diverse, and physically grounded 3D content for simulation-based training and real-world deployment. While 3D generative modeling has advanced rapidly, embodied applications impose requirements far beyond visual realism: generated objects must carry kinematic structure and material properties, scenes must support interaction and task execution, and the resulting content must bridge the gap between simulation and reality. This survey reviews 3D generation for embodied AI and organizes the literature around three roles that 3D generation plays in embodied systems. In Data Generator, 3D generation produces simulation-ready objects and assets, including articulated, physically grounded, and deformable content for downstream interaction; in Simulation Environments, it constructs interactive and task-oriented worlds, spanning structure-aware, controllable, and agentic scene generation; and in Sim2Real Bridge, it supports digital twin reconstruction, data augmentation, and synthetic demonstrations for downstream robot learning and real-world transfer. We also show that the field is shifting from visual realism toward interaction readiness, and we identify the main bottlenecks, including limited physical annotations, the gap between geometric quality and physical validity, fragmented evaluation, and the persistent sim-to-real divide, that must be addressed for 3D generation to become a dependable foundation for embodied intelligence. Our project page is at https://3dgen4robot.github.io.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.26509v2</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tianwei Ye, Yifan Mao, Minwen Liao, Jian Liu, Chunchao Guo, Dazhao Du, Quanxin Shou, Fangqi Zhu, Song Guo</dc:creator>
    </item>
    <item>
      <title>Atomic-Probe Governance for Skill Updates in Compositional Robot Policies</title>
      <link>https://arxiv.org/abs/2604.26689</link>
      <description>arXiv:2604.26689v3 Announce Type: replace 
Abstract: Skill libraries in deployed robotic systems are continually updated through fine-tuning, fresh demonstrations, or domain adaptation, yet existing typed-composition methods (BLADE, SymSkill, Generative Skill Chaining) treat the library as frozen at test time and do not analyze how composition outcomes change when a skill is replaced. We introduce a paired-sampling cross-version swap protocol on robosuite manipulation tasks to characterize this dimension of compositional skill learning. On a dual-arm peg-in-hole task we discover a dominant-skill effect: one ECM achieves 86.7% atomic success rate while every other ECM is at or below 26.7%, and whether this dominant ECM enters a composition shifts the success rate by up to +50pp. We characterize the boundary on a simpler pick task where all atomic policies saturate at 100% and the effect is undefined. Across three tasks we further find that off-policy behavioral distance metrics fail to identify the dominant ECM, ruling out the natural cheap predictor. We propose an atomic-quality probe and a Hybrid Selector combining per-skill probes (zero per-decision cost) with selective composition revalidation (full cost), and characterize its Pareto frontier on 144 skill-update decisions. On T6 the atomic-only probe sits 23pp below full revalidation (64.6% vs 87.5% oracle match) at zero per-decision cost; a Hybrid Selector with m=10 closes most of that gap to ~12pp at 46% of full-revalidation cost. On the cross-task average over 144 events, atomic-only is within 3pp of full revalidation under a mixed-oracle caveat. The atomic-quality probe is, to our knowledge, the first principled, deployment-ready primitive for skill-update governance in compositional robot policies.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.26689v3</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xue Qin, Simin Luan, John See, Zeyd Boukhers, Cong Yang, Zhijun Li</dc:creator>
    </item>
    <item>
      <title>GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents</title>
      <link>https://arxiv.org/abs/2604.26752</link>
      <description>arXiv:2604.26752v2 Announce Type: replace 
Abstract: We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, videos, webpages, documents, and GUIs. GLM-5V-Turbo is built around this objective: multimodal perception is integrated as a core component of reasoning, planning, tool use, and execution, rather than as an auxiliary interface to a language model. This report summarizes the main improvements behind GLM-5V-Turbo across model design, multimodal training, reinforcement learning, toolchain expansion, and integration with agent frameworks. These developments lead to strong performance in multimodal coding, visual tool use, and framework-based agentic tasks, while preserving competitive text-only coding capability. More importantly, our development process offers practical insights for building multimodal agents, highlighting the central role of multimodal perception, hierarchical optimization, and reliable end-to-end verification.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.26752v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator> V Team, Wenyi Hong, Xiaotao Gu, Ziyang Pan, Zhen Yang, Yuting Wang, Yue Wang, Yuanchang Yue, Yu Wang, Yanling Wang, Yan Wang, Xijun Liu, Wenmeng Yu, Weihan Wang, Wei Li, Shuaiqi Duan, Sheng Yang, Ruiliang Lv, Mingdao Liu, Lihang Pan, Ke Ning, Junhui Ji, Jinjiang Wang, Jing Chen, Jiazheng Xu, Jiale Zhu, Jiale Cheng, Ji Qi, Guobing Gan, Guo Wang, Cong Yao, Zijun Dou, Zihao Zhou, Zihan Wang, Zhiqi Ge, Zhijie Li, Zhenyu Hou, Zhao Xue, Zehui Wang, Zehan Qi, Zehai He, Yutao Zhang, Yusen Liu, Yukuo Cen, Yuchen Li, Yuan Wang, Yu Yang, Yongbin Liu, Yijian Lu, Yifan Xu, Yanzi Wang, Yanxiao Zhao, Yanfeng Wang, Yadong Xue, Yabo Xu, Xinyu Zhang, Xinyu Liu, Xiao Liu, Wenyi Zhao, Wenkai Li, Tianyu Tong, Tianshu Zhang, Shudan Zhang, Shengdong Yan, Qinkai Zheng, Mingde Xu, Licheng Bao, lat Long long, Jiaxing Xu, Jiaxin Fan, Jiawen Qian, Jiali Chen, Jiahui Lin, Jiadai Sun, Haozhi Zheng, Haoran Wang, Haochen Li, Hanyu Liu, Han Xu, Fan Yang, Dan Zhang, Da Yin, Chuangxin Zhao, Chengcheng Wu, Boyan Shi, Bowen Lv, Bowei Jia, Bo Li, Bin Chen, Baoxu Wang, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Minlie Huang, Yuxiao Dong, Jie Tang</dc:creator>
    </item>
    <item>
      <title>PM-EKF: A Physiological Model-Based Extended Kalman Filter for Daily-Life Physical Activity Energy Expenditure Estimation</title>
      <link>https://arxiv.org/abs/2604.26803</link>
      <description>arXiv:2604.26803v2 Announce Type: replace 
Abstract: Monitoring physical activity energy expenditure (PAEE) in daily life is essential for characterizing individual health and metabolic status. Although indirect calorimetry provides gold-standard PAEE measurements, it is impractical for continuous daily-life monitoring. Consequently, wearable sensing approaches using inertial measurement units (IMUs) and heart rate (HR) sensors have attracted substantial interest. However, most existing IMU- and HR-based methods are purely data-driven and offer limited physiological interpretability. In this work, we propose a simplified physiological model that explicitly links body movement during activities of daily living to the underlying metabolic gas-exchange processes governing PAEE. The model is formulated as a nonlinear state-space system and embedded within an Extended Kalman Filter (EKF), enabling principled handling of measurement noise, model uncertainty, and system nonlinearities. The proposed framework provides personalized, interpretable PAEE estimates without employing black-box models. Our model was validated on a dataset of 9 subjects, with around 50 minutes of measurements per subject, collected in our lab under simulated free-living conditions. Using the respiratory data measured by COSMED K5 as the reference and explained variance (R^2) as the evaluation metric, our model's predicted PAEE yielded median (min-max) R^2 = 0.72 (0.60--0.87), using three IMUs (pelvis and two thighs) to capture body-center-of-mass motion and measured HR to capture time-varying cardiac output. Our model outperformed a linear regression (LR) model (R^2 = 0.52 (0.23--0.92)) and a CNN-LSTM model (R^2 = 0.65 (0.46--0.78)) on the same dataset. Notably, excluding the HR measurement did not significantly degrade PAEE estimation for any of the three models, indicating that IMU-captured mechanical workload dominated PAEE estimation performance in our protocol.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.26803v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shuhao Que, Remco Poelarends, Valentina Breschi, Ying Wang</dc:creator>
    </item>
    <item>
      <title>The Field of Safe Motion: Operationalizing Affordances in the Field of Safe Travel Using Reachability Analysis</title>
      <link>https://arxiv.org/abs/2604.27168</link>
      <description>arXiv:2604.27168v2 Announce Type: replace 
Abstract: We present the Field of Safe Motion (FSM), a quantitative safety model for determining whether a driver maintains a collision-free escape route, or "out," at any given moment by accounting for that driver's physical capabilities and the foreseeable actions of other road users. The Field of Safe Travel (FST) provides a framework for representing the types of sensory information and actions available to drivers. However, the FST has remained conceptual in nature since its initial publication almost 90 years ago -- and a concrete computational operationalization is still lacking. At the same time, reachability analysis provides a quantitative basis for assessing the possible actions available to road users, using interpretable kinematic models, but reachability models have so far remained confined largely to the engineering and robotics literature. Bringing these two approaches together yields an interpretable, quantitative tool for assessing driving behavior across a wide range of driving scenarios. Beyond being interpretable, our approach relies on a relatively small set of basic assumptions that are easy to enumerate and reason about. Furthermore, an interpretable reachability model paired with kinematic assumptions provides a way to bound uncertainty about road users' reasonably foreseeable future locations. We demonstrate the applicability of the FSM to different driving scenarios and discuss the strengths and weaknesses of the model.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.27168v2</guid>
      <category>cs.RO</category>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Leif Johnson, Trent Victor, Johan Engström</dc:creator>
    </item>
    <item>
      <title>Path-Lock Expert: Separating Reasoning Mode in Hybrid Thinking via Architecture-Level Separation</title>
      <link>https://arxiv.org/abs/2604.27201</link>
      <description>arXiv:2604.27201v2 Announce Type: replace 
Abstract: Hybrid-thinking language models expose explicit think and no-think modes, but current designs do not separate them cleanly. Even in no-think mode, models often emit long and self-reflective responses, causing reasoning leakage. Existing work reduces this issue through better data curation and multi-stage training, yet leakage remains because both modes are still encoded in the same feed-forward parameters. We propose Path-Lock Expert (PLE), an architecture-level solution that replaces the single MLP in each decoder layer with two semantically locked experts, one for think and one for no-think, while keeping attention, embeddings, normalization, and the language-model head shared. A deterministic control-token router selects exactly one expert path for the entire sequence, so inference preserves the dense model's per-token computation pattern and each expert receives mode-pure updates during supervised fine-tuning. Across math and science reasoning benchmarks, PLE maintains strong think performance while producing a substantially stronger no-think mode that is more accurate, more concise, and far less prone to reasoning leakage. On Qwen3-4B, for example, PLE reduces no-think reflective tokens on AIME24 from 2.54 to 0.39 and improves no-think accuracy from 20.67% to 40.00%, all while preserving think-mode performance. These results suggest that controllable hybrid thinking is fundamentally an architectural problem, and separating mode-specific feed-forward pathways is a simple and effective solution.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.27201v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shouren Wang, Wang Yang, Chuang Ma, Debargha Ganguly, Vikash Singh, Chaoda Song, Xinpeng Li, Xianxuan Long, Vipin Chaudhary, Xiaotian Han</dc:creator>
    </item>
    <item>
      <title>WaferSAGE: Large Language Model-Powered Wafer Defect Analysis via Synthetic Data Generation and Rubric-Guided Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2604.27629</link>
      <description>arXiv:2604.27629v2 Announce Type: replace 
Abstract: We present WaferSAGE, a framework for wafer defect visual question answering using small vision-language models. To address data scarcity in semiconductor manufacturing, we propose a three-stage synthesis pipeline incorporating structured rubric generation for precise evaluation. Starting from limited labeled wafer maps, we employ clustering-based cleaning to filter label noise, then generate comprehensive defect descriptions using vision-language models, which are converted into structured evaluation rubric criteria. These rubrics guide the synthesis of VQA pairs, ensuring coverage across defect type identification, spatial distribution, morphology, and root cause analysis.
  Our dual assessment framework aligns rule-based metrics with LLM-Judge scores via Bayesian optimization, enabling reliable automated evaluation. Through curriculum-based reinforcement learning with Group Sequence Policy Optimization (GSPO) and rubric-aligned rewards, our 4B-parameter Qwen3-VL model achieves a 6.493 LLM-Judge score, closely approaching Gemini-3-Flash (7.149) while enabling complete on-premise deployment. We demonstrate that small models with domain-specific training can surpass proprietary large models in specialized industrial visual understanding, offering a viable path for privacy-preserving, cost-effective deployment in semiconductor manufacturing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.27629v2</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ke Xu</dc:creator>
    </item>
    <item>
      <title>Temporal Routing in Static Networks: The Schedule Completion Problem</title>
      <link>https://arxiv.org/abs/2604.27757</link>
      <description>arXiv:2604.27757v2 Announce Type: replace 
Abstract: We introduce the TemporallyEdgeDisjointScheduleCompletion (TEDSC) problem in which we need to cover a set of temporal edge demands $D$ by routing $k$ temporal walks through a directed static graph while remaining temporally edge disjoint. This problem combines the temporal aspects of train routing and passenger demands with the static nature of real-world rail networks. We present a polynomial time algorithm for TEDSC. Motivated by real world constraints, we next investigate two restricted variants of TEDSC in which each walk can only travel for some bounded distance or time $h$. We show that both are tractable when parameterized by $k + h$, but hard for $h$ and $k + |D|$. If we fix the underlying network, the two problems exhibit distinct complexities: The distance variant remains $W[1]$-hard parameterized by $k$ even on a path of three vertices whereas the time variant admits an FPT algorithm on any fixed star. Finally, we show how to approximate the number of required walks up to a factor of $(2-h^{-1})$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.27757v2</guid>
      <category>cs.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Michelle Döring, Niklas Mohrin, George Skretas</dc:creator>
    </item>
    <item>
      <title>Probabilistic Circuits for Irregular Multivariate Time Series Forecasting</title>
      <link>https://arxiv.org/abs/2604.27814</link>
      <description>arXiv:2604.27814v2 Announce Type: replace 
Abstract: Joint probabilistic modeling is essential for forecasting irregular multivariate time series (IMTS) to accurately quantify uncertainty. Existing approaches often struggle to balance model expressivity with consistent marginalization, frequently leading to unreliable or contradictory forecasts. To address this, we propose CircuITS, a novel architecture for probabilistic IMTS forecasting based on probabilistic circuits. Our model is flexible in capturing intricate dependencies between time series channels while structurally guaranteeing valid joint distributions. Experiments on four real-world datasets demonstrate that CircuITS achieves superior joint and marginal density estimation compared to state-of-the-art baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.27814v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Christian Klötergens, Vijaya Krishna Yalavarthi, Lars Schmidt-Thieme</dc:creator>
    </item>
    <item>
      <title>A Brief Overview: Agentic Reinforcement Learning In Large Language Models</title>
      <link>https://arxiv.org/abs/2604.27859</link>
      <description>arXiv:2604.27859v2 Announce Type: replace 
Abstract: Reinforcement Learning (RL) has traditionally focused on training specialized agents to optimize predefined reward functions within narrowly defined environments. However, the advent of powerful Large Language Models (LLMs) and increasingly complex, open-ended tasks has catalyzed a shift toward agentic paradigms within RL. This emerging framework extends beyond traditional RL by emphasizing the development of autonomous agents capable of goal-setting, long-term planning, dynamic strategy adaptation, and interactive reasoning in uncertain, real-world environments. Unlike conventional approaches that rely heavily on static objectives and episodic interactions, LLM-based Agentic RL incorporates cognitive-like capabilities such as meta-reasoning, self-reflection, and multi-step decision-making directly into the learning loop. In this paper, we provide an in-depth look at the conceptual foundations, methodological innovations, and effective designs underlying this trend. Furthermore, we identify critical challenges and outline promising future directions for building LLM-based Agentic RL.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.27859v2</guid>
      <category>cs.AI</category>
      <category>cs.ET</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Fangming Cui, Ruixiao Zhu, Cheng Fang, Sunan Li, Jiahong Li</dc:creator>
    </item>
    <item>
      <title>In-Context Prompting Obsoletes Agent Orchestration for Procedural Tasks</title>
      <link>https://arxiv.org/abs/2604.27891</link>
      <description>arXiv:2604.27891v2 Announce Type: replace 
Abstract: Agent orchestration frameworks -- LangGraph, CrewAI, Google ADK, OpenAI Agents SDK, and others -- place an external orchestrator above the LLM, tracking state and injecting routing instructions at every turn. We present a controlled comparison showing that for procedural tasks, this architecture is dominated by a simpler alternative: putting the entire procedure in the system prompt and letting the model self-orchestrate. Across three domains -- travel booking (14 nodes), Zoom technical support (14 nodes), and insurance claims processing (55 nodes) -- we evaluate 200 conversations per condition using LLM-as-judge scoring on five quality criteria. The in-context approach scores 4.53--5.00 on a 5-point scale while a LangGraph orchestrator using the same model scores 4.17--4.84. The orchestrated system fails on 24% of travel, 9% of Zoom, and 17% of insurance conversations, compared to 11.5%, 0.5%, and 5% for the in-context baseline. While external orchestration may have been necessary for earlier models, advances in frontier model capabilities have made it unnecessary for multi-turn conversations following a defined procedure.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.27891v2</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Simon Dennis, Michael Diamond, Rivaan Patil, Kevin Shabahang, Hao Guo</dc:creator>
    </item>
    <item>
      <title>The Effects of Visual Priming on Cooperative Behavior in Vision-Language Models</title>
      <link>https://arxiv.org/abs/2604.27953</link>
      <description>arXiv:2604.27953v2 Announce Type: replace 
Abstract: As Vision-Language Models (VLMs) become increasingly integrated into decision-making systems, it is essential to understand how visual inputs influence their behavior. This paper investigates the effects of visual priming on VLMs' cooperative behavior using the Iterated Prisoner's Dilemma (IPD) as a test scenario. We examine whether exposure to images depicting behavioral concepts (kindness/helpfulness vs. aggressiveness/selfishness) and color-coded reward matrices alters VLM decision patterns. Experiments were conducted across multiple state-of-the-art VLMs. We further explore mitigation strategies including prompt modifications, Chain of Thought (CoT) reasoning, and visual token reduction. Results show that VLM behavior can be influenced by both image content and color cues, with varying susceptibility and mitigation effectiveness across models. These findings not only underscore the importance of robust evaluation frameworks for VLM deployment in visually rich and safety-critical environments, but also highlight how architectural and training differences among models may lead to distinct behavioral responses, an area worthy of further investigation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.27953v2</guid>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kenneth J. K. Ong</dc:creator>
    </item>
    <item>
      <title>LaST-R1: Reinforcing Robotic Manipulation via Adaptive Physical Latent Reasoning</title>
      <link>https://arxiv.org/abs/2604.28192</link>
      <description>arXiv:2604.28192v2 Announce Type: replace 
Abstract: Robotic foundation models require reasoning over complex visual scenes to execute adaptive actions in dynamic environments. While recent studies on latent-reasoning Vision-Language-Action (VLA) models have demonstrated the capability to capture fine-grained physical dynamics, they remain predominantly confined to static imitation learning, severely limiting their adaptability and generalization. In this paper, we present LaST-R1, a novel reinforcement learning (RL) post-training framework designed to effectively harness "latent reasoning-before-acting" policies. Specifically, we propose Latent-to-Action Policy Optimization (LAPO), a core RL algorithm that jointly optimizes the latent reasoning process and the action generation. By explicitly embedding latent Chain-of-Thought (CoT) reasoning directly within the RL optimization loop, LAPO stimulates profound physical world modeling, which in turn drives robust execution in interactive environments. Furthermore, an adaptive latent CoT mechanism is introduced, allowing the policy to dynamically modulate its reasoning horizon based on diverse environment states. Experiments show that LaST-R1 achieves a near-perfect 99.9% average success rate on the LIBERO benchmark with only one-shot supervised warm-up, significantly improving convergence speed and performance over prior state-of-the-art (SOTA) methods. In real-world deployments, LaST-R1 yields up to a 22.5% average improvement over the SOTA supervised fine-tuning approach across four complex tasks, including both single-arm and dual-arm settings. Finally, LaST-R1 demonstrates strong generalization across simulated and real-world environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.28192v2</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hao Chen, Jiaming Liu, Zhonghao Yan, Nuowei Han, Renrui Zhang, Chenyang Gu, Jialin Gao, Ziyu Guo, Siyuan Qian, Yinxi Wang, Peng Jia, Shanghang Zhang, Pheng-Ann Heng</dc:creator>
    </item>
    <item>
      <title>Unlearning What Matters: Token-Level Attribution for Precise Language Model Unlearning</title>
      <link>https://arxiv.org/abs/2605.00364</link>
      <description>arXiv:2605.00364v2 Announce Type: replace 
Abstract: Machine unlearning has emerged as a critical capability for addressing privacy, safety, and regulatory concerns in large language models (LLMs). Existing methods operate at the sequence level, applying uniform updates across all tokens even though only a subset encodes the knowledge targeted for removal. This introduces gradient noise, degrades utility, and leads to suboptimal forgetting. We propose TokenUnlearn, a token-level attribution framework that identifies and selectively targets critical tokens. Our approach combines knowledge-aware signals, obtained via masking, with entropy-aware signals to yield importance scores for precise token selection. We develop two complementary strategies: hard selection, applying unlearning only to high-importance tokens, and soft weighting, modulating gradient contributions based on importance scores. Both extend existing methods to token-level variants. Theoretical analysis shows token-level selection improves gradient signal-to-noise ratio. Experiments on TOFU and WMDP benchmarks across three model architectures demonstrate consistent improvements over sequence-level baselines in both forgetting effectiveness and utility preservation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.00364v2</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiawei Wu, Doudou Zhou</dc:creator>
    </item>
    <item>
      <title>The Power of Order: Fooling LLMs with Adversarial Table Permutations</title>
      <link>https://arxiv.org/abs/2605.00445</link>
      <description>arXiv:2605.00445v2 Announce Type: replace 
Abstract: Large Language Models have achieved remarkable success and are increasingly deployed in critical applications involving tabular data, such as Table Question Answering. However, their robustness to the structure of this input remains a critical, unaddressed question. This paper demonstrates that modern LLMs exhibit a significant vulnerability to the layout of tabular data. Specifically, we show that semantically invariant permutations of rows and columns - rearrangements that do not alter the table's underlying information - are sometimes sufficient to cause incorrect or inconsistent model outputs. To systematically probe this vulnerability, we introduce Adversarial Table Permutation (ATP), a novel, gradient-based attack that efficiently identifies worst-case permutations designed to maximally disrupt model performance. Our extensive experiments demonstrate that ATP significantly degrades the performance of a wide range of LLMs. This reveals a pervasive vulnerability across different model sizes and architectures, including the most recent and popular models. Our findings expose a fundamental weakness in how current LLMs process structured data, underscoring the urgent need to develop permutation-robust models for reliable, real-world applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.00445v2</guid>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xinshuai Dong, Haifeng Chen, Xuyuan Liu, Shengyu Chen, Haoyu Wang, Shaoan Xie, Kun Zhang, Zhengzhang Chen</dc:creator>
    </item>
    <item>
      <title>A Policy-Driven DRL Framework for System-Level Tradeoff Control in NR-U/Wi-Fi Coexistence</title>
      <link>https://arxiv.org/abs/2605.00457</link>
      <description>arXiv:2605.00457v2 Announce Type: replace 
Abstract: The coexistence of NR-U and Wi-Fi in unlicensed spectrum introduces a system-level resource coordination problem, where heterogeneous channel access mechanisms lead to a significant imbalance in spectrum utilization and degraded Wi-Fi performance. To address this challenge, we propose a policy-driven deep reinforcement learning (DRL) framework for adaptive TXOP control, in which the coexistence process is formulated as a Markov decision process (MDP) and a deep Q-network (DQN) learns control policies through online interaction. A key contribution is the introduction of a policy layer via reward design, enabling explicit control of system-level tradeoffs among fairness, throughput, and quality of service (QoS). Three policies, namely absolute fairness, moderate fairness, and utility-based fairness, are developed to achieve different operating points. Simulation results show that the proposed framework achieves a Jain fairness index above 0.9 under strict fairness control. Compared to absolute fairness, moderate fairness improves aggregate throughput by 68.22%, while the utility-based policy further enhances utility by 177.6%. These results demonstrate that policy-driven control provides a flexible and effective solution for managing tradeoffs in heterogeneous coexistence networks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.00457v2</guid>
      <category>cs.NI</category>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Po-Heng Chou, Yi-Fang Yu, Shou-Yu Chen, Chiapin Wang</dc:creator>
    </item>
    <item>
      <title>Near-optimal and Efficient First-Order Algorithm for Multi-Task Learning with Shared Linear Representation</title>
      <link>https://arxiv.org/abs/2605.00473</link>
      <description>arXiv:2605.00473v2 Announce Type: replace 
Abstract: Multi-task learning (MTL) has emerged as a pivotal paradigm in machine learning by leveraging shared structures across multiple related tasks. Despite its empirical success, efficiently solvable likelihood-based algorithms, even for shared linear representations, remain largely underdeveloped, primarily due to the non-convex structure intrinsic to matrix factorization. This paper introduces a first-order algorithm that jointly learns a shared representation and task-specific parameters, with guaranteed efficiency. Notably, it converges in $\widetilde{\mathcal{O}}(1)$ iterations and attains a near-optimal estimation error of $\widetilde{\mathcal{O}}(dk/(TN))$, improving over existing likelihood-based methods by a factor of $k$, where $d$, $k$, $T$, $N$ denote input dimension, representation dimension, task count, and samples per task, respectively. Our results show that likelihood-based first-order methods can efficiently solve the MTL problem.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.00473v2</guid>
      <category>cs.LG</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shihong Ding, Fangyu Du, Cong Fang</dc:creator>
    </item>
    <item>
      <title>KingsGuard: Enclave Data Protection Under Real-World TEE Vulnerabilities</title>
      <link>https://arxiv.org/abs/2605.00613</link>
      <description>arXiv:2605.00613v2 Announce Type: replace 
Abstract: Trusted Execution Environments (TEEs) have emerged as a cornerstone for securing sensitive computations by providing isolated enclaves protected from untrusted software. However, their security guarantees are undermined by vulnerabilities in both the enclave code and the underlying hardware design, which can allow sensitive data to leak despite strong isolation guarantees. This paper presents KINGSGUARD, a novel TEE design that systematically monitors and controls the propagation of sensitive data within an enclave. By enforcing fine-grained data flow tracking and checks in hardware, our approach ensures that sensitive data does not leave the enclave boundary, thus bridging the gap between the idealized threat models of TEEs and their practical realizations. Additionally, to balance security with practical functionality, we introduce controlled declassification at enclave boundaries, allowing intentional release of data to the outside world. Our implementation of KINGSGUARD on a RISC-V processor has a 10.8% hardware area overhead when synthesized on FPGA and a 5.69% performance overhead.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.00613v2</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Saltanat Firdous Allaqband, Deepanjali S, Rohit Srinivas R G, Devashish Gosain, Chester Rebeiro</dc:creator>
    </item>
    <item>
      <title>STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack</title>
      <link>https://arxiv.org/abs/2605.00699</link>
      <description>arXiv:2605.00699v2 Announce Type: replace 
Abstract: Red-teaming Vision-Language Models is essential for identifying vulnerabilities where adversarial image-text inputs trigger toxic outputs. Existing approaches treat image generation as a black box, returning only terminal toxicity scores and leaving open the question of when and how toxic semantics emerge during multi-step synthesis. We introduce STARE, a hierarchical reinforcement learning framework that treats the denoising trajectory itself as the attack surface, under a direct white-box T2I and query-only black-box VLM setting. By coupling a high-level prompt editor with low-level T2I fine-tuning via Group Relative Policy Optimization (GRPO), STARE attains a 68% improvement in Attack Success Rate over state-of-the-art black-box and white-box baselines. More importantly, this trajectory-level view surfaces the Optimization-Induced Phase Alignment phenomenon: vanilla models exhibit diffuse toxicity, whereas adversarial optimization concentrates conceptual harms into early semantic phases and detail-oriented harms into late refinement. Targeted perturbations of either window selectively suppress different toxicity categories, indicating that this temporal structure is a genuine causal handle rather than a side effect of the hierarchical design. The phenomenon turns toxicity formation from a chaotic process into a small set of predictable vulnerability windows, providing both a potent attack engine and a basis for phase-aware safety mechanisms. Content warning: This paper contains examples of toxic content that may be offensive or disturbing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.00699v2</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xutao Mao, Liangjie Zhao, Tao Liu, Xiang Zheng, Hongying Zan, Cong Wang</dc:creator>
    </item>
    <item>
      <title>OceanPile: A Large-Scale Multimodal Ocean Corpus for Foundation Models</title>
      <link>https://arxiv.org/abs/2605.00877</link>
      <description>arXiv:2605.00877v2 Announce Type: replace 
Abstract: The vast and underexplored ocean plays a critical role in regulating global climate and supporting marine biodiversity, yet artificial intelligence has so far delivered limited impact in this domain due to a fundamental data bottleneck. Specifically, ocean data are highly fragmented across disparate sources and inherently exhibit multi-modal, high-noise, and weakly labeled characteristics, lacking unified schemas and semantic alignment. Although Multimodal Large Language Models (MLLMs) have achieved remarkable success in general domains, their application to ocean science remains severely constrained by the absence of large-scale, well-aligned multimodal datasets tailored to marine environments. To bridge this gap, we introduce OceanPile, a large-scale multimodal corpus designed for ocean foundation models. It comprises three key components: OceanCorpus, a unified collection integrating sonar data, underwater imagery, marine science visuals, and scientific text from diverse authoritative sources; OceanInstruction, a high-quality instruction dataset synthesized via a novel pipeline guided by a hierarchical Ocean Concept Knowledge Graph; and OceanBenchmark, a manually curated evaluation benchmark for rigorous assessment. We establish a multi-stage quality control process to ensure scientific validity and alignment across modalities. Experimental validation demonstrates significant performance improvements for models trained on our data. All datasets are publicly released to advance the field of marine artificial intelligence and empower domain-specific MLLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.00877v2</guid>
      <category>cs.MM</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yida Xue, Ningyu Zhang, Tingwei Wu, Zhe Ma, Daxiong Ji, Zhao Wang, Guozhou Zheng, Huajun Chen</dc:creator>
    </item>
    <item>
      <title>Information Accessibility Limits in Structured NP Search</title>
      <link>https://arxiv.org/abs/2605.00953</link>
      <description>arXiv:2605.00953v3 Announce Type: replace 
Abstract: We study the problem of locating violating principal minors in structured matrix families that lie near the boundary of P-matrices and admit sparse violations under perturbation. Viewing violation search as an information acquisition problem, we show that, despite strong underlying structure, the location of a violation may be globally encoded and not accessible through local queries under a restricted interaction model.
  This leads to an information-theoretic bottleneck: each query reveals only vanishing information about the violating subset, so that polynomially many queries accumulate insufficient information to identify it. Using mutual information and Fano's inequality, we show that any algorithm restricted to polynomially many queries cannot recover the violating subset with constant success probability.
  Our analysis highlights a distinction between structure and accessibility: even highly structured problems can be computationally intractable when the information required to locate a solution is not accessible through the available queries.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.00953v3</guid>
      <category>cs.IT</category>
      <category>cs.CC</category>
      <category>math.IT</category>
      <category>math.OC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jing-Yuan Wei</dc:creator>
    </item>
    <item>
      <title>Sampled-data Robust Control of Electrically Stimulated Engineered Cell Factories</title>
      <link>https://arxiv.org/abs/2605.01090</link>
      <description>arXiv:2605.01090v2 Announce Type: replace 
Abstract: Closed-loop bioelectronic regulation of engineered secretory cell systems is challenging because electric-field (EF) stimulation acts indirectly through transcription-factor activation, in the presence of delayed, nonlinear, and noisy intracellular dynamics, sparse measurements, and constrained burst-based actuation. We develop a framework for robust closed-loop endocrine regulation in electrically stimulated engineered cell factories, illustrated through extracellular thyroid hormone \(T_4\) production in engineered thyroid-like cells. The plant is modeled by a control-oriented ODE formulation combining a reduced mechanistic \(T_4\) pathway, an EF-responsive Hill module, and a linear-chain Erlang cascade representing distributed intracellular delay. On this basis, we design a sampled-data adaptive proportional-integral-derivative (APID) controller with derivative filtering, anti-windup, saturation and rate limits, and hysteretic band-locking, together with a robust adaptive extension (RAPID) that accounts for parameter mismatch, sensor noise and bias, actuator mismatch, delay/jitter, and exogenous rhythmic disturbance through a scenario-based risk-aware update. We provide local sampled-data input-to-state stability interpretations for both APID and RAPID, showing that, under standard local Lyapunov and bounded-disturbance conditions, the sampled tracking error is ultimately bounded by a disturbance-dependent constant. In silico experiments demonstrate sustained regulation of extracellular \(T_4\) across prescribed targets despite significant uncertainty.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01090v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Papri Dey, Ksenia Zlobina, Nicholas A. Rondoni, Marcella M. Gomez</dc:creator>
    </item>
    <item>
      <title>Valley3: Scaling Omni Foundation Models for E-commerce</title>
      <link>https://arxiv.org/abs/2605.01278</link>
      <description>arXiv:2605.01278v2 Announce Type: replace 
Abstract: In this work, we present Valley3, an omni multimodal large language model (MLLM) developed for diverse global e-commerce tasks, with unified understanding and reasoning capabilities across text, images, video, and audio. A key feature of Valley3 is its native multilingual audio capability for e-commerce, developed by extending vision-language models to better support crucial audio-visual tasks, particularly in short-video scenarios. To achieve this, we carefully design a four-stage omni e-commerce continued pre-training pipeline, through which Valley3 progressively acquires audio understanding, cross-modal instruction-following, e-commerce domain knowledge, and long-context reasoning capabilities, ultimately evolving into an omni model for diverse e-commerce scenarios. Then, we further improve Valley3 through post-training to encourage long-chain reasoning with controllable reasoning modes, enabling one non-thinking mode and three distinct levels of thinking, thereby balancing inference efficiency in simple scenarios with deep reasoning for complex applications. Moreover, we equip Valley3 with agentic search capabilities to proactively invoke search tools and acquire task-relevant information for e-commerce deep research tasks. To comprehensively assess the capabilities of Valley3, we construct an omni e-commerce benchmark spanning 6 tasks. Experimental results show that Valley3 consistently outperforms strong baselines on our in-house and open-source e-commerce benchmarks, while remaining competitive on general-domain benchmarks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01278v2</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zeyu Chen, Guanghao Zhou, Qixiang Yin, Ziwang Zhao, Huanjin Yao, Pengjiu Xia, Min Yang, Cen Chen, Minghui Qiu</dc:creator>
    </item>
    <item>
      <title>The Perceptual Bandwidth Bottleneck in Vision-Language Models: Active Visual Reasoning via Sequential Experimental Design</title>
      <link>https://arxiv.org/abs/2605.01345</link>
      <description>arXiv:2605.01345v2 Announce Type: replace 
Abstract: Visual perception in modern Vision-Language Models (VLMs) is constrained by a perceptual bandwidth bottleneck: a broad field of view preserves global context but sacrifices the fine-grained details required for complex reasoning. We argue that high-resolution visual reasoning is therefore not only semantic reasoning but also task-relevant evidence acquisition under limited perceptual bandwidth. Inspired by active vision and information foraging, we formalise this process as sequential Bayesian optimal experimental design (S-BOED), where an agent decides which visual evidence to acquire before answering. Since exact Bayesian inference is intractable in continuous gigapixel spaces, we derive a tractable coverage--resolution objective as a proxy for task-relevant information gain. We instantiate this framework with FOVEA, a training-free procedure that refines VLM crop proposals through evidence-oriented probing. Experiments on high-resolution benchmarks show consistent gains over direct and ReAct-style baselines, with particularly strong improvements in search-dominated remote-sensing settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01345v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anjie Liu, Ziqin Gong, Yan Song, Yuxiang Chen, Xiaolong Liu, Hengtong Lu, Kaike Zhang, Chen Wei, Jun Wang</dc:creator>
    </item>
    <item>
      <title>Neuro-Symbolic Agents for Hallucination-Free Requirements Reuse</title>
      <link>https://arxiv.org/abs/2605.01562</link>
      <description>arXiv:2605.01562v2 Announce Type: replace 
Abstract: The Object-Oriented Method for Requirements Authoring and Management (OOMRAM) is a requirements reuse framework that relies on exact identifier matching and rigid templates, limiting its ability to adapt specifications across diverse contexts. While Large Language Models (LLMs) offer the flexibility to overcome this bottleneck, they introduce the risk of generating structurally invalid or inconsistent requirement combinations. To address this tension, we present a neuro-symbolic multi-agent system that re-conceptualizes requirements reuse as a Model-Driven Elicitation process. In this paradigm, an LLM serves as a non-deterministic heuristic for traversing a deterministic domain model represented by a formal OOMRAM requirement lattice. A deterministic, symbolic validator enforces all structural constraints within the agent loop, effectively eliminating hallucinated requirement combinations by construction. Evaluated on an autonomous benchmark across two application families, our system achieves 100% requirement coverage and a constraint-violation rate of only 0.2%. Although the F1-score against a single gold standard is moderate (0.47-0.51), every generated specification is structurally valid and satisfies all mandatory domain requirements. The model-agnostic implementation scales to larger lattices via subgraph navigation and provides transparent audit trails for regulatory compliance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01562v2</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ahmed F. Ibrahim</dc:creator>
    </item>
    <item>
      <title>AI Alignment via Incentives and Correction</title>
      <link>https://arxiv.org/abs/2605.01643</link>
      <description>arXiv:2605.01643v2 Announce Type: replace 
Abstract: We study AI alignment through the lens of law-and-economics models of deterrence and enforcement. In these models, misconduct is not treated as an external failure, but as a strategic response to incentives: an actor weighs the gain from violation against the probability of detection and the severity of punishment. We argue that the same logic arises naturally in agentic AI pipelines. A solver may benefit from producing a persuasive but incorrect answer, hiding uncertainty, or exploiting spurious shortcuts, while an auditor or verifier must decide whether costly monitoring is worthwhile. Alignment is therefore a fixed-point problem: stronger penalties may deter solver misbehavior, but they can also reduce the auditor's incentive to inspect, since auditing then mainly incurs cost on a population that appears increasingly aligned.
  This perspective also changes what should count as a post-training signal. Standard feedback often attaches reward to the final answer alone, but a solver-auditor pipeline exposes the full correction event: whether the solver erred, whether the auditor inspected, whether the error was caught, and whether oversight incentives remained active. We formalize this interaction in a two-agent model in which a principal chooses rewards over joint correction outcomes, inducing both solver behavior and auditor monitoring. Reward design is therefore a bilevel optimization problem: rewards are judged not by their immediate semantic meaning, but by the behavioral equilibrium they induce. We propose a bandit-based outer-loop procedure for searching over reward profiles using noisy interaction feedback. Experiments on an LLM coding pipeline show that adaptive reward profiles can maintain useful oversight pressure and improve principal-aligned outcomes relative to static hand-designed rewards, including a substantial reduction in hallucinated incorrect attempts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01643v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Rohit Agarwal, Joshua Lin, Mark Braverman, Elad Hazan</dc:creator>
    </item>
    <item>
      <title>Maxwell à la Helmholtz: Direct boundary integral equations for 3D scattering by perfect electric conductors via Helmholtz operators</title>
      <link>https://arxiv.org/abs/2605.01670</link>
      <description>arXiv:2605.01670v2 Announce Type: replace 
Abstract: This paper is the direct-formulation companion to [Burbano-Gallegos, Pérez-Arancibia, and Turc, ESAIM: M2AN, 60(1):273-315, 2026], which developed indirect combined-field-only boundary integral equations (BIEs) for time-harmonic electromagnetic scattering by smooth perfectly electrically conducting (PEC) obstacles, relying entirely on Helmholtz boundary integral operators. Here we exploit the same equivalence between the Maxwell PEC scattering problem and a pair of vector Helmholtz boundary value problems (one for the electric field and one for the magnetic field) to derive direct BIE formulations whose unknowns are the Dirichlet and Neumann traces of the total fields, decomposed into their normal and tangential surface components. These unknowns carry direct physical meaning: in particular, the magnetic-field formulation yields the surface electric currents as part of its solution. The mixed regularity of the two field-trace components requires introducing a tailored product Hölder space, a distinctive feature absent from the indirect approach. We prove that the resulting Direct Electric and Magnetic Combined-Field-Only Integral Equations (D-ECFOIE and D-MCFOIE) are uniquely solvable at all frequencies, and introduce Calderón-type regularizations (RD-ECFOIE and RD-MCFOIE) that render them Fredholm equations of the second kind. We further examine the low-frequency breakdown affecting the electric-field formulation and introduce a modified equation that enforces the physical charge-conservation constraints, which restores numerical accuracy and well-conditioned linear systems for frequencies arbitrarily close to zero. Numerical experiments, performed using a high-order Nyström solver based on the Density Interpolation Method and implemented in the Julia package Inti.jl, validate the accuracy and robustness of the proposed formulations across a range of geometries and frequencies.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01670v2</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <category>physics.comp-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Carlos Pérez-Arancibia, Catalin Turc</dc:creator>
    </item>
    <item>
      <title>Probe-Geometry Alignment: Erasing the Cross-Sequence Memorization Signature Below Chance</title>
      <link>https://arxiv.org/abs/2605.01699</link>
      <description>arXiv:2605.01699v2 Announce Type: replace 
Abstract: Recent attacks show that behavioural unlearning of large language models leaves internal traces recoverable by adversarial probes. We characterise where this retention lives and show it can be surgically removed without measurable capability cost. Our central protocol is a leave-one-out cross-sequence probe that tests whether a memorisation signature generalises across held-out sequences. The signature is real and consistent across scale: memorisation-specific gaps of +0.32, +0.19, and +0.30 on Pythia-70M, GPT-2 medium, and Mistral-7B, respectively; on Pythia-70M, the random-initialisation control collapses to -0.04 at the deepest layer, where the pretrained signature peaks. The probe direction is causally separable from recall: projecting it out collapses the signature locally (+0.44 -&gt; -0.19) while behavioural recall barely changes. Moreover, a probe trained on naturally memorised content does not classify fine-tuning-injected secrets, marking two representationally distinct regimes. We then introduce probe-geometry alignment (PGA), a surgical erasure that aligns activations along the probe's live readout direction at each depth. PGA drives the cross-sequence probe below random chance at all four scales tested (toy depth-4: 0.17; Pythia-70M: 0.07; Mistral-7B: 0.45; GPT-2 medium: 0.06 via MD-PGA k=2) and remains robust to six adversarial probe variants. Against a re-fitting attacker who trains a fresh probe on PGA-treated activations, we extend PGA adversarially, defeating the re-fit probe at every memorisation-relevant depth while preserving five zero-shot capability benchmarks within 2.8 percentage points per task (mean Δacc = +0.2pp). The cross-sequence signature is a real, causally separable, regime-specific property of pretrained representations, removable below chance with a single rank-one intervention per depth at no measurable capability cost.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01699v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <category>cs.NE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anamika Paul Rupa, Anietie Andy</dc:creator>
    </item>
    <item>
      <title>Linear-Time Global Visual Modeling without Explicit Attention</title>
      <link>https://arxiv.org/abs/2605.01711</link>
      <description>arXiv:2605.01711v2 Announce Type: replace 
Abstract: Existing research largely attributes the global sequence modeling capability of Transformers to the explicit computation of attention weights, a process that inherently incurs quadratic computational complexity. In this work, we offer a novel perspective: we demonstrate that attention can be mathematically reframed as a Multi-Layer Perceptron (MLP) equipped with dynamically predicted parameters. Through this lens, we explain attention's global modeling power not as explicit token-wise aggregation, but as an implicit process where dynamically generated parameters act as a compressed representation of the global context. Inspired by this insight, we investigate a fundamental question: can we achieve Transformer-level global sequence modeling entirely through dynamic parameterization while maintaining linear complexity, effectively replacing explicit attention? To explore this, we design various dynamic parameter prediction strategies and integrate them into standard network layers. Extensive empirical studies on vision models demonstrate that dynamic parameterization can indeed serve as a highly effective, linear-complexity alternative to explicit attention, opening new pathways for efficient sequence modeling. Code is available at https://github.com/LeapLabTHU/WeightFormer.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01711v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ruize He, Dongchen Han, Gao Huang</dc:creator>
    </item>
    <item>
      <title>SignVerse-2M: A Two-Million-Clip Pose-Native Universe of 55+ Sign Languages</title>
      <link>https://arxiv.org/abs/2605.01720</link>
      <description>arXiv:2605.01720v2 Announce Type: replace 
Abstract: Existing large-scale sign language resources typically provide supervision only at the level of raw video-text alignment and are often produced in laboratory settings. While such resources are important for semantic understanding, they do not directly provide a unified interface for open-world recognition and translation, or for modern pose-driven sign language video generation frameworks: 1. RGB-based pretrained recognition models depend heavily on fixed backgrounds or clothing conditions during recording, and are less robust in open-world settings than style-agnostic pose-processing models. 2. Recent pose-guided image/video generation models mostly use a unified keypoint representation such as DWPose as their control interface. At present, the sign language field still lacks a data resource that can directly interface with this modern pose-native paradigm while also targeting real-world open scenarios. We present SignVerse-2M, a large-scale multilingual pose-native dataset for sign language pose modeling and evaluation. Built from publicly available multilingual sign language video resources, it applies DWPose in a unified preprocessing pipeline to convert raw videos into 2D pose sequences that can be used directly for modeling, resulting in a consolidated corpus of about two million clips covering more than 55 sign languages. Unlike many laboratory datasets, this resource preserves the recording conditions and speaker diversity of real-world videos while reducing appearance variation through a unified pose representation. Toward this goal, we further provide the data construction pipeline, task definitions, and a simple SignDW Transformer baseline, demonstrating the feasibility of this resource for multilingual pose-space modeling and its compatibility with modern pose-driven pipelines, while discussing the evaluation claims it can support as well as its current limitations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01720v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sen Fang, Hongbin Zhong, Yanxin Zhang, Dimitris N. Metaxas</dc:creator>
    </item>
    <item>
      <title>Efficient Decision Procedures for RNmatrix Semantics</title>
      <link>https://arxiv.org/abs/2605.01845</link>
      <description>arXiv:2605.01845v2 Announce Type: replace 
Abstract: Restricted non-deterministic matrices (RNmatrices) impose constraints on the rows of non-deterministic matrices (Nmatrices), filtering out "unsound" rows and retaining only "valid" ones. This yields a more expressive framework than standard Nmatrices. Although this approach enables sound and complete semantics for a broad class of logics, e.g., paraconsistent logics, propositional intuitionistic logic, and the fifteen normal modal logics of the modal cube, no efficient decision procedures based on these semantics have been proposed. In this paper, we implement the RNmatrix framework to develop a new suite of automated theorem provers for these logics. By encoding RNmatrices and their elimination criteria as Satisfiability Modulo Theories (SMT) problems, we leverage SMT solvers to decide formula validity and construct countermodels. We illustrate the method for paraconsistent logics, where our prover outperforms the current state-of-the-art and provides the first implementation for the entire $C_n$ hierarchy, as well as for intuitionistic and modal logics, where our general-purpose prover achieves competitive performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01845v2</guid>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Renato R. Leme, Carlos Olarte, Elaine Pimentel</dc:creator>
    </item>
    <item>
      <title>NeuroState-Bench: A Human-Calibrated Benchmark for Commitment Integrity in LLM Agent Profiles</title>
      <link>https://arxiv.org/abs/2605.01847</link>
      <description>arXiv:2605.01847v2 Announce Type: replace 
Abstract: Outcome-only evaluation under-specifies whether an evaluated agent profile preserves the commitments required to solve a multi-turn task coherently. NeuroState-Bench is a human-calibrated benchmark that operationalizes commitment integrity through benchmark-defined side-query probes rather than inferred hidden activations. The released inventory contains 144 deterministic tasks and 306 benchmark-defined side-query probes spanning eight cognitively motivated failure families, paired clean and distractor variants, and three difficulty bands. The main 32-profile evaluation comprises a fixed 16-profile local subset and a matched 16-profile hosted large-model subset, both evaluated through the same benchmark pipeline. Human calibration uses the final merged reporting scope: 104 sampled task units, 216 raw annotations, and 108 adjudicated task rows, with weighted kappa = 0.977 and ICC(2,1) = 0.977. Empirically, task success and commitment integrity diverge across this expanded grid: the success leader is not the integrity leader, 31 of 32 profiles change rank when integrity replaces task success, and integrity rankings are more stable under distractor perturbation. The primary confidence-free score, HCCIS-CORE, reaches 0.8469 AUC and 0.6992 PR-AUC for post-probe diagnostic discrimination of terminal task failure; the legacy full heuristic variant, HCCIS-FULL, reaches 0.7997 AUC and 0.6410 PR-AUC. Probe accuracy and state drift achieve a slightly higher ROC-AUC (0.8587) and better Brier score and ECE, whereas HCCIS-CORE has a substantially higher point-estimate PR-AUC and remains more closely tied to the benchmark's intended construct. The exploratory neural-augmented variant HCCIS+N is weaker overall, and a randomized subspace control approaches chance. NeuroState-Bench therefore contributes a calibrated evaluation axis for exposing commitment failures over a broader model grid than the original local-only subset.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01847v2</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jia Xiao</dc:creator>
    </item>
    <item>
      <title>Stability of Control Lyapunov Function Guided Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2605.01978</link>
      <description>arXiv:2605.01978v2 Announce Type: replace 
Abstract: Reinforcement learning (RL) has become the de facto method for achieving locomotion on humanoid robots in practice, yet stability analysis of the resulting control policies is lacking. Recent work has attempted to merge control-theoretic ideas with reinforcement learning through control-guided learning. A notable example is the use of a control Lyapunov function (CLF) to synthesize the reinforcement learning rewards, a technique known as CLF-RL, which has shown practical success. This paper investigates the stability properties of optimal controllers obtained with CLF-RL, with the goal of bridging the gap between experimentally observed stability and theoretical guarantees. The RL problem is cast as an optimal control problem, and exponential stability is proven in both continuous and discrete time, using both the core CLF reward terms and the additional terms used in practice. The theoretical bounds are verified numerically on systems such as the double integrator and the cart-pole. Finally, the CLF-guided rewards are implemented on a walking humanoid robot to generate stable periodic orbits.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01978v2</guid>
      <category>eess.SY</category>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zachary Olkin, William D. Compton, Aaron D. Ames</dc:creator>
    </item>
    <item>
      <title>RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy</title>
      <link>https://arxiv.org/abs/2605.02003</link>
      <description>arXiv:2605.02003v2 Announce Type: replace 
Abstract: Machine Learning (ML) has transformed many scientific fields, yet key applications still lack standardized benchmarks. Raman spectroscopy, a widely used technique for non-invasive molecular analysis, is one such field where progress is limited by fragmented datasets, inconsistent evaluation, and models that fail to capture the structure of spectral data. We introduce RamanBench, the first large-scale, fully reproducible benchmark for ML on Raman spectroscopy, comprising streamlined data access, evaluation protocols and code, and a live leaderboard. It unifies 74 datasets (including 16 first released with this benchmark) across four domains, comprising 325,668 spectra and spanning classification and regression tasks under diverse experimental conditions. We benchmark 28 models under a standardized protocol, including classical methods (e.g., PLS), Raman-specific models (e.g., RamanNet), tabular foundation models (TFMs, e.g., TabPFN), and time-series approaches (e.g., ROCKET). TFMs consistently outperform domain-specific and gradient-boosting baselines, while time-series models remain competitive. However, no single method generalizes across all datasets, revealing a fundamental gap. We therefore invite the community to contribute new approaches to our living benchmark, with the potential to accelerate advances in critical applications such as medical diagnostics, biological research, and materials science.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02003v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mario Koddenbrock, Christoph Lange, Robin Legner, Martin Jäger, Martin Kögler, Mariano N. Cruz Bournazou, Peter Neubauer, Felix Biessmann, Erik Rodner</dc:creator>
    </item>
    <item>
      <title>RenCon 2025: Revival of the Expressive Performance Rendering Competition</title>
      <link>https://arxiv.org/abs/2605.02059</link>
      <description>arXiv:2605.02059v2 Announce Type: replace 
Abstract: This paper comprehensively documents RenCon 2025, the revival of the expressive performance rendering competition, which took place at ISMIR 2025 in Daejeon, Korea. The competition attracted 9 entries from international research groups, representing diverse approaches to expressive piano performance rendering. The two-phase assessment comprised a preliminary online evaluation and live real-time rendering at the conference. We analyze the competition format, participant demographics, system performance, and lessons learned for future iterations. The results demonstrate significant advances in expressive rendering capabilities while highlighting remaining challenges in achieving human-level musical expression.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02059v2</guid>
      <category>cs.MM</category>
      <category>cs.SD</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Huan Zhang, Taegyun Kwon, Anders Friberg, Junyan Jiang, Hayeon Bang, Hyeyoon Cho, Gus Xia, Akira Maezawa, Simon Dixon, Dasaem Jeong</dc:creator>
    </item>
    <item>
      <title>Manifold-Aligned Guided Integrated Gradients for Reliable Feature Attribution</title>
      <link>https://arxiv.org/abs/2605.02167</link>
      <description>arXiv:2605.02167v2 Announce Type: replace 
Abstract: Feature attribution is central to diagnosing and trusting deep neural networks, and Integrated Gradients (IG) is widely used due to its axiomatic properties. However, IG can yield unreliable explanations when the integration path between a baseline and the input passes through regions with noisy gradients. While Guided Integrated Gradients reduces this sensitivity by adaptively updating low-gradient-magnitude features, input-space guidance still produces intermediate inputs that deviate from the data manifold. To address this limitation, we propose Manifold-Aligned Guided Integrated Gradients (MA-GIG), which constructs attribution paths in the latent space of a pre-trained variational autoencoder. By decoding intermediate latent states, MA-GIG biases the path toward the learned generative manifold and reduces exposure to implausible input-space regions. Through qualitative and quantitative evaluations, we demonstrate that MA-GIG produces faithful explanations by aggregating gradients on path features proximal to the input. Consequently, our method reduces off-manifold noise and outperforms prior path-based attribution methods across multiple datasets and classifiers. Our code is available at https://github.com/leekwoon/ma-gig/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02167v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Soyeon Kim, Seongwoo Lim, Kyowoon Lee, Jaesik Choi</dc:creator>
    </item>
    <item>
      <title>Anon: Extrapolating Adaptivity Beyond SGD and Adam</title>
      <link>https://arxiv.org/abs/2605.02317</link>
      <description>arXiv:2605.02317v2 Announce Type: replace 
Abstract: Adaptive optimizers such as Adam have achieved great success in training large-scale models such as large language models and diffusion models. However, they often generalize worse than non-adaptive methods such as SGD on classical architectures like CNNs. We identify a key cause of this performance gap: the restricted adaptivity of preconditioners, which limits the optimizer's ability to adapt to diverse optimization landscapes. To address this, we propose Anon (Adaptivity Non-restricted Optimizer with Novel convergence technique), a novel optimizer with continuously tunable adaptivity in $\mathbb{R}$, allowing it to interpolate between SGD-like and Adam-like behaviors and even extrapolate beyond both. To ensure convergence across the entire adaptivity spectrum, we introduce incremental delay update (IDU), a novel mechanism that is more flexible than AMSGrad's hard max-tracking strategy and enhances robustness to gradient noise. We theoretically establish convergence guarantees in both convex and non-convex settings. Empirically, Anon consistently outperforms state-of-the-art optimizers on representative image classification, diffusion, and language modeling tasks. These results demonstrate that adaptivity can serve as a valuable tunable design principle, and Anon provides the first unified and reliable framework capable of bridging the gap between classical and modern optimizers while improving on the strengths of each.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02317v2</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yiheng Zhang, Kaiyan Zhao, Shaowu Wu, Yiming Wang, Jiajun Wu, Leong Hou U, Steve Drew, Xiaoguang Niu</dc:creator>
    </item>
    <item>
      <title>ANO: A Principled Approach to Robust Policy Optimization</title>
      <link>https://arxiv.org/abs/2605.02320</link>
      <description>arXiv:2605.02320v2 Announce Type: replace 
Abstract: Proximal Policy Optimization (PPO) dominates reinforcement learning and LLM alignment but relies on a "hard clipping" mechanism that discards valuable gradients. Conversely, unconstrained methods such as SPO expose the optimization to unbounded updates, causing severe instability and policy collapse when extreme outliers are encountered. To resolve this dilemma, we introduce a principled design space for policy optimization and show that a robust estimator must inherently suppress outliers while maintaining a smooth restoring force. Guided by these geometric principles, we derive Anchored Neighborhood Optimization (ANO), a novel method that replaces hard clipping with a redescending gradient mechanism. Extensive evaluations demonstrate ANO's empirical superiority across diverse domains. In continuous (MuJoCo) and discrete (Atari) control, ANO establishes a robust state of the art, uniquely preventing policy collapse even under highly aggressive learning rates ($1 \times 10^{-3}$). Furthermore, in LLM alignment (RLHF), ANO eliminates the catastrophic KL divergence explosion inherent to unconstrained methods, dominating PPO, SPO, and GRPO in head-to-head win rates.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02320v2</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yiheng Zhang, Yiming Wang, Kaiyan Zhao, Zhenglin Wan, Jiayu Chen, Leong Hou U</dc:creator>
    </item>
    <item>
      <title>ZNO: Stable Rational Neural Operators in the Z-Domain for Discrete-Time Dynamics</title>
      <link>https://arxiv.org/abs/2605.02356</link>
      <description>arXiv:2605.02356v2 Announce Type: replace 
Abstract: We introduce the Z-Domain Neural Operator (ZNO), a causal neural operator whose layers are stable low-rank multiple-input multiple-output (MIMO) rational filters parameterized directly in the $z$-plane. ZNO addresses a limitation of existing operator-learning methods: many are tailored primarily to continuous-time problems, whereas a large class of system-identification problems is intrinsically discrete-time. The $z$-domain form expresses stability as a unit-disk pole constraint and makes the learned discrete-time poles directly readable. The model combines low-rank channel mixing, a smooth stable pole reparameterization, causal recurrence, and an optional short finite impulse response (FIR) branch in a single $z$-domain rational recurrent layer. Across controlled discrete system-identification experiments, ZNO's advantage is most evident when the target dynamics are stable rational systems with lightly damped poles near the unit circle. Under matched parameter budgets, ZNO is not uniformly dominant; however, with validation-selected configurations, the same architecture can achieve the lowest mean error across the controlled tasks. A five-bin difficulty sweep over near-unit-circle/long-memory dynamics shows that ZNO has the lowest mean error across memory regimes, from short (approximately 10 steps) to long (approximately 100-200 steps). On five public nonlinear system-identification benchmarks, ZNO is competitive with neural operator and state-space baselines, achieving the lowest mean error on benchmarks whose dynamics align with stable rational discrete-time filters, while classical or state-space baselines remain preferable on some systems. These results position ZNO as a strong model for stable rational discrete-time dynamics, especially in near-unit-circle and long-memory regimes, but not as a universal replacement for specialized system-identification methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02356v2</guid>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xianli Zhu, Jia Yin</dc:creator>
    </item>
    <item>
      <title>When Stress Becomes Signal: Detecting Antifragility-Compatible Regimes in Multi-Agent LLM Systems</title>
      <link>https://arxiv.org/abs/2605.02463</link>
      <description>arXiv:2605.02463v2 Announce Type: replace 
Abstract: Multi-agent LLM systems are increasingly used to solve complex tasks through decomposition, debate, specialization, and ensemble reasoning. However, these systems are usually evaluated in terms of robustness: whether performance is preserved under perturbation. This paper studies a different question: whether semantic stress exposes structured variation that could support future antifragile learning. We introduce CAFE (Cognitive Antifragility Framework for Evaluation), a statistical framework for detecting antifragility-compatible regimes in multi-agent architectures. CAFE models a controlled expected distribution of semantic stressors, reconstructs an architecture-specific observed effective stress distribution from multi-dimensional judge signals, and compares the two distributions using a distributional Jensen Gap under a convex stress potential. A positive gap does not imply immediate performance improvement; instead, it indicates a convex-expansive deformation of the observed stress distribution, suggesting that the architecture exposes learnable stress structure. We evaluate CAFE on a banking-risk analysis benchmark with five multi-agent architectures: flat, hierarchical, debate, meta-adaptive, and ensemble. Across all architectures, semantic stress reduces average judged quality by roughly one third. Yet all architectures exhibit positive distributional Jensen Gaps with bootstrap confidence intervals above zero. These results show that immediate quality degradation can coexist with statistically detectable antifragility-compatible stress geometry. CAFE is therefore not an antifragile learner itself, but a measurement layer for identifying when and where antifragile learning may be worth applying.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02463v2</guid>
      <category>cs.MA</category>
      <category>cs.AI</category>
      <category>cs.CE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jose Manuel de la Chica, Juan Manuel Vera, Jairo Rodríguez</dc:creator>
    </item>
    <item>
      <title>Shadow-Loom: Causal Reasoning over Graphical World Models of Narratives</title>
      <link>https://arxiv.org/abs/2605.02475</link>
      <description>arXiv:2605.02475v2 Announce Type: replace 
Abstract: Stories hold a reader's attention because they have causes, secrets, and consequences. Shadow-Loom is an experimental open-source framework that turns a narrative into a versioned graphical world model and lets two engines act on it: (1) a causal physics, grounded in Pearl's ladder of causation and a recently proposed counterfactual calculus over Ancestral Multi-World Networks; and (2) a narrative physics that scores the same graph against four structural reader-states (mystery, dramatic irony, suspense, and surprise), in the tradition of Sternberg's curiosity/suspense/surprise triad, with suspense formalised in the structural-affect line of work on story comprehension and computational suspense. Large language models are used only at the boundary, for extraction, rendering, and audit; identification, intervention, and counterfactual reasoning are carried out in typed code over the graph. The system is offered as a research artefact rather than as a benchmarked NLP model; code, fixtures, and pipeline are released as open source.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02475v2</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>David Wilmot</dc:creator>
    </item>
    <item>
      <title>GRAIL: A Deep-Granularity Hybrid Resonance Framework for Real-Time Agent Discovery via SLM-Enhanced Indexing</title>
      <link>https://arxiv.org/abs/2605.02489</link>
      <description>arXiv:2605.02489v2 Announce Type: replace 
Abstract: As the ecosystem of Large Language Model (LLM)-based agents expands rapidly, efficient and accurate agent discovery becomes a critical bottleneck for large-scale multi-agent collaboration. Existing approaches typically face a dichotomy: they either rely on heavyweight LLMs for intent parsing, leading to prohibitive latency (often exceeding 30 seconds), or use monolithic vector retrieval that sacrifices semantic precision for speed. To bridge this gap, we propose GRAIL (Granular Resonance-based Agent/AI Link), a novel framework that achieves sub-400 ms discovery latency without compromising accuracy. GRAIL introduces three key innovations: (1) SLM-Enhanced Prediction, which replaces the general-purpose LLM parser with a specialized, fine-tuned Small Language Model (SLM) for millisecond-level capability-tag prediction; (2) Pseudo-Document Expansion, which augments agent descriptions with synthetic queries to increase semantic density for robust dense retrieval; and (3) MaxSim Resonance, a fine-grained matching mechanism that computes the maximum similarity between a user query and discrete agent usage examples, mitigating semantic dilution. Validated on AgentTaxo-9K, our new large-scale dataset of 9,240 agents, GRAIL reduces end-to-end discovery latency by over 79$\times$ compared with LLM-parsing baselines, while significantly outperforming traditional vector search in Recall@10. This framework offers a scalable, industrial-grade solution for the real-time "Internet of Agents."</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02489v2</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.IR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jinliang Xu</dc:creator>
    </item>
    <item>
      <title>GuardSec: A Multi-Modal Web Platform for Real-Time Digital Fraud Detection, Entity Verification, and Connection Security Analysis in the African Context</title>
      <link>https://arxiv.org/abs/2605.02502</link>
      <description>arXiv:2605.02502v3 Announce Type: replace 
Abstract: Online fraud in Africa has reached epidemic scale, yet the few cybersecurity tools that exist are not available to ordinary citizens and are calibrated almost exclusively for security operations centres (SOCs) and technically literate users on stable broadband connections. This mismatch is not incidental: it is the predictable outcome of a research culture that optimises for benchmark performance while systematically neglecting deployability, accessibility, and local threat context.
  This paper presents GuardSec, a production-deployed, openly accessible web platform for real-time multi-modal digital threat verification, designed from the ground up for the African user context. The system enables any user with a browser to assess the legitimacy of URLs, websites, phone numbers, email addresses, and business entities in under five seconds, without registration, an API key, or cybersecurity expertise. A distinctive original component of GuardSec is the Mon Empreinte (My Footprint) module, which performs a real-time security audit of the user's own connection and digital exposure: it analyses the visitor's IP address, geolocation, ISP identity, connection type, device fingerprint, and browser configuration, together with a set of twelve security indicators spanning network integrity, tracking exposure, and anonymisation status. This self-diagnostic capability turns GuardSec from a passive verification tool into an active instrument of digital self-awareness, enabling users to understand not only whether an external entity is safe but also whether their own connection is compromised, tracked, or exposed. The platform additionally embeds Gilda, a context-aware conversational security assistant that answers user questions about digital threats in plain language and issues personalised security recommendations on demand.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02502v3</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Gilda Rech Bansimba, Regis Freguin Babindamana</dc:creator>
    </item>
    <item>
      <title>TemPose-TF-ASF: Two-Stage Bidirectional Stroke Context Fusion for Badminton Stroke Classification</title>
      <link>https://arxiv.org/abs/2605.02558</link>
      <description>arXiv:2605.02558v2 Announce Type: replace 
Abstract: Accurate badminton stroke prediction is crucial for fine-grained sports analysis and tactical decision support. However, existing methods struggle to model rich temporal context. This paper introduces TemPose-TF-ASF (Adjacent-Stroke Fusion), a context-aware extension of TemPose that enhances stroke recognition by incorporating stroke-type information from both preceding and subsequent strokes. A two-stage training and inference strategy is adopted: preliminary predictions from the baseline model are reused as estimated temporal context, and these predictions guide the joint optimization of the ASF module and the classifier. The proposed method explicitly models bidirectional temporal dependencies between strokes and can be integrated into existing state-of-the-art models. Experiments on a large-scale badminton match dataset show consistent improvements over the baseline and its variants in terms of accuracy and macro-F1. Moreover, integrating ASF into other advanced methods yields notable performance gains, indicating strong transferability and generalization capability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02558v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tzu-Yu Liu, Duan-Shin Lee</dc:creator>
    </item>
    <item>
      <title>An explainable hypothesis-driven approach to Drug-Induced Liver Injury with HADES</title>
      <link>https://arxiv.org/abs/2605.02669</link>
      <description>arXiv:2605.02669v2 Announce Type: replace 
Abstract: Drug-induced liver injury (DILI) remains a leading cause of late-stage clinical trial attrition. However, existing computational predictors primarily rely on binary classification, a framing that limits generalization and yields no mechanistic insight to guide translational decisions. We argue that DILI prediction is better posed as an explainable hypothesis-generation problem.
  To support this shift, we introduce the DILER Benchmark, a dataset that extends beyond binary labels by augmenting a curated set of molecules with mechanistic hepatotoxicity hypotheses derived from biomedical literature. We further present HADES, an agentic system designed to generate transparent and auditable reasoning traces. By combining molecular-level predictions, metabolite decomposition, structural understanding, and toxicity pathway evidence, HADES mechanistically assesses DILI risk.
  Evaluated on the DILER Benchmark, HADES outperforms existing models in binary classification, achieving a ROC-AUC of 0.68 on the Test Set and 0.59 on the challenging Post-2021 Set, compared with 0.63 and 0.50 for DILI-Predictor, respectively. More importantly, we establish a baseline for mechanistic hypothesis generation, where HADES achieves a Hypothesis Alignment Fuzzy Jaccard Index of 0.16. This result underscores the inherent complexity of the task while highlighting the need for advanced explainable approaches in predictive toxicology.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02669v2</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Maciej Wisniewski, Bartosz Topolski, Pawel Dabrowski-Tumanski, Dariusz Plewczynski, Tomasz Jetka</dc:creator>
    </item>
    <item>
      <title>Foundation Models to Unlock Real-World Evidence from Nationwide Medical Claims</title>
      <link>https://arxiv.org/abs/2605.02740</link>
      <description>arXiv:2605.02740v2 Announce Type: replace 
Abstract: Evidence derived from large-scale real-world data (RWD) is increasingly informing regulatory evaluation and healthcare decision-making. Administrative claims provide population-scale, longitudinal records of healthcare utilization, expenditure, and detailed coding of diagnoses, procedures, and medications, yet their potential as a substrate for healthcare foundation models remains largely unexplored. Here we present ReClaim, a generative transformer trained from scratch on 43.8 billion medical events from more than 200 million enrollees in the MarketScan claims data spanning 2008-2022. ReClaim models longitudinal trajectories across diagnoses, procedures, medications, and expenditure, and was scaled to 140 million, 700 million, and 1.7 billion parameters. Across over 1,000 disease-onset prediction tasks, ReClaim achieved a mean AUC of 75.6%, substantially outperforming disease-specific LightGBM (66.3%) and the transformer-based Delphi model (69.4%), with the largest gains for rare diseases. These advantages held across retrospective and prospective evaluations and in external validation on two independent datasets. Performance improved monotonically with scale, and post-training added 13.8 percentage points over pre-training alone. Beyond disease prediction, ReClaim captured financial outcomes and improved real-world evidence (RWE) analyses: for healthcare expenditure forecasting it increased explained variance from 0.28 to 0.37 relative to LightGBM, and in a target trial emulation it reduced systematic bias by 72% on average relative to Delphi. Together, these results establish administrative claims as a scalable substrate for healthcare foundation models and show that learned representations generalize across time periods and data sources, supporting disease surveillance, expenditure forecasting, and RWE generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02740v2</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Fan Ma, Yuntian Liu, Xiang Lan, Weipeng Zhou, Jun Ni, Mauro Giuffrè, Lingfei Qian, Xueqing Peng, Yujia Zhou, Ruey-Ling Weng, Huan He, Lu Li, Huiyuan Wang, Qingyu Chen, Andrew Loza, Laila Rasmy, Degui Zhi, Yuan Lu, Chenjie Zeng, Joshua C Denny, Lee Schwamm, Daniella Meeker, Lucila Ohno-Machado, Yong Chen, Hua Xu</dc:creator>
    </item>
    <item>
      <title>TRACE: Temporal Reasoning over Context and Evidence for Activity Recognition in Smart Homes</title>
      <link>https://arxiv.org/abs/2605.02841</link>
      <description>arXiv:2605.02841v2 Announce Type: replace 
Abstract: Human activity recognition (HAR) in smart homes remains challenging because many daily activities exhibit similar local sensor patterns, while minimally intrusive sensing provides sparse and ambiguous observations. As a result, methods based on short temporal or event windows often fail to capture the broader temporal and behavioral context needed for reliable activity understanding. We present TRACE (Temporal Reasoning over Context and Evidence), a contextual activity recognition framework for smart homes that integrates multi-source sensor evidence with user-specific contextual priors to improve activity interpretation. Rather than treating recognition as a local classification problem, TRACE leverages contextual reasoning to resolve ambiguities, reduce fragmented predictions, and infer more semantically specific activities. We evaluate TRACE on public benchmarks and in a deployment study conducted in our smart-home environment. Results show that TRACE improves recognition accuracy for semantically complex activities, produces more temporally coherent predictions that better align with user-specific routines, and maintains robust performance under cross-domain transfer and missing-modality conditions. These findings demonstrate the value of contextual reasoning for advancing smart-home HAR.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02841v2</guid>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yingtian Shi, Abivishaq Balasubramanian, Jessica Herring, Jiachen Li, Juan Macias Romero, Rosemarie Santa Gonzalez, Varun Mishra, Agata Rozga, Xiang Zhi Tan, Thomas Plötz</dc:creator>
    </item>
    <item>
      <title>The 1-Bit Barrier is Universal: k-Stage Pipeline Composition and Unified Leakage Bounds for Standard Modular Reductions in PQC Hardware</title>
      <link>https://arxiv.org/abs/2605.02856</link>
      <description>arXiv:2605.02856v2 Announce Type: replace 
Abstract: This is Paper 7 of a series of formally-verified analyses of masked NTT hardware for post-quantum cryptography; Paper 1 [1] established structural dependency analysis of the QANARY platform, and Paper 2 [2] quantified security margins under partial NTT masking. Arbitrary-depth $k$-stage masked NTT pipelines with fresh inter-stage masking and per-stage PF-PINI($\leq 2$) gadgets satisfy a per-observation cardinality bound of $2 \cdot q^{2k-2}$ on the preimage of any output value, machine-checked in Lean 4 with zero \texttt{sorry}. Under the standard (informal) semantic translation that divides this cardinality by the total mask-tuple space size $q^{2k-1}$, the per-observation conditional probability bound is $2/q$, independent of pipeline depth $k$. The QANARY program has previously established machine-checked cardinality bounds on the per-observation leakage of masked NTT hardware: PF-PINI(2) for Barrett reduction (Paper 5 [3]), 2-stage composition with fresh inter-stage masking (Paper 6 [4]), an underlying universality theorem (Paper 3 [5]), and PF-PINI(1) for butterfly wires (Paper 4 [6]). This paper closes the program with four contributions. First, a $k$-stage composition theorem generalizing Paper 6's two-stage result to arbitrary $k \geq 1$ gives the last-stage-determined bound $G_{k-1}.\texttt{maxMult} \cdot q^{2k-2}$: only the last stage's PF-PINI parameter survives, with intermediate parameters erased by fresh inter-stage masking. Second, Montgomery reduction satisfies PF-PINI(2) with tight max-multiplicity 2. Third, we assemble these into the end-to-end bound $2 \cdot q^{2k-2}$ for any depth-$k$ PF-PINI($\leq 2$) pipeline under fresh inter-stage masking. Fourth, a Lean-verified hypothesis-violation conditional anchors the prior empirical and structural Adams Bridge analyses ([1, 2, 7, 8]).</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02856v2</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ray Iskander, Khaled Kirah</dc:creator>
    </item>
    <item>
      <title>CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing</title>
      <link>https://arxiv.org/abs/2605.02910</link>
      <description>arXiv:2605.02910v2 Announce Type: replace 
Abstract: Recent advances in large language models have led to strong performance on reasoning and environment-interaction tasks, yet their ability for creative problem-solving remains underexplored. We study this capability through the lens of creative tool use, where a model repurposes available objects by reasoning about their affordances and attributes rather than relying on canonical usage. As a first step, we introduce CreativityBench, a benchmark for evaluating affordance-based creativity in LLMs. To this end, we build a large-scale affordance knowledge base (KB) with 4K entities and 150K+ affordance annotations, explicitly linking objects, parts, attributes, and actionable uses. Building on this KB, we generate 14K grounded tasks that require identifying non-obvious yet physically plausible solutions under constraints. Evaluations across 10 state-of-the-art LLMs, including closed and open-source models, show that models can often select a plausible object, but fail to identify the correct parts, their affordances, and the underlying physical mechanism needed to solve the task, leading to a significant drop in performance. Furthermore, improvements from model scaling quickly saturate, strong general reasoning does not reliably translate to creative affordance discovery, and common inference-time strategies such as Chain-of-Thought yield limited gains. These results suggest that creative tool use remains a major challenge for current models, and that CreativityBench provides a useful testbed for studying this missing dimension of intelligence, with potential implications for planning and reasoning modules in future agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02910v2</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Cheng Qian, Hyeonjeong Ha, Jiayu Liu, Jeonghwan Kim, Jiateng Liu, Bingxuan Li, Aditi Tiwari, Dwip Dalal, Zhenhailong Wang, Xiusi Chen, Mahdi Namazifar, Yunzhu Li, Heng Ji</dc:creator>
    </item>
    <item>
      <title>Structured Diffusion Bridges: Inductive Bias for Denoising Diffusion Bridges</title>
      <link>https://arxiv.org/abs/2605.02973</link>
      <description>arXiv:2605.02973v2 Announce Type: replace 
Abstract: Modality translation is inherently under-constrained, as multiple cross-modal mappings may yield the same marginals. Recent work has shown that diffusion bridges are effective for this task. However, most existing approaches rely on fully paired datasets, thereby imposing a single data-driven constraint. We propose a diffusion-bridge framework that characterizes the space of admissible solutions and restricts it via alignment constraints, treating paired supervision as an optional heuristic rather than a prerequisite. We validate our method on synthetic and real modality translation benchmarks across unpaired, semi-paired, and paired regimes, showing consistent performance across supervision levels. Notably, \textbf{it achieves near fully-paired quality with substantially relaxed pairing requirements, while remaining applicable in the unpaired regime}. These results highlight diffusion bridges as a flexible foundation for modality translation beyond fully paired data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02973v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Eitan Kosman, Gabriele Serussi, Chaim Baskin</dc:creator>
    </item>
    <item>
      <title>ChaRVoC: A Challenge-Response Voice Cancelable Authentication System</title>
      <link>https://arxiv.org/abs/2605.02990</link>
      <description>arXiv:2605.02990v2 Announce Type: replace 
Abstract: In this work, we present a Challenge-Response Voice Cancelable authentication system, called ChaRVoC, which provides protection against replay attacks, revocability issues, and template compromise. Our approach integrates three security factors: (1) inherent voice biometric characteristics, (2) user-memorized secret keys enabling template revocability, and (3) dynamic system-generated challenges providing liveness detection. Specifically, we introduce a novel HashGray-XOR scheme which combines a cryptographic hash function with an unrecoverable graycode-based transformation to create secured templates that are mathematically proven to be non-invertible. We compare our methods with existing cancelable biometric methods (WTA, IoM, RoE) on VoxCeleb1, TIMIT, and VOiCES datasets to show the recognition performance of our proposed system. We also show that our system achieves both cancelability and unlinkability properties.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02990v2</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Phuc-Khang Vo-Hoang, Hoang C. Ta, Nhien-An Le-Khac, Dinh-Thuc Nguyen, Hong-Hanh Nguyen-Le</dc:creator>
    </item>
    <item>
      <title>Boundary-Aware Uncertainty Quantification for Wildfire Spread Prediction</title>
      <link>https://arxiv.org/abs/2605.03148</link>
      <description>arXiv:2605.03148v2 Announce Type: replace 
Abstract: Reliable wildfire spread prediction is vital for risk-aware emergency planning, yet most deep learning models lack principled uncertainty quantification (UQ). Further, for boundary-sensitive cases like wildfire spread, evaluating models with global metrics alone is often insufficient. To shift the focus of UQ evaluation toward a more operationally relevant approach, the Fire-Centered Evaluation Region (FCER) framework is introduced as a spatially conditioned protocol to characterize UQ within critical fire zones. Using FCER, an ensemble is compared against a distilled single-pass student model on the WildfireSpreadTS dataset. The student model demonstrates comparable calibration and complementary uncertainty ranking in boundary-relevant regimes. Code is available at https://github.com/jonasvilhofunk/WildfireUQ-FCER</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03148v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jonas V. Funk</dc:creator>
    </item>
    <item>
      <title>Kerncap: Automated Kernel Extraction and Isolation for AMD GPUs</title>
      <link>https://arxiv.org/abs/2605.03208</link>
      <description>arXiv:2605.03208v2 Announce Type: replace 
Abstract: Iterative GPU kernel tuning is bottlenecked by the scale of the applications that host the kernels. Rapid iteration requires isolating the kernel so it can be edited, recompiled, and validated without rebuilding the full application -- but manual isolation requires reconstructing build flags, dispatch configuration, and runtime inputs by hand, so developers usually settle for slow in-place edits.
  We present Kerncap, an automated kernel extraction tool that intercepts dispatches at the HSA runtime for both HIP and Triton, bridging Triton's JIT-only metadata into HSA-level capture via a lightweight Python compile-hook shim. Kerncap performs an address-space closure of all device memory -- a virtual-address-faithful snapshot that preserves embedded device pointers without DWARF metadata or pointer chasing -- locates kernel sources, and emits self-contained reproducer projects. HIP reproducers use a Clang VFS overlay for source-level recompilation without modifying the original build system; Triton reproducers are tuning-pinned, binding the captured autotuner configuration into the artifact to preserve the JIT kernel's numerical contract.
  Across six real-world HIP and Triton workloads spanning traditional HPC and ML domains on three AMD GPU architectures (CDNA2, CDNA3, RDNA3), Kerncap extracts and validates kernels from snapshots ranging from 152~MB to 30~GB -- including a VA-faithful capture of vLLM's Mixture-of-Experts weight pool reached through pointer indirection. On our llama-cpp case study, Kerncap's edit-recompile-validate loop achieves a 13.6x speedup over the traditional workflow, reducing kernel isolation from a multi-hour process to a single command. The resulting reproducers also serve as a substrate for autotuning agents and LLM-driven kernel generators that need rapid, isolated evaluation of candidates.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03208v2</guid>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Cole Ramos, Keith Lowery</dc:creator>
    </item>
    <item>
      <title>ADAPTS: Agentic Decomposition for Automated Protocol-agnostic Tracking of Symptoms</title>
      <link>https://arxiv.org/abs/2605.03212</link>
      <description>arXiv:2605.03212v2 Announce Type: replace 
Abstract: Modeling latent clinical constructs from unconstrained clinical interactions is a unique challenge in affective computing. We present ADAPTS (Agentic Decomposition for Automated Protocol-agnostic Tracking of Symptoms), a framework for automated rating of depression and anxiety severity using a mixture-of-agents LLM architecture. This approach decomposes long-form clinical interviews into symptom-specific reasoning tasks, producing auditable justifications while preserving temporal and speaker alignment. Generalization was evaluated across two independent datasets ($N=204$) with distinct interview structures. On high-discrepancy interviews, automated ratings approximated expert benchmarks ($\text{absolute error}=22$) more closely than original human ratings ($\text{absolute error}=26$). Implementing an ``extended'' protocol that incorporates qualitative clinical conventions significantly stabilized ratings, with absolute agreement reaching $\text{ICC(2,1)} = 0.877$. These findings suggest that the ADAPTS framework enables promising evaluations of psychiatric severity. While the current implementation is purely text-based, the underlying architecture is readily extensible to multimodal inputs, including acoustic and visual features. By approximating expert-level precision in a protocol-agnostic manner, this framework provides a foundation for objective and scalable psychiatric assessment, especially in resource-limited settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03212v2</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.HC</category>
      <category>stat.AP</category>
      <category>stat.CO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Alexandria K. Vail, Marcelo Cicconet, Katie Aafjes-van Doorn, Ryan Maroney, Marc Aafjes</dc:creator>
    </item>
    <item>
      <title>RLDX-1 Technical Report</title>
      <link>https://arxiv.org/abs/2605.03269</link>
      <description>arXiv:2605.03269v2 Announce Type: replace 
Abstract: While Vision-Language-Action models (VLAs) have shown remarkable progress toward human-like generalist robotic policies through the versatile intelligence (i.e. broad scene understanding and language-conditioned generalization) inherited from pre-trained Vision-Language Models, they still struggle with complex real-world tasks requiring broader functional capabilities (e.g. motion awareness, long-term memory, and physical sensing). To address this, we introduce RLDX-1, a general-purpose robotic policy for dexterous manipulation built on the Multi-Stream Action Transformer (MSAT), an architecture that unifies these capabilities by integrating heterogeneous modalities through modality-specific streams with cross-modal joint self-attention. RLDX-1 further combines this architecture with system-level design choices, including data synthesis for rare manipulation scenarios, learning procedures specialized for human-like manipulation, and inference optimizations for real-time deployment. Through empirical evaluation, we show that RLDX-1 consistently outperforms recent frontier VLAs (e.g. $\pi_{0.5}$ and GR00T N1.6) across both simulation benchmarks and real-world tasks that require broad functional capabilities beyond general versatility. In particular, RLDX-1 shows superiority in ALLEX humanoid tasks by achieving success rates of 86.8% while $\pi_{0.5}$ and GR00T N1.6 achieve around 40%, highlighting the ability of RLDX-1 to control a high-DoF humanoid robot under diverse functional demands. Together, these results position RLDX-1 as a promising step toward reliable VLAs for complex, contact-rich, and dynamic real-world dexterous manipulation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03269v2</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dongyoung Kim, Huiwon Jang, Myungkyu Koo, Suhyeok Jang, Taeyoung Kim, Beomjun Kim, Byungjun Yoon, Changsung Jang, Daewon Choi, Dongsu Han, Donguk Lee, Heeseung Kwon, Hojin Jeon, Jaehyun Kang, Jaekyoung Bae, Jihyuk Lee, Jimin Lee, John Won, Joonwoo Ahn, Junhyeong Park, Junyoung Sung, Kyungmin Lee, Minseong Han, Minsung Yoon, Sejune Joo, Seonil Son, Seungcheol Park, Seunggeun Cho, Seungjun Moon, Seungku Kim, Yonghoon Dong, Yongjin Cho, Youngchan Kim, Chang Hwan Kim, Dohyeon Kim, Heecheol Kim, Heewon Lee, Hensen Ahn, Hyungkyu Ryu, Hyunsoo Choi, Hyunsoo Shin, Jaeheon Jung, Jaewoo Kim, Jinwook Kim, Joochul Chang, Joonsoo Kim, Junghun Park, Jungwoo Park, Junho Cho, Junhyeok Park, Junwon Lee, Kangwook Lee, Kwanghoon Kim, Kyoungwhan Choe, Manoj Bhadu, Nayoung Oh, Sangjun Kim, Sangwoo Kim, Seunghoon Shim, Seunghyun Kim, Seungjun Lee, Seungyup Ka, Sungryol Yang, Wook Jung, Yashu Shukla, Yeonjae Lee, Yeonwoo Bae, Jinwoo Shin</dc:creator>
    </item>
    <item>
      <title>When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning</title>
      <link>https://arxiv.org/abs/2605.03314</link>
      <description>arXiv:2605.03314v2 Announce Type: replace 
Abstract: In single-stream autoregressive interfaces, the same tokens both update the model state and constitute an irreversible public commitment. This coupling creates a silence tax: additional deliberation postpones the first task-relevant content, while naive early streaming risks premature commitments that bias subsequent generations. We introduce Side-by-Side (SxS) Interleaved Reasoning, which makes disclosure timing a controllable decision within standard autoregressive generation. SxS interleaves partial disclosures with continued private reasoning in the same context, but releases content only when it is supported by the reasoning so far. To learn such pacing without incentivizing filler, we construct entailment-aligned interleaved trajectories by matching answer prefixes to supporting reasoning prefixes, then train with SFT to acquire the dual-action semantics and RL to recover reasoning performance under the new format. Across two Qwen3 architectures/scales (MoE Qwen3-30B-A3B, dense Qwen3-4B) and both in-domain (AIME25) and out-of-domain (GPQA-Diamond) benchmarks, SxS improves accuracy--content-latency Pareto trade-offs under token-level proxies such as inter-update waiting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03314v2</guid>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiaqi Wei, Xuehang Guo, Pengfei Yu, Xiang Zhang, Wanli Ouyang, Siqi Sun, Qingyun Wang, Chenyu You</dc:creator>
    </item>
    <item>
      <title>Toward Structural Multimodal Representations: Specialization, Selection, and Sparsification via Mixture-of-Experts</title>
      <link>https://arxiv.org/abs/2605.03348</link>
      <description>arXiv:2605.03348v2 Announce Type: replace 
Abstract: We propose S3 (Specialization, Selection, Sparsification), a framework that rethinks multimodal learning through a structural perspective. Instead of encoding all signals into a fixed embedding, S3 decomposes multimodal inputs into semantic experts and selectively routes them for each task. Specialization forms concept-level experts in a shared latent space, Selection adapts routing for task-specific needs, and Sparsification prunes low-utility paths to yield compact, information-minimal representations. Across four MultiBench benchmarks, S3 improves accuracy and shows a consistent reverse U-shaped sparsity-performance trend, with peak performance at intermediate sparsity. These results suggest that structuring multimodal representations as selectable semantic components provides a practical and principled alternative to contrastive learning or InfoMax-driven approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03348v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hahyeon Choi, Nojun Kwak</dc:creator>
    </item>
    <item>
      <title>ReasonAudio: A Benchmark for Evaluating Reasoning Beyond Matching in Text-Audio Retrieval</title>
      <link>https://arxiv.org/abs/2605.03361</link>
      <description>arXiv:2605.03361v2 Announce Type: replace 
Abstract: As multimodal content continues to expand at a rapid pace, audio retrieval has emerged as a key enabling technology for media search, content organization, and intelligent assistants. However, most existing benchmarks concentrate on semantic matching and fail to capture the fact that real-world queries often demand advanced reasoning abilities, including negation understanding, temporal ordering, concurrent event recognition, and duration discrimination. To address this gap, we introduce ReasonAudio, the first reasoning-intensive benchmark for Text-Audio Retrieval, comprising 1,000 queries and 10,000 composite audio clips across five fundamental reasoning tasks: Negation, Order, Overlap, Duration, and Mix. Despite their intuitive nature for humans and straightforward construction, these tasks pose significant challenges to current models. Our evaluation of ten state-of-the-art models reveals the following findings: all models struggle with reasoning-intensive audio retrieval, performing particularly poorly on Negation and Duration while showing relatively better results on Overlap and Order. Moreover, Multimodal Large Language Model-based embedding models fail to inherit the reasoning capabilities of their backbones through contrastive fine-tuning, suggesting that current training paradigms are insufficient to preserve reasoning capacity in retrieval settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03361v2</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>Honglei Zhang, Yuting Chen, Chenpeng Hu, Siyue Zhang, Yilei Shi</dc:creator>
    </item>
    <item>
      <title>Geometry over Density: Few-Shot Cross-Domain OOD Detection</title>
      <link>https://arxiv.org/abs/2605.03410</link>
      <description>arXiv:2605.03410v2 Announce Type: replace 
Abstract: Out-of-distribution (OOD) detection identifies test samples that fall outside a model's training distribution, a capability critical for safe deployment in high-stakes applications. Standard OOD detectors are trained on a specific in-distribution (ID) dataset and detect deviations from that single domain. In contrast, we study few-shot cross-domain OOD detection: given a \emph{single} pre-trained model, can we perform OOD detection on \emph{arbitrary} new ID-OOD task pairs using only a handful of ID samples at inference time, with no additional training? We propose \textbf{UFCOD}, a unified framework that achieves this goal through information-geometric analysis of diffusion trajectories. Our key insight is that diffusion noise predictions are score functions (gradients of log-density), and we extract two energy features: \emph{Path Energy} (integrated score magnitude) and \emph{Dynamics Energy} (score smoothness), that form a discrete Sobolev norm capturing how samples interact with the learned diffusion process. The central contribution is a \textbf{train-once, deploy-anywhere} paradigm: a diffusion model trained on a single dataset (e.g., CelebA) serves as a universal feature extractor for OOD detection across semantically unrelated domains (e.g., CIFAR-10, SVHN, Textures). At deployment, each new task requires only $\sim$100 unlabeled ID samples for inference: no retraining, no fine-tuning, no task-specific adaptation. Using 100 ID samples per task, UFCOD achieves 93.7\% average AUROC across 12 cross-domain benchmarks, competitive with methods trained on 50k--163k samples, demonstrating $\sim$500$\times$ improvement in sample efficiency. See our code in https://github.com/lili0415/UFCOD.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03410v2</guid>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shawn Li, You Qin, Jiate Li, Charith Peris, Lisa Bauer, Roger Zimmermann, Yue Zhao</dc:creator>
    </item>
    <item>
      <title>Learning Discriminative Signed Distance Functions from Multi-scale Level-of-detail Features for 3D Anomaly Detection</title>
      <link>https://arxiv.org/abs/2605.03437</link>
      <description>arXiv:2605.03437v2 Announce Type: replace 
Abstract: Detecting anomalies from 3D point clouds has received increasing attention in the field of computer vision, with some group-based or point-based methods achieving impressive results in recent years. However, learning accurate point-wise representations for 3D anomaly detection faces great challenges due to the large scale and sparsity of point clouds. In this study, a surface-based method is proposed for 3D anomaly detection, which learns a discriminative signed distance function using multi-scale level-of-detail features. We first present a Noisy Points Generation (NPG) module to generate different types of noise, thereby facilitating the learning of discriminative features by exposing abnormal points. Then, we introduce a Multi-scale Level-of-detail Feature (MLF) module to capture multi-scale information from a point cloud, which provides both fine-grained local and coarse-grained global feature information. Finally, we design an Implicit Surface Discrimination (ISD) module that leverages the extracted multi-scale features to learn an implicit surface representation of point clouds, which effectively trains a signed distance function to distinguish between abnormal and normal points. Experimental results demonstrate that the proposed method achieves an average object-level AUROC of 92.1\% and 85.9\% on the Anomaly-ShapeNet and Real3D-AD datasets, outperforming the current best approach by 2.1\% and 3.6\%, respectively. Codes are available at https://anonymous.4open.science/r/DLF-3AD-DA61.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03437v2</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haibo Xiao, Hanzhe Liang, Jie Zhou, Jinbao Wang, Can Gao</dc:creator>
    </item>
    <item>
      <title>VL-SAM-v3: Memory-Guided Visual Priors for Open-World Object Detection</title>
      <link>https://arxiv.org/abs/2605.03456</link>
      <description>arXiv:2605.03456v2 Announce Type: replace 
Abstract: Open-world object detection aims to localize and recognize objects beyond a fixed closed-set label space. It is commonly divided into two categories, i.e., open-vocabulary detection, which assumes a predefined category list at test time, and open-ended detection, which requires generating candidate categories during inference. Existing methods rely primarily on coarse textual semantics and parametric knowledge, which often provide insufficient visual evidence for fine-grained appearance variation, rare categories, and cluttered scenes. In this paper, we propose VL-SAM-v3, a unified framework that augments open-world detection with retrieval-grounded external visual memory. Specifically, once candidate categories are available, VL-SAM-v3 retrieves relevant visual prototypes from a non-parametric memory bank and transforms them into two complementary visual priors, i.e., sparse priors for instance-level spatial anchoring and dense priors for class-aware local context. These priors are integrated with the original detection prompts via Memory-Guided Prompt Refinement, enabling a shared retrieval-and-refinement mechanism that supports open-vocabulary and open-ended inference. Extensive zero-shot experiments on LVIS show that VL-SAM-v3 consistently improves detection performance under both open-vocabulary and open-ended inference, with particularly strong gains on rare categories. Moreover, experiments with a stronger open-vocabulary detector (i.e., SAM3) validate the generality of the proposed retrieval-and-refinement mechanism.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03456v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chih-Chung Liu, Zhiwei Lin, Yongtao Wang</dc:creator>
    </item>
    <item>
      <title>WorldJen: An End-to-End Multi-Dimensional Benchmark for Generative Video Models</title>
      <link>https://arxiv.org/abs/2605.03475</link>
      <description>arXiv:2605.03475v2 Announce Type: replace 
Abstract: Evaluating generative video models remains an open problem. Reference-based metrics such as Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio (PSNR) reward pixel fidelity over semantic correctness, while Frechet Video Distance (FVD) favors distributional textures over physical plausibility. Binary Visual Question Answering (VQA)-based benchmarks like VBench 2.0 are prone to yes-bias and rely on low-resolution auditors that miss temporal failures. Moreover, their prompts target a single dimension at a time, multiplying the number of videos required while still not guaranteeing reliable results.
  WorldJen addresses these limitations directly. Binary VQA is replaced with Likert-scale questionnaires graded by a VLM that receives frames at native video resolution. Video generation costs are addressed by using adversarially curated prompts that are designed to exercise up to 16 quality dimensions simultaneously. The framework is built around two interlocking contributions. First, a blind human preference study is conducted, accumulating 2,696 pairwise annotations from 7 annotators with 100% pair coverage over 50 of the curated prompts $\times$ 6 state-of-the-art video models. A mean inter-annotator agreement of 66.9% is achieved, and the study establishes a human ground-truth Bradley-Terry (BT) rating with a three-tier structure. Second, a VLM-as-a-judge evaluation engine using prompt-specific, dimension-specific Likert questionnaires (10 questions per dimension, 47,160 scored responses) judges the videos and independently reproduces the human-established three-tier BT rating structure. The VLM achieves a Spearman $\hat{\rho}=1.000,~p=0.0014$, which is interpreted as tier agreement with the human results. Six focused ablation studies validate the robustness of the VLM evaluation framework. Project page: https://moonmath.ai/worldjen/</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03475v2</guid>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Karthik Inbasekar, Guy Rom, Omer Shlomovits</dc:creator>
    </item>
    <item>
      <title>Unifying Dynamical Systems and Graph Theory to Mechanistically Understand Computation in Neural Networks</title>
      <link>https://arxiv.org/abs/2605.03598</link>
      <description>arXiv:2605.03598v2 Announce Type: replace 
Abstract: Understanding how biological and artificial neural networks implement computation from connectivity is a central problem in neuroscience and machine learning. In neural systems, structural and functional connectivity are known to diverge, motivating approaches that move beyond direct connections alone. Here, we show that the spatial and temporal function of recurrent neural networks (RNNs) trained on hierarchically modular tasks can be recovered by modelling the network as a graph and analysing the multi-hop pathways between input and output units. In particular, decomposing these pathways by hop length reveals how the network temporally routes information. This perspective reframes regularisation: if function is implemented through multi-hop communication, then standard penalties such as L1 regularisation, which act only on individual weights, constrain single-hop structure rather than the multi-hop pathways that support computation. Motivated by this view, we introduce resolvent-RNNs (R-RNNs), which constrain multi-hop pathways and thereby induce temporal sparsity beyond that achieved by standard L1 regularisation. Compared with L1 regularisation, R-RNNs achieve improved performance by inducing temporal sparsity that matches the task structure, even when the task signal is sparse. Moreover, R-RNNs exhibit stronger sparsity-function alignment, reflected in their increased robustness under strong regularisation. Together, our results identify multi-hop communication as a key principle linking structure to function in recurrent networks, and suggest that sparsity should be defined over functional pathways rather than individual parameters.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03598v2</guid>
      <category>cs.NE</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jatin Sharma, Dan F. M Goodman, Danyal Akarca</dc:creator>
    </item>
    <item>
      <title>The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive Code</title>
      <link>https://arxiv.org/abs/2605.03619</link>
      <description>arXiv:2605.03619v2 Announce Type: replace 
Abstract: Malware authors have traditionally relied on polymorphic techniques to produce variants in the same malware family, complicating signature-based detection. Integrating generative AI into offensive toolchains enables attackers to synthesize structurally diverse payloads with identical behavior, raising the question of how much polymorphism LLMs provide. Recent work has assumed that LLMs can produce sufficiently polymorphic payloads, leaving unquantified the variation that emerges when an attacker repeatedly builds the same payload, or explicitly instructs the model to avoid prior implementations. In this work, we measure the polymorphic capacity of a commercial model (Claude Opus 4.6) as an automated malware generator. We build a dual-agent, four-stage pipeline that generates, tests, and refines a data-exfiltration payload comprising file traversal, encryption, exfiltration, and integration. We produce payloads in two settings: using prompts that specify only functional requirements, and using prompts that inject a structured history of prior outcomes to force divergence. We measure pairwise distances along structural (AST) and semantic (embedding) axes, finding that when polymorphism is not explicitly required, structural distances are high while semantic distances remain low; i.e., implementations diverge widely without changing high-level behavior. Explicit prompting substantially amplifies this structural diversity while preserving correctness, at the cost of roughly 5 times more tokens but only a small increase in LLM calls (from $4.2$ to $4.5$ per payload, with effective API costs of \$0.41 and \$0.73). These results show that a single commercial LLM can cheaply generate large populations of behaviorally equivalent yet structurally diverse payloads, facilitating the evasion of signature-based detection rules and similarity-based clustering.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03619v2</guid>
      <category>cs.CR</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Gabriel Hortea, Juan Tapiador</dc:creator>
    </item>
    <item>
      <title>From Code to Prediction: Fine-Tuning LLMs for Neural Network Performance Classification in NNGPT</title>
      <link>https://arxiv.org/abs/2605.03686</link>
      <description>arXiv:2605.03686v2 Announce Type: replace 
Abstract: Automated Machine Learning (AutoML) frameworks increasingly leverage Large Language Models (LLMs) for tasks such as hyperparameter optimization and neural architecture code generation. However, current LLM-based approaches focus on generative outputs and evaluate them by training the produced artifacts. Whether LLMs can learn to reason about neural network performance across datasets remains underexplored. We present a classification task integrated into the NNGPT framework, in which a fine-tuned LLM predicts which of two image classification datasets a given neural network architecture achieves higher accuracy on. The task is built on the LEMUR dataset, which provides standardized PyTorch implementations with reproducible performance metrics. Three prompt configurations of increasing difficulty are evaluated: a normalized-accuracy baseline (trivially reaching 100%), a metadata-enriched prompt replacing accuracies with dataset properties, and a code-only prompt presenting only architecture source code and dataset names. Using DeepSeek-Coder-7B-Instruct fine-tuned with LoRA, the code-only prompt reaches 80% peak accuracy over 15 epochs, while the metadata prompt peaks at 70%. Per-dataset analysis reveals complementary strengths: metadata excels for datasets with distinctive properties (CelebAGender at 90.9%) but degrades for overlapping characteristics, whereas the code-only prompt shows more balanced performance. A comparison with DeepSeek-Coder-1.3B confirms that model capacity affects this form of architectural reasoning. The results establish that LLMs can be fine-tuned to predict cross-dataset suitability from neural network code, suggesting that architecture source code contains richer discriminative signal than dataset metadata alone.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03686v2</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mahmoud Hanouneh, Radu Timofte, Dmitry Ignatov</dc:creator>
    </item>
    <item>
      <title>Firmware Distribution as Attack Surface: A Security Study of ASIC Cryptocurrency Miners</title>
      <link>https://arxiv.org/abs/2605.03770</link>
      <description>arXiv:2605.03770v2 Announce Type: replace 
Abstract: ASIC cryptocurrency miners are a core component of blockchain infrastructures, directly converting computation and energy into monetary value. Despite their economic importance, their security is rarely evaluated in a structured manner. In this paper, we show that the firmware distribution ecosystem of mining devices fundamentally challenges existing trust assumptions. We introduce a scalable methodology based on the collection and static analysis of publicly distributed firmware artifacts, requiring neither device access nor runtime interaction. Applying this approach, we reconstruct and analyze 134 firmware images spanning manufacturers that account for over 99% of deployed miners (Bitmain, MicroBT, Canaan, Iceriver). Our results reveal that firmware artifacts alone are sufficient to recover internal architecture, identify security weaknesses, and reconstruct complete attack paths leading to high-impact adversarial objectives. In particular, our analysis reveals vulnerabilities that enable realistic large-scale attack scenarios, including firmware phishing and the exploitation of miners still operating over Stratum V1. Validation on two real devices confirms that publicly distributed artifacts closely reflect deployed software and that these weaknesses translate into attack capabilities. Overall, our study shows that firmware distribution mechanisms themselves constitute a primary attack surface, significantly lowering the barrier to compromise in the ASIC mining ecosystem.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03770v2</guid>
      <category>cs.CR</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Pierre Pouliquen, Hadrien Barral, David Naccache, Thibaut Heckmann, Antoine Houssais</dc:creator>
    </item>
    <item>
      <title>Capability centrality: the next step from scale-free property</title>
      <link>https://arxiv.org/abs/2605.03796</link>
      <description>arXiv:2605.03796v2 Announce Type: replace 
Abstract: In this article we present a new centrality measure called ksi-centrality. We show that ksi-centrality distinguishes real networks from random ones, similar to degree centrality: the ksi-centrality distribution is right-skewed for real networks and centered for random Erdos-Renyi networks, and has a linear pattern with a heavy tail on a log plot. Furthermore, the ksi-centrality distribution is centered for models simulating real networks: Barabasi-Albert, Watts-Strogatz, and Boccaletti-Hwang-Latora. Thus, this centrality distribution is an additional and independent property with respect to scale-freeness. We also introduce a normalized version of ksi-centrality and show that it is related to algebraic connectivity and the Cheeger constant of a network. Moreover, the average value of this normalized centrality is in bijective correspondence with the relative number of edges that a new node connects to others in the Barabasi-Albert preferential attachment model, thus answering the question of how to choose the parameter $m$ to model a given real-world network.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03796v2</guid>
      <category>cs.SI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mikhail Tuzhilin</dc:creator>
    </item>
    <item>
      <title>Evaluating Generative Models as Interactive Emergent Representations of Human-Like Collaborative Behavior</title>
      <link>https://arxiv.org/abs/2605.03855</link>
      <description>arXiv:2605.03855v2 Announce Type: replace 
Abstract: Human-AI collaboration requires AI agents to understand human behavior for effective coordination. While advances in foundation models show promising capabilities in understanding and showing human-like behavior, their application in embodied collaborative settings needs further investigation. This work examines whether embodied foundation model agents exhibit emergent collaborative behaviors indicating underlying mental models of their collaborators, which is an important aspect of effective coordination. This paper develops a 2D collaborative game environment where large language model agents and humans complete color-matching tasks requiring coordination. We define five collaborative behaviors as indicators of emergent mental model representation: perspective-taking, collaborator-aware planning, introspection, theory of mind, and clarification. An automated behavior detection system using LLM-based judges identifies these behaviors, achieving fair to substantial agreement with human annotations. Results from the automated behavior detection system show that foundation models consistently exhibit emergent collaborative behaviors without being explicitly trained to do so. These behaviors occur at varying frequencies during collaboration stages, with distinct patterns across different LLMs. A user study was also conducted to evaluate human satisfaction and perceived collaboration effectiveness, with the results indicating positive collaboration experiences. Participants appreciated the agents' task focus, plan verbalization, and initiative, while suggesting improvements in response times and human-like interactions. This work provides an experimental framework for human-AI collaboration, empirical evidence of collaborative behaviors in embodied LLM agents, a validated behavioral analysis methodology, and an assessment of collaboration effectiveness.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03855v2</guid>
      <category>cs.RO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shinas Shaji, Teena Chakkalayil Hassan, Sebastian Houben, Alex Mitrevski</dc:creator>
    </item>
    <item>
      <title>Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards</title>
      <link>https://arxiv.org/abs/2605.03862</link>
      <description>arXiv:2605.03862v2 Announce Type: replace 
Abstract: Reinforcement learning with verifiable rewards has become a common way to improve explicit reasoning in large language models, but final-answer correctness alone does not reveal whether the reasoning trace is faithful, reliable, or useful to the model that consumes it. This outcome-only signal can reinforce traces that are right for the wrong reasons, overstate reasoning gains by rewarding shortcuts, and propagate flawed intermediate states in multi-step systems. To this end, we propose TraceLift, a planner-executor training framework that treats reasoning as a consumable intermediate artifact. During planner training, the planner emits tagged reasoning. A frozen executor turns this reasoning into the final artifact for verifier feedback, while an executor-grounded reward shapes the intermediate trace. This reward multiplies a rubric-based Reasoning Reward Model (RM) score by measured uplift on the same frozen executor, crediting traces that are both high-quality and useful. To make reasoning quality directly learnable, we introduce TRACELIFT-GROUPS, a rubric-annotated reason-only dataset built from math and code seed problems. Each example is a same-problem group containing a high-quality reference trace and multiple plausible flawed traces with localized perturbations that reduce reasoning quality or solution support while preserving task relevance. Extensive experiments on code and math benchmarks show that this executor-grounded reasoning reward improves the two-stage planner-executor system over execution-only training, suggesting that reasoning supervision should evaluate not only whether a trace looks good, but also whether it helps the model that consumes it. Our code is available at: https://github.com/MasaiahHan/TraceLift</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03862v2</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tianyang Han, Hengyu Shi, Junjie Hu, Xu Yang, Zhiling Wang, Junhao Su</dc:creator>
    </item>
    <item>
      <title>PHALAR: Phasors for Learned Musical Audio Representations</title>
      <link>https://arxiv.org/abs/2605.03929</link>
      <description>arXiv:2605.03929v2 Announce Type: replace 
Abstract: Stem retrieval, the task of matching missing stems to a given audio submix, is a key challenge currently limited by models that discard temporal information. We introduce PHALAR, a contrastive framework achieving a relative accuracy increase of up to $\approx 70\%$ over the state-of-the-art while requiring $&lt;50\%$ of the parameters and a 7$\times$ training speedup. By utilizing a Learned Spectral Pooling layer and a complex-valued head, PHALAR enforces pitch-equivariant and phase-equivariant biases. PHALAR establishes new retrieval state-of-the-art across MoisesDB, Slakh, and ChocoChorales, correlating significantly higher with human coherence judgment than semantic baselines. Finally, zero-shot beat tracking and linear chord probing confirm that PHALAR captures robust musical structures beyond the retrieval task.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03929v2</guid>
      <category>cs.SD</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>eess.SP</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Davide Marincione, Michele Mancusi, Giorgio Strano, Luca Cerovaz, Donato Crisostomi, Roberto Ribuoli, Emanuele Rodolà</dc:creator>
    </item>
    <item>
      <title>iWorld-Bench: A Benchmark for Interactive World Models with a Unified Action Generation Framework</title>
      <link>https://arxiv.org/abs/2605.03941</link>
      <description>arXiv:2605.03941v2 Announce Type: replace 
Abstract: Achieving Artificial General Intelligence (AGI) requires agents that learn and interact adaptively, with interactive world models providing scalable environments for perception, reasoning, and action. Yet current research still lacks large-scale datasets and unified benchmarks to evaluate their physical interaction capabilities. To address this, we propose iWorld-Bench, a comprehensive benchmark for training and testing world models on interaction-related abilities such as distance perception and memory. We construct a diverse dataset with 330k video clips and select 2.1k high-quality samples covering varied perspectives, weather, and scenes. As existing world models differ in interaction modalities, we introduce an Action Generation Framework to unify evaluation and design six task types, generating 4.9k test samples. These tasks jointly assess model performance across visual generation, trajectory following, and memory. Evaluating 14 representative world models, we identify key limitations and provide insights for future research. The iWorld-Bench model leaderboard is publicly available at iWorld-Bench.com.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03941v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jianjie Fang, Yingshan Lei, Qin Wan, Ziyou Wang, Yuchao Huang, Yongyan Xu, Baining Zhao, Weichen Zhang, Chen Gao, Xinlei Chen, Yong Li</dc:creator>
    </item>
    <item>
      <title>Transformers with Selective Access to Early Representations</title>
      <link>https://arxiv.org/abs/2605.03953</link>
      <description>arXiv:2605.03953v2 Announce Type: replace 
Abstract: Several recent Transformer architectures expose later layers to representations computed in the earliest layers, motivated by the observation that low-level features can become harder to recover as the residual stream is repeatedly transformed through depth. The cheapest among these methods add static value residuals: learned mixing coefficients that expose the first-layer value projection V_1 uniformly across tokens and heads. More expressive dense or dynamic alternatives recover finer-grained access, but at higher memory cost and lower throughput. The usefulness of V_1 is unlikely to be constant across tokens, heads, and contexts; different positions plausibly require different amounts of access to early lexical or semantic information. We therefore treat early-representation reuse as a retrieval problem rather than a connectivity problem, and introduce Selective Access Transformer (SATFormer), which preserves the first-layer value pathway while controlling access with a context-dependent gate. Across models from 130M to 1.3B parameters, SATFormer consistently improves validation loss and zero-shot accuracy over the static value-residual and Transformer baselines. Its strongest gains appear on retrieval-intensive benchmarks, where it improves over static value residuals by approximately 1.5 average points, while maintaining throughput and memory usage close to the baseline Transformer. Gate analyses suggest sparse, depth-dependent, head-specific, and category-sensitive access patterns, supporting the interpretation that SATFormer learns selective reuse of early representations rather than uniform residual copying. Our code is available at https://github.com/SkyeGunasekaran/SATFormer.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03953v2</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Skye Gunasekaran, Téa Wright, Rui-Jie Zhu, Jason Eshraghian</dc:creator>
    </item>
    <item>
      <title>Mitigating False Positives in Static Memory Safety Analysis of Rust Programs via Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2605.04000</link>
      <description>arXiv:2605.04000v2 Announce Type: replace 
Abstract: Static analysis tools are essential for ensuring memory safety in Rust programs, particularly as Rust gains adoption in safety-critical domains. However, existing tools such as Rudra and MirChecker suffer from high false positive rates, which diminish developer trust, increase manual review effort, and may obscure genuine vulnerabilities. This paper presents a novel reinforcement learning (RL)-based approach for automatically classifying and suppressing spurious warnings in static memory safety analysis for Rust. To achieve this, we design an RL agent that learns a warning suppression policy by extracting contextual features from Rust's Mid-level Intermediate Representation (MIR) and optimizing its decisions through interaction with static analysis outputs. To improve decision quality, we integrate dynamic validation via cargo-fuzz as an auxiliary feedback mechanism, allowing the agent to selectively validate suspicious warnings through targeted fuzz testing. Our evaluation shows that the proposed approach significantly outperforms state-of-the-art LLM-based baselines, achieving 65.2% accuracy and an F1 score of 0.659, an improvement of 17.1% over the best LLM baseline. With a recall of 74.6%, our method successfully identifies nearly three-quarters of true bugs while substantially reducing false positives, improving precision from 25.6% in raw Rudra output to 59.0%. Incorporating dynamic fuzzing further boosts performance, yielding additional improvements of 10.7 percentage points in accuracy and 8.6 percentage points in F1 score over the RL-only variant. Overall, our work demonstrates that combining reinforcement learning with hybrid static-dynamic analysis can substantially reduce false positives and improve the practical usability of memory safety verification tools for Rust.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04000v2</guid>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Akilesh P, Leuson Da Silva, Foutse Khomh, Sridhar Chimalakonda</dc:creator>
    </item>
    <item>
      <title>Distributional Principal Autoencoders</title>
      <link>https://arxiv.org/abs/2404.13649</link>
      <description>arXiv:2404.13649v2 Announce Type: replace-cross 
Abstract: Dimension reduction techniques usually lose information in the sense that reconstructed data are not identical to the original data. However, we argue that it is possible to have reconstructed data identically distributed as the original data, irrespective of the retained dimension or the specific mapping. This can be achieved by learning a distributional model that matches the conditional distribution of data given its low-dimensional latent variables. Motivated by this, we propose Distributional Principal Autoencoder (DPA) that consists of an encoder that maps high-dimensional data to low-dimensional latent variables and a decoder that maps the latent variables back to the data space. For reducing the dimension, the DPA encoder aims to minimise the unexplained variability of the data with an adaptive choice of the latent dimension. For reconstructing data, the DPA decoder aims to match the conditional distribution of all data that are mapped to a certain latent value, thus ensuring that the reconstructed data retains the original data distribution. Our numerical results on climate data, single-cell data, and image benchmarks demonstrate the practical feasibility and success of the approach in reconstructing the original distribution of the data. DPA embeddings are shown to preserve meaningful structures of data such as the seasonal cycle for precipitation and cell types for gene expression.</description>
      <guid isPermaLink="false">oai:arXiv.org:2404.13649v2</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>stat.ME</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xinwei Shen, Nicolai Meinshausen</dc:creator>
    </item>
    <item>
      <title>Symmetric Linear Arc Monadic Datalog and Gadget Reductions</title>
      <link>https://arxiv.org/abs/2407.04924</link>
      <description>arXiv:2407.04924v4 Announce Type: replace-cross 
Abstract: A Datalog program solves a constraint satisfaction problem (CSP) if and only if it derives the goal predicate precisely on the unsatisfiable instances of the CSP.
  There are three Datalog fragments that are particularly important for finite-domain constraint satisfaction: arc monadic Datalog, linear Datalog, and symmetric linear Datalog, each having good computational properties. We consider the fragment of Datalog where we impose all of these restrictions simultaneously, i.e., we study \emph{symmetric linear arc monadic (slam) Datalog}. We characterise the CSPs that can be solved by a slam Datalog program as those that have a gadget reduction to a particular Boolean constraint satisfaction problem. We also present exact characterisations in terms of a homomorphism duality (which we call \emph{unfolded caterpillar duality}), and in universal-algebraic terms (using known minor conditions, namely the existence of quasi Maltsev operations and $k$-absorptive operations of arity $nk$, for all $n,k \geq 1$). Our characterisations also imply that the question whether a given finite-domain CSP can be expressed by a slam Datalog program is decidable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2407.04924v4</guid>
      <category>math.RA</category>
      <category>cs.CC</category>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <arxiv:DOI>10.1007/s00224-026-10261-2</arxiv:DOI>
      <dc:creator>Manuel Bodirsky, Florian Starke</dc:creator>
    </item>
    <item>
      <title>Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes</title>
      <link>https://arxiv.org/abs/2407.14861</link>
      <description>arXiv:2407.14861v3 Announce Type: replace-cross 
Abstract: With the growing access to administrative health databases, retrospective studies have become crucial evidence for medical treatments. Yet, non-randomized studies frequently face selection biases, requiring mitigation strategies. Propensity score matching (PSM) addresses these biases by selecting comparable populations, allowing for analysis without further methodological constraints. However, PSM has several drawbacks. Different matching methods can produce significantly different Average Treatment Effects (ATE) for the same task, even when meeting all validation criteria. To prevent cherry-picking the best method, public authorities must involve field experts and engage in extensive discussions with researchers.
  To address this issue, we introduce a novel metric, A2A, to reduce the number of valid matches. A2A constructs artificial matching tasks that mirror the original ones but with known outcomes, assessing each matching method's performance comprehensively from propensity estimation to ATE estimation. When combined with Standardized Mean Difference, A2A enhances the precision of model selection, resulting in a reduction of up to 50% in ATE estimation errors across synthetic tasks and up to 90% in predicted ATE variability across both synthetic and real-world datasets. To our knowledge, A2A is the first metric capable of evaluating outcome correction accuracy using covariates not involved in selection.
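  As a rough illustration of the Standardized Mean Difference (SMD) that A2A is combined with, the Python sketch below computes the absolute SMD of a single covariate between treated and matched control groups, assuming the common pooled-standard-deviation convention. The function name and the sample ages are hypothetical and are not taken from popmatch. A common rule of thumb treats an absolute SMD below 0.1 as adequate balance.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Absolute SMD of one covariate between treated and control groups.

    Assumes the pooled standard deviation sqrt((s_t^2 + s_c^2) / 2),
    where s_t^2 and s_c^2 are the sample variances of each group.
    """
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        # Both groups are constant: no measurable imbalance.
        return 0.0
    return abs(mean(treated) - mean(control)) / pooled_sd


# Hypothetical ages of matched treated and control patients.
treated_age = [64, 70, 58, 61, 75]
control_age = [62, 68, 60, 59, 73]
print(round(standardized_mean_difference(treated_age, control_age), 3))
```

In a matching workflow this check would be repeated for every covariate, and a matching method would be flagged if any covariate remains imbalanced.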
  Because computing A2A requires solving hundreds of PSMs, we automate all manual steps of the PSM pipeline. We integrate PSM methods from Python and R, our automated pipeline, the new metric, and reproducible experiments into popmatch, our new Python package, to improve reproducibility and accessibility of bias correction methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2407.14861v3</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>stat.ME</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Alexandre Abraham, Andrés Hoyos Idrobo</dc:creator>
    </item>
    <item>
      <title>Fully Guided Neural Schrödinger bridge for Brain MR image synthesis</title>
      <link>https://arxiv.org/abs/2501.14171</link>
      <description>arXiv:2501.14171v3 Announce Type: replace-cross 
Abstract: Multi-modal brain MRI provides essential complementary information for clinical diagnosis. However, acquiring all modalities in practice is often constrained by time and cost. To address this, various methods have been proposed to generate missing modalities from available ones. Existing approaches can be broadly categorized into two types: paired and unpaired methods. While paired methods achieve high synthesis accuracy, obtaining large-scale paired datasets is typically impractical. In contrast, unpaired methods, though more scalable, often fail to preserve critical anatomical features, such as lesions. In this paper, we propose Fully Guided Schrödinger Bridge (FGSB), a novel framework designed to overcome these limitations by enabling high-fidelity generation with extremely limited paired data. When lesion-specific information, such as expert annotations or segmentation masks, is available, FGSB preserves clinically relevant lesions during missing modality synthesis. Our model comprises two stages: (1) a generation stage that iteratively refines synthetic images using paired source images and Gaussian noise, and (2) a training stage that learns optimal transformation pathways by modeling intermediate states to ensure consistent, high-fidelity synthesis. Experimental results across multiple datasets demonstrate that FGSB achieves reliable synthesis performance across diverse imaging resolutions and data acquisition environments. In addition, incorporating lesion-specific priors further enhances the preservation of clinically relevant features.</description>
      <guid isPermaLink="false">oai:arXiv.org:2501.14171v3</guid>
      <category>eess.IV</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hanyeol Yang, Sunggyu Kim, Mi Kyung Kim, Yongseon Yoo, Yu-Mi Kim, Min-Ho Shin, Insung Chung, Sang Baek Koh, Hyeon Chang Kim, Jong-Min Lee</dc:creator>
    </item>
    <item>
      <title>Dynamic Modeling and Control of Multi-Stack Alkaline Water Electrolysis Systems with Shared Gas Separators and Lye Circulation: An Experiment-Based Study</title>
      <link>https://arxiv.org/abs/2501.14576</link>
      <description>arXiv:2501.14576v2 Announce Type: replace-cross 
Abstract: An emerging approach for large-scale renewable hydrogen production is integrating multiple alkaline water electrolysis (AWE) stacks into one balance-of-plant (BoP) system, sharing gas-lye separation and lye circulation components. While this configuration, termed $N$-in-1, reduces cost and complexity, its dynamic performance under fluctuating power remains unclear compared with conventional 1-in-1 systems. This paper develops a state-space model of the multi-stack AWE system, capturing lye circulation, temperature, and hydrogen-to-oxygen (HTO) dynamics, calibrated via experiments on a 4,000 Nm$^3$/h-rated 4-in-1 system. A nonlinear model predictive controller (NMPC) is then designed to coordinate inter-stack current distribution, lye flow, and cooling for load tracking and operational stability. Simulations on the experimentally validated model show that a $4$-in-1 system can achieve performance very similar to that of four parallel 1-in-1 systems: under wind power supply, the differences in load-tracking error, temperature stabilization, and specific energy consumption remain below 0.015 MW, 0.346 K, and 0.001 kWh/Nm$^3$, respectively.</description>
      <guid isPermaLink="false">oai:arXiv.org:2501.14576v2</guid>
      <category>math.OC</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yiwei Qiu (College of Electrical Engineering, Sichuan University), Jiatong Li (College of Electrical Engineering, Sichuan University), Yangjun Zeng (College of Electrical Engineering, Sichuan University), Yi Zhou (College of Electrical Engineering, Sichuan University), Shi Chen (College of Electrical Engineering, Sichuan University), Xiaoyan Qiu (College of Electrical Engineering, Sichuan University), Buxiang Zhou (College of Electrical Engineering, Sichuan University), Ge He (Sichuan Tsinghua Energy Internet Research Institute), Xu Ji (Sichuan Tsinghua Energy Internet Research Institute), Wenying Li (Sichuan Tsinghua Energy Internet Research Institute)</dc:creator>
    </item>
    <item>
      <title>Three Fundamental Questions in Modern Infinite-Domain Constraint Satisfaction</title>
      <link>https://arxiv.org/abs/2502.06621</link>
      <description>arXiv:2502.06621v4 Announce Type: replace-cross 
Abstract: The Feder-Vardi dichotomy conjecture for Constraint Satisfaction Problems (CSPs) with finite templates, confirmed independently by Bulatov and Zhuk, has an extension to certain well-behaved infinite templates due to Bodirsky and Pinsker which remains wide open. We formulate three fundamental questions on the scope of the Bodirsky-Pinsker conjecture and provide positive answers to them.
  Our first two main results provide two simplifications of this scope, one structural and the other algebraic in nature. The former simplification implies that the conjecture is equivalent to its restriction to templates without algebraicity, a crucial assumption in the most powerful classification methods. The latter yields that the higher-arity invariants of any template within its scope can be assumed to be essentially injective, and that any algebraic condition characterizing any complexity class within the conjecture closed under Datalog reductions must be satisfiable by injections, thus explaining why certain algebraic conditions are more widely applicable than others.
  Our third main result uses the first one to show that any non-trivially tractable template within the scope serves, up to a Datalog-computable modification of it, as the witness of the tractability of a non-finitely tractable finite-domain Promise Constraint Satisfaction Problem (PCSP) by the so-called sandwich method. This provides a particularly strong connection between the Bodirsky-Pinsker conjecture and finite-domain PCSPs.
  In light of the third main result, we initiate a new case study of phylogeny CSPs, which we investigate from the perspective of descriptive complexity. Within this study, we show that there exists a tractable phylogeny CSP that pp-constructs a finite-domain PCSP inexpressible in fixed-point logic with counting but does not pp-construct any finite-domain CSP with this property.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.06621v4</guid>
      <category>math.LO</category>
      <category>cs.LO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Michael Pinsker, Jakub Rydval, Moritz Schöbi, Christoph Spiess, Paul Winkler</dc:creator>
    </item>
    <item>
      <title>Efficiency of Parallel and Restart Exploration Strategies in Model Free Stochastic Simulations</title>
      <link>https://arxiv.org/abs/2503.03565</link>
      <description>arXiv:2503.03565v3 Announce Type: replace-cross 
Abstract: We analyze the efficiency of parallelization and restart mechanisms for stochastic simulations in model-free settings, where the underlying system dynamics are unknown. Such settings are common in Reinforcement Learning (RL) and rare event estimation, where standard variance-reduction techniques like importance sampling are inapplicable. Focusing on the challenge of reaching rare states under a finite computational budget, we model exploration via random walks and Lévy processes. Using a rigorous probabilistic analysis, we reveal a phase transition in the success probability as a function of the number of parallel simulations: there exists an optimal number $N^*$ that balances exploration diversity against the time allocated to each simulation. Beyond this threshold, performance degrades exponentially. Furthermore, we demonstrate that a restart strategy, which reallocates resources from stagnant trajectories to promising regions, can yield an exponential improvement in success probability. In the context of RL, these strategies can improve policy gradient methods by enabling more efficient state-space exploration, leading to more accurate policy gradient estimates.</description>
      <guid isPermaLink="false">oai:arXiv.org:2503.03565v3</guid>
      <category>math.PR</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ernesto Garcia, Paola Bermolen, Matthieu Jonckheere, Seva Shneer</dc:creator>
    </item>
    <item>
      <title>Construction and Decoding of Quantum Margulis Codes</title>
      <link>https://arxiv.org/abs/2503.03936</link>
      <description>arXiv:2503.03936v3 Announce Type: replace-cross 
Abstract: Quantum low-density parity-check (QLDPC) codes are a promising approach to fault-tolerant quantum computation, offering potential advantages in rate and decoding efficiency. In this work, we introduce quantum Margulis codes, a new class of QLDPC codes derived from Margulis' classical LDPC construction via the two-block group algebra (2BGA) framework. We show that, under the code capacity noise model, quantum Margulis codes can be efficiently decoded using a standard min-sum decoder with linear complexity, unlike bivariate bicycle (BB) codes, which require ordered statistics decoding for effective error correction. This is attributed to their Tanner graph structure, which does not exhibit group symmetry, thereby mitigating the well-known problem of error degeneracy in QLDPC decoding. To further enhance performance, we propose an algorithm for constructing 2BGA codes with controlled girth, ensuring a minimum girth of 6 or 8, and use it to generate several quantum Margulis codes of lengths 240 and 642. We validate our approach through numerical simulations, demonstrating that quantum Margulis codes behave significantly better than BB codes in the error floor region under min-sum decoding.</description>
      <guid isPermaLink="false">oai:arXiv.org:2503.03936v3</guid>
      <category>quant-ph</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Michele Pacenti, Dimitris Chytas, Bane Vasic</dc:creator>
    </item>
    <item>
      <title>Multi-site modelling and reconstruction of past extreme skew surges along the French Atlantic coast</title>
      <link>https://arxiv.org/abs/2505.00835</link>
      <description>arXiv:2505.00835v2 Announce Type: replace-cross 
Abstract: Appropriate modelling of extreme skew surges is crucial, particularly for coastal risk management. Our study focuses on modelling extreme skew surges along the French Atlantic coast, with a particular emphasis on investigating the extremal dependence structure between stations. We employ the peak-over-threshold framework, where a multivariate extreme event is defined whenever at least one location records a large value, though not necessarily all stations simultaneously. A novel method for determining an appropriate level (threshold) above which observations can be classified as extreme is proposed. Two complementary approaches are explored. First, the multivariate generalized Pareto distribution is employed to model extremes, leveraging its properties to derive a generative model that predicts extreme skew surges at one station based on observed extremes at nearby stations. Second, a novel extreme regression framework is assessed for point predictions. This specific regression framework enables accurate point predictions using only the 'angle' of input variables, i.e., input variables divided by their norms. The ultimate objective is to reconstruct historical skew surge time series at stations with limited data. This is achieved by integrating extreme skew surge data from stations with longer records, such as Brest and Saint-Nazaire, which provide over 150 years of observations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.00835v2</guid>
      <category>stat.AP</category>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nathan Huet, Philippe Naveau, Anne Sabourin</dc:creator>
    </item>
    <item>
      <title>Scalable Policy Maximization Under Network Interference</title>
      <link>https://arxiv.org/abs/2505.18118</link>
      <description>arXiv:2505.18118v2 Announce Type: replace-cross 
Abstract: Many interventions, such as vaccines in clinical trials or coupons in online marketplaces, must be assigned sequentially without full knowledge of their effects. Multi-armed bandit algorithms have proven successful in such settings. However, standard independence assumptions fail when the treatment status of one individual impacts the outcomes of others, a phenomenon known as interference. We study optimal-policy learning under interference on a dynamic network. Existing approaches to this problem require repeated observations of the same fixed network and struggle to scale in sample size beyond as few as fifteen connected units; both limitations restrict their applications. We show that under common assumptions on the structure of interference, rewards become linear. This enables us to develop a scalable Thompson sampling algorithm that maximizes policy impact when a new $n$-node network is observed each round. We prove a Bayesian regret bound that is sublinear in $n$ and the number of rounds. Simulation experiments show that our algorithm learns quickly and outperforms existing methods. The results close a key scalability gap between causal inference methods for interference and practical bandit algorithms, enabling policy optimization in large-scale networked systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.18118v2</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Aidan Gleich, Eric Laber, Alexander Volfovsky</dc:creator>
    </item>
    <item>
      <title>Online Voting using Point to MultiPoint Quantum Key Distribution</title>
      <link>https://arxiv.org/abs/2505.21756</link>
      <description>arXiv:2505.21756v3 Announce Type: replace-cross 
Abstract: The use of quantum mechanisms in the service of voting security suffers from the problem that, to generate keys for voters and verifiers, a point-to-point connection has to be physically established for each pair, which is impractical. We therefore propose using Point-to-Multipoint quantum key distribution (QKD) via time division multiplexing (TDM) and wavelength division multiplexing (WDM) in passive optical networks (PON) to improve both the deployment and security of online voting systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.21756v3</guid>
      <category>quant-ph</category>
      <category>cs.CR</category>
      <category>cs.ET</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bernardo A. Huberman, Jing Wang</dc:creator>
    </item>
    <item>
      <title>A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning</title>
      <link>https://arxiv.org/abs/2506.14432</link>
      <description>arXiv:2506.14432v3 Announce Type: replace-cross 
Abstract: We present FOMO260K, a large-scale, heterogeneous dataset of 260,927 brain Magnetic Resonance Imaging (MRI) scans from 77,589 MRI sessions and 55,378 subjects, aggregated from 910 publicly available sources. The dataset includes both clinical- and research-grade images, multiple MRI sequences, and a wide range of anatomical and pathological variability, including scans with large brain anomalies. Minimal preprocessing was applied to preserve the original image characteristics while reducing entry barriers for new users. Companion code for self-supervised pretraining and finetuning is provided, along with pretrained models. FOMO260K is intended to support the development and benchmarking of self-supervised learning methods in medical imaging at scale.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.14432v3</guid>
      <category>eess.IV</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Stefano Cerri, Asbjørn Munk, Sebastian Nørgaard Llambias, Jakob Ambsdorf, Julia Machnio, Vardan Nersesjan, Christian Hedeager Krag, Peirong Liu, Pablo Rocamora García, Mostafa Mehdipour Ghazi, Mikael Boesen, Michael Eriksen Benros, Juan Eugenio Iglesias, Mads Nielsen</dc:creator>
    </item>
    <item>
      <title>Microcanonical simulated annealing: Massively parallel Monte Carlo simulations with sporadic random-number generation</title>
      <link>https://arxiv.org/abs/2506.16240</link>
      <description>arXiv:2506.16240v4 Announce Type: replace-cross 
Abstract: Numerical simulations of models and theories that describe complex systems such as spin glasses are becoming increasingly important. Beyond fundamental research, these computational methods also find practical applications in fields like combinatorial optimization. However, Monte Carlo simulations, an important subcategory of these methods, are plagued by a major drawback: they are extremely greedy for (pseudo) random numbers. The total fraction of computer time dedicated to random-number generation increases as the hardware grows more sophisticated, and can become prohibitive for special-purpose computing platforms. We propose here a general-purpose microcanonical simulated annealing (MicSA) formalism that dramatically reduces such a burden. The algorithm is fully adapted to massively parallel computation, as we show in the particularly demanding benchmark of the three-dimensional Ising spin glass. We carry out very stringent numerical tests of the new algorithm by comparing our results, obtained on GPUs, with high-precision standard (i.e., random-number-greedy) simulations performed on the Janus II custom-built supercomputer. In those cases where thermal equilibrium is reachable (i.e., in the paramagnetic phase), both simulations reach compatible values. More significantly, barring short-time corrections, a simple time rescaling suffices to map the MicSA off-equilibrium dynamics onto the results obtained with standard simulations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.16240v4</guid>
      <category>cond-mat.stat-mech</category>
      <category>cond-mat.dis-nn</category>
      <category>cs.AR</category>
      <category>physics.comp-ph</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1016/j.cpc.2026.110182</arxiv:DOI>
      <arxiv:journal_reference>Comp. Phys. Comm. 325, 110182 (2026)</arxiv:journal_reference>
      <dc:creator>M. Bernaschi, C. Chilin, L. A. Fernandez, I. González-Adalid Pemartín, E. Marinari, V. Martin-Mayor, G. Parisi, F. Ricci-Tersenghi, J. J. Ruiz-Lorenzo, D. Yllanes</dc:creator>
    </item>
    <item>
      <title>Cardiovascular disease classification using radiomics and geometric features from cardiac CT</title>
      <link>https://arxiv.org/abs/2506.22226</link>
      <description>arXiv:2506.22226v2 Announce Type: replace-cross 
Abstract: Automatic detection and classification of cardiovascular disease (CVD) from Computed Tomography (CT) images play an important part in facilitating better-informed clinical decisions. However, most recent deep learning based methods either work directly on raw CT data or use it in combination with anatomical cardiac structure segmentation by training an end-to-end classifier. As a result, these approaches are much more difficult to interpret from a clinical perspective. To address this challenge, in this work, we break down the CVD classification pipeline into three components: (i) image segmentation, (ii) image registration, and (iii) downstream CVD classification. Specifically, we use the Atlas-ISTN framework and recent segmentation foundation models to generate anatomical structure segmentations and a normative healthy atlas. These are further used to extract clinically interpretable radiomic features as well as deformation-field-based geometric features (through atlas registration) for CVD classification. Our experiments on the publicly available ASOCA dataset show that using these features leads to better CVD classification accuracy (87.50\%) than a classification model trained directly on raw CT images (67.50\%). Our code is publicly available: https://github.com/biomedia-mira/grc-net</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.22226v2</guid>
      <category>eess.IV</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ajay Mittal, Raghav Mehta, Omar Todd, Philipp Seeböck, Georg Langs, Ben Glocker</dc:creator>
    </item>
    <item>
      <title>Colorful Minors</title>
      <link>https://arxiv.org/abs/2507.10467</link>
      <description>arXiv:2507.10467v3 Announce Type: replace-cross 
Abstract: We introduce the notion of colorful minors, which generalizes the classical concept of rooted minors in graphs. A $q$-colorful graph is defined as a pair $(G, \chi)$, where $G$ is a graph and $\chi$ assigns to each vertex a (possibly empty) subset of at most $q$ colors. The colorful minor relation enhances the classical minor relation by merging color sets at contracted edges and allowing the removal of colors from vertices. This framework naturally models algorithmic problems involving graphs with (possibly overlapping) annotated vertex sets. We develop a structural theory for colorful minors by establishing three core theorems characterizing $\mathcal{H}$-colorful minor-free graphs, where $\mathcal{H}$ consists either of a clique or a grid with all vertices assigned all colors, or of grids with colors segregated and ordered on the outer face. Our results reveal that when exclusion is imposed not only on graphs but also on the way colors are distributed in them, a more refined structural landscape appears.
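  In the colorful minor relation, contracting an edge also merges the color sets of its endpoints, and colors may additionally be removed from any vertex. The Python sketch below models only these two operations, assuming an adjacency-set representation; the ColorfulGraph class and its methods are hypothetical illustrations rather than the authors' code, and vertex and edge deletion (also part of the minor relation) are omitted.

```python
from dataclasses import dataclass


@dataclass
class ColorfulGraph:
    """Hypothetical container for a q-colorful graph (G, chi).

    adj maps each vertex to the set of its neighbors; chi maps each vertex
    to a (possibly empty) set of colors.
    """
    adj: dict
    chi: dict

    def contract(self, u, v):
        """Contract edge uv into u, taking the union of the two color sets."""
        if v not in self.adj[u]:
            raise ValueError("u and v must be adjacent")
        for w in self.adj.pop(v):
            self.adj[w].discard(v)
            if w != u:
                # Reattach v's other neighbors to u (no self-loops).
                self.adj[w].add(u)
                self.adj[u].add(w)
        self.chi[u] = self.chi[u] | self.chi.pop(v)

    def remove_color(self, v, color):
        """Delete a single color from vertex v."""
        self.chi[v].discard(color)


# Path a - b - c in which a carries color 1 and c carries color 2.
g = ColorfulGraph(
    adj={"a": {"b"}, "b": {"a", "c"}, "c": {"b"}},
    chi={"a": {1}, "b": set(), "c": {2}},
)
g.contract("b", "c")  # b inherits color 2 and stays adjacent to a
g.contract("a", "b")  # one vertex now carries colors {1, 2}
g.remove_color("a", 1)
print(g.adj, g.chi)
```

Representing color sets as Python sets makes the merge at a contracted edge a plain set union, mirroring the definition above.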
  Leveraging our structural insights, we provide a complete classification, parameterized by the number $q$ of colors, of all colorful graphs that exhibit the Erdős-Pósa property with respect to colorful minors. On the algorithmic side, we deduce that colorful minor testing is fixed-parameter tractable. Together with the fact that the colorful minor relation forms a well-quasi-order, this implies that every colorful minor-monotone parameter on colorful graphs admits a fixed-parameter algorithm. Furthermore, we derive two algorithmic meta-theorems (AMTs) whose structural conditions are linked to extensions of treewidth and Hadwiger number on colorful graphs. Our results suggest how known AMTs can be extended to incorporate not only the structure of the input graph but also the way the colored vertices are distributed in it.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.10467v3</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <category>cs.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Evangelos Protopapas, Dimitrios M. Thilikos, Sebastian Wiederrecht</dc:creator>
    </item>
    <item>
      <title>Provable Non-Convex Euclidean Distance Matrix Completion: Geometry, Reconstruction, and Robustness</title>
      <link>https://arxiv.org/abs/2508.00091</link>
      <description>arXiv:2508.00091v3 Announce Type: replace-cross 
Abstract: The problem of recovering the configuration of points from their partial pairwise distances, referred to as the Euclidean Distance Matrix Completion (EDMC) problem, arises in a broad range of applications, including sensor network localization, molecular conformation, and manifold learning. In this paper, we propose a Riemannian optimization framework for solving the EDMC problem by formulating it as a low-rank matrix completion task over the space of positive semi-definite Gram matrices. The available distance measurements are encoded as expansion coefficients in a non-orthogonal basis, and optimization over the Gram matrix implicitly enforces geometric consistency through nonnegativity and the triangle inequality, a structure inherited from classical multidimensional scaling. Under a Bernoulli sampling model for observed distances, we prove that Riemannian gradient descent on the manifold of rank-$r$ matrices locally converges linearly with high probability when the sampling probability satisfies $p\geq O(\nu^2 r^2\log(n)/n)$, where $\nu$ is an EDMC-specific incoherence parameter. Furthermore, we provide an initialization candidate using a one-step hard thresholding procedure that yields convergence, provided the sampling probability satisfies $p \geq O(\nu r^{3/2}\log^{3/4}(n)/n^{1/4})$. A key technical contribution of this work is the analysis of a symmetric linear operator arising from a dual basis expansion in the non-orthogonal basis, which requires analysis of a second order degenerate U-statistic to establish an optimal restricted isometry property in the presence of coupled terms. Empirical evaluations on synthetic data demonstrate that our algorithm achieves competitive performance relative to state-of-the-art methods. Moreover, we provide a geometric interpretation of matrix incoherence tailored to the EDMC setting and provide robustness guarantees for our method.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.00091v3</guid>
      <category>math.OC</category>
      <category>cs.CG</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Chandler Smith, HanQin Cai, Abiy Tasissa</dc:creator>
    </item>
    <item>
      <title>Echoes of the Past: A Unified Perspective on Fading memory and Echo States</title>
      <link>https://arxiv.org/abs/2508.19145</link>
      <description>arXiv:2508.19145v3 Announce Type: replace-cross 
Abstract: Recurrent neural networks (RNNs) have become increasingly popular in information processing tasks involving time series and temporal data. A fundamental property of RNNs is their ability to create reliable input/output responses, often linked to how the network handles its memory of the information it processed. Various notions have been proposed to conceptualize the behavior of memory in RNNs, including steady states, echo states, state forgetting, input forgetting, and fading memory. Although these notions are often used interchangeably, their precise relationships remain unclear. This work aims to unify these notions in a common language, derive new implications and equivalences between them, and provide alternative proofs to some existing results. By clarifying the relationships between these concepts, this research contributes to a deeper understanding of RNNs and their temporal information processing capabilities.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.19145v3</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>math.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1162/NECO.a.1510</arxiv:DOI>
      <arxiv:journal_reference>Neural Computation, vol 38(5), 2026</arxiv:journal_reference>
      <dc:creator>Juan-Pablo Ortega, Florian Rossmannek</dc:creator>
    </item>
    <item>
      <title>Adapting Medical Vision Foundation Models for Volumetric Medical Image Segmentation via Active Learning and Selective Semi-supervised Fine-tuning</title>
      <link>https://arxiv.org/abs/2509.10784</link>
      <description>arXiv:2509.10784v3 Announce Type: replace-cross 
Abstract: Medical vision foundation models (Med-VFMs) remain limited in downstream tasks, particularly volumetric medical image segmentation. While fine-tuning on labeled target-domain data improves performance, existing approaches typically rely on randomly selected samples, which may fail to identify the most informative data and thus hinder adaptation. To address these limitations, we propose an Active Selective Semi-supervised Fine-Tuning (ASSFT) framework for efficiently adapting Med-VFMs so that they generalize across volumetric medical image segmentation tasks. ASSFT integrates a novel active learning strategy with selective semi-supervised learning to maximize adaptation performance under a limited annotation budget, without requiring access to source data. First, we introduce an Active Test-Time Sample Query strategy that identifies informative samples from the target domain using two complementary query metrics: Diversified Knowledge Divergence (DKD) and Anatomical Segmentation Difficulty (ASD). DKD quantifies both the knowledge gap between pre-training and target domains and the semantic diversity within the target dataset, enabling the selection of samples that contain previously unlearned knowledge while maintaining intra-domain diversity. ASD estimates the segmentation difficulty of target anatomical structures by measuring predictive uncertainty within foreground regions of interest, allowing the model to prioritize samples with complex anatomical patterns rather than those dominated by background uncertainty. Second, we propose a Selective Semi-supervised Fine-tuning strategy to further improve adaptation performance by leveraging unlabeled target samples. Instead of utilizing all pseudo-labeled data, the proposed method selectively incorporates reliable unlabeled samples based on predictive confidence and semantic distance to labeled samples, enabling stable semi-supervised training while avoiding noisy pseudo-labels.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.10784v3</guid>
      <category>eess.IV</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jin Yang, Daniel S. Marcus, Aristeidis Sotiras</dc:creator>
    </item>
    <item>
      <title>Addressing Methodological Sensitivity in MCDM with a Systematic Pipeline Approach to Data Transformation Sensitivity Analysis</title>
      <link>https://arxiv.org/abs/2509.24996</link>
      <description>arXiv:2509.24996v2 Announce Type: replace-cross 
Abstract: Multicriteria decision-making (MCDM) methods depend critically on the choice of normalization technique: different choices can alter 20-40% of the final rankings. Current practice is characterized by ad hoc selection of methods without systematic robustness evaluation. We present a framework that addresses this methodological sensitivity through automated exploration of the space of scaling transformations. The implementation leverages the existing Scikit-Criteria infrastructure to automatically generate all possible methodological combinations and provide a robust comparative analysis. We apply this approach to an evaluation dataset of cryptocurrencies with 6 methodological scenarios, showing the range of correlations between methods and explicitly quantifying the limits of methodological sensitivity.</description>
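As an illustration of the sensitivity described above, the sketch below ranks a small hypothetical decision matrix with a weighted sum under two common normalizations (min-max and vector/Euclidean) and reports the fraction of ranking positions that change. It is plain Python and does not use Scikit-Criteria; the matrix, weights, and helper names are invented for illustration.

```python
import math

def minmax(col):
    """Min-max normalization of a benefit criterion onto [0, 1]."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]

def vector(col):
    """Vector normalization: divide each value by the column's Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in col))
    return [v / norm for v in col]

def weighted_sum_ranking(matrix, weights, normalize):
    """Indices of the alternatives (rows), ordered from best to worst score."""
    columns = [normalize(list(col)) for col in zip(*matrix)]
    scores = [sum(w * columns[j][i] for j, w in enumerate(weights))
              for i in range(len(matrix))]
    return sorted(range(len(matrix)), key=lambda i: -scores[i])

def fraction_changed(rank_a, rank_b):
    """Share of ranking positions occupied by a different alternative."""
    return sum(a != b for a, b in zip(rank_a, rank_b)) / len(rank_a)

# Hypothetical data: 3 alternatives (A, B, C) x 2 benefit criteria.
matrix = [[100, 10], [110, 0], [105, 4]]
weights = [0.6, 0.4]
r_minmax = weighted_sum_ranking(matrix, weights, minmax)  # [1, 2, 0]: B, C, A
r_vector = weighted_sum_ranking(matrix, weights, vector)  # [0, 2, 1]: A, C, B
print(fraction_changed(r_minmax, r_vector))  # 2 of 3 positions change
```

Min-max rescales each criterion by its range while vector normalization rescales it by its norm, so the two choices effectively apply different relative weights to the criteria, which is enough to reorder the alternatives.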
      <guid isPermaLink="false">oai:arXiv.org:2509.24996v2</guid>
      <category>math.OC</category>
      <category>cs.SE</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Juan B. Cabral, Alvaro Roy Schachner</dc:creator>
    </item>
    <item>
      <title>Deep Learning in Astrophysics</title>
      <link>https://arxiv.org/abs/2510.10713</link>
      <description>arXiv:2510.10713v2 Announce Type: replace-cross 
Abstract: Deep learning has generated diverse perspectives in astronomy, with ongoing discussions between proponents and skeptics motivating this review. We examine how neural networks complement classical statistics, extending our data analytical toolkit for modern surveys. Astronomy offers unique opportunities through encoding physical symmetries, conservation laws, and differential equations directly into architectures, creating models that generalize beyond training data. Yet challenges persist as unlabeled observations number in the billions while confirmed examples with known properties remain scarce and expensive. This review demonstrates how deep learning incorporates domain knowledge through architectural design, with built-in assumptions guiding models toward physically meaningful solutions. We evaluate where these methods offer genuine advances versus claims requiring careful scrutiny.
  - Neural architectures overcome bias-variance trade-offs among scalability, expressivity, and data efficiency by encoding physical symmetries and conservation laws into network structure, enabling learning from limited labeled data.
  - Simulation-based inference and anomaly detection extract information from complex, non-Gaussian distributions where analytical likelihoods fail, enabling field-level cosmological analysis and systematic discovery of rare phenomena.
  - Multiscale neural modeling bridges resolution gaps in astronomical simulations, learning effective subgrid physics from expensive high-fidelity runs to enhance large-volume calculations where direct computation remains prohibitive.
  - Emerging paradigms (reinforcement learning for telescope operations, foundation models learning from minimal examples, and large language model agents for research automation) show promise, though they are still developing in astronomical applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.10713v2</guid>
      <category>astro-ph.IM</category>
      <category>astro-ph.CO</category>
      <category>astro-ph.EP</category>
      <category>astro-ph.GA</category>
      <category>astro-ph.HE</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1146/annurev-astro-051024-021708</arxiv:DOI>
      <arxiv:journal_reference>Annual Review of Astronomy and Astrophysics, 2026, Vol. 64</arxiv:journal_reference>
      <dc:creator>Yuan-Sen Ting</dc:creator>
    </item>
    <item>
      <title>Learning Time-Varying Graphs from Incomplete Graph Signals</title>
      <link>https://arxiv.org/abs/2510.17903</link>
      <description>arXiv:2510.17903v3 Announce Type: replace-cross 
Abstract: This paper tackles the challenging problem of jointly inferring time-varying network topologies and imputing missing data from partially observed graph signals. We propose a unified non-convex optimization framework to simultaneously recover a sequence of graph Laplacian matrices while reconstructing the unobserved signal entries. Unlike conventional decoupled methods, our integrated approach facilitates a bidirectional flow of information between the graph and signal domains, yielding superior robustness, particularly in high missing-data regimes. To capture realistic network dynamics, we introduce a fused-lasso type regularizer on the sequence of Laplacians. This penalty promotes temporal smoothness by penalizing large successive changes, thereby preventing spurious variations induced by noise while still permitting gradual topological evolution. For solving the joint optimization problem, we develop an efficient Proximal Alternating Direction Method of Multipliers (PADMM) algorithm, which leverages the problem's structure to yield closed-form solutions for both the graph and signal subproblems. This design ensures scalability to large-scale networks and long time horizons. On the theoretical front, despite the inherent non-convexity, we establish a convergence guarantee, proving that the proposed PADMM scheme converges to a stationary point. Furthermore, we derive non-asymptotic statistical guarantees, providing high-probability error bounds for the graph estimator as a function of sample size, signal smoothness, and the intrinsic temporal variability of the graph. Extensive numerical experiments validate the approach, demonstrating that it significantly outperforms state-of-the-art baselines in both convergence speed and the joint accuracy of graph learning and signal recovery.</description>
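The fused-lasso type regularizer penalizes large successive changes between consecutive Laplacians. A minimal sketch, assuming the entrywise $\ell_1$ form $\lambda \sum_t \|L_t - L_{t-1}\|_1$ over all matrix entries and combinatorial Laplacians $L = D - W$ built from hypothetical symmetric weight matrices; the paper's exact norm, weighting, and constraint set may differ.

```python
def laplacian(w):
    """Combinatorial Laplacian L = D - W of a symmetric weight matrix with zero diagonal."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0) - w[i][j] for j in range(n)]
            for i in range(n)]

def fused_lasso_penalty(laplacians, lam=1.0):
    """lam times the sum over t of the entrywise L1 norm of L_t - L_(t-1)."""
    total = 0
    for prev, curr in zip(laplacians, laplacians[1:]):
        total += sum(abs(c - p)
                     for row_p, row_c in zip(prev, curr)
                     for p, c in zip(row_p, row_c))
    return lam * total

# Hypothetical 3-node snapshots: edges {0-1, 1-2}, unchanged, then {0-1, 0-2}.
W1 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
W3 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
Ls = [laplacian(W1), laplacian(W1), laplacian(W3)]
print(fused_lasso_penalty(Ls))  # 6.0: only the last transition is penalized
```

A constant topology costs nothing, so the penalty suppresses noise-driven fluctuations while still allowing a gradual or occasional change in the graph.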
      <guid isPermaLink="false">oai:arXiv.org:2510.17903v3</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chuansen Peng, Xiaojing Shen</dc:creator>
    </item>
    <item>
      <title>Iso-Riemannian Optimization on Learned Data Manifolds</title>
      <link>https://arxiv.org/abs/2510.21033</link>
      <description>arXiv:2510.21033v2 Announce Type: replace-cross 
Abstract: High-dimensional data with intrinsic low-dimensional structure is ubiquitous in machine learning and data science. While various approaches allow one to learn a data manifold with a Riemannian structure from finite samples, performing downstream tasks such as optimization directly on these learned manifolds remains challenging. In particular, Euclidean convex functions cannot be assumed to be geodesically convex, and the associated Riemannian gradient fields are generally not monotone in the classical Riemannian sense. As a result, existing Riemannian optimization theory neither identifies a canonical vector field to use in first-order schemes nor guarantees their convergence in this setting. To address this, we introduce notions of convexity, monotonicity, and Lipschitz continuity induced by a connection different from the Levi-Civita connection, namely the recently proposed iso-connection. Within this iso-Riemannian framework, we propose an iso-Riemannian descent algorithm and provide a detailed convergence analysis. We then show, for several downstream tasks, including iso-Riemannian barycentre computation and the optimization of Euclidean convex functions over learned data manifolds, that iso-convexity, iso-monotonicity, and iso-Lipschitz continuity form the right set of assumptions to reconcile learned geometry with Euclidean convexity. Experiments on synthetic and real datasets, including MNIST, endowed with a learned pullback structure, demonstrate that our approach yields interpretable barycentres, improved clustering, and provably efficient solutions to inverse problems, even in high-dimensional settings. Taken together, these results show that iso-Riemannian optimization provides a natural geometric framework for designing and analyzing algorithms on learned data manifolds.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.21033v2</guid>
      <category>math.OC</category>
      <category>cs.LG</category>
      <category>math.DG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Willem Diepeveen, Melanie Weber</dc:creator>
    </item>
    <item>
      <title>Human-computer interactions predict mental health</title>
      <link>https://arxiv.org/abs/2511.20179</link>
      <description>arXiv:2511.20179v5 Announce Type: replace-cross 
Abstract: Scalable assessments of mental illness remain a critical roadblock toward accessible and equitable care. Here, we show that everyday human-computer interactions encode mental health with biomarker accuracy. We introduce MAILA, a MAchine-learning framework for Inferring Latent mental states from digital Activity. We trained MAILA on 18,200 cursor and touchscreen recordings labeled with 1.3 million mental-health self-reports collected from 9,500 participants. MAILA tracks dynamic mental states along 13 clinically relevant dimensions, resolves circadian fluctuations and experimental manipulations of arousal and valence, achieves near-ceiling accuracy at the group level, captures information that is only partially reflected in verbal self-report, and improves the ability of large language models to infer user mental health. By extracting signatures of psychological function that have so far remained untapped, MAILA establishes human-computer interactions as a new modality for scalable digital phenotyping and a foundation for context-aware artificial intelligence.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.20179v5</guid>
      <category>q-bio.NC</category>
      <category>cs.AI</category>
      <category>cs.HC</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Veith Weilnhammer, Jefferson Ortega, David Whitney</dc:creator>
    </item>
    <item>
      <title>Resolvable Triple Arrays</title>
      <link>https://arxiv.org/abs/2512.08681</link>
      <description>arXiv:2512.08681v2 Announce Type: replace-cross 
Abstract: We present a new construction of triple arrays by combining a symmetric 2-design with a resolution of another 2-design. This is the first general method capable of producing non-extremal triple arrays. We call the triple arrays which can be obtained in this way resolvable. We employ the construction to produce the first examples of $(21 \times 15, 63)$-triple arrays, and enumerate all resolvable $(7 \times 15, 35)$-triple arrays, of which there was previously only a single known example. An infinite subfamily of Paley triple arrays turns out to be resolvable.
  We also introduce a new intermediate object, unordered triple arrays, that are to triple arrays what symmetric 2-designs are to Youden rectangles, and propose a strengthening of Agrawal's long-standing conjecture on the existence of extremal triple arrays. For small parameters, we completely enumerate all unordered triple arrays, and use this data to corroborate the new conjecture. We construct several infinite families of resolvable unordered triple arrays, and, in particular, show that all $((q + 1) \times q^2, q(q + 1))$-triple arrays are resolvable and are in correspondence with finite affine planes of order $q$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.08681v2</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alexey Gordeev, Lars-Daniel Öhman</dc:creator>
    </item>
    <item>
      <title>Bridging Modalities: Joint Synthesis and Registration Framework for Aligning Diffusion MRI with T1-Weighted Images</title>
      <link>https://arxiv.org/abs/2601.11689</link>
      <description>arXiv:2601.11689v2 Announce Type: replace-cross 
Abstract: Multimodal image registration between diffusion MRI (dMRI) and T1-weighted (T1w) MRI images is a critical step for aligning diffusion-weighted imaging (DWI) data with structural anatomical space. Traditional registration methods often struggle to ensure accuracy due to the large intensity differences between diffusion data and high-resolution anatomical structures. This paper proposes an unsupervised registration framework based on a generative registration network, which transforms the original multimodal registration problem between b0 and T1w images into a unimodal registration task between a generated image and the real T1w image. This effectively reduces the complexity of cross-modal registration. The framework first employs an image synthesis model to generate images with T1w-like contrast, and then learns a deformation field from the generated image to the fixed T1w image. The registration network jointly optimizes local structural similarity and cross-modal statistical dependency to improve deformation estimation accuracy. Experiments conducted on two independent datasets demonstrate that the proposed method outperforms several state-of-the-art approaches in multimodal registration tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.11689v2</guid>
      <category>eess.IV</category>
      <category>cs.CV</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaofan Wang, Junyi Wang, Yuqian Chen, Lauren J. O'Donnell, Fan Zhang</dc:creator>
    </item>
    <item>
      <title>Multivariate Time Series Data Imputation via Distributionally Robust Regularization</title>
      <link>https://arxiv.org/abs/2602.00844</link>
      <description>arXiv:2602.00844v2 Announce Type: replace-cross 
Abstract: Multivariate time series imputation is often compromised by a mismatch between the observed and true data distributions, a bias induced by the combined effects of time-series non-stationarity and systematic missingness. Standard methods that encourage point-wise reconstruction or direct distributional alignment may overfit these biased observations. We propose the Distributionally Robust Regularized Imputer Objective (DRIO), which jointly minimizes reconstruction error and the worst-case divergence between the imputer distribution and data distributions within a Wasserstein ambiguity set. We derive a tractable upper-bound surrogate that reduces infinite-dimensional optimization over measures to adversarial search over sample trajectories, and develop an alternating learning algorithm compatible with modern deep learning backbones. Comprehensive experiments on diverse real-world datasets show that DRIO consistently provides robust imputation and suggests improved downstream forecasting under various missingness scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.00844v2</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>stat.AP</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Che-Yi Liao, Zheng Dong, Gian-Gabriel Garcia, Kamran Paynabar</dc:creator>
    </item>
    <item>
      <title>Seeing the Goal, Missing the Truth: Human Accountability for AI Bias</title>
      <link>https://arxiv.org/abs/2602.09504</link>
      <description>arXiv:2602.09504v2 Announce Type: replace-cross 
Abstract: This research explores how human-defined goals influence the behavior of Large Language Models (LLMs) through purpose-conditioned cognition. Using financial prediction tasks, we show that revealing the downstream use (e.g., predicting stock returns or earnings) of LLM outputs leads the LLM to generate biased sentiment and competition measures, even though these measures are intended to be downstream task-independent. Goal-aware prompting shifts these intermediate measures toward the disclosed downstream objective, producing in-sample overfitting. Specifically, purpose leakage improves performance on data prior to the LLM's knowledge cutoff, but provides no advantage after the cutoff. This bias is strong enough that regularization of prompt instructions cannot fully address this form of overfitting. We further show that the bias can arise from users' unintentional conversational context that hints at the purpose. Overall, we document that AI bias due to "seeing the goal" is not an algorithmic flaw, but stems from human accountability in research design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.09504v2</guid>
      <category>q-fin.GN</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Sean Cao, Wei Jiang, Hui Xu</dc:creator>
    </item>
    <item>
      <title>When LLMs get significantly worse: A statistical approach to detect model degradations</title>
      <link>https://arxiv.org/abs/2602.10144</link>
      <description>arXiv:2602.10144v2 Announce Type: replace-cross 
Abstract: Minimizing the inference cost and latency of foundation models has become a crucial area of research. Optimization approaches include theoretically lossless methods as well as methods without accuracy guarantees, such as quantization. In all of these cases it is crucial to ensure that model quality has not degraded. However, even at temperature zero, model generations are not necessarily robust even to theoretically lossless model optimizations, owing to numerical errors. We therefore require statistical tools to decide whether a finite-sample accuracy deviation is evidence of model degradation or can be attributed to (harmless) noise in the evaluation. We propose a statistically sound hypothesis testing framework based on McNemar's test that efficiently detects model degradations while guaranteeing a controlled rate of false positives. The crucial insight is that the models' scores must be compared sample by sample, rather than aggregated at the task level. Furthermore, we propose three approaches to aggregate accuracy estimates across multiple benchmarks into a single decision. We provide an implementation on top of the widely adopted open-source LM Evaluation Harness and a case study illustrating that the method correctly flags degraded models while not flagging model optimizations that are provably lossless. We find that with our tests even empirical accuracy degradations of 0.3% can be confidently attributed to actual degradations rather than noise.</description>
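The per-sample comparison can be illustrated with the exact (binomial) form of McNemar's test on paired correctness indicators from a baseline and an optimized model. This is a generic one-sided sketch, not the authors' implementation or its LM Evaluation Harness integration; the function name and the toy data are hypothetical.

```python
from math import comb

def mcnemar_exact_one_sided(base_correct, new_correct):
    """One-sided exact McNemar test for a degradation of `new` relative to `base`.

    Only discordant samples matter: b = base right / new wrong,
    c = base wrong / new right. Under H0 (equal accuracy), each discordant
    sample falls either way with probability 1/2, so b ~ Binomial(b + c, 1/2).
    Returns the p-value P(X >= b).
    """
    b = sum(1 for x, y in zip(base_correct, new_correct) if x and not y)
    c = sum(1 for x, y in zip(base_correct, new_correct) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant samples: no evidence of degradation
    return sum(comb(n, k) for k in range(b, n + 1)) / 2 ** n

# Hypothetical paired correctness on 100 samples (1 = correct).
base = [1] * 90 + [0] * 10                      # 90% accuracy
new = [1] * 80 + [0] * 10 + [1] * 2 + [0] * 8   # 82% accuracy
print(mcnemar_exact_one_sided(base, new))  # b = 10, c = 2, p = 79/4096 (about 0.019)
```

Samples that both models get right (or both get wrong) drop out entirely, which is why per-sample scores carry more information than the two aggregate accuracies alone.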
      <guid isPermaLink="false">oai:arXiv.org:2602.10144v2</guid>
      <category>stat.ML</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>ICLR 2026</arxiv:journal_reference>
      <dc:creator>Jonas Kübler, Kailash Budhathoki, Matthäus Kleindessner, Xiong Zhou, Junming Yin, Ashish Khetan, George Karypis</dc:creator>
    </item>
    <item>
      <title>Induced Minors and Coarse Tree Decompositions</title>
      <link>https://arxiv.org/abs/2603.11379</link>
      <description>arXiv:2603.11379v2 Announce Type: replace-cross 
Abstract: Let $G$ be a graph, $S \subseteq V(G)$ a vertex set in $G$, and $r$ a positive integer. The distance $r$-independence number of $S$ is the size of the largest subset $I \subseteq S$ such that no pair $u$, $v$ of vertices in $I$ has a path on at most $r$ edges between them in $G$. It has been conjectured [Chudnovsky et al., arXiv, 2025] that for every positive integer $t$ there exist positive integers $c$, $d$ such that every graph $G$ that excludes both the complete bipartite graph $K_{t,t}$ and the grid $\boxplus_t$ as an induced minor has a tree decomposition in which every bag has (distance $1$) independence number at most $c(\log n)^d$. We prove a weaker version of this conjecture in which every bag of the tree decomposition has distance $16(\log n + 1)$-independence number at most $c(\log n)^d$. Along the way we also prove a version of the conjecture in which every bag of the decomposition has distance $8$-independence number at most $2^{c (\log n)^{1-(1/d)}}$.</description>
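The distance $r$-independence number defined above can be computed by brute force on small graphs: compute breadth-first-search distances, then look for the largest subset of $S$ whose pairwise distances all exceed $r$ (no path on at most $r$ edges is the same as shortest-path distance greater than $r$). The sketch is exponential in $|S|$ and purely illustrative; the adjacency-list encoding and function names are assumptions.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, source):
    """Hop distances from `source` in an unweighted graph given as adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_r_independence_number(adj, S, r):
    """Largest I within S such that no two vertices of I are joined by a path of at most r edges."""
    S = list(S)
    dist = {u: bfs_distances(adj, u) for u in S}

    def far(u, v):
        # Vertices in different components have no path at all, so they count as far.
        return v not in dist[u] or dist[u][v] > r

    for size in range(len(S), 0, -1):  # try the largest candidate sizes first
        for I in combinations(S, size):
            if all(far(u, v) for u, v in combinations(I, 2)):
                return size
    return 0

# Path 0-1-2-3-4-5-6, with S = all vertices.
path = {i: [j for j in (i - 1, i + 1) if j in range(7)] for i in range(7)}
for r in (1, 2, 3):
    print(r, distance_r_independence_number(path, range(7), r))  # 4, then 3, then 2
```

For $r = 1$ this is the ordinary independence number restricted to $S$; larger $r$ forces the chosen vertices further apart and can only shrink the value.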
      <guid isPermaLink="false">oai:arXiv.org:2603.11379v2</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <category>cs.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Maria Chudnovsky, Julien Codsi, Ajaykrishnan E S, Daniel Lokshtanov</dc:creator>
    </item>
    <item>
      <title>Optimal b-Colourings and Fall Colourings in $H$-Free Graphs</title>
      <link>https://arxiv.org/abs/2603.26214</link>
      <description>arXiv:2603.26214v2 Announce Type: replace-cross 
Abstract: In a colouring of a graph, a vertex is b-chromatic if it is adjacent to a vertex of every other colour. We consider four well-studied colouring problems: b-Chromatic Number, Tight b-Chromatic Number, Fall Chromatic Number and Fall Achromatic Number, which fit into a framework based on whether every colour class has (i) at least one b-chromatic vertex, (ii) exactly one b-chromatic vertex, or (iii) all of its vertices being b-chromatic. By combining known and new results, we fully classify the computational complexity of b-Chromatic Number, Fall Chromatic Number and Fall Achromatic Number in $H$-free graphs. For Tight b-Chromatic Number in $H$-free graphs, we develop a general technique to determine new graphs $H$ for which the problem is polynomial-time solvable, and we also determine new graphs $H$ for which the problem is still NP-complete. We show, for the first time, the existence of a graph $H$ such that in $H$-free graphs, b-Chromatic Number is NP-hard, while Tight b-Chromatic Number is polynomial-time solvable.</description>
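The three conditions of the framework can be checked directly for a given proper colouring. The sketch below tests, for each colour class, whether it has at least one, exactly one, or only b-chromatic vertices; it verifies one given colouring and does not compute any of the four chromatic numbers. The graph encoding and function names are assumptions.

```python
def b_chromatic_vertices(adj, colour):
    """Vertices adjacent to at least one vertex of every colour other than their own."""
    colours = set(colour.values())
    return {v for v in adj
            if (colours - {colour[v]}).issubset(colour[u] for u in adj[v])}

def classify(adj, colour):
    """Check conditions (i)-(iii) of the framework for a given proper colouring."""
    if any(colour[u] == colour[v] for v in adj for u in adj[v]):
        raise ValueError("colouring is not proper")
    b = b_chromatic_vertices(adj, colour)
    classes = {}
    for v, c in colour.items():
        classes.setdefault(c, []).append(v)
    counts = [sum(v in b for v in members) for members in classes.values()]
    sizes = [len(members) for members in classes.values()]
    return {
        "b": all(k >= 1 for k in counts),                    # (i) at least one per class
        "tight": all(k == 1 for k in counts),                # (ii) exactly one per class
        "fall": all(k == s for k, s in zip(counts, sizes)),  # (iii) every vertex
    }

c4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
c4_colour = {0: "a", 1: "b", 2: "a", 3: "b"}
k3 = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
k3_colour = {0: 1, 1: 2, 2: 3}
p4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
p4_colour = {0: 1, 1: 2, 2: 3, 3: 1}
print(classify(c4, c4_colour))  # b True, tight False, fall True
print(classify(k3, k3_colour))  # all three True
print(classify(p4, p4_colour))  # all three False: no vertex of colour 1 sees colour 3
```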
      <guid isPermaLink="false">oai:arXiv.org:2603.26214v2</guid>
      <category>math.CO</category>
      <category>cs.CC</category>
      <category>cs.DM</category>
      <category>cs.DS</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jungho Ahn, Tala Eagling-Vose, Felicia Lucke, David Manlove, Fabricio Mendoza, Daniël Paulusma</dc:creator>
    </item>
    <item>
      <title>A semicontinuous relaxation of Saito's criterion and freeness as angular minimization</title>
      <link>https://arxiv.org/abs/2604.02995</link>
      <description>arXiv:2604.02995v2 Announce Type: replace-cross 
Abstract: We introduce a nonnegative functional $\mathfrak{S}$ on the space of line arrangements in $\mathbb{P}^2$ that vanishes precisely on free arrangements, obtained as a semicontinuous relaxation of Saito's criterion. Given an arrangement $\mathcal{A}$ of $n$ lines with candidate exponents $(d_1, d_2)$, we parameterize the spaces of logarithmic derivations of degrees $d_1$ and $d_2$ via the null spaces of the associated derivation matrices and express the Saito determinant as a bilinear map into the space of degree-$n$ polynomials. The functional admits a natural geometric interpretation: it measures the squared sine of the angle between the image of this bilinear map and the direction of the defining polynomial $Q(\mathcal{A})$ in coefficient space, providing a computable measure of how far an arrangement is from admitting a free basis of logarithmic derivations of the expected degrees. We prove that $\mathfrak{S}$ is upper semicontinuous on natural strata, and use this to give a functional reformulation of Terao's conjecture.
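Numerically, the squared sine of the angle between a coefficient vector $q$ and a linear subspace $V$ equals $1 - \|P_V q\|^2 / \|q\|^2$, where $P_V$ is the orthogonal projection onto $V$. The sketch below computes this quantity via Gram-Schmidt, assuming (as a simplification) that the image of the Saito bilinear map is represented by a list of spanning vectors in coefficient space; how those vectors are obtained from the null spaces of the derivation matrices is not shown, and the function name is hypothetical.

```python
import math

def sin2_angle_to_span(q, spanning, tol=1e-12):
    """Squared sine of the angle between vector q and span(spanning), via Gram-Schmidt."""
    basis = []
    for v in spanning:
        w = list(v)
        for e in basis:  # remove components along the orthonormal vectors found so far
            dot = sum(a * b for a, b in zip(w, e))
            w = [a - dot * b for a, b in zip(w, e)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm > tol:  # skip (numerically) dependent vectors
            basis.append([a / norm for a in w])
    q_norm2 = sum(a * a for a in q)
    proj_norm2 = sum(sum(a * b for a, b in zip(q, e)) ** 2 for e in basis)
    return max(0.0, 1.0 - proj_norm2 / q_norm2)

q = [1.0, 1.0, 0.0]
print(sin2_angle_to_span(q, [[1, 0, 0], [0, 1, 0]]))  # 0.0: q lies in the span
print(sin2_angle_to_span(q, [[1, 0, 0]]))             # 0.5: a 45-degree angle
```

The value is 0 exactly when $q$ lies in the span and grows toward 1 as $q$ becomes orthogonal to it, which mirrors how the functional vanishes precisely on free arrangements.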
  Beyond its theoretical interest, $\mathfrak{S}$ provides a viable computational handle on the landscape of free arrangements. We illustrate this through two complementary roles: as a smooth reward signal driving a reinforcement learning search for moderate $n$, and as a fast pre-filter accelerating an algebraic extension procedure for larger $n$. For $n \leq 13$, the reinforcement learning system discovers hundreds of verified free arrangements spanning all admissible exponent types. For $n \geq 14$, where the reinforcement learning reward signal becomes insufficient, the hybrid extension procedure, combined with classical supersolvable constructions, produces at least one verified free arrangement for every admissible exponent pair $(d_1, d_2)$ with $n \leq 20$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.02995v2</guid>
      <category>math.AG</category>
      <category>cs.LG</category>
      <category>math.CO</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tomás S. R. Silva</dc:creator>
    </item>
    <item>
      <title>Measuring Depth of Matroids</title>
      <link>https://arxiv.org/abs/2604.04896</link>
      <description>arXiv:2604.04896v2 Announce Type: replace-cross 
Abstract: Motivated by recently discovered connections between matroid depth measures and block-structured integer programming [ICALP 2020, 2022], we undertake a systematic study of recursive depth parameters for matrices and matroids, aiming to unify recently introduced and scattered concepts. We propose a general framework that naturally yields eight different depth measures for matroids, prove their fundamental properties and relationships, and relate them to two established notions in the field: matroid branch-depth and a newly introduced natural depth counterpart of matroid tree-width. In particular, we show that six of our eight measures are mutually functionally inequivalent, and among these, one is functionally equivalent to matroid branch-depth and another to matroid tree-depth. Importantly, we also prove that these depth measures coincide on matroids and on matrices over any field, which is (somewhat surprisingly) not a trivial task. Finally, we provide a comparison between the matroid parameters and classical depth measures of graphs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.04896v2</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Jakub Balabán, Petr Hliněný, Jan Jedelský, Kristýna Pekárková</dc:creator>
    </item>
    <item>
      <title>Geometric Entropy and Retrieval Phase Transitions in Continuous Thermal Dense Associative Memory</title>
      <link>https://arxiv.org/abs/2604.07401</link>
      <description>arXiv:2604.07401v2 Announce Type: replace-cross 
Abstract: We study the thermodynamic memory capacity of modern Hopfield networks (Dense Associative Memory models) with continuous states under geometric constraints, extending classical analyses of pairwise associative memory. We derive thermodynamic phase boundaries for Dense Associative Memory networks with exponential capacity $M = e^{\alpha N}$, comparing Gaussian (LSE) and Epanechnikov (LSR) kernels. For continuous neurons on an $N$-sphere, the geometric entropy depends solely on the spherical geometry, not on the kernel. In the sharp-kernel regime, the maximum theoretical capacity $\alpha = 0.5$ is achieved at zero temperature; below this threshold, a critical line separates retrieval from non-retrieval. The two kernels differ qualitatively in their phase boundary structure: for LSE, a critical line exists at all loads $\alpha &gt; 0$. For LSR, the finite support introduces a threshold $\alpha_{\text{th}}$ below which no spurious patterns contribute to the noise floor and no critical line exists; retrieval is perfect at any temperature. These results advance the theory of high-capacity associative memory and clarify fundamental limits of retrieval robustness in modern attention-like memory architectures.</description>
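The qualitative difference between the two kernels shows up in the weights a retrieval step assigns to stored patterns. A minimal sketch, assuming the similarity is the squared Euclidean distance $d^2$ and the kernels take the forms $\exp(-\beta d^2/2)$ (Gaussian) and $\max(0, 1 - \beta d^2/2)$ (Epanechnikov); these normalizations are illustrative and need not match the paper's. Because the Epanechnikov kernel has finite support, sufficiently distant (spurious) patterns receive exactly zero weight, whereas every pattern keeps a positive Gaussian weight.

```python
import math

def kernel_weights(query, patterns, beta, kernel):
    """Normalized weights that a retrieval step assigns to each stored pattern."""
    d2 = [sum((q - p) ** 2 for q, p in zip(query, pat)) for pat in patterns]
    if kernel == "gaussian":        # infinite support (LSE-style)
        raw = [math.exp(-beta * d / 2) for d in d2]
    elif kernel == "epanechnikov":  # finite support (LSR-style)
        raw = [max(0.0, 1 - beta * d / 2) for d in d2]
    else:
        raise ValueError(f"unknown kernel: {kernel}")
    total = sum(raw)
    if total == 0:  # query lies outside the support of every pattern
        return [0.0] * len(raw)
    return [w / total for w in raw]

# Hypothetical unit-norm patterns on a circle; the query equals the first one.
patterns = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]
print(kernel_weights(query, patterns, 1.0, "gaussian"))      # all strictly positive
print(kernel_weights(query, patterns, 1.0, "epanechnikov"))  # [1.0, 0.0, 0.0]
```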
      <guid isPermaLink="false">oai:arXiv.org:2604.07401v2</guid>
      <category>cond-mat.dis-nn</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tatiana Petrova, Evgeny Polyachenko, Radu State</dc:creator>
    </item>
    <item>
      <title>Towards Optimal Passive Feedback Control of LTI Systems under LQR Performance</title>
      <link>https://arxiv.org/abs/2604.14854</link>
      <description>arXiv:2604.14854v2 Announce Type: replace-cross 
Abstract: We study state-feedback design for continuous-time LTI systems with a control input and an external input-output pair. Our objective is to determine feedback gains that render the closed-loop system (strictly) passive with respect to the external port while minimizing the standard LQR cost in the disturbance-free case. The resulting constrained optimization problem is intractable due to bilinear matrix inequalities. We analyze the set of passivating gains, showing it is unbounded, possibly nonconvex, path-connected, and contractible. We propose an indirect approach, in which the set of passivating feedback gains is inner-approximated by a compact, convex polytope. A projected gradient flow is employed to compute a gain within this polytope that minimizes the LQR cost. Numerical examples illustrate the effectiveness of the method.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.14854v2</guid>
      <category>math.OC</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Armin Gießler, Pol Jané-Soneira, Sören Hohmann</dc:creator>
    </item>
    <item>
      <title>Denoising data using convex relaxations</title>
      <link>https://arxiv.org/abs/2605.02327</link>
      <description>arXiv:2605.02327v2 Announce Type: replace-cross 
Abstract: We study the problem of denoising observations \(Y_i=X_i+Z_i\), where the latent variables \(X_i\) are sampled from a low-dimensional manifold in \(\mathbb{R}^n\) and the noise variables \(Z_i\) are isotropic Gaussian. We propose a convex-relaxation estimator that first reduces dimension by principal component analysis and then projects the observations onto the convex hull of the projected latent manifold. We construct a statistical oracle that estimates its supporting hyperplanes from empirical Gaussian tail probabilities of the noisy sample. Under a lower-mass condition on the latent distribution, we prove finite-sample guarantees for the oracle and derive error bounds for the resulting denoiser. The analysis combines risk bounds for least-squares projection under convex constraints with entropy bounds for convex hulls. We also verify the assumptions of the framework for a Cryo-Electron Microscopy observation model by establishing suitable covering number and Lipschitz estimates for the associated group action and imaging operators.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02327v2</guid>
      <category>stat.ME</category>
      <category>cs.LG</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Charles Fefferman, Aalok Gangopadhyay, Matti Lassas, Jonathan Marty, Hariharan Narayanan</dc:creator>
    </item>
    <item>
      <title>Copula-Based Endogeneity Correction for Doubly Robust Estimation of Treatment Effect</title>
      <link>https://arxiv.org/abs/2605.03278</link>
      <description>arXiv:2605.03278v2 Announce Type: replace-cross 
Abstract: Doubly Robust (DR) estimation of treatment effects relies on the untestable assumption of no unobserved confounding. This assumption is particularly problematic in healthcare research, where variables like prescription refill rates serve as proxies for unobserved behaviors such as medication adherence. These proxy variables are often endogenous, exhibiting correlation with the regression error term due to unmeasured confounding or measurement error. We propose a copula-corrected doubly robust estimator that addresses endogeneity in both the treatment and outcome models without requiring instrumental variables. Gaussian copulas model the joint distribution of endogenous covariates and the error term, enabling consistent estimation while preserving the doubly robust property, which requires correct specification of either the treatment or the outcome model, not both. Monte Carlo simulations demonstrate that naive DR estimation exhibits substantial bias under endogeneity, whereas our corrected estimator recovers unbiased treatment effects across different data-generating processes. We apply our method to examine the effect of nutritional counseling on blood pressure using National Health and Nutrition Examination Survey (NHANES) data. Naive DR estimation suggests that counseling is associated with increased blood pressure. After copula correction, this effect becomes statistically insignificant, consistent with literature showing modest effects of nutritional counseling in reducing blood pressure. Our methodology provides researchers with a practical tool for estimating treatment effects in the presence of endogeneity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03278v2</guid>
      <category>stat.ME</category>
      <category>cs.AI</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sahil Shikalgar, Md. Noor-E-Alam</dc:creator>
    </item>
    <item>
      <title>Small Matrices with Small Inverses: Unimodular Zerofree Cases</title>
      <link>https://arxiv.org/abs/2605.03691</link>
      <description>arXiv:2605.03691v2 Announce Type: replace-cross 
Abstract: We consider unimodular matrices $M$ such that neither $M$ nor $M^{-1}$ contains zero entries. Matrices typically exhibit a trade-off: small entries in $M$ imply large entries in $M^{-1}$. We investigate the rare cases where both remain small, classify these matrices up to symmetry, and discuss aspects of this balanced setting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03691v2</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <category>math.NT</category>
      <pubDate>Thu, 07 May 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Steven Finch</dc:creator>
    </item>
  </channel>
</rss>