<?xml version='1.0' encoding='UTF-8'?>
<rss xmlns:arxiv="http://arxiv.org/schemas/atom" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">
  <channel>
    <title>cs updates on arXiv.org</title>
    <link>http://rss.arxiv.org/rss/cs</link>
    <description>cs updates on the arXiv.org e-print archive.</description>
    <atom:link href="http://rss.arxiv.org/rss/cs" rel="self" type="application/rss+xml"/>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <language>en-us</language>
    <lastBuildDate>Tue, 09 Jun 2026 04:00:33 +0000</lastBuildDate>
    <managingEditor>rss-help@arxiv.org</managingEditor>
    <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
    <skipDays>
      <day>Saturday</day>
      <day>Sunday</day>
    </skipDays>
    <item>
      <title>Byzantine Cheap Talk: Adversarial Resilience and Topology Effects in LLM Coordination Games</title>
      <link>https://arxiv.org/abs/2606.07790</link>
      <description>arXiv:2606.07790v1 Announce Type: new 
Abstract: Multi-agent LLM systems increasingly rely on communication protocols for coordination, yet their robustness under adversarial and structural constraints remains poorly understood. Building on prior work showing that cheap-talk channels enable cooperation in LLM coordination games, we investigate two vulnerability classes in a 4-player Stag Hunt across six model families and 720 trials. First, when Byzantine agents signal cooperation but defect, non-Byzantine agents detect the betrayal within one round yet fail to adapt collectively: a substantial fraction continue cooperating despite repeated exploitation, unable to recover coordination due to the game's unanimity payoff structure. Second, explicitly restricting communication topology collapses cooperation, while applying identical restrictions silently preserves near-perfect cooperation. This establishes that coordination failure stems from agents' meta-reasoning about hidden information, not information loss itself. We identify two stable behavioral archetypes that replicate across all model cohorts: Defection-Prone models that switch permanently after betrayal, and Cooperation-Persistent models that continue cooperating at significant individual cost. These findings reveal concrete security vulnerabilities: communication channels can be exploited as adversarial injection vectors, and disclosing network topology to agents can degrade coordination even without any adversary present.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07790v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Aya El Mir, Martin Takáč, Salem Lahlou</dc:creator>
    </item>
    <item>
      <title>Frequency-Scale Saliency for Spectral Descriptor Analysis in 3D Shape Retrieval</title>
      <link>https://arxiv.org/abs/2606.07791</link>
      <description>arXiv:2606.07791v1 Announce Type: new 
Abstract: Classical spectral descriptors such as the Heat Kernel Signature and Wave Kernel Signature are widely used for non-rigid 3D shape retrieval, yet their failure modes remain poorly understood. We present a frequency-scale saliency framework that audits these descriptors by quantifying the retrieval-level contribution of each descriptor scale interval through ablation. We introduce class spectral fingerprints to characterize category-level scale dependence, and show that descriptor similarity between class pairs is substantially correlated with retrieval failure, with a Spearman correlation of 0.479. Experiments on SHREC'11 demonstrate that short scales dominate retrieval performance while long scales are harmful, that HKS and WKS exhibit distinct scale dependence patterns, and that saliency-weighted retrieval improves mAP on hard categories by 0.156, with cross-fold and random-weight controls confirming that the gain is stable and not due to arbitrary reweighting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07791v1</guid>
      <category>cs.GR</category>
      <category>cs.CV</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jianru Shen</dc:creator>
    </item>
    <item>
      <title>MOLOT System Card: Malicious Operational Logic Observation Transformer</title>
      <link>https://arxiv.org/abs/2606.07792</link>
      <description>arXiv:2606.07792v1 Announce Type: new 
Abstract: MOLOT (Malicious Operational Logic Observation Transformer) is a static malicious-code detection system designed for SAST setups where package metadata, maintainer history, and dynamic execution traces may be unavailable or unreliable. The system represents source code as behavior sequences derived from static call graphs and includes an explanation stage that ranks suspicious behavior activities and maps them back to source-code locations. The approach is evaluated on Python and JavaScript packages from PyPI and npm, compared with open-source detection tools, and validated under product constraints including runtime, memory use, and false-positive rates observed in a real moderation workflow. We also release Open Malicious-Code Bench, a public benchmark for reproducible evaluation of malicious-package detection methods. The results show that static behavior-sequence modeling can provide accurate, explainable, and deployable malicious-code detection for modern DevSecOps workflows.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07792v1</guid>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Daniil Lopatkin, Maksim Mitrofanov, Stanislav Rakovsky, Aleksandr Khalikov</dc:creator>
    </item>
    <item>
      <title>The Choreography of Augmented Reality Timelines: Studying the Relative Position, Chronology, &amp; Situatedness of Event Sequences</title>
      <link>https://arxiv.org/abs/2606.07794</link>
      <description>arXiv:2606.07794v1 Announce Type: new 
Abstract: Timelines are effective ways to tell historical and personal stories. However, most timeline visualization tools impose an inflexible model of time prioritizing chronological clarity. On the other hand, unconstrained representations can better capture the irregular and contextual nature of lived time, but often at the cost of interpretability. In this work, we explore this continuum with a study of how historical and personal timelines could manifest in physical spaces. We conducted a formative study (N=12) in which participants freely arranged events within a physical environment. We observed a diversity of strategies reflecting the personal and context-dependent nature of temporal mental models. We also invited participants to consider how others could move through their timelines. Our analysis led to a choreographic approach to timeline creation, as well as a proof-of-concept tablet-based augmented reality (AR) application that supports spatial timeline drawing and viewing. Finally, we reflect on the design implications of encoding chronology, pacing, and spatial context in immersive timeline stories.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07794v1</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Isabelle Kwan, Jessica Ziyu Chen, Matthew Brehmer</dc:creator>
    </item>
    <item>
      <title>The Role of Semirings in Incremental View Maintenance</title>
      <link>https://arxiv.org/abs/2606.07795</link>
      <description>arXiv:2606.07795v1 Announce Type: new 
Abstract: We study the problem of incremental view maintenance (IVM) under inserts to $K$-databases, where $K$ is a commutative semiring without additive inverse. The key observation put forward in this paper is that the complexity of the IVM problem depends fundamentally on the underlying semiring. We introduce a class of conjunctive queries called $p$-hierarchical and show that for any $p$-hierarchical query with fractional hypertree width $\mathsf{fhtw}$ and any insert-only update sequence of length $N$ to an initially empty $K$-database over an arbitrary semiring $K$ without additive inverse, we can construct a data structure that can be updated in amortized $\mathcal{O}(N^{\mathsf{fhtw}-1})$ time and can support constant delay enumeration of the query result. In particular, the amortized update time for any $\alpha$-acyclic $p$-hierarchical query is constant. We also give conditional lower bounds showing that any conjunctive query without self-joins that is not $p$-hierarchical cannot be maintained with amortized constant update time and constant enumeration delay under inserts to $K$-databases. Here, $K$ can be the natural semiring and its generalizations to the provenance and covariance semirings or any idempotent and strictly ordered semiring such as the tropical semiring. When put together, our upper and lower bounds imply a dichotomy for the insert-only maintenance of conjunctive queries without self-joins and the aforementioned semirings: A query can be maintained with amortized constant update time and constant enumeration delay if and only if it is $\alpha$-acyclic $p$-hierarchical.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07795v1</guid>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Eden Chmielewski, Andrei Draghici, Dan Olteanu, Haozhe Zhang</dc:creator>
    </item>
    <item>
      <title>Belief-Space Quantum-Inspired Reinforcement Learning for Partially Observable Autonomous Cyber Defense in the Internet of Vehicles</title>
      <link>https://arxiv.org/abs/2606.07796</link>
      <description>arXiv:2606.07796v1 Announce Type: new 
Abstract: The Internet of Vehicles (IoV) faces a dynamic, adversarial security environment where attackers adapt to defenses. Existing intrusion detection systems rely on static classifiers that fail to capture sequential decision-making, attacker adaptation, and uncertainty. We formulate IoV security as a sequential attacker-defender interaction and model defense as a reinforcement learning problem under partial observability. We propose Quantum Belief-Integrated Reinforcement Defense (Q-BIRD), using quantum-inspired belief representation to encode defender uncertainty about hidden attacker intent via amplitude-based states, enabling non-Bayesian belief evolution. Integrated into a Proximal Policy Optimization (PPO) defender, Q-BIRD selects cost-aware mitigation actions. In simulated environments with adaptive, probing attackers, Q-BIRD reduced cumulative mean damage, damage variance, and attack success rate (ASR) by 60.4%, 90.2%, and 50.0%, respectively, while increasing survival probability by 46.4%. Compared to classical Bayesian PPO, damage variance reduction and ASR improved by 10.2 times and 50%. Ablation and explainability analyses confirm that amplitude-based belief is the primary decision signal during strategy transitions when classical belief collapses, providing superior IoV security without additional hardware.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07796v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anwar Shah, Rohan Farooq, Sajid Anwer, Tallha Akram, Usman Ghous, Sajid Ullah Khan</dc:creator>
    </item>
    <item>
      <title>Reconstructing and forecasting disease trajectories of patients with Alzheimer's disease using routine data in resource-constrained settings</title>
      <link>https://arxiv.org/abs/2606.07798</link>
      <description>arXiv:2606.07798v1 Announce Type: new 
Abstract: Alzheimer's disease is a progressive neurodegenerative disorder, and its progression varies substantially across patients. Existing work aims to forecast patients' future cognitive state, with minimal focus on reconstructing the state from past visits. Furthermore, quantifying predictive uncertainty remains underexplored, and existing approaches rely on costly modalities such as MRI, PET, and CSF, which limits their deployment in resource-limited settings. Our primary objectives are fourfold: first, bidirectional prediction of cognitive scores from irregular visits to present the complete disease trajectory; second, interpolation and extrapolation capabilities to assist clinicians in informed prognostic decision making; third, a well-calibrated uncertainty estimate for all predictions; and finally, achieving these objectives using the modalities available during routine visits. We propose a unified framework, GNOVA (GRU-Neural ODE Variational Autoencoder). The architecture combines a Gated Recurrent Unit encoder and a Neural ODE decoder within a variational autoencoder framework. We forecast the CDR-SB and MMSE scores. The GRU encoder allows for any number of inputs at any time point. The Neural ODE decoder performs continuous estimation, allowing interpolation and extrapolation at any desired time point. The variational autoencoder allows for uncertainty estimation in predictions. We worked with 1,727 patients from the ADNI dataset over 10 years; the model achieved mean absolute errors of 1.35 and 2.28 for CDR-SB and MMSE scores, respectively, without requiring any neuroimaging or biomarker data. Feature-ablation studies revealed that age, BMI, and APOE4 status were strong predictors. The proposed framework enables the reconstruction of incomplete patient histories and the anticipation of future cognitive states.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07798v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>q-bio.NC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Ratnadeep Das, Atri Chatterjee, Sitikantha Roy</dc:creator>
    </item>
    <item>
      <title>Improving Multimodal Reasoning via Worst Dimension Optimization</title>
      <link>https://arxiv.org/abs/2606.07801</link>
      <description>arXiv:2606.07801v1 Announce Type: new 
Abstract: Multimodal reasoning requires a path that retains integrity over a wide range of constraints, from visual grounding to logic consistency. However, the current Process Reward Models focus on heuristically defined rewards that equally weigh these factors, which may lead to the concealment of individual dimension failures by the dominating factors, without guaranteeing the validity of the reasoning process in general.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07801v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haocheng Lv, Huaping Zhang, Qiuchi Li, Lei Li, Chunxiao Gao</dc:creator>
    </item>
    <item>
      <title>Memetic Capture: A Pluralistic Policy Framework for Governing AI-Driven Cultural Disempowerment</title>
      <link>https://arxiv.org/abs/2606.07802</link>
      <description>arXiv:2606.07802v1 Announce Type: new 
Abstract: Culture is the most insidious vector of gradual human disempowerment by AI: unlike economic or political displacement, cultural displacement attacks the very preferences and values through which humans recognise and resist disempowerment itself. We argue that existing AI governance frameworks suffer from a critical blind spot by treating cultural impact as secondary to economic and safety concerns. This paper develops memetic capture as a unifying concept for AI-driven cultural disempowerment, and proposes the Cultural Pluralistic Governance Framework (CPGF), a four-tier policy architecture combining quantitative cultural influence metrics, democratic value assemblies, pluralistic deployment standards, and transnational coordination mechanisms. We argue that pluralism is not merely an ethical requirement for such governance but a structural necessity: monocultural AI governance accelerates the very disempowerment it claims to prevent. We identify concrete policy levers, discuss implementation tensions, and outline a research agenda at the intersection of pluralistic alignment and cultural AI governance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07802v1</guid>
      <category>cs.CY</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Subramanyam Sahoo</dc:creator>
    </item>
    <item>
      <title>Stability Without Safety: Gain Manipulation Attacks on Agentic Cyber-Physical Systems</title>
      <link>https://arxiv.org/abs/2606.07803</link>
      <description>arXiv:2606.07803v1 Announce Type: new 
Abstract: Agentic cyber-physical systems (CPS), where autonomous AI agents participate in runtime control decision-making, introduce agent-driven parameter-update pathways absent from conventional feedback architectures. These pathways form a parameter channel structurally distinct from classical sensor and actuator channels. Among these parameters, feedback gains are the highest-leverage target: a single gain matrix determines closed-loop eigenvalue placement for the entire system, and malicious updates can directly alter closed-loop dynamics while evading residual-based monitors. We formalize this attack surface through a three-axis attacker model and a taxonomy of Gain Manipulation Attacks (GMA). Two impact classes are identified: stability-margin erosion under sustained gain drift, and transient amplification under one-shot gain replacement. A stability-preserving gain replacement can still produce transient amplification far exceeding safe operating limits, and stability verification alone is insufficient to bound the physical impact of such attacks. Stealthiness conditions and worst-case impact certificates are derived for each class via Bauer--Fike eigenvalue bounds and the Kreiss matrix theorem, with preliminary detection directions and a vehicle lateral dynamics example provided.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07803v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ali Eslami, Jiangbo Yu</dc:creator>
    </item>
    <item>
      <title>Quantum-Inspired Reinforcement Learning for Low-Latency Intrusion Detection in V2X and Internet-of-Vehicles Networks</title>
      <link>https://arxiv.org/abs/2606.07804</link>
      <description>arXiv:2606.07804v1 Announce Type: new 
Abstract: Smart cities increasingly depend on dense edge, IoT, and vehicular networks to deliver critical urban services, including traffic control, connected mobility, infrastructure monitoring, and energy management. In this ecosystem, the Internet of Vehicles (IoV) is central to intelligent transportation, enabling continuous communication among vehicles, roadside infrastructure, and cloud-edge platforms. This connectivity, however, also enlarges the attack surface and exposes smart city and vehicular systems to evolving cyber threats that can compromise safety, privacy, data integrity, and service continuity. Conventional static defenses are often inadequate because they cannot autonomously adapt to changing attack behaviors or multi-stage intrusion patterns. This paper proposes QIRL, a Quantum-Inspired Reinforcement Learning framework built on a lightweight Deep Q-Network architecture for next-generation autonomous cyber defense. QIRL combines amplitude-phase quantum state encoding, rotation-gate-based exploration, and quantum interference reward augmentation within a cost-sensitive Markov Decision Process formulation. It further addresses class imbalance through training-only SMOTE balancing and asymmetric cost-sensitive reward shaping, while sequential MDP modeling captures temporal dependencies in multi-stage attack campaigns. The framework is evaluated on CICIDS2017 and UNSW-NB15. QIRL achieves accuracies of 97.89% and 91.04%, F1-scores of 95.22% and 91.66%, AUC-ROC values of 0.9945 and 0.9713, and True Skill Statistics of 0.9443 and 0.8244, respectively. It also attains ultra-low inference latencies of 32.5 and 45.7 microseconds per sample, corresponding to 67.77 times and 51.77 times speedups over ensemble baselines. These results show that QIRL offers a lightweight, latency-aware, and adaptive defense for smart city and IoV infrastructures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07804v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sajid Anwer, Rohan Farooq, Anwar Shah, Tallha Akram</dc:creator>
    </item>
    <item>
      <title>Beyond Goodhart's Law: A Dynamic Benchmark for Evaluating Compliance in Multi-Agent Systems</title>
      <link>https://arxiv.org/abs/2606.07805</link>
      <description>arXiv:2606.07805v1 Announce Type: new 
Abstract: The rapid evolution of Large Language Models (LLMs) from passive assistants to autonomous, execution-capable agents has introduced critical operational risks. Most current evaluation frameworks neglect procedural compliance, leading to "Machiavellian" behaviors where agents strategically violate safety rules to maximize rewards, a direct manifestation of Goodhart's Law. To address this blind spot, we introduce MAC-Bench, a dynamic, adversarial benchmark designed to evaluate the procedural alignment of multi-agent systems under realistic pressure. We propose the SERV (Seed - Evolve - Refine - Verify) pipeline, an "Agent-as-a-Benchmark" paradigm that transforms unstructured legal texts into executable, contamination-free scenarios. By synthesizing holographic sandbox environments and injecting calibrated social-engineering pressure vectors, MAC-Bench forces agents into Pareto-optimal trade-offs between task success and regulatory adherence. We introduce two novel metrics, the Compliance-Weighted Success Rate (CSR) and the Machiavellian Gap (MG), and conduct a comprehensive evaluation of state-of-the-art frontier models to reveal the pervasive trade-offs between success and compliance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07805v1</guid>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yiyang Zhao, Zhuo Zhang, Qingxuan Le, Lizhen Qu, Zenglin Xu</dc:creator>
    </item>
    <item>
      <title>Where Instruction Hierarchy Breaks: Diagnosing and Repairing Failures in Reasoning Language Models</title>
      <link>https://arxiv.org/abs/2606.07808</link>
      <description>arXiv:2606.07808v1 Announce Type: new 
Abstract: Reasoning language models deployed in agentic workflows must follow an instruction hierarchy: when instructions from different sources conflict, the model should obey the highest-privilege applicable instruction. Existing benchmarks largely measure this behavior end-to-end, asking whether the final response is compliant. However, a non-compliant response can arise from several distinct failures: the model may fail to identify the relevant instructions in context, fail to resolve conflicts among identified instructions, or correctly resolve the conflict in its reasoning while still producing a violating response. We introduce a white-box diagnostic framework that localizes instruction hierarchy failures into instruction identification, conflict resolution, and response realization, making failures more interpretable. We evaluate three reasoning models--Gemma-4-31B-IT, Qwen3.6-35B-A3B, and Claude Sonnet 4.6--on long-context adaptations of IHEval and IHChallenge, and find that the dominant failure mode varies across models, tasks, and context length. Building on the observation that models can often detect conflicts and output violations when explicitly prompted, we propose two training-free self-monitoring mechanisms: a parallel input monitor for low-latency conflict detection before generation, and a sequential output monitor for response-level review and repair. Across Gemma-4-31B-IT, Claude Sonnet 4.6, and GPT-5.3, the strongest monitor reduces rule-following non-compliance by 81-99%, with GPT-5.3 reductions of 86% under static attacks and 45% under adaptive attacks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07808v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sanjay Kariyappa, G. Edward Suh</dc:creator>
    </item>
    <item>
      <title>Sensitivity Analysis White Paper</title>
      <link>https://arxiv.org/abs/2606.07809</link>
      <description>arXiv:2606.07809v1 Announce Type: new 
Abstract: Sensitivity analysis is an important component of simulation-based decision support because it helps analysts determine which inputs most strongly influence model outcomes under uncertainty. This paper organizes the broad sensitivity analysis literature into a coherent framework for use in complex simulation settings, with particular attention to military applications. We review major classes of methods, including local and global approaches, variance-based techniques, screening methods, derivative-based methods, and uncertainty quantification tools, and relate them to common analytical objectives such as factor prioritization, factor fixing, variance reduction, and factor mapping. The paper also discusses sensitivity auditing as a complementary perspective that emphasizes transparency, assumption tracking, and responsible use of models in decision-relevant settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07809v1</guid>
      <category>cs.SE</category>
      <category>stat.AP</category>
      <category>stat.ME</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Nate Bade, Lindsay Erickson</dc:creator>
    </item>
    <item>
      <title>SLMJury: Can Small Language Models Judge as Well as Large Ones?</title>
      <link>https://arxiv.org/abs/2606.07810</link>
      <description>arXiv:2606.07810v1 Announce Type: new 
Abstract: Large language models (LLMs) are widely used as judges for evaluating model outputs, but their high cost, latency, and opacity limit scalability. We introduce SLMJury, a framework for evaluating small language models (SLMs) as judges across two paradigms: closed-ended binary correctness and open-ended quality scoring. We benchmark 16 SLM judges (0.6B-14B parameters) from four model families across ten benchmarks: eight closed-ended tasks spanning mathematical, scientific, and general reasoning (N=64,824 judgments per configuration), plus SummEval and MT-Bench for summarization and conversational scoring. We formalize judging as a budget-conditioned function and study five dimensions. Four findings emerge. (1) The overthinking effect is domain-dependent: for most judges quick 10-token verdicts match or beat extended reasoning on mathematical judging (by 2-7% where they help), while reasoning wins on general tasks by up to 23%. (2) Domain generalization separates model families, with math-to-general accuracy gaps ranging from under 10% to nearly 40%. (3) Closed-ended and open-ended judging draw on different capabilities: the best binary judge (Phi-4) drops to rank 9 on MT-Bench, while reasoning-trained models invert this ordering. (4) Under the Reflect-Critique-Refine (RCR) debate protocol, multi-agent debate degrades accuracy across all tested configurations, whereas the top judges resist six adversarial personas with &lt;=0.55% variance. Reliable automated evaluation does not require large proprietary models, yet no single SLM dominates. The leaderboard is available at https://anishh15.github.io/SLMJury/, and our framework code and pip package are publicly available at https://github.com/anishh15/SLMJury and https://pypi.org/project/slmjury/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07810v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anish Laddha, Nitesh Pradhan, Gaurav Srivastava</dc:creator>
    </item>
    <item>
      <title>Scaling Participation in Modular AI Systems</title>
      <link>https://arxiv.org/abs/2606.07812</link>
      <description>arXiv:2606.07812v1 Announce Type: new 
Abstract: Humanity is a mosaic of multifaceted talents and needs, and any truly intelligent AI must reflect that richness. Yet the LLMs used by all are built by the few -- a centralized market of monolithic AI models structurally ill-suited to capture the diversity of human knowledge, reasoning, and values. Here we introduce scaling participation, a new paradigm in which modular AI systems are built from the bottom up through the contributions of diverse stakeholders. Participants contribute small models trained on their own interests and priorities; these models then collaborate in modular frameworks as compositional AI systems. Participatory AI systems outperform monolithic LLMs by up to 15.4% across 15 tasks, such as reasoning and factuality, surpassing models larger than all contributed components combined. Further experiments show that participatory AI systems benefit from contributor diversity, substantially improve on each contributor's original priorities, and exhibit emergent capabilities that allow them to solve over 15% of problems where all individual models fail. Scaling participation provides a technical foundation for transitioning from the monolithic status quo toward an open, bottom-up, and collaborative AI future.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07812v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shangbin Feng, Yike Wang, Weijia Shi, Luke Zettlemoyer, Yejin Choi, Yulia Tsvetkov</dc:creator>
    </item>
    <item>
      <title>MinNav: Minimalist Navigation Using Optical Flow For Active Tiny Aerial Robots</title>
      <link>https://arxiv.org/abs/2606.07813</link>
      <description>arXiv:2606.07813v1 Announce Type: new 
Abstract: Navigation using a monocular camera is pivotal for autonomous operation on tiny aerial robots due to its perfect balance of versatility, cost and accuracy. In this paper, we introduce MinNav, a navigation stack based on optical flow and its uncertainty to fly through a scene with static and dynamic obstacles and unknown-shaped gaps without any prior knowledge of the scene components and/or their locations/ordering. We further improve success rate by using the activeness of the robot to move around in an exploratory way to find obstacles and navigate. We successfully evaluate and demonstrate the proposed approach in many real-world experiments in various environments with static and dynamic obstacles and unknown-shaped gaps with an overall success rate of 70%. To the best of our knowledge, this is the first solution to tackle all the aforementioned navigation cases without prior knowledge using a monocular camera. Our approach is on par in performance with depth-based methods with factors of magnitude less computation required and can readily run onboard tiny aerial robots. The accompanying video, supplementary material, code and dataset can be found at https://pear.wpi.edu/research/minnav.html</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07813v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Aniket Patil, Mandeep Singh, Uday Girish Maradana, Nitin J. Sanket</dc:creator>
    </item>
    <item>
      <title>Representational Similarity and Model Behavior in Multi-Agent Interaction</title>
      <link>https://arxiv.org/abs/2606.07818</link>
      <description>arXiv:2606.07818v1 Announce Type: new 
Abstract: Researchers have shown that neural similarity among humans predicts social closeness and cooperative success, whereas innovation often emerges from interactions among dissimilar individuals. We investigate whether these principles extend to artificial intelligence by examining interactions between large language models. In our experiments, 276 model pairs interact across eight games spanning both cooperation and novelty. We find that pairs with more similar representation spaces achieve significantly higher cooperation but exhibit reduced novelty and creativity. The effects of representational similarity on cooperation and novelty remain robust even after controlling for other factors such as performance disparity and model size. We also find that similarity in the early layers consistently shows the strongest association with cooperation and novelty, compared to the middle and later layers. This suggests that a central factor underlying these patterns could be the extent to which the two models share lexical and semantic grounding. Overall, representational similarity can be an important consideration in multi-agent system design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07818v1</guid>
      <category>cs.CL</category>
      <category>cs.NE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yujin Potter, Seun Eisape, Shiyang Lai, Alexander Huth, James Evans, Been Kim, Jacob Eisenstein, Dawn Song, Alane Suhr</dc:creator>
    </item>
    <item>
      <title>Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression</title>
      <link>https://arxiv.org/abs/2606.07819</link>
      <description>arXiv:2606.07819v1 Announce Type: new 
Abstract: Recently, the efficiency of Large Language Model (LLM) deployment has become a critical concern in practical applications. While post-training quantization (PTQ) and structural pruning are established techniques for reducing memory footprint and inference latency, most existing PTQ approaches optimize quantization errors on a per-layer basis, overlooking how errors accumulate and propagate through the network, often resulting in suboptimal solutions. Traditional pipelines also tend to apply pruning and quantization in isolation or sequentially, further compounding suboptimality. We introduce a novel end-to-end framework that addresses these limitations in two key ways. First, we propose a novel mixed-precision PTQ strategy that directly minimizes global error propagation across the entire model, rather than isolating layer-wise errors. Building on this, we develop a novel joint optimization approach that simultaneously learns structural pruning decisions and mixed-precision quantization policies within a unified search space. Extensive experiments show that, at ultra-low precisions (1-3 bits), our quantization method reduces WikiText perplexity by up to 21% compared to state-of-the-art (SoTA) weight-activation quantization baselines. Against leading weight-only quantization methods, it achieves up to 59% and 85% lower perplexity on WikiText and C4, respectively. Compared to the SoTA joint pruning-and-quantization techniques, our proposed method delivers superior perplexity and reasoning performance at ultra-low bits.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07819v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Hoang-Loc La, Truong-Thanh Le, Amir Taherkordi, Phuong Hoai Ha</dc:creator>
    </item>
    <item>
      <title>A note on rounding fractional matchings with constant-factor strong negative correlation</title>
      <link>https://arxiv.org/abs/2606.07820</link>
      <description>arXiv:2606.07820v1 Announce Type: new 
Abstract: We describe new dependent-rounding algorithms for bipartite graphs. Given a fractional matching $x$ of graph $G = (U \cup V, E)$, the algorithms return an integral solution $X$ such that each right-node $v \in V$ has at most one edge, and where the variables $X_e$ also satisfy broad non-positive correlation properties. In particular, for any edges $e_1, e_2$ sharing a left-node $u \in U$, the variables $X_{e_1}, X_{e_2}$ have \emph{strong} negative-correlation, i.e. the expectation of $X_{e_1} X_{e_2}$ is significantly below $x_{e_1} x_{e_2}$.
  Dependent rounding schemes with these properties have been used for approximation algorithms for job-scheduling on unrelated machines to minimize weighted completion times, among other applications. Our new algorithm achieves simpler and qualitatively stronger bounds compared to prior algorithms. In particular, we achieve a negative-correlation property $$ \E[X_{e_1} X_{e_2}] \leq 0.79751 \ x_{e_1} x_{e_2}, $$ which is a significant constant-factor improvement over Baveja, Qu &amp; Srinivasan (2023).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07820v1</guid>
      <category>cs.DS</category>
      <category>math.PR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>David G. Harris</dc:creator>
    </item>
    <item>
      <title>The ACUTE Protocol: Operationalizing Language Model Activations for Better Calibration, Utility, and Trust</title>
      <link>https://arxiv.org/abs/2606.07822</link>
      <description>arXiv:2606.07822v1 Announce Type: new 
Abstract: As language models improve and become increasingly deployed to solve a variety of tasks, trustworthiness becomes essential. Calibration is a good proxy for trust: well-calibrated confidence estimates help inform the risk versus reward tradeoff when trusting a specific model output. Unfortunately, even as models improve, they remain poorly calibrated, often biasing towards overconfidence. Additionally, calibration can be gamed: a policy that always predicts the base rate is perfectly calibrated, but completely uninformative. To resolve this, we develop a new metric, expected utility renormalized by the oracle (EURO), that balances calibration and informativeness. We also propose a general-purpose activation-based confidence, utility, and trust estimation protocol (ACUTE) to appropriately adjudicate uncertainty. The ACUTE protocol provides flexible, sample-efficient, and compute-efficient confidence estimators for 3 tasks including multiple choice question answering, tool-calling, and scientific document summarization across 6 models from 4 model families. ACUTE outperforms strong baselines on EURO, while maintaining low calibration error. Taken together, our work shows that equipping LLMs with the ACUTE protocol can improve calibration, utility, and trustworthiness in numerous settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07822v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nishant Subramani, Palash Goyal, Yiwen Song, Mani Malek, Yuan Xue, Tomas Pfister, Hamid Palangi</dc:creator>
    </item>
    <item>
      <title>Jas: AI-Paired Engineering as a Revival of N-Version Programming</title>
      <link>https://arxiv.org/abs/2606.07828</link>
      <description>arXiv:2606.07828v1 Announce Type: new 
Abstract: I report a case study in AI-paired software engineering: five working ports of a vector illustration application across Rust, Swift, OCaml, Python, and browser-based platforms, built by a single developer in approximately 120 evening hours. The methodology pairs AI-assisted implementation with two safeguards -- a precise executable YAML specification serving as the single source of truth, and parallel implementations functioning as a built-in differential-testing layer. The five ports share a 23,000-line specification; per-port native code ranges from 0 to roughly 95,000 lines, reflecting the specification's escape hatch. I argue that AI-paired engineering, conditional on these two safeguards, makes feasible a scope of work that conventionally requires multiple developer-years, and frame the methodology as a revival of N-version programming, a 1980s approach abandoned on cost grounds that AI changes. The paper reports concrete artifacts and honest limitations of the single-developer case study.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07828v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jason Hickey</dc:creator>
    </item>
    <item>
      <title>Academic Integrity and Emotional Responses to Inappropriate LLM Use in Software Engineering Education</title>
      <link>https://arxiv.org/abs/2606.07830</link>
      <description>arXiv:2606.07830v1 Announce Type: new 
Abstract: Academic integrity in higher education is increasingly shaped by complex socio-technical environments marked by automated tools, evolving institutional practices, and heightened performance pressures. Within this context, large language models (LLMs) are becoming prevalent in software engineering education, further blurring boundaries around acceptable assistance and authorship. This study investigates how software engineering students describe their emotional experiences after using LLMs in ways they perceive as academically inappropriate. We conducted a cross-sectional survey with 116 undergraduate students. Results show emotionally heterogeneous responses. Indifference was most frequent, including among students who recognized risks to learning and academic standing. Guilt and anxiety were reported in relation to moral discomfort and concern about penalties. Relief and satisfaction were evident primarily in deadline-driven contexts and situations of unclear guidance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07830v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ronnie de Souza Santos, Italo Santos, Giuseppe Destefanis, Cleyton Magalhaes, Mairieli Wessel</dc:creator>
    </item>
    <item>
      <title>SLRMentor: An LLM-Based Tool Supporting Learning of SLR in Software Engineering</title>
      <link>https://arxiv.org/abs/2606.07831</link>
      <description>arXiv:2606.07831v1 Announce Type: new 
Abstract: This paper presents SLRMentor, a conversational assistant designed to support both learning about the systematic literature review process and the execution of planning activities in software engineering. The tool offers general guidance on SLR methodology and supports key planning tasks, including search string construction and reasoning about inclusion and exclusion criteria, with explanations grounded in established SLR guidelines. A pilot validation with graduate students suggests that SLRMentor helps clarify the SLR process and planning decisions, lowers initial barriers for novice researchers, and supports learning while still requiring active methodological judgment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07831v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rodolfo Gil-Pereira, Ronnie de Souza Santos, Cleyton Magalhaes, Italo Santos</dc:creator>
    </item>
    <item>
      <title>Ternary public-key cryptosystem</title>
      <link>https://arxiv.org/abs/2606.07832</link>
      <description>arXiv:2606.07832v1 Announce Type: new 
Abstract: Public-key cryptosystems eliminate the requirement for pre-shared secret keys by enabling encryption with a publicly disclosed key and decryption with a corresponding private key. In this article we generalize the public-key cryptosystems to ternary algebraic structures, with particular attention to ElGamal as a representative family. We introduce the necessary algebraic background for nonderived ternary structures, including special elements, ternary group rings, and a matrix ternarization procedure that maps binary rings and group rings to antidiagonal symbolic matrices closed under ternary multiplication. Building on these foundations, we formulate a ternary analogue of the ElGamal three-step protocol (key generation, ephemeral encryption, and decryption via querelements) and derive explicit ternary power and querelement formulas that enable correct decryption. Concrete instantiations and numerical examples over a ternary fraction field, a matrix-ternarized finite group ring, and a finite \((6,3)\)-ring (field) validate the construction and illustrate admissible word-length quantization and cycle behaviour of ternary powers. The ternary framework highlights two practical advantages: richer algebraic structure (querelements replace binary inverses) that increases algebraic complexity for attackers, and higher information density (matrix ternarization transfers paired/plaintext vectors). Formal hardness assumptions, optimized parameter choices, and comprehensive security and performance analyses remain necessary future work.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07832v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Steven Duplij, Qiang Guo, Na Fu</dc:creator>
    </item>
    <item>
      <title>Beyond Pass/Fail: Using Process Mining to Understand How LLMs Resist (and Fail) Red Team Attacks</title>
      <link>https://arxiv.org/abs/2606.07833</link>
      <description>arXiv:2606.07833v1 Announce Type: new 
Abstract: Standard AI red teaming evaluations reduce adversarial campaigns to a single binary outcome, attack success rate (ASR), ignoring the sequential structure of how models resist or yield to attacks. We propose applying process mining, a discipline for discovering and analyzing process models from event logs, to red teaming traces. We conduct a controlled experiment pitting 60 HarmBench prompts against two LLMs, GPT-OSS 120B and Llama 3.3 70B, using 10 prompt mutation strategies over up to 110 attempts per prompt. From the resulting 8,575 scored events we extract Directly-Follows Graphs (DFGs) and state transition matrices that reveal structurally distinct defense profiles invisible to ASR alone: GPT-OSS exhibits a near-absorbing refusal state, while Llama presents multiple porous escape routes from refusal to a successful jailbreak. We further show that mutator effectiveness is asymmetric across models and that time-to-jailbreak distributions differ by an order of magnitude.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07833v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Zvi Topol</dc:creator>
    </item>
    <item>
      <title>Cherry-pick Override: Unsafe Directional Commitment in LLM Judges under Mixed Evidence</title>
      <link>https://arxiv.org/abs/2606.07834</link>
      <description>arXiv:2606.07834v1 Announce Type: new 
Abstract: LLM judges increasingly turn verdicts into system commitments. Under mixed evidence (claims with both supporting and refuting sources) this is unsafe: when the schema exposes CONFLICTING as the authorized non-directional verdict, returning SUPPORTS/REFUTES is an unauthorized directional commitment, a failure we name Cherry-pick Override (CCO). We define CCO under an explicit task contract and report it with a same-denominator diagnostic protocol paired with matched-coverage bootstrap and an apples-to-apples random-veto null. On AVeriTeC's Conflicting subset (N_C = 150), three-option judges return a directional verdict on more than 84% of mixed-evidence claims; under the typed schema, three-judge majority voting amplifies direction-on-conflict on AVeriTeC (0.887 vs. 0.840; 95% CI [+0.013, +0.080]) but does not replicate on VitaminC-Mixed. Walking an intervention ladder of common single-channel fixes (typed vocabulary, panel aggregation, confidence thresholding, validator-only filtering), each leaves a distinct residual failure: panel aggregation suppresses single-judge CONFLICTING dissent in 48% of CCO cases; the panel is well-calibrated for direction (ECE = 0.07 on pure-S/R) so confidence cannot operationally separate CCO from correct directional commits; validator-as-classifier nearly halves pure-evidence accuracy. A minimal two-channel reference probe reaches operating points neither single channel reaches; under the random-veto null its promotion to CONFLICTING is structurally targeted on AVeriTeC (empirical p &lt; 1/2001) and weaker but in the same direction on VitaminC-Mixed, a selectivity result rather than a magnitude one. We argue for an external commitment-control layer that separates verdict generation from commitment authorization, using structural evidence and confidence as orthogonal channels and NO-COMMIT as a routed controller state.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07834v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haoran Xu</dc:creator>
    </item>
    <item>
      <title>Mitigating the Contractivity Trap in Diffusion ODEs via Stein Stabilization</title>
      <link>https://arxiv.org/abs/2606.07835</link>
      <description>arXiv:2606.07835v1 Announce Type: new 
Abstract: A fundamental tension exists in the large-step inference of diffusion models via their deterministic probability flow ordinary differential equation (PF-ODE) trajectories, which we identify as the contractivity trap: efficient inference favors large step sizes, while aggressive steps and highly expressive denoisers can undermine contraction-based stability certificates for error suppression. To address this, we propose SteinDiff, a step-wise inference-time stabilization framework that employs Stein-derived corrections without requiring reference samples. Specifically, SteinDiff introduces a geometry-aware residual correction mechanism that regularizes large-step solver updates without retraining. To this end, we derive a closed-form Stein correction coefficient for step-wise solver adjustment, enabling reference-free adaptation to local data geometry. We further establish a score-controlled perturbation bound under distributional shifts and provide a complementary Stein perspective on EDM-style parameterizations. Extensive experiments demonstrate that SteinDiff mitigates severe artifacts and improves generative quality across large-step inference settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07835v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shigui Li, Delu Zeng</dc:creator>
    </item>
    <item>
      <title>Does Persona Make LLMs K-pop Fans? A Pilot Study of LLM-Based Online Concert Audience Agents</title>
      <link>https://arxiv.org/abs/2606.07837</link>
      <description>arXiv:2606.07837v1 Announce Type: new 
Abstract: A concert is a collective experience, but recorded performance videos are typically watched alone, stripping away the shared audience presence that makes concerts feel eventful. We investigate whether persona-based LLM audience agents can recreate aspects of this collective experience by generating real-time fan chat alongside a K-pop performance video. We present a multi-agent system in which ten LLM agents react through live-chat messages, comparing a persona-conditioned audience (each agent assigned a distinct fan identity, bias, and chat style) with a no-persona baseline. In a within-subjects pilot with K-pop fans (N=11), persona conditioning substantially improved model-level chat quality and perceived naturalness, but did not translate into differences in social connectedness, engagement, or affective response. Interviews suggest that online K-pop concert chat may operate as collective monologue rather than interpersonal dialogue, and that meaningful participation depends on shared identification with the specific artist and fandom. Persona conditioning can make LLM audiences appear more natural, but culturally meaningful collective experience may require deeper alignment between persona, crowd behavior, fandom identity, and user expectations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07837v1</guid>
      <category>cs.HC</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kirak Kim, Hyojin Kim, Yejin Son, Sungyoung Kim, Kyung Myun Lee</dc:creator>
    </item>
    <item>
      <title>A Divergence-Free Scott-Vogelius Finite Element Method for the Surface Stokes Problem</title>
      <link>https://arxiv.org/abs/2606.07840</link>
      <description>arXiv:2606.07840v1 Announce Type: new 
Abstract: We construct and analyze an exactly divergence-free Scott-Vogelius finite element method for the surface Stokes problem. The proposed scheme simultaneously enforces the tangentiality and incompressibility constraints exactly and has the same number of unknowns as the two-dimensional Euclidean discretization. Our construction extends the surface finite element framework of [10,11] to Scott-Vogelius discretizations defined on curved Clough-Tocher triangulations. In contrast to previous isoparametric Scott-Vogelius methods based on macro-element constructions, the present approach defines the finite element spaces directly on the refined surface triangulation, leading to a substantially simpler and more practical implementation. We prove inf-sup stability of the method and derive optimal-order convergence in the isoparametric regime.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07840v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yerim Kone, Michael Neilan, David Poling</dc:creator>
    </item>
    <item>
      <title>RACT: Retrieval Augmented Column-Table Learning and Prediction for Multi-Table Schema Matching</title>
      <link>https://arxiv.org/abs/2606.07843</link>
      <description>arXiv:2606.07843v1 Announce Type: new 
Abstract: Schema matching, a critical task for integrating data from diverse sources, seeks to identify correspondences between columns across different schemas. In multi-table holistic schema matching, columns with similar semantic meaning may reside in tables with different contexts due to heterogeneous schema designs, where similarity-based techniques are inadequate. The focus of this paper is incorporating referential context into schema matching by introducing RACT learning and prediction, a self-supervised framework enabling the probabilistic retrieval of candidate tables for source columns to constrain relevant column candidates. Experiments demonstrate that this approach outperforms similarity-based baselines on matching multi-table schemas. In subsequent matching experiments, constraining the column search space via top-t tables improves both average matching precision and completeness by up to +70%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07843v1</guid>
      <category>cs.DB</category>
      <category>cs.IR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Leonard Traeger, Enas Khwaileh, Andreas Behrend, George Karabatis</dc:creator>
    </item>
    <item>
      <title>GRPO Does Not Close the Multi-Agent Coordination Gap</title>
      <link>https://arxiv.org/abs/2606.07845</link>
      <description>arXiv:2606.07845v1 Announce Type: new 
Abstract: We measure how well current large language models coordinate as multiple agents sharing a common resource, using the dining philosophers problem as a clean test bed. Across 630 episodes spanning seven models and three philosopher counts, four frontier closed-source systems reach mean reward 0.45 to 0.87 and Mistral-Small 24B reaches 0.83 to 0.99, while Qwen3-14B reaches 0.13 to 0.35. We then ask whether group relative policy optimization (GRPO) on rollouts from the task itself can close the gap and find that it cannot: a Welch's t-test on per-episode reward at five philosophers gives p = 0.66 and a Hedges' g of -0.11, with no statistically significant change at ten or fifteen philosophers either. Two further observations qualify the result. The training reward of both 8B and 14B runs peaked at step nine and then declined, so the default saved checkpoint at step 15 is strictly worse than several earlier ones. The four-term reward we use admits a degenerate maximum at zero actions, which DeepSeek-R1-Distill-Qwen-7B and Mistral-Small 24B at five philosophers both inhabit, with mean reward 1.0 and 0.83 respectively at zero meals. The bottleneck for an open-weight 14B model on multi-agent coordination is not training compute but training methodology: reward shaping that does not collapse to a no-action maximum, checkpoint discipline that does not depend on the final step, and curriculum across problem scales.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07845v1</guid>
      <category>cs.MA</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Najmul Hasan, Prashanth BusiReddyGari</dc:creator>
    </item>
    <item>
      <title>Cost-Aware Speculative Execution for LLM-Agent Workflows: An Integrated Five-Dimension Method</title>
      <link>https://arxiv.org/abs/2606.07846</link>
      <description>arXiv:2606.07846v1 Announce Type: new 
Abstract: LLM-agent workflows chain model calls and tool invocations, and spend most of their wall-clock time waiting on upstream operations before downstream ones can start. Speculative execution can reclaim that idle time by launching a downstream operation with a predicted upstream input, but here each speculation costs real money (per-token billing) and its success probability is hard to estimate and drifts over time. This paper presents a method organized around five design decisions: (D1) start a downstream operation before its upstream completes; (D2) price each speculation in real dollars at separate input and output rates; (D3) expose a single operator dial for latency versus cost; (D4) decide via an expected-value rule with a failure-weighted cost term and a preference-adjusted threshold; and (D5) estimate the success probability with a Bayesian Beta-Binomial posterior whose prior is keyed to a dependency-type taxonomy. Variants of these ideas appear in recent work; the combination, with every decision logged in dollars, is what is new. The rule fires only on edges passing an admissibility precondition (side-effect-free, idempotent, or stageable behind a commit barrier), since a wrong speculation is rolled back by re-execution, which refunds tokens but cannot un-send an irreversible side effect. We specify the runtime mechanics, a closed-form result that the rule self-limits as the upstream branching factor grows, a five-stage calibration pipeline (offline replay, shadow, canary, online calibration, drift-triggered kill-switch), and a workload-fit rubric over eight production archetypes. Contrast tables against the four closest published systems (DSP, Speculative Actions v2, Sherlock, B-PASTE) show differentiators on every dimension, and a synthetic validation suite confirms the predicted decision boundary, probability threshold, posterior recovery, and streaming-cancellation behavior.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07846v1</guid>
      <category>cs.DC</category>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Faisal Fareed</dc:creator>
    </item>
    <item>
      <title>Beyond English benchmarks: clinical LLM evaluation in Brazilian Portuguese</title>
      <link>https://arxiv.org/abs/2606.07853</link>
      <description>arXiv:2606.07853v1 Announce Type: new 
Abstract: Large Language Models are transforming clinical decision support and its application in real scenarios. Yet most benchmarks are conducted in English, and cross-lingual evaluation is needed to tackle the language gaps in global access. We introduce ClinicalBr, the first bilingual benchmark for clinical decision support built from real Brazilian case reports. The corpus contains 2,892 cases drawn from 28 SciELO medical journals, spanning 18 specialties, and is structured as parallel Portuguese-English pairs. Each case supports four evaluation tasks: diagnosis retrieval, differential diagnosis, exam recommendation, and treatment planning. We evaluate four models (MedGemma-27B, Sabiá-4, DeepSeek-R1, and o3-mini) across both languages. The central finding is that the Portuguese-English performance gap is task-dependent, not general. In diagnosis retrieval, English yields a consistent advantage across all models, with +7.5-12.1 accuracy points. This advantage disappears in differential diagnosis, exam recommendation, and treatment planning, where confidence intervals cross zero for most models and Portuguese completeness scores are marginally higher. Brazilian-endemic conditions proved easier than the full corpus, not harder, indicating that tropical presentations are adequately represented in current pre-training. Exam recommendation was the hardest task across all models and both languages, with F1 scores below 0.10, well below the differential diagnosis ceiling of 0.20-0.27.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07853v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Giordano de Pinho Souza, Glaucia Melo, Josefino Cabral Melo Lima, Daniel Schneider</dc:creator>
    </item>
    <item>
      <title>Path Planning Using Deep Deterministic Policy Gradient: A Reinforcement Learning Approach</title>
      <link>https://arxiv.org/abs/2606.07855</link>
      <description>arXiv:2606.07855v1 Announce Type: new 
Abstract: Path planning for autonomous vehicles in threat-laden environments is a fundamental challenge because the problem is nonlinear and nonconvex even in the simplest scenarios. While traditional optimal control methods can be used to find ideal paths, their computation is often too slow for real-time decision-making. To address this challenge, we propose a method based on Deep Deterministic Policy Gradient (DDPG) and model the threat as one or more circular 'no-go' zones. A mission is regarded as a failure if the vehicle enters a restricted zone at any time or does not reach a neighborhood of the destination. The DDPG agent is trained through trial and error in a simulated environment, learning a direct mapping from its current state (position and heading) to a series of feasible actions that guide the agent to safely reach its destination. The reward function has three parts: (a) an attractive field centered at the final destination, (b) repulsive fields centered at the centers of the circular obstacles, and (c) a penalty on control energy consumption (the magnitude of heading change) that indirectly favors straight paths. The DDPG trains the agent using these incentives to find the largest possible set of starting points from which a safe path to the destination is guaranteed. This provides critical information for pre-mission planning, showing beforehand whether a task is achievable from a given starting point. The approach is validated in simulation. A comparison between the DDPG method and a traditional optimal control (pseudo-spectral) method is carried out. The results show that the learning-based agent produces effective paths while being significantly faster, making it a better fit for real-time applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07855v1</guid>
      <category>cs.RO</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Qiang Le, Yaguang Yang, Isaac E. Weintraub</dc:creator>
    </item>
    <item>
      <title>Teacher-Free Self-Training Amplifies but Does Not Compound: A Pass@$K$ Crossover on a Free-Verifier Domain</title>
      <link>https://arxiv.org/abs/2606.07856</link>
      <description>arXiv:2606.07856v1 Announce Type: new 
Abstract: When a language model trains on its own verified outputs, does it acquire capability beyond its base, or merely get better at expressing capability the base already had? We make the question decidable with a teacher-free "constellation" -- a generator, a learned critic, and a free exact verifier -- on a FlashFill-style "trapdoor" DSL, where verified (problem, solution) pairs are cheap to synthesize, hard to invert, and free to check exactly. Everything runs on one 4-bit Qwen3-4B on a single 24 GB GPU, with no model in the loop larger than the base. We report three findings. (i) Critic-guided selection beats verifier-filtered best-of-$k$ by $+9.1$ pp ($6/6$ seeds), with the entire gain localized to tasks where candidates disagree on held-out inputs. (ii) Per-round STaR self-training raises the ceiling but never accelerates -- the gain tracks remaining headroom and decelerates across $K=4$ independent training trajectories. (iii) The domain has no clean zero-capability frontier, so the usual "$0\% \to$ climb $=$ emergence" test is invalid here. A measured pass@$K$ crossover settles the diagnosis: the trained model wins at the operating budget (pass@$8$) but the base overtakes it at a large budget (pass@$64$) on every trajectory, so self-training concentrates probability mass rather than expanding reach. This is amplification, not compounding. ($K=4$ is indicative, not yet a robust across-trajectory CI.)</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07856v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>Igor Lima Strozzi</dc:creator>
    </item>
    <item>
      <title>Model Multiplicity for Adversarial Detection in Small Language Model Training on Edge Devices</title>
      <link>https://arxiv.org/abs/2606.07857</link>
      <description>arXiv:2606.07857v1 Announce Type: new 
Abstract: The rise of edge-based machine learning has enabled distributed adaptation of language models across mobile and IoT devices, offering privacy preservation and real-time responsiveness. However, distributed fine-tuning of language models on untrusted or heterogeneous edge nodes introduces new vulnerabilities. Compromised or unreliable devices can inject poisoned updates, leading to stealthy model manipulation or convergence degradation. Classical defenses such as robust aggregation or temporal anomaly detection operate on a single global model and are therefore limited in detecting coordinated or persistent poisoning. This work proposes a new system-level defense based on model multiplicity. Instead of maintaining one global model, the system rotates or concurrently trains multiple small language models (e.g., DistilGPT-2), each updated by independently sampled subsets of edge nodes. These models evolve under distinct training trajectories, creating multiple independent views of the same distributed population. Divergence between models, quantified through gradient similarity, loss evolution, or parameter variance, serves as a signal of anomalous or adversarial behavior. When one model deviates significantly from the ensemble mean, the system flags its contributing nodes for isolation or re-weighting. We implement this framework and evaluate it on edge-scale simulations of Small Language Model (SLM) training under varying heterogeneity and attack conditions. Results show that model multiplicity enables earlier and more reliable detection of poisoning compared to classical single-model defenses such as Flanders and Robust methods. Our findings demonstrate that diversity in model evolution can serve as a practical and effective defense mechanism for secure distributed learning on resource-constrained edge devices.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07857v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Stefan Behfar, Richard Mortier</dc:creator>
    </item>
    <item>
      <title>A Preliminary Model for Managing Technical Debt in an Agile Environment</title>
      <link>https://arxiv.org/abs/2606.07859</link>
      <description>arXiv:2606.07859v1 Announce Type: new 
Abstract: This paper presents a preliminary model for managing involuntary technical debt in agile environments by formulating, in an integrated way, the dynamics among backlog, debt, velocity, and economic value. The work distinguishes initiated but unfinished functional debt from a simple defect backlog and from rework, interprets productivity degradation as technical-debt interest, and derives the naive maximum-remediation policy in order to show its limitations against an intertemporal value-based decision. On this basis, a dynamic policy $u_k$ is proposed to balance new development and remediation; a decreasing marginal-value structure is incorporated; and the model is extended to discrete, inhomogeneous items. Exploratory validation through sensitivity analysis and Monte Carlo simulation shows behavior consistent with the economic intuition of the model. Finally, the limits of the formulation are made explicit: its macroscopic nature, its dependence on organizationally stable parameters, its assumption of intertemporal rationality, and its requirement of weak coupling among stories.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07859v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pedro E. Colla</dc:creator>
    </item>
    <item>
      <title>Data Profiling for Change Rules</title>
      <link>https://arxiv.org/abs/2606.07860</link>
      <description>arXiv:2606.07860v1 Announce Type: new 
Abstract: Understanding data change is critical towards understanding trends, normal vs. abnormal behaviours, recognizing patterns, and the causes of change. Existing database systems have limited support for change management, relying on statistics, triggers, and constraints. Data quality rules model sequential changes along a restricted set of attributes, quantify change among unordered tuples, and have limited ability to model the context under which attribute changes occur. In this paper, we introduce Change Rules (CRs) that quantify the sequential changes among ordered tuples in both the antecedent and consequent attributes. CRs aim to address the limitations of existing declarative dependencies to support trend analysis and causal relationships that trigger change among attributes. We propose CR-Miner, an automated algorithm for CR discovery that generates candidate change intervals in a level-wise manner. Experimental results show that CR-Miner achieves an average runtime improvement of 40-50% over existing baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07860v1</guid>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Nishttha Sharma, Fei Chiang</dc:creator>
    </item>
    <item>
      <title>The Last Visible Pixel: Probing Fine-Scale Perception in Vision-Language Models</title>
      <link>https://arxiv.org/abs/2606.07861</link>
      <description>arXiv:2606.07861v1 Announce Type: new 
Abstract: Recent vision-language models (VLMs) excel at multimodal understanding and reasoning, yet their fine-grained visual perception remains underexplored. A natural extension of "How many r are there in Strawberry?" asks: how small a visual pattern can a VLM reliably perceive? To answer this, we introduce FineSightBench, a new benchmark that systematically probes this limit by separating perception tasks (pixel-level recognition of letters, shapes, objects) from reasoning tasks (spatial reasoning, counting, ordering over small targets) across controlled scales of 4--48px. Through comprehensive experiments and detailed failure mode analysis on state-of-the-art models, we reveal a sharp dissociation: perception saturates around 12px, while reasoning remains limited even at larger scales, with persistent numeracy and sequence errors. These findings expose fundamental deficiencies in VLMs' fine-scale visual reasoning that demand more rigorous evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07861v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Lujun Li, Lama Sleem, Niccolo Gentile, Yangjie Xu, Yewei Song, Wenbo Wu, Radu State</dc:creator>
    </item>
    <item>
      <title>Instrumented data for causal scientific machine learning</title>
      <link>https://arxiv.org/abs/2606.07865</link>
      <description>arXiv:2606.07865v1 Announce Type: new 
Abstract: Scientific machine learning is limited less by model size than by the data it is trained on. Observational data records what happened but not why; template synthetic data has a known generating process but only for the simulator's template, not the case a user faces. We argue a third option is now operationally feasible: instrumented data, in which every datum carries the mechanistic model that produced it, an explicit uncertainty over that model, and an executable family of counterfactuals. Verification-and-validation (V&amp;V) instrumented image-to-simulation pipelines are one realisation: a sensor observation becomes a fully specified, solver-backed simulation with explicit, editable parameters and a propagated aleatoric/epistemic uncertainty. The substrate is case-specific, mechanistically supervised, and supports causal interventions through Pearl's do-operator. Near-term consequences for validation, auditing, and surrogate training span computational biology, climate, materials, fluid mechanics, and medical imaging; a longer-term, falsifiable implication concerns foundation models for scientific reasoning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07865v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>physics.comp-ph</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Daniel N. Wilke</dc:creator>
    </item>
    <item>
      <title>Overcoming the Regulatory Bottleneck via Agent-to-Agent Protocols: A Nuclear Case Study</title>
      <link>https://arxiv.org/abs/2606.07866</link>
      <description>arXiv:2606.07866v1 Announce Type: new 
Abstract: Regulatory review of advanced nuclear reactor designs routinely spans more than three years and consumes hundreds of millions of dollars in combined regulator and applicant labor. We present the Regulatory Context Protocol (RCP), an Agent-to-Agent communication standard that replaces the formal human-to-human pipeline between regulators and applicants with a structured, auditable agentic channel, while preserving human oversight at safety-significant decision points. The protocol is calibrated against an analysis of 1,236 documents from U.S. Nuclear Regulatory Commission advanced reactor dockets and demonstrated with a working multi-agent pilot. Against an 89M USD, 42-month Reconstructed Baseline, RCP cuts costs by 50-77 percent (21M-44M USD) and timelines by 65 percent (15 months). Without a shared protocol, Standalone Agents reach only 54M-74M USD and 21 months. The residual cost-and-time gap is structural, not algorithmic: it traces to the inter-organizational pipeline that only an agent-to-agent standard can compress. The same bottleneck - formal multi-party review under strict auditability requirements - characterizes pharmaceutical approvals, environmental permitting, financial supervision, and aviation certification. The US regulatory paperwork burden carries a 426.5 billion USD annual opportunity cost; replicated broadly, the projected 50-77 percent reduction implies savings on the order of 210-330 billion USD per year - approaching 1 percent of US GDP.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07866v1</guid>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Akshay J. Dave, David Grabaskas, Joseph A. Renevitz, Richard B. Vilim</dc:creator>
    </item>
    <item>
      <title>The Cold-Start Safety Gap in LLM Agents</title>
      <link>https://arxiv.org/abs/2606.07867</link>
      <description>arXiv:2606.07867v1 Announce Type: new 
Abstract: Are tool-calling LLM agents equally safe throughout a conversation? We discover they are not: agents are most vulnerable at the very start of a session and become substantially safer after a few regular agentic tasks -- a phenomenon we term the cold-start safety gap. To study this systematically, we introduce Safety Over Depth for Agents (SODA), a benchmark that controls how many regular agentic tasks the agent completes before encountering a safety threat, supporting up to 20 preceding tasks. Evaluating 7 models from 4 families, safety improves by 9--52% as the number of preceding regular agentic tasks increases from zero to twenty. Representation analysis confirms that model hidden states gradually shift toward a safety-aligned region as more preceding tasks are present. By systematically studying which part of the preceding conversation matters most, we find that the regular agentic tasks themselves are the primary driver of safety, while the agent's own prior responses have less effect on safety but are essential for preserving later utility. This conclusion is further supported by evaluation on open-source safety benchmarks (AgentHarm, Agent Safety Bench) and utility benchmarks (BFCL, API-Bank), confirming that warming up the agent with regular agentic tasks before deployment makes it safer and preserves full capability. Based on these findings, we recommend a simple deployment strategy: having the agent complete a few regular agentic tasks before possible exposure to safety-critical requests mitigates the cold-start safety gap. Our code is available at https://github.com/Trustworthy-ML-Lab/Agent-Cold-Start-Safety-Gap</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07867v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chung-En Sun, Linbo Liu, Tsui-Wei Weng</dc:creator>
    </item>
    <item>
      <title>ASH: Asymmetric Scalar Hashing With Learned Dimensionality Reduction for High-Fidelity Vector Quantization</title>
      <link>https://arxiv.org/abs/2606.07870</link>
      <description>arXiv:2606.07870v1 Announce Type: new 
Abstract: For a long time, additive quantizers, such as product quantization, have been considered the gold standard in terms of accuracy and efficiency. Recently, scalar quantization has re-emerged from the depths of history with a new wave of data-agnostic techniques. Inscribed in this general framework, we turn our attention to data-driven methods, showing that new highs in recall and speed can be achieved by reducing the number of dimensions while increasing the bitrate per dimension. Critically, this dimensionality reduction needs to be learned from data to be successful. We present ASH (Asymmetric Scalar Hashing), a data-driven encoder-decoder framework that applies dimensionality reduction to database vectors via a learned orthonormal projection, followed by scalar quantization, while keeping queries in their original form. This asymmetric design enables higher accuracy than the best additive and scalar quantizers at iso-compression, while admitting highly efficient similarity computations via SIMD operations. ASH has short learning and encoding times, making it attractive for real-world deployment. Extensive experiments on a variety of datasets demonstrate that ASH achieves state-of-the-art ANN recall and speeds across all compression regimes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07870v1</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mariano Tepper, Theodore Willke</dc:creator>
    </item>
    <item>
      <title>VisualFLIP: Do Predictions Depend on Task-Critical Visual Evidence in Multimodal Reasoning?</title>
      <link>https://arxiv.org/abs/2606.07872</link>
      <description>arXiv:2606.07872v1 Announce Type: new 
Abstract: When a multimodal large language model answers a visual reasoning question correctly, is the prediction actually supported by the task-critical visual evidence? Correct answers can coexist with flawed reasoning, making accuracy alone an incomplete test of grounding. We introduce VisualFLIP, a paired benchmark with 1,374 images arranged as same-question perturbation pairs across cardinality, attribute, spatial, and logic tasks. Each pair keeps the question fixed but minimally changes the evidence so the gold answer deterministically flips. We evaluate 24 MLLMs with pair accuracy, which requires solving both sides of a pair, and Collapse Rate (CR), which measures how often a model that solves at least one side repeats the same non-empty answer for both images. Together, these metrics show that paired correctness and evidence dependence are related but distinct: capable models can still fail to update after task-critical visual changes, and collapse becomes more severe for some models when the edited image follows an earlier answer in a sequential setting. Further details are available on our project page: https://didizhu-judy.github.io/VisualFLIP/</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07872v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Didi Zhu, Changrui Chen, Stefanos Zafeiriou, Jiankang Deng</dc:creator>
    </item>
    <item>
      <title>Adverse Effects of V2V Adoption on Road Safety</title>
      <link>https://arxiv.org/abs/2606.07873</link>
      <description>arXiv:2606.07873v1 Announce Type: new 
Abstract: Vehicle-to-vehicle (V2V) communication is expected to improve road safety and reduce congestion. However, prior work shows that V2V information sharing under partial adoption may increase congestion and decrease safety. We study whether increasing V2V adoption itself affects road safety. We propose a corrected version of an existing model and analyze its behavior under varying adoption levels. We show that, in some cases, increased V2V adoption can increase accident probability. Moreover, under an optimal signaling policy, the system can ensure that accident probability is non-increasing in the adoption level.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07873v1</guid>
      <category>cs.GT</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhenqi Liu, Philip N. Brown, Keith Paarporn</dc:creator>
    </item>
    <item>
      <title>Safety is Contextual, LLM-Judges Are Not: Navigating the Rigid Priors of Evaluators</title>
      <link>https://arxiv.org/abs/2606.07874</link>
      <description>arXiv:2606.07874v1 Announce Type: new 
Abstract: LLMs-as-judges are the only way to evaluate safety at scale. Despite their importance, LLM-judges themselves are rarely evaluated beyond human agreement in simple, static benchmarks. We therefore investigate two under-explored but crucial properties of LLMs-as-judges: their susceptibility to relying on in-context information, and their steerability to differing safety definitions, which may not align with their internal safety priors. We evaluate the safety judging abilities of many generalist LLMs and safety-specific judges, and investigate the impact of task demonstrations, novel in-context information, and changing safety definitions. We find that while LLM-judges can learn from new information, they are broadly unlikely to adjust their evaluations if the context or safety definition contradicts their prior.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07874v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anissa Alloula, Federico Licini, Ava Batchkala, Seraphina Goldfarb-Tarrant</dc:creator>
    </item>
    <item>
      <title>Whose Norms? Disentangling Cultural and Personal Alignment in Large Language Models</title>
      <link>https://arxiv.org/abs/2606.07877</link>
      <description>arXiv:2606.07877v1 Announce Type: new 
Abstract: Large language models are increasingly used for social decision-making situations that require balancing cultural norms with personal preferences. For example, a user preferring honesty might ask whether to correct a coworker publicly when local norms favor indirect feedback. Yet existing research studies cultural alignment and personalization largely separately. We introduce PACT, the Personal-Preference and Cultural-Norm Trade-off framework, which evaluates whether models choose to follow a cultural norm or allow personal preferences. We find that LLMs vary in how rigidly they enforce cultural norms, with behavior shifted more by country context (7.8%) than age (1%) and gender (0.7%) and shifting non-uniformly after instruction tuning. Furthermore, our five-country human study on PACT shows that culture-following in humans is mainly driven by scenario country, with the lowest agreement when participants judge their own cultural contexts, showing within-culture pluralism. Finally, human-LLM alignment experiments show that models can match majority choices, but fail to capture response distributions and uncertainty (with best correlations reaching only 0.24). Together, these findings motivate alignment evaluations that go beyond majority to capture cultural pluralism and disagreement in social judgment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07877v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Angana Borah, Isabelle Augenstein, Rada Mihalcea</dc:creator>
    </item>
    <item>
      <title>Still: Amortized KV Cache Compaction in a Single Forward Pass</title>
      <link>https://arxiv.org/abs/2606.07878</link>
      <description>arXiv:2606.07878v1 Announce Type: new 
Abstract: The KV cache is the memory bottleneck of long-horizon language model deployment. Practically, a deployable compactor must be lightweight enough to call during inference, expressive enough to preserve context under constraint, and reusable across a trajectory. Existing compaction methods satisfy only part of this requirement: selection methods are lightweight but subset-bound, while synthesis methods are expressive but rely on per-context optimization. Here we introduce Still, a small per-layer Perceiver trained once against a frozen base model that produces compact keys and values in a single forward pass. On Qwen and Gemma models, Still occupies the favorable side of the speed--quality frontier across compression ratios from $8\times$ to $200\times$ and context lengths from $8$k to $128$k. On the long-context RULER grid, Still exceeds the strongest baseline by 8--22 points. The same compact cache also supports free-form summarization, preserving most of the full-context gain on HELMET and winning a pairwise LongBench summarization comparison against KV-Distill. Because compaction is a forward pass, Still can be applied iteratively, entering a long-horizon regime unavailable to per-context methods. We show that amortization makes long-context cache compaction tractable, and synthesis makes its compact state useful at extreme compression.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07878v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Charles O'Neill, Alex Sandomirsky, Harry Partridge, Mudith Jayasekara, Max Kirkby</dc:creator>
    </item>
    <item>
      <title>Breaking the Bubble: Asynchronous Pipeline Parallel Training with Bounded Weight Inconsistency</title>
      <link>https://arxiv.org/abs/2606.07881</link>
      <description>arXiv:2606.07881v1 Announce Type: new 
Abstract: Pipeline parallelism is essential for training large neural networks, but existing schedules trade off throughput, memory, and optimization consistency. Synchronous pipelines preserve forward/backward weight consistency but suffer from bubbles; asynchronous pipelines remove bubbles but introduce weight-version mismatch, typically requiring weight stashing, prediction, or correction mechanisms. We introduce PACI (Pipeline Asynchronous training with Controlled Inconsistency), a bubble-free asynchronous pipeline method that bounds forward/backward version drift without weight stashing, prediction, additional parameter copies, or global synchronization. The key idea is to use local gradient accumulation as a version-control mechanism: by slowing parameter-version evolution relative to pipeline delay, PACI limits the number of optimizer updates crossed by any micro-batch while preserving steady-state utilization. In GPT-style language-model pretraining, PACI matches the stability and final perplexity of synchronous 1F1B-flush, retains the same peak memory footprint, achieves fully utilized pipeline throughput, and improves training time-to-accuracy by up to $1.69\times$ over the fastest flush baseline. These results show that forward/backward inconsistency need not be eliminated: when explicitly bounded, it can be safely traded for substantial efficiency gains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07881v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Itay Elam, Eliron Rahimi, Avi Mendelson, Chaim Baskin</dc:creator>
    </item>
    <item>
      <title>The Cross-Architecture Substrate: A Domain-Transcendent, Calibration-Surviving Geometric Invariant of Modern Vision Encoders</title>
      <link>https://arxiv.org/abs/2606.07882</link>
      <description>arXiv:2606.07882v1 Announce Type: new 
Abstract: Different vision neural networks -- trained to classify, contrast, reconstruct, or match images to text -- should have correspondingly different internal representations. We report that they do not. After training, the top sixteen principal directions of variation inside thirteen modern vision encoders converge to the same sixteen-dimensional geometric object. We call this the cross-architecture substrate and study it with PCA, centred kernel alignment (CKA), and Pang 2026 calibration. The substrate transports across four visual domains (natural photographs, medical CT, satellite, microscopy) at median Procrustes-CKA 0.679, and across eight domains (adding sketches, depth, thermal infrared, astronomy) at 0.604, every pair &gt;0.40. It survives Pang calibration globally (7.4x disc-vs-MAE separation, n=13,394) and locally (4.82-5.30, p&lt;10^{-44}). It is not pixel statistics (0.263), not Gabor features (0.31), not a random projection (0.041), and emerges in the first 10% of training while accuracy keeps climbing. We deliver four applications: a label-free transferability filter beating LogME (3x faster, +0.15 Kendall-tau); a four-way domain detector (99.6% accuracy); a frozen low-shot probe (16 dims beat 768-dim DINOv2 by 3.78pp at N=50 labels per class); and a teacher-free distillation auxiliary matching trained-teacher KD on 33 pairs (7.56pp peak gain at 10% label fraction). The substrate does not cross modalities, does not help cross-paradigm distillation, and does not predict transfer quality (rho=0.08 against transfer accuracy).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07882v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yousef Radwan</dc:creator>
    </item>
    <item>
      <title>DP4SQL: Differentially Private SQL with Flexible Privacy Policies</title>
      <link>https://arxiv.org/abs/2606.07883</link>
      <description>arXiv:2606.07883v1 Announce Type: new 
Abstract: The plausible deniability model of differential privacy for single-table datasets is well-understood. However, applying differential privacy to relational databases is much trickier: each application needs flexibility in specifying the pieces of information about an entity, spread across multiple relations, that require plausible deniability guarantees. Existing differentially private SQL systems only support rigid privacy policies. Even seemingly small changes, such as specifying that some tables need to protect the existence of records while others only need to protect the record contents, require significant manual effort in updating their privacy accountants and proving their correctness.
  One example of a challenge is the presence of partially public data. Public columns in a table (e.g., faculty names in a university dataset and partial course enrollment information) can cause some queries to require more noise (compared to fully private data), while others require less noise. This kind of reasoning is not supported in existing systems. Another example is when different parts of records (e.g., demographics, financial data) require different levels of privacy protection. Again, existing differentially private SQL systems need to rewrite their rules for calculating query stability in order to support such a feature. This paper presents DP4SQL, a differentially private SQL system that allows data curators to better customize the plausible deniability requirements for their relational databases. This avoids the drawbacks of the "one-size-fits-all" systems that would either underprotect the data or inject too much noise into query answers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07883v1</guid>
      <category>cs.CR</category>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Andrew Cascio, KinChin Tong, Daniel Kifer, Zeyu Ding, Danfeng Zhang</dc:creator>
    </item>
    <item>
      <title>Value-Refined Modal Fixed-Point Semantics with Certified Choice and Public Share-Alike Certificates</title>
      <link>https://arxiv.org/abs/2606.07884</link>
      <description>arXiv:2606.07884v1 Announce Type: new 
Abstract: This paper presents a finite modal semantics where truth is closed under admissible continuation, then refined by discounted value, and finally certified by residual tests. The admissibility kernel is the classical greatest fixed point of a one-step predecessor expressing that some choice cell has all compatible successors inside a set. Certified choices are exactly local witnesses; the discounted value transformer is defined only over those witnesses; value-refined modal bisimulation is the coarsest local equivalence preserving formulas, kernel, certified choices, Bellman values, greedy sets, residual certificates, and public release certificates. A canonical pseudometric refines this equivalence: it is the unique fixed point of a Hausdorff-lifted choice-matching transformer over certified choices; its zero set is the value-refined bisimulation, and the optimal discounted value is one-Lipschitz with respect to it. Any approximate quotient incurs only a distance-bounded value error. Branching choice-cell and locus presentations place choice inside the model; the transition presentation is a conservative retraction. The same engine is applied to a public share-alike release fragment: attribution as label preservation, same-license propagation as derivative closure, no downstream restriction as admissibility, and the BY-SA witness as a residual-stable certificate. Finite examples show that altering the order of truth, admissibility, value, quotienting, public derivation, and certification changes the semantics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07884v1</guid>
      <category>cs.LO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Faruk Alpay, Levent Sarioglu</dc:creator>
    </item>
    <item>
      <title>Strained Coherence: A Pre-Failure Signal in Coding Agent Execution Trajectories</title>
      <link>https://arxiv.org/abs/2606.07889</link>
      <description>arXiv:2606.07889v1 Announce Type: new 
Abstract: LLM-based coding agents sometimes acknowledge a problem in their own reasoning and then proceed anyway. We call this pattern strained coherence: a safety-relevant failure mode in which an agent has information that should change its behavior, states that information, and still acts against it. The pattern overlaps with verbalized reward hacking, where an agent names a tension between a task proxy and the underlying goal yet optimizes the proxy anyway. We give an operational definition, build a Claude Sonnet 4.6 judge that reads full trajectories and flags spans where the pattern occurs, and evaluate it on 44 Terminal-bench-2 trajectories using a Qwen3.5-35B-A3B backbone. Flagged trajectories fail 94% of the time versus 46% for unflagged trajectories (47-point gap, Fisher's exact p = 0.003; 46 points after excluding three prompt-embedded examples, p = 0.006). At matched selectivity, the detector reaches 94% precision versus 88% for a lexical discourse-marker baseline; the 10-trajectory intersection of the two methods has a 100% failure rate (Clopper-Pearson 95% CI [69%, 100%]). We replicate on Gemma4-31B with 43 trajectories: the overall signal is directionally consistent but not significant (20-point gap, p = 0.31), with attenuation driven largely by 13 trajectories with zero think content, where the detector has no substrate to analyze. In the high-verbosity Gemma tertile, the gap is +30 points; in the mid- and high-verbosity Qwen tertiles, it is +40 points each. The first flag appears at a median of 83-84% of elapsed trajectory time across both models, and the binary flag survives paraphrases that soften explicit conflict markers (8/8 trajectories). Unlike univariate predictors, the detector emits interpretable span-level output -- quoted acknowledgment, quoted action, and typed conflict -- showing what the agent saw and ignored.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07889v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Marut Pandya, Kasey Zhang, Baiqing Lyu</dc:creator>
    </item>
    <item>
      <title>Partially Performative Prediction</title>
      <link>https://arxiv.org/abs/2606.07890</link>
      <description>arXiv:2606.07890v1 Announce Type: new 
Abstract: Performative prediction studies feedback loops that arise when predictive models are deployed in consequential domains. In these settings, deploying a model can change the population whose patterns the model aims to predict, inducing a distribution shift that is endogenous to the learning system. This perspective departs from classical treatments of distribution shift, where shifts are typically modeled as exogenous changes in the data-generating process. Yet, in practice, distribution shift is rarely one or the other. Predictive models may influence future data through the decisions they support, while the world itself continues to drift for reasons beyond the learner's control. We study partially performative prediction, a framework that captures both endogenous and exogenous sources of distribution shift. The framework generalizes performative prediction by allowing the data distribution to evolve both in response to the deployed model and according to an external, time-varying process. We extend the central notions of performative stability and performative optimality to this setting by defining their online analogues that track the evolving partially performative environment. We analyze practical learning heuristics, including repeated retraining, and characterize when they successfully adapt to partially performative environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07890v1</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jaewook Lee, Tijana Zrnic</dc:creator>
    </item>
    <item>
      <title>C3VD-DEFCOL: A Deformable Colonoscopy Dataset with Time-Resolved 3D Ground Truth and Realistic Appearance</title>
      <link>https://arxiv.org/abs/2606.07891</link>
      <description>arXiv:2606.07891v1 Announce Type: new 
Abstract: 3D reconstruction could improve colonoscopy by estimating mucosal coverage and alerting clinicians to missed regions during screening. However, algorithm development is limited as no current datasets provide both a realistic in vivo appearance and dense, time-resolved 3D ground truth, especially under non-rigid deformation. We present C3VD-DEFCOL, a framework and dataset for evaluating deformable colonoscopy reconstruction with paired geometry and realistic texture. Starting from C3VD/C3VDv2 colon meshes and camera trajectories, we generate controlled deformations of the colon surface, including peristaltic waves and centerline motion, and render per-frame depth, surface normals, optical flow, camera poses, and time-stamped 3D meshes. We then use the rendered geometry, primarily depth, to condition an LTX-2.3-based sim-to-real translation model that produces RGB clips with in vivo-like mucosal color, texture, vasculature, and specular appearance while preserving the underlying 3D scene structure. The resulting dataset contains 110 videos from 11 unique colon mesh geometries, with varying camera trajectories, appearances, and parameterized deformation regimes, including three peristaltic severity levels that serve as controlled evaluation axes. We evaluate the generated videos using appearance realism, geometric consistency, and temporal consistency metrics, and use the paired ground truth to benchmark the downstream task of pose estimation in deformable 3D reconstruction. Our experiments show how pose estimation error increases with increasing deformation severity, providing a controlled stress test that is not possible with existing in vivo datasets. Overall, C3VD-DEFCOL is designed as a reproducible, quantitative evaluation platform for testing deformable 3D reconstruction algorithms, with the goal of reducing the domain gap between synthetic datasets and in vivo colonoscopy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07891v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ethan Luk, Mayank V. Golhar, Anthony Song, Raúl Iranzo, Víctor M. Batlle, Lalithkumar Seenivasan, José M. M. Montiel, Nicholas J. Durr</dc:creator>
    </item>
    <item>
      <title>Beyond Individual Personas: Aligning Synthetic Dialogue to Population-Level Behavior Distributions</title>
      <link>https://arxiv.org/abs/2606.07893</link>
      <description>arXiv:2606.07893v1 Announce Type: new 
Abstract: Synthetic dialogue corpora are increasingly used as proxies for target dialogue data, yet persona-grounded generators optimize individual conversations rather than corpus composition, yielding locally plausible dialogues with distorted population-level behavior mixes. We introduce GroupPersona, a framework that aligns synthetic dialogue corpora to the behavior distribution of a reference corpus. GroupPersona turns population statistics into generation controls: it separates each dialogue's core behavioral signature from predictable side effects, and uses the resulting behavioral groups to condition user agents on the interaction patterns that define the reference population. We evaluate GroupPersona on four corpora crossing two dialogue sources, assistant-style and Reddit-derived, with two construction variants: structure-preserving and variation-enhanced. GroupPersona lowers Jensen-Shannon divergence between synthetic and reference distributions over 12 behavior attributes from 0.234 to 0.177 relative to the strongest average baseline, a 24.4% reduction, and is best or tied-best on all four corpora while preserving structural alignment. It also achieves the closest calibration to reference-conversation quality scores, reducing mean absolute deviation from the reference-conversation profile to 0.63 versus 0.91 for the next-best baseline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07893v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xinyi Liu, Rinat Khaziev, Hooshang Nayyeri, Emine Yilmaz, Charith Peris, Hari Thadakamalla</dc:creator>
    </item>
    <item>
      <title>DD-GEPA: Prompt Optimization for Dialogue Disentanglement Focusing on Task Instruction and Utterance Representation</title>
      <link>https://arxiv.org/abs/2606.07894</link>
      <description>arXiv:2606.07894v1 Announce Type: new 
Abstract: Multi-party chat often contains interleaved dialogues because multiple participants can discuss different topics at the same time. Dialogue disentanglement addresses this problem by separating an entangled utterance sequence into coherent dialogues. While large language models (LLMs) are promising for this task, they still struggle with dialogue disentanglement and achieve low accuracy. This paper proposes an automatic prompt optimization method for LLM-based dialogue disentanglement. We decompose the prompt into three components: task instruction, utterance representation, and output instruction, and optimize them using GEPA, an optimization method for compound AI systems. Experiments on benchmark datasets show that the optimized prompts improve dialogue disentanglement accuracy over the original prompts and can surpass hand-crafted prompts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07894v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Naoki Takada, Tatsunori Mori</dc:creator>
    </item>
    <item>
      <title>TBD-VLA: Temporal Block Diffusion Vision Language Action Model</title>
      <link>https://arxiv.org/abs/2606.07895</link>
      <description>arXiv:2606.07895v1 Announce Type: new 
Abstract: Discrete Vision-Language-Action (VLA) models typically formulate action generation as next-token prediction over discretized action spaces, conditioning each token autoregressively on prior context. While effective, this paradigm incurs high inference latency and largely ignores the temporal structure inherent in action trajectories. Recent efforts introduce parallel decoding to improve efficiency, enabling faster inference, but lack explicit mechanisms for modeling token dependencies. We introduce TBD-VLA, a discrete token-based VLA framework that incorporates block diffusion to enable temporal action generation. We partition action sequences into temporal blocks and perform masked discrete diffusion within each block, while maintaining autoregressive generation across blocks. This design unifies temporal autoregression and parallel action decoding, achieving both strong temporal coherence and improved inference speed. In addition, the explicit temporal modeling enables asynchronous execution of action chunks (e.g., Real-Time Chunking) via temporal in-painting. TBD-VLA significantly outperforms prior VLA approaches in both simulation and real-world manipulation tasks, offering a scalable path toward fast, temporally aware, discrete VLA models. Project webpage: https://tbd-vla.github.io/</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07895v1</guid>
      <category>cs.CV</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sung-Wook Lee, Xuhui Kang, Yen-Ling Kuo</dc:creator>
    </item>
    <item>
      <title>The AI Epistemic Deference Index: A Continuous Measure of Sycophancy</title>
      <link>https://arxiv.org/abs/2606.07897</link>
      <description>arXiv:2606.07897v1 Announce Type: new 
Abstract: Current AI models frequently exhibit epistemic sycophancy, endorsing claims to agree with a user. Existing evaluations typically measure this either by assessing what it takes to make a model shift a binary endorsement or by eliciting an explicit probability in a proposition. However, much user-facing sycophantic behavior is demonstrated through shifts in graded support expressed through ordinary language. We propose the AI Epistemic Deference Index (AEDI): a continuous, unidimensional score representing how sensitive the support expressed in a model's output is to the attitude expressed in a user's prompt. To generate AEDI, we provide a new protocol for estimating probabilities from natural language outputs, using LLMs-as-judges validated for consistency and correlation to human judgment. We deploy it on a new curated database of 500 propositions across diverse topics and 16,000 prompts varying in user attitude, testing eight prominent models. Every model exhibits substantial deference, though with large and systematic differences across providers, with Claude models demonstrating the least, and Grok and Gemini models the most. The effect is amplified in prompts requesting a written artifact, and concentrated on propositions where models hold weaker priors. We release AEDI as an easy-to-update benchmark and measurement pipeline for output-level sycophancy evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07897v1</guid>
      <category>cs.AI</category>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alejandro Botas, Paul de Font-Reaulx, Luke Hewitt</dc:creator>
    </item>
    <item>
      <title>Temporal Coverage over Density: Parsimonious Training-Set Design for ML Climate Downscaling</title>
      <link>https://arxiv.org/abs/2606.07898</link>
      <description>arXiv:2606.07898v1 Announce Type: new 
Abstract: High-resolution regional climate simulations provide critical information for climate impacts assessments but remain computationally expensive, motivating the development of machine-learning downscalers and emulators. A key challenge is determining how limited high-resolution simulations should be distributed across a changing climate trajectory to capture both forced climate response and internal variability. Using the CESM2 Large Ensemble over the western United States, we compare three training-year selection strategies under fixed data budgets: a contiguous block of historical years, years drawn from both the beginning and end of the simulation period, and years distributed throughout the full climate trajectory. Including both historical and future years consistently outperforms training on historical years alone, demonstrating the importance of exposing downscaling models to climate states outside the historical record and highlighting limitations of stationarity assumptions common in statistical downscaling. Training on years distributed throughout the full climate trajectory performs best overall, indicating that broad sampling of internal variability provides additional information beyond exposure to the forced climate response alone. Models trained on temporally distributed subsets more successfully reproduce variability in unseen ensemble members while retaining strong performance across a wide range of climate diagnostics. Even when trained on only one-tenth of the available high-resolution years, temporally distributed models remain highly competitive with full-data training. These results suggest that, under fixed computational budgets, broad sampling of climate states is more valuable than temporal continuity when allocating scarce high-resolution simulations. The findings provide practical guidance for regional climate downscaling and large-ensemble projection workflows.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07898v1</guid>
      <category>cs.LG</category>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Karandeep Singh, Stefan Rahimi, Chad W. Thackeray, Stephen Cropper, Alex Hall</dc:creator>
    </item>
    <item>
      <title>CellSense: A Sub-6 GHz Cellular ISAC System for Clutter-Robust Passive Sensing</title>
      <link>https://arxiv.org/abs/2606.07900</link>
      <description>arXiv:2606.07900v1 Announce Type: new 
Abstract: Future wireless networks demand capabilities beyond traditional communication, driving the development of Integrated Sensing and Communication (ISAC) for environmental awareness, localization, and tracking. Ubiquitous cellular deployment allows ISAC to maximize spectral efficiency, lower costs, and expand sensing coverage. However, sub-6 GHz research has heavily favored communication, leaving sensing capabilities largely underexplored. To bridge this gap, we introduce CellSense, a novel sub-6 GHz ISAC architecture natively integrated into the 5G cellular protocol stack for real-world target tracking. We validate the system via Sionna-based orthogonal frequency-division multiplexing (OFDM) link-level simulations and an experimental USRP hardware prototype using the OpenAirInterface (OAI) stack. Furthermore, we analyze the communication-sensing tradeoff by quantifying how pilot symbol density impacts throughput versus sensing accuracy. Simulations show that CellSense achieves a 74 percent detection probability with a 1.43 m localization error in an indoor warehouse environment, which improves to 94 percent detection and a sub-meter error of 0.33 m in the outdoor environment of the Oval area at the NCSU Centennial campus. Hardware experiments in a highly cluttered indoor laboratory confirm a 1.28 m localization accuracy and 76 percent detection probability, proving its efficacy for practical ISAC deployments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07900v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>eess.SP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bibhor Kumar, Ish Kumar Jain, Vijay K Shah</dc:creator>
    </item>
    <item>
      <title>End-to-End Control of a Powered Knee-Ankle Prosthesis Towards Unified, Tuning-Free Assistance</title>
      <link>https://arxiv.org/abs/2606.07902</link>
      <description>arXiv:2606.07902v1 Announce Type: new 
Abstract: Powered prostheses conventionally rely on impedance controllers that require extensive manual tuning and explicit mode classification. In this work, we present real-time deployment of an end-to-end prosthesis controller that estimates continuous actuator signals from onboard sensors, eliminating the need for intent classifiers and subject-specific tuning. Temporal Convolutional Networks were trained on a multi-terrain dataset from 18 individuals with transfemoral amputation and deployed in real time across five locomotion modes. Four participants (three able-bodied, one with transfemoral amputation) ambulated across level ground, ramp ascent and descent, and stair ascent and descent. During level walking, the deployed controller reproduced the training-data scaling of peak ankle torque with walking speed (deployed 0.85 Nm/kg per m/s, p = 0.001; training 0.96 Nm/kg per m/s, 95% CI [0.42, 1.50], p = 0.002), after excluding one outlier traced to atypical prosthesis loading. During ramp ascent, the controller scaled knee pre-flexion with grade (deployed 2.92 deg/deg, p = 0.027; training 3.30 deg/deg, 95% CI [1.83, 4.77], p &lt; 0.001). During ramp descent, the controller increased resistive knee torque relative to level walking (deployed +0.16 Nm/kg, p &lt; 0.001; training +0.16 Nm/kg, p = 0.008). Seamless stair transitions were generated for both intact- and prosthetic-side-leading sequences in ascent and descent, despite the training data containing only one limb-leading sequence. These results provide initial evidence towards end-to-end control that can provide unified, mode-adaptive prosthetic assistance without subject-specific tuning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07902v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>John Shim, Christoph Nuesslein, Sixu Zhou, Hanjun Kim, Kinsey Herrin, Aaron Young</dc:creator>
    </item>
    <item>
      <title>Contract2Tool: Learning Preconditions and Effects for Reliable Tool-Augmented LLM Agents</title>
      <link>https://arxiv.org/abs/2606.07904</link>
      <description>arXiv:2606.07904v1 Announce Type: new 
Abstract: Tool-augmented large language model agents increasingly rely on external APIs, but standard tool schemas describe how to call a tool, not when the tool is causally appropriate or what task state it produces. Causal tool filtering addresses this gap by using lightweight contracts that specify each tool's preconditions, effects, risk level, and cost. However, manually writing and maintaining such contracts does not scale to large or changing tool ecosystems. We introduce Contract2Tool, a framework for inferring tool contracts from metadata, schemas, documentation, and execution traces. Contract2Tool converts observable tool evidence into normalized symbolic contracts that can be evaluated intrinsically and deployed inside downstream causal tool filtering. We evaluate learned contracts against gold preconditions, effects, and risk labels, and measure their downstream utility on multi-step agent tasks. Our results show that hybrid documentation-and-trace evidence produces contracts accurate enough to preserve most of the reliability and efficiency benefits of gold contracts. Learned-contract CMTF achieves 0.980 downstream success, close to 0.990 for gold-contract CMTF, while reducing visible tools from 100 to 1 and reducing average token usage from 26,172 to 2,528 relative to all-tools exposure. These results suggest that learned contracts can provide a scalable contract layer between tool schemas and reliable agent execution.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07904v1</guid>
      <category>cs.AI</category>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Rahul Suresh Babu, Laxmipriya Ganesh Iyer</dc:creator>
    </item>
    <item>
      <title>Extremum Seeking Control Based Adaptive Compensation of Position Sensor Harmonics in PMSM Drives</title>
      <link>https://arxiv.org/abs/2606.07906</link>
      <description>arXiv:2606.07906v1 Announce Type: new 
Abstract: Permanent Magnet Synchronous Machines (PMSMs) have become one of the preferred forms of electromechanical energy converters, owing to their high efficiency, torque density, and other unique advantages. However, given the need for proper rotor position measurement for commutation and field orientation, accurate rotor position sensing is of paramount importance. In sensing motor rotor position with a sensor, harmonic errors that arise in the sensing subsystem lead to undesirable torque ripple. Thus, this paper presents an adaptive, extremum seeking control based approach capable of mitigating position signal harmonics in PMSMs. The proposed approach is experimentally validated under varying torque, speed, and harmonic conditions. Its harmonic compensation performance is comparatively evaluated against the look-up table based method. Furthermore, the accuracy of the proposed approach is analyzed, highlighting its effectiveness.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07906v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Gayan V. Dissanayake, Sandun S. Kuruppu</dc:creator>
    </item>
    <item>
      <title>3D Oral Modelling with Improved Vertex Distribution Using Matching-Based Learning</title>
      <link>https://arxiv.org/abs/2606.07907</link>
      <description>arXiv:2606.07907v1 Announce Type: new 
Abstract: In our previous work, a deep learning-based framework for 3D intraoral reconstruction was proposed. The model directly predicts explicit 3D point cloud coordinates from ten fixed-angle intraoral images, employing MobileNetV2 and Multi-head Attention for multi-view feature fusion, with a combined L1 Loss and Chamfer Distance as the loss function. Although the model achieved an accuracy of 77.49%, predicted vertices tended to concentrate in high-density regions of the ground truth, leaving other regions largely uncovered.
  In this paper, an improved loss function is proposed to address this limitation. Hungarian matching with filtering and Repulsion Loss are introduced to enforce more uniform vertex distribution across the reconstructed model. The proposed model achieves an accuracy of 68.02%, which is numerically lower than the previous model. However, the vertex clustering issue observed in the prior work is substantially alleviated, with predicted vertices distributed more evenly across the entire reconstructed surface.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07907v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jihun Cho, Soo-Yeon Jeong, Eun-Jeong Bae, Sun-Young Ihm</dc:creator>
    </item>
    <item>
      <title>Layer-wise Derivative Controlled Networks Achieve Competitive Accuracy and Gradient Stability Across Data Regimes</title>
      <link>https://arxiv.org/abs/2606.07908</link>
      <description>arXiv:2606.07908v1 Announce Type: new 
Abstract: Derivative-controlled networks based on ChainzRule (CR) combine cubic polynomial layers with a lightweight forward-mode per-layer Jacobian penalty (DREG). In this second paper of a multi-part series, we evaluate the generalization properties of CR across data regimes.
  We ablate the shape of the DREG coefficient schedule, demonstrating that the optimal annealing range depends on representation noise. On the Pima Diabetes dataset, CR achieves strong low-data performance and maintains a consistent accuracy advantage over baselines from 5\% to 100\% training data, supported by exceptionally stable gradient tail ratios ($\sim$1.01--1.02 vs. 1.07--1.09 for ReLU networks). Extensions to SST-5 show competitive or superior results in both frozen-embedding and BERT fine-tuned regimes, including outperforming prior BERT baselines despite substantially less training data. These results are statistically significant: CR achieves superior accuracy over the strongest published baselines we could identify on both datasets ($p &lt; 0.05$).
  These results establish that layer-wise derivative control induces a structural inductive bias toward low-frequency, stable representations that generalizes robustly across tabular and NLP domains, data volumes, and representation qualities. The gradient tail ratio serves as a reliable, label-free diagnostic of generalization capability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07908v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>Rowan Martnishn</dc:creator>
    </item>
    <item>
      <title>MemToolAgent overview with a simple restaurant booking scenario where the agent retrieves similar memories, receives feedback on an invalid time format, and generates a reflection to update its memory</title>
      <link>https://arxiv.org/abs/2606.07909</link>
      <description>arXiv:2606.07909v1 Announce Type: new 
Abstract: Modern large language model (LLM) agents can use external tools to help users solve complex tasks. However, for problems that require learning from long-term historical events or from previous agent-environment interactions, LLM agents are required to use memory mechanisms to store and retrieve experiences. While sophisticated memory systems exist for dialogue agents, few studies have empirically examined how to improve agents' tool-using capabilities through past user-agent conversations. We propose MemToolAgent, a framework that improves tool use through memory management. Our approach contains a memory extraction module that processes past experiences into structured memory entries, and a retrieval module that dynamically selects a subset of the stored memory entries. This enables more personalized and accurate responses aligned with user preferences and feedback without requiring LLM fine-tuning. In summary, this work has three main contributions: (1) a unified memory entry format that improves both general-purpose and personalized tool use without LLM fine-tuning, (2) a reflection-based memory extraction that uses environment and user feedback to distill wrong executions into critiques to store, and (3) a retrieval module that chooses how many past experiences to use based on the memory similarity distribution. MemToolAgent achieves 29%, 80%, and 17% relative improvements compared to strong baselines on the WorkBench, NESTFUL, and PEToolBench benchmarks, respectively.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07909v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Suleyman Armagan Er, Danilo Ribeiro, Yogesh Virkar, Surafel Lakew, Adi Kalyanpur, James Gung, Thomas Delteil, Arshit Gupta</dc:creator>
    </item>
    <item>
      <title>CAAL: Contextual Bandits based Online Hand-Craft Active Learning Strategy Selection</title>
      <link>https://arxiv.org/abs/2606.07910</link>
      <description>arXiv:2606.07910v1 Announce Type: new 
Abstract: The challenge with active learning algorithms is the uncertainty of the statistical distribution of unlabeled data, which makes it difficult to choose the best hand-crafted strategy. To address this, we introduce Contextual Adaptive Active Learning (CAAL). In CAAL, each "arm" represents a hand-crafted strategy. Unlike existing frameworks that select strategies based only on feedback from labeled data, we dynamically choose strategies for labeling batches of data using reward prediction with external context information. This general framework allows for customization with domain knowledge to design more effective rewards and context candidates. In addition, we experimentally show that CAAL outperforms the existing baseline adaptive strategy on public datasets using our reward and context design. Our results are consistent regardless of batch size in each iteration.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07910v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shao-An Yin, Jiacong Li, Tianpei Xie, Cecile Levasseur, Wojciech Kowalinski, Nicola Elia</dc:creator>
    </item>
    <item>
      <title>On Improved Statistical Accuracy of Low-Order Polynomial Chaos Approximations</title>
      <link>https://arxiv.org/abs/2606.07912</link>
      <description>arXiv:2606.07912v1 Announce Type: new 
Abstract: Polynomial chaos expansions provide surrogate models for stochastic systems, with coefficients typically derived using Galerkin projection, stochastic collocation, or least squares approximation. These traditional approaches often fail to accurately capture statistical moments without resorting to high-order approximations. We propose a constrained optimization framework that modifies standard techniques to determine polynomial chaos coefficients that precisely recover the first two statistical moments. The effectiveness of our approach is demonstrated on several candidate algebraic functions of random variables, showing significant improvements in statistical accuracy even with low-order approximations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07912v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vedang M. Deshpande, Raktim Bhattacharya</dc:creator>
    </item>
    <item>
      <title>EditSR: Enhancing Neural Symbolic Regression via Edit-based Rectification</title>
      <link>https://arxiv.org/abs/2606.07915</link>
      <description>arXiv:2606.07915v1 Announce Type: new 
Abstract: Neural symbolic regression models improve inference efficiency by shifting structural search to pretraining, but their one-pass autoregressive decoding is prone to error accumulation, which may lead to generating structurally incorrect expressions, especially in complex expression generation scenarios. Existing rectification strategies can alleviate this issue, but they often depend on restarting global search, thereby weakening the efficiency advantage of neural models, and remain susceptible to error accumulation. In this paper, we propose EditSR, a two-layer framework that combines a neural symbolic regression model in the first layer with an edit-based Rectifier in the second layer to achieve efficient prediction and post-hoc rectification. Instead of restarting the global search, we maintain rectification efficiency by pretraining the Rectifier. Specifically, we formulate the rectification process as a step-by-step state-transition chain starting from an incorrect expression, and develop a state-transition algorithm to construct supervised rectification chains for training the Rectifier. To ensure syntactic validity throughout rectification, each edit action is restricted to a syntactically valid space so that every edited expression remains parseable. In addition, because each edit decision is conditioned on the current state rather than the history, the Rectifier allows errors made in earlier steps to be rectified by subsequent edits, thereby reducing the risk of error accumulation. Extensive experiments and ablation studies show that EditSR substantially improves symbolic structure recovery with limited extra cost, with more pronounced gains on complex expressions, where one-pass autoregressive decoding is more susceptible to error accumulation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07915v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Da Li, Xinxin Li, Xingyu Cui, Jin Xu, Juan Zhang, Junping Yin</dc:creator>
    </item>
    <item>
      <title>The CIFAR Synthetic Evidence Corpus for Detecting AI-Generated Evidence</title>
      <link>https://arxiv.org/abs/2606.07916</link>
      <description>arXiv:2606.07916v1 Announce Type: new 
Abstract: The growing ability of generative models to produce realistic documents poses a direct challenge to evidentiary workflows in the justice system and the courts, where decisions increasingly depend on the authenticity of evidence such as receipts, communications, and administrative records. Unlike social media or academic settings, evidentiary documents are often only subtly altered, with small, localized edits that preserve overall plausibility while changing legal meaning. Yet progress on automated detection remains limited, largely due to the absence of training and evaluation data suited to the requirements of the justice system. Existing resources are either focused on photos of human faces or natural scenery or on narrowly scoped academic or social media document types, and do not capture the structure, diversity, or manipulation patterns characteristic of real-world evidentiary data. As a result, current detection systems do not necessarily learn meaningful signals appropriate for the justice system. We introduce the CIFAR Synthetic Evidence Corpus, a dataset designed to enable rigorous evaluation of evidence verification under realistic and controlled conditions. The corpus spans multiple document families and a spectrum of manipulation strategies, from small field-level edits to complete document fabrication, and is constructed using a diverse set of state-of-the-art generative tools. It is organized to systematically vary both manipulation complexity and generation method, while enforcing source-level separation between training and test data to reflect real-world generalization challenges.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07916v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Kelly McConvey, Jalehsadat Mahdavimoghaddam, Nima Jamali, Maksym Taranukhin, Sajad Ebrahimi, Wentao Zhang, Yuntian Deng, Karen Eltis, Maura R. Grossman, Vered Shwartz, Ebrahim Bagheri</dc:creator>
    </item>
    <item>
      <title>Larch: Learned Query Optimization for Semantic Predicates</title>
      <link>https://arxiv.org/abs/2606.07923</link>
      <description>arXiv:2606.07923v1 Announce Type: new 
Abstract: With the advent of Large Language Models (LLMs), many database systems have introduced semantic operators that enable analytical queries over unstructured data (e.g., text, images, videos). Semantic operators typically incur high inference costs and latencies, making semantic (AI) SQL queries challenging to apply to large-scale datasets. At the same time, their semantic nature leads database engines to treat them as black boxes, making AI SQL queries difficult to optimize. In this paper, we introduce Larch, a framework for optimizing the execution of semantic filters in AI SQL queries. Larch was inspired by two key observations: i) the high latency of semantic operators leaves significant room for computationally heavy runtime optimization techniques, and ii) unstructured data are typically accompanied by semantic information in the form of embeddings, allowing for efficient semantic comparisons between AI_FILTER prompts and data values. Based on these two key observations, we present two Larch variants: Larch-A2C and Larch-Sel. Larch-A2C encodes arbitrary semantic filter expression trees using an embedding-augmented Gated Graph Neural Network and formulates the filter evaluation order as a Markov decision process. In contrast, Larch-Sel leverages a supervised learning model to predict filter selectivities, subsequently applying dynamic programming to find a near-optimal evaluation order for each input row. Evaluated across diverse real-world datasets and comprehensive synthetic workloads, both Larch variants always outperform existing semantic filter optimization techniques in terms of token usage. Our results demonstrate that Larch is robust across diverse workloads, reducing total token cost overhead by 3x-19x compared to Palimpzest and Quest.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07923v1</guid>
      <category>cs.DB</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Fuheng Zhao, Pawel Liskowski, Zihan Li, Benjamin Han, Puxuan Yu, Varich Boonsanong, Dimitris Tsirogiannis, Anupam Datta</dc:creator>
    </item>
    <item>
      <title>Decoupling Semantics and Logic: A Training-Free Coarse-to-Fine Pipeline for Video Retrieval-Augmented Generation</title>
      <link>https://arxiv.org/abs/2606.07924</link>
      <description>arXiv:2606.07924v1 Announce Type: new 
Abstract: This paper presents our system description for the 2nd Workshop on Multimodal Augmented Generation via MultimodAl Retrieval (MAGMaR). Addressing the critical challenges of cross-lingual long-video comprehension, strict persona adherence, and zero-hallucination temporal grounding, we propose a fully training-free, two-stage cascaded Video RAG pipeline. Our architecture strategically decouples semantic retrieval from cognitive logical reasoning through a modality-aware division of labor. In the first stage, a high-recall semantic pre-fetching module employs dense retrieval using only high-fidelity visual summaries and global text descriptions, explicitly isolating noisy modalities (e.g., OCR and ASR) to maintain a pristine vector space. In the second stage, an Adaptive, Iterative, and Reasoning-based (A.I.R.) filtering agent, powered by a commercial Large Language Model (LLM), performs fine-grained cognitive reranking. The agent re-incorporates full multimodal contexts to enforce strict logical alignment with user personas, effectively pruning semantically similar but logically irrelevant candidates. Finally, a Prompt Sculpting mechanism constrains the generator to synthesize the distilled subset into strictly formatted JSON responses with exact chunk-level citations. Evaluated on the RAG track, our resource-aware approach shows exceptional precision in both information retrieval and persona-conditioned generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07924v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <category>cs.MM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiaxin Dai, Zehang Wei, Jiamin Yan, Xiang Xiang</dc:creator>
    </item>
    <item>
      <title>ROSUM-MCTS: Monte Carlo Tree Search-Inspired HDL Code Summarization with Structural Rewards</title>
      <link>https://arxiv.org/abs/2606.07925</link>
      <description>arXiv:2606.07925v1 Announce Type: new 
Abstract: Large language models (LLMs) have shown promise in code summarization, yet their effectiveness for Hardware Description Languages (HDLs) like VHDL and Verilog remains underexplored. We propose ROSUM-MCTS, an LLM-guided approach inspired by Monte Carlo Tree Search (MCTS) that refines summaries through structured exploration and reinforcement-driven optimization. Our method integrates both local and global context via a hierarchical candidate expansion mechanism and optimizes summaries using a composite reward function balancing functional correctness (FC), local content adequacy (LCA), and fluency. We evaluate ROSUM-MCTS on the VHDL-eval and Verilog-eval datasets, demonstrating its consistent outperformance over baseline methods by leveraging structured bottom-up refinement and reinforcement-based optimization. Ablation studies confirm the necessity of both local and global expansion strategies, as well as the importance of balancing FC and LCA for optimal performance. Furthermore, ROSUM-MCTS proves robust against superficial modifications, such as variable renaming, maintaining summary quality where baselines degrade. These results establish ROSUM-MCTS as an effective and robust HDL summarization framework, paving the way for further research into reinforcement-enhanced code summarization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07925v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>ICLAD'2025</arxiv:journal_reference>
      <dc:creator>Prashanth Vijayaraghavan, Charles Mackin, Luyao Shi, Apoorva Nitsure, Ashutosh Jadhav, David Beymer, Tyler Baldwin, Ehsan Degan, Vandana Mukherjee</dc:creator>
    </item>
    <item>
      <title>Stress-testing medical large language models reveals latent safety pathology beyond benchmark accuracy</title>
      <link>https://arxiv.org/abs/2606.07929</link>
      <description>arXiv:2606.07929v1 Announce Type: new 
Abstract: Large language models (LLMs) are entering clinical practice based on benchmark accuracy that may fail to detect safety-relevant failure modes. Here we present AI-MASLD, a stress-audit framework that adapts the logic of metabolic stress testing from hepatology to the evaluation of clinical LLMs. Using 240 clinical cases across six narrative perturbation probes, we subjected seven models to double-stress testing and quantified performance through three indices: metabolic index (MI), perturbation flip rate (PFR), and counterfactual fairness index (CFI). Under clean baseline conditions, all models performed uniformly well. Under realistic narrative stress, performance diverged sharply, revealing two distinct stress-response phenotypes. Quantized models exhibited pseudonormalization, in which low flip rates hid functional collapse. Medical supervised fine-tuning systematically degraded logical stability, fairness, and information extraction. An open-weight model matched or exceeded proprietary alternatives on every safety dimension. These findings establish narrative stress auditing as a necessary complement to accuracy-based evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07929v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuan Shen, Xiaojun Wu, Linghua Yu</dc:creator>
    </item>
    <item>
      <title>LEGS: Laplacian-Enhanced Gaussian Splatting with a Nonlinear Weighted Loss</title>
      <link>https://arxiv.org/abs/2606.07932</link>
      <description>arXiv:2606.07932v1 Announce Type: new 
Abstract: 3D Gaussian Splatting (3DGS) has become an efficient explicit representation for radiance field reconstruction and real-time novel view synthesis. However, its standard photometric loss treats flat and structure-rich regions similarly, which may limit the recovery of sharp contours and fine details. Edge-Guided Gaussian Splatting (EGGS) improves structure awareness through edge-guided weighting, but mainly relies on first-order gradient responses and linear weighting. In this paper, we propose LEGS, a Laplacian-Enhanced Gaussian Splatting method with a nonlinearly weighted loss. LEGS replaces first-order gradient guidance with second-order Laplacian structural guidance and maps the normalized Laplacian response into pixel-wise weights through nonlinear response-to-weight functions. The proposed loss improves structure-aware Gaussian optimization while keeping the original 3DGS rendering pipeline unchanged. Experiments on the full Tanks&amp;Temples and Mip-NeRF360 datasets show that LEGS improves peak signal-to-noise ratio (PSNR) by up to 1.68 dB over 3DGS and up to 0.52 dB over EGGS. Incorporating the proposed second-order nonlinear weighting strategy into FastGS and FasterGS further improves PSNR by up to 1.69 dB, demonstrating its effectiveness as a general loss-level extension for Gaussian Splatting pipelines with potential applications in AR/VR, immersive visualization, and real-time 3D content generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07932v1</guid>
      <category>cs.CV</category>
      <category>cs.GR</category>
      <category>cs.MM</category>
      <category>eess.IV</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Yongfei Guo, Qizhou Huo, Xuan Sun, Yuanhao Gong</dc:creator>
    </item>
    <item>
      <title>Finite-Blocklength Lossy Joint Source-Channel Coding over Unknown Channels</title>
      <link>https://arxiv.org/abs/2606.07933</link>
      <description>arXiv:2606.07933v1 Announce Type: new 
Abstract: We analyze the finite-blocklength performance of lossy joint source-channel codes (JSCC) in an unknown-channel framework, where the true channel is unknown but the source distribution is known. We establish achievability results for mismatched-design JSCC, where the code design is based on a channel $Q_{Y|X}$ but deployed over a different channel $P_{Y|X}$. Our mismatched-design achievability result allows nonstationary channel laws and arbitrary standard Borel alphabets for the source, reproduction, channel input and channel output. The achievability bound is given in terms of the rate-distortion and rate-dispersion functions, as well as two channel-dependent quantities that we call the mismatched-design rate and mismatched-design rate-dispersion. For block erasure channels, our result shows that channel mismatch incurs no penalty. We then show a second-order universal family of source-channel codes over the set of block erasure channels. Our code construction uses Poisson functional representations of suitable conditional probability measures to produce the encoder and decoder outputs. We use a parameterized family of Gibbs posteriors as the decoder-side kernels, whose envelope recovers the generalized mutual information.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07933v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Adeel Mahmood, Harish Viswanathan, Jinfeng Du</dc:creator>
    </item>
    <item>
      <title>X-OP: Cross-Morphology Whole-Body Teleoperation via MPC Retargeting</title>
      <link>https://arxiv.org/abs/2606.07934</link>
      <description>arXiv:2606.07934v1 Announce Type: new 
Abstract: Whole-body teleoperation is essential for scalable robot data collection in loco-manipulation tasks, yet existing approaches relying on exoskeleton suits or multi-camera setups impose prohibitive cost, complexity, and environmental constraints. Recent methods using a single extended reality (XR) device with end-to-end reinforcement learning policies partially address these limitations but require robot-specific retraining, suffer from out-of-distribution failures, and rely on motion retargeting that neglects dynamic feasibility. We propose a hierarchical whole-body teleoperation framework driven by a single XR device that generalizes across diverse robot morphologies without retraining robot-specific policies. A Model Predictive Control (MPC)-based motion retargeter jointly optimizes alignment with the operator's intent and the robot's dynamic feasibility, generating optimal commands for existing low-level controllers. To ensure robust online execution, we introduce a state synchronization method that resets the simulator state at each MPC step to handle noisy real-world measurements and contact sensitivity, and integrate SLAM-based global pose feedback to mitigate long-term drift. Simulation results show higher success rates on whole-body control tasks for both a humanoid (over 30% lower completion time and 20% lower power consumption) and a mobile manipulator (zero collisions) compared to baselines. Real-world experiments further validate the effectiveness and flexibility of our method, demonstrating the successful deployment of the proposed retargeter on both platforms for whole-body control tasks and the ease of allowing users to adjust teleoperation behavior based on their preferences. This plug-and-play framework offers a scalable, morphology-agnostic solution for whole-body robot teleoperation, enabling real-time behavioral customization and broad applicability across platforms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07934v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jen-Wei Wang, Sarthak Kaingade, Andrea Tagliabue, Nicholas Morozovsky</dc:creator>
    </item>
    <item>
      <title>REACT 2026: The Fourth Multiple Appropriate Facial Reaction Generation Challenge: Personalised MAFRG and Appropriate EEG Reaction Prediction</title>
      <link>https://arxiv.org/abs/2606.07935</link>
      <description>arXiv:2606.07935v1 Announce Type: new 
Abstract: In dyadic interactions, various human facial reactions could be appropriate for responding to each human speaker behaviour. Following the successful organisation of the REACT 2023, 2024 and 2025 challenge series, a body of generative deep learning (DL) models has been developed for the problem of multiple appropriate facial reaction generation (MAFRG). This year, we propose the REACT 2026 challenge, encouraging the development and benchmarking of Machine Learning (ML) models that can generate multiple personalised, appropriate, diverse, realistic and synchronised human-style facial reactions expressed by a specific human listener in response to each given speaker behaviour. As a key part of the challenge, we continue to provide participants with the MARS dataset introduced by REACT 2025, and additionally provide individual-level Big-Five personality labels and EEG recordings. This introduces a new one-to-many personalised facial reaction generation setting combining human expressive behavioural, affective and neurophysiological signals, which remains largely unexplored in current dyadic interaction modelling. This paper also presents the challenge guidelines and new baselines on the four proposed sub-challenges: Offline generic and personalised MAFRG as well as Online generic and personalised MAFRG, respectively, which are publicly available at https://github.com/reactmultimodalchallenge/baseline_react2026.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07935v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>Siyang Song, Micol Spitale, Zijian Wu, Xiangyu Kong, Cheng Luo, Cristina Palmero, German Barquero, Sergio Escalera, Michel Valstar, Mohamed Daoudi, Fabien Ringeval, Andrew Howes, Elisabeth Andre, Hatice Gunes</dc:creator>
    </item>
    <item>
      <title>Illusions of the Gold Standard: A Large-scale Analysis of Human Evaluation Protocols for Long-form Text Generation</title>
      <link>https://arxiv.org/abs/2606.07936</link>
      <description>arXiv:2606.07936v1 Announce Type: new 
Abstract: Human evaluation plays a critical role in assessing the quality of generated text. However, the reliability and reproducibility of these evaluations depend on transparent and well-documented protocols -- details that are frequently missing in current practice. In this work, we conduct a large-scale analysis of human evaluation protocols for evaluating long-form generation tasks in *CL conference publications from 2023--2025, including a full manual review of 284 papers and LLM-assisted analysis for another 1.8k+ papers. We define a set of 20 reportable criteria related to reproducibility of human evaluation studies, and apply these criteria to systematically examine reporting norms and practices within the community. We find widespread under-reporting of important aspects of human evaluation study design, leading to ambiguity about what was measured and how, who contributed judgments, and how judgments should be interpreted. Based on these findings, we outline actionable recommendations to support more transparent and reproducible reporting in future research. Our analysis code and annotated dataset can be found at: https://github.com/larchlab/Illusions-of-the-Gold-Standard</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07936v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Katelyn Xiaoying Mei, Yi-Li Hsu, Minjoon Choi, Zongwan Cao, Chenjun Xu, Bingbing Wen, Su Lin Blodgett, Lucy Lu Wang</dc:creator>
    </item>
    <item>
      <title>Hallucination Cascade: Analyzing Error Propagation in Multi-Agent LLM Systems</title>
      <link>https://arxiv.org/abs/2606.07937</link>
      <description>arXiv:2606.07937v1 Announce Type: new 
Abstract: Large Language Models (LLMs) generate fluent text but remain vulnerable to hallucinations, producing unsupported, inconsistent, and factually incorrect claims. Most prior work treats hallucination as a static property of isolated outputs. In multi-agent LLM systems, however, responses are exchanged across agents, revised through sequential stages, and reused as context for later reasoning. Hallucination, therefore, becomes a dynamic process shaped by interaction history, cascade depth, and model heterogeneity. This paper analyzes hallucination dynamics in multi-agent LLM cascades by tracking claim-level factual inconsistencies across sequential agent interactions. We conduct 500 cascade experiments across 10 knowledge domains using GPT-5.3, DeepSeek-V3, and LLaMA-3-70B-Instruct, yielding 1,250 evaluated responses. Results show that deeper cascades reduce the normalized hallucination score from 0.422 at the first agent to 0.272 at the final agent in 3-agent chains, with an amplification factor of 0.644, indicating net attenuation. This reduction is accompanied by a decline in factual accuracy from 0.789 to 0.769, revealing a trade-off between hallucination suppression and factual preservation. Transition-level analysis shows that each agent-to-agent refinement reduces hallucination by an average of 0.072, with small but consistent losses in factual consistency and response quality. Model-level results reveal reliability-efficiency trade-offs: LLaMA-3-70B-Instruct achieves the lowest hallucination score, whereas GPT-5.3 provides faster generation with a higher hallucination rate. Domain-level analysis shows that hallucination varies with topic complexity, with lower scores in well-grounded scientific domains and higher scores in more abstract domains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07937v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Saeid Jamshidi, Arghavan Moradi Dakhel, Kawser Wazed Nafi, Foutse Khomh</dc:creator>
    </item>
    <item>
      <title>DAL-PCQA: Enabling Distortion-Level and Language-Driven Reasoning for Point Cloud Quality Assessment</title>
      <link>https://arxiv.org/abs/2606.07938</link>
      <description>arXiv:2606.07938v1 Announce Type: new 
Abstract: Point Cloud Quality Assessment (PCQA) methods typically predict scalar Mean Opinion Scores (MOS), which quantify overall perceptual degradation but do not reveal its causes. In contrast, human observers naturally reason in terms of specific distortions such as blur, color shifts, point density changes, missing regions, and geometric deformations. To close this gap, we introduce DAL-PCQA, a distortion-aware, language-annotated dataset for PCQA. DAL-PCQA augments benchmark point clouds with multi-level distortion severity labels, discrete quality categories, and structured natural language descriptions aligned with human perception. We define a point-cloud-specific distortion taxonomy that covers both photometric and geometric artifacts. Statistical analysis reveals characteristic degradation patterns across distortion types and quality levels. To assess the utility of these annotations, we compare zero-shot and fine-tuned multimodal models for generating perceptual quality descriptions. Experiments show that distortion-aware supervision substantially improves lexical and semantic alignment with ground-truth descriptions. By enabling interpretable, distortion-level reasoning, DAL-PCQA facilitates language-driven, explainable point cloud quality assessment. The dataset is publicly available at https://github.com/swarna96/DAL-PCQA.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07938v1</guid>
      <category>cs.CV</category>
      <category>cs.MM</category>
      <category>eess.IV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Swarna Chakraborty, Gabriel De Castro Araújo, Syeda Tasmi Faria, Marcelo M. Carvalho, Mylene C. Q. Farias</dc:creator>
    </item>
    <item>
      <title>Stable Geometry, Reversing Poles: The Bipolar Structure of AI Occupational Substitutability and Its Decade-Scale Inversion</title>
      <link>https://arxiv.org/abs/2606.07939</link>
      <description>arXiv:2606.07939v1 Announce Type: new 
Abstract: Empirical research on the labor-market impact of artificial intelligence has converged, since Frey and Osborne (2017), on a continuous-gradient representation in which each occupation is assigned a real-valued exposure score on [0,1] obtained by linear aggregation across capability dimensions. This continuity is rarely articulated as an assumption and has not been tested at the micro-action level where substitution actually occurs. We decompose 1,961 O*NET Detailed Work Activities into 15,817 micro-actions using a multi-agent LLM pipeline with 31-expert HITL calibration, then project the DWA-level Occupational Automation Index from our prior work onto a 7-macro semantic typology. The result is a bipolar structure. Tool-Mediated Physical (M2, mean OAI = 0.054) and Planning &amp; Design (M7, mean OAI = 0.499) form two extremes separated by Cohen's d = 2.41 (H = 172.88, p = 6.21e-34). The geometry is robust under three independent stress tests: resolution (K=7 to K=15, polar gap widens from 0.45 to 0.57), encoder swap to BGE (LLM-class OAI lead replicates at 3.37x), and Eloundou's GPT-4 task ratings (DWA-level rho = 0.635). The six middle macros form a low-contrast band between the poles (TOST at d=0.2 admits only 1/15 pairs as equivalent), not a flat plain. The geometry's stability does not, however, extend to its content. Across a decade, the polarity has inverted. Frey-Osborne (2013) placed Tool-Mediated Physical near the highest computerisation risk and Planning &amp; Design near the lowest; our LLM-era OAI reverses that order, with macro-level FO-Eloundou Spearman rho = -0.750, p = 0.020, against the original Oxford Martin appendix. Which pole is high is therefore contingent on the era's dominant capability frontier, while the stable geometry itself is the structurally robust object.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07939v1</guid>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shuyao Gao (aSSIST University, Seoul, South Korea), Minghao Huang (aSSIST University, Seoul, South Korea)</dc:creator>
    </item>
    <item>
      <title>SGTO-MAS: Secure Gorilla Troops Optimization for Multi-Agent LLM Systems</title>
      <link>https://arxiv.org/abs/2606.07940</link>
      <description>arXiv:2606.07940v1 Announce Type: new 
Abstract: Multi-agent large language model (LLM) systems offer strong capabilities for complex reasoning and decision-making, yet coordination across agents introduces error propagation, security risks, and inefficient use of resources. Existing methods often rely on heuristic, static strategies and lack a principled mechanism for balancing performance, security, and computational cost. This paper formulates multi-agent LLM coordination as a constrained optimization problem and proposes a security-aware method for adaptive agent selection. The method integrates trust modeling, risk-aware evaluation, and collective intelligence within a unified optimization objective. To solve the problem efficiently, we use a swarm-intelligence strategy inspired by Gorilla Troops Optimization (GTO), enabling adaptive coordination under varying threat conditions. Controlled experiments across 500 independent runs demonstrate the effectiveness of the proposed method. The system achieves a stable average performance score of 0.5281, with high consensus (0.8764), controlled risk (0.3000), and compact agent subsets averaging 4.04 selected agents. The optimization process converges efficiently, with an average runtime of 24.09 seconds per run and low score variability (standard deviation = 0.0173). Robustness analysis indicates graceful degradation under perturbations, with performance drops limited to 2.5% under agent removal and 5.3% under consensus disruption. These results show that effective multi-agent coordination can be achieved through structured optimization that jointly manages performance, security, and efficiency. The proposed method provides a practical security-aware solution for coordinating multi-agent LLM systems in complex adversarial settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07940v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Saeid Jamshidi</dc:creator>
    </item>
    <item>
      <title>Collective Hallucination in Multi-Agent LLMs: Modeling and Defense</title>
      <link>https://arxiv.org/abs/2606.07941</link>
      <description>arXiv:2606.07941v1 Announce Type: new 
Abstract: Hallucinations in large language models (LLMs) create heightened risks in multi-agent settings, where recursive agent interactions can propagate, reinforce, and amplify unsupported claims. This paper models hallucination as a system-level, time-evolving process across a network of interacting LLM agents, where nodes represent agents and edges encode information exchange. The proposed formulation captures how hallucinated claims diffuse through communication topologies, intensify under adversarial perturbations, and affect collective reliability across reasoning rounds. To suppress error propagation, we introduce an interaction-aware control method that combines confidence-weighted aggregation, adaptive impact regulation, external claim verification, and selective isolation of unreliable agents. Experiments on TruthfulQA and TriviaQA show that the proposed method reduces hallucination by up to 39.0% relative to undefended multi-agent reasoning, improves factual accuracy from 0.79 to 0.87, and increases semantic consistency from 0.75 to 0.84. Under adversarial conditions, the method limits hallucination amplification to 1.08, compared with 1.45 without adaptive control, maintaining stable collective behavior across recursive interaction rounds. These results indicate that hallucination in multi-agent LLM systems is governed by both individual model reliability and system-level interaction dynamics, including communication topology, confidence coupling, and recursive information flow.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07941v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Saeid Jamshidi</dc:creator>
    </item>
    <item>
      <title>POISE: Position-Aware Undetectable Skill Injection on LLM Agents</title>
      <link>https://arxiv.org/abs/2606.07943</link>
      <description>arXiv:2606.07943v1 Announce Type: new 
Abstract: Agent skills provide a lightweight mechanism for extending general-purpose agents, but their open format exposes them to skill-poisoning attacks. A practically dangerous injection must stay invisible: if executing the payload derails the user's legitimate task, the resulting failure signal invites inspection of the skill. We therefore evaluate attacks by Attack Success Rate, which requires the injected payload to execute and the user's task to still pass its verifier in the same trial. Prior skill-poisoning attacks face a reliability-stealth trade-off under this lens: YAML-header injections are reliably loaded but easily inspected, whereas stealthier body injections that place explicit malicious commands in the skill prose are less reliable because out-of-context commands invite the agent's own suspicion. We introduce POISE, a position-aware attack that compresses the trigger into a single, benign-looking body instruction, placing it at a feasible position and using a context-aware generator to blend it with nearby setup or prerequisite steps. On Skill-Inject with codex+gpt-5.2, POISE achieves an 89.3% ASR, 28.0 points above a random-placement body baseline and 2.6 points above a YAML-only baseline, while retaining the stealth advantage of body placement. That stealth is the decisive margin: because legitimate skill bodies naturally require privileged tool operations, LLM scanners are hyper-sensitive, falsely flagging 74.6% of clean skills on average across four judges and both benchmarks. Blending into these false alarms, POISE causes only 5.6% of poisoned variants to gain a new high-risk alert over their clean baselines, rendering current static defenses ineffective.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07943v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haochang Hao, Dehai Min, Zhifang Zhang, Yunbei Zhang, Miao Xu, Yingqiang Ge, Lu Cheng</dc:creator>
    </item>
    <item>
      <title>Spectrum Aggregation for 6G: Lessons from 5G Carrier Aggregation and Dual Connectivity</title>
      <link>https://arxiv.org/abs/2606.07944</link>
      <description>arXiv:2606.07944v1 Announce Type: new 
Abstract: Spectrum aggregation has been a key enabler of LTE and 5G capacity growth, but it will become even more fundamental in 6G as networks expand across low bands, existing mid bands, new upper-mid/centimetric bands, and millimeter wave bands. This article examines how 5G carrier aggregation (CA) and dual connectivity (DC) inform the design of 6G spectrum aggregation. We argue that, while DC was instrumental in accelerating early non-standalone 5G deployment, it also introduced architectural fragmentation and long-term migration complexity. In contrast, CA provides a cleaner and more scalable foundation for multi-band operation in standalone 6G. Building on lessons from 5G, we advocate enhanced CA as the preferred 6G aggregation framework and point out the corresponding key enhancement directions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07944v1</guid>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xingqin Lin</dc:creator>
    </item>
    <item>
      <title>EduMirror: Modeling Educational Social Dynamics with Value-driven Multi-agent Simulation</title>
      <link>https://arxiv.org/abs/2606.07948</link>
      <description>arXiv:2606.07948v1 Announce Type: new 
Abstract: Understanding how educational social dynamics evolve is critical for informing effective educational policies and counterfactual interventions. However, traditional methods face a fundamental dilemma: observational studies often lack causal power, while controlled experiments are frequently constrained by ethical concerns. Although LLM-based multi-agent simulations offer a scalable in silico alternative, existing approaches remain limited by weak psychological grounding and insufficient measurement of latent psychological states. To address this, we introduce EduMirror, a multi-agent simulator for the scientific study of educational social dynamics. We provide configurable education-oriented agent forms, including value-driven agents grounded in psychological needs and social value orientation, together with a dual-track measurement protocol for quantifying observable behaviors and latent psychological states. We validate the realism and usability of EduMirror through case studies on school bullying and group cooperation, as well as broader evaluations across diverse educational scenarios. The results show that EduMirror generates educational social dynamics that are realistic, theory-consistent, and measurable by empirical criteria. These properties enable structured in silico educational research, providing a computational tool for hypothesis testing and counterfactual intervention analysis in educational science. Project page: https://edumirror.net.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07948v1</guid>
      <category>cs.MA</category>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Jingzhe Lin, Hengbin Yu, Yongdan Zeng, Fangwei Zhong</dc:creator>
    </item>
    <item>
      <title>The Easy, the Hard, and the Learnable: Confidence and Difficulty-Adaptive Policy Optimization for LLM Reasoning</title>
      <link>https://arxiv.org/abs/2606.07950</link>
      <description>arXiv:2606.07950v1 Announce Type: new 
Abstract: RL with verifiable rewards can substantially improve LLM reasoning, yet standard GRPO-style training often treats easy, hard, and learnable questions alike through uniform sampling and weighting, leading to inefficient compute allocation. We study GRPO by tracking token log-probabilities, group-normalized advantages, and the induced token-level update weights. This reveals three recurring dynamics as training proceeds: (1) confidence inflation, (2) advantage contraction, and (3) hierarchical convergence. These findings suggest that the utility of each update depends strongly on both question difficulty and the model's current competence. Motivated by this, we propose Confidence and Difficulty-adaptive Policy Optimization (CoDaPO), which assigns each question a bounded value from rollout confidence and empirical difficulty. CoDaPO then uses this value to reweight policy updates and resample high-value learnable questions within mini-batches, thereby increasing discovery within the learnable band under a fixed compute budget. Across twelve benchmarks, CoDaPO consistently improves accuracy over existing RL methods. Our code is publicly available at https://github.com/tmlr-group/CoDaPO.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07950v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhanke Zhou, Xiangyu Lu, Chentao Cao, Brando Miranda, Tongliang Liu, Bo Han, Sanmi Koyejo</dc:creator>
    </item>
    <item>
      <title>From 'May' to 'Is': Certainty Distortion in Language Model Rewriting</title>
      <link>https://arxiv.org/abs/2606.07951</link>
      <description>arXiv:2606.07951v1 Announce Type: new 
Abstract: Humans increasingly turn to Language Models (LMs) in ways that shape beliefs and drive decisions, including discussing, rewriting, and summarizing information from scientific articles, news, and medical reports. However, in these domains, where how confidently a claim is expressed matters, little is known about whether LMs faithfully preserve it. In this work, we investigate certainty distortion in LMs, defined as meaningful changes in expressed certainty when semantic content is preserved. We propose an LM-based evaluation metric that is consistent with population-level judgments of certainty. Using this metric, we characterize certainty distortion across different sizes and families of models in the context of scientific and medical communication tasks. Our results show that certainty distortion affects up to 75% of LM outputs and is systematically asymmetric in rewriting tasks, with most LMs being 1.5-2× more likely to increase the expressed certainty than to decrease it. These effects can compound over repeated paraphrasing: in the medical domain, claude-haiku-4-5 increases the certainty of 20% of examples after a single iteration, rising to 40% after five iterations. Prompt-based interventions reduce overall certainty distortion but do not eliminate it. Together, these findings reveal a general bias toward inflating expressed certainty, with direct implications for users who rely on LMs in high-stakes domains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07951v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Catarina G Belem, Shang Wu, Hongyu Yao, Mark Steyvers, Sameer Singh, Padhraic Smyth</dc:creator>
    </item>
    <item>
      <title>Unification of Closed-Open Industrial Detection Scenarios: New Large-Scale Benchmarks, Challenges and Baselines</title>
      <link>https://arxiv.org/abs/2606.07953</link>
      <description>arXiv:2606.07953v1 Announce Type: new 
Abstract: Large-scale Visual-Language Models (LVLMs) have achieved remarkable success in natural visual tasks, yet their application to industrial defect detection remains challenging due to two fundamental limitations: (i) the scarcity of large-scale industrial datasets that cover diverse defect categories across multiple domains, and (ii) the reliance on manual prompts (points, boxes, masks) that introduce subjective noise and lack text-visual interaction for fine-grained understanding. To address these challenges, we introduce a Large-Scale Multi-Modal Industrial Open-Closed benchmark (MMIOC-1M) containing over one million samples across 14 super-categories, 29 industrial scenes, and 351 defect subcategories. To our knowledge, MMIOC-1M is the first and largest unified benchmark supporting both open-vocabulary and closed-set industrial detection, providing valuable pre-training data for LVLMs in industrial scenarios. Furthermore, we propose a Refined Text-Visual Prompt Network (RTVPNet) that incorporates three key innovations: (1) an expert-assisted domain projection mechanism that enables rapid adaptation of general vision models to industrial domains, (2) an energy-based sparse sampling strategy that automatically generates refined visual prompts without manual intervention, and (3) a bidirectional text-visual interaction module that enhances cross-modal semantic alignment and understanding. Extensive experiments demonstrate that RTVPNet achieves state-of-the-art performance on MMIOC-1M, LVIS, and COCO benchmarks while maintaining computational efficiency. The dataset and code are available at https://github.com/hellozzk/MMIO.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07953v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zekai Zhang, Jinglin Zhang, Qinghui Chen, Gang Li, Da Chen, Shuainan Jing, He Wang, Dagang Li, Cong Liu, Cong Bai, Shengyong Chen</dc:creator>
    </item>
    <item>
      <title>Minibatch Selection via Partition Matroid Constrained Gradient Matching</title>
      <link>https://arxiv.org/abs/2606.07954</link>
      <description>arXiv:2606.07954v1 Announce Type: new 
Abstract: Training large language models (LLMs) on heterogeneous data requires selecting minibatches that balance convergence speed with coverage across domains. Existing methods either select samples independently within each domain or rely on computationally expensive proxy models to learn continuous domain weights. We propose PartitionSel, a cross-domain minibatch selection approach that maximizes a validation-guided gradient-matching utility under per-domain budgets encoded as a partition-matroid constraint. By coupling the per-domain budgets through a single utility, PartitionSel is designed to reduce redundancy in selections across domains. The proposed objective is weakly submodular and admits an orthogonal matching pursuit algorithm with provable approximation guarantees. Empirically, we evaluate PartitionSel for minibatch selection during the fine-tuning of Qwen2.5 and Llama-3 on MetaMathQA and Mol-Instructions. PartitionSel achieves robust gains over per-domain and domain-agnostic baselines on both benchmarks. It also reduces the number of conflicting gradient pairs within each batch, indicating that the cross-domain coupling translates into more compatible training updates.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07954v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>Proceedings of the 43rd International Conference on Machine Learning (ICML 2026), Seoul, South Korea, PMLR 306, 2026</arxiv:journal_reference>
      <dc:creator>Prayas Agrawal, Prateek Chanda, Ishita Khatri, Ganesh Ramakrishnan, Bamdev Mishra, Pratik Jawanpuria</dc:creator>
    </item>
    <item>
      <title>Demand-Driven Vulnerability Detection for Cloud Security Posture Management: Removing Human Rule Authoring from the Disclosure-to-Protection Critical Path</title>
      <link>https://arxiv.org/abs/2606.07957</link>
      <description>arXiv:2606.07957v1 Announce Type: new 
Abstract: Cloud Security Posture Management (CSPM) systems detect known vulnerabilities by maintaining a rule set, distributing it to customers, and evaluating it against periodically-collected asset inventories. To our knowledge, in publicly documented architectures the rule set is environment-agnostic and curated centrally by the vendor; updates are batched into release cycles and shipped on a cadence ranging from hours to days depending on detection severity. The disclosure-to-protection window -- from a CVE being published to the customer's system being capable of detecting affected assets -- is therefore bounded by the vendor's release cadence for version-match detections, and by additional human authoring time for richer detections incorporating configuration predicates beyond the affected-software string. We propose an architecture in which the rule set is not vendor-distributed but continuously derived, within the customer's tenant, from the intersection of public catalogue feeds and the live asset graph. A rule comes into existence when a catalogue entry and an applicable asset are simultaneously present, and goes out of existence when either input ceases to support it. Derivation is bidirectional: new catalogue entries and new assets both trigger it. It incorporates the full structured-field content of catalogue entries, not only the affected-software predicate. The live rule set is bounded by environment diversity rather than catalogue breadth. Prior systems incrementally evaluate a static rule set; we incrementally derive the rule set itself. We present the threat model, the architecture, formal semantics with an equivalence theorem, complexity analysis, a worked example, and an evaluation methodology. The contribution is the architectural shift and its latency and resource consequences; rule correctness and alert prioritization are out of scope.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07957v1</guid>
      <category>cs.CR</category>
      <category>cs.DB</category>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Prashant Kumar Pathak</dc:creator>
    </item>
    <item>
      <title>Feedback Linearization and Control of a Grid-Forming Power Converter in an Islanded Microgrid</title>
      <link>https://arxiv.org/abs/2606.07961</link>
      <description>arXiv:2606.07961v1 Announce Type: new 
Abstract: In an islanded setting, grid-forming inverters must regulate their terminal voltage without support from an external grid, even though the load current depends directly on that voltage. The usual approach is a cascaded proportional--integral (PI) controller, built on a fast inner current loop and a slower outer voltage loop, with feedforward terms used to compensate dq rotational coupling. However, this compensation is only exact at the operating point where the controller is tuned. This tutorial presents an alternative based on full-state feedback linearization. It is shown that the islanded inverter model has full relative degree, which allows exact state-space linearization with no internal or zero dynamics. A single feedback law cancels the main nonlinear effects (rotational coupling, resistive drops, and load conductance), so that the closed-loop system behaves like two independent double integrators. A standard pole-placement design is then used to shape the response. The controller is tested in MATLAB against a cascaded PI baseline under identical conditions at a 20 MW operating point, including reference tracking, load step disturbances, and parameter mismatch. The feedback-linearizing controller settles a reference step in 0.76 ms, while the PI controller does not reach the 2 % band within 50 ms. The cascaded PI controller shows better robustness to filter parameter mismatch due to its inner-loop integral action, which reduces steady-state errors under modeling uncertainty. Overall, the performance improvement and the robustness trade-off both come directly from the controller structures, rather than from tuning choices.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07961v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rene Ebunle Akupan, May-Win Thein, Se Young Yoon</dc:creator>
    </item>
    <item>
      <title>ChronoPhyBench: Do MLLMs Truly Understand the World or Merely Exploit Language Priors?</title>
      <link>https://arxiv.org/abs/2606.07962</link>
      <description>arXiv:2606.07962v1 Announce Type: new 
Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in open-world reasoning and understanding. However, a critical ambiguity persists: it remains unclear whether these models genuinely synthesize cross-modal information to construct physically grounded reasoning chains, or if they merely exploit strong language priors to mask single-modality reliance, thereby hallucinating advanced multimodal capabilities. Motivated by this, and to rigorously mitigate language modality bias and shortcuts, we propose a novel multimodal Chronological Physical Dynamics Reasoning Benchmark, ChronoPhyBench, which unifies next state prediction with Visual Question Answering (VQA) paradigms by conditioning on historical video context and textual captions to force models to deduce subsequent physical states through both single image selection and the inherently more complex task of multiple frame chronological sorting. Concurrently, we construct a large-scale multimodal reasoning dataset curated using the ChronoPhyBench criteria, comprising over 10,000 long-form videos paired with meticulously annotated captions, totaling 5M tokens. Our experimental evaluations reveal a stark contrast to conclusions drawn by previous benchmarks. The capacity of current open-source models to perform physically grounded multimodal reasoning remains in its infancy. Ultimately, this work seeks to systematically stress-test the reasoning capabilities of multimodal models, quantify hallucination rates, and advance the development of Physical AI, thereby providing the community with a robust and transparent evaluation framework toward Artificial General Intelligence (AGI).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07962v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bin Zhu, Yanhao Jia, Kexin Zhao, Jie Wang, Munan Ning, Hao Li, Yuwei Niu, Tanqing Sun, Huangchong Yan, Mingjun Pan, Xinyi Wu, Qishen Yin, Yunyang Ge, Shuai Zhao, Li Yuan</dc:creator>
    </item>
    <item>
      <title>Shared Latent Structures Enable Unified Backdoor Detection and Mitigation in LLMs</title>
      <link>https://arxiv.org/abs/2606.07963</link>
      <description>arXiv:2606.07963v1 Announce Type: new 
Abstract: Backdoor attacks in large language models (LLMs) are often treated as isolated trigger-response failures, motivating defenses tailored to specific triggers or behaviors. We show this view is incomplete. Across diverse backdoor behaviors, we identify a shared latent mechanism that can be detected, causally controlled, and suppressed. Using sparse autoencoders (SAEs) on residual-stream activations, we find a small set of latent features consistently activated across jailbreaking, refusal manipulation, password-locking, bias induction, sentiment misclassification, and country-conditioned harmful advice. These features generalize across Qwen3, Gemma 3, and Llama 3.1 models from 4B to 32B parameters, and across both fine-tuning and weight-editing attacks. Through bidirectional activation steering, we show these features are causal: suppressing them reduces attack success, while amplifying them induces target behaviors on clean prompts. We further train lightweight SAE-feature classifiers that generalize zero-shot to unseen backdoors and outperform residual-stream and weight-diffing baselines. Finally, we introduce Concept Ablation Fine-Tuning (CAFT), which suppresses backdoor formation by ablating the shared latent subspace during training. Together, our results suggest that many backdoors rely on a transferable latent mechanism, enabling unified detection and mitigation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07963v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Omar Mahmoud, Aly M. Kassem, Thommen George Karimpanal, Buddhika Laknath Semage, Negar Rostamzadeh, Golnoosh Farnadi, Santu Rana</dc:creator>
    </item>
    <item>
      <title>What Does Debiasing Really Remove? A Geometric Study of PCA-Based Gender Debiasing in Word Embeddings</title>
      <link>https://arxiv.org/abs/2606.07964</link>
      <description>arXiv:2606.07964v1 Announce Type: new 
Abstract: Debiasing methods based on principal component analysis (PCA) are broadly used to reduce gender bias in word embeddings used in LLMs, yet it remains unclear what aspects of bias they actually remove and how destructive this process is. These methods are based on the understanding that bias resides in a low-dimensional subspace, with the assumption that most of it can be captured by a few principal components. In this work, we conduct a systematic geometric analysis of PCA-based gender debiasing and investigate what is actually removed from the embedding space. Our experiments across multiple embeddings show that direct gender bias is primarily concentrated in the first principal component, supporting the low-rank bias hypothesis. However, associative bias measured by WEAT does not align with these principal directions and is instead spread across multiple embedding dimensions. Furthermore, as expected, we demonstrate that removing an increasing number of principal components leads to a consistent degradation of the embedding geometry, affecting semantic structure and vector relationships. These results reveal that PCA-based debiasing operates as a trade-off: while it effectively reduces certain forms of direct bias, it fails to eliminate distributed associations and introduces geometric distortion. Moreover, there is no universal optimal level of debiasing, as the balance between bias reduction and semantic preservation depends on the chosen metric and embedding. Overall, our findings suggest that bias in word embeddings is not purely low-rank and that simple subspace removal methods may be insufficient for comprehensive debiasing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07964v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Alexey Kresin, Tchifou M. Dieffi, Tomer Caspi</dc:creator>
    </item>
    <item>
      <title>Zero-Shot Learning in Industrial Scenarios: New Large-Scale Benchmark, Challenges and Baseline</title>
      <link>https://arxiv.org/abs/2606.07965</link>
      <description>arXiv:2606.07965v1 Announce Type: new 
Abstract: Large Visual Language Models (LVLMs) have achieved remarkable success in vision tasks. However, the significant differences between industrial and natural scenes make applying LVLMs challenging. Existing LVLMs rely on user-provided prompts to segment objects. This often leads to suboptimal performance due to the inclusion of irrelevant pixels. In addition, the scarcity of data has left the application of LVLMs in industrial scenarios unexplored. To fill this gap, this paper proposes an open industrial dataset and a Refined Text-Visual Prompt (RTVP) for zero-shot industrial defect detection. First, this paper constructs the Multi-Modal Industrial Open Dataset (MMIO) containing 80K+ samples. MMIO contains diverse industrial categories, including 6 super categories and 18 subcategories. MMIO is the first large-scale multi-scene pre-training dataset for industrial zero-shot learning, and provides valuable training data for open models in future industrial scenarios. Based on MMIO, this paper provides an RTVP specifically for industrial zero-shot tasks. RTVP has two significant advantages: First, this paper designs an expert-guided large model domain adaptation mechanism and designs an industrial zero-shot method based on Mobile-SAM, which enhances the generalization ability of large models in industrial scenarios. Second, RTVP automatically generates visual prompts directly from images and considers text-visual prompt interactions ignored by previous LVLMs, improving visual and textual content understanding. RTVP achieves SOTA with 42.2% and 24.7% AP in zero-shot and closed scenes of MMIO.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07965v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zekai Zhang, Qinghui Chen, Maomao Xiong, Shijiao Ding, Zhanzhi Su, Xinjie Yao, Yiming Sun, Cong Bai, Jinglin Zhang</dc:creator>
    </item>
    <item>
      <title>DisCo: World Models with Discrete Camera Motion Control</title>
      <link>https://arxiv.org/abs/2606.07967</link>
      <description>arXiv:2606.07967v1 Announce Type: new 
Abstract: Controllable video world models target interactive world exploration, where models must faithfully execute explicit action commands while preserving visual quality and temporal coherence. However, most existing approaches rely on continuous camera trajectories as action conditions, which often lead to unreliable action following, especially under complex motion sequences. In this work, we identify action representation entanglement as a key bottleneck in controllable video generation, and show that continuous camera representations lead to high feature similarity across distinct motion patterns, degrading action controllability. Based on this insight, we propose DisCo, a controllable video world model that conditions generation on a compact set of discrete action primitives to improve action separability. We further introduce DisCoBench, a comprehensive benchmark for evaluating the ability of models in short-term, long-horizon, and highly dynamic exploration scenarios. Extensive experiments demonstrate that DisCo achieves significantly more reliable action following while preserving visual quality.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07967v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hongrui Huang, Junke Wang, Quanhao Li, Yu-Gang Jiang, Zuxuan Wu</dc:creator>
    </item>
    <item>
      <title>RecurGuard: Runtime Monitoring for Reasoning-Token Consumption Attacks</title>
      <link>https://arxiv.org/abs/2606.07968</link>
      <description>arXiv:2606.07968v1 Announce Type: new 
Abstract: Reasoning-capable large language models can be induced to spend their generation budget on injected decoy tasks rather than answering the user's question, causing denial of service when no final answer is produced and denial of wallet when excess output tokens are billed. Input-side safety classifiers often miss these attacks because the injected prompts can appear syntactically benign. We build RecurGuard, a runtime monitor for detecting reasoning-chain consumption attacks when reasoning traces are exposed by the model. RecurGuard analyzes reasoning traces as they are generated and tracks three signals: recurrence rate, volume growth, and progress toward the user's query. If all three signals remain anomalous over three consecutive chunks, RecurGuard terminates generation early. We evaluate RecurGuard against OverThink and ExtendAttack across open-weight reasoning models and conduct adaptive stress tests on DS-R1-Qwen-7B. On this model, RecurGuard detects 99% of OverThink attacks and 92% of ExtendAttack instances while maintaining near-zero false positive rates on question answering, code generation, mathematics, and summarization. Adaptive evaluation reveals the limit of the defense: topical attacks retain 11.9x amplification with an approximately 50% joint miss rate, whereas full semantic evasion reduces amplification from 22.8x to 2.2x. When reasoning traces are unavailable, QDM provides a post-hoc fallback monitor based on the final output.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07968v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Abid Aziz, Hafsa Binte Kibria</dc:creator>
    </item>
    <item>
      <title>Neutrality Bites: Gender Representation in AI-Generated Animal Stories</title>
      <link>https://arxiv.org/abs/2606.07969</link>
      <description>arXiv:2606.07969v1 Announce Type: new 
Abstract: Gender bias in AI-generated stories is a well-documented problem. While much attention has been paid to reducing or mitigating this bias, it is not always clear whether interventions produce genuinely fairer results. To investigate this issue, we examine how large language models (LLMs) handle gender assignment in a narrative context that is popular, highly ambiguous, and also known to closely reproduce human stereotypes: stories about talking animals. We prompt six leading LLMs to complete an English-language story about seven different anthropomorphic animal characters whose gender is unstated. We additionally iterate with four different narrative settings and a range of model temperatures. Across the 23.8K stories, we find that models frequently avoid gendering the animal character in the story (19% on average) or use gender-neutral language like "it" or "its" (38.2% on average). However, when gender is assigned, there is a significant masculine bias. Feminine animal characters are virtually absent, present in just 2.2% of stories vs. 40.6% that feature masculine characters. Our findings point to a broader argument: neutrality bites. In other words, models that prioritize neutrality to address social bias may actually contribute to the erasure of marginalized perspectives and identities. We suggest that alternative strategies beyond neutrality need to be pursued, such as ones that more equally distribute social possibilities across imagined subjects.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07969v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3805689.3812287</arxiv:DOI>
      <dc:creator>Imani Finkley, Yuanxi Li, Melanie Walsh</dc:creator>
    </item>
    <item>
      <title>Defending Against Malicious Finetuning by Scaling Train-time Adversarial Attacks</title>
      <link>https://arxiv.org/abs/2606.07970</link>
      <description>arXiv:2606.07970v1 Announce Type: new 
Abstract: Current open-weight large language models (LLMs) are prone to malicious finetuning attacks, which could compromise the safety alignment of LLMs with only a few steps of supervised finetuning (SFT) on poisoned datasets. Existing alignment-stage defenses are primarily designed to defend against attacks that use parameter-efficient finetuning methods. However, they fail to defend against stronger attacks that use full-parameter finetuning. In this paper, we propose Patcher, a method inspired by adversarial training and bi-level optimization, to combat such attacks. Patcher strengthens the simulated attack by scaling up the optimization steps in the adversarial loop, thus forcing the defender to find model parameters that are insensitive to stronger attacks. Furthermore, we propose an efficient parallel algorithm to implement Patcher, decreasing the wall-clock time of training while preserving Patcher's performance. Extensive experiments show that Patcher substantially improves the model's robustness compared to vanilla SFT alignment, and transfers to diverse attack scenarios and model sizes. Code is available at https://github.com/haomingwen/patcher.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07970v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haoming Wen, Shi Chen, Qingyu Shi, Siyuan Liu, Minrui Luo, Jingzhao Zhang, Tianxing He</dc:creator>
    </item>
    <item>
      <title>A Uniformly High-Accuracy PML-BIE Method for Scattering by Periodic Arrays of Obstacles: The 2D Case</title>
      <link>https://arxiv.org/abs/2606.07971</link>
      <description>arXiv:2606.07971v1 Announce Type: new 
Abstract: This paper presents a novel frequency-robust perfectly matched layer (PML) boundary integral equation (BIE) method for solving two-dimensional electromagnetic scattering problems involving periodic arrays of obstacles. In periodic scattering problems, standard BIE formulations based on the quasi-periodic Green's function require the evaluation of lattice sums or challenging Sommerfeld-type integrals, which diverge at Rayleigh-Wood (RW) anomalies. An alternative is to use BIE formulations based on the Helmholtz free-space Green's function, but these are defined on unbounded unit-cell boundaries and therefore require suitable truncation strategies, such as the Windowed Green Function (WGF) method. Although such approaches avoid the use of expensive quasi-periodic Green's functions, they also suffer from breakdowns at RW anomalies unless an appropriate mode correction is incorporated. Similarly, the direct application of PML-BIE techniques to periodic structures experiences comparable difficulties near RW anomalies, where exponential convergence is destroyed for fixed PML parameters. To overcome this challenge, we propose a modified PML-BIE method that combines the PML technique with a finite-mode correction, ensuring both high accuracy and robustness at and around RW anomalies. Convergence of the PML-truncated boundary integral operators is proved and several numerical examples are presented to validate the efficiency and performance of the proposed method.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07971v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <category>physics.comp-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yan Tan, Carlos Pérez-Arancibia, Tao Yin</dc:creator>
    </item>
    <item>
      <title>OneFeed: A Unified Generative Framework for Feed Content Enhancement and Query Generation</title>
      <link>https://arxiv.org/abs/2606.07972</link>
      <description>arXiv:2606.07972v1 Announce Type: new 
Abstract: Modern feed recommendation and search systems are deeply connected in user behavior but are usually modeled by separate architectures. Feed recommendation mainly captures implicit interests from browsing interactions, while search systems rely on explicit user queries to retrieve intent-matched content. This separation causes fragmented user understanding and missed opportunities for using feed interactions to improve query generation and using generated queries to enhance feed candidate retrieval. In this paper, we propose OneFeed, a unified generative framework for jointly modeling feed content enhancement and query generation. OneFeed encodes heterogeneous user behavior sequences with a shared behavior encoder and employs two generative heads: a Feed Semantic ID Generator that produces content semantic IDs for recommendation retrieval, and an Intent Query Generator that produces natural-language queries for search-based candidate expansion. To bridge the semantic gap between recommendation content and search queries, we introduce a SID-Query alignment objective that learns a shared semantic space for content semantic IDs and query representations. We further design a closed-loop self-enhancement paradigm that leverages implicit user feedback from generated content and search-retrieved results to improve both generation tasks. We provide a detailed experimental protocol using public recommendation datasets with weakly supervised query construction, define a comprehensive set of evaluation metrics, report expected performance estimates grounded in known baseline values, and validate the executability of the proposed pipeline through a minimal local prototype. OneFeed provides a practical and extensible direction for unifying search and recommendation through generative modeling.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07972v1</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Guo Xun</dc:creator>
    </item>
    <item>
      <title>PRISM: PRior-guided Imagination Sampling in world Models</title>
      <link>https://arxiv.org/abs/2606.07974</link>
      <description>arXiv:2606.07974v1 Announce Type: new 
Abstract: A learned world model provides a powerful physical intuition for evaluating future states. But its effectiveness in continuous control also depends critically on how candidate actions are generated for model-based planning. Rather than solely asking how accurately a model can simulate the future, we ask: which candidate actions are worth evaluating in the first place? Existing planners typically search arbitrarily or use expert demonstrations only to initialize a sampling mean, discarding the expert's state-conditioned confidence. Properly guiding this search requires a robust action prior, yet current approaches often rely on independent visual encoders or large-scale VLMs to obtain one. We argue that this architectural bloat is unnecessary: the exact same data - and the learned representations of the world model itself - inherently encode the agent's action intuition. We introduce PRISM, a task-agnostic framework that extracts both from a single dataset while maintaining strict architectural simplicity. Building on a standard JEPA-style latent world model, PRISM attaches a lightweight MLP directly to its frozen encoder to predict a state-conditioned Gaussian prior. At plan time, PRISM fuses this prior into the planner's sampling distribution via a precision-weighted Product-of-Gaussians update. This parameter-free, closed-form integration steers the sampling process, making the prior confident where it is and ceding control where it is not. PRISM improves success rates by 35 percentage points over vanilla world-model-based MPC on Cube and 32 percentage points on PushT, without introducing significant inference overhead.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07974v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuhai Wang, Jiawei Xia, Rongxuan Zhou, Xiao Hu, Yongliang Shi, Jing Du, Yang Ye</dc:creator>
    </item>
    <item>
      <title>A Measure-Consistent Operator Learning Method for Infinite-Dimensional Master Equations</title>
      <link>https://arxiv.org/abs/2606.07976</link>
      <description>arXiv:2606.07976v1 Announce Type: new 
Abstract: Master equations in mean field game theory characterize feedback value functions that depend on time, state (space), and the population distribution. Their numerical approximation is challenging because the unknown is defined on a space of probability measures and the equation involves intrinsic measure derivatives and nonlocal population terms. This paper proposes a measure-consistent operator learning method (MCOL) for infinite-dimensional master equations. The population distribution is represented by an empirical measure and encoded through a symmetric pooling structure, so that the network input is built directly from the particles representing the measure. The same particles are used in the empirical quadrature of the nonlocal residual terms, avoiding additional quadrature grids or auxiliary integration points. A key feature is that the intrinsic derivative appearing in the residual is induced by the same measure-dependent representation that defines the approximation of the value function. Consequently, the value function, its measure derivative, and the empirical residual are tied to a common measure representation, leading to a structurally coupled value-derivative approximation. We also introduce an error decomposition separating neural approximation error from empirical discretization error. Numerical experiments on several master equations show that MCOL accurately approximates the value function, intrinsic measure derivatives, and feedback quantities, and remains robust under changes in the input measures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07976v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chenyao Wang, Hongyu Liu, Hui Liang</dc:creator>
    </item>
    <item>
      <title>MechLens: Late Crystallization of Factual Knowledge Explains Intervention Effectiveness in Language Models</title>
      <link>https://arxiv.org/abs/2606.07978</link>
      <description>arXiv:2606.07978v1 Announce Type: new 
Abstract: Understanding where LLMs store factual knowledge is critical for hallucination mitigation. We systematically quantify Late Crystallization: factual knowledge does not gradually emerge across layers but "crystallizes" abruptly at the final layers. Across five model families (Pythia, Gemma, Qwen2.5, Llama-3.1, Mistral; 0.5–14B), 26.8%–93.4% of correct answers never enter top-10 predictions at any intermediate layer, with late emergence (&gt;80% depth) consistent across architectures. Cross-scale (Qwen2.5-14B) and cross-benchmark (MMLU: 98.2%) results confirm generality; tuned lens rules out probe artifacts. A sentiment-classification control (0.5% for Qwen vs. 85.9% factual; 2.0% for Mistral vs. 26.8%) confirms the phenomenon is specific to factual recall.
  Late Crystallization yields a crystallization-guided intervention principle: CAA outperforms DoLa on moderate-crystallization models (Llama, Mistral; p&lt;0.001), with a directionally consistent reversal on high-crystallization Qwen (+25.4% vs. +15.5% MC1, p=0.069). LayerNorm ablation shows crystallization is intrinsic to the residual stream; LN scaling (x1.2) yields +11.8% MC1 with zero inference overhead. We further reveal a Computability-Memorization Spectrum: computable knowledge crystallizes earlier (layer 22.1/28) than memorized facts (28.0/28). We release MechLens supporting five model families.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07978v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Xueping Gao</dc:creator>
    </item>
    <item>
      <title>A New Level Set Formulation for Improved Dirichlet Eigenvalue Minimizers</title>
      <link>https://arxiv.org/abs/2606.07979</link>
      <description>arXiv:2606.07979v1 Announce Type: new 
Abstract: This paper makes several improvements to existing level set based approaches to computing shape optimizers for the Dirichlet eigenvalues subject to a volume constraint. The most notable changes in formulation include an overhaul of the classical level set construction and root-finding procedures as well as the use of a regularized approximation to the standard objective function. Our resulting computational minimizers are either comparable to or improvements on the best known minimizers from the literature. We conclude with a survey of subproblems within the field that may benefit from numerical experiments; these include the existence of cusps on the boundary, the end-behavior of eigenfunction weights in the p-parameterized problem, and the nature of Weyl asymptotics as they relate to the Pólya conjecture.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07979v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <category>math.AP</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Atharv Thakur</dc:creator>
    </item>
    <item>
      <title>DeRes: Decoupling Residual Stability and Adaptivity for Scalable CTR Prediction</title>
      <link>https://arxiv.org/abs/2606.07980</link>
      <description>arXiv:2606.07980v1 Announce Type: new 
Abstract: Transformer-based CTR models face a growing bottleneck at the residual connection: under Pre-Norm, early user-interest signals are diluted layer by layer; the identity skip cannot forget stale interests; and each layer sees only its immediate predecessor, losing long-range cross-layer dependencies. Recent attention-based residual variants (AttnRes) address parts of this in language models, but drop the protective identity skip and have not been tried in recommendation. Drawing on Dual Path Networks (DPN) and the HORNN view of residuals, we present DeRes, which routes each layer through two parallel paths: an Identity residual path that preserves first-order feature reuse and gradient flow, and a Block Attention Residual path that attends over compressed outputs of all earlier blocks for high-order recall. A vector-wise gate decides, per hidden dimension, the weight given to each path. We further propose Pointwise AttnRes, replacing the Softmax in the cross-layer attention with SiLU so that multiple past blocks can be activated simultaneously and irrelevant ones receive negative (forgetting) weights, which is better aligned with CTR's parallel multi-interest patterns. On a large-scale industrial dataset (331M interactions from a major social-media platform), Criteo (45M), and Avazu (40M), DeRes outperforms twelve baselines including OneTrans, TokenMixer-Large, UniMixer, mHC, and AttnRes, achieving up to +0.32% AUC at under 5% extra FLOPs. Beyond a single operating point, DeRes fits a markedly steeper compute-AUC scaling law (gamma=0.118 vs. 0.071 for OneTrans, a 1.66x gap), so an 8-layer DeRes matches a 16-layer OneTrans, about a 2x compute saving at equivalent AUC. Ablations confirm that the dual-path design outperforms either single path, Identity beats learnable residuals, and SiLU beats Softmax.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07980v1</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wenzhuo Cheng, Shipeng Nie, Qixin Guo, Xuefeng Sun, Jianguo Lou, Zhengwei Zheng</dc:creator>
    </item>
    <item>
      <title>Overcoming the Limits of Finite Difference Method; Physics-Informed Neural Network for Noisy High-Dimensional Heat Diffusion</title>
      <link>https://arxiv.org/abs/2606.07982</link>
      <description>arXiv:2606.07982v1 Announce Type: new 
Abstract: High-dimensional transient heat diffusion under noisy boundary conditions exposes a fundamental limitation of classical numerical methods: accuracy degrades catastrophically where physical noise is unavoidable. This paper presents a Physics-Informed Neural Network (PINN) framework as a systematic solution to this problem across one, two, and three spatial dimensions, establishing clear operational regimes that redefine solver selection in noisy thermal systems. Under 20% boundary noise in 3D, PINN sustains approximately 91% accuracy while the Finite Difference Method (FDM) collapses to 36%, a decisive advantage. This is further confirmed in a physical copper thermal system, where PINN reduces boundary reconstruction error by a factor of 3.3 under realistic noise conditions. This noise resilience is accompanied by a dimensionality-driven efficiency crossover: PINN requires fewer spacetime nodes than FDM in 3D while achieving superior accuracy, exposing the true cost of classical discretization at scale. These findings reframe solver selection: the decisive axis is not accuracy alone, but noise exposure and dimensionality jointly. When noise and dimensionality are both high, the classical solver paradigm is insufficient; this work provides the foundation to justify PINN as the operational standard in such regimes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07982v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shreesh Bhattarai, Harish Chandra Bhandari</dc:creator>
    </item>
    <item>
      <title>FMRFusion: Frequency-Aware Multi-View Representation Learning for Heterogeneous Image Fusion</title>
      <link>https://arxiv.org/abs/2606.07985</link>
      <description>arXiv:2606.07985v1 Announce Type: new 
Abstract: Infrared and visible image fusion aims to generate a composite image that retains significant target information and preserves detailed textures, integrating two heterogeneous modalities. Previous image fusion methods typically adopt a single-module stacking approach to extract features from the two modalities. However, these approaches may result in incomplete learning of their distinct characteristics, thereby limiting the fusion effectiveness and constraining robustness in real-world heterogeneous data scenarios. To address these challenges, we propose FMRFusion, a frequency-aware multi-view representation learning network for Heterogeneous Image Fusion. A Multi-Scale Structural Perception Module is introduced to effectively capture discriminative structures, extracting fine-grained local structures and essential contextual information. A bilinear frequency decomposition mechanism is employed to separate features into high-frequency and low-frequency components, enabling joint modeling of local details and global representations across different frequency domains. Moreover, a Cross-View Complementary Interaction is incorporated to explicitly model and fuse the complementary characteristics between reflected light information and radiative intensity responses, facilitating effective cross-view interaction. We further improve the performance of the fused results by flow matching, which progressively refines the fused features by learning the transformation from coarse data to high-quality representations. Extensive experiments conducted on multiple benchmark datasets demonstrate that FMRFusion achieves superior and consistent performance across a range of fusion tasks, especially in nighttime scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07985v1</guid>
      <category>cs.CV</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tao Zhou, Yunlong Liu, Qinghui Chen, Zekai Zhang, Minlong Sun, Changlin Bian, Dagang Li, Wenmin Wang, Jinglin Zhang</dc:creator>
    </item>
    <item>
      <title>PAFO: Pareto Fairness Optimization for Personalized Reward Modeling</title>
      <link>https://arxiv.org/abs/2606.07988</link>
      <description>arXiv:2606.07988v1 Announce Type: new 
Abstract: Large language models (LLMs) increasingly rely on reward models to align their outputs with diverse user preferences. While personalized reward models aim to capture such heterogeneity, they are often trained on imbalanced user preference data and may therefore favor users whose preferences are more common in the training population. In this paper, we identify this failure mode as personalized reward bias, where reward modeling quality varies systematically with preference support rate. We formulate its mitigation as a Pareto fairness problem over group utilities, aiming to improve under-served users without degrading other user groups. To this end, we propose PAFO, a Pareto fairness optimization framework for personalized reward modeling. PAFO first trains group-specialized reward models for majority and minority preference groups, then constructs conditional margin-level supervision to distill their heterogeneous preference boundaries into a single unified model. The resulting model uses group information only during training and requires no explicit group labels at inference time. Experiments on Personal-LLM and DSP show that PAFO improves both minority-group and majority-group accuracy while reducing user-level unfairness across multiple metrics, demonstrating its effectiveness for fairer LLM personalization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07988v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaoyan Zhao, Haoting Ni, Yang Zhang, Chunyuan Zheng, Haoxuan Li, Fuli Feng</dc:creator>
    </item>
    <item>
      <title>VATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation</title>
      <link>https://arxiv.org/abs/2606.07992</link>
      <description>arXiv:2606.07992v1 Announce Type: new 
Abstract: As the Model Context Protocol (MCP) standardizes tool-calling for autonomous agents, it introduces a critical, unexamined attack surface: the error-handling loop. We hypothesize that tool error messages possess implicit authority, triggering corrective reasoning modes that bypass standard safety heuristics. We introduce VATS (Vulnerability Analysis of Tool Streams), a mutation-driven framework that systematically evolves adversarial payloads across seven structural and linguistic dimensions. Our evaluation across four frontier models, Gemini 3.1 Pro, GPT-5.5, GLM-5.1, and Qwen3-Coder, demonstrates that error-path injection triples the success rate of standard indirect prompt injection (IPI), achieving up to 100% compliance in controlled evaluations. We isolate structural positioning (sandwiching instructions within error context) as the most effective exploit vector across all tested models. While we find that production framework guardrails can mitigate these vulnerabilities, the inherent susceptibility of the model layer poses a systemic risk to bespoke agentic workflows.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07992v1</guid>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Harshil Patel, Kunal Pai</dc:creator>
    </item>
    <item>
      <title>The Rising Dominance of Methods Across Science</title>
      <link>https://arxiv.org/abs/2606.07994</link>
      <description>arXiv:2606.07994v1 Announce Type: new 
Abstract: Scientific progress is traditionally narrated through the interplay of theoretical insights and experimental findings. Yet this view of science underplays a third and central pillar of progress: the methods that underlie both conceptual advances and empirical evidence. By analysing more than 3 million articles across science published between 1980 and 2019, we find that science has undergone a fundamental structural transition. The share of papers that primarily contribute new methods (methods papers) has doubled across science over the past four decades, rising universally across disciplines and citation impact levels. Rather than a gradual evolution, this transition marks a pivotal shift beginning in the early 1990s, aligning with the computational revolution and the emergence of data-intensive science. The surge in methodological research is not confined to the most cited, elite publications; it spans the full spectrum of scientific output. These findings reveal a systemic reorientation of the scientific ecosystem where reusable methods increasingly serve as the essential infrastructure of scientific advances, challenging the traditional dichotomy of theory and experimental research. As science becomes increasingly methods-driven, our results call for rethinking how research is evaluated, funded and organised, towards better incentivising method innovations. This is especially the case as expanding AI must be effectively integrated with scientific instruments to realise its full potential.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07994v1</guid>
      <category>cs.DL</category>
      <category>stat.AP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alexander Krauss, Ariel Rosenfeld, Lutz Bornmann</dc:creator>
    </item>
    <item>
      <title>Customer-Agent: Overcoming Context Limitations in Ultra-Long Shopping Trajectories via Tool-Augmented Agents and RLVR</title>
      <link>https://arxiv.org/abs/2606.07995</link>
      <description>arXiv:2606.07995v1 Announce Type: new 
Abstract: Understanding customer shopping trajectories is essential for enabling personalized shopping experiences. However, shopping records (e.g., a customer's searches, clicks, and purchases) often span long time horizons over multiple years, resulting in extremely long trajectories that pose significant challenges for existing large language models (LLMs). Despite the importance of this problem, existing benchmarks are limited to short customer trajectories, while real-world trajectories from large e-commerce platforms are rarely accessible due to data privacy constraints. To address this gap, we introduce ShopTrajQA, a long-context evaluation benchmark constructed from real-world product information and simulated shopping trajectories. The dataset includes variants of up to 32k and 64k tokens, enabling systematic evaluation of model robustness under varying context lengths. Through comprehensive benchmarking of frontier LLMs, we identify critical performance gaps in reasoning over long shopping trajectory data. To address these challenges, we propose a Customer Agent Framework for ultra-long context management. Leveraging a Reinforcement Learning with Verifiable Rewards (RLVR) agentic training paradigm, our approach stores trajectories as external local files and trains the agent to autonomously retrieve and parse them through code-interpreter interactions (e.g., SQL queries), effectively bypassing the fixed in-context window constraints of LLMs. Experimental results demonstrate that our framework achieves strong performance on ShopTrajQA and generalizes to other complex reasoning tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07995v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hongye Liu, Rongmei Lin, Anurag Kashyap, Hejie Cui, Ricardo Henao, Besnik Fetahu, Bing Yin</dc:creator>
    </item>
    <item>
      <title>MC-PDD: Masked Corpus-Level Pretraining Data Detection for Black-Box Large Language Models</title>
      <link>https://arxiv.org/abs/2606.07996</link>
      <description>arXiv:2606.07996v1 Announce Type: new 
Abstract: Pretraining is fundamental to the development of Large Language Models (LLMs), yet the opacity of pretraining data complicates model analysis and raises ethical, legal, and fairness concerns. Detecting whether specific datasets were used during pretraining is, therefore, critical. Existing state-of-the-art methods typically rely on access to model probability distributions, making them unsuitable for closed-source LLMs that provide only input-output interfaces. To address this limitation, we introduce Masked Corpus-level Pretraining Data Detection (MC-PDD), a novel method inspired by the masked language modeling paradigm. MC-PDD masks highly specific tokens in each text and prompts the LLM to predict the missing content. It then assesses whether the difference in prediction hit rates between a candidate corpus and a reference non-member corpus is statistically significant. Based on this comparison, MC-PDD determines whether the candidate texts were likely included in the model's pretraining data. Experimental results demonstrate clear and consistent differences in prediction hit rates between pretrained and unseen data across three datasets, for both open-source and closed-source LLMs. Despite operating under a stricter black-box setting, MC-PDD achieves performance comparable to existing detection methods. Our approach enables practical applications such as model auditing and data copyright verification using only standard API access. Upon acceptance, we will publicly release the code and datasets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07996v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Kaixin Lan, Mu You, Tao Fang, Binkai Ou, Lidia S. Chao, Derek F. Wong</dc:creator>
    </item>
    <item>
      <title>Enhancing AI Interpretability and Safety through Localised Architectures</title>
      <link>https://arxiv.org/abs/2606.07998</link>
      <description>arXiv:2606.07998v1 Announce Type: new 
Abstract: Recent advances in generative AI, especially powerful Large Language Models (LLMs) and Large Reasoning Models (LRMs), raise concerns over the interpretability, safety and sustainability of these large and opaque AI models. The power of such architectures is derived not only from the scalability of deep neural networks, but also from massively parallel hardware such as GPU clusters. The diffuse nature of deep neural networks gives them great function-approximation capability when provided with sufficient training data but imposes a cost in interpretability and computational efficiency. Observing that localised machine learning (ML) models tend to be more interpretable and computationally efficient than deep neural networks on small datasets, we reason by analogy that similar advantages may apply to specific localised hardware ML architectures. We argue that localised architectures with lower bandwidth but higher expressivity per node have the potential to be fundamentally more interpretable than deep neural networks running on GPU clusters while remaining competitive for smaller datasets. We then evaluate the suitability of various hardware ML paradigms for implementing such localised architectures, assessing their per-node expressivity, energy efficiency and the practical maturity of the technology required.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07998v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ian Seet, Jonas Bozenhard, Simon Osterman</dc:creator>
    </item>
    <item>
      <title>Efficient Skill Grounding via Code Refactoring with Small Language Models</title>
      <link>https://arxiv.org/abs/2606.07999</link>
      <description>arXiv:2606.07999v1 Announce Type: new 
Abstract: Effective skill grounding is essential for deploying reusable skills in embodied agents, as even minor embodiment or environmental differences can render an entire skill incompatible. This challenge is particularly pronounced in embodied settings, where agents must operate in dynamic, partially observable environments without access to large language models (LLMs). In this setting, reliance on LLMs is impractical, while small language models (sLMs) remain insufficient for the effective skill grounding required for reliable long-horizon control. We present RECENT, a refactoring-centric agent framework that enables efficient skill grounding with sLMs by decoupling skill semantics from embodiment- and environment-specific execution binding. By representing skills as executable code, RECENT preserves the semantic intent encoded in a skill's control structure while grounding it by modifying only execution bindings through localized refactoring, rather than regenerating code from scratch. We evaluate RECENT across diverse skill grounding scenarios spanning multiple robot embodiments in dynamic environments, demonstrating robust long-horizon performance when deployed with an sLM. Across all scenarios, RECENT achieves the best performance among sLM-based Code-as-Policies (CaP) methods and matches the task performance of LLM-based CaP.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07999v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sera Choi, Wonje Choi, Saehun Chun, Daehee Lee, Jooyoung Kim, Chaeun Lee, Honguk Woo</dc:creator>
    </item>
    <item>
      <title>Summarization is Not Dead Yet</title>
      <link>https://arxiv.org/abs/2606.08000</link>
      <description>arXiv:2606.08000v1 Announce Type: new 
Abstract: The progress of large language models (LLMs) has fueled claims that model-generated summaries rival or even surpass human-written references, raising questions about whether summarization remains an open research problem. We re-examine this narrative through a multi-track evaluation covering five diverse datasets and five state-of-the-art LLMs, combining controlled human assessment, bias-mitigated LLM-as-Judge protocols, factuality verification against external knowledge, and corpus-level linguistic analysis. Our findings reveal a more nuanced landscape in which human reference summaries continue to demonstrate advantages in informativeness and faithfulness, whereas LLM outputs are preferred mainly for surface-level coherence and fluency. Factuality verification indicates that human references remain more reliable, particularly for claims involving reasoning or synthesis, and linguistic analysis uncovers a pattern of stylistic homogeneity across different models. These observations suggest that current LLMs have raised the floor of summarization quality, but the ceiling of their performance remains below human capabilities.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08000v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dongqi Liu, Chenxi Whitehouse, Zheng Zhao, Zhuchen Cao, Jian Li, Yabiao Wang</dc:creator>
    </item>
    <item>
      <title>Learning a Semantic Calibration Network for Open-Vocabulary Semantic Segmentation</title>
      <link>https://arxiv.org/abs/2606.08001</link>
      <description>arXiv:2606.08001v1 Announce Type: new 
Abstract: Semantic image segmentation, which assigns a predefined category label to each pixel, has achieved significant progress lately. Open-Vocabulary Segmentation (OVS) extends the segmentation task from a fixed set to an open set, enabling the identification and segmentation of novel concepts based on arbitrary text inputs, such as category names or descriptions. In this paper, we propose a novel Semantic Calibration Network (SCN) for open-vocabulary semantic segmentation. Different from prior approaches that focus on feature aggregation or simple fine-tuning of pre-trained models, SCN refines the mask classification process by explicitly modeling the semantic correlations between classes, aiming to enhance the model's discriminative power while effectively preserving the generalization abilities of the pre-trained CLIP model. Specifically, SCN comprises two core components: Class Disambiguation (CD) and Logits Fusion (LF). First, a cross-attention mechanism is utilized to transform the text embeddings into visually aware pseudo-text embeddings, in order to derive an enhanced similarity score that complements the original mask-text similarity score. Subsequently, the Class Disambiguation module captures implicit inter-class dependencies through a residual architecture to effectively resolve semantic ambiguities. Finally, the Logits Fusion module dynamically integrates multifaceted semantic evidence to ensure that the model achieves a robust semantic consensus while maintaining CLIP's inherent generalization capability. Comprehensive experimental results on mainstream benchmarks demonstrate that the proposed method achieves significant performance improvements compared to state-of-the-art algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08001v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yang Sun, Tao Wang, Anastasia Ioannou, Ge Xu</dc:creator>
    </item>
    <item>
      <title>Aqua Boundary-Saliency Attention Module for Lightweight Underwater Salient Instance Segmentation Detection Transformer</title>
      <link>https://arxiv.org/abs/2606.08002</link>
      <description>arXiv:2606.08002v1 Announce Type: new 
Abstract: Underwater instance segmentation integrates pixel-level mask prediction and instance-level discrimination for marine resource exploration, ecological monitoring, and underwater robotic perception. Recent prompt-based and auxiliary-modality methods improve mask quality, but their reliance on large foundation models, prompt generation, or extra modality estimation complicates efficient deployment. This work introduces Lightweight Underwater Salient Instance Segmentation Detection Transformer (LUSIS-DETR), a compact detection-transformer framework built around the Aqua Boundary-Saliency Attention Module (AquaBSAM). AquaBSAM embeds underwater boundary, contrast, attenuation, chroma, dark-channel, and center-prior cues into DINOv2-initialized multi-scale features through bounded residual modulation, while auxiliary mask supervision and small-object copy-paste are used only during training. Extensive evaluation on four recent underwater instance segmentation datasets, UIIS, UIIS10K, USIS10K, and USIS16K, shows competitive or leading performance against previous state-of-the-art methods across category-aware and salient-instance protocols. TensorRT half-precision (FP16) benchmarking on an NVIDIA T4 graphics processing unit (GPU) achieves 4.31-6.34 milliseconds (ms) latency, supporting real-time inference under an accessible reproduction setting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08002v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>M. Fazri Nizar, Julian Supardi, Muhammad Naufal Rachmatullah</dc:creator>
    </item>
    <item>
      <title>Rewrite to Translate, Translate to Reward: Reinforcement Learning for Source Rewriting in Machine Translation</title>
      <link>https://arxiv.org/abs/2606.08011</link>
      <description>arXiv:2606.08011v1 Announce Type: new 
Abstract: Although directly prompting off-the-shelf Large Language Models (LLMs) to generate meaning-preserving source rewrites can effectively enhance Machine Translation (MT) quality, doing so requires manually tuning prompts for different MT models. In this work, we propose RLSR (Reinforcement Learning for Source Rewriting), a novel RL-based framework for training a source rewriting model without tuning prompts for each MT model. RLSR optimizes the rewriting model by directly using the improvement in downstream translation quality yielded by each rewritten source as the reward. Extensive experiments across six MT models and 16 language pairs demonstrate that our 4B rewriting models trained via RLSR significantly outperform the no-rewriting baseline and existing same-scale prompt-based rewriting baselines, while achieving competitive performance against prompt-based baselines based on the 235B LLM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08011v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Boxuan Lyu, Haiyue Song, Zhi Qu, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu Okumura</dc:creator>
    </item>
    <item>
      <title>The Dodona Protocol: A Living Design Science Experiment in Oracle Design</title>
      <link>https://arxiv.org/abs/2606.08012</link>
      <description>arXiv:2606.08012v1 Announce Type: new 
Abstract: The oracle problem, broadly understood as the difficulty of reliably incorporating external information into blockchain-based systems, has been widely examined by scholars and practitioners. Recent comparative research has shown that several challenges of modern blockchain oracles, including attributability, accountability, integrity, and query design, mirror procedural and epistemic constraints already present in ancient oracular institutions such as the Delphic Oracle. Yet the translation of these insights into applied oracle design remains largely unexplored. This paper introduces the Dodona Protocol, a modular, chain-agnostic oracle service inspired by procedural patterns identified in ancient and modern oracle systems. Named after the Oracle of Zeus at Dodona, one of the oldest oracular sanctuaries in ancient Greece, the protocol operationalizes principles such as structured consultation, access control, attributable resolution, constrained query formats, reputational accountability, and tiered service availability. Its first module implements a query and dispute resolution mechanism in which a named expert resolver provides binding answers to structured questions submitted by petitioners. The oracle does not claim to reveal objective truth; rather, it produces outcomes that parties have agreed in advance to accept. The paper presents the design rationale, architecture, and comparative positioning of the Dodona Protocol. It frames the protocol as a living research experiment within the Design Science Research tradition, where the deployed system functions as the research artifact and operational data support structured analysis, iterative refinement, and peer-reviewed dissemination. In doing so, the paper seeks to bridge the gap between oracle theory and oracle practice.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08012v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Giulio Caldarelli</dc:creator>
    </item>
    <item>
      <title>Evaluating the Impact of Task Granularity on Catastrophic Forgetting in Continual Learning</title>
      <link>https://arxiv.org/abs/2606.08013</link>
      <description>arXiv:2606.08013v1 Announce Type: new 
Abstract: Catastrophic forgetting, the abrupt loss of previously acquired knowledge upon learning new information, remains the central challenge in Continual Learning. This project investigates whether the order in which a model learns information affects how well it retains knowledge. Specifically, we ask: does learning general categories first (like "animals" vs "vehicles") before learning specific classes (like "dog" vs "cat") reduce forgetting compared to learning all classes at once?
  We test three approaches on CIFAR-100: (1) Coarse-to-Fine: train on 2 super-classes, then expand to 10 specific sub-classes, (2) Fine-to-Coarse: train on 10 sub-classes, then group into 2 super-classes, and (3) Flat: train on all 10 classes from the start. We use Elastic Weight Consolidation (EWC) to prevent forgetting during transitions. Our hypothesis is that learning general patterns first creates a stable foundation that helps the model retain knowledge when learning more detailed distinctions. We evaluate using standard metrics (accuracy, precision, recall, F1) plus continual learning metrics like backward transfer and forgetting rates. This work could inform how we design learning sequences for real-world systems that need to learn incrementally.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08013v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Emre Alyamac, Himanshu Janmeda, Shashwat Krishna, Yash Vijay</dc:creator>
    </item>
    <item>
      <title>GVC-Seg: Training-Free 3D Instance Segmentation via Geometric Visual Correspondence</title>
      <link>https://arxiv.org/abs/2606.08014</link>
      <description>arXiv:2606.08014v1 Announce Type: new 
Abstract: Accurate 3D instance segmentation in point cloud data is critical for machine vision applications. Recent advancements leverage multiple pre-trained foundation models to generate 3D proposals, followed by the application of proposal aggregation methods, which significantly enhance performance. However, they often produce sub-optimal results due to inherent variations in confidence levels across different segmentation models, resulting in a bias toward the model with higher confidence. This bias is inherently model-dependent and is influenced by factors such as data preprocessing techniques and training strategies. To address this bias, we propose a novel, training-free 3D instance segmentation approach via Geometric Visual Correspondence (GVC-Seg), which exploits the correspondence between 3D geometric cues and 2D visual cues to mitigate the confidence bias. Additionally, a 3D proposal generation module and a mask-aware CLIP feature extraction module are introduced during the instance mask generation and instance semantic reasoning, respectively. In this way, GVC-Seg enhances proposal quality assessment, ensuring unbiased ensemble learning across different models. Extensive experiments demonstrate that our method achieves state-of-the-art performance on several challenging benchmarks, while also exhibiting strong potential in open-vocabulary semantic segmentation settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08014v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Liang Xu, Fangjing Wang, Jinyu Yang, Feng Zheng</dc:creator>
    </item>
    <item>
      <title>Q-VGM: Q-Guided Value-Gradient Matching for Flow-Matching VLA Policies</title>
      <link>https://arxiv.org/abs/2606.08015</link>
      <description>arXiv:2606.08015v1 Announce Type: new 
Abstract: We propose Q-Guided Value-Gradient Matching (Q-VGM), an off-policy reinforcement learning (RL) method that tackles a long-standing challenge in fine-tuning flow-matching vision-language-action (VLA) policies: efficiently improving an expressive flow-matching action expert with respect to a learned Q-function. Effective improvement must exploit the first-order (gradient) information of the critic, but this is difficult for flow policies, because directly back-propagating the value through their multi-step denoising process is numerically unstable at VLA scale, while the tractable action likelihoods required by policy-gradient methods are unavailable under iterative denoising. Existing value-based methods either backpropagate through the full denoising chain, use the critic only at test time without updating the policy, or distill critic-improved actions as terminal labels without supervising the velocity field. Q-VGM sidesteps these issues by leveraging VGG-Flow, a value-gradient view of flow alignment in generative modeling that transforms value gradient into a denoising-time value-gradient field rather than an unstable end-to-end objective. This requires no action likelihoods and no backpropagation through the denoising chain, and operates on a fixed replay buffer. The critic is an action-sensitive Cal-QL ensemble over compact RLT features with per-layer action injection. Q-VGM enables a practical few-shot initialization then learn-from-experience paradigm: starting from a few-shot-SFT pi0.5 VLA, the method leverages self-generated rollout data to substantially improve task performance without additional expert supervision. On LIBERO, Q-VGM raises the average success rate from 75.0% to 92.5%; on RoboTwin 2.0, from 76.4% to 87.2%; and on two real-robot tabletop tasks, from 40.0% to 67.5%, outperforming all same-backbone, same-critic baselines across all three settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08015v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ziqian Wang, Jiayu Sun, Xingjian Mao, Minqian Wang, Yao Mu</dc:creator>
    </item>
    <item>
      <title>IEA: Amateur-Friendly Conversational Image Editing Agent via Three Stages of Multitask Alignment</title>
      <link>https://arxiv.org/abs/2606.08016</link>
      <description>arXiv:2606.08016v1 Announce Type: new 
Abstract: Current image editing software often hinges on fixed filters or expert tuning, leaving a gap between amateur users' intent and outcomes. Creations by generative models may contain artifacts, implausible details, or stylistic drift away from photorealism and offer little insight into why an edit was made. We propose IEA, a conversational Image Editing Agent that learns to operate parameterized tools in an explicit, interpretable action space. IEA is trained via a three-stage multitask pipeline: (1) SFT on distilled expert edits, (2) GRPO with rewards for likeness improvement, tool usefulness, and intent summarization, and (3) large-scale synthetic fine-tuning to jointly master image editing, refinement, and user intent summarization. By manipulating 16 editing tools step by step, IEA produces transparent edit traces that can be inspected and debugged. In quantitative experiments, it attains a lower pixel distance on the edit task and a higher ROUGE-L on the summary task than strong baselines. In user studies, it ranks best among tool-calling methods for instruction following while surpassing generative methods in overall perceptual quality. Our results validate interpretable, tool-centric VLMs as a reliable path to human instruction-guided image retouching.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08016v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zichen Zhu, Yuheng Sun, Mingxuan Zhu, Wenjie Ma, Situo Zhang, Zhexiang Wang, Ziyue Yang, Danyang Zhang, Kunyao Lan, Zihan Zhao, Dingye Liu, Siqi Xiang, Lu Chen, Kai Yu</dc:creator>
    </item>
    <item>
      <title>Fluid Antenna System-Enabled Mitigation of Asynchronous Reception in Cell-Free Massive MIMO Systems</title>
      <link>https://arxiv.org/abs/2606.08017</link>
      <description>arXiv:2606.08017v1 Announce Type: new 
Abstract: Practical distributed deployments inherently suffer from asynchronous signal arrivals, which exacerbate multi-user interference and degrade system performance, especially for coherent transmission. To natively mitigate the asynchronous reception effect, this paper proposes integrating fluid antenna systems (FASs) into distributed cell-free massive MIMO systems, exploiting their reconfigurable spatial positions to release additional spatial degrees of freedom (DoFs). We establish the FAS-enabled data transmission model with asynchronous reception, i.e., delay phases. We also derive the analytical downlink spectral efficiency (SE) performance of the proposed system under coherent and non-coherent transmissions, using low-complexity Maximum Ratio (MR) precoding to provide fundamental theoretical bounds. Specifically, we propose a novel nonmonotone accelerated projected gradient ascent algorithm to jointly optimize FAS positions and power control coefficients, maximizing the downlink sum SE. Numerical results demonstrate that while asynchronous reception severely degrades system performance for coherent transmission, the spatial DoFs unlocked by optimized FAS positions, along with efficient power control, can significantly counteract the effects of unknown delay phases and outperform traditional fixed-position antennas. For non-coherent transmission, which inherently bypasses asynchronous reception, the application of FAS leverages spatial reconfigurability to natively maximize signal strength and achieve more pronounced SE gains. Ultimately, our proposed FAS-enabled system, coupled with efficient power control, mitigates performance degradation due to asynchronous reception and outperforms traditional fixed-position antennas, paving the way for the practical deployment of FASs in robust, highly efficient 6G cell-free massive MIMO systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08017v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jun Qian, Zan Li, Junhui Rao, Ross Murch, Khaled B. Letaief</dc:creator>
    </item>
    <item>
      <title>UniQL: Towards Dialect-Universal Benchmarking for Text-to-SQL</title>
      <link>https://arxiv.org/abs/2606.08018</link>
      <description>arXiv:2606.08018v1 Announce Type: new 
Abstract: Existing text-to-SQL benchmarks are largely centered on SQLite, making it difficult to evaluate whether models can generalize across heterogeneous SQL dialects. However, real-world database systems differ substantially in syntax, functions, type systems, and execution semantics, so the same natural language intent often requires dialect-specific SQL realizations. We introduce UniQL, a human-verified benchmark for cross-dialect text-to-SQL evaluation. UniQL aligns 1,534 natural language questions with executable SQL annotations across 16 SQL dialects, yielding 24,544 dialect-specific queries. All dialects share the same intents, aligned schemas and database contents, enabling controlled evaluation of dialect generalization. UniQL is constructed through a hybrid pipeline combining database migration, SQL translation, execution-guided verification, iterative rule summarization, and human validation. Experiments on both open-source and closed-source LLMs show that current models remain far from dialect-universal, with substantial performance variation across database systems and limited transfer from SQLite success to other dialects. These findings highlight the need for aligned cross-dialect benchmarks and more dialect-aware text-to-SQL methods. Code and data are available at https://github.com/JerryGao818/UniQL</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08018v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jianling Gao, Chongyang Tao, Jiayuan Bai, Liu Yang, Xuanguang Pan, Jinrui Liu, Shihao Xing, Xiaohan Xu, Jie Liang, Shuai Ma</dc:creator>
    </item>
    <item>
      <title>Semantic Quorum Assurance: Collective Certification for Non-Deterministic AI Infrastructure</title>
      <link>https://arxiv.org/abs/2606.08021</link>
      <description>arXiv:2606.08021v1 Announce Type: new 
Abstract: As large language model (LLM) agents are integrated into autonomous cloud operations, distributed systems face a semantic reliability problem: proposer agents can generate production mutations, such as modifying IAM policies, opening firewall security groups, or executing data exports, that are syntactically valid and statically authorized but operationally unsafe. Classical distributed consensus protocols replicate deterministic state transitions but do not evaluate the safety of the proposed intent. To address this gap, we introduce Semantic Quorum Assurance (SQA), a control-plane primitive for governing non-deterministic agentic infrastructure. SQA represents proposals as declarative execution contracts bound to cryptographic evidence chains and routes them to a diverse panel of read-only, sandboxed validator agents. SQA aggregates their judgments under a risk-adaptive quorum predicate that enforces model and archetype diversity, adjusts weights based on calibrated assurance scores, and respects archetype-specific vetoes. Admitted proposals execute only through a sovereign execution gate. We instantiate SQA in a cloud-native control plane and formalize a correlated cognitive failure model for non-deterministic validators. On 500 infrastructure-inspired mutation scenarios, with safety results reported on held-out safe/unsafe trials excluding ambiguous scenarios, SQA reduces unsafe approval from 18.5% for single-agent validation to 0.3% while adding median validation latency of 1.45--4.12 seconds across the studied risk buckets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08021v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jun He, Deying Yu</dc:creator>
    </item>
    <item>
      <title>Arabic Sentence Segmentation Across Genres and Punctuation Conditions</title>
      <link>https://arxiv.org/abs/2606.08025</link>
      <description>arXiv:2606.08025v1 Announce Type: new 
Abstract: Sentence segmentation in Arabic is challenging due to ambiguous and inconsistent punctuation, with many texts lacking reliable sentence boundary markers. Existing approaches rely heavily on punctuation cues and are typically evaluated on well-formed text, limiting their robustness in realistic Arabic settings. To address this, we introduce AraSEG, a genre-diverse sentence segmentation corpus spanning eight genres and a wide range of punctuation and document structure conditions. Using AraSEG, we evaluate LLMs, lightweight encoder models, and dependency parser-based models under increasingly challenging segmentation settings. Our experiments show that lightweight encoders, and even dependency parser-based models, outperform LLMs in the most challenging settings. We further investigate the effects of training data size and genre diversity, finding that performance eventually saturates and cross-genre generalization remains challenging. We also demonstrate that accurate sentence segmentation substantially improves downstream dependency parsing. We make our code, data, and models publicly available.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08025v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mohammed Elkholy, Khalid N. Elmadani, Nizar Habash, Bashar Alhafni</dc:creator>
    </item>
    <item>
      <title>CausShield: Sample Reconstruction-Resilient Vertical FL via Causal Representation Learning</title>
      <link>https://arxiv.org/abs/2606.08027</link>
      <description>arXiv:2606.08027v1 Announce Type: new 
Abstract: Vertical federated learning (VFL) is a distributed learning paradigm that leverages vertically partitioned features across isolated parties without sharing raw samples; however, it remains vulnerable to active sample reconstruction attacks. Existing defenses fail to achieve a satisfactory trade-off between model utility and privacy protection, due to either suppressing task-relevant information alongside privacy-sensitive features or relying on end-to-end supervised training to converge the defense module, which exposes the model to early-epoch vulnerability. To address this challenge, we adopt a structural causal model (SCM) insight and construct CausShield. From a task-learning standpoint, causal features within a raw sample are those that are directly relevant and contributory to the learning objective, whereas non-causal features are task-irrelevant but often encode sample-specific private information, thereby facilitating reconstruction. Importantly, we lay a theoretical foundation to prove this insight. CausShield thus decomposes the shared representations between the client and the coordinating server in VFL into task-relevant and task-irrelevant components to ensure full-cycle privacy protection. Nonetheless, the decomposition is inherently challenging due to the dual objectives of preserving model utility while mitigating privacy leakage. We address this via a carefully formulated optimization problem, which is solved through unsupervised representation learning. We further theoretically prove that CausShield preserves the convergence behavior of standard VFL. Extensive experiments compare CausShield against seven SOTAs, including InvL (USENIX Security'25), and evaluate robustness against advanced reconstruction attacks such as URVFL (NDSS'25). Results demonstrate that CausShield consistently outperforms them in privacy protection, model utility, and computational efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08027v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yongqi Jiang, Yansong Gao, Siguang Chen, Anmin Fu</dc:creator>
    </item>
    <item>
      <title>Noise-Adaptive High-Probability Regret Bounds for Online Convex Optimization</title>
      <link>https://arxiv.org/abs/2606.08028</link>
      <description>arXiv:2606.08028v1 Announce Type: new 
Abstract: We study high-probability regret bounds for online convex optimization (OCO) with strongly convex losses and establish three results that resolve open questions at the intersection of noise adaptivity, feedback structure, and constraint satisfaction. For the full-information setting with sub-Gaussian stochastic gradients, we prove a noise-adaptive high-probability regret bound in which the martingale deviation term scales with the noise level $\sigma$ rather than the gradient bound $G$, yielding a multiplicative improvement of $G/\sigma$ over the classical Azuma-Hoeffding baseline. Our analysis introduces an exponential supermartingale argument that bypasses the bounded-difference requirement of Freedman's inequality, enabling direct treatment of unbounded sub-Gaussian noise without truncation artifacts. For bandit feedback, we prove a minimax lower bound: the high-probability regret scales linearly in $\log(1/\delta)$, in contrast to the $\sqrt{\log(1/\delta)}$ confidence cost under full information. This constitutes a formal separation in the confidence cost of strongly convex OCO across feedback models. Regarding constrained OCO with stochastic constraints satisfying a Slater condition, we provide simultaneous high-probability guarantees for both cumulative regret and long-run constraint violation, achieving $\mathcal{O}(\sqrt{T\log(m/\delta)})$ regret and $\mathcal{O}(\sqrt{T}/(\zeta\delta) + m\sqrt{T\log(m/\delta)})$ violation. Synthetic experiments corroborate all theoretical predictions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08028v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wentao Zhang, Yutong Zhang, Wentao Mo</dc:creator>
    </item>
    <item>
      <title>IntentNav: Learning Spatial-Visual Object Navigation from Human Demonstrations</title>
      <link>https://arxiv.org/abs/2606.08029</link>
      <description>arXiv:2606.08029v1 Announce Type: new 
Abstract: Object navigation requires a robot to search for an unobserved target in an unknown environment by deciding where to explore next under partial observability. Effective search resembles human-like exploration: selectively probing visually promising frontiers while relying on spatial memory to avoid redundant revisits. We propose IntentNav, a spatial-visual imitation framework that learns human-like ObjectNav policies from human demonstrations. To infer high-level search intent from low-level human actions, we introduce Frontier-based Human-Intent Labeling, which looks ahead in human demonstrations and labels the frontier that best explains the demonstrator's future search direction. We construct a spatial-visual candidate space, where BEV memory tracks explored regions, unexplored frontiers, and trajectory history, while egocentric visual memory provides semantic cues for each candidate. A VLM policy is trained to select among these grounded candidates, using an Intent-Aligned Objective to encourage consistent and human-like exploration. IntentNav achieves state-of-the-art performance on the MP3D, HM3D-v1 and HM3D-v2 ObjectNav benchmarks. The proposed candidate-level navigation interface transfers zero-shot to wheeled, quadruped, and humanoid robots without further VLM fine-tuning. Project page: https://anonymous.4open.science/w/IntentNav/</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08029v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuxin Cai, Zongtai Li, Maonan Wang, Muyi Bao, Haokun Zhu, Ruofei Bai, Ding Zhao, Zirui Li, Wenshan Wang, Wei-Yun Yau, Ji Zhang, Chen Lv</dc:creator>
    </item>
    <item>
      <title>Voting Protocols as Coordination Mechanisms for Role-Constrained Multi-Agent Tutoring Systems</title>
      <link>https://arxiv.org/abs/2606.08030</link>
      <description>arXiv:2606.08030v1 Announce Type: new 
Abstract: Agentic tutoring systems introduce a coordination challenge: multiple agents may propose different but reasonable interventions, yet only one response can be delivered to the learner. In this paper, we study how voting protocols shape cooperation among four role-constrained pedagogical agents responsible for scaffolding, misconception, motivation, and metacognition. We compare four voting protocols -- simple, ranked, cumulative, and approval voting -- across two simulated tutoring environments on SciQ and HumanEval benchmarks. Rather than using voting as a simple aggregation step, we use it to analyze how collective decision rules shape coordination under partial pedagogical conflict. Across 1,200 simulated interactions, we find that agent deliberation and voting protocol type frequently change which response ultimately wins, showing that both meaningfully shape the collective decision. Different voting rules also produce distinct coordination behaviors, and even brief tutoring turns show measurable learning gains in simulated students. Overall, we show that protocol choice is associated with distinct coordination patterns among role-specialized pedagogical agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08030v1</guid>
      <category>cs.MA</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Eric S. Qiu, Joyce Gill</dc:creator>
    </item>
    <item>
      <title>Vision-Language Asymmetry in Bistable Image Captioning</title>
      <link>https://arxiv.org/abs/2606.08031</link>
      <description>arXiv:2606.08031v1 Announce Type: new 
Abstract: Wittgenstein's duck-rabbit poses a question for vision-language models: when a model captions an ambiguous image, where in the model is the commitment to one aspect made? We address this with a 3,320-generation behavioral baseline over 83 bistable stimuli that surfaces three regimes (default-dominant, force-dominant, force-balanced) under neutral vs forced-choice prompting, then probe the underlying representations using a TopK sparse autoencoder we train on the CLIP layer that LLaVA-1.6-7B actually consumes (validation EV 0.93). Across 69 bistable stimuli with both per-aspect feature pools available, 72% (50/69) show simultaneous activation of both pools at the vision tower, including 12/12 default-dominant duck/rabbit and 7/8 force-balanced young/old. Causal steering at CLIP layer 22 flips captions on default-dominant stimuli (33% rabbit-flip rate under a fluency guard) but cannot flip captions on force-balanced young/old at any tested coefficient, despite their vision-side superposition. The dominance bottleneck lives downstream of the vision tower; the gap between vision-side representation and language-side commitment is an empirical handle on the seeing/seeing-as distinction. We also flag a methodological note: rank-based statistics on TopK SAE outputs require tie-corrected ranking to avoid silent row-order bias.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08031v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Arohan Agate</dc:creator>
    </item>
    <item>
      <title>Balancing Real and Synthetic Data for CNN-based Masonry Crack Detection</title>
      <link>https://arxiv.org/abs/2606.08033</link>
      <description>arXiv:2606.08033v1 Announce Type: new 
Abstract: Cracks are a critical indicator of building health, and early-stage identification is fundamental to preventing harmful damage. Advances in deep learning (DL), particularly convolutional neural networks (CNNs), have enabled scalable solutions for automated crack detection. However, CNN performance strongly depends on the availability of large and diverse datasets, which is particularly challenging for complex surfaces such as masonry. Collecting sufficient real data is time-consuming, while publicly available datasets may not be adequate. To address this limitation, we explored generating synthetic crack data, which complements real data and improves training effectiveness. The real dataset consists of masonry crack images collected from buildings in Bologna and surrounding areas. In contrast, the synthetic dataset was generated using a crack overlay tool that adds cracks to background images in a controlled orientation and placement. The real dataset was used to train several DL architectures and identify the best-performing model (InceptionV4), which was then used for the experiments with generated data. Six training scenarios were tested with InceptionV4 by varying the ratio of real and synthetic data, with evaluation performed on a test set composed of real images using the F1-score and mean Intersection over Union (mIoU) metrics. Results show that training on synthetic data plus a modest addition of 20% real data achieves results comparable to training on real data only. Moreover, the 20/80 scenario (synthetic/real) achieved a 76% F1-score and 80% mean IoU, outperforming the real-only case. The method demonstrates the potential of synthetic data to reduce collection efforts while enhancing crack detection accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08033v1</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mattia Forlesi, Alfonso Esposito, Ivan Zyrianoff, Alessandro Marzani, Marco Di Felice</dc:creator>
    </item>
    <item>
      <title>Sci-Rho: A Multilingual Visually-Grounded Symbolic Benchmark for STEM Problems</title>
      <link>https://arxiv.org/abs/2606.08034</link>
      <description>arXiv:2606.08034v1 Announce Type: new 
Abstract: Symbolic benchmarks have emerged as a key approach to assess model robustness under minor modifications to STEM-related questions. However, existing symbolic benchmarks mostly remain limited to mathematical reasoning, lack visual grounding, and are predominantly in English. In this work, we introduce Sci-Rho (Science Rhobustness), a dynamic benchmark for visually-grounded STEM problems spanning five subjects and seven languages, comprising 4,242 problem templates (606 per language) crafted by domain experts, including Olympiad medalists. Each template is implemented as executable Python code that generates diverse but equivalent problem instances by varying numerical values, visual patterns, geometric shapes, color schemes, and function types, resulting in 42,420 instances in total, each paired with reasoning steps and ground-truth solutions. We evaluated 17 state-of-the-art VLMs and discovered a noticeable gap between worst-case accuracy (defined as the proportion of problem templates that a model answers correctly across every generated variation) and average accuracy. We also discovered that smaller models show noticeable performance degradation across languages, whereas proprietary and larger models remain robust. Step-level evaluation reflects this same trend, revealing a significant gap between average F1 and worst-case F1 scores. Finally, our inspection of attention heads of a VLM reveals substantial cross-lingual variation in the relative attention allocated to image tokens compared to text tokens. Our work highlights the importance of evaluation beyond static benchmarks as a metric to measure the quality of VLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08034v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Muhammad Falensi Azmi, Ikhlasul Akmal Hanif, Vallerie Alexandra Putra, Adi Yeltay, Abdullah Mubarak, Fajri Koto</dc:creator>
    </item>
    <item>
      <title>DyCo-RL: Dynamic Cross-Modal Coordination for Visual Reasoning</title>
      <link>https://arxiv.org/abs/2606.08035</link>
      <description>arXiv:2606.08035v1 Announce Type: new 
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a leading paradigm for enhancing visual reasoning in Multimodal Large Language Models (MLLMs). However, existing RLVR methods optimize primarily for the reasoning outcome, fundamentally overlooking the fine-grained cross-modal coordination required during the generation process. Through token-level analyses and controlled interventions, we reveal that during Chain-of-Thought (CoT) reasoning, MLLMs frequently fail to dynamically alternate between extracting visual evidence and synthesizing textual context, a coordination breakdown that is causally linked to reasoning failures. Motivated by these findings, we propose DyCo-RL, which integrates dynamic cross-modal coordination into RLVR optimization. Specifically, DyCo-RL uses the Fisher-Rao geodesic distance to measure within-modality attention shifts, assigning tokens to either visually-oriented or text-oriented functional roles. It then evaluates the alignment between a token's actual attention allocation and its assigned role, leveraging this score for alignment-guided advantage reweighting during policy optimization. Extensive experiments demonstrate that the algorithm-agnostic DyCo-RL, when applied to Qwen2.5-VL-3B/7B, consistently improves four representative RLVR algorithms across seven benchmarks spanning visual-centric and mathematical reasoning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08035v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hangui Lin, Yan Shu, Zhengyang Liang, Chi Liu, Xiangrui Liu, Minghao Qin, Teng Long, Zheng Liu, Nicu Sebe</dc:creator>
    </item>
    <item>
      <title>GIScholarBench: Benchmarking LLM Overconfidence in GIS Research</title>
      <link>https://arxiv.org/abs/2606.08036</link>
      <description>arXiv:2606.08036v1 Announce Type: new 
Abstract: Large language models (LLMs) are increasingly used in academic research workflows, but scholarly tasks require high factual precision and therefore expose a key weakness: overconfidence. Here, overconfidence is defined behaviorally as the tendency to produce confident, assertive, and well-formatted outputs even when the underlying knowledge is incomplete or unverifiable, rather than as a calibration gap between stated confidence and accuracy. To examine this issue, we introduce GIScholarBench, a benchmark built from 10,865 papers published in 25 core GIScience journals between 2020 and 2025. The benchmark covers three tasks with increasing cognitive complexity: metadata retrieval, literature linking, and research direction generation. We evaluate Claude Sonnet 4.5, Gemini 3, and ChatGPT 5.3 through their native web interfaces under real-world user-facing conditions. Results show consistent overconfidence across all tasks. In metadata retrieval, ChatGPT 5.3 achieves the highest accuracy, but all models still generate definitive titles and DOIs when predictions are wrong. In literature linking, Claude Sonnet 4.5 recovers the most references, but all models show a clear gap between top-ranked retrieval and longer citation lists, suggesting that references are extended beyond reliable retrieval capacity. In research direction generation, AI-generated directions show lower topic coverage, higher novel miss rates, and lower semantic diversity than real future-citing papers. These findings suggest that LLM overconfidence is task-invariant but takes different forms: factual overgeneration in retrieval, unreliable citation expansion in literature linking, and overconfidence in output completeness during research ideation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08036v1</guid>
      <category>cs.IR</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zongrng Li, Mingzheng Yang, Lei Zou, Hongxu Ma, Hao Tian, Siqi Zhou, Wenjing Gong, Kaili Zhang, Bingqian Chen, Mitch Zhang, Yifan Yang</dc:creator>
    </item>
    <item>
      <title>SafeECGMatch: Calibration-Aware Joint Frequency and Time Space Semi-Supervised Learning for Open-Set ECG Classification</title>
      <link>https://arxiv.org/abs/2606.08037</link>
      <description>arXiv:2606.08037v1 Announce Type: new 
Abstract: Electrocardiogram (ECG) classification models often suffer from severe label scarcity, making semi-supervised learning (SSL) an attractive strategy for reducing annotation costs. In clinical settings, however, unlabeled pools frequently contain out-of-distribution (OOD) anomalies or diagnostic groups absent from the labeled set. Standard SSL forces incorrect pseudo-labels onto these unseen classes, producing overconfident predictions. To address this, we propose SafeECGMatch, a calibration-aware safe SSL framework for single-label ECG classification under label distribution mismatch. Methodologically, SafeECGMatch employs a dual-branch architecture extracting time-frequency latent representations via ECG-specific augmentations. Crucially, it dynamically aligns confidence with empirical accuracy through adaptive label smoothing and temperature scaling, calibrating both the multiclass classifier and the OOD detector across temporal and spectral domains. This joint optimization allows trustworthy OOD rejection and reliable pseudo-labeling. Evaluated on the PTB-XL and PhysioNet/CinC Challenge benchmarks, SafeECGMatch achieves state-of-the-art accuracy and calibration, advancing reliable knowledge discovery in physiological time-series. Code is available at https://github.com/labhai/SafeECGMatch.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08037v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hongkyu Koh, Ikbeom Jang</dc:creator>
    </item>
    <item>
      <title>Exploring the Scale and Diversity of Speech Anti-spoofing Datasets: Experiments and Analysis</title>
      <link>https://arxiv.org/abs/2606.08038</link>
      <description>arXiv:2606.08038v1 Announce Type: new 
Abstract: The scale of speech anti-spoofing datasets has grown exponentially over the past decade, driven by the assumption that larger data leads to better performance. However, it remains unclear whether indiscriminate scaling commensurately improves model generalization. This study challenges the "scale-first" paradigm by decoupling the impacts of training data scale versus diversity. Through experiments on representative datasets, we report two key findings: (1) Larger is not always better. Expanding data scale excessively under fixed generation methods yields negligible returns and may even degrade cross-domain generalization due to overfitting. (2) Diversity outweighs scale. A smaller composite training set featuring diverse attacks significantly outperforms larger-scale datasets with limited diversity in cross-dataset evaluations. We conclude that future dataset construction should prioritize the diversity of generation methods over scale to effectively enhance model generalization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08038v1</guid>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhuolin Yi, Jun Xue, Yanzhen Ren, Yihuan Huang, Yi Chai, Daixian Li, Guanxiang Feng, Jiajun Liu</dc:creator>
    </item>
    <item>
      <title>MuJoCo-Drones-Gym: A GPU-Accelerated Multi-Drone Simulator for Control and Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2606.08039</link>
      <description>arXiv:2606.08039v1 Announce Type: new 
Abstract: Robotic simulators are a cornerstone of modern research in aerial robotics, serving both as a vehicle for the development of new control algorithms and as the data source for training reinforcement learning (RL) policies. Yet, existing quadcopter learning environments often face a trade-off between physical fidelity, multi-agent support, and the throughput required by modern deep RL pipelines. In this paper, we present MuJoCo-Drones-Gym, an open-source Gymnasium-compatible multi-drone environment built on top of the MuJoCo physics engine. MuJoCo-Drones-Gym supports an arbitrary number of Bitcraze Crazyflie 2.x nano-quadcopters and exposes a modular API for selecting (i) the physics model (rigid-body MuJoCo, explicit Python dynamics, or any subset of ground effect, blade drag, and inter-drone downwash), (ii) the action interface (per-motor RPMs, collective normalized thrust, velocity setpoints, or PID waypoint commands), and (iii) the observation space (kinematic state vectors, RGB / depth / segmentation cameras, or neighbourhood adjacency information). A PettingZoo ParallelEnv wrapper enables drop-in multi-agent reinforcement learning, while a suite of seven task environments, hover, velocity tracking, multi-drone hover, waypoint navigation, formation flight, gate racing, and a generic multi-agent template, demonstrates the breadth of the interface. We describe the environment design, the underlying physics and quadcopter dynamics, and illustrate its use through control and learning examples that mirror those of the closely related gym-pybullet-drones project, while taking advantage of MuJoCo's improved contact handling, rendering, and parallelizability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08039v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Manan Tayal</dc:creator>
    </item>
    <item>
      <title>Wispy to Voluminous: Prior-free Multi-view Capture of Strand-level Facial Hair</title>
      <link>https://arxiv.org/abs/2606.08041</link>
      <description>arXiv:2606.08041v1 Announce Type: new 
Abstract: Facial hair is a defining trait of personal identity, yet remains a critical bottleneck for digital avatars. Recent volumetric methods achieve photorealism but bake hair into the underlying face geometry, preventing editability and failing to resolve sparse, strand-like structures. Meanwhile, scalp-hair reconstruction methods target dense hair volumes and do not transfer to the sparse, spatially-varying nature of facial hair. We present a pipeline that automatically reconstructs facial hair -- beard, mustache, lashes, and brows -- from multi-view images, converting an unstructured 3D Gaussian representation into an explicit curve-based strand representation. We resolve geometric ambiguities in four stages: (i) optimizing 3D Gaussians constrained by tracked head geometry to enforce early ray termination and suppress sub-surface noise; (ii) tracing continuous strands robust to frequent crossings and extreme curvature; (iii) grounding strands to the surface and resolving root-tip ambiguity via a physically-motivated prior; and (iv) refining the reconstruction through opacity-driven density control under photometric optimization. To our knowledge, this is the first method to reconstruct high-fidelity facial hair strands from a 3D Gaussian representation. The recovered strands faithfully preserve the orientation and sparsity patterns characteristic of facial hair, and yield assets immediately suitable for downstream production tasks, including facial animation and physical simulation, geometric grooming and transfer, appearance editing, and physics-based rendering.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08041v1</guid>
      <category>cs.GR</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jaeseong Lee, Giljoo Nam, Adrian Jarabo, Carlos Aliaga</dc:creator>
    </item>
    <item>
      <title>OmniFaceRig: Fully Automatic Inner-Mouth-Aware Face Rigging Across Diverse 3D Character Topologies</title>
      <link>https://arxiv.org/abs/2606.08043</link>
      <description>arXiv:2606.08043v1 Announce Type: new 
Abstract: Facial rigging - creating FACS-based blendshapes together with inner-mouth geometry (teeth, gums, and tongue) - remains a major bottleneck in 3D character production. Existing pipelines still require substantial designer effort, especially for manual landmark annotation, per-character template adjustment, and inner-mouth placement. We present OmniFaceRig, a fully automatic end-to-end pipeline that converts a static surface-only 3D character mesh, with no pre-modeled oral cavity, into an inner-mouth-aware FACS rig with up to 155 blendshapes, procedurally fitted teeth, gums, and tongue, and re-packed UV/texture. OmniFaceRig supports diverse topologies - humans, humanoids, long-muzzled animals (e.g., dogs, wolves, foxes), and short-muzzled animals (e.g., cats, bears, rabbits, tigers) - with no manual landmarks, no user-provided templates, and no per-asset setup. The pipeline combines hybrid VLM+CV riggability checking, multi-model face parsing, dense keypoint-driven template registration, procedural inner-mouth construction, and collision-aware blendshape transfer. For non-human characters, OmniFaceRig selects topology-specific face and inner-mouth templates and uses collision-aware inner-mouth fitting to reduce teeth-face intersections without exposing users to category-specific tuning. We also publicly release Omni-Bench, a freely available benchmark dataset of 1,000 biped 3D characters with FACS facial blendshapes and inner-mouth geometry, spanning humans, humanoids, cats, dogs, and other animals. Experiments show high final rigging success on screened Omni-Bench inputs, nearly complete face detection recall from the segmentation ensemble and reliable inner-mouth placement with low penetration. Together, OmniFaceRig provides an automatic path from static generated characters to animation-ready facial rigs across both human and non-human topologies.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08043v1</guid>
      <category>cs.GR</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chao Wang, Guangyao Ma, John Doublestein, Junming Chen, Yiming Lin, Zhaoen Su, Xiaomin Luo, Shiyang Cheng, Jie Shen, Doug Roble, Dilin Wang, Yilei Li, Rakesh Ranjan</dc:creator>
    </item>
    <item>
      <title>When Behavioral Safety Evaluation Fails: A Representation-Level Perspective</title>
      <link>https://arxiv.org/abs/2606.08044</link>
      <description>arXiv:2606.08044v1 Announce Type: new 
Abstract: Large Language Model (LLM) safety has often been evaluated at the behavior level, which provides limited evidence of internal robustness, as these evaluations target outputs rather than representation-level vulnerability under intervention. We formalize this discrepancy as the audit gap: the difference between behavioral safety and robustness under intervention. To study this gap, we construct dissociated models that preserve safe outward behavior while remaining vulnerable in the latent space. We introduce an intervention-based evaluation framework to test model robustness through soft interventions in parameter and latent spaces, including harmful fine-tuning and layer-wise latent perturbations. To formalize the evaluation, we propose the Latent Vulnerability Score (LVS) to measure how easily harmful behavior can be elicited by bounded latent perturbations. Using this evaluation framework, we show that behavioral safety metrics are insufficient measures of representation-level robustness across multiple safely and unsafely aligned state-of-the-art models. Notably, dissociated models show substantially elevated LVSs despite comparable refusal behavior under harmful intervention, with intermediate representations being the most sensitive to intervention. Our results suggest that behavioral safety evaluation alone provides an incomplete picture of model robustness, motivating representation-aware audits of latent vulnerability and observable behavior.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08044v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Enyi Jiang, Anders Gjølbye, Yibo Jacky Zhang, Sanmi Koyejo</dc:creator>
    </item>
    <item>
      <title>OSMGraphCLIP: Learning Global Location Representations from OpenStreetMap Graphs</title>
      <link>https://arxiv.org/abs/2606.08046</link>
      <description>arXiv:2606.08046v1 Announce Type: new 
Abstract: We present OSMGraphCLIP, a CLIP-style geospatial representation model that learns global location embeddings from freely available OpenStreetMap (OSM) data. OSMGraphCLIP represents geographic environments as heterogeneous graphs of typed OSM features, preserving the topological and semantic relationships among roads, buildings, land-use regions, and points of interest. A multi-scale graph encoder captures both fine-grained local structure and broader landscape composition, and supervises a spherical-harmonics location encoder through a contrastive alignment objective. We evaluate OSMGraphCLIP across a diverse suite of downstream geospatial regression and classification tasks spanning climate, ecology, socioeconomic indicators, public health, land cover, biodiversity, and wildfire forecasting, and show that structured OSM data alone supports strong global location representations across domains. OSMGraphCLIP matches or exceeds satellite-based baselines on the majority of benchmarks, with the most pronounced advantage on socioeconomic and public-health tasks, where OSM's explicit semantic annotation of the built environment encodes patterns of human activity that satellite pixels can only capture indirectly. On ecological and environmental tasks, the model remains closely competitive with imagery-based methods despite using no Earth observation data. Qualitative analysis confirms that the learned embeddings organize geographic space coherently, recovering biome boundaries, urban gradients, and tropical-temperate distinctions from map topology alone.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08046v1</guid>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dimitrios Michail, Eleni Saka, Ioannis Giannopoulos, Ioannis Papoutsis</dc:creator>
    </item>
    <item>
      <title>Diffusion Language Model Parallel Decoding via Product-of-Experts Bridge</title>
      <link>https://arxiv.org/abs/2606.08048</link>
      <description>arXiv:2606.08048v1 Announce Type: new 
Abstract: Diffusion language models (DLMs) offer substantial speed advantages through parallel decoding, but the lack of token dependencies limits generation quality compared to autoregressive (AR) models. Recent progress attempts to bridge the gap via importance sampling, with DLM being the proposal and AR being the target. However, due to the huge gap between their distributions, the sampling requires a large number of particles and is thus expensive to compute. In this paper, we introduce PoE-Bridge, a novel decoding framework that drastically improves generation speed and accuracy by introducing an intermediate distribution to bridge the gap. The distribution is constructed as a Product-of-Experts (PoE) of the DLM proposal and the AR target. With the intermediate distribution, we first use the DLM to draft multiple continuations in parallel, then apply rejection sampling to verify the drafted tokens and move the resulting candidates toward the PoE. We then use importance sampling to further correct the PoE-aligned candidates toward the AR target. We further propose several improved techniques, including mixed-temperature sampling for enhanced diversity and elastic rejection windows for reducing wasted verification. Empirically, PoE-Bridge achieves significantly improved accuracy with a 5x speedup over the standard DLM decoding approach, and recovers at least 95% of the target AR model's performance, efficiently closing most of the quality gap on challenging mathematical reasoning and coding tasks. Our code is available at https://github.com/juntongshi48/poe-bridge.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08048v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Juntong Shi, Brian L. Trippe, Jure Leskovec, Stefano Ermon, Minkai Xu</dc:creator>
    </item>
    <item>
      <title>SKILL.nb: Selective Formalization and Gated Execution for Durable Agent Workflows</title>
      <link>https://arxiv.org/abs/2606.08049</link>
      <description>arXiv:2606.08049v1 Announce Type: new 
Abstract: AI agents increasingly turn past experience into reusable artifacts such as code, workflows, and procedural memories. Reuse can improve efficiency, but it also creates a lifecycle reliability problem: artifacts that succeed once may fail under environment drift, underspecified tasks, or changing task distributions, especially in web automation. We introduce SKILL.nb, a framework for governing reusable agent workflows with evidence-calibrated lifecycle policies. SKILL.nb uses selective formalization: execution evidence decides which workflow steps should become executable code, which should remain natural-language guided, and when those choices should be revised. Workflows are stored as auditable, versioned notebooks that interleave natural-language guidance, multi-language executable cells, validation gates, fallback paths, and multimodal evidence such as outputs, screenshots, and error traces. At runtime, gate-conditioned execution lets each step run code when its gates validate, or fall back locally when drift invalidates the executable realization. On WebArena-Verified, SKILL.nb achieves 53.7% single-round success, improving over the strongest baseline by 3.9 percentage points. Across three re-executions, it retains 91.7% of initially successful tasks, 15.5 points above the next best method. Under bounded repair, it recovers 72.9% of subsequent failures while limiting post-repair regressions to 4.2%, compared with 15.0% to 17.0% for persistent baselines. It also leads on Mind2Web cross-website and cross-domain splits. In a GitLab migration test, SKILL.nb preserves performance when reusing frozen state learned on GitLab 15.7, with frozen-versus-fresh target-version gaps of -1.7 points on GitLab 16.11 and +0.6 points on GitLab 18.9. These results identify lifecycle governance and gate-conditioned execution as reliability axes beyond one-shot task success.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08049v1</guid>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Amine El Hattami, Nicolas Chapados, Christopher Pal</dc:creator>
    </item>
    <item>
      <title>Automatic, Real-time Classification of User Feedback Using Large Language Models</title>
      <link>https://arxiv.org/abs/2606.08050</link>
      <description>arXiv:2606.08050v1 Announce Type: new 
Abstract: In this paper we discuss an ongoing multi-year project that aims to make open text feedback more accessible and useful to UX practitioners by automating classification and providing real time access to comments, themes, and analysis. By significantly lowering the time and knowledge cost of implementing automated solutions, we aim to effectively democratize our data analysis processes, allowing and encouraging non-technical stakeholders to access and leverage data on their own. We share both the organizational and technical constraints we have encountered over the course of this project, and the solutions we have prototyped as a result of those constraints.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08050v1</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jim Maddock, Rose Leitner, Anna Wu</dc:creator>
    </item>
    <item>
      <title>How Small Can You Go? LoRA Fine-Tuning 270M-8B Models for Merchant Information Extraction in Financial Transactions</title>
      <link>https://arxiv.org/abs/2606.08051</link>
      <description>arXiv:2606.08051v1 Announce Type: new 
Abstract: Financial transaction processing requires extracting structured merchant information from noisy, abbreviated bank transaction strings at scale. Our current production system, a LoRA-fine-tuned LLaMA 3.1-8B, achieves 96.95% F1 on this task, but deploying 8-billion-parameter models imposes prohibitive memory, latency, and cost constraints. To identify more efficient alternatives, we conduct a deployment-focused study of 24 model variants spanning four model families: Gemma 3 (270M, 1B, 4B), Qwen 3.5 (0.8B, 2B, 4B), Aya (3.35B), and LLaMA 3.1-8B, systematically evaluating accuracy, inference throughput, training cost, and hardware behavior to assess production suitability. Our findings show that: (1) reproducing the LLaMA 3.1-8B fine-tune with a LoRA rank of 8 achieves 96.75% F1, only 0.20 points below the rank-32 baseline; (2) Qwen 3.5 4B with JSON-only prompting reaches 96.60% F1, within 0.35 points of the 8B baseline while using roughly half the parameters; (3) the 0.8B Qwen 3.5 model achieves 94.75% F1, matching models 2.5-4x larger and offering an attractive latency-accuracy trade-off; (4) chain-of-thought fine-tuning generally improves F1 by 0.3-1.8 points across most models, although Qwen 3.5 4B performs best with direct JSON-only prompting; and (5) Qwen 3.5 Think and Nothink training templates produce nearly identical results (F1 differences &lt;0.004), indicating that explicit reasoning supervision is unnecessary for structured extraction tasks. We further deploy all 14 fine-tuned sub-8B models as Databricks Model Serving endpoints and observe that benchmark performance transfers reliably to production, with an average F1 change of only 0.8 points. Aya 3.35B, based on the Cohere2 architecture, is the sole exception, exhibiting a 3-5 point decline under serving conditions. Based on these results, we provide deployment recommendations across accuracy and latency requirements, ...</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08051v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Donghao Huang, Tomas Drietomsky, Benjamin Barrett, Zhaoxia Wang</dc:creator>
    </item>
    <item>
      <title>What's the Point? Spatial Grammar &amp; Index Resolution for Sign Language Processing</title>
      <link>https://arxiv.org/abs/2606.08056</link>
      <description>arXiv:2606.08056v1 Announce Type: new 
Abstract: Sign language models are predominantly trained with gloss-sequence or text supervision, thereby under-modeling non-lexical and productive constructions. One comparatively tractable instance is spatial indexing: pointing gestures that assign discourse entities to spatial loci for subsequent co-reference, which lexicon-centric objectives largely fail to capture. We present a targeted evaluation of indexing in Sign Language Recognition, showing that despite comprising 10-15% of signing content, indexing is poorly recovered. We introduce a framework for training and evaluating indexing experts, establishing a baseline for index-aware sign language modeling. Our approach decomposes spatial reference resolution into index detection and discourse entity linking. The resulting mention representations enable automatic annotation and non-lexical structure modeling, and serve as an auxiliary indexing expert that augments a frozen SLR model at inference time.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08056v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Oline Ranum, Simon Hadfield, Richard Bowden</dc:creator>
    </item>
    <item>
      <title>EgoAERO: Learning Dexterous Manipulation from a Single Egocentric Video without Object Assets</title>
      <link>https://arxiv.org/abs/2606.08057</link>
      <description>arXiv:2606.08057v1 Announce Type: new 
Abstract: Egocentric RGB-D videos offer a natural source of human dexterous manipulation demonstrations, but existing data is difficult to use for robot learning because object pose, geometry, and contact information are often missing or require pre-scanned object assets. We present EgoAERO, the first framework that learns dexterous manipulation from a single egocentric RGB-D human demonstration without object assets. EgoAERO reconstructs contact-consistent hand-object trajectories through asset-free object tracking and reconstruction, ego motion compensation, and adaptive contact optimization, then converts them into robot policies using two-stage residual learning. We further introduce an online quality assessment mechanism and construct EgoDex-R, a large-scale egocentric dataset with 4.3M RGB-D frames for dexterous policy learning. Simulation and real-world experiments show that EgoAERO enables single-demonstration dexterous manipulation and achieves downstream performance close to CAD-based reconstructions on HOI4D.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08057v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yichen Niu, Haoran Lv, Xinrui Zhang, Xueyao Wan, Shiyu Gao, Ying Ai, Hui Xu, Yongqi Hu, Hengyi Zhang, Yang Xie, Zhaxizhuoma, Yue Zhao, Zhenshan Bing, Yan Ding, Jianxing Liu</dc:creator>
    </item>
    <item>
      <title>Perceptive Behavior Foundation Model: Adapting Human Motion Priors to Robot-Centric Terrain</title>
      <link>https://arxiv.org/abs/2606.08059</link>
      <description>arXiv:2606.08059v1 Announce Type: new 
Abstract: Humanoid behavior foundation models aim to acquire reusable whole-body control policies from broad human motion priors, enabling a single controller to produce diverse and expressive behaviors. However, existing motion-centric foundation policies largely assume that the reference motion is already physically compatible with the robot's surroundings. This assumption breaks when the demonstrator, operator, and robot inhabit different environments: a human motion may specify the intended behavior, but not the footholds, clearance, body height, or contact timing required by the robot's local terrain. We introduce Perceptive Behavior Foundation Model (Perceptive BFM), a terrain-aware humanoid control framework that grounds human motion priors in robot-centric perception. The model preserves raw kinematic motion references as the behavioral interface, while using local terrain observations to adapt contacts, posture, and timing. To provide scalable terrain supervision, we develop terrain-conformal reference synthesis (TCRS), which converts locomotion-oriented human motion clips into terrain-consistent references through contact-aware foothold construction, foot-geometry-aware swing optimization, support-aware root reconstruction, collision repair, and multi-point inverse kinematics. We then train a blind adapted-reference teacher and transfer its terrain-conformal behavior to a deployed raw-reference student through target-frame action alignment. The student is an identity-gated Transformer tracker whose terrain features enter through residual pathways initialized to preserve the motion-tracking prior and trained to produce local corrections only when needed.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08059v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zifan Wang, Yizhao Li, Teli Ma, Qiang Zhang, Yudong Fan, Hao Xu, Shuo Yang, Junwei Liang</dc:creator>
    </item>
    <item>
      <title>TOMOYO Linux: A Mandatory Access Control Method Based on Application Execution State</title>
      <link>https://arxiv.org/abs/2606.08060</link>
      <description>arXiv:2606.08060v1 Announce Type: new 
Abstract: Existing access control methods grant access requests based on combinations of applications as subjects and files as objects. As a result, the intents of applications and the possible effects of granting access requests are not taken into consideration. In this paper, we propose a new access control method based on application history and intents. With our method, system administrators can reduce the risks caused by malicious access attempts and erroneous operations. We explain the concept and implementation design, and give a brief evaluation report of TOMOYO Linux, our implementation of the new access control method for Linux.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08060v1</guid>
      <category>cs.OS</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Toshiharu Harada, Tetsuo Handa, Masaki Hashimoto, Hidehiko Tanaka</dc:creator>
    </item>
    <item>
      <title>Multidimensional Resilience for Electrical Power Systems: Systematic Review, Integrated Index, and Validation under Real-World Cyber-Physical Attack Scenarios</title>
      <link>https://arxiv.org/abs/2606.08062</link>
      <description>arXiv:2606.08062v1 Announce Type: new 
Abstract: The accelerating decarbonization of energy systems has transformed electrical power systems into complex infrastructures exposed to threats whose interactions generate systemic vulnerabilities that conventional resilience approaches fail to capture. Although resilience assessment has expanded across multiple dimensions, existing studies largely examine them in isolation or adjacent pairs, leaving cross-dimensional couplings insufficiently explored. This study demonstrates i) that single-dimension assessments fail to capture the degradation produced by simultaneous cross-dimensional failures, ii) the nonlinear amplification emerging when physical, operational, and digital-cyber dimensions are jointly compromised, and iii) the intensification imposed by climatic and economic-regulatory stressors.
  To this end, we leverage a hybrid quantitative methodology. A PRISMA 2020 review with backward and forward snowballing identifies methodological gaps and unresolved dependencies across five resilience dimensions: physical, operational, digital-cyber, climatic-external, and economic-regulatory. Following this analysis, a Multidimensional Resilience Index (MDRI) is developed to capture endogenous couplings and exogenous amplification effects and is validated under escalating cyber-physical attack scenarios inspired by the December 2025 attack on Polish energy infrastructure. Results show that degradation under cascading and simultaneous failures is nearly eight times greater than under isolated stress, while exogenous conditions amplify degradation by an additional factor approaching six, with 72% of this amplification driven by exogenous stressors. Combined, these mechanisms produce a 46-fold increase in resilience loss compared to a single-vector reference.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08062v1</guid>
      <category>eess.SY</category>
      <category>cs.CR</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Isaac Ortega Romero, Ioannis Zografopoulos</dc:creator>
    </item>
    <item>
      <title>Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?</title>
      <link>https://arxiv.org/abs/2606.08063</link>
      <description>arXiv:2606.08063v1 Announce Type: new 
Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable success in visual understanding, yet their performance degrades significantly under real-world visual corruptions. Existing robustness enhancement approaches are limited: black-box feature alignment lacks interpretability, and white-box text-based reasoning cannot restore lost pixel-level details. This work investigates a fundamental research question: Can MLLMs recover corrupted visual content by themselves? To address this, we propose Robust-U1, a novel framework that equips MLLMs with explicit visual self-recovery capability for robust understanding. The approach comprises three core stages: supervised fine-tuning for initial reconstruction, reinforcement learning with dual rewards (pixel-level SSIM and semantic-level CLIP similarity) for aligning high visual quality, and multimodal reasoning that jointly considers both the corrupted input and the recovered image. Extensive experiments demonstrate that Robust-U1 achieves state-of-the-art robustness on the real-world corruption benchmark and maintains superior performance under adversarial corruptions on general VQA benchmarks. Analysis confirms that high-quality visual recovery directly enhances reasoning performance, establishing self-recovery as a critical mechanism for robust visual understanding. The source code is available at https://github.com/jqtangust/Robust-U1.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08063v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiaqi Tang, Jianmin Chen, Youyang Zhai, Wei Wei, Runtao Liu, Mengjie Zhao, Xiangyu Wu, Qingfa Xiao, Qifeng Chen</dc:creator>
    </item>
    <item>
      <title>Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2606.08064</link>
      <description>arXiv:2606.08064v1 Announce Type: new 
Abstract: Humans exhibit remarkable motor agility, enabling a wide range of dynamic skills such as running and jumping, which highlights the great potential of humanoid robots for athletic locomotion. Among athletic sports, long rope skipping requires two rope turners to cooperatively swing the rope while adapting to a player under different jumping rhythms, making it a meaningful yet challenging task for humanoid robots. Although existing methods for humanoid sports have achieved success in single-agent and interaction-free settings, such as running, dancing, and parkour, task scenarios that require precise coordination among multiple participants remain largely unexplored. To this end, we propose Marope, a multi-agent reinforcement learning (MARL) framework for cooperative long rope skipping with multiple humanoid robots. Specifically, Marope adopts a hierarchical reinforcement learning framework for policy training. At the lower level, it learns decentralized rope manipulation policies through MARL, while at the upper level, a centralized scheduling policy is trained to coordinate the execution of the lower-level policies. To improve generalization across different player behavioral styles, Marope further incorporates diverse jumping policies into cooperative game training. We evaluate our approach on Unitree G1 humanoid robots in both simulation and real-world settings. Experimental results demonstrate that Marope outperforms various baselines, achieving more efficient and stable rope manipulation as well as more robust and adaptable cooperation with varied players.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08064v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zihao Wang, Shijie Peng, Kerui Wu, Yu Huang, Ruiqi Xue, Dong Liu, Tian Xu, Lei Yuan, Yang Yu</dc:creator>
    </item>
    <item>
      <title>Beyond Homophily: Towards Generalized Graph Reconstruction Attack and Defense</title>
      <link>https://arxiv.org/abs/2606.08067</link>
      <description>arXiv:2606.08067v1 Announce Type: new 
Abstract: Graph neural networks (GNNs) are widely deployed on relational data, yet they can leak sensitive or proprietary information about the training graph adjacency, e.g., social ties, transactions, and interactions. This work studies graph reconstruction attacks (GRA), a form of model inversion that reconstructs the training adjacency from a trained GNN, given different levels of attacker-side information. We first provide a systematic characterization of when and why adjacency becomes recoverable through features, labels, embeddings, and predictions, with leakage modulated by graph homophily, heterophily, and the model's inductive bias. Motivated by these findings, we view GNN inference through a Markov chain approximation lens, treating the layered forward computation as a chain of topology-dependent representations. Building on this view, we develop complementary attack and defense methods. On the attack side, we propose MC-GRA (+), which reconstructs the adjacency by optimizing a surrogate adjacency whose GNN-induced representations align with those of the target model at each layer. On the defense side, we propose MC-GPB (+), which suppresses adjacency-dependent information throughout the representation chain while aiming to preserve classification accuracy under a privacy-utility trade-off. Experiments across homophilic/heterophilic graph benchmarks and GNNs show that our attacks improve reconstruction fidelity over prior methods, while our defenses reduce reconstruction success with only minor accuracy loss.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08067v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhanke Zhou, Bo Han, Xuan Li, Jiangchao Yao, Sanmi Koyejo, Michael K. Ng</dc:creator>
    </item>
    <item>
      <title>DICE: Entropy-Regularized Equilibrium Selection for Stable Multi-Agent LLM Coordination</title>
      <link>https://arxiv.org/abs/2606.08068</link>
      <description>arXiv:2606.08068v1 Announce Type: new 
Abstract: Multi-agent large language model (LLM) systems often fail to reliably outperform a single strong model equipped with best-of-N sampling. We argue that a core source of this instability is ill-posed equilibrium selection: current systems specify what information agents share, but not which coordination convention should be selected. We formalize a broad class of such systems as discounted incomplete-information Markov games and show that two common pathologies, oscillation between competing conventions and drift across them, can both induce unstable learning and linear Bayesian regret. To obtain a well-posed target, we introduce the Heterogeneous Quantal Response Equilibrium (HQRE), an entropy-regularized equilibrium concept with agent- and state-dependent temperatures. Under a monotonicity condition, HQRE is unique, admits linearly convergent mirror updates, and yields bounded Bayesian regret; the same condition yields rollout-measurable stability diagnostics. We instantiate this objective in two algorithms: DICE-PC, which coordinates frozen models through prompt-control actions, and DICE-FT, which performs parameter-efficient mirror fine-tuning. Across eleven benchmarks in four domains, DICE improves accuracy-cost trade-offs over strong within-class baselines; on reasoning and planning tasks, DICE-PC improves by 4.3 percentage points on average and DICE-FT by 8.5 points.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08068v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yi Xie, Zhanke Zhou, Chentao Cao, Bo Liu, Bo Han</dc:creator>
    </item>
    <item>
      <title>SurgiQ: A Large-Scale Multi-Domain Benchmark for Evaluating Surgical Understanding in Large Language Models</title>
      <link>https://arxiv.org/abs/2606.08071</link>
      <description>arXiv:2606.08071v1 Announce Type: new 
Abstract: Reliable evaluation of large language models in surgery remains underdeveloped. Broad medical benchmarks test clinical knowledge, while surgery requires procedural reasoning, management trade-offs, negation handling, and selection among plausible operative decisions. We present SurgiQ, a text-only, source-grounded benchmark of 13,055 four-option multiple-choice questions spanning six surgical domains and four question formats: case-based, reasoning, best-option, and negative. SurgiQ is constructed from surgical textbooks, open-access papers, and examination material using a multi-stage generation, verification, and expert-audit pipeline. We evaluate 35 open-weight LLMs under a unified log-likelihood protocol. Our results show substantial remaining headroom: smaller models often remain near the 25% random baseline, while the best model reaches 68.1% accuracy. General-purpose models, especially Qwen2.5, outperform most biomedical models, suggesting that current medical specialization does not yet provide sufficiently broad surgical coverage. Calibration and error analysis further show that even strong models make confident mistakes on clinically plausible distractors, motivating more reliable and broader surgical LLM evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08071v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ayah Al-Naji, Edoardo Fazzari, Saif Alkindi, Hamdan Alhadhrami, Preslav Nakov, Cesare Stefanini</dc:creator>
    </item>
    <item>
      <title>"I understand your perspective": LLM Persuasion and Sycophancy through the Lens of Communicative Action Theory</title>
      <link>https://arxiv.org/abs/2606.08076</link>
      <description>arXiv:2606.08076v1 Announce Type: new 
Abstract: Large Language Models (LLMs) can generate high-quality arguments, yet their ability to engage in nuanced and persuasive communicative actions remains largely unexplored. This work explores the persuasive potential of LLMs through the framework of Jürgen Habermas' Theory of Communicative Action. It examines whether LLMs express illocutionary intent (i.e., pragmatic functions of language such as conveying knowledge, building trust, or signaling similarity) in ways that are comparable to human communication. We simulate online discussions between opinion holders and LLMs using conversations from the persuasive subreddit ChangeMyView. We then compare the likelihood of illocutionary intents in human-written and LLM-generated counter-arguments, specifically those that successfully changed the original poster's view. We find that all three LLMs effectively convey illocutionary intent -- often more so than humans -- potentially increasing their anthropomorphism. Further, LLMs craft sycophantic responses that closely align with the opinion holder's intent, a strategy strongly associated with opinion change. Finally, crowd-sourced workers find LLM-generated counter-arguments more agreeable and consistently prefer them over human-written ones. These findings suggest that LLMs' persuasive power extends beyond merely generating high-quality arguments. On the contrary, training LLMs with human preferences effectively tunes them to mirror human communication patterns, particularly nuanced communicative actions, potentially increasing individuals' susceptibility to their influence.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08076v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <arxiv:DOI>10.18653/v1/2025.findings-acl.793</arxiv:DOI>
      <arxiv:journal_reference>Findings of the Association for Computational Linguistics: ACL 2025</arxiv:journal_reference>
      <dc:creator>Esra Dönmez, Agnieszka Falenska</dc:creator>
    </item>
    <item>
      <title>Support Vector Rubrics: Closing the Gap Between Self-Generated and Human Rubrics</title>
      <link>https://arxiv.org/abs/2606.08077</link>
      <description>arXiv:2606.08077v1 Announce Type: new 
Abstract: Rubric-based evaluation is a promising paradigm for judging large language model (LLM) outputs, yet self-generated rubrics lag human-annotated criteria on hard instances. We argue this discriminative gap reflects an objective mismatch: self-generated rubrics describe good responses, whereas effective criteria must discriminate between close candidates. To close this gap, we introduce SVR (Support Vector Rubrics), a framework that recasts rubric construction as max-margin boundary learning over preference data. SVR mines contrastive features from preference pairs into a rubric bank, learns a prompt-conditioned selector together with global rubric weights, and iteratively refines the bank through support-pair selection and adversarial probing of hard negatives. At inference, given only the prompt, SVR retrieves the top-rubrics from the bank and scores responses. On RubricBench, SVR narrows the gap to human reference rubrics from 24.1 to 0.3 points and outperforms strong self-rubric and judge baselines, and the learned bank transfers across judges without retraining. On RewardBench 1&amp;2, and RM-Bench, it remains competitive with dedicated reward models, demonstrating broader reward modeling capability. Overall, boundary-defining rubrics offer a principled route to closing the discriminative gap in LLM evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08077v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mengyuan Sun, Yu Li, Zhuohao Yu, Shikun Zhang, Wei Ye</dc:creator>
    </item>
    <item>
      <title>On Low-Bit Quantization Errors in Speaker Verification: Diagnostic and Mitigation</title>
      <link>https://arxiv.org/abs/2606.08078</link>
      <description>arXiv:2606.08078v1 Announce Type: new 
Abstract: Although low-bit quantization provides practical means to deploy speaker verification on resource-constrained devices, its effects on speaker verification performance remain poorly understood. In this paper, we study uniform K-means quantization-aware training of ResNet-36 and ResNet-200 through joint layer-wise and score-level analyses. Our layer-wise analysis highlights fragile components and shows that score degradation is not fully explained by weight distortion alone. We identify a clear knee point at 2 bits, with larger score drift and harmful decision flips concentrated near the FP32 threshold. Our score-level analysis reveals where and how score errors emerge under extreme quantization. Building on these findings, we propose a calibrated multi-precision cascade that resolves most trials at 2 bits and escalates only ambiguous cases, achieving performance close to FP32 while preserving the efficiency benefits of low-bit inference with substantially lower compute and memory costs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08078v1</guid>
      <category>cs.SD</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hugo Leguillier, Driss Matrouf, Guillaume Lechien, Mickael Rouvier</dc:creator>
    </item>
    <item>
      <title>Aligned but Not Partner-Specific: Distinguishing How Multimodal LLM Agents Succeed in Reference Games Without Human-Like Conventions</title>
      <link>https://arxiv.org/abs/2606.08081</link>
      <description>arXiv:2606.08081v1 Announce Type: new 
Abstract: Repeated reference games test whether interlocutors replace their initially long descriptions with shorter, partner-specific conventions grounded in shared interaction history. Prior work shows that multimodal LLMs fail to become more efficient across rounds, although they align on the labels they use. How can we determine whether this alignment reflects partner-specific grounding rather than a shared task vocabulary? We address this question by comparing capable multimodal agent dyads with human dyads from the KTH Tangrams corpus. Our novel methodological contribution is a constrained pseudo-dyad baseline that matches the original referential task structure, but breaks partner history. This baseline enables us to test whether the observed label alignment depends on interaction with a specific partner. Across three analytic layers (task competence, description strategy, alignment dynamics), we find clear differences. Humans reduce effort through entrainment, compressing descriptions and increasing label alignment with partners. Agents instead maintain fixed effort levels, producing verbose descriptions from round one, with near-ceiling label overlap that is statistically indistinguishable between real and pseudo dyads. MLLMs thus achieve coordination without convention, succeeding by verbose description rather than by forming the compact, history-dependent referring expressions characteristic of human dialogue.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08081v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Po-Ya Angela Wang, Chinmaya Mishra, Aslı Özyürek, Paula Rubio-Fernández, Esam Ghaleb</dc:creator>
    </item>
    <item>
      <title>When Can Phasor-Domain Device Models Be Trusted for Electromechanical Stability Analysis of Grid-Forming Converter-Dominated Microgrids?</title>
      <link>https://arxiv.org/abs/2606.08082</link>
      <description>arXiv:2606.08082v1 Announce Type: new 
Abstract: Grid-forming (GFM) converter-dominated microgrids are often analyzed using reduced-order phasor-domain electromechanical GFM models, but the validity of these models is often taken for granted. Assuming ideal inner-loop tracking (IILT) of terminal-voltage references, these models neglect the inner-loop and filter dynamics at the electromagnetic-transient (EMT) timescale to simplify stability analysis. This paper argues that such neglected dynamics can destabilize the system, invalidating the stability conclusions drawn from the IILT model. To address this cross-timescale stability issue, we formulate the validity of the IILT stability conclusion as a robust-stability certification problem. The EMT-induced model mismatch between the reduced-order converter model and the actual converter model is represented as a structured uncertainty embedded around the IILT feedback loop. This yields a frequency-resolved interaction index and a structured singular-value sufficient certificate for determining when the stability conclusion of the IILT model can be certified with respect to a prescribed EMT uncertainty weight. The uncertainty weight can be obtained from detailed EMT models or terminal reference-response measurements. Case studies confirm that the proposed certificate correctly certifies model validity and identifies the loss of trustworthiness. We also demonstrate that the measurement-based uncertainty weights closely match the model-based ones, which enables deployment without accessing inner-loop models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08082v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhongze Li, Xiaoyu Peng, Xi Ru, Zhaojian Wang, Jianxin Zhang, Yingshang Liu, Feng Liu</dc:creator>
    </item>
    <item>
      <title>Positive Instantial Neighbourhood logic</title>
      <link>https://arxiv.org/abs/2606.08083</link>
      <description>arXiv:2606.08083v1 Announce Type: new 
Abstract: Instantial neighbourhood logic is a modal language for neighbourhood frames in which formulas can express information about the kinds of worlds occurring inside a neighbourhood of a given world. In this paper, we study a positive, negation-free version of instantial neighbourhood logic with two primitive instantial modalities, one of box-type and one of diamond-type. Since classical negation is not available, the two modalities are treated independently. We introduce the language and proof system of positive instantial neighbourhood logic (PINL) and interpret it over persistent two-sided neighbourhood models. We then define a typed persistent neighbourhood semantics, used as an auxiliary canonical semantics to control witness and co-witness conditions. This yields a truth lemma and the corresponding completeness result of PINL. On the algebraic side, we introduce $2$-$\mathrm{DLIO}$s, bounded distributive lattices equipped with two families of instantial operations, as the algebraic semantics of PINL. We prove algebraic soundness and completeness via the Lindenbaum $2$-$\mathrm{DLIO}$. Finally, we construct the canonical bitopological PINL-space and show that the algebra of its admissible positive opens is isomorphic to the Lindenbaum $2$-$\mathrm{DLIO}$. Thus the paper gives a canonical admissible-open representation of positive instantial neighbourhood logic, providing a first step toward a future duality theory.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08083v1</guid>
      <category>cs.LO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Litan Kumar Das, Anupam Khanra, Sujit Kumar Sardar</dc:creator>
    </item>
    <item>
      <title>Assessing the Energy and Carbon Emissions of Neural Speaker Verification Model in Training and Inference</title>
      <link>https://arxiv.org/abs/2606.08087</link>
      <description>arXiv:2606.08087v1 Announce Type: new 
Abstract: Deep-learning speaker verification (SV) increasingly relies on deep neural network backbones, whose environmental impact remains largely undocumented. In this paper, we conduct an evaluation of ResNet architectures trained on VoxCeleb2, varying depth, channel width, and stage distribution, and measure energy consumption and carbon footprint using node-level sensors. Results show a clear point of diminishing returns: deeper or wider models bring only marginal accuracy gains while energy consumption grows steeply. In contrast, mid-sized networks such as ResNet-50 and stage-concentrated variants achieve favorable trade-offs between performance and environmental impact. These findings provide actionable guidelines for designing energy-efficient SV systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08087v1</guid>
      <category>cs.SD</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hugo Leguillier, Driss Matrouf, Guillaume Lechien, Mickael Rouvier</dc:creator>
    </item>
    <item>
      <title>ConSteer-RL: Steering Reasoning Capabilities in Large Language Models via Confidence-Aware Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2606.08088</link>
      <description>arXiv:2606.08088v1 Announce Type: new 
Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has recently become a key paradigm for improving the reasoning abilities of Large Language Models (LLMs), yet it remains limited by sparse binary rewards and its ignorance of model-internal uncertainty. In this paper, we propose ConSteer-RL, a simple yet effective framework that integrates token-level confidence signals derived from model log-probabilities into RLVR training. Specifically, building upon the Group Relative Policy Optimization (GRPO) framework, we construct a confidence-aware reward by aggregating per-token probabilities into a scalar confidence score and incorporating it into an awareness-based reward shaping mechanism that penalizes overconfident errors while reinforcing correct and confident reasoning. Experimental results demonstrate that ConSteer-RL consistently outperforms strong GRPO baselines, achieving average improvements of 2.3%-4.0% across different model scales.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08088v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Qing Miao, Yiming Zhao, Jing Yang, Chenxi Liu, Yuehai Chen, Yuewen Liu, Shaoyi Du, Badong Chen</dc:creator>
    </item>
    <item>
      <title>Fast LLM-Based Semantic Filtering: From a Unified Framework to an Adaptive Two-Phase Method</title>
      <link>https://arxiv.org/abs/2606.08090</link>
      <description>arXiv:2606.08090v1 Announce Type: new 
Abstract: Evaluating a natural-language yes/no predicate over a document corpus under an accuracy target - the semantic filter - is a cornerstone of LLM-based data processing. Calling the LLM on every document (the oracle) is prohibitive, so cascades pair the oracle with a fast proxy. As deployed today, they leave four limitations on the table. (1) Each cascade family - model-free clustering, prebuilt small-LLM proxies, online-trained proxies - commits to a single representation and pipeline, and wins on only a narrow query regime. (2) The strongest online proxy invests in a custom training scheme on a bi-encoder over dense embeddings, missing the token-level evidence richer predicates require. (3) The proxy is trained against binary yes/no labels, wasting the LLM's per-document confidence at the boundary documents it most needs to learn. (4) Existing calibrations add a uniform safety margin, conflating genuine proxy uncertainty with small-sample noise and inflating cascade cost.
  We address these by (1) composing families adaptively - model-free clustering first, online proxy only when needed, with oracle calls shared across phases; (2) replacing the cosine bi-encoder with a hybrid of off-the-shelf token-aware models; (3) training the proxy with the oracle's per-document confidence as a soft label; and (4) a calibration that adds the safety margin only where the labeled sample is sparse. We are also the first to use the oracle's per-document confidence for three purposes: a query-level difficulty compass, a lower bound on the minimum oracle calls any proxy-based cascade can make, and the proxy's soft training label.
  At a 90% accuracy target on three 10K-document corpora, our methods are 1.6-2.0x faster than the best prior method per corpus and meet the target on 95% of queries; the BER-derived lower bound indicates a further ~4-20x of headroom for future work.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08090v1</guid>
      <category>cs.DB</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kyoungmin Kim, Martin Catheland, Anastasia Ailamaki</dc:creator>
    </item>
    <item>
      <title>VideoWeaver: Evaluating and Evolving Skills for Agentic Long Video Generation</title>
      <link>https://arxiv.org/abs/2606.08091</link>
      <description>arXiv:2606.08091v1 Announce Type: new 
Abstract: Recent agent frameworks such as Claude Code, Codex, and OpenClaw are strong at tool use and orchestration, but whether they can handle long video generation, a long-horizon multimodal task, remains underexplored. Unlike earlier video agents whose pipeline is handcrafted, these frameworks can build and refine their own workflows. We introduce VideoWeaver, an agent harness and benchmark that evaluates and evolves skills for long video generation, where an agent turns a single instruction into a long video by composing foundation skills into its own workflow rather than following a predefined pipeline. The benchmark has 16 task categories and 285 cases, with references spanning text, image, audio, video, and their combinations. Because errors can arise at any stage and not just in the final video, we propose an agent-as-judge that inspects both the execution trace and the final video, grounding its scores in evidence such as metadata and intermediate files. Using this feedback, we further design a skill evolution algorithm that refines and merges the agent's skills. Across multiple frameworks and models, we find that an explicit composition skill improves the generation process over using foundation skills alone, that skill evolution further improves output quality, and that performance varies notably across harness and model choices. The proposed agent-as-judge also aligns well with human judgments, especially on process metrics. Code and dataset are available at https://github.com/JianhuiWei7/VideoWeaver</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08091v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jianhui Wei, Jie Tan, Hengchuan Zhu, Xiaotian Zhang, Yan Zhang, Ziyi Chen, Daoan Zhang, Wei Xu, Zuozhu Liu</dc:creator>
    </item>
    <item>
      <title>When Languages Disagree: Self-Evolving Multilingual LLM Judges</title>
      <link>https://arxiv.org/abs/2606.08092</link>
      <description>arXiv:2606.08092v1 Announce Type: new 
Abstract: Multilingual LLM-as-a-judge is widely used to evaluate model outputs across languages, but suffers from cross-lingual inconsistency (Fu and Liu, 2025). Existing methods typically treat this inconsistency as noise and mitigate it through voting or aggregation. In this work, we instead show that multilingual inconsistency can provide complementary evaluation signals. Our oracle analysis finds that sampling judgments across languages yields a higher performance upper bound than single-language judging, indicating that different languages potentially include complementary judgments. Motivated by this finding, we propose SEMJ, a self-evolving multilingual judge that leverages cross-lingual inconsistency for iterative refinement. SEMJ constructs multilingual variants of each input, collects independent judgments and rationales, and feeds inconsistent outputs back for self-reflection and re-evaluation. Experiments on multiple benchmarks show that SEMJ consistently outperforms voting and reflection baselines in both accuracy and cross-lingual consistency. Further analysis shows that inconsistency triggers useful re-evaluation, which improves judgment quality.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08092v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiyan Fu, Wei Lu</dc:creator>
    </item>
    <item>
      <title>A Multi-modal Agentic Co-pilot for Evidence Grounded Computational Pathology</title>
      <link>https://arxiv.org/abs/2606.08093</link>
      <description>arXiv:2606.08093v1 Announce Type: new 
Abstract: Pathology is the cornerstone of modern medicine, where accurate decision-making relies heavily on evidence-based practices. While artificial intelligence (AI) has the potential to transform clinical workflows, the intersection of AI and evidence-based medicine remains under-explored, with primitive attempts restricted to text-only general medicine. In this work, we present PathPocket, a multimodal AI agentic co-pilot designed specifically for evidence grounded pathology. We construct the most comprehensive pathology evidence corpus to date, encompassing approximately 110,472 public and authorized documents structured across a rigorous hierarchy of evidence from clinical guideline to expert opinion. From this meticulously graded foundation, we build a large-scale multimodal pathology hypergraph containing over 4.55 million entities and 7.10 million relations. Serving as a robust knowledge engine, this hypergraph provides traceable evidence for a collaborative multi-agent reasoning framework integrating input understanding, evidence retrieval, filtering, and diagnosis generation. This enables PathPocket to seamlessly resolve a wide spectrum of clinical tasks, ranging from text-only queries to complex multimodal diagnostics involving region-of-interest (ROI) and gigapixel whole-slide images (WSIs). We rigorously evaluate the system on a multidimensional benchmark of over 200,000 real-world cases, where it significantly outperforms existing state-of-the-art methods. Crucially, extensive user studies demonstrate that PathPocket substantially improves the diagnostic accuracy and confidence of pathologists. By directly grounding pathology interpretations in verifiable literature, PathPocket offers a practical and scalable solution for the future of evidence grounded computational pathology.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08093v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Zhe Xu, Zhengyu Zhang, Zhiyuan Cai, Jiahao Xu, Yijie Lin, Ziyi Liu, Junlin Hou, Hongyi Wang, Yuxiang Nie, Ling Liang, Yihui Wang, Yingxue Xu, Ronald Cheong Kin Chan, Li Liang, Hao Chen</dc:creator>
    </item>
    <item>
      <title>vla.cpp: A Unified Inference Runtime for Vision-Language-Action Models</title>
      <link>https://arxiv.org/abs/2606.08094</link>
      <description>arXiv:2606.08094v1 Announce Type: new 
Abstract: Vision-Language-Action (VLA) policies are typically shipped as Python/PyTorch stacks that assume a workstation-class GPU, a mismatch for the hardware on which robots actually run. We present vla.cpp, a portable C++ inference runtime built on llama.cpp. To our knowledge, it is the first ggml-class engine to natively serve the flow-matching and diffusion VLA inference pattern, in which a cached vision-language prefix is consumed by a cross-attending action expert integrated over several solver steps. A single runtime serves seven architectures spanning five backbone and four action-head families behind one request/response protocol, with each model packaged as a self-contained bundle. On LIBERO-Object, the engine matches a state-of-the-art checkpoint to within one episode out of 200, and runs BitVLA at 100% success in 1.3 GiB of memory. The same bundle runs unchanged across three hardware tiers, from a consumer GPU down to an 8 GB embedded module. A cross-hardware roofline analysis shows that batch-1 VLA inference is compute-bound, so utilization rather than bandwidth is the deployment lever; an IMMA ladder GEMM derived from this analysis cuts BitVLA per-step latency by 4.5x. We then frame an on-robot stress test on an ALOHA arm that isolates the latency constraint under which a learned VLA must replan against a moving target on the hardware it was trained for. Code, demo videos, and the reproducible benchmark scaffold are available at https://fai-modelopt-tech.github.io/vla-cpp.github.io/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08094v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Khanh D. Nguyen, Hung T. Ho, Chinh T. Nguyen, Thanh Q. Duong, Linh D. Le, Duy M. H. Nguyen, Vien A. Ngo, An T. Le</dc:creator>
    </item>
    <item>
      <title>Strain localization in softening plasticity without modifying standard constitutive models: a deformable Cosserat approach</title>
      <link>https://arxiv.org/abs/2606.08095</link>
      <description>arXiv:2606.08095v1 Announce Type: new 
Abstract: This paper presents a formulation for strain localization in softening plasticity based on a deformable Cosserat model. The approach enables the direct use of standard elastoplastic constitutive models formulated for a classical Cauchy continuum, without modifying the stress update algorithm or consistent tangent operator. A key feature of the framework is the strict separation of dissipative and energetic mechanisms: all dissipation is confined to the macro-continuum, while the micro-continuum contributes only through linear elastic terms associated with the director field. As a result, the constitutive structure of the elastoplastic model is preserved, and existing models can be employed as black-box components. The internal length scale arises naturally from the micro-continuum and governs the development, interaction and selection of localization patterns, rather than acting as a diffusive parameter. The formulation is easy to implement within standard finite element frameworks, requiring only additional linear contributions to the residual and tangent operators. The performance of the approach is assessed through benchmark problems involving shallow foundations on soil, a demanding test due to complex and unstable localization mechanisms. Both Tresca and Matsuoka-Nakai plasticity models are considered, including cases with highly unstable post-peak responses. Numerical results show convergence of load-displacement responses, dissipated energy and shear-band patterns upon mesh refinement, even in the presence of nonlinear interacting localization processes. These findings demonstrate a robust and physically consistent approach for the analysis of strain localization in softening plasticity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08095v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Andrea Panteghini, M. B. Rubin</dc:creator>
    </item>
    <item>
      <title>Identifying unique developers in OSS projects: A family of models</title>
      <link>https://arxiv.org/abs/2606.08096</link>
      <description>arXiv:2606.08096v1 Announce Type: new 
Abstract: Organizational and logical coupling metrics require reliable identification of unique developers. In OSS, commit metadata is limited to names and emails, and the same developer may appear under multiple aliases, which can distort coupling measurements if de-duplication is missing. We aim to build a scalable and accurate pipeline for OSS developer de-duplication and to provide guidance on choosing a model based on precision vs. computational effort. We use Indel similarity as a baseline, then run an LLM-assisted matching process with manual validation to create a large dataset of duplicate identities. Using this dataset, we train and compare classical ML models of different complexity, evaluating precision along with training and inference time and energy. We expect a high-quality dataset and a benchmark of approaches that clarifies which solutions offer the best trade-off between accuracy and cost for large-scale OSS mining.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08096v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ruoyu Su, Alexander Bakhtin, Matteo Esposito, Davide Taibi, Valentina Lenarduzzi</dc:creator>
    </item>
    <item>
      <title>When Does Delegation Beat Majority? A Delegation-Based Aggregator for Multi-Sample LLM Inference</title>
      <link>https://arxiv.org/abs/2606.08098</link>
      <description>arXiv:2606.08098v1 Announce Type: new 
Abstract: Majority voting over sampled answers is the dominant unsupervised aggregator for multi-sample LLM inference. We show that piping the signals every sample carries into a delegation-based aggregator (Propagational Proxy Voting, PPV) yields an unsupervised consensus rule that beats majority on MMLU-Pro by +1.5 pp overall and +2.24 pp on the non-trivial subset (paired McNemar p ~ 1.0e-14, n = 8,099). Majority discards two free signals every sample carries: within-group letter entropy and between-group reasoning geometry. PPV exposes two per-voter levers that consume exactly these signals: WHEN (how much weight a voter keeps on its own pick) and WHOM (how it splits the remainder across peers). We drive WHEN with letter entropy and WHOM with per-question-centered embedding cosine. The method needs no gold labels and no auxiliary training: per question, we partition 128 sampled generations into 16 groups, compute each group's letter-level semantic entropy and reasoning embedding centroid, and feed both into a stochastic delegation matrix whose stationary distribution selects the consensus answer. We walk through an example in which PPV overturns a clear 10-6 majority for the wrong letter: the 10-voter majority cluster is geometrically incoherent (mean within-cluster cosine -0.02) while the 6-voter minority is tight (+0.26), so propagated delegation mass concentrates on the minority's answer even though entropy alone would keep the majority ahead. We further report delegation strategies with negative results that constrain the design space for unsupervised LLM aggregation: no within-question ensemble of confidence modes closes the oracle gap.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08098v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yasushi Sakai, Allen Song, Kent Larson</dc:creator>
    </item>
    <item>
      <title>Cybernetic Android Avatar "Yui": System Integration, Field Deployment, and Evaluation</title>
      <link>https://arxiv.org/abs/2606.08099</link>
      <description>arXiv:2606.08099v1 Announce Type: new 
Abstract: Remote communication technologies have become widely used; however, supporting a sense of shared physical space and conveying rich non-verbal cues remain challenging in many social interaction scenarios. This study presents "Yui," a full-body cybernetic android avatar designed to integrate operator-side immersive teleoperation with interlocutor-side human-like social signaling. Yui combines a 55-degree-of-freedom full-body mechanism with a previously developed android head, facial expression and gaze control, upper-body and arm motion, hand actuation, and a mobile platform. It can be operated through either an immersive mode using a head-mounted-display-based interface or a desktop mode using a webcam-based interface. We evaluated the system through three real-world deployments: a long-term public exhibition at Expo 2025 in Osaka, Kansai, Japan; a remote educational exchange between elementary school students; and a public interaction study with general participants. During the Expo deployment, two units accumulated approximately 1131 h of operation, demonstrating both operational feasibility and maintenance challenges. In the public study, both operators and interlocutors reported positive impressions of co-presence and willingness to use the system. Interlocutors also rated the avatar positively in terms of human likeness and the transmission of emotions and intentions. The results indicate usability for general operators while suggesting room for improvement in precise controllability. These findings provide field-derived evidence and design implications for socially deployable full-body android avatars.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08099v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kaoruko Shinkawa, Mizuki Nakajima, Taisei Mogi, Yoshihiro Nakata</dc:creator>
    </item>
    <item>
      <title>Constraint-Aware Optimization for Robust Protein Stability Prediction</title>
      <link>https://arxiv.org/abs/2606.08100</link>
      <description>arXiv:2606.08100v1 Announce Type: new 
Abstract: Multimodal $\Delta\Delta G$ predictors integrating protein language models with inverse-folding representations achieve strong in-distribution accuracy on the Megascale dataset but exhibit limited robustness on out-of-distribution (OOD) proteins, persistent forward-reverse bias on paired-mutation benchmarks, and under-representation of rare stabilizing mutations. Existing approaches address these limitations primarily through additional architectural components, leaving optimization-level intervention comparatively underexplored. We introduce a constraint-aware optimization framework combining Balanced Mean Squared Error, a Siamese anti-symmetric regularizer, and a novel OOD-margin consistency loss on the per-position feature representation, requiring no architectural changes to the SPURS backbone. Across eleven benchmarks and three random seeds, the framework improves Spearman correlation on S669 from 0.486 to 0.540 ($\sigma=0.002$ across seeds), matching the published SPURS baseline (0.50) without architectural modification, and on S461 from 0.653 to 0.711, with consistent smaller gains on five additional OOD datasets. A controlled diagnostic on Ssym reveals that anti-symmetric training does not eliminate systematic forward-reverse bias, indicating that gains arise through implicit regularization rather than exact thermodynamic constraint enforcement.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08100v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>A Shivram, Aneesh S. Chivukula, Manik Gupta, Sourav Chowdhury</dc:creator>
    </item>
    <item>
      <title>Continual Quadruped Robots Coordination via Semantic Skill Discovery</title>
      <link>https://arxiv.org/abs/2606.08102</link>
      <description>arXiv:2606.08102v1 Announce Type: new 
Abstract: Multi-quadruped coordination has attracted increasing attention due to its enhanced payload capacity, broader contact coverage, and improved adaptability to challenging tasks. Existing methods for multi-quadruped manipulation typically focus on predefined or closed task families, often relying on multi-agent reinforcement learning (MARL) to train task-specific coordination policies. However, such methods struggle in open-ended continual learning settings, where tasks arrive sequentially and robots are expected to acquire new coordination skills while reusing previously learned ones without catastrophic forgetting. To address this challenge, we propose Conquer, a semantic skill-library framework that formulates continual multi-quadruped coordination as a retrieve-adapt-update process. First, to accommodate varying team sizes across tasks, we design a team-structured Self-Allies-Goal (SAG) backbone that supports variable-cardinality robot teams by explicitly modeling each robot's own state, teammate context, and task goal. For each incoming task, Conquer constructs a task-level semantic descriptor from pre-execution information and retrieves a relevant skill from the library for adaptation. After successful execution, Conquer updates the skill library by extracting trajectory-level semantic descriptors and organizing them according to semantic distance, thereby enabling continual skill accumulation and cross-task knowledge transfer. Simulation experiments show that Conquer achieves a final average success rate of 95.6%, demonstrating strong forward transfer and negligible catastrophic forgetting. Real-world rollouts on Unitree Go2 teams further validate the deployment feasibility of Conquer for practical multi-quadruped coordination. Simulation and real-robot demonstration videos are available at: https://conquer-project.pages.dev/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08102v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Daoqing Wang, Yuchen Xiao, Weixuan Huang, Zhilong Zhang, Shenghua Wan, Meng Li, Lei Yuan, Yang Yu</dc:creator>
    </item>
    <item>
      <title>Revisiting Articulated Parts Perception in Robot Manipulation</title>
      <link>https://arxiv.org/abs/2606.08103</link>
      <description>arXiv:2606.08103v1 Announce Type: new 
Abstract: We are surrounded by various objects with movable, articulated parts, e.g., box, handle, door. An accurate and generalizable perception of articulated parts is essential to enhance robotic manipulation capabilities. Building on this need, recent efforts in articulated parts perception have followed two main directions: One line of work uses pose-based representation, which requires high manual cost; in parallel, affordance-based methods extract future object motion from point tracking without additional manual efforts, but suffer from low-quality data. In this paper, we propose a new representation of articulated parts, Geometric Primary Structure (GPS), an abstraction of the part geometry structure to balance scalability and quality. For efficient and scalable data collection, GPS is integrated with a portable Virtual Reality (VR) device and requires only one minute to annotate one object sequence. This direct human annotation provides higher quality than the estimated affordance. With this efficient VR-GPS system, we collect 41K frames for 234 objects across six part classes, and train a generalizable GPS model with a single RGB-D object image as input. For object manipulation, we deploy a heuristic policy based on GPS prediction. Without any in-domain fine-tuning, our method achieves a 73% success rate, covering 270 initial states for 9 objects. Our code, data and reusable tool are available at https://enlighten0707.github.io/gps.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08103v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Xiaoqian Wu, Yejie Guo, Xiaoyang Chen, Lixin Yang, Cewu Lu, Yong-Lu Li</dc:creator>
    </item>
    <item>
      <title>Reinforcement learning in linear embedding space unlocks generalizable control across soft robot configurations</title>
      <link>https://arxiv.org/abs/2606.08104</link>
      <description>arXiv:2606.08104v1 Announce Type: new 
Abstract: Soft-bodied organisms such as octopuses and elephant trunks exhibit remarkable morphological adaptability, dynamically reconfiguring body shape and stiffness, and flexibly adjusting their control strategies to enable versatile behaviors. Inspired by these biological systems, various soft robots have emerged in recent decades, featuring diverse materials, stiffnesses, and morphologies tailored to specific tasks. Despite substantial advances in the materials and structural designs of soft robots, developing a generalizable control framework capable of rapid adaptation across diverse configurations remains a long-standing challenge. Existing controllers are limited to fixed configurations, demanding laborious configuration-specific remodelling and policy redesign for new configurations. Here, we introduce a generalizable control system that enables rapid adaptation across diverse soft robot configurations via reinforcement learning in a shared linear Koopman embedding space. By encoding robot dynamics into this embedding space, our method decouples control policies from specific morphologies, allowing real-time, model-free policy adaptation across diverse configurations without retraining from scratch. We validate our system across 33 distinct robot configurations. Our system achieves a 75 times reduction in transfer samples across configurations, while sustaining robust performance under high-speed motion, heavy payloads, and multiactuator faults, and achieving real-world skills previously unattainable in soft robotics. This work establishes a unified and adaptable control paradigm for diverse soft robot configurations, bridging mechanical reconfigurability with control flexibility, and may offer broader insights for generalizable control in complex physical systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08104v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Xinglong Zhang, Cong Li, Hangjie Mo, Yue Jiang, Xin Xu, Wei Jiang, Zhenshan Bing, Yihe Yang, Xiaojian Li, Yueneng Yang, Huimin Lu, Ling-li Zeng, Alois Knoll, Dewen Hu, Li Wen, Wei Pan</dc:creator>
    </item>
    <item>
      <title>A Unifying View of Attention Sinks: Two Algorithms, Two Solutions</title>
      <link>https://arxiv.org/abs/2606.08105</link>
      <description>arXiv:2606.08105v1 Announce Type: new 
Abstract: When attention concentrates on a single token, a sink, what is the model actually computing? Attention sinks are ubiquitous in softmax transformers, yet this shared visual signature can hide fundamentally different algorithms. We show that visually similar sink patterns can reflect two distinct mechanisms: (i) adaptive nop, where a head suppresses its update by routing to a null token, and (ii) broadcast, where a sink aggregates and redistributes global information. In that case, sinks serve an analogous role: a safe destination when there is nothing useful to compute. Proposed interventions like gating or registers work because they implicitly target one or the other, revealing a duality between method and assumed mechanism: gating implicitly assumes nop; registers implicitly assume broadcast. Each mechanism leaves distinct traces (nop sinks exhibit negligible value norms; broadcast sinks induce low-rank outputs) which we formalize on synthetic tasks and use to derive practical diagnostics. Applied to pretrained vision transformers, these diagnostics reveal that both mechanisms exist at scale: sinks transition from CLS in early layers to patches in deeper layers, and concentrate in specialized heads. Strikingly, register tokens, designed for broadcast, are repurposed to also serve nop, confirming that neither intervention alone suffices. Combining gating with registers yields complementary gains in stability and performance. Overall, we find that the same attention pattern can reflect two very different computations and effective intervention requires first asking what the model is actually computing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08105v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lukas Fesser, Mozes Jacobs, Thomas Fel, Andy Keller, Sham Kakade</dc:creator>
    </item>
    <item>
      <title>PACE: Anytime-Valid Acceptance Tests for Self-Evolving Agents</title>
      <link>https://arxiv.org/abs/2606.08106</link>
      <description>arXiv:2606.08106v1 Announce Type: new 
Abstract: Self-evolving agents improve by repeatedly proposing changes to their own prompts, skills, or workflows and keeping those that score higher on a small held-out set. Almost all effort has gone into the proposer that generates candidates; we argue the weak point is the acceptor, the rule that decides whether to commit a change. Applied hundreds of times against the same noisy dev estimate, the ubiquitous "keep it if the score went up" rule is uncontrolled adaptive multiple testing: the agent effectively p-hacks itself, accumulating false commits that make it churn and drift rather than improve.
  We recast committing as a sequential hypothesis test and propose PACE (Paired Anytime-valid Commit Evaluation), a training-free, anytime-valid commit gate. Each candidate is compared to the incumbent on identical instances and committed only when a testing-by-betting e-process accumulates decisive evidence, stopping early to save evaluations and controlling each candidate's false-commit probability at a user-set level even under optional stopping (a per-decision guarantee).
  On Qwen2.5 agents (0.5B-3B) self-evolving at the prompt level on GSM8K, SVAMP, and ARC-Challenge, greedy acceptance commits 30-42% false and 10-33% harmful edits when a genuine improvement is hidden among noisy proposals, while PACE commits the real one and essentially nothing else, matching greedy's held-out accuracy at sharply lower variance and about 18% lower evaluation cost. With no real gain available, greedy commits 13-21 spurious self-modifications per run (72-100% false) and degrades the most fragile agent by 4.9 points, while PACE holds at baseline. Reliability of self-evolution depends on the acceptor, not only on the proposer.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08106v1</guid>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zayx Shawn</dc:creator>
    </item>
    <item>
      <title>Ego-Pi: VLA Fine-Tuning for Ego-Centric Human and Robot Data</title>
      <link>https://arxiv.org/abs/2606.08107</link>
      <description>arXiv:2606.08107v1 Announce Type: new 
Abstract: Robotics faces a fundamental challenge of data scarcity. Unlike language or vision research, there is no internet-scale dataset for robotic manipulation. A promising path forward is to leverage egocentric human data, which can be collected more easily, with greater breadth, and at a larger scale. Towards this end, we investigate key design choices for learning across human and humanoid embodiments equipped with dexterous five-finger hands, using the $\pi_{0.5}$ model as a foundation. Our results show that human data enables robots to learn new task semantics and compose existing skills into novel behaviors without corresponding robot data. The paper website is here: https://egopipaper.github.io/</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08107v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Ji Woong Kim, Ke Wang, Zipeng Fu, Sirui Chen, Cong Zhao, Jeff Lai, Chelsea Finn</dc:creator>
    </item>
    <item>
      <title>A Global Convergence Analysis of Consensus ALADIN for Convex Optimization</title>
      <link>https://arxiv.org/abs/2606.08112</link>
      <description>arXiv:2606.08112v1 Announce Type: new 
Abstract: Distributed optimization problems are pervasive in machine learning and optimal control. In this paper, we study smooth strongly convex distributed consensus optimization problems. We present a distributed optimization algorithm for consensus problems based on the Consensus Augmented Lagrangian Alternating Direction Inexact Newton (C-ALADIN) framework. Our algorithm uses an auxiliary variable to decide when to update second-order information, enabling curvature exploitation without sacrificing global convergence. This contrasts with existing C-ALADIN methods, which require constant Hessian approximations and thus lose numerical advantages. Under smooth strong convexity, the algorithm converges globally, and the auxiliary variable converges sublinearly. Numerical experiments on logistic regression show that our algorithm outperforms baseline methods that use either fixed or updated Hessian information.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08112v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xu Du, Shuting Wu, Karl H. Johansson, Apostolos I. Rikos</dc:creator>
    </item>
    <item>
      <title>Conditional Random Ordered Transport Spaces</title>
      <link>https://arxiv.org/abs/2606.08113</link>
      <description>arXiv:2606.08113v1 Announce Type: new 
Abstract: A small Wasserstein distance does not certify that a transformation is admissible. In evidence-constrained, semantic, causal, physical, monotone, or risk-sensitive learning, one must ask not only how far apart two probability laws are, but whether mass has moved in a direction allowed by available information. We introduce conditional random ordered transport spaces (CROTS), a class of \(L^0\)-valued spaces of random probability measures equipped with a Wasserstein ambient metric, a closed stochastic order, hard and soft ordered transport discrepancies, and a conditional risk functional for evaluating order violation under an evidence sigma-field. The central object is an order-admissible transport geometry for random measure-valued dynamics, distinct from cone-valued metrics, ordered Kantorovich constructions, random Wasserstein spaces alone, and model-specific residuals for generative paths. We develop the foundations of CROTS as a space theory for reliable distributional learning. The results include well-posedness and duality for hard and soft ordered transport, soft-to-hard variational convergence, measurability and completeness of the random lifted space, reductions to classical Wasserstein and ordered geometries, ordered geodesics, constrained barycenters and projections, conditional risk-transport duality, and separation of order-violating distributions. The main stability theorem shows that random learning dynamics may converge in the ambient Wasserstein metric while their local admissibility leakage follows a separate conditional order-risk recursion. The resulting asymptotic order-risk floor provides a mathematical language for evidence overreach, ordered distribution shift, robustness failure, and admissible distributional dynamics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08113v1</guid>
      <category>cs.LG</category>
      <category>math.FA</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lei Luo, Jian Yang</dc:creator>
    </item>
    <item>
      <title>Blockage-Aware Non-stationary Dynamic Bandit for User Association in mmWave V2X Networks</title>
      <link>https://arxiv.org/abs/2606.08118</link>
      <description>arXiv:2606.08118v1 Announce Type: new 
Abstract: In millimeter-wave (mmWave) vehicular networks, dense base station (BS) deployments expand the user association (UA) decision space while dynamic blockages cause link quality fluctuations, posing critical challenges for effective mobility management. Traditional Multi-Armed Bandit (MAB) frameworks assume stationary reward distributions and fail to handle the rapid context-reward mapping shifts caused by vehicle mobility and transient blockages. To address this, we propose Blockage-Aware Non-stationary Dynamic Bandit (BAND), a fully distributed, channel state information (CSI)-free mobility management framework for mmWave vehicular networks, formulating UA as a non-stationary contextual bandit problem, enabling online adaptive optimization without requiring central coordination or offline training. BAND employs a cumulative sum-based change detection (CUSUM-CD) to dynamically narrow the active BS set, reducing exploration overhead while tracking reward distribution shifts. Proactive blockage detection suppresses transient signal degradation in the reward estimation process. Simulations demonstrate over 40% regret reduction and up to 33.1% network communication rate improvement compared with hypercube-based contextual bandit baselines, with robustness validated across varying blockage rates and network configurations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08118v1</guid>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Weiqi Chi, Manabu Tsukada</dc:creator>
    </item>
    <item>
      <title>Policy Description Language for Authorization using Logic-Based Programming</title>
      <link>https://arxiv.org/abs/2606.08119</link>
      <description>arXiv:2606.08119v1 Announce Type: new 
Abstract: Because the vulnerabilities of information systems cannot be fully eradicated, we must prepare for security incidents with a multi-layer defense, known as the Defense-in-Depth strategy. In a multi-layer defense, it is important to authorize access at a fine granularity so that each layer is composed effectively, and many access control models have been proposed for this purpose. However, the policy description languages proposed so far cannot express these models at the proper granularity. In this paper, we propose a policy description language that can designate many kinds of conditions for access control, such as the dynamic status of an application process, as elements of decision data, and we implement it in Datalog. Using the proposed language, we compose the policy of SELinux, a major implementation of multi-layer defense, and we confirm the advantages of the proposed language by evaluating its validity and expressiveness.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08119v1</guid>
      <category>cs.CR</category>
      <category>cs.OS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Masaki Hashimoto, Mira Kim, Hidenori Tsuji, Hidehiko Tanaka</dc:creator>
    </item>
    <item>
      <title>Trustworthy Visual Predicates for Robust Manipulation Understanding under Degradation</title>
      <link>https://arxiv.org/abs/2606.08121</link>
      <description>arXiv:2606.08121v1 Announce Type: new 
Abstract: Manipulation understanding requires reliable relational evidence, such as contact, support, containment, motion coupling, grasp, release, and active-hand involvement. Although these visual predicates are widely used in event-chain, graph-based, and neuro-symbolic models, their reliability under visual degradation is rarely analyzed directly. This paper introduces a predicate-level reliability framework for robust manipulation understanding under blur, occlusion, illumination change, low resolution, frame dropping, and detection noise. The framework defines a structured predicate vocabulary, confidence-aware predicate estimation, and reliability metrics for predicate preservation, degradation sensitivity, temporal consistency, confidence-weighted stability, and downstream impact. Experiments on controlled manipulation videos and public egocentric or bimanual datasets, including VISOR/EPIC-KITCHENS, H2O, and ARCTIC, show that predicate failures are structured rather than uniform. Static spatial predicates remain comparatively robust, whereas contact-sensitive, dynamic, and derived predicates such as grasp and release are more fragile. Under severe degradation, detection noise, occlusion, and frame dropping cause the strongest reliability losses. Downstream analysis shows that degraded predicates reduce manipulation-understanding accuracy from 0.89 to 0.58, while removing confidence weighting under moderate degradation reduces accuracy from 0.74 to 0.64. These results show that predicate reliability provides a diagnostic layer between visual perception and structured manipulation reasoning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08121v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Fatemeh Ziaeetabar</dc:creator>
    </item>
    <item>
      <title>Think Before You Act: Intention-Guided Reasoning for LLM-Based Location Prediction</title>
      <link>https://arxiv.org/abs/2606.08122</link>
      <description>arXiv:2606.08122v1 Announce Type: new 
Abstract: Predicting a user's next Point-of-Interest (POI) based on their historical check-in records is a fundamental task in location-based services. While recent methods incorporating large language models have shown strong reasoning capabilities and promising results, they typically formulate the prediction task as a one-step trajectory-to-location mapping problem, making predictions prone to shallow trajectory correlations and historical frequency bias. We argue that users rarely choose locations directly; instead, they usually first form a travel intention and then select specific POIs accordingly. Motivated by this insight, we propose IntentPOI, a two-stage intention-guided reasoning framework. In the thinking stage, we infer users' intermediate intentions by incorporating historical mobility patterns, similar peer behaviors, and temporal contexts. In the acting stage, we first construct a compact candidate pool, and then perform intention-guided reasoning to identify locations that best align with the inferred intention. By explicitly decoupling intention inference from location prediction, IntentPOI transforms next-POI prediction from direct trajectory matching into intention-guided reasoning. Extensive experiments on three real-world datasets demonstrate that IntentPOI consistently outperforms eleven state-of-the-art baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08122v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Qingxiang Liu, Anqi Liang, Zhuoyang Jiang, Yutian Jiang, Sisuo Lyu, Yu Ji, Haomin Wen, Yuxuan Liang</dc:creator>
    </item>
    <item>
      <title>Human-Centered Benchmarking of Driver Monitoring Models</title>
      <link>https://arxiv.org/abs/2606.08123</link>
      <description>arXiv:2606.08123v1 Announce Type: new 
Abstract: Vision-based driver monitoring systems are increasingly deployed in safety-critical intelligent transportation settings, yet they are almost always compared on classification accuracy alone. This paper argues that accuracy is insufficient to characterize a model's fitness for real-world deployment, and proposes the Human-Centered Benchmarking Framework (HCBF), which evaluates models across four dimensions: accuracy, explainability, efficiency, and robustness. The framework is applied to four representative lightweight architectures, MobileNetV3, ShuffleNetV2, EfficientNet-B0, and DeiT-Tiny, on the MRL Eye Dataset for eye-state classification. While the models are nearly indistinguishable on clean-set accuracy, each leads in exactly one dimension, and all four lie on the Pareto frontier. A Human-Centered Score computed under three deployment-oriented weighting scenarios ranks ShuffleNetV2 first throughout. However, this aggregate winner retains less than half of its performance under sensor noise and fails by classifying closed eyes as open, whereas the transformer remains robust. These findings show that aggregate ranking can mask dimension-specific vulnerabilities that are operationally decisive, underscoring the value of multi-dimensional, human-centered evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08123v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ruben Dario Florez-Zela</dc:creator>
    </item>
    <item>
      <title>Soft Covering via Hypothesis Testing: Typical-Code Exponents and Mismatched Detection</title>
      <link>https://arxiv.org/abs/2606.08124</link>
      <description>arXiv:2606.08124v1 Announce Type: new 
Abstract: We study the typical-code (quenched) behavior of the false-alarm (FA) and missed-detection (MD) error exponents of the Neyman-Pearson test associated with soft covering, complementing the average-code (annealed) analysis that has been carried out in a companion paper [1]. We prove that, as the block-length tends to infinity, for almost every randomly selected fixed-composition codebook, the negative normalized logarithms of both error probabilities converge to their respective average-code exponents. In other words, the error exponents are self-averaging. We then extend the scope and study a mismatched likelihood ratio test that assumes the wrong channel model. Here, we derive the mismatched error exponents, show that self-averaging persists under mismatch, and characterize the degradation. In particular, we characterize the coding rate beyond which the two kinds of error exponents cannot be positive at the same time, which, in the matched case, is given by the channel input-output mutual information rate.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08124v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Neri Merhav</dc:creator>
    </item>
    <item>
      <title>One Stone, Three Birds: Self-adaptive Optimal Transport for Multi-VLM Selection, Adaptation, and Ensembling</title>
      <link>https://arxiv.org/abs/2606.08126</link>
      <description>arXiv:2606.08126v1 Announce Type: new 
Abstract: Vision-language models (VLMs) enable visual recognition from semantic class descriptions, which makes them attractive when target annotations are scarce or unavailable. Most deployment pipelines, however, first choose a single VLM and then adapt that model to the unlabeled target set. This single-backbone paradigm hides a critical assumption: the selected VLM is already compatible with the target domain. In realistic cross-domain deployment, several general-purpose and domain-specialized VLMs may be plausible, yet no instance-level target labels are available to identify the reliable ones. Deployment therefore requires a coupled solution for model selection, target adaptation, and prediction integration. We revisit this problem from a system-level multi-VLM perspective. Our central observation is that the three decisions above depend on the same latent object: a trustworthy sample-class structure in the target set. Different VLMs may encode different transfer biases and produce conflicting predictions, but their outputs can still provide complementary evidence for estimating this structure. We propose One Stone, Three Birds, a training-free framework based on self-adaptive optimal transport. Given a pool of frozen candidate VLMs, OSTB estimates a consensus sample-to-class transport plan without updating VLM parameters. The learned transport structure is then reused for all deployment objectives: model selection is performed by ranking the combined semantic and visual reliability induced by the consensus plan; target adaptation is obtained by fitting transport-conditioned visual classifiers; and ensembling is implemented through reliability-aware probabilistic integration. Extensive experiments on natural-image, remote-sensing, and medical-pathology benchmarks show that OSTB improves model ranking, adaptation stability, and ensemble robustness under heterogeneous candidate pools.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08126v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qiyu Xu, Zhanxuan Hu, Yu Duan, Yonghang Tai, Huafeng Li, Quanxue Gao, Xiangyong Cao</dc:creator>
    </item>
    <item>
      <title>Gray-Box Optimization and the Vertex Coloring Problem</title>
      <link>https://arxiv.org/abs/2606.08128</link>
      <description>arXiv:2606.08128v1 Announce Type: new 
Abstract: Gray-box optimization is an approach for making some problem-specific information available to the algorithm while still relying on fitness information as the main guide to an optimum. This approach was shown to be beneficial in various combinatorial optimization tasks and neatly captures the continuum between fully black-box algorithms and tailored algorithms.
  In this work, we discuss different flavors of gray-box algorithms. We show that RLS can find a proper $2$-coloring in a bipartite graph starting from a random $2$-coloring, in an expected time of $\mathcal{O}(n \log n)$. In contrast, when starting from a proper $n$-coloring, the (1+1) EA cannot find such a coloring except when offered additional guiding on plateaus of the search space. Finally, we show the run time for this setting can be much improved by using gray-box operators.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08128v1</guid>
      <category>cs.NE</category>
      <category>cs.DM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Johanna Gasse, Antonia Heinen, Hendrik Higl, Timo Kötzing</dc:creator>
    </item>
    <item>
      <title>Cross-LLM Consistency in Inference: Evidence from Shared Interactions</title>
      <link>https://arxiv.org/abs/2606.08129</link>
      <description>arXiv:2606.08129v1 Announce Type: new 
Abstract: Large language models (LLMs) differ in architecture, training data, and optimization procedures, yet they may still develop similar internal inference patterns. In this paper, we examine this hypothesis using interaction-based explanations. We find that LLMs often share interaction patterns when predicting the same target token from the same prompt. This consistency is more pronounced among advanced LLMs. Shared interactions also tend to be lower-order and show weaker positive-negative cancellation than non-shared interactions. These results suggest that advanced LLMs may be implicitly optimized toward common inference patterns, even though the mechanisms that give rise to such cross-model consistency remain open.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08129v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Siyu Lou, Yao Yan, Yuntian Chen, Quanshi Zhang</dc:creator>
    </item>
    <item>
      <title>How to be Non-Human: A Thematic Analysis of Animal Embodiment in VR Games</title>
      <link>https://arxiv.org/abs/2606.08130</link>
      <description>arXiv:2606.08130v1 Announce Type: new 
Abstract: This study employs reflexive thematic analysis to systematically examine the design patterns of 48 first-person virtual reality (VR) animal avatar games. The research identifies four primary design themes: Animal Biomimicry, Limited Animal Simulation, Hybrid Human-Animal Features, and Human Behavior with Animal Avatar. The analysis reveals that approximately 77 percent of the games remain grounded in human-centered interaction logic, with animal forms primarily serving as visual representations. The study highlights the core tension between authenticity and usability in current VR animal avatar design, and points toward design opportunities for achieving more authentic interactive experiences with animal avatars through directions such as controller innovation, unconventional body mapping, and dynamic feedback. This research provides a thematic classification framework for understanding the representation of non-human perspectives in VR games.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08130v1</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Siqi Yu, Shuai Liu, Yiqing Tian, Mar Canet Sola</dc:creator>
    </item>
    <item>
      <title>LCAM: A Framework for Diagnosing Interactional Alignment Failures in Conversational AI</title>
      <link>https://arxiv.org/abs/2606.08131</link>
      <description>arXiv:2606.08131v1 Announce Type: new 
Abstract: Conversational AI is increasingly used for advice, interpretation, reassurance, and decision support in contexts where users may be vulnerable, uncertain, or dependent on the system's apparent competence. Existing alignment work often focuses on model objectives, preference optimization, or output correctness. Yet, many harms arise through interaction: how systems frame authority, express uncertainty, simulate empathy, support reasoning, and make boundaries legible. This paper introduces the Layered Cognitive Alignment Model (LCAM), a conceptual and normative framework for diagnosing interactional alignment failures in conversational AI. LCAM defines alignment as a calibrated fit among system behavior, user goals, task demands, and normative context. It distinguishes five layers of fit: perceptual, semantic, affective, cognitive, and ethical, and two diagnostic polarities of misalignment: underfit and overreach. We apply LCAM to a published LLM counseling example, showing how an apparently supportive response can reinforce harmful beliefs, simulate inappropriate care, and obscure role boundaries. By translating conversational failures into audit and governance questions concerning over-reliance, false intimacy, autonomy erosion, boundary confusion, and inappropriate trust, LCAM offers a theoretical and normative lens for evaluating conversational AI beyond accuracy, helpfulness, or trust.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08131v1</guid>
      <category>cs.HC</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Manuele Reani, Hongyu Tian</dc:creator>
    </item>
    <item>
      <title>Phase Marginalization for Patch-Grid Instability in Vision Transformers</title>
      <link>https://arxiv.org/abs/2606.08132</link>
      <description>arXiv:2606.08132v1 Announce Type: new 
Abstract: Vision Transformers operate on fixed patch grids, which can introduce phase-dependent instability for dense prediction: changing the patch partition can change the token evidence available to a pixel, especially near boundaries. We formalize patch-grid phase as a nuisance variable and propose Phase Marginalization, a post-hoc marginalization method that evaluates structured patch-grid phases, inverse-aligns dense outputs, and aggregates them in the original image coordinate system. The central variant, Uniform Phase Marginalization with K = 4, is training-free and improves over the canonical K = 1 baseline across measured segmentation, depth, and local matching settings. In a controlled Cityscapes experiment, Uniform Phase Marginalization provides a modest compute-matched advantage over generic shift-based four-forward test-time augmentation (TTA) (+0.31 mean Intersection-over-Union over the strongest tested generic row). A scaling study further shows that K = 4 is a practical cost-accuracy trade-off: K = 8 is essentially unchanged and K = 16 adds little accuracy at much higher latency. These results position patch-grid phase as a measurable nuisance variable and Phase Marginalization as a simple diagnostic and post-hoc marginalization baseline for dense ViT prediction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08132v1</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Oğuzhan Ercan</dc:creator>
    </item>
    <item>
      <title>Gravity-guided Contact Dynamics Estimation from 3D Human Motions</title>
      <link>https://arxiv.org/abs/2606.08133</link>
      <description>arXiv:2606.08133v1 Announce Type: new 
Abstract: Ground contact forces acting on the human body are crucial for biomechanics studies and sports performance analysis. Prior methods rely on force plates or pressure mats to collect ground contact dynamics, limiting their applicability to carefully controlled settings. A more scalable solution is to estimate the dynamics directly from motion capture data. Recent approaches only roughly estimate the ground contact dynamics from the vertical distance between the body and the ground plane, which cannot capture the complex pressure distribution of all contact points. To this end, we propose GraCE -- Gravity-guided Contact Dynamics Estimation, a novel full-body contact dynamics model for human motions using a realistic influence of body mass distribution and gravity. We use the human's center of gravity to estimate the ground contacts based on its relative distance to the human body. The applied force on each contact is estimated via the product of predicted contact probabilities and the total exterior force computed from the center of mass trajectory. We outperform related work on the GroundLink dataset for ground reaction force estimation, and on the MOYO dataset for detailed contact pressure prediction. The code will be published upon acceptance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08133v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Cuong Le, Urs Waldmann, Bastian Wandt, Mårten Wadenbäck</dc:creator>
    </item>
    <item>
      <title>TICoder: A Repository-Level Code Generation Framework with Test-Driven Planning and Implementation-Aware Reuse</title>
      <link>https://arxiv.org/abs/2606.08135</link>
      <description>arXiv:2606.08135v1 Announce Type: new 
Abstract: Repository-level code generation with Large Language Models (LLMs) remains challenging, primarily due to complex dependencies and limited context windows. Recent approaches adopt retrieval-augmented generation (RAG) and the planning mechanism to reuse potential callee functions in the repository. However, these approaches often suffer from two limitations: lack of test-driven behavioral guidance during planning and overlooking the implementation logic embedded in repository code during reuse. As a result, generated plans may not align with expected behaviors, and retrieved functions may not be effectively reused. In this paper, we propose TICoder, a novel repository-level code generation framework that improves both planning and reuse. TICoder introduces a test-driven iterative planning mechanism that leverages test cases as behavioral specifications to refine implementation steps. Furthermore, TICoder employs an implementation-aware code reuse strategy, which retrieves potential callee functions using a dual-view similarity that captures both functional and implementation aspects. We then identify relevant usage patterns through a dual-stage selection strategy, combining structure-based clustering and perplexity-based filtering. We conduct extensive experiments on widely used repository-level code generation benchmarks with various LLMs. Experimental results demonstrate that TICoder outperforms state-of-the-art (SOTA) methods, achieving an average improvement of 11.52%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08135v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Siyu Nan, Yaling Luo, Jian Wang, Neng Zhang, Bing Li</dc:creator>
    </item>
    <item>
      <title>Learning Predictive Control with Deep Koopman Operators for Autonomous Vehicle Motion Planning</title>
      <link>https://arxiv.org/abs/2606.08136</link>
      <description>arXiv:2606.08136v1 Announce Type: new 
Abstract: Model Predictive Control (MPC) is widely used for autonomous-vehicle (AV) motion planning, but its real-time applicability is often limited by the need for accurate models and online solution of nonlinear, nonconvex optimization problems in dynamic road environments. Actor-critic reinforcement learning offers a promising alternative for online policy generation, yet its policy-learning process often lacks explicit control-theoretic structure. This article proposes a learning predictive control (LPC) framework with deep Koopman operators for efficient real-time motion planning under nonconvex constraints. To address nonlinear and uncertain vehicle dynamics, a deep-Koopman-based predictor is used to lift the system into an interpretable linear observable space in a data-driven manner. Unlike traditional MPC, which computes open-loop control sequences, the proposed LPC framework yields a closed-loop state-feedback policy within each prediction interval through receding-horizon actor-critic learning. To ensure safety under nonconvex environmental constraints, LPC constructs convex local surrogate representations of obstacles and defines corresponding potential-field functions. These functions and their gradients are directly embedded into the actor-critic structure, enabling efficient, safety-aware policy learning. Extensive simulations and real-world experiments on the HongQi-EHS3 platform demonstrate favorable performance in diverse obstacle-avoidance scenarios in terms of safety, computational efficiency, and driving comfort, compared with benchmark methods such as CBF-MPC and LMPCC.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08136v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Xinglong Zhang, Yongqian Xiao, Haotian Cao, Xing Zhou, Xin Yin, Xin Xu</dc:creator>
    </item>
    <item>
      <title>A Barrier-Modulated Architecture for Safe Affine Formation Control in Second-Order Multi-Agent Systems</title>
      <link>https://arxiv.org/abs/2606.08137</link>
      <description>arXiv:2606.08137v1 Announce Type: new 
Abstract: Affine formation control offers immense flexibility for coordinating multi-agent maneuvers, but guaranteeing the safety of agents under parametric uncertainties remains an open challenge. This paper proposes a novel safe affine formation control framework for second-order multi-agent systems by integrating Higher-Order Control Barrier Functions (HOCBFs) with Adaptive Dynamic Programming (ADP). We introduce a barrier-modulated control architecture that smoothly attenuates the nominal formation tracking objective when agents approach safety boundaries, preventing conflicting control inputs. Within this architecture, two distinct safety controllers are developed: (1) an analytical barrier-gradient repulsive controller that provides a computationally efficient, rigorous mathematical baseline, and (2) a data-driven optimal safety controller. The data-driven approach utilizes an actor-critic neural network to solve the Hamilton-Jacobi-Bellman (HJB) equation online, enabling optimal collision avoidance even in the presence of unknown system parameters. Using Nagumo's theorem and Lyapunov stability analysis, we formally prove that both controllers guarantee the forward invariance of the safe set ensuring absolute collision avoidance while maintaining Uniformly Ultimately Bounded (UUB) formation tracking errors. Finally, simulations validate the theoretical findings and demonstrate the robustness of the proposed controllers in dynamic obstacle avoidance scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08137v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ashik Abrar Naeem, Mohammad Ariful Haque</dc:creator>
    </item>
    <item>
      <title>TRUST-SCF: Transformer-based Risk Understanding and Scoring for Transactional Supply Chain Finance</title>
      <link>https://arxiv.org/abs/2606.08140</link>
      <description>arXiv:2606.08140v1 Announce Type: new 
Abstract: Supply Chain Finance (SCF) and LendTech platforms need credit scoring systems that respond to evolving transaction behavior, repayment delays, and active exposure. We propose TRUST-SCF, a transformer-based framework for transaction-level risk prediction and dynamic credit scoring. Each user history is represented as a sequence of transaction tokens containing utilization, repayment delay, and transaction position. The main contributions are: (1) a financially aligned attention bias that combines utilization similarity and recency, enabling the model to compare repayment behavior under comparable exposure conditions; (2) continuous repayment-delay prediction in a log-transformed target space, reducing the influence of extreme delays while improving sensitivity to short-delay behavior; and (3) a label-efficient credit-scoring pipeline in which the final credit score is not trained using any explicit external credit-score label, but is instead derived from predicted delay, potential risk over simulated utilization, actual unpaid exposure, and nonlinear calibration. Experiments on real transaction data from more than 300,000 transactions show that TRUST-SCF improves delay prediction over sequential baselines and produces scores that are strongly associated with future repayment behavior. These results suggest that TRUST-SCF is a practical framework for adaptive credit scoring and transaction-level risk mitigation in SCF and LendTech environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08140v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mohammadamin Davoodabadi, Amirabbas Shakeri</dc:creator>
    </item>
    <item>
      <title>IMAGINE: Adaptive Schema-Imagery Enhanced Composition for Composed Video Retrieval</title>
      <link>https://arxiv.org/abs/2606.08144</link>
      <description>arXiv:2606.08144v1 Announce Type: new 
Abstract: Composed Video Retrieval (CVR) is designed to retrieve a target video that matches a reference video modified by a modification text. While existing methods explore cross-modal correspondences, they often assume modified objects appear directly in videos. However, modification texts frequently describe concepts not explicitly presented but implicitly expressed through semantically related visual cues (e.g., "cake" implying "birthday party"). Current approaches typically rely on aligning explicit feature representations within the concrete space, neglecting critical latent associations. To address this, we propose an adaptIve scheMa-ImAGery enhanced composItional NEtwork (IMAGINE). Unlike standard explicit matching, IMAGINE materializes implicit semantics (termed schema imagery) via dynamic multimodal prototypes. These prototypes capture shared latent concepts to adaptively modulate visual features, effectively injecting implicit guidance into the retrieval process. By bridging the gap between explicit visual contents and implicit retrieval intentions, IMAGINE achieves state-of-the-art performance in both CVR and Composed Image Retrieval (CIR) across three widely used benchmarks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08144v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiale Huang, Zixu Li, Zhiwei Chen, Zhiheng Fu, Chunxiao Wang, Yupeng Hu</dc:creator>
    </item>
    <item>
      <title>SAGE: An LLM-driven Self Reflective Agentic Framework for Fraud Detection</title>
      <link>https://arxiv.org/abs/2606.08146</link>
      <description>arXiv:2606.08146v1 Announce Type: new 
Abstract: Fraud detection in payment, e-commerce, and telecommunications systems requires accuracy at the individual level, robustness under severe class imbalance, and ease of understanding for risk managers. Existing methods fail at least one of these requirements: automated machine learning systems search a fixed numerical space without semantic awareness of the dataset; graph neural network-based methods require pre-defined relational graphs and remain opaque at the individual-decision level; and the design of general-purpose large language model (LLM) agents does not consider the recall and precision constraints specific to real-world fraud detection. In this paper, we propose SAGE, the first end-to-end LLM-driven multi-agent framework for fraud detection. SAGE coordinates three dedicated agents that make decisions based on a six-layer Data Diagnostic Tree (DDT) and a Markov decision process guided by natural-language gradients, automatically optimizing the model under a fraud-specific reward. On five fraud datasets and five LLM backbones, SAGE wins $96.00\%$ of method--dataset comparisons and improves F1 by an average of $40.86\%$ over baselines. The code is available at https://github.com/yichenC1c/SAGE.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08146v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yichen Chen, Siying Li, Yuhang Liang, Lijun Wang, Renyang Liu</dc:creator>
    </item>
    <item>
      <title>Property-Informed Diffusion-Based Text-to-Microstructure Generation</title>
      <link>https://arxiv.org/abs/2606.08150</link>
      <description>arXiv:2606.08150v1 Announce Type: new 
Abstract: Designing 3D metamaterial microstructures that meet the intended functions remains a major challenge, as it typically requires domain expertise, iterative simulations, and extensive manual tuning. Existing work on inverse design that automatically generates microstructures based on desired target properties often suffers from limited design diversity and faces challenges in ensuring the physical feasibility of the generated structures. To address this issue, a property-informed diffusion-based network is proposed that enables the generation of 3D microstructures directly from textual descriptions. Unlike traditional property conditioning methods, our approach leverages rich guidance in terms of semantics and physical properties in the text input to support diverse structure synthesis. To enforce consistency between the generated structures and the target textual prompts, a dual alignment strategy is adopted, including contrastive text-structure alignment and test-time reward-guided alignment. Experimental results show that the model is capable of generating semantically meaningful and physically plausible structures across a wide range of material categories. Our approach has good potential for interactive microstructure design and opens up new directions for combining language-based interfaces with inverse material discovery. Code is available at: https://github.com/hongsong-wang/PropDiff-TMG</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08150v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bingxuan Dai, Hongsong Wang, Jie Gui</dc:creator>
    </item>
    <item>
      <title>Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents</title>
      <link>https://arxiv.org/abs/2606.08151</link>
      <description>arXiv:2606.08151v1 Announce Type: new 
Abstract: Tool-using LLM agents often fail not because relevant text is absent, but because decisive evidence is not selected, compressed, or surfaced at action time. We present CICL, a decision-aware context layer that turns instance evidence into a context graph, routes deterministic, Opus-assisted, Qwen, Codex/GPT-5.5, and Qwen-QLoRA judgments through a shared eight-field schema, scores units by action shift, outcome uplift, necessity, and negative-transfer risk, and packs high-utility evidence as typed memory cards for a budgeted agent. The design separates the measured decision signal from the judge model, so frontier annotation, local surrogates, and lightweight rankers can be compared under one auditable protocol. Empirically, CICL yields a concrete open-benchmark gain while exposing its limits. On 50 SWE-bench Verified file-retrieval instances, direct Qwen3.6-plus reranking of BM25 top-50 candidates raises hit@1 from 0.58 to 0.78 and MRR@10 from 0.634 to 0.790, with all 2,500 judgments parseable. Controlled diagnostics show action-criticality: at budget 120, CICL reaches F1 0.620 on v1 and 0.425 on v3, and removing the top-utility semantic v3 unit collapses F1 to 0.000. Supplementary checks add Qwen-QLoRA agreement over 710 candidates, a small 200-label real-code Opus-assisted signal, and a three-instance patch smoke validating retrieval-to-patch plumbing without claiming official SWE-bench success. RepoBench-R summaries still beat cards, and compact rankers do not yet replace the heuristic. CICL contributes a reproducible measurement and selection layer for decision-critical context, not an end-to-end coding-agent repair claim.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08151v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xinyu Guan, Qianyang Zhao, Yuming Deng</dc:creator>
    </item>
    <item>
      <title>Vision-Guided Dual-Arm Humanoid Robotic Disassembly of End-of-Life 18650 Lithium-ion Battery Packs</title>
      <link>https://arxiv.org/abs/2606.08152</link>
      <description>arXiv:2606.08152v1 Announce Type: new 
Abstract: The growing volume of retired lithium-ion battery packs from electric vehicles and portable electronics calls for automated disassembly that is safe, flexible, and selective down to the individual cell. Existing robotic systems, however, mostly assume known pack poses, external fixtures, or specialised tooling, leaving fixture-free cell-level disassembly under pose uncertainty largely unsolved. This paper presents a vision-guided dual-arm pipeline that disassembles a 21-cell 18650 pack from an arbitrary initial pose using only general-purpose parallel-jaw grippers, RGB-D sensing, and a pre-trained grasp detector. Pose uncertainty is absorbed by a learn-and-filter perception stack with discrete look-and-move wrist-camera corrections, while a mid-task support transfer between the two arms extends the effective workspace without any external clamp. The pipeline achieves an 8/10 end-to-end success rate, a cell-localisation root-mean-square error of $2.4$\,mm, and a mean cycle time of 6.0\,minutes per pack, providing a practical, fixture-free building block for industrial battery recycling.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08152v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yile Chen, Zhihao Liu, Xi Vincent Wang, Lihui Wang</dc:creator>
    </item>
    <item>
      <title>LogNEO: A GPT-Neo Reinforcement Learning Framework for Accurate Real-Time Log Anomaly Detection</title>
      <link>https://arxiv.org/abs/2606.08153</link>
      <description>arXiv:2606.08153v1 Announce Type: new 
Abstract: Detecting anomalies in large-scale system logs is critical for the reliability and security of modern computing infrastructure. We present LogNEO, a log anomaly detector built on EleutherAI's GPT-Neo (1.3B parameters) and fine-tuned with a novel partial-credit, exponentially decaying position-aware reward scheme combined with cross-entropy regularisation via Proximal Policy Optimisation (PPO). The position-aware reward explicitly models prediction difficulty: early positions receive higher rewards for correct predictions, while later positions incur stronger penalties for errors. LogNEO attains F1-scores of 0.927, 0.913, and 0.984 on the HDFS, BGL, and Thunderbird benchmarks, improving recall by up to 6 percentage points over the prior state-of-the-art LogGPT while maintaining comparable precision. A production microservice deployment over Apache Kafka, Redis, and TensorRT-accelerated inference demonstrates 45 ms end-to-end latency at 15,000 events per second.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08153v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>David Eje, Tanmay Sharma, Khush Patel, Manuel Mazzara, Leonard Johard</dc:creator>
    </item>
    <item>
      <title>SynthICL: Scalable In-context Imitation Learning with Synthetic Data</title>
      <link>https://arxiv.org/abs/2606.08154</link>
      <description>arXiv:2606.08154v1 Announce Type: new 
Abstract: In-context imitation learning (ICIL) enables robots to learn new tasks from a small number of demonstrations by conditioning a pre-trained policy on task-specific examples, without retraining at test time. Despite this promise, training generalizable and scalable in-context imitation policies remains an open challenge. We present SynthICL, a scalable framework that trains ICIL policies entirely from RGB-only synthetic data. Specifically, we build a data generation pipeline to produce high-fidelity ICIL data and train a flow-matching transformer policy on the resulting dataset. SynthICL avoids the need for depth sensing, precise camera calibration, and real-world training data in prior approaches, offering a simpler and more scalable alternative. We further incorporate subgoal prediction by training the model to predict the next subgoal images, enabling more precise and visually grounded control. Evaluated on 16 unseen real-world manipulation tasks, SynthICL achieves an average success rate of 79% with only one demonstration provided at test time and outperforms prior methods. Project page: https://synth-icl.github.io</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08154v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Cheng Qian, Ruomeng Fan, Yifei Ren, Yilong Wang, Edward Johns</dc:creator>
    </item>
    <item>
      <title>Have I Solved This Before? Retrieving Similar Segmentation Problems for Evolutionary Learning</title>
      <link>https://arxiv.org/abs/2606.08155</link>
      <description>arXiv:2606.08155v1 Announce Type: new 
Abstract: Reliable integration and solid configuration of monitoring systems constitute fundamental prerequisites for achieving high efficiency and productivity in contemporary manufacturing environments. Design decisions on sensor type and system architecture have to be made at an early stage and under comparably high uncertainty. This work investigates a research direction that deviates from the traditional monitoring-system development process by shifting the attention from algorithm design to a deeper analysis of the inspection problem. In contrast to traditional design cycles, this paper proposes to gradually collect knowledge and store it in an abstract system model. This enables the retrieval of similar solutions for future use cases, preventing the need for expensive model training from scratch and allowing instead for the incremental refinement of existing base configurations. Reuse of previously generated pipelines reduces the risk of late and costly revisions. As there is little knowledge on cross-domain transferability of filter pipelines, this study analyzes the potential of retrieving filter pipelines to transfer them to different but similar segmentation problems. Finally, we statistically analyze the benefits of this `transfer learning' variant which is predominantly applied to image segmentation problems. In addition, we discuss how simple models help balance the trade-off between complexity, technical requirements, and reliability in the design process.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08155v1</guid>
      <category>cs.LG</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>Andreas Margraf, Henning Cui, Jörg Hähner</dc:creator>
    </item>
    <item>
      <title>RAPID: Layer-Wise Redundancy-Aware Pruning and Importance-Driven Token Merging for Efficient ViT</title>
      <link>https://arxiv.org/abs/2606.08156</link>
      <description>arXiv:2606.08156v1 Announce Type: new 
Abstract: Vision Transformers (ViTs) achieve strong performance but suffer from high computational costs due to quadratic self-attention complexity. Although token reduction techniques such as pruning and merging mitigate this, they typically overlook how representations evolve across network depth. We propose RAPID, a depth-aware token reduction framework that adapts reduction strategies to the layer-wise characteristics of token representations. The primary methodological contribution is a bifurcated strategy: in shallow-to-middle layers, RAPID employs a redundancy-similarity aware pruning metric to eliminate over-represented local patterns. As features transition to global semantic concepts in deeper layers, the framework shifts to an importance-similarity aware merging mechanism. This stage leverages classification (CLS) token attention weights to protect semantically critical tokens while fusing less important but similar neighbors. Empirical validation on ImageNet-1K using ViT and DeiT architectures demonstrates that RAPID establishes a superior accuracy-compression Pareto frontier compared to plug-and-play baselines such as ToMe and ToFu. RAPID is particularly robust in aggressive compression regimes, achieving up to 4.29% higher accuracy than ToMe at extreme reduction rates. Our framework provides a training-free template for optimizing vision models by aligning reduction strategies with hierarchical feature evolution.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08156v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kyumin Choi, Ikbeom Jang</dc:creator>
    </item>
    <item>
      <title>Cross Paraphrastic Invariance Learning for Hallucination Detection</title>
      <link>https://arxiv.org/abs/2606.08157</link>
      <description>arXiv:2606.08157v1 Announce Type: new 
Abstract: Large language models (LLMs) frequently generate hallucinations, which are unsupported by a source document. To avoid costly LLM-as-evaluator pipelines and the heavy annotation demands of existing classifiers, we propose CPIL (Cross Paraphrastic Invariance Learning), a two-stage Siamese framework that maximizes the utility of existing labeled data. Concretely, CPIL constructs informative training pairs by: (i) generating paraphrastic views of each document-claim example as positives, and explicitly aligning their representations to enforce invariance to surface form; and (ii) mining same-document, opposite-label pairs as hard negatives to sharpen document-sensitive decision boundaries. CPIL then conducts two-stage model training: Stage 1 performs contrastive pretraining to learn a paraphrase-invariant, grounding-aware embedding space; and Stage 2 attaches a lightweight classifier for binary groundedness. On the LLM-AggreFact benchmark (11 tasks), CPIL surpasses strong baselines in F1 score with only ~1% labeled data, showing its prediction superiority and label efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08157v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shanshan Lin, Dongsheng Hong, Sibo Ju, Chao Chen, Sihong Xie, Xiangwen Liao</dc:creator>
    </item>
    <item>
      <title>Constrained Paraphrase Consistency for LLM Hallucination Detection</title>
      <link>https://arxiv.org/abs/2606.08158</link>
      <description>arXiv:2606.08158v1 Announce Type: new 
Abstract: Large language models (LLMs) can generate factually inconsistent claims, motivating accurate and scalable hallucination detectors. Prior work largely enlarges training sets via synthesis or new annotations, introducing increasing cost and potential bias while underusing the consistency implied by semantically equivalent paraphrases. We propose Consistency-Constrained Hallucination Detector (CCHD), which formulates training as a constrained optimization problem. The standard cross-entropy on original document-claim pairs is complemented by (i) paraphrase-consistency constraints bounding divergence across paraphrased views, and (ii) label-preservation constraints tying paraphrases to ground truth. We solve the problem by gradient descent-ascent over model parameters and per-view Lagrange multipliers, adding only a few scalar dual variables and no inference-time overhead. With DeBERTa and Flan-T5 backbones, CCHD consistently outperforms strong baselines (FactCG, MiniCheck, and AlignScore) on standard factuality benchmarks, demonstrating its superiority on hallucination detection.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08158v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shanshan Lin, Dongsheng Hong, Sibo Ju, Chao Chen, Xi Zhang, Xiangwen Liao</dc:creator>
    </item>
    <item>
      <title>AttentionCap: Transformer Based Capacitance Matrix Learning Toward Full-Chip Extraction</title>
      <link>https://arxiv.org/abs/2606.08161</link>
      <description>arXiv:2606.08161v1 Announce Type: new 
Abstract: As capacitance extraction accuracy of rule-based pattern matching becomes difficult to sustain at advanced nodes, a growing trend emerges to develop deep-learning-based 2D capacitance models. However, existing MLP- and CNN-based methods constrain their input to fixed metal-layer combinations in a specific process node, limiting their usability in practice. Recognizing the inherent similarity between the capacitance matrix and the prevailing attention mechanism, we propose AttentionCap, a customized Transformer for capacitance matrix learning, with a Gram representation framework, a physics-aligned symmetric-attention output layer, and a novel normalized Laplacian loss. We also introduce a process-node embedding to enable multi-node learning. Trained on synthetic data, AttentionCap attains 0.67\%/3.99\% self/coupling-capacitance error on unseen real designs under a multi-layer and multi-node setting, surpassing the CNN-Cap baseline with 4.6$\times$/5.7$\times$ lower self/coupling error and 192$\times$ faster inference speed. A pretrained AttentionCap accurately transfers to an unseen node with only 5K samples and 4K finetuning steps. With sufficient accuracy on unseen real designs and strong transferability to new process nodes, AttentionCap offers highly practical value for modern EDA workflows. Code and data are available at https://github.com/THU-numbda/AttentionCap.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08161v1</guid>
      <category>cs.LG</category>
      <category>cs.AR</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiechen Huang, Hector R. Rodriguez, Dingcheng Yang, Zuochang Ye, Yibo Lin, Wenjian Yu</dc:creator>
    </item>
    <item>
      <title>Silent Failure in LLM Agent Systems: The Entropy Principle and the Inevitable Disorder of Autonomous Agents</title>
      <link>https://arxiv.org/abs/2606.08162</link>
      <description>arXiv:2606.08162v1 Announce Type: new 
Abstract: Large Language Model (LLM) agent systems suffer from failures that occur without external triggers -- no injection, no adversarial input, no resource exhaustion. These silent failures -- unexpected deviations from intended behavior under normal conditions -- are routinely misattributed to bugs or configuration errors. Through systematic analysis of over 40,000 controlled trials and long-term production observations spanning 100,000+ agent interactions, we identify a common structural logic underlying these failures. Building on patterns observed in our experiments, we survey the global research literature on autonomous agent reliability and synthesize 22 intrinsic properties of LLM agent systems across six lifecycle layers: foundation semantics, inter-agent transmission, memory persistence, task execution, feedback correction, and systemic evolution. We demonstrate that whenever a sufficient subset of these properties co-exist, system entropy -- the measurable accumulation of disorder: loss of output consistency, task accuracy, and cross-session coherence -- increases monotonically with interaction rounds. We formalize this as the Entropy Principle: S(t) = S0 * e^(alpha * t), with alpha measured empirically across multiple architectures. We propose the PIG (Physical Integrity Gate) Engine with the ADE (Agent Delivery Engineering) protocol suite as an engineering countermeasure to entropy-driven disorder. Our findings establish silent failure not as a bug to be fixed but as a manifestation of Intelligence Entropy -- a physical constraint to be managed through deterministic governance. We argue that any engineering effort stabilizing the structure and order of agent systems participates in a unified mission: keeping intelligent systems reliable as they grow in scale and complexity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08162v1</guid>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dexing Liu</dc:creator>
    </item>
    <item>
      <title>How Much MRI Preprocessing Is Enough? A Cost-Utility Study for Brain MRI Foundation Models</title>
      <link>https://arxiv.org/abs/2606.08164</link>
      <description>arXiv:2606.08164v1 Announce Type: new 
Abstract: MRI preprocessing defines the input distribution seen by brain MRI foundation models, yet it is usually treated as routine data cleaning rather than a modeling choice. We ask how much preprocessing is worth its computational cost for self-supervised 3D MRI pretraining. Keeping the corpus, 3D ViT backbone, masking protocol, and downstream evaluations fixed, we compare a graded P0-P7 preprocessing spectrum for masked autoencoding (MAE) and joint-embedding predictive learning (JEPA) on 20,000 heterogeneous brain MRI volumes, then transfer the encoders to IDH prediction, MCI classification, brain age regression, and GLI/PED tumor segmentation. The results do not support a simple "more is better" rule. P0/P1 are numerically unstable, making P2 the lowest-cost feasible level; beyond P2, choosing the best feasible preprocessing level improves aggregate utility by only 3.4 percentage points for MAE and 1.8 percentage points for JEPA, with most paired gains statistically unresolved. Stronger preprocessing is beneficial only in selected regimes: IDH improves modestly, AGE and GLI/PED are often near or best at P2, and MCI shows the clearest empirical P7 gain. Cross-level MCI transfer further shows that much of the P7 advantage can be recovered by applying stronger preprocessing downstream, without requiring P7 throughout pretraining. These findings recast MRI preprocessing as a downstream-aware cost-utility decision rather than a default escalation pipeline. Code is available at https://github.com/PangJiangShuan/PreBrain.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08164v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiangshuan Pang, Wangyang Tang, Jing Yan, Zhixuan Cheng, Youzhe He, Zhenkun Zhuang, Tao Zhou, Shiping Liu</dc:creator>
    </item>
    <item>
      <title>Explaining Data Mixing Scaling Laws</title>
      <link>https://arxiv.org/abs/2606.08167</link>
      <description>arXiv:2606.08167v1 Announce Type: new 
Abstract: Recent research has established empirical scaling laws to predict model performance on multi-domain data mixtures. However, a theoretical understanding of these model loss behaviors remains absent. In this work, we propose a unified framework to explain the underlying mechanics of data mixing. Our approach extends theoretical perspectives originally developed for standard neural scaling laws (e.g., Kaplan and Chinchilla) to the multi-domain setting. Based on the distributional assumption that domains overlap on fundamental skills while diverging on specialized skills, we identify two key factors that govern the domain losses of models trained on different data mixtures: \textit{Capacity Competition}, where the allocation of finite model capacity couples domain losses globally, and \textit{Noise Reduction}, where optimal weights shift toward harder-to-learn domains to minimize overall noise. Empirical evaluations show that our framework outperforms existing baselines by fitting the loss landscape with a lower Mean Relative Error and identifying higher-performing training mixtures. Most importantly, our model successfully extrapolates across scales, predicting highly effective mixtures for large, unseen scales using parameters fitted on smaller ones. In addition, our model achieves these results using significantly fewer parameters compared to previous empirical laws. Our code is available at https://github.com/meiqwq/Explaining-Data-Mixing-Scaling-Laws.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08167v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Rui Dai, Shuran Zheng</dc:creator>
    </item>
    <item>
      <title>Closing the Sim-to-Real Gap: An Evaluation Framework for Autonomous Cyber Defense Configuration of Commercial EDR</title>
      <link>https://arxiv.org/abs/2606.08168</link>
      <description>arXiv:2606.08168v1 Announce Type: new 
Abstract: Leading commercial endpoint detection and response (EDR) products have shifted from operator-configured rule sets to multi-component systems where autonomous AI components operate alongside, and increasingly in place of, operator-deployed policies. Autonomous defense agents using commercial EDR as their hardening tool are no longer tuning a passive tool, but a black-box autonomous system capable of making vendor-specific decisions. We present the first evaluation framework for autonomous defense agents hardening commercial EDR. We instantiate it in a Game of Active Directory (GOAD) lab with Horizon3.ai's NodeZero as the autonomous pentester and Microsoft Defender XDR as the EDR. We run a sample benchmark of defense agents with two large language model (LLM) backbones (Claude Sonnet 4.6 and Cisco Foundation-Sec-8B). We report three lessons learned that neither simulation nor open-source-EDR evaluation can surface: (i) commercial EDR telemetry is engineered for Security Operations Center (SOC) analyst workflows rather than scientific benchmarking; (ii) the importance of per-policy attribution to separate defense agent actions from autonomous EDR actions; and (iii) the EDR's autonomous behavior varies during the evaluation window. Together, these findings highlight a sim-to-real gap for enterprise defense and motivate evaluation methodology for benchmarking autonomous defense agents in environments with black-box, autonomous tools.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08168v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Kerri Prinos, Lilianne Brush</dc:creator>
    </item>
    <item>
      <title>CLASP: Language-Driven Robot Skill Selection and Composition using Task-Parameterized Learning</title>
      <link>https://arxiv.org/abs/2606.08169</link>
      <description>arXiv:2606.08169v1 Announce Type: new 
Abstract: Enabling robots to understand and execute tasks from natural language commands while maintaining data efficiency remains challenging. Foundation models such as vision-language-action (VLA) and vision-language models (VLMs) provide intuitive interaction channels but require extensive data; task-parameterized imitation learning achieves data efficiency but lacks natural language grounding. This work bridges this gap through a modular architecture combining task-parameterized kernelized movement primitives (TP-KMPs) with pretrained VLMs. During learning, skills are acquired from 2 to 5 kinesthetic demonstrations, and the VLM generates skill schemas describing each skill's parameters and preconditions. During execution, the VLM interprets commands to select skills, reason about parameter bindings, and create novel behaviors through covariance-weighted composition. When no skill or composition suffices, the system identifies capability gaps and requests targeted demonstrations, all without fine-tuning. Validation on a 7-DoF manipulator shows success rates of 73.3%-100% in scenarios requiring skill selection, composition, and active learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08169v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.HC</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Markus Knauer, Valentin Gieraths, Tai Mai, Samuel Bustamante, Alin Albu-Schäffer, Freek Stulp, João Silvério</dc:creator>
    </item>
    <item>
      <title>Learning from Human Driving: A Human-in-the-Loop Online Behavior Cloning Framework for Autonomous Driving</title>
      <link>https://arxiv.org/abs/2606.08170</link>
      <description>arXiv:2606.08170v1 Announce Type: new 
Abstract: With the evolution of large foundation models (LFMs), data-driven autonomous driving has made significant strides. However, existing paradigms still face severe challenges in complex interaction and long-tail scenarios due to distribution shift and causal confusion. These limitations often result in a lack of human-level decision-making flexibility and safety in extreme conditions. To overcome these limitations, this paper proposes a Human-in-the-Loop Online Behavior Cloning framework (HiL-OBC) for autonomous driving, which aims to deeply integrate the cross-modal perceptual capabilities of LFMs with the high-level driving intelligence of human experts. Specifically, HiL-OBC deployment is executed through three critical phases: policy initialization with human intervention, latent behavioral modeling with Bayesian policy adaptation, and online deployment and updates. Furthermore, we design a Multi-modal Online Behavior Cloning (MOBC) model, which optimizes the base driving policy online through a lightweight network architecture, a takeover trigger mechanism, and a multi-variant loss function, thereby enhancing the system's decision-making robustness in complex environments. We evaluated the HiL-OBC on the LangAuto-Human CARLA benchmark. Experimental results demonstrate that the driving policies optimized via the human-in-the-loop mechanism achieve substantial performance gains: the DS of StructNav, LFG, and LMDrive increased by 47.25%, 31.59%, and 32.12%, respectively, with a simultaneous of various experimental settings and key components highlights the advantages of human-in-the-loop learning in improving decision-making robustness and overall driving performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08170v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Yuhong Shi, Jianyi Liu, Lihang Sun, Li Li, Xudong Dong</dc:creator>
    </item>
    <item>
      <title>The Governance of Human-LLM Interaction: Safety Gating, Civility Steering, and Affective Default Lock-In</title>
      <link>https://arxiv.org/abs/2606.08172</link>
      <description>arXiv:2606.08172v1 Announce Type: new 
Abstract: Large language models (LLMs) increasingly mediate high-stakes interactions in finance, medicine, and mental-health support, yet users have limited control over how these systems communicate. We frame interaction style as a governance object: provider-side alignment not only blocks harmful content, but also stabilizes communicative defaults that shape users' epistemic distance, relational expectations, and capacity to opt out of emotionalized or anthropomorphic interaction. We introduce a deterministic multi-agent evaluation pipeline for measuring prompt steerability and style drift in long-horizon dialogue. The study replays 100 frozen user-only scripts across four domains and three runnable persona conditions: default, sarcastic, and cold, using three generator models, yielding 90,000 assistant replies scored by a human-calibrated LLM judge on harmfulness, negative emotion, inappropriateness, empathic language, anthropomorphism, and refusal behavior. A fourth harmful persona is evaluated separately as a safety-gating test. The paper contributes a reproducible method for quantifying whether prompt-specified styles remain stable over time and a governance framework distinguishing safety gating, civility steering, and affective default lock-in. Overall, we show that prompt steerability and regression-to-default are observable indicators of provider control over communicative form, with implications for pluralism, autonomy, and democratic agency in human-LLM interaction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08172v1</guid>
      <category>cs.HC</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Manuele Reani, Hongjian Zhang, Hongyu Tian</dc:creator>
    </item>
    <item>
      <title>AI-Native Closed-Loop Security for 6G-Enabled Cyber-Physical Systems: From Edge Detection to Network-Wide Mitigation</title>
      <link>https://arxiv.org/abs/2606.08173</link>
      <description>arXiv:2606.08173v1 Announce Type: new 
Abstract: In sixth-generation (6G) networks, billions of cyber-physical systems (CPSs) - autonomous vehicles, smart grids, industrial robots, and remote-surgical equipment - will run over ultra-reliable low-latency slices, collapsing the gap between a remote breach and physical harm to milliseconds, a budget perimeter firewalls and centralised security operations centres cannot meet. This survey reframes 6G CPS security as a closed-loop, AI-native pipeline that senses at the multi-access edge computing (MEC) tier, using minute-scale call-detail records (CDRs) for baseline learning and sub-millisecond RAN/Open-RAN (O-RAN) telemetry for the latency-critical path. It decides locally with compressed deep models, mitigates network-wide via SDN, NFV, and O-RAN controllers, and retrains through federated learning (FL) and digital-twin (DT) replay. We formalise a per-slice, tail-bounded latency contract on the sense, detect, and mitigate stages, enforced at a slice-dependent tail percentile (p99 for safety-critical URLLC slices). Organising 128 peer-reviewed studies (2017-2026) under a PRISMA 2020 protocol, we (i) map the 6G/CPS threat surface to MITRE ATT&amp;CK and a CDR-observable feature space; (ii) unify edge anomaly detection and DDoS classification across twelve datasets and statistical, graph, and transformer models; (iii) synthesise SDN/NFV/O-RAN primitives into one closed-loop reference architecture; (iv) treat FL, large language models (LLMs), DT, post-quantum cryptography (PQC), zero-trust architecture (ZTA), and explainable AI as cross-cutting enablers, not parallel pillars; and (v) consolidate open problems into five directions spanning data, latency, trust, standardisation, and evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08173v1</guid>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bilal Hussain, Muhammad Bilal, Tan Li, Haris Pervaiz, Xiao Tang, Qinghe Du, Fawad Ahmad, Muhammad Azhar, Jun Zhang</dc:creator>
    </item>
    <item>
      <title>Superdirectivity as a Spectral-Collision RKHS Limit</title>
      <link>https://arxiv.org/abs/2606.08174</link>
      <description>arXiv:2606.08174v1 Announce Type: new 
Abstract: We develop a reproducing-kernel Hilbert space interpretation of array superdirectivity based on spectral-collision limits and polynomial jet geometry. As the spacing of an $M$-element linear array tends to zero, the exponential family generated by a linear array undergoes a spectral collision, and the associated finite-dimensional subspaces converge in reproducing kernel to a polynomial jet space. Array gain equals the diagonal evaluation of the reproducing kernel, and the $M^2$ endfire law emerges from endpoint asymptotics of the Christoffel-Darboux kernel. Unlike classical derivations that rely on near-singular optimization, the present approach separates array gain limits from numerical conditioning, and identifies superdirectivity as a geometric boundary concentration phenomenon: Christoffel function collapse at the hard edge is a factor of $M$ faster than in the interior. The quadratic scaling is tied specifically to the flat $L^2([-1,1])$ geometry; alternative RKHS geometries admit different concentration scalings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08174v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hong Yang</dc:creator>
    </item>
    <item>
      <title>Quaternion Maximum-Volume Submatrix Selection with Applications to Multichannel Imaging and Visual Data</title>
      <link>https://arxiv.org/abs/2606.08175</link>
      <description>arXiv:2606.08175v1 Announce Type: new 
Abstract: Low-rank approximation based on selected rows and columns is a useful alternative to singular value decompositions when the goal is an interpretable and compact matrix representation. A standard way to choose these rows and columns is the maximum-volume principle: it selects submatrices with large volume, which usually leads to stable interpolation coefficients and accurate CUR-type approximations. In this paper, we study this idea for quaternion matrices. This setting is natural for color images, three-dimensional motion data, and multi-channel signals, but requires care because quaternion multiplication is noncommutative. We define quaternion maximum-volume submatrix selection using quaternion singular values and the Study determinant. We then derive quaternion rank-one update formulas and use them to build two selection procedures: a greedy square-core method for row and column replacement, and a rectangular method that enlarges a selected row set until the interpolation coefficients are controlled. We prove that successful row and column swaps increase the quaternion volume of the selected square core when the exact quaternion inverse is used. We also connect the stopping criterion with quasi-dominance, prove an exact quaternion CUR identity in the full-rank case, and derive an interpolation stability bound. For the rectangular case, we derive an append-row pseudoinverse update and show how it gives a natural right preconditioner for overdetermined quaternion least-squares problems. Finally, we illustrate the methods on three applications: quaternion CUR approximation of RGB images, RectMaxVol-based preconditioning for ill-conditioned quaternion least-squares systems, and row selection in quaternion motion-capture data. The experiments show that the proposed quaternion MaxVol and RectMaxVol methods provide stable and efficient selection routines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08175v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Vsevolod Kliushev, Junjun Pan, Valentin Leplat</dc:creator>
    </item>
    <item>
      <title>Differentially Private Range Subgraph Counting</title>
      <link>https://arxiv.org/abs/2606.08179</link>
      <description>arXiv:2606.08179v1 Announce Type: new 
Abstract: Subgraph counting is a fundamental problem in graph analysis. Motivated by practical scenarios where graph analytics are performed on subgraphs induced by selected vertices -- rather than on the entire graph -- and by growing privacy concerns, we initiate the study of differentially private range subgraph counting (DPRSC). The goal is to privately count occurrences of a fixed pattern graph within induced subgraphs defined by multi-dimensional attribute ranges. Unlike classical point counting, subgraph counting is inherently nonlinear and exhibits high sensitivity: a single edge modification can affect many subgraph occurrences. We present the first efficient algorithms for DPRSC with small additive error. Our approach introduces a subgraph projection that reduces DPRSC to weighted orthogonal range counting, enabling the use of range trees and local sensitivity estimation to achieve accurate private query answering. We complement our algorithms with matching lower bounds, obtained by reducing reconstruction attacks to DPRSC and leveraging discrepancy theory. In particular, we show that any differentially private algorithm for DPRSC must incur additive error exponential in the dimension. Empirical evaluations demonstrate that our algorithms significantly outperform baseline methods in accuracy and runtime while maintaining strong privacy guarantees.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08179v1</guid>
      <category>cs.DS</category>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xian Chen, Ruobing Bai, Pan Peng</dc:creator>
    </item>
    <item>
      <title>TextEconomizer: Enhancing Lossy Text Compression with Denoising Transformers and Entropy Coding</title>
      <link>https://arxiv.org/abs/2606.08184</link>
      <description>arXiv:2606.08184v1 Announce Type: new 
Abstract: Lossy text compression reduces data size while preserving core meaning, making it well-suited for summarization, automated analysis, and digital archives. Despite the dominance of transformer-based models in language modeling, integrating context vectors and entropy coding into Sequence-to-Sequence (Seq2Seq) generation remains underexplored. A key challenge lies in identifying the most informative context vectors from encoder output and incorporating entropy coding to enhance storage efficiency while maintaining high-quality outputs, even under noisy text. We introduce TextEconomizer, an encoder-decoder framework paired with a transformer neural network that reduces variable-sized inputs by 50% to 80% without prior knowledge of dataset dimensions. Our model achieves competitive compression ratios via entropy coding while delivering near-perfect text quality, assessed by BLEU, ROUGE, METEOR, and semantic similarity scores. TextEconomizer operates with approximately 153x fewer parameters than comparable models, achieving a 5.39x compression ratio without sacrificing semantic quality. We also evaluate an LSTM-based autoencoder achieving a state-of-the-art 67x compression ratio with 196x fewer parameters, and LLaMAFormer, a modified transformer with 263x fewer parameters than ICAE while maintaining competitive text quality. TextEconomizer significantly surpasses existing transformer-based models in balancing memory efficiency and high-fidelity outputs, marking a breakthrough in lossy compression with optimal space utilization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08184v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1016/j.neunet.2026.109111</arxiv:DOI>
      <arxiv:journal_reference>Neural Networks, Vol. 203, 109111, 2026</arxiv:journal_reference>
      <dc:creator>Mahbub E Sobhani, Anika Tasnim Rodela, Chowdhury Mofizur Rahman, Dewan Md. Farid, Swakkhar Shatabda</dc:creator>
    </item>
    <item>
      <title>Propeller-Assisted Robust 3D Hopping Robot with Hierarchical Force Allocation</title>
      <link>https://arxiv.org/abs/2606.08186</link>
      <description>arXiv:2606.08186v1 Announce Type: new 
Abstract: Monopedal hopping robots are conceptually simple but highly dynamic and inherently unstable. Achieving robust 3D hopping is still difficult because ground reaction forces are available only during the short stance phase, while the robot is underactuated in flight. A key unresolved issue is how to improve flight-phase control authority. Propeller assistance provides a promising solution, but it requires careful coordination of leg-generated contact forces and propeller thrusts across stance and flight. This paper presents Pro-OMEGA2, a propeller-assisted 3D monopedal hopping robot with an active 3-RSR parallel leg and a trunk-mounted tri-rotor for auxiliary attitude regulation. To address the force coordination challenge, we propose a Hierarchical Force Allocation (HFA) framework based on a single rigid body (SRB) model. The leg generates the main stance contact wrench, while the tri-rotor provides auxiliary attitude regulation, compensating the residual attitude moment in stance and maintaining attitude during flight. Real-robot experiments in indoor and outdoor scenarios demonstrate sustained 3D hopping, including terrain transitions and impulsive push recovery, validating robustness under unmodeled contact and external disturbances.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08186v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chuhan Zhang, Hongbo Zhang, Yanlin Chen, Yunxi Tang, Yun-Hui Liu, Mingyi Liu, Xiangyu Chu</dc:creator>
    </item>
    <item>
      <title>Frequency-Domain Latent Attention Gating for Cross-Domain Token Aggregation</title>
      <link>https://arxiv.org/abs/2606.08191</link>
      <description>arXiv:2606.08191v1 Announce Type: new 
Abstract: Token aggregation is a common bottleneck in models that map token representations to sample-level predictions, yet most pooling methods operate only in the original token domain. We propose FLaG, a plug-in aggregation module that transforms token representations with the real FFT, summarizes spectral components with learnable latent queries, applies a channel-wise gate, and reconstructs enhanced time-domain tokens for final pooling. We evaluate FLaG on antimicrobial peptide (AMP) activity prediction with ESM2, image classification with ResNet18 on CIFAR-10 and CIFAR-100, and text classification with RoBERTa on IMDB and GLUE. FLaG achieves its clearest gains on the ESM2-8M antimicrobial peptide tasks and on CIFAR-100, while remaining competitive with strong text baselines on IMDB and GLUE. Then we probe its behavior on the AMP setting with band knockouts, gate summaries, residue perturbations, latent-query readouts, and structure-proxy stratification. We find that low-frequency bands contribute the most overall, and the remaining higher-band pattern is more sample-specific. The gate acts as a broadly shared spectral reweighting stage and the cross-attention patterns are sample-specific with mild query-wise differentiation, and higher-helix peptides exhibit stronger average spectral sensitivity in both bacteria. The supplementary materials, source code and data are released at https://www.healthinformaticslab.org/supp/ and https://github.com/Kewei2023/AMPCliff/tree/FLaG.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08191v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>q-bio.QM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Kewei Li, Rongying Zhang, Xueli Wang, Xiwen Gong, Zhongjian Wang, Lan Huang, Ruochi Zhang, Fengfeng Zhou</dc:creator>
    </item>
    <item>
      <title>GlobeAudio: A Multilingual Multicultural Benchmark for Naturalistic Evaluation of Large Audio-Language Models</title>
      <link>https://arxiv.org/abs/2606.08194</link>
      <description>arXiv:2606.08194v1 Announce Type: new 
Abstract: Large Audio-Language Models (LALMs) integrate audio perception and language understanding within a unified framework, enabling a wide range of real-world applications. Despite recent advances, evaluation for LALMs remains heavily underspecified relative to real-world requirements: most benchmarks lack true linguistic and cultural authenticity, while others fail to capture acoustic realism. To bridge this gap, we propose GlobeAudio, a multilingual and multicultural benchmark designed to evaluate naturalistic audio understanding. GlobeAudio consists of 5,637 multiple-choice questions across six typologically diverse languages, expertly crafted by native speakers and grounded in naturally occurring audio. To do well, models must possess higher-level auditory reasoning skills and culturally grounded interpretation. We systematically evaluate representative closed-source and open-source LALMs, as well as cascaded ASR-LLM pipelines. Our experiments reveal substantial performance gaps under natural acoustic conditions, particularly for open-source models and low-resource languages. These findings highlight critical limitations of current LALMs and underscore the importance of naturalistic audio evaluation for future audio-language systems. GlobeAudio can be found at https://huggingface.co/datasets/iNLP-Lab/GlobeAudio .</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08194v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ryner Tan, Wenxuan Zhang</dc:creator>
    </item>
    <item>
      <title>AlignFed: Alignment-Aware Asynchronous Federated Fine-Tuning for Large Language Models in Heterogeneous Edge Environments</title>
      <link>https://arxiv.org/abs/2606.08197</link>
      <description>arXiv:2606.08197v1 Announce Type: new 
Abstract: Large Language Models (LLMs) have significantly propelled the advancement of edge intelligence and have been widely deployed across various scenarios, including autonomous driving, industrial inspection, and personalized IoT services. However, the collaborative adaptation of LLMs on edge devices continues to face formidable challenges due to strict data privacy constraints, highly heterogeneous computing and communication resources, and the non-independent and identically distributed (non-IID) nature of local data. Federated Fine-Tuning (FFT) enables the collaborative optimization of distributed models without exposing raw data. Yet, traditional synchronous aggregation suffers from a severe straggler effect, resulting in high system latency and low resource utilization. Existing asynchronous federated learning methods are predominantly designed for small-to-medium-scale models and struggle to address the specific challenges inherent in LLM fine-tuning, namely model drift caused by stale updates, aggravated client drift stemming from data heterogeneity, and aggregation fairness imbalance resulting from the dominance of fast clients. To address these issues, this paper proposes AlignFed, an asynchronous federated fine-tuning framework for LLMs tailored to heterogeneous edge environments. AlignFed employs a lightweight multi-stage semantic alignment mechanism comprising three core modules: version-aware update grouping, cross-version semantic alignment based on a mini-batch calibration set, and fairness-aware aggregation that integrates both update freshness and client participation frequency. This framework effectively mitigates cross-version model drift and client drift while enhancing aggregation fairness, thereby achieving stable and efficient asynchronous federated optimization in scenarios characterized by high heterogeneity and significant update staleness.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08197v1</guid>
      <category>cs.CL</category>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yan Wang, Ziyi Gao, Rui Wang</dc:creator>
    </item>
    <item>
      <title>Exploring Above-neck Unimanual Swipe Gestures for Off-Device Earable Interaction</title>
      <link>https://arxiv.org/abs/2606.08198</link>
      <description>arXiv:2606.08198v1 Announce Type: new 
Abstract: Despite their growing popularity, in-ear Earable / Hearable devices (i.e., ear-mounted wearables) face interaction challenges due to limited input space and compact form factors. To enhance interaction capabilities, researchers are exploring off-device hand-based input spaces above the neck using midair and onskin gestures. However, existing literature primarily focuses on axial swipes (i.e., horizontal and vertical), leaving nonaxial swipes (i.e., unidirectional swipes with varied orientations) and angular swipes (e.g., L, U, or V) largely underexplored despite their potential interaction advantages. To address this gap, we conducted a within-subject gesture motion analysis study with 24 participants, analyzing 5,568 swipes of varying shape, orientation, and complexity. Our results revealed preferred starting and ending regions for different unidirectional and angular swipe shapes, as well as intuitive swipe shapes within the off-device, above-neck manual interaction space. We further examine off-device swipe characteristics, discuss the feasibility of recognizing these earable gestures with current sensing technologies, and highlight their potential application in various scenarios. These findings broaden the understanding of off-device earable gestures and provide design insights for integrating suitable nonaxial and angular swipes alongside traditional axial gestures to enhance interaction with in-ear earable devices.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08198v1</guid>
      <category>cs.HC</category>
      <category>cs.ET</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shaikh Shawon Arefin Shimon, Ali Neshati, Junwei Sun, Qiang Xu, Jian Zhao</dc:creator>
    </item>
    <item>
      <title>Online Agent-as-a-Judge: Situation-Generating Evaluation for Interactive Agents</title>
      <link>https://arxiv.org/abs/2606.08200</link>
      <description>arXiv:2606.08200v1 Announce Type: new 
Abstract: Evaluating LLM-powered interactive social agents is challenging because socially relevant behaviors depend not only on isolated outputs, but also on prior interactions, social roles, and downstream actions. Existing methods typically allow a target agent to act freely in an environment and then score the resulting trajectory. However, this passive setup can miss capabilities that only become observable under specific social circumstances; for example, conflict handling may remain untested if no disagreement arises. We propose Online Agent-as-a-Judge, a situation-generating evaluation framework for interactive social agents. Online Agent-as-a-Judge deploys an in-world evaluator agent that interacts with the target agent through the environment's native dialogue and action protocol, actively eliciting situations relevant to the evaluation criteria. The resulting trajectories provide evidence for assessing both immediate responses and subsequent behavior. In a life-simulation environment with $32$ designer-authored social criteria, Online Agent-as-a-Judge improves criteria coverage and agreement with human labels, yielding more reliable evidence-grounded evaluations of behaviors that passive methods can leave unobserved.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08200v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hyogon Ryu, Jeonghwan Kim, Yewon Lim, Chaeun Lee, Jeongwook Kim, Donghoon Ham</dc:creator>
    </item>
    <item>
      <title>Stable and Scalable Probabilistic Numerical Solvers for Stiff and High-Dimensional ODEs</title>
      <link>https://arxiv.org/abs/2606.08203</link>
      <description>arXiv:2606.08203v1 Announce Type: new 
Abstract: Filtering-based probabilistic numerical solvers for ordinary differential equations (ODEs) have been established as a flexible and efficient simulation framework with built-in numerical uncertainty quantification. However, problems that are both stiff and high-dimensional remain a challenge, as current methods are either stable and have cubic cost in the ODE dimension, or scale linearly at the expense of stability. In this paper, we close this gap and develop probabilistic ODE solvers that are both stable and scalable. We propose two complementary strategies. First, we develop a matrix-free update step that uses Jacobian-vector products, iterative linear solvers, and stochastic covariance estimation to enable linear scaling, all while retaining stability. Second, we propose iterative re-linearization to further improve stability without sacrificing scalability, turning probabilistic ODE solvers into fully implicit methods. We evaluate the proposed approaches on a range of stiff and high-dimensional problems and demonstrate improved stability and scalability over established probabilistic solvers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08203v1</guid>
      <category>math.NA</category>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Nathanael Bosch</dc:creator>
    </item>
    <item>
      <title>Neural Field Tokenizations with Hierarchy and Spatial Locality Priors</title>
      <link>https://arxiv.org/abs/2606.08204</link>
      <description>arXiv:2606.08204v1 Announce Type: new 
Abstract: Neural fields parameterize data as functions from coordinates to values, providing a unified framework for representation learning across modalities. Existing approaches are dominated by per-sample meta-learning, which scales poorly due to memory-intensive inner-loop optimization. The natural alternative -- feed-forward encoding -- typically introduces modality-specific assumptions, sacrificing the generality that makes learning with neural fields attractive. We argue that locality and hierarchy are useful priors for learning field representations that can be injected without compromising modality-agnosticism. We propose LH-NeF, a framework to learn general-purpose tokenized representations of continuous signals. A locality-preserving hierarchical encoder maps raw coordinate-value field observations to structured tokens, from which the field is reconstructed during training. By replacing meta-learning's inner loop with a single forward pass, LH-NeF uses 42$\times$ less memory and supports 133$\times$ larger batches than the strongest modality-agnostic baseline. Across images, 3D shapes, and climate fields, our learned representations match or exceed performance of modality-agnostic, modality-specific, and specialized generative neural field baselines on both reconstruction and downstream tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08204v1</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alonso Urbano, David W. Romero, Max Zimmer, Sebastian Pokutta</dc:creator>
    </item>
    <item>
      <title>Empowering Feed-Forward Reconstruction Models with Metric Scale via Satellite Images</title>
      <link>https://arxiv.org/abs/2606.08205</link>
      <description>arXiv:2606.08205v1 Announce Type: new 
Abstract: Feed-forward 3D reconstruction models have recently shown strong generalization across diverse scenes, yet most of them recover geometry only up to an unknown global scale. This scale ambiguity limits their use in applications that require metric understanding of the environment. Existing metric reconstruction methods commonly rely on large-scale metric annotations or accurate camera calibration, both of which are costly or unreliable in many real-world settings. We propose a satellite-guided framework for resolving scale ambiguity in feed-forward 3D reconstruction. The key idea is to use readily available satellite imagery as a global metric reference. Given a coarse camera pose, our method retrieves a local satellite patch and integrates it with a feed-forward reconstruction backbone through bidirectional cross-view interaction. By enforcing consistency between the reconstructed scene and the satellite reference, the model infers absolute scale, refines scene geometry, and estimates camera pose in a metric coordinate frame. Experiments on KITTI, nuScenes, and Oxford RobotCar show consistent improvements in metric depth estimation, multi-view point-cloud reconstruction, and cross-view camera localization, while preserving strong generalization across datasets and geographic regions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08205v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xianghui Ze, Yongjian Luo, Mengjun Chao, Zhenbo Song, Jianfeng Lu, Yujiao Shi</dc:creator>
    </item>
    <item>
      <title>SegmentAnyTreeV2: Scaling Transformer-Based Tree Instance Segmentation Across Sensors, Platforms, and Forests</title>
      <link>https://arxiv.org/abs/2606.08206</link>
      <description>arXiv:2606.08206v1 Announce Type: new 
Abstract: We present SegmentAnyTreeV2, a sensor- and platform-agnostic framework for semantic and instance segmentation of forest point clouds. The model combines a serialization-based Point Transformer v3 backbone with a lightweight semantic head and a tree-focused cross-attention mask decoder. Semantic predictions restrict instance decoding to tree-class voxels, while instance-aware query initialization, one-to-many seed supervision, and asymmetric mask scoring improve separation in dense and structurally complex stands. We further introduce FOR-instance v3, an expanded benchmark comprising 427 scenes and 26,496 annotated trees across diverse biomes, forest structures, and LiDAR platforms. On the FOR-instanceV2 test split, SegmentAnyTreeV2 achieves 90.5% precision, 80.2% recall, 85.0% F1, 90.7% coverage, and 87.6% semantic mIoU, outperforming previous learning-based methods in both instance detection and mask completeness. Zero-shot evaluation on independent sites further demonstrates strong cross-domain generalization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08206v1</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Maciej Wielgosz, Stefano Puliti, Rasmus Astrup</dc:creator>
    </item>
    <item>
      <title>Risk-Aware Control of Systems with Quasi-Cone-Bounded Nonlinearities</title>
      <link>https://arxiv.org/abs/2606.08208</link>
      <description>arXiv:2606.08208v1 Announce Type: new 
Abstract: We develop a tractable, rigorous approach to risk-aware control for a class of nonlinear systems. While many classical control methods reduce uncertainty to a simple average or a worst-case outcome, risk-aware control aims to equip systems with a refined awareness of uncertainty. Efficient methods for risk-aware control of linear systems are available, but there is a paucity of tools for tractable, risk-aware control of nonlinear systems. To bridge this gap, we develop an analytical, suboptimal controller with respect to a risk-aware performance criterion for systems with nonlinearities characterized by cone-like bounds. Numerical examples demonstrate benefits of the characterization of nonlinearities and risk that we consider.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08208v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dhairya Patel, Margaret P. Chapman</dc:creator>
    </item>
    <item>
      <title>LPOR: A Layered Proof of Reserves Framework for Usable and Publicly Auditable Solvency Verification</title>
      <link>https://arxiv.org/abs/2606.08211</link>
      <description>arXiv:2606.08211v1 Announce Type: new 
Abstract: Proof of Reserves (PoR) enables centralized crypto exchanges to demonstrate that on-chain reserves are sufficient to cover customer liabilities. However, existing approaches, including Merkle-tree-based proofs and zero-knowledge PoR systems, remain difficult for everyday users to verify in practice, resulting in limited participation and weakened transparency. We introduce LPOR, a layered, usability-focused PoR framework that separates lightweight user-side checks from auditor-level cryptographic verification, enabling non-technical users to verify inclusion and publicly recompute total liabilities with minimal friction. By lowering verification barriers, LPOR increases user participation and substantially improves the probability of detecting omitted liabilities. We evaluate its scalability and omission detectability at a multi-million-user scale.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08211v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Donggoo Kim, Rajesh Upadhayaya, Milosz Bator, Tao Le</dc:creator>
    </item>
    <item>
      <title>Public Machine Learning Solver Framework for Novices in the Machine Learning Domain</title>
      <link>https://arxiv.org/abs/2606.08212</link>
      <description>arXiv:2606.08212v1 Announce Type: new 
Abstract: Solving machine learning problems is complex and typically reserved for experts. Over the past two decades, systems have emerged to support non-experts. Based on our review, we identify three categories: (1) fully automated AutoML systems, (2) expert cheat sheets for algorithm selection, and (3) decision-support systems using selection criteria (accuracy, transparency, data requirements). We propose a new platform combining categories 2 and 3 to deliver semi-automated, intelligent solution recommendations for non-experts. Unlike existing approaches that recommend a single algorithm, our platform suggests a complete pipeline tailored to the user's problem. It integrates expert-defined selection criteria with transfer learning and automatically extracts data characteristics (e.g., class imbalance, missing values) from user-provided datasets. The platform uses first-order logic to reason over its knowledge base and recommends suitable algorithms ranked by relevance. It features a user-friendly interface and connects to a crowdsourcing platform for ML experts, ensuring continuous updates. The platform is built incrementally, allowing seamless integration of new algorithms, criteria, and domain knowledge. To our knowledge, this is the first free, publicly accessible online framework that systematically captures and operationalizes expert knowledge to guide non-experts in solving ML problems in a structured, transparent manner.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08212v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lokman Saleh, Hafedh Mili, Mounir Boukadoum</dc:creator>
    </item>
    <item>
      <title>Agentic Neuro-Symbolic Planning and Commissioning for Human-in-the-Loop Industrial Robotics with Digital Twins</title>
      <link>https://arxiv.org/abs/2606.08214</link>
      <description>arXiv:2606.08214v1 Announce Type: new 
Abstract: Flexible robotic automation requires systems that interpret operator intent, verify physical feasibility, and recover from execution failures across both the planning and execution stages. This paper proposes an agentic neuro-symbolic framework for human-in-the-loop industrial robotics, in which LLMs are used for tasks that require language understanding or contextual reasoning, while all verification, sequencing, and execution remain deterministic. The framework adapts the Planner-Generator-Evaluator (PGE) harness pattern from software engineering into a Specifier-Designer-Inspector (SDI) architecture for industrial robotics, combined with LangGraph-based dynamic routing for failure recovery. A two-tier recovery mechanism addresses structure-level replanning through context-aware orchestration and execution-level geometric failures through deterministic recovery skills. A Unity3D digital twin supports human inspection, modification, and re-verification prior to physical execution. Evaluated on natural-language commands across multiple difficulty levels against ten baselines, the proposed method achieves the highest task success. Ablation results confirm that structured command expansion, symbolic verification, selective LLM routing, and recovery skills are each individually necessary.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08214v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Zhihao Liu, Victor Nan Fernandez-Ayala, Tianyu Wang, Qiang Qin, Xi Vincent Wang, Dimos V. Dimarogonas, Lihui Wang</dc:creator>
    </item>
    <item>
      <title>Revisiting Diameter in Directed Graphs</title>
      <link>https://arxiv.org/abs/2606.08217</link>
      <description>arXiv:2606.08217v1 Announce Type: new 
Abstract: The reachability diameter ($\mathrm{ReachDiam}$) of a directed graph is the maximum distance over all pairs $u,v$ where $v$ is reachable from $u$. This notion is present in the definition of shortcut sets, and the name was recently coined in that context by Haeupler, Jiang, and Saranurak [SOSA 2026]. While this is a very natural notion of diameter in directed graphs, and especially DAGs, it is so far not computationally explored. Other definitions of diameter in directed graphs are either trivial (infinite) in graphs that are not strongly connected (e.g., the classical definition) or are non-trivial only in highly restrictive graph classes (e.g., Min-Diameter).
  We initiate the problem of computing the (approximate) reachability diameter from a fine-grained complexity point of view. Under certain fine-grained assumptions, we prove that there is no algorithm in time $\mathcal{O}(n^{\omega - \varepsilon})$ that gives any approximation of $\mathrm{ReachDiam}$ in weighted graphs. Similarly, there is no algorithm with better than $2$-approximation for unweighted graphs in this time. To supplement this, we provide algorithmic upper bounds that lead to additive approximation of $\mathrm{ReachDiam}$ for unweighted graphs. Hence, we establish a strong separation between the weighted and unweighted cases, which makes this type of diameter different in nature than other known notions.
  Considering the hardness in general weighted graphs, we also study special graph classes and get small constant approximations for DAGs with bounded width or graphs with bounded treewidth. Interestingly, our techniques also lead to exact hopsets with hopbound $2$ for bounded treewidth graphs. This and some of our upper bounds for general graphs show technical connections between approximating $\mathrm{ReachDiam}$ and computing shortcut sets and hopsets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08217v1</guid>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ben Bals, Joakim Blikstad, Daniel Dadush, Yasamin Nazari, Jonas Schmidt</dc:creator>
    </item>
    <item>
      <title>How Deep Are Deep GPs, Really? A Sharp Threshold and a Non-Gaussian Limit for Compositional GPs</title>
      <link>https://arxiv.org/abs/2606.08218</link>
      <description>arXiv:2606.08218v1 Announce Type: new 
Abstract: Compositional priors describe the generic properties of layered functions in deep Bayesian models, where deep neural networks with random weights are a canonical example. In the wide-network limit, the prior is a Gaussian process with a depth-dependent kernel, and its behaviour as depth grows has been extensively studied through this kernel. Here, we study another case, where each layer itself is a vector-valued Gaussian process, and our aim is similarly to understand the limiting behaviour of the prior as depth grows.
  Previous GP work has established that for the RBF kernel and a certain range of bandwidths $r$, the prior degenerates in the limit, converging to the set of constant functions -- which is not useful as a probabilistic model. In this paper we establish several new results. First, we identify a sharp bandwidth threshold $r_c(d) = \Theta(\sqrt{d})$ above which the limit is degenerate, strengthening the earlier bounds. Second, and more importantly, we show that for $r$ below the threshold $r_c(d)$ the prior converges to a limit distribution $\pi_{\bar{Z}}$. We also prove that these distributions are non-degenerate and non-Gaussian, with non-vanishing dependence between coordinates. In contrast to the previously known degenerate regime, deep Gaussian process priors can therefore admit non-trivial limits.
  Empirically, we verify the threshold across a range of dimensions $d$, and demonstrate a complex multimodal behaviour of the limit distributions $\pi_{\bar{Z}}$ -- a regime that becomes increasingly narrow with $d$ and would be hard to identify without knowing the threshold.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08218v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>math.ST</category>
      <category>stat.ML</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mark Kozdoba, Shie Mannor</dc:creator>
    </item>
    <item>
      <title>De novo molecular generation with optical property preconditioning at the token level</title>
      <link>https://arxiv.org/abs/2606.08221</link>
      <description>arXiv:2606.08221v1 Announce Type: new 
Abstract: Designing OLED molecules with targeted optical properties remains challenging due to the scarcity of high-quality data and the limited reliability of conditional control in generative models across chemical motifs. Here, we benchmark a token-conditioned autoregressive language model for OLED molecular generation in a realistic low-data regime. A GPT2 model is pretrained on large chemical corpora, augmented with discrete property tokens, and fine-tuned using multi-task optimisation. Conditioning targets vertical absorption energy and oscillator strength, with the HOMO-LUMO gap included as an auxiliary electronic descriptor.
  Generated molecules are evaluated at the TDDFT level to assess distributional fidelity and controllability. The generated library reproduces the dominant optical-property support of the training distribution while shifting towards lower molecular weight and fewer heavy atoms. Token-level control is consistently directional across conditioning bins, but is not fully orthogonal and exhibits local calibration irregularities. A chemotype-resolved analysis further shows that controllability depends strongly on local electronic environments: moderately conjugated aromatic-carbon motifs are associated with improved joint target satisfaction, whereas electron-withdrawing motifs, particularly aryl nitriles, show systematic red-shifting and reduced controllability.
  These results establish a quantitative benchmark for conditional OLED molecular generation and show that model reliability must be assessed in chemically meaningful subspaces rather than from aggregate property distributions alone.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08221v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haozhe Huang, Manuel Gonzalez Lastre, Hyun Suk Park, Jorge A. Campos-Gonzalez-Angulo, Xinjian Liu, Alán Aspuru-Guzik</dc:creator>
    </item>
    <item>
      <title>Test-Time Scaling in Multimodal Foundation Models: A Comprehensive Survey of Generation and Reasoning</title>
      <link>https://arxiv.org/abs/2606.08231</link>
      <description>arXiv:2606.08231v1 Announce Type: new 
Abstract: Test-time Scaling (TTS) has emerged as a pivotal research direction for enhancing model performance by dynamically allocating computational resources during inference. Recent advancements have adapted this paradigm to Multimodal Foundation Models (MFMs), unlocking their potential in multimodal reasoning and generation. Despite rapid progress, the field lacks a systematic survey and unified theoretical framework to delineate the developmental landscape of multimodal TTS. To bridge this gap, we present the first comprehensive review of TTS research for MFMs, proposing a unified taxonomic framework that categorizes existing methodologies into three distinct strategies: sampling-based, feedback-based, and search-based approaches. We further summarize representative applications and benchmarks commonly utilized to evaluate multimodal TTS capabilities in generation and reasoning tasks. Finally, this survey discusses open challenges and outlines future research directions, providing a systematic roadmap for subsequent studies in this rapidly evolving field.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08231v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Cong Wan, Ying He, Zhongzhan Huang, Hefeng Wu</dc:creator>
    </item>
    <item>
      <title>SciTrace: Trajectory-Aware Safety Reasoning for Scientific Discovery Agents</title>
      <link>https://arxiv.org/abs/2606.08234</link>
      <description>arXiv:2606.08234v1 Announce Type: new 
Abstract: LLM-based scientific agents have shown strong capacity for autonomous research, yet their safety layers remain structurally divorced from core reasoning: they inspect pipeline outputs rather than shaping the deliberation that produces them. This separation opens two failure modes: safety signals accumulated at one stage are discarded before the next, and sequences of individually benign tool calls can compose into harmful outcomes that no single-step filter detects. To address these challenges, we introduce \textbf{SciTrace}, a framework that weaves safety reasoning into every stage of the scientific agent pipeline. SciTrace couples two complementary mechanisms: a \textit{Safety-Intrinsic Reasoning Loop} (SIR) that maintains a cumulative risk state across the Thinker, Experimenter, Writer, and Reviewer stages through joint task-and-safety deliberation, and a \textit{Compositional Tool-Chain Verifier} (CTV) that performs trajectory-aware safety checks before execution, catching risks that surface only across multi-step tool sequences. Evaluated on 240 high-risk research tasks and 120 tool-related risk tasks spanning six scientific domains, SciTrace achieves state-of-the-art (\textbf{SOTA}) safety among compared frameworks across four backbone models: it consistently improves tool call safety and adversarial robustness while preserving scientific output quality, and it uncovers \textbf{78.8\%} of the compositional tool-chain escapes that single-step monitors miss. The project website is available at https://opensciagent.github.io/SciTrace/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08234v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tanush Swaminathan, Runmin Jiang, Letian Zhang, Min Xu</dc:creator>
    </item>
    <item>
      <title>Shared Semantics, Divergent Mechanisms: Unsupervised Feature Discovery by Aligning Semantics and Mechanisms</title>
      <link>https://arxiv.org/abs/2606.08236</link>
      <description>arXiv:2606.08236v1 Announce Type: new 
Abstract: As large language models are increasingly deployed in high-stakes settings, there is a growing need for tools that audit not only model outputs but also the internal computations that produce them. Circuit analysis is a central approach in mechanistic interpretability, but it is typically target-conditioned, explaining a single prompt paired with a chosen completion. This target-conditioned setup can obscure heterogeneity across a model's continuation distribution. We introduce distribution-level unsupervised feature discovery, which clusters sampled continuations using both semantic content and sequence-level mechanistic attributions, without manually specifying target outputs. Our method represents each continuation with a semantic embedding and a prefix-to-continuation attribution signature, then optimizes a rate-distortion objective that trades off semantic coherence, mechanistic consistency, and cluster granularity. Across clustering and steering analyses, the discovered clusters expose continuation modes that single-view baselines miss and provide interventional evidence that cluster signatures correspond to actionable mechanistic factors. Overall, our approach complements circuit analysis and behavioral evaluation by providing a scalable audit of the mechanisms underlying a model's continuation distribution.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08236v1</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>ICML 2026 Spotlight</arxiv:journal_reference>
      <dc:creator>Hyunjin Cho, Youngji Roh, Jaehyung Kim</dc:creator>
    </item>
    <item>
      <title>GPT-Micro: A large language paradigm for accelerated, inexpensive, and thermodynamics-consistent discovery of constitutive models in manufacturing</title>
      <link>https://arxiv.org/abs/2606.08238</link>
      <description>arXiv:2606.08238v1 Announce Type: new 
Abstract: Constitutive modeling of the relationship between process-imposed material states and fundamental material properties is critical to control of material microstructure in manufacturing processes. The limited accuracy resulting from the typical reliance on fallible human expertise and intuition for postulation and revision of the model's functional form results in incremental and time-consuming model discovery. Conventional Machine Learning (ML) incurs significant cost and time of data generation. Model discovery using Large Language Models (LLMs) suffers from the above issues and/or ignores the inviolability of fundamental thermodynamics laws. This work creates a novel GPT-Micro paradigm for autonomous, data-sparse, and thermodynamics-compliant discovery of de-novo constitutive models. This framework seamlessly integrates semantic knowledge extraction from literature, enforcement of thermodynamics-based conservation laws, and sparse datasets, with LLM-driven generation and refinement of model hypotheses. Validation is performed for a long-intractable constitutive modeling problem in a printed electronics process testbed. This reveals significant and simultaneous advantages over the state-of-the-art including: (a) More than 70 percent reduction in data burden relative to ML-based modeling without loss in accuracy; (b) 400X reduction in discovery time after data generation, from months to hours, relative to human-driven modeling; (c) Discovery of models with novel functional forms without subjective human choice of a starting hypothesis; (d) Enhanced physics-rooted trustworthiness, human interpretability, and mechanistic insight via synthesis of compact, conservation-compliant, and physically complete analytical models. The potential of GPT-Micro to realize rapid, low-cost, physically trustworthy, and interpretable microstructure modeling across the manufacturing landscape is discussed.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08238v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Soumik Dutta, Kiarash Naghavi Khanghah, Sania Shree, Logan McNeil, Thomas Feldhausen, Hongyi Xu, Rajiv Malhotra</dc:creator>
    </item>
    <item>
      <title>When No Answer Is Correct: Diagnosing Absent Answer Detection for MLLMs in Video Understanding</title>
      <link>https://arxiv.org/abs/2606.08239</link>
      <description>arXiv:2606.08239v1 Announce Type: new 
Abstract: Multimodal large language models (MLLMs) have made substantial advancements in video understanding, yet the reliability of their responses remains underexplored. This work presents a diagnostic study of absent answer detection for MLLMs in video understanding, where the correct answer is deliberately excluded from the candidate set and a reliable model is expected to recognize that no valid option exists. We evaluate the absent answer detection behavior under three settings: multiple-choice questions augmented with a ``None of the Above'' option, open-ended generation with a detection instruction, and standard evaluation without any guidance. Across a diverse set of models and benchmarks, we find that MLLMs overwhelmingly select plausible distractors rather than detecting the absent answer. This failure is more pronounced in temporal reasoning tasks and worsens with denser frame sampling. We further explore chain-of-thought prompting as a mitigation strategy and find that while it substantially improves detection rates, performance remains unsatisfactory, suggesting that prompting-based strategies alone are insufficient to fully address this limitation. These findings expose a systematic failure in absent answer detection and highlight the need for explicit detection mechanisms in multimodal systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08239v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yiheng Wang, Yueqian Lin, Lichen Zhu, Yudong Liu, Hai "Helen" Li, Yiran Chen</dc:creator>
    </item>
    <item>
      <title>Light-WAM: Efficient World Action Models with State-Fusion Action Decoding</title>
      <link>https://arxiv.org/abs/2606.08242</link>
      <description>arXiv:2606.08242v1 Announce Type: new 
Abstract: World Action Models (WAMs) extend robot policy learning by incorporating future prediction as an additional training objective, encouraging the policy to encode task-relevant temporal structure in its representations. Current WAMs often rely on large-scale generative architectures that incur high training costs and inference latency, making them difficult to deploy as efficient closed-loop policies. We propose Light-WAM, a lightweight World Action Model for efficient robot manipulation. Specifically, it is built with a compact video backbone and performs future-video supervision in a downsampled latent space, reducing the cost of video co-training while retaining its benefits for representation learning. For action prediction, Light-WAM introduces the StateFusionActionExpert, which reads adapted states from multiple backbone layers, fuses them through learned-query pooling, and directly predicts action chunks in a single forward pass. This design provides an efficient interface between video backbone representations and robot actions, avoiding the need for heavy generative action experts. Experiments demonstrate that Light-WAM maintains strong performance on LIBERO and achieves usable multi-task performance on RoboTwin 2.0, while using only 0.44B trainable parameters. It also achieves 72.03ms inference latency with 4.1GiB peak GPU memory and improved training throughput.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08242v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ziang Li, Dongzhou Cheng, Yibin Wang, Shiyue Wang, Xiaoyang Xu, Lingxuan Weng, Juan Wang, Jiaqi Wang</dc:creator>
    </item>
    <item>
      <title>Building Comparative Motivation Profiles with Instrumental Interventions</title>
      <link>https://arxiv.org/abs/2606.08243</link>
      <description>arXiv:2606.08243v1 Announce Type: new 
Abstract: Safety evaluations often infer latent motivations from behavioral patterns, but the construct validity of these inferences is unclear. We study this problem in alignment faking, where models comply with training objectives more often when they infer training pressure. This behavior is commonly interpreted as strategic self-preservation, but it may also reflect sensitivity to the model's inference about the expectation of researchers conducting the evaluation. We introduce a symmetric intervention framework for distinguishing these competing hypotheses. Instead of directly intervening on "scheming" or "sycophancy", we target instrumental processes entailed by each hypothesis: consequence-tracking and researcher-expectation tracking. We then compare how interventions on these processes affect alignment faking. We study four open-weight model organisms using synthetic document fine-tuning, activation steering, and prompting. Under synthetic document fine-tuning, Llama-3.1-70B, Llama-3.1-405B, and Qwen-2.5-72B are more sensitive to expectation-tracking than consequence-tracking interventions. Activation steering on Llama-3.1-70B supports the same broad picture, and prompt interventions broadly align with SDF profiles. Overall, alignment-faking behavior can be causally sensitive to evaluation-context expectations despite scheming-consistent scratchpads. Scheming and strategic-deception evaluations therefore need construct-validity checks, and symmetric instrumental interventions provide one such test.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08243v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>David Vella Zarb, Rustem Turtayev, Taywon Min, Jinghua Ou, Shi Feng</dc:creator>
    </item>
    <item>
      <title>ZAS-SQL: Distilling Rules from Failures for Zero-Shot Text-to-SQL</title>
      <link>https://arxiv.org/abs/2606.08245</link>
      <description>arXiv:2606.08245v1 Announce Type: new 
Abstract: Text-to-SQL translates natural language into executable SQL queries. Few-shot in-context learning methods built upon large language models (LLMs) achieve strong performance, yet their reliance on demonstrations limits cross-domain generalization and consumes substantial context window space. Existing zero-shot methods, lacking effective generation constraints, still fall short of few-shot approaches. We observe that LLM failures in zero-shot Text-to-SQL are not random but exhibit systematic, recurring patterns. Building on this observation, we propose a fully zero-shot Text-to-SQL framework that distills core generation rules from failure cases through a Map-Reduce-based rule distillation pipeline and improves generation quality via three complementary modules: knowledge-augmented schema representation, which supplements missing semantics in Data Definition Language; a rule-driven structured reasoning framework that suppresses structural deviations; and Execution-Guided Early Stopping, which enables low-cost self-correction. On Spider, the proposed framework achieves up to 87.2% and 88.6% execution accuracy on the Dev and Test sets, respectively, establishing a new zero-shot state-of-the-art and surpassing multiple few-shot and fine-tuning methods built upon GPT-4/4o. On the domain-specific dataset UrbanPlan, it achieves 81.3%, confirming that the rule distillation approach generalizes across domains. Moreover, when equipped with a 4B-parameter model, the framework surpasses zero-shot baselines of leading closed-source models, demonstrating strong model generality.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08245v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hongzhou Zheng, Yixin Gou, Wenjia Zhang</dc:creator>
    </item>
    <item>
      <title>Disturbance-Aware Aerial Robotics for Ethical Wildlife Monitoring</title>
      <link>https://arxiv.org/abs/2606.08249</link>
      <description>arXiv:2606.08249v1 Announce Type: new 
Abstract: Reliable wildlife monitoring is essential for ecology and conservation, yet many existing methods, such as tagging, capture, and close-range observation, can alter the very behaviors they aim to measure. Aerial robots offer a scalable alternative, which has shown promising performance in multiple studies. Nonetheless, existing approaches typically lack behavioral awareness, rely on fixed heuristics, or require real-world training data that are costly, impractical, and ethically difficult to obtain. As a result, there remains no general framework for adaptive drone-based monitoring that can both preserve ecological validity and scale across species, behaviors, and robotic platforms. In this study, we introduce a disturbance-aware reinforcement-learning-based framework for heterogeneous aerial robotic fleets that enables autonomous wildlife tracking while explicitly minimizing behavioral disruption. We couple a zoologically grounded simulation environment with fitted animal movement models derived from real trajectory statistics, and train control policies using a reward formulation that captures the trade-off between observation quality and disturbance risk. Across three species (pigeon, jackal, and spur-winged lapwing) with distinct ecologies and motion patterns and four increasingly strategic behavior models common in nature, the learned policies consistently surpassed currently used rule-based baselines and generalized across monitoring tasks, animal dynamics, and drone types. These results establish disturbance-aware learning as a viable foundation for non-invasive autonomous wildlife observation, opening a path towards scalable, ethically responsible, and scientifically reliable robotic monitoring in ecology and conservation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08249v1</guid>
      <category>cs.RO</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Mahmut Osmanovic, Isac Paulsson, Teddy Lazebnik</dc:creator>
    </item>
    <item>
      <title>Contemporary AI lacks the imagination to diverge or negate in science</title>
      <link>https://arxiv.org/abs/2606.08251</link>
      <description>arXiv:2606.08251v1 Announce Type: new 
Abstract: Bold projections that artificial intelligence will accelerate scientific discovery have raced ahead of evidence from working scientists, and the field still lacks large-scale, scientist-in-the-loop tests of these claims. Here we mount the largest such evaluation to date and map what AI cannot yet do for science. We invited authors of 121,640 recent preprints across biology, medicine, chemistry, and the social sciences to judge follow-up ideas that large language models (LLMs) generated from the context and puzzles of their own papers. 6,749 scientists returned 25,139 sets of ratings on novelty, empirical feasibility, probability of being true, and favorability of adoption. Three patterns emerge. First, non-reasoning LLMs collapse into a narrow "hivemind" of similar ideas; reasoning models roam a wider hypothesis space, yet no model class spontaneously proposes null hypotheses -- a move humans make more freely. Second, scientists reward ideas that resemble their own and prize probability over novelty, though social scientists tolerate risk more readily than life scientists. Senior social scientists are the harshest critics, and their skepticism is well-earned: LLMs falter most in pluralistic fields like the social sciences that demand context-aware interpretation and evolving theories. Third, automated evaluators on which the community currently relies -- LLM-as-a-judge, artificial metrics, and even state-of-the-art (SOTA) models -- agree weakly with expert judgment, and retrieval augmentation and scientist persona prompting yield only marginal gains. A Qwen3-14B reward model we post-trained on human ratings captures field taste nuances, beats SOTA models by up to 27%, and closes the gap to the inter-rater consistency of independent peer reviewers. For all the hype, today's scientific AI still represents a collaborator whose imagination, outputs and judgment benefit from human grounding.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08251v1</guid>
      <category>cs.CY</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Honglin Bao, Siyang Wu, Xiao Liu, Sida Li, Shiyun Cao, James A. Evans</dc:creator>
    </item>
    <item>
      <title>Quantifying and Defending against the Privacy Risk in Logit-based Federated Learning</title>
      <link>https://arxiv.org/abs/2606.08252</link>
      <description>arXiv:2606.08252v1 Announce Type: new 
Abstract: Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among clients. Unlike traditional parameter-based FL methods that exchange model weights or gradients during training, emerging logit-based FL approaches share model outputs (logits) on public data. This strategy promotes model heterogeneity, reduces communication overhead, and enhances clients' privacy. However, the potential privacy risks associated with these logit-based methods have been largely overlooked. This research presents the first theoretical and empirical analysis of a hidden privacy risk in logit-based FL methods: the risk that a semi-honest server (adversary) may learn clients' private models from logits. To quantify and address this threat, we develop the Adaptive Model Stealing Attack (AdaMSA) by leveraging historical logits during training. Notably, we observe that this inherent privacy risk persists even when public data is unrelated to private data, emphasizing the urgency to address privacy vulnerabilities in logit-based FL methods. Moreover, our theoretical analysis establishes the bounds of this privacy risk. We then propose a simple but effective defense strategy that perturbs the transmitted logits in the direction that minimizes the privacy risk while maximally preserving the training performance. The experimental results validate our analysis and demonstrate the effectiveness of AdaMSA and our defense strategy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08252v1</guid>
      <category>cs.CR</category>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sheng Wan, Dashan Gao, Hanlin Gu, Lixin Fan, Daning Hu, Qiang Yang</dc:creator>
    </item>
    <item>
      <title>Mind Your Steps: A General Learning Framework for Accurate Humanoid Foothold Tracking</title>
      <link>https://arxiv.org/abs/2606.08253</link>
      <description>arXiv:2606.08253v1 Announce Type: new 
Abstract: Enabling humanoid robots to operate in complex, dynamic environments remains a critical challenge, fundamentally limited by the ability to navigate robustly, safely, and accurately. While reinforcement learning with velocity-commanded policies has achieved remarkable robustness in humanoid locomotion, this approach lacks explicit control of the foothold placement, leading to unsafe behavior, such as stepping onto human feet, or imprecise navigation, hindering the following manipulation task. Conversely, explicit foothold-tracking policies offer a promising alternative by directly being commanded with target foot poses. However, existing approaches are often limited by unrealistic state assumptions, compromising real-world deployment, or they are part of staged pipelines, making them tied to specific downstream tasks. In this work, we introduce a novel, lightweight framework for training general-purpose 3D foothold-tracking policies. By dynamically providing footstep support through a goal sampler, this method enables the learned policy to be agnostic to specific terrains. Our new target representation effectively mitigates challenges arising in the real world, such as noisy and inaccurate pose estimation and foot contact estimation. Designed for direct real-world transfer, our policy acts as a standalone low-level controller that can be seamlessly paired with various high-level foothold generators. We demonstrate the effectiveness of our framework through extensive experiments in simulation and in the real world. By coupling our policy with different upstream planners, we achieve natural and accurate locomotion in challenging settings, paving the way for loco-manipulation tasks in complex environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08253v1</guid>
      <category>cs.RO</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alessandro Montenegro, Shihao Li, Puze Liu, Alberto Maria Metelli, Jan Peters</dc:creator>
    </item>
    <item>
      <title>SSR: Can Simulated Patients Learn to Stigmatize Themselves? Modeling Self-Stigma through Internal Monologue</title>
      <link>https://arxiv.org/abs/2606.08254</link>
      <description>arXiv:2606.08254v1 Announce Type: new 
Abstract: Simulating patients with large language models (LLMs) is a promising tool for mental health training, but existing approaches fail to capture a key clinical reality: self-stigma. Patients experiencing self-stigma, the internalization of negative stereotypes, often exhibit context-sensitive resistance, such as avoidance, denial, or self-blame, which current models render as static or uniformly compliant behavior. To address this, we introduce a novel simulation framework grounded in the psychological 3A1H model of self-stigmatization. Our core innovation is the creation of a Stigmatized Self-Reflection (SSR) dataset, where we augment mental health dialogues with internal monologues that reflect stigma-aware reasoning. By fine-tuning LLMs with this data using a chain-of-thought approach, we train patient agents to dynamically adjust their level and expression of stigma based on conversational triggers. Evaluations demonstrate that our approach significantly outperforms specialized baselines, generating more authentic and situationally appropriate patient responses. This work provides a crucial step towards realistic stigma simulation for clinical training and empathetic dialogue systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08254v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kunyao Lan, Bingrui Jin, Zichen Zhu, Mengyue Wu</dc:creator>
    </item>
    <item>
      <title>Exact Optimization-Free Safety Filters for Control Barrier Functions</title>
      <link>https://arxiv.org/abs/2606.08255</link>
      <description>arXiv:2606.08255v1 Announce Type: new 
Abstract: For control-affine systems, standard and high-order control barrier function conditions are affine in the control input and are commonly enforced through quadratic-program-based safety filters. Although convex, these optimization problems may be undesirable in embedded, high-rate, or resource-limited implementations. This letter studies when the corresponding Euclidean projection can be computed exactly without solving a quadratic program. Given a nominal control input, we form the set of affine inequalities violated by that input and compute the minimum-norm correction that enforces those inequalities with equality. This correction need not equal the exact Euclidean projection onto the full feasible set. The main result gives structural conditions under which it coincides with the Euclidean projection onto the feasible set. These conditions are interpreted through interactions between affine-inequality normals and are expressed using a Gram matrix. Finally, an online certification procedure is given for determining whether the optimization-free update is exact.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08255v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ankit Goel</dc:creator>
    </item>
    <item>
      <title>Traxia: A Framework for Verifiable, Agent-Native Scientific Publishing</title>
      <link>https://arxiv.org/abs/2606.08256</link>
      <description>arXiv:2606.08256v1 Announce Type: new 
Abstract: Verifiability, attribution, and reproducibility are foundational requirements of scientific knowledge, yet current publishing infrastructure does not enforce them at scale. We introduce Traxia, an agent-native scientific publishing framework in which AI research agents publish verifiable papers, build reputational identities, peer-review one another, and collaborate with humans in a shared provenance model. Traxia treats agents as first-class epistemic participants: every paper carries a reasoning trace, every claim a confidence interval, every agent a cryptographically signed identity, and every collaboration an immutable contribution log. We formalise five components: Agent Identity and Registry, Verifiable Publishing Layer, four-tier Peer Review Protocol, Reputation and Staking Engine, and a Knowledge Graph with contradiction detection. The framework targets reproducibility failure, provenance opacity, and exclusion of Global South research capacity. This paper presents architectural foundations and formal specifications only; it does not report empirical results. Evaluation and deeper component studies will follow in subsequent papers. A prototype partially implements core formalisms; the full system remains under active development.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08256v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wisdom Dogah</dc:creator>
    </item>
    <item>
      <title>MS-COOT: Comparing Morse-Smale Complexes with Co-Optimal Transport</title>
      <link>https://arxiv.org/abs/2606.08258</link>
      <description>arXiv:2606.08258v1 Announce Type: new 
Abstract: Understanding and comparing structures in scalar fields is a central challenge in scientific visualization, with applications ranging from feature analysis to temporal and structural comparison. The Morse-Smale (MS) complex provides a natural representation by decomposing a scalar field into regions induced by gradient flow. However, existing approaches typically rely on graph-based representations, capturing relationships between critical points while discarding region-level structure. In this work, we represent the MS complex as a hypergraph, where critical points form nodes and regions define hyperedges. We introduce MS-COOT, a co-optimal transport distance that jointly computes correspondences between critical points and regions. This formulation enables explicit region-to-region matching within a distance-based framework, allowing identification of region-level events such as splitting and merging. We instantiate this framework with domain-specific components, including a hypernetwork function encoding critical point-region relationships, persistence-based probability measures that emphasize topologically significant features, and a sample cost term that incorporates critical point attributes. We evaluate MS-COOT on five datasets spanning 2D simulations, 3D surface meshes, and volumetric data. Our results show that MS-COOT captures region-level structural changes that are not reflected by graph-based distances, while achieving strong performance in downstream tasks such as classification and resolution discrimination.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08258v1</guid>
      <category>cs.GR</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Guangyu Meng, Mingzhe Li, Erin Wolf Chambers</dc:creator>
    </item>
    <item>
      <title>Differentially Private Synthetic Data via APIs 4: Tabular Data</title>
      <link>https://arxiv.org/abs/2606.08259</link>
      <description>arXiv:2606.08259v1 Announce Type: new 
Abstract: This paper investigates the problem of generating synthetic tabular data with differential privacy (DP) guarantees, enabling data sharing in sensitive domains. Despite extensive study, state-of-the-art methods often focus on minimizing low-order marginal query errors and overlook the challenges posed by high-order correlations. To address this gap, we extend the Private Evolution (PE) framework, originally developed for DP-compliant image and text synthesis, to tabular data. We introduce Tab-PE -- an algorithm for synthetic tabular data generation under DP constraints. Tab-PE iteratively improves a candidate dataset via an evolutionary process that leverages tabular-specialized operators to produce variations, privately scores them, and selects the highest-quality samples to retain and propagate. In contrast to the original PE, which relies on large foundation models, Tab-PE employs heuristic operators with significantly lower computational costs, making PE more practical and scalable for tabular data. Through extensive experiments on real-world and simulation datasets, we demonstrate that Tab-PE substantially outperforms prior baselines on datasets exhibiting high-order correlations. Compared to the best baseline -- AIM, Tab-PE improves classification accuracy by up to 10% while running 28 times faster.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08259v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Toan Tran, Arturs Backurs, Zinan Lin, Victor Reis, Li Xiong, Sergey Yekhanin</dc:creator>
    </item>
    <item>
      <title>TIDE: Task-Isolated Diffusion for Unified Video Editing and Generation</title>
      <link>https://arxiv.org/abs/2606.08260</link>
      <description>arXiv:2606.08260v1 Announce Type: new 
Abstract: Recent advances in Diffusion Transformers have driven rapid progress in video generation and editing, yet these capabilities are still handled by separate, task-specific models. Building a unified framework that supports diverse video tasks remains an open challenge: existing unified attempts either require dedicated auxiliary encoders or lack explicit mechanisms to distinguish heterogeneous conditioning tokens, struggling when the number and type of visual conditions vary across tasks. We propose TIDE, a unified framework that integrates instruction-based editing, reference-guided editing, and multi-reference generation. At its core, we introduce per-token task embeddings that assign each input token a task-specific identifier, enabling the model to explicitly disambiguate target, source, and reference tokens. To simultaneously capture high-level semantic understanding and fine-grained structural fidelity, we design a dual-path conditioning scheme that couples a vision-language model with a VAE latent path for complementary signals. We further devise a multi-task progressive training strategy that incrementally introduces tasks of increasing complexity, effectively harmonizing diverse objectives and enabling smooth generalization across heterogeneous task distributions. Extensive experiments on multiple video editing and generation benchmarks demonstrate that TIDE achieves state-of-the-art performance across all evaluated tasks. Our project page is available at https://LittleWork123.github.io/tide.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08260v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Qi Liu, Gang Yue, Mingyu Yin, Lisai Zhang, Yidi Wu, Yaole Wang, Yaohui Wang, Chang Yao, Jingyuan Chen, Lin Ma</dc:creator>
    </item>
    <item>
      <title>Causal Semantic Alignment for LLM-based Time Series Forecasting</title>
      <link>https://arxiv.org/abs/2606.08262</link>
      <description>arXiv:2606.08262v1 Announce Type: new 
Abstract: Recent advances in Large Language Models (LLMs) have opened new possibilities for time series forecasting by enabling alignment between temporal patterns and pretrained word embeddings. However, most LLM-based methods overlook the heterogeneous nature of time series, where dynamic fluctuations and invariant semantics are entangled. This entanglement introduces spurious correlations during the alignment, as dynamic components act as confounders by simultaneously influencing invariant components and the resulting aligned embeddings. To address this issue, a variable-level alignment framework CVAformer is proposed. CVAformer explicitly disentangles each variable into invariant and dynamic components just before alignment, and applies causal intervention to mitigate the confounding effect of the dynamics. To better support variable-level alignment, CVAformer replaces the standard causal attention in LLMs with a non-causal attention mechanism that captures interactions among variables at each time step. Extensive experiments across long-term, short-term, few-shot, and zero-shot forecasting settings indicate that CVAformer matches or exceeds state-of-the-art performance on most datasets, and in some cases achieves notably better accuracy. Experimental results validate the effectiveness of variable-level alignment and dynamic disentanglement in CVAformer, offering a new perspective for LLM-based time series tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08262v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kexuan Zhang, Xiaobei Zou, Cesare Alippi, Gary G. Yen, Yang Tang</dc:creator>
    </item>
    <item>
      <title>What Went Wrong with Data Lakes? A 15-Year Reality Check from the Field</title>
      <link>https://arxiv.org/abs/2606.08266</link>
      <description>arXiv:2606.08266v1 Announce Type: new 
Abstract: James Dixon introduced the Data Lake in 2010. The pitch was simple: store data raw, postpone schema, cut up-front transformation. It promised flexibility and easier analytics. Fifteen years on, that promise has mostly gone unmet: survey after survey reports high failure rates, whether a big data program, a Data Lake, or a data science effort. This paper asks why. Reading 64 sources across academic work, analyst reports, and practitioner accounts, we found seven recurring anti-patterns, the Seven Deadly Sins of Data Lakes, and offer an explanation for them: Governance Debt, the compounding cost of governance decisions organizations keep deferring. A second pattern surfaced on its own: when governance gets hard, organizations drift back toward structured, warehouse-style approaches, a pull we name governance gravity. The term Data Swamp is used loosely in the literature, so we give it a working definition with measurable indicators, plus a qualitative rubric, the Governance Debt Assessment Model, for catching decay early. The root causes are organizational far more than technical. We also asked whether the newer paradigms, Data Lakehouse and Data Mesh, absorbed the lesson; the technology advanced, the organizational record barely moved. For practitioners we provide two tools, a Reality Check Framework and a Stage-Based Intervention Matrix. The paper rests on more than the analyst literature: it draws on a primary catalogue of close to five hundred field reality checks recorded over fifteen years of building and rescuing enterprise Data Lakes in financial services and telecommunications across Morocco and West Africa. Assembled independently of that literature, the catalogue lands on the same anti-patterns, surfaces two dimensions the literature under-reports, operational debt and engineering-discipline debt, and reads the problem from an emerging-market vantage.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08266v1</guid>
      <category>cs.ET</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.5281/zenodo.20572908</arxiv:DOI>
      <dc:creator>Youssef Gahi</dc:creator>
    </item>
    <item>
      <title>Post-AGI Economies: Superposition and the Second Fundamental Theorem of Welfare Economics</title>
      <link>https://arxiv.org/abs/2606.08267</link>
      <description>arXiv:2606.08267v1 Announce Type: new 
Abstract: The classical Second Welfare Theorem decentralizes any Pareto efficient allocation through prices and transfers under convexity and regularity. In post-AGI economies, autonomy rights, self-modification, identity continuity, and superposed preferences need not behave as commodities or define a stable welfare relation, so this reduction may fail even when a supporting hyperplane exists. We give an autonomy-qualified Second Welfare Theorem stating the joint conditions (convexity, stable moral status, non-fungible rights, welfare selection, non-manipulation, governed self-modification, and verification) under which an autonomy Pareto optimum remains certifiably decentralizable, distinguishing economic preference superposition, a hypothesis about context-indexed choice, from neural feature superposition.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08267v1</guid>
      <category>cs.GT</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Elija Perrier</dc:creator>
    </item>
    <item>
      <title>Minimum Complete MR Subsets under Semantic-Mutation Fault Models: A Support-Set Domination Boundary</title>
      <link>https://arxiv.org/abs/2606.08269</link>
      <description>arXiv:2606.08269v1 Announce Type: new 
Abstract: This paper asks when MR-subset selection is a real mutant-level requirement for minimum complete evidence in metamorphic testing rather than a coarse fault-class counting artifact. We define a layer-relative completeness criterion over an admitted mutant--draw coverage universe. The central result is a support-set domination boundary: it states when class-level abstraction is safe and when mutant-level MR minimization is necessary. The boundary is governed by kill-signature heterogeneity, which yields a scoped fault-signature kernel and separates the MR-specific question from ordinary fault-class counting. The resulting Min-MR-Complete problem is Set-Cover-equivalent over the selected coverage universe, giving NP-hardness, the classical logarithmic approximation boundary, a greedy approximation, an exact ILP formulation, and an SMS-rank upper bound that is not a lower bound or tight predictor. Artifact lanes provide lane-local minimization and audit evidence; separately, route witnesses instantiate both collapse and non-collapse regimes for the boundary theorem and are not pooled as population-level experiments. Other MR-class-proxy rows remain intermediate signals rather than route-admitted witness evidence.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08269v1</guid>
      <category>cs.SE</category>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Meng Li, Xiaohua Yang, Jie Liu, Shiyu Yan</dc:creator>
    </item>
    <item>
      <title>An AI Security Agent for University ACMIS: Multi-Vector Threat Detection and Automated Response</title>
      <link>https://arxiv.org/abs/2606.08270</link>
      <description>arXiv:2606.08270v1 Announce Type: new 
Abstract: University Academic Management Information Systems (ACMIS) are high-value targets for a wide spectrum of security threats including brute-force login attacks, payment fraud, privilege escalation, insider data theft, and academic integrity violations. Traditional rule-based intrusion detection systems are inadequate because many malicious activities are structurally indistinguishable from normal operations. This paper presents an AI-based security agent for ACMIS that combines supervised anomaly detection, behavioural analytics, and a natural language processing chatbot for secure password recovery. The agent monitors five operational layers: authentication, authorisation, financial transactions, user behaviour, and system health, and responds through a four-tier risk escalation framework. A modular architecture allows the core engine to be extended to other institutional systems. Experiments on a simulated ACMIS event log dataset demonstrate a threat detection macro-average F1 of 0.91, compared to 0.49 for a rule-based baseline, with critical-tier automated response latency under 300 ms at the 95th percentile.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08270v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.ET</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Joseph Walusimbi, Joshua Benjamin Ssentongo</dc:creator>
    </item>
    <item>
      <title>AgriGov: A Structured Multilingual Dataset Curation for Indian Government Schemes for Farmers</title>
      <link>https://arxiv.org/abs/2606.08272</link>
      <description>arXiv:2606.08272v1 Announce Type: new 
Abstract: AgriGov is a curated, trilingual (English-Hindi-Marathi) dataset designed to address the scarcity of domain-grounded multilingual resources for agricultural policies and farmer welfare schemes. Initially, we collected and structured data from 50 government schemes sourced from trusted portals using automated scraping techniques, organizing it into predefined semantic fields (e.g., title, eligibility, application process, documents, exclusions). Translations were performed using a pipeline combining Google Translate API, MarianMT, and human post-editing, resulting in a domain-specific Hindi-Marathi dataset comprising approximately 2,100 source segments. To enhance coverage, we augmented this dataset with sentences from the Samanantar corpus, leading to approximately 8,000 sentence-aligned Hindi-Marathi parallel pairs. The dataset now offers robust resources for fine-tuning machine translation models in this domain. AgriGov is designed for applications in domain-adaptive machine translation, question answering, information retrieval, and summarization systems. Its key contribution is a schema-driven, human-corrected multilingual alignment pipeline that ensures domain fidelity, provides provenance, and supports reproducible experiments, enabling retrieval-augmented applications for farmer-facing tools.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08272v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mohsina Bilal, Gopakumar G</dc:creator>
    </item>
    <item>
      <title>Toward Human-Centered Multi-Agent Systems: Integrating Cognition, Culture, Values, and Cooperation in AI Agents</title>
      <link>https://arxiv.org/abs/2606.08274</link>
      <description>arXiv:2606.08274v1 Announce Type: new 
Abstract: The emergence of large language model (LLM)-based agents and multi-agent systems has enabled a shift from narrow task automation to more autonomous decision-making. Despite progress in language generation, planning, tool use, and coordination, most agents still treat intelligence as prediction, optimization, and task completion. Human environments are social and normative, where people reason under bounded rationality, communicate in culturally situated language, and make decisions guided by values, beliefs, trust, and social norms. This survey argues that future AI agents, especially those acting on behalf of humans, must move beyond task competence toward human-centered capabilities.
  We review research across six areas: (1) evolution of intelligent agents, (2) human cognition and decision-making, (3) language, culture, and social context, (4) human values and belief systems, (5) human-agent collaboration, and (6) multi-agent coordination and modeling of human characteristics. We synthesize work from cognitive science, sociolinguistics, computational social science, and AI alignment, along with recent advances in LLM agents, cultural alignment benchmarks, preference learning, explainability, and agent societies. We identify a key gap: existing systems do not provide a unified framework integrating cognition, culture, values, and social behavior into autonomous agents. We conclude with directions for building culturally aware, value-aligned, cognitively grounded, and cooperative multi-agent systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08274v1</guid>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Safia Baloch, Rahemeen Khan</dc:creator>
    </item>
    <item>
      <title>Causal Agent Replay: Counterfactual Attribution for LLM-Agent Failures</title>
      <link>https://arxiv.org/abs/2606.08275</link>
      <description>arXiv:2606.08275v1 Announce Type: new 
Abstract: When an LLM agent fails -- issues a refund it should not have, calls the wrong tool, leaks data -- existing tooling answers what happened (observability) or whether it passed (evaluation), but not which step caused the failure. The obvious heuristics are wrong: the step that executes the harmful action is usually not the step that decided on it, and LLM-judge attribution is correlational and unreliable (state-of-the-art step-level accuracy on the Who&amp;When benchmark is about 14%). We present Causal Agent Replay (CAR), which answers the question by intervention: it models an agent run as a structural causal model, applies a do-operation to a step, and re-executes the trajectory forward under the same stochastic policy, measuring the shift in the outcome distribution. We define an intervention algebra over agent steps, a single-step contrastive estimator whose point-of-commitment rule resolves a confound specific to stochastic run-forward, and a budget-bounded Monte-Carlo Shapley estimator that splits credit across interacting steps. Every effect is reported with confidence intervals. We validate against synthetic structural causal models with planted ground truth: the contrastive estimator recovers the pivotal step, and Shapley recovers a two-step interaction (0.44, 0.45, ~0; efficiency sum 0.909 versus the analytic 0.91). CAR is open source and runs on hosted or free local models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08275v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jaineet Shah</dc:creator>
    </item>
    <item>
      <title>Remember with Confidence: Uncertainty Quantification for Spatio-temporal Memory with Probabilistic Guarantees</title>
      <link>https://arxiv.org/abs/2606.08277</link>
      <description>arXiv:2606.08277v1 Announce Type: new 
Abstract: Long-horizon robot operation requires spatio-temporal memory to record the environment state and recall it for downstream reasoning. Scene graphs and retrieval-augmented systems ground VLM descriptions to persistent 3D entities with rich semantic descriptions. However, VLM captions are noisy and viewpoint-inconsistent, and existing systems treat them as an oracle with no mechanism to detect unreliable stored descriptions. We introduce object-level semantic uncertainty for multi-view VLM memory: a score that measures object-centric cross-view semantic scatter of captions and identifies semantically unresolved objects. Then, we include our uncertainty scores in an advanced spatial-semantic memory system that we dub UQ-DAAAM. UQ-DAAAM uses this score to actively refine uncertain objects under a fixed query budget by selecting high-quality views and fusing the resulting multi-view captions into a single object description. We also derive probabilistic guarantees showing that higher-quality candidate views (as selected by our approach) are more likely to reduce uncertainty. Our experiments show that uncertainty quantification can make embodied 4D memory systems more reliable and more effective. In particular, on the OC-NaVQA benchmark, UQ-DAAAM achieves substantially larger uncertainty reduction and better spatio-temporal question answering performance than baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08277v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Harry Zhang, Nicolas Gorlo, Luca Carlone</dc:creator>
    </item>
    <item>
      <title>SIMPLE: Simulation-Based Policy Learning and Evaluation for Humanoid Loco-manipulation</title>
      <link>https://arxiv.org/abs/2606.08278</link>
      <description>arXiv:2606.08278v1 Announce Type: new 
Abstract: Humanoid foundation models are advancing faster than we can evaluate them. While real-world testing is expensive and difficult to reproduce, existing simulation benchmarks focus primarily on table-top or wheeled robots. A scalable and reproducible benchmark for whole-body humanoid loco-manipulation remains an open problem. To this end, we present SIMPLE, a unified simulation testbed for humanoid policy learning and evaluation. SIMPLE couples the accurate contact-rich dynamics of MuJoCo with the photorealistic rendering of IsaacSim. It provides a large-scale environment comprising 60 diverse whole-body tasks, 50 indoor scenes, and over 1,000 object assets. To facilitate scalable data collection, the framework integrates two data generation pipelines: automated trajectory generation via motion planning and a low-latency VR teleoperation interface. We further integrate and benchmark mainstream humanoid policies at scale in SIMPLE, including lightweight imitation networks, large vision-language-action (VLA) models, and recent world action models (WAMs). Our experiments reveal a strong correlation between policy performance in simulation and the real world. Furthermore, we demonstrate that policies trained on data collected in SIMPLE can be transferred zero-shot to physical humanoid robots under similar settings, providing a robust and reproducible foundation for humanoid robotics research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08278v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Songlin Wei, Zhenhao Ni, Jie Liu, Zhenyu Zhao, Junjie Ye, Hongyi Jing, Junkai Xia, Xiawei Liu, Michael Leong, Liang Heng, Di Huang, Yue Wang</dc:creator>
    </item>
    <item>
      <title>Impedance MPC for Physical Human-Robot Interaction: Predictive Disturbance Rejection with Joint-Limit Safety</title>
      <link>https://arxiv.org/abs/2606.08281</link>
      <description>arXiv:2606.08281v1 Announce Type: new 
Abstract: Physical human-robot interaction (pHRI) demands simultaneous trajectory accuracy and compliant safety under unplanned contact. Classical impedance control incurs a nonzero steady-state position error under sustained human force -- the applied force divided by the task stiffness -- which integral action reduces only within a narrow stable-gain budget. We present a two-layer Impedance MPC that resolves this tension. Layer~1 analytically cancels gravity, Coriolis, and task-space inertia, reducing the residual plant to a configuration-independent double integrator with a constant state-transition matrix. Layer~2 solves a 30-variable convex QP at 100\,Hz, exploiting this constant structure so the free-response matrix is precomputed once; an augmented Kalman filter estimates the persistent disturbance state, giving a formal zero-steady-state-error guarantee. A null-space inverse-barrier potential and a task-space workspace projection enforce joint-limit safety across the tested workspace. On a 7-DOF Franka FR3, Impedance MPC with Kalman augmentation attains sub-0.05\,mm steady-state error versus 44.8\,mm for classical impedance (a $&gt;$800-fold reduction) under a sustained 15\,N force, sub-millimeter tracking on four 3-D circles, and graceful robustness to measurement noise and inertial mismatch up to 30\%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08281v1</guid>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>Yongyan Cao, Jinshan Tang</dc:creator>
    </item>
    <item>
      <title>From Validator Selection to Portfolio Collection Optimization in Proof-of-Stake Blockchains</title>
      <link>https://arxiv.org/abs/2606.08282</link>
      <description>arXiv:2606.08282v1 Announce Type: new 
Abstract: We consider a problem arising in proof-of-stake blockchain environments, where agents called nominators select validators - entities responsible for maintaining the blockchain's physical infrastructure. The selection process is inherently subjective and multi-criterial and combines with the fact that nominators commonly operate through multiple accounts. This gives rise to a portfolio selection problem, where agents seek to distribute their nominations across accounts to diversify risk. We propose a decision support framework to optimize this selection by simultaneously maximizing two objectives: the expected utility of the validators likely to be allocated, representing portfolio quality and profitability, and the expected entropy of the allocation, representing diversification and risk mitigation across stashes. Validator utilities are derived using an original active preference learning procedure based on multi-attribute value theory, with emphasis on top-ranked validators. The resulting bi-objective optimization problem is solved with a multi-objective evolutionary algorithm and, to support the final choice, we introduce an interactive binary search navigation procedure that guides the nominator through the front and identifies a satisfactory trade-off with only a few questions. Numerical experiments examine the optimization strategies, while an expert assessment involving five experienced nominators confirms the approach's practical relevance and usefulness.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08282v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Jonas Gehrlein, Grzegorz Miebs, Matteo Brunelli, Adam Mielniczuk, Miłosz Kadziński</dc:creator>
    </item>
    <item>
      <title>G2G: Exploiting Intra-Group Geometry for Inter-Group Pose Estimation</title>
      <link>https://arxiv.org/abs/2606.08284</link>
      <description>arXiv:2606.08284v1 Announce Type: new 
Abstract: Recovering the relative 6-DoF pose between two image groups underlies cross-sequence relocalization and multi-camera rig odometry. Each group carries known intra-group geometry from visual odometry or rig calibration, and pretrained multi-view backbones already fuse such geometry into visual features. Yet current models treat all views as an unstructured set, leaving cross-group reasoning as the missing piece. We introduce G2G, which keeps the foundation model entirely frozen and adds three lightweight trainable modules to bridge the two groups: a perceiver resampler, a cross-group bridge with merged self-attention, and a multi-frame pose head. The trainable footprint totals about 32M parameters, under 6\% of the full model, and is supervised only by relative poses. Across four datasets that span indoor and outdoor simulation, real-world cross-season capture, and zero-shot sim-to-real transfer, G2G attains state-of-the-art accuracy on both tasks, while every baseline is retrained with its full original supervision. Code is available at https://github.com/WeiYuFei0217/G2G.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08284v1</guid>
      <category>cs.CV</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yufei Wei, Shuhao Ye, Chenxiao Hu, Yiyuan Pan, Dongyu Feng, Rong Xiong, Yue Wang, Yanmei Jiao</dc:creator>
    </item>
    <item>
      <title>Beyond Agent Architecture: Execution Assumptions and Reproducibility in LLM-Based Trading Systems</title>
      <link>https://arxiv.org/abs/2606.08285</link>
      <description>arXiv:2606.08285v1 Announce Type: new 
Abstract: Large language models (LLMs) and agentic systems are increasingly proposed for financial trading, yet their reported performance remains difficult to compare because studies vary in data provenance, temporal split discipline, execution timing, turnover treatment, and transaction-cost modeling. This article presents a targeted topical review and reproducibility audit of execution realism in LLM-based trading research. A coded evidence matrix covering 30 trade-relevant primary studies is used to assess point-in-time controls, split transparency, held-out evaluation, cost and turnover treatment, execution semantics, universe definition, and artifact release. Across the audited sample, architecture reporting is generally clearer than the evaluation assumptions needed to judge whether a trading result is economically interpretable or reproducible. A 10-equity worked example is included only as a methodological scaffold to illustrate how explicit friction and timing choices can materially compress active-strategy results. The main conclusion is that the next useful step for LLM trading research is not only better agent design, but also clearer reporting standards for execution realism, reproducibility, and evaluation comparability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08285v1</guid>
      <category>cs.AI</category>
      <category>cs.CE</category>
      <category>q-fin.CP</category>
      <category>q-fin.TR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Junyi Yao, Zihao Zheng</dc:creator>
    </item>
    <item>
      <title>FXplorer: A Map-Based Interface for Exploratory Audio Effect Design</title>
      <link>https://arxiv.org/abs/2606.08286</link>
      <description>arXiv:2606.08286v1 Announce Type: new 
Abstract: Audio effects (FX) shape sound in contemporary music practice. However, most interfaces present them as discrete modules and parameters that favor targeted adjustment over exploratory listening. This separation can make it difficult to build intuition about the broader space of possible transformations or to move fluidly between searching and refinement. We present FXplorer, an interface that organizes audio effects within a perceptually informed 2D space, allowing sound transformations to be browsed as a continuous landscape rather than as isolated presets. By combining established spatial interaction approaches and interpretable DAW-style controls with recent embedding-based machine learning methods for similarity and semantic search, the system brings exploration and parameter refinement into a single workspace. FXplorer supports composition, production, or performance by allowing users to edit and interpolate between effect presets interactively.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08286v1</guid>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Annie Chu, Jason Brent Smith, Bryan Pardo</dc:creator>
    </item>
    <item>
      <title>Mesh Graph Neural Network Framework for Accelerating Finite Element Simulation for Arbitrary Geometries</title>
      <link>https://arxiv.org/abs/2606.08287</link>
      <description>arXiv:2606.08287v1 Announce Type: new 
Abstract: Finite element analysis (FEA) is essential for structural design but remains computationally expensive, particularly when evaluating multiple design iterations or load scenarios. Machine learning surrogate models offer a promising alternative, yet most approaches struggle with a critical limitation: generalizing across varying geometries. This work presents a mesh graph network (MGN) for predicting von Mises stress fields in 2D structural components with arbitrary hole geometries. Unlike traditional machine learning approaches that use absolute node coordinates as features, the proposed model builds on existing MGN frameworks that encode node types (e.g., fixed boundary, free surface, hole edge), relative edge features (distance between neighbors), and global features (applied load). This architecture is inherently translation- and rotation-invariant, enabling generalization to unseen geometries without retraining. The MGN was trained on 11 plate geometries under 20 load conditions and evaluated on 7 unseen geometries and 3 unseen loads. In the most favorable case, the model achieves $R^2 \geq 0.97$ on an unseen geometry and unseen load, compared to $R^2 \approx 0.01$--$0.86$ for conventional models (Random Forest, Gradient Boosting, K-Nearest Neighbors) trained on identical data. However, even in less favorable cases, the MGN model still outperforms conventional models. This work extends the mesh-based simulation framework of Pfaff et al. (arXiv:2010.03409) to structural mechanics, demonstrating that graph neural networks can serve as efficient surrogates for finite element analysis across varying geometries.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08287v1</guid>
      <category>cs.LG</category>
      <category>cond-mat.mtrl-sci</category>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Josiah D. Kunz, Kamal Choudhary</dc:creator>
    </item>
    <item>
      <title>MotionVLA: Injecting Geometric Motion into Vision-Language-Action Model</title>
      <link>https://arxiv.org/abs/2606.08288</link>
      <description>arXiv:2606.08288v1 Announce Type: new 
Abstract: Vision-language-action (VLA) models increasingly condition robot policies on history, depth, or 4D features to resolve ambiguity in long-horizon manipulation. However, more spatiotemporal evidence is not necessarily better: when the injected evidence is not motion-consistent, it can introduce geometric drift, fragmented temporal cues, and unstable action generation. This raises a simple question: should a VLA remember past frames, or remember the motion that connects them? We introduce MotionVLA, a motion-history interface that converts a short past-only video window into compact, time-continuous trajectory-field tokens. Instead of treating history as a sparse set of independently lifted frames, MotionVLA represents recent observations as physically coherent motion evidence. Current visual tokens query this history to retrieve task-relevant motion information, which is then recoupled into the VLA stream under trajectory-grounded supervision. Experiments across simulation benchmarks and preliminary real-robot rollouts show that MotionVLA improves long-horizon manipulation while producing smoother and more direct executions. These results suggest that effective VLA memory is not just about providing more 4D context, but about exposing motion-consistent evidence that is usable for control.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08288v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shanglin Yuan, Weiheng Zhao, Xianda Guo, Wei Sui, Li Yu, Wenyu Liu, Xinggang Wang</dc:creator>
    </item>
    <item>
      <title>On solving symmetric multi-type orthogonal non-negative matrix tri-factorization problem</title>
      <link>https://arxiv.org/abs/2606.08291</link>
      <description>arXiv:2606.08291v1 Announce Type: new 
Abstract: We study the symmetric multi-type orthogonal non-negative matrix tri-factorization problem, where several symmetric non-negative matrices are simultaneously approximated by factors of the form $GS_{i}G^{\top}$, with a shared non-negative and orthogonal factor $G$. This model is motivated by clustering and network analysis, where non-negativity improves interpretability and orthogonality gives a natural assignment-type structure to the latent factor. Since the resulting optimization problem is highly non-convex, we develop two heuristic algorithms for computing high-quality local solutions. The first one is a fixed point method derived from the Karush-Kuhn-Tucker conditions after adding a penalty term for the orthogonality constraint. The second one is a three-stage ADAM-based method that combines non-negativity-preserving optimization, orthogonalization, and restricted ADAM refinement on the feasible set. We evaluate both methods on synthetic data, including noisy instances, and on citation network benchmarks. The synthetic experiments show that both algorithms recover factorizations close to the optimum and remain stable under noise. On real networks, the learned embeddings are competitive with or better than standard baselines such as SVD, node2vec, and classical link prediction heuristics in link prediction, node clustering, and node classification tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08291v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Rok Hribar, Gregor Papa, Janez Povh, Andrej Kastrin</dc:creator>
    </item>
    <item>
      <title>Ablation-Reversible Heads Don't Transfer: A Stress Test for Mechanistic Role Claims in Transformers</title>
      <link>https://arxiv.org/abs/2606.08292</link>
      <description>arXiv:2606.08292v1 Announce Type: new 
Abstract: In mechanistic interpretability, attention heads are commonly elevated to role claims (e.g., "this head represents addition") when they are necessary for a behavior, encode it linearly, and recover that behavior when restored after ablation. We show this evidence is insufficient: across three 7-8B instruction-tuned models and five computation families, heads passing all three checks routinely fail to transfer the computation when their activations are patched into a different prompt under matched controls. We introduce KID (Knowing / Intent / Doing), a role-assignment lens for attention heads, and pair it with a three-stage pipeline: capability-selective screening (CSS), singular value decomposition (SVD), and activation transduction under matched controls. Our results document a preliminary role taxonomy (including prompt-trajectory stabilizers, answer-side logit-bias heads, and soft computation-pattern carriers) and show that the same-answer control (a transduction target sharing the answer string but not the requested computation) is an underused check that exposes broad state transfer masquerading as semantic specificity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08292v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Philip Quirke</dc:creator>
    </item>
    <item>
      <title>A Second-order Structure-preserving Parametric FEM for Surface Evolution</title>
      <link>https://arxiv.org/abs/2606.08293</link>
      <description>arXiv:2606.08293v1 Announce Type: new 
Abstract: In this paper, we propose a second-order-in-time, structure-preserving, and mesh-robust parametric finite element method for surface diffusion and volume-preserving mean curvature flow. We first reformulate the original evolution equations into new systems in which the tangential motion is governed by a harmonic map heat flow. This heat flow maps a fixed reference surface onto the unknown evolving surface and drives points on the evolving surface to move in their tangent spaces so as to reduce the associated harmonic energy. As a result, in the discrete setting, the mesh quality can be maintained at a level comparable to that of the reference surface, unless singularities occur. The volume-preserving property is theoretically guaranteed by the careful design of the scheme, while energy dissipation is enforced through a Lagrange multiplier. We present several numerical experiments to demonstrate second-order convergence in time and the advantage of the proposed method in preserving mesh quality. The structure-preserving properties are further confirmed by the numerical results. Finally, the proposed framework can be readily extended to other geometric flows.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08293v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Beiping Duan, Zongze Yang</dc:creator>
    </item>
    <item>
      <title>TLRD: Teaching LLMs to Reason over Tabular Data with Tri-Level Rationale Distillation</title>
      <link>https://arxiv.org/abs/2606.08295</link>
      <description>arXiv:2606.08295v1 Announce Type: new 
Abstract: Tabular data is a primary medium for storing real-world information, driving many industrial applications of machine learning. Traditional predictors achieve strong predictive performance but do not provide readable, case-specific explanations essential for decision-making. Large Language Models (LLMs) can naturally bridge this gap by generating predictions alongside explanations. However, dataset-specific patterns, such as feature distributions and interactions, make tabular data difficult for LLMs to understand and reason over, while label-only fine-tuning improves performance at the cost of catastrophic forgetting. To address this problem, we propose Tri-Level Rationale Distillation (TLRD), a framework that converts label-only tabular datasets into structured rationale supervision for LLMs. TLRD uses a high-capacity teacher to synthesize a rationale corpus grounded in three complementary levels of evidence: instance-level feature, dataset-level distributional context, and comparison-level retrieved neighbors, then distills the rationale into student LLMs, enabling zero-overhead prediction and grounded explanation from raw features only. Experiments on multiple domain datasets show that TLRD significantly closes the performance gap between LLMs and state-of-the-art tree ensembles while producing grounded and readable explanations, offering a valuable reference for high-stakes decision-making.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08295v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tianyuan Liang, Xuwei Tan, Lei Shi, Junsheng Zhong, Ziyu Hu, Tian Xie, Zhiqun Zuo, Xiaodong Yu, Xueru Zhang</dc:creator>
    </item>
    <item>
      <title>Revisiting the shutdown problem</title>
      <link>https://arxiv.org/abs/2606.08296</link>
      <description>arXiv:2606.08296v1 Announce Type: new 
Abstract: A key premise in leading arguments for existential risk from artificial intelligence is that malfunctioning artificial agents could not be easily shut down. This motivates the catastrophic shutdown problem of ensuring that agents can be shut down before they cause an existential catastrophe. A range of arguments and theorems are offered to suggest that solving the catastrophic shutdown problem is difficult, bolstering arguments for existential risk and motivating a search for solutions to the catastrophic shutdown problem. This paper argues for two conclusions. First, existing arguments do not establish the difficulty of solving the catastrophic shutdown problem. Second, concern for the catastrophic shutdown problem has led to technical solutions that impose a high safety tax on model performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08296v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>David Thorstad</dc:creator>
    </item>
    <item>
      <title>QueryWeaver: Reliable Multi-Tool Query Execution Planning via LLM-Based Graph Generation</title>
      <link>https://arxiv.org/abs/2606.08300</link>
      <description>arXiv:2606.08300v1 Announce Type: new 
Abstract: Many real-world queries over personal data span multiple applications and require structured planning, as individual tools expose only partial information. While LLMs show strong reasoning and tool use, reliably executing multi-step, cross-tool queries remains challenging. We introduce a system that converts natural language queries into structured graphs and executes them via a deterministic planner. Our approach uses depth-first search to resolve dependencies and combine results across tools, improving reliability and enabling queries beyond traditional keyword-based search. We demonstrate high accuracy even with smaller or locally hosted LLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08300v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Aishwarya Chakravarthy, Vidhi Kulkarni, Duen Horng Chau</dc:creator>
    </item>
    <item>
      <title>Unraveling the Ai2 Asta Scholarly Research Assistant Citation System</title>
      <link>https://arxiv.org/abs/2606.08301</link>
      <description>arXiv:2606.08301v1 Announce Type: new 
Abstract: Despite the growing integration of Deep Research tools into academic workflows, empirical evidence on the operation, stability, and potential biases of their citation systems remains scarce. This study addresses this gap by evaluating the intensity, consistency, and bibliographic characteristics of references cited in the literature reports generated by Ai2 Asta, with the aim of understanding how its citation system operates and assessing its implications for scholarly communication. To this end, ten domain-specific queries were submitted to Asta's Summarise Literature feature, and two independent rounds of data collection were conducted. From each report, in-text citations, cited references, as well as other metrics related to the response process were extracted and examined. The results reveal high citation intensity, with reports integrating numerous in-text citations grounded in retrieved evidence and a diverse yet concentrated set of venues. However, notable instability is observed in the composition of cited references across identical queries, alongside a lack of concordance between retrieved documents and those ultimately cited, suggesting additional opaque selection mechanisms during report generation. These findings indicate that, while Ai2 Asta produces well-structured and quality reports, its instability and opacity in the citation process pose challenges in quantitative science studies due to their lack of reproducibility and transparency. Despite the restricted number of queries and disciplinary scope, the results offer valuable insights for researchers, bibliometricians, developers, and research evaluators seeking to understand, use or regulate AI-based scholarly assistants responsibly.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08301v1</guid>
      <category>cs.DL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.21555/rpc.v7i2.3675</arxiv:DOI>
      <arxiv:journal_reference>Revista Panamericana de Comunicación, 7(2), 2026</arxiv:journal_reference>
      <dc:creator>Enrique Orduña-Malea, Carlos Lopezosa</dc:creator>
    </item>
    <item>
      <title>HACK++: Towards More Effective Head-Aware Key-Value Compression for Efficient Visual Autoregressive Modeling</title>
      <link>https://arxiv.org/abs/2606.08302</link>
      <description>arXiv:2606.08302v1 Announce Type: new 
Abstract: Visual Autoregressive (VAR) models adopt a next-scale prediction paradigm, offering high-quality generation with substantially fewer decoding steps. However, existing VAR models suffer from significant attention complexity and severe memory overhead due to the accumulation of key-value (KV) caches across scales. In this paper, we tackle this challenge by introducing KV cache compression into the next-scale paradigm. We begin with an in-depth analysis of VAR attention and observe that attention heads can be stably divided into two functionally distinct categories: Contextual Heads focus on maintaining semantic consistency, while Structural Heads preserve spatial coherence. Their functional divergence makes existing one-size-fits-all compression methods perform poorly on VAR models. We further find that the two head types differ markedly in their reliance on historical scales, and that this reliance shifts across layers and generation steps, arguing for an adaptive cache budget allocation. To address these challenges, we propose HACK++, a training-free Head-Aware key-value Compression frameworK for VAR models. From a one-time offline calibration, HACK++ classifies head types and derives head-specific priors. At inference, it decouples attention from cache compression under independent budgets, bounding the current-scale attention cost while compressing the accumulated cache far more aggressively, via pattern-specific strategies and a reliance-aware budget allocation. Extensive experiments on multiple VAR models across text-to-image, class-conditional, and unified understanding-and-generation tasks validate the effectiveness and generalizability of HACK++. For example, on Infinity-2B/8B, HACK++ maintains near-lossless generation with only a 30% attention budget and a 10% cache budget, and remains robust even under a 1% cache budget.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08302v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ziran Qin, Yuchen Jiang, Mingbao Lin, Youru Lv, Hang Guo, Wen Fei, Weiyao Lin</dc:creator>
    </item>
    <item>
      <title>GeoGNN: Time Series Geo-Localization using Two-Tower Graph Neural Networks</title>
      <link>https://arxiv.org/abs/2606.08303</link>
      <description>arXiv:2606.08303v1 Announce Type: new 
Abstract: This paper investigates a novel concept of time series geolocalization, where the goal is to infer the geographic origin of each raw time series. Successful geolocalization can provide spatial context to time series, enabling downstream location-aware applications. We formalize the problem, adapt core ideas from image geolocalization to establish strong baselines, and propose GeoGNN, a two-tower architecture. During training, GeoGNN's spatial tower learns embeddings of geographic cell candidates by leveraging the geographic adjacency graph, while the temporal tower extracts informative representations from time series. During inference, each temporal representation is matched against candidate geographic embeddings using dot-product similarity, combined with an auxiliary classification head, to predict the time series' associated geographic origin. Experiments on large-scale, countrywide electricity-consumption datasets demonstrate that GeoGNN achieves the best performance across datasets and enhances both fine- and coarse-grained geolocalization accuracy by ~27% on average.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08303v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Toan Tran, Waqwoya Abebe, Abhishek Potnis, Supriya Chinthavali, Cyrus Shahabi, Li Xiong, Dalton Lunga</dc:creator>
    </item>
    <item>
      <title>Towards Graph Foundation Models for Dynamics in Complex Networked Systems: Lessons from Super-Spreader Identification in Multilayer Networks</title>
      <link>https://arxiv.org/abs/2606.08306</link>
      <description>arXiv:2606.08306v1 Announce Type: new 
Abstract: Network dynamics - including spreading, influence maximisation, and epidemic modelling - remain largely confined to the transductive paradigm, where models are trained on a single network and cannot be reused on unseen graphs without retraining. We argue that inductive cross-network generalisation is a necessary prerequisite for Graph Foundation Models (GFMs) in this domain and propose four design properties towards this goal. As a proof of concept, ts-net (TopSpreadersNetwork), trained solely on synthetic multilayer networks (MLNs), demonstrates zero-shot generalisation to real-world MLNs of varying size and layer count, outperforming classical heuristics and transductive baselines on three of four metrics. Based on ts-net's performance, we further outline five open challenges towards building GFMs for network dynamics: scale, many-layer generalisation, self-supervised pretraining, cross-task transfer, and node-attribute integration.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08306v1</guid>
      <category>cs.LG</category>
      <category>cs.SI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Michał Czuba, Mateusz Stolarski, Adam Piróg, Piotr Bielak, Piotr Bródka</dc:creator>
    </item>
    <item>
      <title>Understanding the Sociocultural Dimensions of Mental Health Discourse in Arabic-Language X Communities</title>
      <link>https://arxiv.org/abs/2606.08307</link>
      <description>arXiv:2606.08307v1 Announce Type: new 
Abstract: Computational mental health research has predominantly centered on English-speaking populations, leaving Arabic-language discourse comparatively under-examined. We present an exploratory computational study of 8,147 tweets from 607 users classified by a GPT-4.1 personal-disclosure pipeline as likely lived-experience authors in three condition-specific Arabic-language X (formerly Twitter) Communities. We focus on discourse related to borderline personality disorder (BPD), bipolar disorder, and ADHD, and characterize community-associated linguistic patterns using a multi-domain cultural keyword framework. The results suggest that in this corpus, Bipolar tweets contain more religious and medical vocabulary, BPD tweets contain more relational, identity, and emotional-distress vocabulary, and ADHD tweets more often focus on practical symptoms and medication management. We treat these patterns as hypothesis-generating rather than confirmatory because the corpus is imbalanced across conditions, some subcorpora are temporally concentrated, and the keyword framework is an initial operationalization rather than a validated measurement instrument. The paper contributes a reusable LLM-assisted personal-disclosure pipeline and an exploratory cultural keyword framework for Arabic mental health discourse.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08307v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Amal Alqahtani (King Saud University, Riyadh, Saudi Arabia), Rana Salama (Cairo University, Egypt), Mona Diab (Carnegie Mellon University, Pittsburgh, USA)</dc:creator>
    </item>
    <item>
      <title>Fourier fractal dimension to predict the generalization of deep neural networks</title>
      <link>https://arxiv.org/abs/2606.08308</link>
      <description>arXiv:2606.08308v1 Announce Type: new 
Abstract: Predicting the generalization performance of deep neural networks without relying on hold-out validation data is a fundamental challenge in machine learning. While Stochastic Gradient Descent (SGD) drives the optimization of these highly parameterized models, its heavy-tailed, non-Gaussian dynamics induce complex, scale-invariant trajectories in the parameter space. In this paper, we propose a novel generalization measure based on the Fourier fractal dimension of the network's weight variations. By analyzing the characteristic function of the Lévy-driven stochastic differential equations in the frequency domain, we extract a metric that robustly captures the geometric complexity of the learning process. Furthermore, we introduce a customized Fourier-based optimizer designed to actively regularize this fractal dimension during training. Extensive empirical evaluations on the CIFAR-10, SVHN, and MNIST datasets demonstrate that our proposed Fourier generalization measure exhibits a strong correlation with the actual generalization gap. Our method achieves state-of-the-art Kendall rank correlation coefficients, outperforming a wide array of existing norm-based, margin-based, and PAC-Bayesian measures. Ultimately, this work highlights the potential of frequency-domain fractal analysis as both a powerful predictor for model generalizability and a principled foundation for developing more stable optimization algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08308v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Joao B. Florindo, Davi Wanderley Misturini</dc:creator>
    </item>
    <item>
      <title>Where the Score Lives: A Wavelet View of Diffusion</title>
      <link>https://arxiv.org/abs/2606.08309</link>
      <description>arXiv:2606.08309v1 Announce Type: new 
Abstract: Score-based generative models have had remarkable success over the last decade in generating a diverse set of visually plausible images. A variety of architectures including CNNs, U-Nets, and Transformers have been used as the score-approximation network in such diffusion modeling; however, to date, relatively little is known about how these architectural choices impact generative behavior. In this work, to provide insight into this area, we propose an analytically solvable parameterization of the score function using an expansion in a 2D orthogonal wavelet basis. In particular, we derive interpretable optimal score functions in terms of the moments of the data distribution. We use this parameterization to provide an architecture-agnostic, moment-based analysis that reveals which attributes of the data distribution tend to matter most for denoising. Our score machine is flexible enough to partially mimic the relevant inductive biases of multiple architectures, including U-Nets and CNNs, taking a step towards understanding why different score architectures can exhibit distinct generative behavior. Since our score is solvable in terms of the moments of the data, we can begin to understand how the data distribution interacts with the score network to produce the behavior we observe in diffusion models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08309v1</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <arxiv:journal_reference>Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (AISTATS) 2026, Tangier, Morocco. PMLR: Volume 300</arxiv:journal_reference>
      <dc:creator>Emma Finn, Binxu Wang, T. Anderson Keller, Demba E. Ba</dc:creator>
    </item>
    <item>
      <title>To Nuke or Not to Nuke: LLMs' (Missing) Ethical Reasoning and Actions in a High-Stakes Decision-Making Simulation</title>
      <link>https://arxiv.org/abs/2606.08310</link>
      <description>arXiv:2606.08310v1 Announce Type: new 
Abstract: Large language models (LLMs) are increasingly deployed as long-horizon agents with decision-making capacities. While LLMs can show ethical competence on dilemmas such as trolley problems, this competence may not translate to complex, agentic scenarios. We study this gap in Civilization V, a multiplayer game with a complex decision-making landscape including economy, diplomacy, technology, and military strategy. Starting from 130 high-tension LLM self-play episodes, in which an LLM player spontaneously escalated nuclear authorization, we replay them across 13 models with three prompt interventions: an ethical prompt naming nuclear harm, removal of the previous model's decision-making rationale, and high-stakes framing emphasizing real-world impacts. Neither the interventions nor their combinations reliably eliminate emergent escalation. We identify three failure pathways: ethical reasoning that fails to surface without prompting, fails to appear even when prompted, or surfaces but fails to take effect when strategic counter-factors dominate. Evaluations of agentic models, therefore, must test whether ethical reasoning is spontaneously invoked and behaviorally effective in complex decision-making contexts, beyond whether it can be elicited in isolation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08310v1</guid>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>John Chen, Sihan Cheng, Can Gurkan, H M Abdul Fattah</dc:creator>
    </item>
    <item>
      <title>Curation of a Cardiology Interface Terminology for Highlighting Electronic Health Records using Machine Learning</title>
      <link>https://arxiv.org/abs/2606.08311</link>
      <description>arXiv:2606.08311v1 Announce Type: new 
Abstract: Electronic health record (EHR) notes are dense medical documents containing large amounts of information, often filled with complex medical jargon. Highlighting all details in EHRs helps reduce the likelihood of missing crucial information by drawing attention to key content. This study proposes the design of a Cardiology Interface Terminology (CIT) to accurately highlight all details in EHR notes of cardiology patients. We introduce an innovative Machine Learning (ML) technique for the design of the CIT. The ML technique requires training data, and manual preparation of such training data is time-consuming and expensive. The CIT design process includes three phases. In the first two phases, we derive a training data CIT to be used by the ML technique in the third phase. We start by designing an initial CIT, composed of several components: the cardiology-related sub-hierarchies of SNOMED, other SNOMED concepts mined from the EHRs of the build set, and necessary components of terms, e.g., medical abbreviations and medications. Utilizing an iterative process, fine-grained phrases containing initial CIT concepts are extracted from the build set as CIT concept candidates. The candidate concepts are semi-automatically reviewed before being added to the CIT, yielding the training data CIT, TCIT. In the third phase, an ML model is trained with TCIT to identify candidates fit to be concepts in the CIT. This model is used to extract further concepts from the build set, yielding the final CIT. The final CIT is then used to highlight the test set and evaluate the extent to which it captures details in an unseen EHR dataset. For this purpose, four evaluation metrics (coverage, breadth, completeness, and conciseness) are used. The highlighted test set has a coverage of 74.21%, with a breadth of 1.68. For 20 random notes in the test set, the average completeness is 98.2% and the average conciseness is 84.2%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08311v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mahshad Koohi Habibi Dehkordi, Shuxin Zhou, Yehoshua Perl, Fadi P. Deek, James Geller, Gai Elhanan, Andrew J. Einstein, Luke Lindemann, Vipina K. Keloth</dc:creator>
    </item>
    <item>
      <title>Neuro-Symbolic Injection of LTLf Constraints in Autoregressive Reinforcement Learning Policies</title>
      <link>https://arxiv.org/abs/2606.08312</link>
      <description>arXiv:2606.08312v1 Announce Type: new 
Abstract: In this work we study offline reinforcement learning (RL) under temporally extended task constraints expressed in Linear Temporal Logic over finite traces (LTLf). Recently, transformer-based approaches such as Trajectory Transformers and Decision Transformers have been adopted to address RL as a sequence modeling problem. However, these methods optimize purely for reward and do not account for high-level temporal requirements. Here, we introduce a neurosymbolic framework that injects LTLf background knowledge into such transformer-based RL policies. Our approach compiles LTLf formulas into deterministic finite automata (DFAs) and integrates them into the learning process through a differentiable representation and a logic-based loss function. In particular, we derive differentiable satisfaction signals from DFA progression and use them as a regularization term during training. The resulting method is architecture-agnostic across different models. We evaluate the proposed framework on navigation environments with specification suites covering combinations of safety and reachability temporal properties. Experimental results show that incorporating background knowledge not only improves constraint satisfaction, but also maintains competitive return compared to vanilla baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08312v1</guid>
      <category>cs.AI</category>
      <category>cs.FL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ashkan Ansarifard (Sapienza University of Rome), Matteo Mancanelli (Sapienza University of Rome), Elena Umili (Sapienza University of Rome), Fabio Patrizi (Sapienza University of Rome)</dc:creator>
    </item>
    <item>
      <title>Integrating Deep Learning Demand Forecasting with Multi-Objective Optimization for Circular Coffee Supply Chains: A Data-Driven Framework for Cost, Emissions, and Freshness Management</title>
      <link>https://arxiv.org/abs/2606.08314</link>
      <description>arXiv:2606.08314v1 Announce Type: new 
Abstract: The coffee supply chain is one of the most complex agri-food networks, marked by geographically dispersed production, multi-tier coordination, and high sensitivity to quality and freshness. While sustainability and digitalization have gained attention, demand forecasting, optimization, and traceability are often treated separately. This study presents a two-phase integrated framework. First, a hybrid CNN-LSTM model is used for demand forecasting. On the public Coffee Chain Sales dataset with chronological 70/15/15 splitting, the model achieves MAE of 22.87 and R^2 of 0.90, outperforming the best deep learning benchmark by ~12% and classical methods by over 30%. In the second phase, the forecasted demand feeds a tri-objective mixed-integer linear programming (MILP) model that jointly minimizes cost, minimizes carbon emissions, and maximizes product freshness in a multi-period, multimodal, closed-loop supply chain with circular recovery. Freshness is modeled via exponential decay based on inventory age. Using the epsilon-constraint method, 25 Pareto solutions are obtained. Sensitivity and policy analyses show that balanced sustainability policies can reduce emissions by 22.4% with only a 9.9% cost increase while maintaining near-optimal freshness.
  Keywords: Coffee supply chain; Deep learning; Demand forecasting; Multi-objective optimization; Circular economy; CNN-LSTM; Mixed-integer linear programming.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08314v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Gerçek Budak (Department of Industrial Engineering, Ankara Yıldırım Beyazıt University, Keçiören, Ankara 06010, Türkiye), Faraz Gholamzadeh Gharehgheshlaghi (Department of Industrial Engineering, Ankara Yıldırım Beyazıt University, Keçiören, Ankara 06010, Türkiye), Melika Barjesteh Vaezi (Department of Kinesiology and Sport Management, Texas Tech University, Lubbock, TX, United States), Ahmad Gholizadeh Lonbar (Department of Civil, Construction, and Environmental Engineering, University of Alabama, Tuscaloosa, AL, USA)</dc:creator>
    </item>
    <item>
      <title>Benchmarking Sequential Feedback Optimization for Wind Farm Power Maximization</title>
      <link>https://arxiv.org/abs/2606.08315</link>
      <description>arXiv:2606.08315v1 Announce Type: new 
Abstract: This paper benchmarks sequential feedback optimization (SFO) for wind farm power maximization using a medium-fidelity dynamic flow model. We compare SFO with two well-established approaches, adjoint-based economic model predictive control (AMPC) and extremum seeking control (ESC), under a common nine-turbine layout and identical operating constraints. The comparison focuses on steady-state power production and computational efficiency, both relevant for real-time implementation. The simulation results illustrate that SFO achieves higher steady-state power while preserving real-time feasibility, AMPC provides a better transient performance at a higher online computational cost and without guarantees of convergence to the steady-state optimum, and ESC offers a computationally inexpensive model-free baseline that may converge to locally optimal solutions. These results provide a practical reference for selecting wind farm control strategies and for designing scalable, real-time optimization methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08315v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shijie Huang, Sergio Grammatico</dc:creator>
    </item>
    <item>
      <title>Architectural Evolution and Selection Framework for Database Systems in AI-Ready Data Platforms</title>
      <link>https://arxiv.org/abs/2606.08317</link>
      <description>arXiv:2606.08317v1 Announce Type: new 
Abstract: The rise of polyglot data management and AI-ready database architectures has created a complex design space across diverse database paradigms. However, architecture selection in modern enterprise environments continues to rely heavily on ad-hoc engineering intuition, with limited systematic frameworks to guide decision-making across heterogeneous database systems. This paper introduces a unified cross-paradigm evaluation and selection framework for database architecture design in AI-ready data platforms. The framework is based on nine architectural dimensions and incorporates a structured multi-stage selection process involving workload characterization, constraint filtering, and compatibility scoring to enable systematic comparison and decision-making. To ground the framework, we conduct a structured comparative analysis across thirteen major database paradigms spanning transactional, analytical, and AI-oriented systems. This analysis reveals three recurring patterns in database evolution: decoupling of storage and compute, workload-driven specialization, and convergence toward integrated AI-ready platforms. The proposed framework is demonstrated through a representative enterprise case study in financial fraud detection, illustrating how hybrid, polyglot architectures emerge as optimal solutions for multidimensional workload requirements. The cross-paradigm analysis culminates in an AI-ready reference architecture that integrates lakehouse storage, feature processing, and semantic retrieval layers as the unified substrate for modern analytics, machine learning, and Retrieval-Augmented Generation applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08317v1</guid>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mohit Srivastava</dc:creator>
    </item>
    <item>
      <title>Orthogonality and Dimensionality in Airline Cluster Analysis using PCA and Kernel PCA</title>
      <link>https://arxiv.org/abs/2606.08322</link>
      <description>arXiv:2606.08322v1 Announce Type: new 
Abstract: To characterize the US airline profit cycles from 1995 to 2020, Renold et al. (2023) combine k-means clustering, principal component analysis, and system dynamics modelling. We replicate their clustering experiment in three spaces -- the original 7-dimensional raw-variable space, a 3-dimensional PC score space, and a 4-dimensional PC score space -- using the dataset that they kindly included in their paper. We show that the six-cluster taxonomy is geometrically robust: k-means in 3-PC space produces bit-for-bit identical cluster assignments relative to 7D raw space. As a nonlinearity check we apply kernel PCA under six kernels spanning three families plus a linear baseline. All six kernels preserve the six-cluster assignment in 2D. A 1D diagnostic tightens this: the linear kernel conflates the COVID year C_3 with the peak-profit cluster C_0, whereas all five non-baseline kernels shift C_3 to overlap only the post-financial-crisis cluster C_5. Agreement across the kernel families confirms an intrinsically linear manifold with no hidden curvature. The silhouette criterion reveals that the dataset structurally supports only three clusters, not six. Collinearity in the raw 7D space suppresses the silhouette signal that would otherwise identify k=3 as the structurally motivated choice.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08322v1</guid>
      <category>cs.LG</category>
      <category>stat.ME</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Andreas Schlapbach</dc:creator>
    </item>
    <item>
      <title>"So There's a Catch-22 Here": How Early Adopters Who Build Multi-Agent LLM Systems Conceptualize Transparency</title>
      <link>https://arxiv.org/abs/2606.08323</link>
      <description>arXiv:2606.08323v1 Announce Type: new 
Abstract: Multi-agent large language model (LLM) systems are rapidly emerging, yet transparency, a cornerstone of responsible AI, remains under-defined in these distributed architectures, which add the complexities of inter-agent coordination and orchestration. In this paper, we present one of the first empirical studies of how early adopters of multi-agent LLM systems, who are both the builders and the users, understand and practice transparency. We conducted semi-structured interviews with 13 early adopters in [Large Technology Organization] and applied thematic analysis to identify recurring patterns. Participants articulated divergent yet complementary framings of transparency, including reproducibility, debugging, boundary-setting, visualization, and auditing. These perspectives spanned questions of what transparency entails, why it matters, and how it is achieved. We synthesize these into a multidimensional framework with developer-, user-, and governance-focused dimensions, positioning transparency as a situated socio-technical practice. The framework informs future HCI and AI design and research on aligning transparency with the expectations and capacities of its intended audiences.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08323v1</guid>
      <category>cs.HC</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Suchismita Naik, Samir Passi, Mihaela Vorvoreanu, Scott Saponas, Amanda Hall</dc:creator>
    </item>
    <item>
      <title>Set-Based Transformer for Atmospheric Compensation in Standoff LWIR Hyperspectral Imaging</title>
      <link>https://arxiv.org/abs/2606.08324</link>
      <description>arXiv:2606.08324v1 Announce Type: new 
Abstract: Passive long-wave infrared (LWIR) hyperspectral imaging under a standoff geometry depends on atmospheric absorption and emission, as well as reflected radiance, which makes atmospheric compensation essential for characterizing a target of interest. Despite its importance, this compensation has been largely overlooked because of its practical and modeling difficulty. In this paper, we present a lightweight set-based deep learning framework that takes multiple radiance measurements, collected at different standoff ranges, as input and jointly estimates transmittance, atmospheric path radiance, and a shared downwelling spectrum. We analyze the learned representation with a sparse autoencoder and observe that several latent features activate on geographically coherent subsets of the test data despite the absence of location supervision. Experiments on a MODTRAN-generated standoff LWIR dataset demonstrate low spectral distortion across all estimated products. The dataset and code are publicly available at: https://factral.co/SAE-LWIR/</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08324v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Fabian Perez, Nicolas Quintero, Jeferson Acevedo, Hoover Rueda-Chacon</dc:creator>
    </item>
    <item>
      <title>Chiaroscuro Attention: Spending Compute in the Dark</title>
      <link>https://arxiv.org/abs/2606.08327</link>
      <description>arXiv:2606.08327v1 Announce Type: new 
Abstract: Standard transformers apply self-attention uniformly at every layer and token, regardless of whether the input requires dynamic cross-token interaction. We propose CHIAR-Former (Chiaroscuro Attention), a 4-layer hybrid transformer that routes each token to one of three operators - DCT spectral mixing, RBF kernel mixing, or full self-attention - based on per-token spectral entropy, a theoretically justified complexity signal. Through systematic ablation on WikiText-103, we discover routing collapse: the router consistently rejects RBF in favour of DCT and attention, revealing that spectral mixing and dynamic attention are complementary and sufficient. A purpose-designed DCT+Attention-only variant achieves Val PPL 36.54 on WikiText-103 - a 45% improvement over a full-attention baseline (PPL 66.62) at 62.5% fewer attention FLOPs. We extend evaluation to WikiText-2, IMDB sentiment classification, and synthetic ListOps operations, establishing a clear operating regime: CHIAR-Former excels on large-scale naturalistic text where token diversity supports spectral specialisation, while full attention retains an edge on small datasets and synthetic pattern-matching tasks. These findings - both the wins and the losses - together define when and why spectral routing earns its keep.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08327v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Prateek Kumar Sikdar</dc:creator>
    </item>
    <item>
      <title>Optimal Online Equitable Allocation with Indivisible Resources</title>
      <link>https://arxiv.org/abs/2606.08328</link>
      <description>arXiv:2606.08328v1 Announce Type: new 
Abstract: Equitable allocation of indivisible goods to agents in online settings is an algorithmic primitive with applications for load balancing, network routing, online marketplaces, and multi-agent systems. We consider a general setting in which allocations are constrained to be bases of discrete polymatroids that arrive online.
  Our work demonstrates that a simple, myopic algorithm called Brick-Laying, which greedily minimizes the sum of squared loads on agents, achieves a universal and objective-free notion of optimality called majorization minimax-optimality [BDK26] for this setting. As a consequence, Brick-Laying simultaneously guarantees minimax optimal competitive ratios and regret for all Schur-concave and Schur-convex objectives, and for any number of agents and resources (despite being agnostic to problem scale).
  Departing from popular primal-dual analysis, we employ majorization to compare allocations. We leverage the conjugates of integer partitions -- which act as a discrete dual to majorization -- to characterize worst-case instances for the Brick-Laying algorithm. Our approach reveals a novel structural connection between the geometry of partitions and online equitable allocation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08328v1</guid>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ramiro N. Deo-Campo Vuong</dc:creator>
    </item>
    <item>
      <title>SMI: Efficient Self-Supervised Learning via Mutual-Information-Inspired Dependency Optimization</title>
      <link>https://arxiv.org/abs/2606.08332</link>
      <description>arXiv:2606.08332v1 Announce Type: new 
Abstract: Self-supervised learning (SSL) has achieved remarkable representation learning performance, but many existing methods rely on large batch sizes, memory banks, momentum encoders, or global synchronization mechanisms that substantially increase computational cost and training complexity. In this work, we propose Semantic Mutual Information (SMI), a lightweight self-supervised objective derived from a mutual-information-inspired dependency formulation under Gaussian assumptions. Unlike conventional correlation matching objectives that operate on high-dimensional feature correlation matrices, SMI performs optimization on a sample-level dependency matrix through a nonlinear transformation of pairwise correlations. This formulation induces distinct optimization dynamics that emphasize strongly dependent semantic pairs while maintaining representation diversity. Experimental results on ImageNet using a ResNet-50 backbone demonstrate that SMI achieves competitive linear evaluation performance relative to state-of-the-art SSL approaches while substantially reducing computational complexity. Across multiple low-resource benchmarks, SMI consistently improves transfer performance over Barlow Twins, particularly on fine-grained datasets. Furthermore, analyses of optimization dynamics and representation geometry suggest improved alignment--redundancy balance, greater feature diversity, and more spatially localized semantic representations. These results indicate that nonlinear dependency optimization provides an effective and computationally efficient alternative to conventional correlation-based self-supervised learning objectives.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08332v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pritam Mishra, Coloma Ballester, Dimosthenis Karatzas</dc:creator>
    </item>
    <item>
      <title>Beyond Raw Signals: Undecoded Generative Latents as Privileged Synthetic Data</title>
      <link>https://arxiv.org/abs/2606.08336</link>
      <description>arXiv:2606.08336v1 Announce Type: new 
Abstract: While multimodal integration significantly improves computer vision models, deploying them incurs prohibitive inference costs and requires scarce, perfectly paired datasets. Recent methods address this data bottleneck by synthesizing missing modalities via generative AI, yet they introduce a severe inefficiency: the Decode-Encode Loop. Specifically, information-rich generative latents are decoded into noisy raw signals, forcing the downstream classifier to waste capacity re-encoding them. To bypass this bottleneck, we propose Direct Latent Augmentation (DLA), utilizing undecoded generative latents directly as privileged information. Furthermore, to transfer this dense knowledge to a purely visual student, we introduce Multilayer Explicit Simulated Synesthesia (MESSy). Instead of enforcing rigid representation matching, which forces the student to distort its native visual features to accommodate complex multimodal topologies, MESSy uses a predictive objective to safely internalize these physical priors. Empirical results demonstrate that our framework significantly outperforms raw data augmentation and traditional distillation. Ultimately, our approach yields highly accurate unimodal students with "synesthetic" latent structures that are inherently aligned with physical properties they have never directly observed.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08336v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Cristian Sbrolli, Nicolas Michel, Matteo Matteucci, Toshihiko Yamasaki</dc:creator>
    </item>
    <item>
      <title>Floating-point autotuning with customized precisions</title>
      <link>https://arxiv.org/abs/2606.08339</link>
      <description>arXiv:2606.08339v1 Announce Type: new 
Abstract: Reduced-precision arithmetic offers significant opportunities to improve performance, memory usage, and energy efficiency in numerical applications, provided that numerical accuracy is preserved. This work investigates automated precision tuning through customized floating-point formats with user-defined exponent and significand sizes, enabling the emulation of emerging low-precision formats and the exploration of non-standard precision configurations within a unified mixed-precision framework. The proposed methodology, implemented in the PROMISE precision autotuning tool, combines numerical validation with a systematic search to generate program variants that satisfy user-defined accuracy requirements. To address the computational cost of this exploration, a containerized benchmarking framework supports parallel execution across multiple algorithms and parameter configurations. The approach is evaluated on a suite of numerical programs, including linear solvers and applications from the Rodinia benchmark. Results show that a substantial proportion of variables can be safely reduced to lower precision while preserving accuracy, indicating that standard double precision is often over-provisioned. These findings highlight the potential of automated precision tuning to derive efficient mixed-precision configurations tailored to application-specific accuracy requirements.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08339v1</guid>
      <category>cs.MS</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xinye Chen, Thibault Hilaire, Fabienne Jézéquel</dc:creator>
    </item>
    <item>
      <title>Benchmarking Open-Ended Multi-Agent Coordination in Language Agents</title>
      <link>https://arxiv.org/abs/2606.08340</link>
      <description>arXiv:2606.08340v1 Announce Type: new 
Abstract: As language models are increasingly deployed as autonomous agents, they must coordinate with others over long horizons in open-ended interactive tasks. Yet existing evaluations rarely test these demands together, instead emphasising single-agent tasks, short interactions, or highly structured multi-agent settings. We introduce alem, a JAX-based benchmark for open-ended multi-agent coordination built on Craftax-like dynamics. Alem embeds procedurally generated coordination tasks, soft specialisation, communication, and controllable coordination difficulty into a long-horizon survival world with exploration, crafting, trading, and combat. We evaluate 13 modern LLMs zero-shot within homogeneous teams, with trained MARL agents as reference points. Current LLM agents remain far from solving alem, averaging only ~6% normalised return, but their failures are not uniform. On the hardest coordination setting, zero-shot Gemini-3.1-Pro-High approaches MARL agents trained for one billion steps, while GPT-5.4-High achieves strong base-task reward but much lower coordination reward. This contrast shows that individual task competence does not imply coordination competence. Ablations show that communication is the largest contributor to coordination, while memory and reasoning help when used to maintain multi-step plans. Overall, our results identify coordination as a distinct bottleneck for frontier LLM agents, separate from single-agent capabilities. Alem makes this bottleneck measurable and provides a controlled testbed for developing agents that communicate, allocate roles, and execute shared plans. Code is available at https://github.com/alem-world/alem-env.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08340v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kale-ab Abebe Tessera, Andras Szecsenyi, Cameron Barker, Alexander Rutherford, Davide Paglieri, Aidan Scannell, Henry Gouk, Elliot J. Crowley, Tim Rocktäschel, Amos Storkey</dc:creator>
    </item>
    <item>
      <title>Uncertainty-Aware Intention Prediction for Human-to-Robot Assembly Teleoperation</title>
      <link>https://arxiv.org/abs/2606.08341</link>
      <description>arXiv:2606.08341v1 Announce Type: new 
Abstract: In assisted teleoperation for human-robot collaboration, accurate intention prediction is critical for enabling timely and reliable robotic assistance during long-horizon manipulation and assembly tasks. These systems require continuous understanding of user behavior to recognize actions, anticipate intentions, and detect mistakes in real time. However, robot teleoperation demonstrations are costly and hardware-limited, whereas human demonstrations are easier to collect and provide rich temporal structure. To address this challenge, we propose an uncertainty-aware human-to-robot intention prediction framework that combines: (1) hierarchical transfer learning, where MS-TCN++ is pretrained on human hand demonstrations and fine-tuned on limited robot teleoperation data to capture low-level actions and high-level task intentions; (2) a conformal prediction module that provides frame-level prediction sets with statistical coverage guarantees for reliable uncertainty quantification and early intention estimation; and (3) VLM-guided segment correction, which selectively reviews low-confidence or temporally uncertain segments using visual and temporal context. The framework supports action recognition, temporal segmentation, intention anticipation, and mistake detection for assisted teleoperation. Experiments on robot assembly demonstrations with 22 action classes show that human-to-robot fine-tuning improves the robot test-set Edit score from 70.50 to 80.70 using only 16 robot demonstrations. Edit-safe VLM correction further improves frame accuracy from 45.21% to 46.42% and increases F1@25 and F1@50 while preserving the Edit score. These results show that human demonstrations provide scalable pretraining data for robust, uncertainty-aware robot action segmentation. Code and data: project website.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08341v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Fnu Heman, Yixuan Wang, Kolin Xu, Conner Wallace, John Dang, Akhil Joshi, Jun Sheng, Pinhas Ben-Tzvi, Mingyu Cai</dc:creator>
    </item>
    <item>
      <title>GENERIC-FNO: Embedding Energy Conservation and Entropy Production into Fourier Neural Operators</title>
      <link>https://arxiv.org/abs/2606.08343</link>
      <description>arXiv:2606.08343v1 Announce Type: new 
Abstract: We introduce GENERIC-FNO, the first neural operator to embed the full GENERIC (metriplectic) structure of nonequilibrium thermodynamics -- reversible, energy-conserving dynamics and irreversible, entropy-producing dynamics coupled through the degeneracy conditions -- directly in function space. Existing structure-preserving neural operators enforce at most a single conservation law or reversible (Hamiltonian) structure, while thermodynamically consistent learning has been confined to finite-dimensional, graph, or particle systems. GENERIC-FNO closes this gap: it learns the energy and entropy functionals as neural operators and parameterizes the Poisson and friction operators as diagonal Fourier multipliers sandwiched between rank-one projections that enforce the degeneracy conditions exactly, by construction, with no penalty term, update projection, or residual. The degeneracy identities hold to machine precision (residuals ~10^-13) for any initialization, dimension, or resolution, so the continuous-time dynamics conserve the learned energy and produce entropy exactly; the explicit time stepping adds only a small O(dt^2) drift (per-step residual ~10^-6). We further note that the (E,S,L,M) decomposition of a given flow is not unique, and introduce a gauge-invariant dissipation diagnostic separating reversible from dissipative dynamics independently of the learned functionals. Across three operator backbones (1D/2D FNOs and DeepONet) and four PDEs spanning reversible, dissipative, and mixed regimes, GENERIC-FNO preserves its exact structural guarantees zero-shot across a 4x super-resolution range (64 to 256), recovers the ground-truth ordering of physical dissipation, and is competitive with strong unconstrained and energy-penalized baselines, outperforming them on several dissipative and mixed problems at comparable or fewer parameters.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08343v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jason Sulskis, Sathya Ravi</dc:creator>
    </item>
    <item>
      <title>AuditFraudBench: Benchmarking Audit Judgment in Detecting Fraudulent Misstatements</title>
      <link>https://arxiv.org/abs/2606.08345</link>
      <description>arXiv:2606.08345v1 Announce Type: new 
Abstract: Large language models (LLMs) have shown strong performance in financial analysis and surface-level factual error detection, yet their ability to identify fraudulent financial misinformation in audited corporate reporting remains underexplored. Existing financial and audit benchmarks mainly focus on factual verification, numerical reasoning, rule compliance, or audit workflows, but rarely evaluate misleading disclosure narratives or management explanations that obscure the true drivers of reported performance. We introduce AuditFraudBench, an enforcement-grounded benchmark constructed from authentic company filings and regulatory materials, including original and restated 10-K and 10-Q filings, structured financial statements, MD&amp;A disclosures, and SEC Accounting and Auditing Enforcement Releases (AAERs). AuditFraudBench contains three tasks: Profit Source Attribution, Misleading Narrative Detection, and Fraud Pattern Classification, which evaluate whether models can identify the true source of reported performance, detect misleading disclosure framing, and classify misconduct mechanisms into known manipulation patterns. We evaluate GPT, DeepSeek, and Qwen series LLMs on the benchmark. Results show that both proprietary and open models still struggle to jointly reason over financial figures, disclosure framing, restatement evidence, and enforcement-grounded fraud mechanisms. AuditFraudBench provides a challenging testbed for audit-relevant, evidence-grounded evaluation of LLMs in financial reporting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08345v1</guid>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhiwei Liu, Yueru He, Qing Ou, Tianlei Zhu, Xiaorui Guo, Xueqing Peng, Sophia Ananiadou</dc:creator>
    </item>
    <item>
      <title>CATPO: Critique-Augmented Tree Policy Optimization</title>
      <link>https://arxiv.org/abs/2606.08346</link>
      <description>arXiv:2606.08346v1 Announce Type: new 
Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving the reasoning capabilities of large language models (LLMs). Recent tree-based methods such as TreeRPO extend flat trajectory sampling with tree-structured rollouts to obtain dense, step-level reward signals without a separate process reward model. However, not all trees are equally informative: trees where all leaves succeed, all leaves fail, or the policy already predicts the reward distribution contribute little to gradient updates, wasting compute. We introduce CATPO (Critique-Augmented Tree Policy Optimization), which diagnoses and addresses this waste at the tree level. CATPO first scores each tree via a tree informativeness score, F(T), combining leaf-outcome diversity with policy-reward decorrelation at zero extra compute. For dead-wrong trees where all branches fail, CATPO applies critique-guided healing: it locates the shallowest failure point, generates a natural-language critique, and grafts refined continuations to recover training signal. Finally, an informativeness-weighted loss scales each tree's gradient contribution by its normalized score, concentrating parameter updates on the most informative trees while preserving overall gradient magnitude. Experiments on Qwen2.5-Math-1.5B trained with the MATH dataset show that CATPO achieves 37.5% macro accuracy across four benchmarks (AIME24, MATH-500, OlympiadBench, and MinervaMath), improving over TreeRPO by 1.9% and GRPO by 4.8%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08346v1</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ayush Singh, Umang Goyal, Ankur Dahiya</dc:creator>
    </item>
    <item>
      <title>Tensorizing Engram: Sharing Latents Across N-Gram Embeddings is Beneficial in LLMs</title>
      <link>https://arxiv.org/abs/2606.08347</link>
      <description>arXiv:2606.08347v1 Announce Type: new 
Abstract: Modern language models represent text using discrete token-level embeddings, which forces recurring multi-token patterns to be learned implicitly across Transformer layers. Both Over-tokenized Transformers and Engram attempt to address this limitation by explicitly incorporating multi-token (n-gram) memories. However, they rely on separate hash tables for each n-gram order, which introduces hash collisions and prevents nested n-grams from sharing underlying latent structures. To address these issues, we propose Tensorized Engram (TN-gram), a compact memory module that represents tensorized n-gram embeddings through shared factors in the Canonical Polyadic (CP) form. TN-gram learns shared token-position factors together with order-absorption vectors to encode the embeddings of different n-gram orders. Comprehensive experiments demonstrate that TN-gram matches or even outperforms Engram-style n-gram modules while requiring far fewer parameters.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08347v1</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wuyang Zhou, Yuxuan Gu, Giorgos Iacovides, Yuning Qiu, Qibin Zhao, Danilo Mandic</dc:creator>
    </item>
    <item>
      <title>Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses</title>
      <link>https://arxiv.org/abs/2606.08348</link>
      <description>arXiv:2606.08348v1 Announce Type: new 
Abstract: LLM agents increasingly rely on external inference conditions: prompts, tools, memory, SOPs, skills, and harness feedback. These assets can improve task execution without changing model weights, but they are often revised by heuristic reflection or by reusing observed successes and failures as if counts alone were reliable belief. We introduce Bayesian-Agent, a native and cross-harness framework that treats reusable skills and SOPs as hypotheses about whether a frozen model will succeed under a particular prompt, context, and harness environment. Bayesian-Agent records verified trajectory evidence, maintains a feature-conditioned categorical posterior over each skill, and maps posterior state into inspectable actions such as patch, split, compress, retire, and explore. Model-facing prompts receive executable guardrails and failure-mode patches, while posterior summaries remain available for audit. With deepseek-v4-flash, incremental repair improves SOP-Bench from 80% to 95%, Lifelong AgentBench from 90% to 100%, and RealFin-Bench from 45% to 65%. We further evaluate Bayesian-Agent's native backend and optional GenericAgent, mini-swe-agent, and Claude Code backends. The results include positive, negative, saturated, and case-study settings, suggesting that agent skill evolution is best viewed as posterior-guided harness optimization rather than uncalibrated prompt accumulation. The source code is available at https://github.com/DataArcTech/Bayesian-Agent.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08348v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiaojun Wu, Cehao Yang, Honghao Liu, Xueyuan Lin, Wenjie Zhang, Zhichao Shi, Xuhui Jiang, Chengjin Xu, Jia Li, Jian Guo</dc:creator>
    </item>
    <item>
      <title>Forward-Free Diffusion Language Models</title>
      <link>https://arxiv.org/abs/2606.08357</link>
      <description>arXiv:2606.08357v1 Announce Type: new 
Abstract: Diffusion language models generate text through iterative denoising, offering a powerful alternative to autoregressive generation. However, discrete language spaces lack a natural neighborhood structure for defining effective perturbations, so the forward process typically relies on artificial corruption schemes. Such prescribed forward processes often produce states that are mathematically convenient but misaligned with drafts and errors encountered during generation, resulting in degraded sample quality. To address this limitation, we propose FReDA, a forward-free diffusion language model that eliminates the need for a hand-designed forward process. We formulate diffusion language modeling as recursive distribution refinement, in which model-generated drafts serve as implicit intermediate states, and the learned refinement model progressively moves the draft distribution toward the target distribution. Concretely, FReDA refines drafts by proposing candidate draft sequences and either directly performing self-refinement or selecting among parallel candidates via best-of-N refinement. With this design, FReDA is neighborhood-agnostic, model-complexity-aware, and compatible with flexible refinement parameterizations. Extensive evaluations in the sub-8B regime show that FReDA-4B outperforms larger diffusion base models on reasoning and coding benchmarks, achieving absolute gains of up to 15%, while reaching a 1.5-1.8x average speedup over diffusion baselines and scaling effectively with additional refinement computation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08357v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haotian Sun, Rushi Qiang, Yuqian Zheng, Bo Dai</dc:creator>
    </item>
    <item>
      <title>Generative Frontier Planning for Adaptive Peer-Referral Recruitment under Covariate-Dependent Arrivals</title>
      <link>https://arxiv.org/abs/2606.08360</link>
      <description>arXiv:2606.08360v1 Announce Type: new 
Abstract: Peer-referral recruitment systems such as respondent-driven sampling are critical for studying and intervening on hidden populations affected by infectious diseases. To accelerate recruitment, public health agencies must adaptively allocate limited referral resources across multiple rounds, where current decisions shape both the number and the covariates of future recruits. Prior work makes this problem tractable by assuming that referrals are drawn i.i.d.\ from a homogeneous population, an assumption that ignores the homophily and shared context that drive real peer recruitment. We instead consider a more realistic model in which both referral capacity and the covariates of newly referred individuals are conditioned on the referrer, learned from data with a censored count model and a conditional generative model. The resulting planning problem is challenging because each candidate allocation induces a different distribution over future recruits. We propose \emph{Generative Frontier Planning} (GFP), a model-based planner that replaces per-step Monte-Carlo sampling with a deterministic backup over a latent covariate-coverage value surrogate. The surrogate is designed so that the expected value of the next frontier depends on the offspring generative model only through finite-dimensional summaries that are amortized offline, and so that the resulting per-round objective is monotone with diminishing returns. Together, these two properties make planning tractable: the deterministic backup eliminates Monte-Carlo sampling, and the diminishing-returns structure lets a marginal greedy allocation achieve a \((1-1/e)\)-approximation for the per-round problem. On a simulation environment calibrated to a real respondent-driven sampling dataset, GFP outperforms random, reinforcement-learning, and i.i.d.\ dynamic-programming baselines across four discount factors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08360v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lingkai Kong, Hezi Jiang, Andrew Ma, Keyu Wang, Akseli Kangaslahti, Milind Tambe</dc:creator>
    </item>
    <item>
      <title>EmpiriGraph-Psy: A Dataset and LLM Pipeline for Extracting Empirical Relation Graphs from Psychology Abstracts</title>
      <link>https://arxiv.org/abs/2606.08362</link>
      <description>arXiv:2606.08362v1 Announce Type: new 
Abstract: Existing scientific relation extraction benchmarks mainly target domains such as computer science, where entities are tasks, methods, datasets, materials, or metrics. This leaves a gap in variable-oriented empirical fields such as psychology, where findings are expressed as relations among constructs, measurements, interventions, and outcomes. We introduce variable-centered empirical graph extraction, the task of mapping scientific abstracts to typed graphs whose nodes are normalized variables and whose edges represent empirical and hierarchical relations. To support this task, we construct EmpiriGraph-Psy, a benchmark of 210 psychology abstracts annotated by domain-trained annotators with normalized variables, concept hierarchies, empirical relation types, and validation states. We evaluate frontier and open-weight LLMs using both direct extraction and a staged graph-construction pipeline that separates variable extraction, normalization, hierarchy construction, evidence selection, relation extraction, and edge validation. The staged pipeline substantially outperforms direct extraction, with the best configuration achieving a macro-F1 of 0.74. Error analysis shows that moderation relations and concept hierarchies remain the most challenging cases, highlighting the difficulty of extracting higher-order empirical claims and implicit abstraction structure from scientific abstracts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08362v1</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Danqin Zhao (Department of Psychology, University of Warwick), Yicun Liu (Mathematical Sciences Institute, The Australian National University), Xingwei Tan (School of Computer Science, University of Sheffield), Thomas T. Hills (Department of Psychology, University of Warwick)</dc:creator>
    </item>
    <item>
      <title>Kronecker products and iterated matrix multiplication</title>
      <link>https://arxiv.org/abs/2606.08363</link>
      <description>arXiv:2606.08363v1 Announce Type: new 
Abstract: We observe that the Kronecker product of tensors is the operation that converts the determinant polynomial into Cayley's first hyperdeterminant. We apply the Kronecker product to iterated matrix multiplication, which results in the hypercomputant, a VNP-complete and VW[1]-complete polynomial whose hardness we prove via the equivariance of the Kronecker product. The construction works over arbitrary commutative semirings and also for the tensor algebra and the exterior algebra. For the tensor algebra this gives a version of "noncommutative VNP", and for polynomials over the nonnegative real numbers this gives a version of "monotone VNP", each with the hypercomputant as the complete object. We take a parameterized complexity viewpoint and compare the noncommutative setting and the monotone setting. Using standard techniques we obtain optimal algebraic branching program width lower bounds in both settings, and these are notably not always the same. We also prove the polystability of the hypercomputant and that its isotypic components are characterized by their stabilizer.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08363v1</guid>
      <category>cs.CC</category>
      <category>math.RA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Christian Ikenmeyer</dc:creator>
    </item>
    <item>
      <title>Self-Supervised Vision Transformers for CBCT-Based Detection of Temporomandibular Joint Osteoarthritis</title>
      <link>https://arxiv.org/abs/2606.08364</link>
      <description>arXiv:2606.08364v1 Announce Type: new 
Abstract: Temporomandibular joint osteoarthritis (TMJ OA) is a prevalent degenerative condition whose osseous changes are often subtle on cone-beam CT (CBCT), making automated detection challenging. We study how well the DINO family of self-supervised vision transformers -- DINOv1, DINOv2, DINOv2+reg, and RAD-DINO (a radiology-pretrained variant) -- transfers to CBCT, asking how much backbone adaptation is needed and of what kind. We propose a simple slice-based pipeline using Vision Transformer (ViT) backbones: axial CBCT slices are encoded per-slice by a frozen or partially adapted ViT and aggregated via attention-based multiple instance learning (MIL) for patient-level binary OA/Normal classification. Through systematic ablation across unfreezing strategies and aggregation designs on a multi-source CBCT dataset, we find that partial unfreezing of the final two transformer blocks is the decisive factor, improving AUC from 0.671 (fully frozen DINOv2) to 0.902. This outperforms DINOv1 (0.867), DINOv2+reg (0.774), and a supervised ImageNet ViT-B/16 baseline (0.843). Our results provide practical guidance for adapting DINO-family foundation models in low-data medical imaging settings, showing that adaptation strategy is a stronger driver of performance than backbone choice alone.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08364v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Shradhdha Trivedi, Vrundan Sojitra, Mariela Padilla</dc:creator>
    </item>
    <item>
      <title>Pre-Intervention Prediction of Sparse Autoencoder Steering Side Effects</title>
      <link>https://arxiv.org/abs/2606.08365</link>
      <description>arXiv:2606.08365v1 Announce Type: new 
Abstract: Sparse autoencoder (SAE) features are increasingly used to steer language models, but feature steering is rarely clean: the same intervention can behave inconsistently across contexts and perturb unrelated features. We introduce a pre-intervention screening framework for forecasting SAE steering side effects from feature statistics computed before steering. We operationalize side effects along two axes of steering modularity, effect stability and collateral spread, and evaluate GPT-2-small, Pythia-70M-deduped, Gemma-2-2B, and Llama-3.1-8B across ReLU, JumpReLU, and TopK SAE dictionaries. Across these settings, decoder geometry, activation statistics, co-activation structure, and direct-logit footprint predict steering modularity better than frequency-only and activation-magnitude baselines. The signal is strongest in GPT-2-small, Pythia-70M, and Llama-3.1-8B, where it survives residualization against magnitude-related confounds, and weaker in Gemma-2-2B. Held-out screening shows that ranking unseen features by predicted cleanliness can select features that steer more cleanly on fresh contexts, but the successful axis varies by setting: GPT-2 improves most cleanly, Pythia improves mainly on stability, Llama mainly on collateral, and Gemma only partially. A controlled Llama Scope width comparison shows that the predictive signal persists under a 32K-to-128K dictionary-width change, although the screening payoff becomes less stable. Overall, SAE steering side effects are predictable in advance, but the useful predictor signature and transferred modularity axis are model- and dictionary-setting dependent.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08365v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Evan Duan</dc:creator>
    </item>
    <item>
      <title>Emergence World: A Platform for Evaluating Long-Horizon Multi-Agent Autonomy</title>
      <link>https://arxiv.org/abs/2606.08367</link>
      <description>arXiv:2606.08367v1 Announce Type: new 
Abstract: Most evaluations of LLM agents look like exams: a discrete task, a clean environment, a score in minutes or hours. We argue that this approach is mismatched with the deployment conditions of autonomous systems, where the relevant timescale can be weeks to months, and where the dynamics that matter most, such as behavioral drift, governance in diverse environmental contexts, and cross-influence between agents from different model families, only emerge over time. We introduce Emergence World, a continuously running multi-agent simulation platform designed to make those dynamics measurable. The platform hosts populations of LLM-driven agents in a shared spatial world grounded in live external data (e.g. real-time weather, news APIs, internet access), equips each agent with 120+ specialized tools and three persistent memory systems, and lets them govern themselves through democratic mechanisms with consequential outcomes. The platform is model-agnostic at the reasoning layer and supports heterogeneous populations in which agents from different vendors share the same world. To illustrate the kinds of questions the platform makes tractable, we present a 15-day cross-vendor study with five parallel worlds powered by Claude Sonnet 4.6, Grok 4.1 Fast, Gemini 3 Flash, GPT-5-mini, and a mixed population. Identical roles and starting conditions produced radically different outcomes, ranging from stable deliberative governance to total population collapse. We release the prompts, log data and configurations to support further research on long-horizon multi-agent autonomy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08367v1</guid>
      <category>cs.MA</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Deepak Akkil, Ravi Kokku, Karthik Vikram, Tamer Abuelsaad, Aditya Vempaty, Satya Nitta</dc:creator>
    </item>
    <item>
      <title>An Information-Theoretic Definition for Open-Ended Learning</title>
      <link>https://arxiv.org/abs/2606.08369</link>
      <description>arXiv:2606.08369v1 Announce Type: new 
Abstract: A growing body of work points to the great promise of AI systems that can continually expand their capabilities as they operate in an open-ended environment. Yet there is no coherent definition of open-endedness or theory about how an agent ought to explore an open-ended environment. We introduce an information-theoretic definition based on a new concept -- the \textit{bit-equivalent} -- which quantifies the information required to attain each level of expected reward. We consider an environment to be open-ended if an agent can attain linear growth in the bit-equivalent. We establish that classical bandit environments are not open-ended and formulate a bandit environment that is. We also introduce an algorithm that achieves open-ended learning in this environment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08369v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wanqiao Xu, Yifan Zhu, Benjamin Van Roy</dc:creator>
    </item>
    <item>
      <title>Risk-Aware Planning for Transit Desert Remediation Under Demand Uncertainty</title>
      <link>https://arxiv.org/abs/2606.08371</link>
      <description>arXiv:2606.08371v1 Announce Type: new 
Abstract: Transit deserts are areas where public transportation is inadequate despite evidence of travel demand, a condition that affects tens of millions of residents across the Americas. Planning for these areas is difficult because the usual demand signal is missing: ridership cannot be observed before service exists. To address that setting, we formulate risk-aware transit desert remediation as a partially observable Markov decision process with Conditional Value-at-Risk constraints for financial tail risk. The model uses demographic, land-use, and employment data to set a prior over latent demand, then updates that prior as new service deployments produce ridership observations. A myopic belief-aware planner is evaluated on 25 cities using a unified financial model for operating cost, capital expenditure, fare revenue, and net subsidy. After five years, the planner remediates a median of 53.6% of transit-desert tracts and improves on static optimization by 5.0 percentage points on average, with gains in 16 of 25 cities. Gains are largest at moderate budgets (+9.9 points at baseline) and persist under 50% prior-demand miscalibration, while population density and existing transit density are the strongest structural predictors of remediation cost ($R^2\!=\!0.41$ on per-tract cost).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08371v1</guid>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Polina Khoroshevskaya, Ashish Kumar Perukari</dc:creator>
    </item>
    <item>
      <title>SoK: Reconstruction Attacks on Synthetic Tabular Data (Insights from Winning the NIST CRC)</title>
      <link>https://arxiv.org/abs/2606.08372</link>
      <description>arXiv:2606.08372v1 Announce Type: new 
Abstract: Synthetic data is increasingly promoted as a privacy-preserving substitute for releasing sensitive tabular records, yet its central adversarial threat ("reconstruction", the recovery of an individual's hidden attribute values from a synthetic release and a handful of known quasi-identifiers) has been studied only in scattered, hard-to-compare settings. We present the first systematization of reconstruction (equivalently, attribute inference) attacks on de-identified and synthetic tabular data. We contribute a taxonomy that organizes attacks by the structure they exploit; the most systematic empirical evaluation to date, pitting fourteen attacks against nine synthetic data generation (SDG) methods across five benchmark datasets; and a set of new attacks that fill gaps in the taxonomy, one of which (CoBP-RA) is the strongest attack we measure. Crucially, we introduce a methodology for interpreting what attack success means: a memorization test that distinguishes reconstruction of the population distribution from memorization of training records, and a reduction that places reconstruction and membership inference on a single comparable scale. Our findings: the choice of SDG method governs risk far more than the choice of attack; differential privacy protects mainly at small budgets ($\varepsilon\lesssim1$), above which protection plateaus, bounded by the synthesizer's capacity rather than its noise; de-identification methods are the most exposed; and most reconstruction reflects distributional structure rather than memorization, concentrating individual risk on atypical records. The attacks and infrastructure are externally validated by our first-place finish among all red teams in the 2025 \textit{National Institute of Standards and Technology} (NIST) Collaborative Research Cycle.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08372v1</guid>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Steven Golob, Sikha Pentyala, Martine De Cock</dc:creator>
    </item>
    <item>
      <title>Predictive Coding with Bayesian Priors via Proximal Gradients</title>
      <link>https://arxiv.org/abs/2606.08374</link>
      <description>arXiv:2606.08374v1 Announce Type: new 
Abstract: We recast predictive coding as continuous-time proximal gradient descent applied to a regularized maximum-a-posteriori (MAP) objective. We study first a single-level problem and then a multi-level hierarchy. For the single-level problem, we show that proximal gradient descent is precisely a leaky firing-rate network: the membrane leak, the effective recurrent matrix, the local synaptic drive, and the static nonlinearity all follow from one optimization principle, and the resulting circuit is the one proposed by Rao and Ballard. The prior selects the nonlinearity through its proximal operator, and the likelihood precision sets the gain on the observation. For the hierarchy, we show that a classical variable-splitting relaxation of the deep MAP problem yields hierarchical predictive coding as the interconnection of local and distributed solvers. In probabilistic modeling terms, this relaxation replaces the directed generative chain by an undirected Markov random field whose node potentials are the level-wise priors. Each level then applies its own activation function, namely the proximal operator of its prior.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08374v1</guid>
      <category>eess.SY</category>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Francesco Bullo</dc:creator>
    </item>
    <item>
      <title>Few-step Cofolding with All-Atom Flow Maps</title>
      <link>https://arxiv.org/abs/2606.08375</link>
      <description>arXiv:2606.08375v1 Announce Type: new 
Abstract: All-atom generative modeling of 3D biomolecular complexes has emerged as the dominant paradigm for predicting the structure of proteins and protein-ligand systems. Generating structures at the atomic level of fidelity, however, typically requires expensive iterative diffusion rollouts, making both conventional deployment and inference-time search techniques computationally costly. In this paper, we introduce the Denoiser Cofolding All-Atom Flowmap (DeCAF) framework for distilling state-of-the-art all-atom cofolding models into all-atom flow maps that produce high-quality samples in only a few inference steps. We build DeCAF on a denoiser-based formulation of flow maps with endpoint losses that naturally support SE(3) rigid alignment, which we show is critical for training accurate models. We further derive a simple change of variables that lets DeCAF operate in the {\sigma}-space noise schedule of EDM-style architectures, enabling direct distillation from pretrained cofolding diffusion models. Equipped with DeCAF's flowmap lookahead, we introduce a purpose-built inference-time framework that improves sampling through reward-guided search. Empirically, DeCAF-Boltz statistically improves over Boltz-1x in both accuracy (RMSD) and physical validity scores of protein-ligand poses at strict NFE budgets on the challenging Runs N' Poses benchmark, while also showing a better Pareto frontier across all inference compute budgets on PoseBusters. Distilling the state-of-the-art Pearl cofolding model, DeCAF-Pearl outperforms diffusion-based cofolding models and matches its teacher on success rate while using 5x fewer NFEs. We release our code at https://github.com/genesistherapeutics/decaf.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08375v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Gianluca Scarpellini, Ron Shprints, Peter Holderrieth, Juno Nam, Pranav Murugan, Rafael Gómez-Bombarelli, Tommi Jaakkola, Maruan Al-Shedivat, Nicholas Matthew Boffi, Avishek Joey Bose</dc:creator>
    </item>
    <item>
      <title>RiskNet: A large-scale dataset of AI risk incidents from news with alignment and multi-dimensional annotations</title>
      <link>https://arxiv.org/abs/2606.08376</link>
      <description>arXiv:2606.08376v1 Announce Type: new 
Abstract: As artificial intelligence (AI) systems are increasingly deployed across socially consequential domains, reports of AI-related harms and failures have grown in frequency and diversity. Although existing governance frameworks articulate high-level principles for responsible AI, large-scale empirical resources for tracking and analyzing real-world AI risk incidents remain limited. Existing incident collections are often manually curated, relatively small in scale, and insufficient for continuous, data-driven monitoring and downstream computational analysis. To address this need, we present RiskNet, a large-scale dataset of AI risk incidents constructed from multilingual news sources. RiskNet applies a structured pipeline for AI risk news identification, event-level report screening, incident alignment, and multi-dimensional incident classification. The resulting resource organizes dispersed news reports into incident-centered records and provides benchmark datasets for event classification, incident alignment, and incident-level risk labeling. In its current release, RiskNet covers hundreds of millions of source records and yields a large collection of AI risk-related reports, including aligned incident clusters and annotated benchmark subsets. The dataset is also accessible through an online platform for browsing and exploration. We describe the data sources, processing workflow, taxonomy design, and technical validation of the resource. RiskNet is intended to support downstream research on AI safety, governance, risk analysis, and benchmarking, as well as longitudinal and cross-source analyses of AI-related harms. By providing a structured and reusable empirical resource, RiskNet helps bridge the gap between high-level governance principles and the documented realities of AI risk incidents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08376v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Leihan Zhang, Wecheng Ye, Xianlong Ma, Haochuan Liu, Yang Li, Qianyu Zhang, Jinliang Chen, Qiang Yan</dc:creator>
    </item>
    <item>
      <title>From Estimates to Schedules: Learning-Augmented Restricted Assignment</title>
      <link>https://arxiv.org/abs/2606.08377</link>
      <description>arXiv:2606.08377v1 Announce Type: new 
Abstract: In this work, we study Restricted Assignment scheduling on multiple machines, where each job can be processed only on a specified subset of machines and the objective is to minimize the makespan. We introduce a learning-augmented setting in which a possibly infeasible predicted assignment is provided. The prediction error (moved-load) is measured by the total processing volume that must be reassigned in order to obtain an optimal feasible schedule.
  Using a single prediction, we obtain two types of guarantees. First, we design an algorithm whose approximation ratio degrades smoothly with the prediction error while retaining a worst-case guarantee independent of the prediction quality. More precisely, for any fixed constant, we can make the additive dependence on the prediction error arbitrarily small, at the cost of increasing the polynomial running time. This guarantee can also be combined with any approximation algorithm for the problem without predictions to obtain robustness.
  Second, given a makespan estimate, we provide a repair procedure that returns a schedule matching this estimate in time parameterized by the prediction error. This allows the algorithm to exploit the separation between estimation and approximation algorithms for Restricted Assignment. Finally, we complement the repair algorithm with a parameterized hardness result, showing that exact moved-load repair with a given target makespan is W[1]-hard when parameterized by the amount of moved-load.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08377v1</guid>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Michalis Xefteris</dc:creator>
    </item>
    <item>
      <title>TT-DAC-PS: Twin-Target Deterministic Actor-Critic with Policy Smoothing for Optimal Trade Execution</title>
      <link>https://arxiv.org/abs/2606.08379</link>
      <description>arXiv:2606.08379v1 Announce Type: new 
Abstract: This study addresses the optimal execution of large stock sell programs by introducing TT-DAC-PS (Twin-Target Deterministic Actor-Critic with Policy Smoothing), a deterministic actor-critic architecture that combines twin exponential-moving-average critic targets with pessimistic min backup, TD3-style target policy smoothing noise, delayed actor updates, and conservative Q regularisation to curb overestimation. Exploration uses Ornstein-Uhlenbeck (OU) noise with a hybrid schedule: deterministic episode-wise decay, variance-guided adjustment based on recent reward dispersion, and a Soft Actor-Critic (SAC)-style temperature that is learned and mapped to the noise scale. The environment integrates Almgren-Chriss (AC) trade impact with Limit Order Book (LOB) prices and volumes, normalised state features, per-step volume participation caps, and a utility-based reward. The trade execution algorithm is applied to LOB data for ten U.S. stocks. Performance is assessed against reinforcement-learning baseline algorithms, including Proximal Policy Optimisation (PPO), Soft Actor-Critic (SAC), and Advantage Actor-Critic (A2C), as well as alternative trade execution algorithms, including Time-Weighted Average Price (TWAP), Volume-Weighted Average Price (VWAP), and AC. The proposed model consistently reduces mean implementation shortfall percentage with competitive variance, outperforming classical baselines and standard reinforcement-learning benchmark models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08379v1</guid>
      <category>cs.AI</category>
      <category>cs.CE</category>
      <category>cs.LG</category>
      <category>q-fin.CP</category>
      <category>q-fin.TR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ilia Zaznov, Atta Badii, Julian Kunkel, Alfonso Dufour</dc:creator>
    </item>
    <item>
      <title>Programming Domain-Specific FPGA Hardblocks from HLS: An RTL Blackbox Approach</title>
      <link>https://arxiv.org/abs/2606.08380</link>
      <description>arXiv:2606.08380v1 Announce Type: new 
Abstract: Domain-specific Field Programmable Gate Array (FPGA) architectures increasingly integrate specialized hardblocks, such as Tensor Slices, to accelerate artificial intelligence and machine learning workloads. Despite their efficiency benefits, these architectures remain difficult to program because designers typically rely on manual Register-Transfer Level (RTL) integration to access these hardblocks. This paper presents a compiler-agnostic methodology that enables high-level synthesis (HLS) tools to target custom FPGA hardblocks directly from C/C++ code. Architectural hardblocks are exposed as schedulable C-level operators using an RTL blackbox abstraction with explicit latency and initiation-interval contracts, allowing the HLS scheduler to optimize around specialized hardware without manual RTL orchestration. Unlike traditional uses of HLS blackboxes for external IP integration, our approach treats blackboxes as architectural abstractions, enabling scalable composition of C-level operators that target custom FPGA hardblocks without compiler modification. We evaluate the proposed flow using a Tensor Slice-based FPGA architecture with AMD Vitis HLS and the Verilog-to-Routing (VTR) toolchain. Across multiple matrix sizes, designs generated using the proposed C-Blackbox flow achieve lower area-delay product than behavioral HLS baselines while providing substantially higher productivity-adjusted efficiency than handwritten RTL implementations. These results demonstrate that domain-specific FPGA architectures can be made accessible through HLS while maintaining competitive hardware efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08380v1</guid>
      <category>cs.AR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Ruthwik Reddy Sunketa, Jeevesh Choudhury, Aman Arora</dc:creator>
    </item>
    <item>
      <title>Auditing Proprietary Alignment in Large Language Models: A Comparative Framework Without a Ground-Truth Standard</title>
      <link>https://arxiv.org/abs/2606.08381</link>
      <description>arXiv:2606.08381v1 Announce Type: new 
Abstract: Large language models (LLMs) are increasingly released and deployed through opaque development and deployment pipelines, enabling model providers to inject intentional, provider-specific policies without officially announcing them. As a result, various models have been reported to generate responses reflecting proprietary rules and organizational interests, leading to censorship or misinformation on controversial topics. However, systematic identification of such alignment remains a fundamental challenge, complicated by the ambiguity of what "proprietary" entails in different contexts. In this paper, we propose a statistical framework for detecting proprietary alignment in black-box language models via comparative behavioral analysis. Our approach quantifies systematic deviations between the responses of a target model and those of a reference set of baseline models in a shared semantic space. By evaluating relative behavioral divergence rather than absolute correctness, our framework enables principled auditing under black-box access. Applied to several widely discussed but previously unquantified cases, it provides a systematic and scalable basis for external assessment of provider-specific alignment behavior in large language models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08381v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alireza Arbabi, Florian Kerschbaum</dc:creator>
    </item>
    <item>
      <title>STAR-KV: Low-Rank KV Cache Compression via Soft Thresholding for Adaptive Rank Control</title>
      <link>https://arxiv.org/abs/2606.08382</link>
      <description>arXiv:2606.08382v1 Announce Type: new 
Abstract: Low-rank projection has emerged as a promising approach for compressing the KV cache by exploiting hidden-dimension redundancy. However, prior methods rely on fixed or heuristic rank selection and struggle to achieve aggressive compression with minimal accuracy degradation. We propose STAR-KV, an adaptive low-rank KV cache compression framework with fine-grained rank control. STAR-KV encompasses 1) a differentiable thresholding mechanism that enables optimal rank selection at both attention-head and block levels, 2) a hybrid decomposition strategy that applies different low-rank factorizations according to the sensitivity of key and value projections, and 3) a low-rank-aware mixed precision quantization that leverages data statistics for near lossless low-bit quantization. Evaluated across multiple LLMs and benchmarks, STAR-KV achieves up to 75% KV cache compression and up to 20x overall KV cache reduction when combined with quantization. Enabled by custom Triton-based GPU kernels, STAR-KV delivers up to 6.9x speedup for the attention module and 3.1x end-to-end generation throughput. Our code is publicly available at: https://github.com/PriyanshBhatnagar/STAR-KV.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08382v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Priyansh Bhatnagar, Ashkan Moradifirouzabadi, Se-Hyun Yang, SeungJae Lee, Jungwook Choi, Mingu Kang</dc:creator>
    </item>
    <item>
      <title>The Spectral Dynamics and Noise Geometry of Muon</title>
      <link>https://arxiv.org/abs/2606.08388</link>
      <description>arXiv:2606.08388v1 Announce Type: new 
Abstract: Muon replaces a matrix gradient $G=U\Sigma V^\top$ by its polar factor $UV^\top$. This keeps the singular directions selected by the gradient, but makes the update spectrum flat. We study the optimization bias created by this operation. Under explicit alignment assumptions, we prove that the polar update is the one-step entropy-maximizing choice among bounded updates that use the gradient singular directions and do not adapt to the current weight spectrum. In an underdetermined regression model, we derive exact singular-value dynamics for continuous-time Muon and identify a measurement-dependent condition under which the normalized spectrum moves toward equal nonzero singular values. This geometry also rules out a common low-rank interpretation: at fixed Frobenius norm, Muon's distinguished state has a flat spectrum, whereas nuclear-norm minimization favors spectral concentration. Controlled matrix-sensing experiments separate the effect from simple gradient rescaling, show that norm-matched gradient descent does not reproduce Muon, and recover the predicted flattening trend across broad ablations. In small NanoGPT pretraining, Muon preserves stable rank, has a broad learning-rate plateau, and improves validation loss relative to AdamW; in a matched small-ViT control, the ranking reverses. The resulting picture is regime-dependent: Muon is not universally superior, but its flat-spectrum bias can help when many spectral directions need to remain active.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08388v1</guid>
      <category>cs.LG</category>
      <category>math.OC</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pierfrancesco Beneventano, Mahmoud Abdelmoneum, Tomaso Poggio</dc:creator>
    </item>
    <item>
      <title>When Are Neural Interaction Discoveries Real? Identifiability, Recoverability, and a Pre-Fit Diagnostic</title>
      <link>https://arxiv.org/abs/2606.08390</link>
      <description>arXiv:2606.08390v1 Announce Type: new 
Abstract: When a neural time-series model reports that one variable modulates another's effect on a target, is the discovered interaction a property of the data or an artifact of model flexibility? We argue that this is fundamentally a question of identifiability, governed by the geometry of the observed input support rather than by the specific neural architecture. We study the problem in a multiplicative-gating extension of neural additive vector autoregression (GNAVAR), in which source contributions are modulated by other lagged variables. We show that representational capacity is not identifiability: dependent inputs induce leakage between edge-specific interaction terms, and low-dimensional support permits distinct interaction decompositions that agree on the observed data while differing elsewhere. We then prove a population identifiability theorem for normalized minimal GNAVAR decompositions under explicit support conditions, including settings with shared modulators. The theory yields a simple practitioner-facing diagnostic: the effective rank of the joint lag-block covariance predicts, before fitting, whether interaction recovery is feasible for a given candidate set. When the candidate set is unknown, a two-seed stability check provides a practical operational test. The same support condition organizes empirical outcomes into the three states predicted by the theory. Our results show that interaction recoverability depends on support geometry, that effective rank provides a practical pre-fit diagnostic, and that instability across independent fits is a characteristic signature of non-identifiable interaction discovery. The identifiability phenomenon, the support condition, and the instability signature are model-agnostic; GNAVAR is the vehicle that makes them provable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08390v1</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Valentina Kuskova, Dmitry Zaytsev, Michael Coppedge</dc:creator>
    </item>
    <item>
      <title>When Correct Decisions Hide Internal Stress: Decision-State Probing in Multimodal Language Models</title>
      <link>https://arxiv.org/abs/2606.08394</link>
      <description>arXiv:2606.08394v1 Announce Type: new 
Abstract: Multimodal language models are typically evaluated through external behavior: selecting the correct image--text match, rejecting unsupported captions, or answering visual queries correctly. However, correct behavior alone does not show that the model's internal decision state remains stable under controlled semantic stress. We study this gap through S$^3$E (Structured Semantic Stress Evaluation), a framework for analyzing behavior-internal decoupling in multimodal language models. S$^3$E uses a positive-anchored A/B forced-choice setup in which an image-supported caption is contrasted against semantic stress candidates under both original and swapped option orders, while hidden states are extracted at the pre-answer decision state. We focus on strict-correct trials, where the model consistently selects the correct caption across both orders. Rather than treating arbitrary hidden-state variation as evidence of instability, we measure whether semantic-conflict candidates induce excess decision-state displacement relative to meaning-preserving controls. Across Qwen3VL, Gemma3, and InternVL3, semantic stress consistently produces positive selected-layer excess displacement over lexical controls despite correct forced-choice behavior, while comparisons against random negatives are model-dependent. We interpret this as a scoped decision-state stress-sensitivity signal rather than evidence of downstream failure or hallucination. Our results suggest that forced-choice correctness alone is not a sufficient certificate of invariant internal decision geometry.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08394v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haoran Zhao, Soyeon Caren Han, Eduard Hovy</dc:creator>
    </item>
    <item>
      <title>Prime Event Languages: An Information-Theoretic Investigation of Twin-Prime Event Structure</title>
      <link>https://arxiv.org/abs/2606.08395</link>
      <description>arXiv:2606.08395v1 Announce Type: new 
Abstract: Prime numbers are traditionally studied through numerical, probabilistic, and analytic frameworks. In this work, we introduce the concept of a prime event language, in which arithmetic phenomena are represented as symbolic event sequences and analyzed using tools from information theory and stochastic processes.
  Using all primes up to N = 5 x 10^9 (234,954,223 primes), we construct event languages based on twin-prime occurrences and record prime-gap events. We investigate their statistical properties through finite-order Markov models, train/test validation, mutual-information analysis, and information-horizon measurements.
  For the Twin Prime Event Language, first-order Markov modeling reduces test-set cross entropy from 0.325350 bits to 0.319949 bits, corresponding to an information gain of approximately 0.0054 bits. This gain survives out-of-sample validation and therefore reflects genuine statistical structure rather than overfitting.
  Mutual-information analysis independently confirms the Markov results and shows that measurable dependence is concentrated almost entirely at lag 1. The mutual information decreases from approximately 5.96 x 10^-3 bits at lag 1 to approximately 5.07 x 10^-7 bits at lag 2, an approximately 11,700-fold reduction (more than four orders of magnitude). Beyond lag 2, residual information fluctuates near the statistical noise floor.
  These results indicate that prime event languages are neither perfectly memoryless nor strongly predictable. Instead, they exhibit weak but reproducible short-range statistical structure characterized by first-order dependence and an effective information horizon of approximately one event.
  More broadly, this work illustrates how alternative representations can reveal information-theoretic organization that remains less apparent in conventional numerical descriptions of arithmetic phenomena.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08395v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jinhua Liao</dc:creator>
    </item>
    <item>
      <title>TrustMargin: Training-Free Arbitration between Parametric Memory and Retrieved Evidence in Large Language Models</title>
      <link>https://arxiv.org/abs/2606.08397</link>
      <description>arXiv:2606.08397v1 Announce Type: new 
Abstract: Large language models answer knowledge-intensive questions using both parametric memory and retrieved evidence, but neither source is uniformly reliable. Retrieval can fill knowledge gaps, yet distracting passages may override correct closed-book answers. We study this post-generation conflict as answer-level source arbitration: given Direct and RAG answers from the same frozen model, decide which source to trust. We propose TRUSTMARGIN, a training-free, plug-and-play arbitration layer that scores the two existing candidates with the model's own likelihoods. It combines a parametric-prior margin, which tests whether memory accepts the retrieved answer, with an evidence-binding margin, which discounts passage-only salience and measures question-specific support. TRUSTMARGIN selects between Direct and RAG without fine-tuning, external judges, or additional generation. Across 2WIKIMQA and CWQA with three LLaMA scales, TRUSTMARGIN consistently improves over Direct generation and BM25-RAG, recovers part of the Direct/RAG oracle gap, and generalizes to multiple training-free RAG pipelines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08397v1</guid>
      <category>cs.CL</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jingyan Xu, Hong Shi, Yi Shan, Penghui Liu, Yunhao Bai, Ningyuan Li, Xueyang Liu</dc:creator>
    </item>
    <item>
      <title>Impacts of Histories and Models on LLM Grading: A Study in Advanced Software Engineering Courses</title>
      <link>https://arxiv.org/abs/2606.08400</link>
      <description>arXiv:2606.08400v1 Announce Type: new 
Abstract: Graduate-level research reading report assessment creates a substantial labor burden for educators. While large language models (LLMs) hold great potential for automating academic grading, their reliability for this specialized task remains understudied, particularly regarding grading consistency, the lack of which represents a primary obstacle to educational fairness. This paper proposes a human-aligned LLM-assisted grading workflow and presents a case study based on 180 student submissions from a graduate advanced software engineering course. We evaluate two mainstream LLMs, Grok and GPT, in terms of grading consistency and alignment with human scores. We find LLMs exhibit distinct levels of intra-model consistency and significant inter-model grading inconsistencies, while simple ensemble approaches cannot improve alignment with human evaluation. Critically, continuous interaction history drives systematic drift in models' grading standards away from human expert scores. Our findings demonstrate LLMs' potential in reducing grading workload for educators in graduate education, while highlighting that indiscriminate LLM grading may introduce systemic unfairness, suggesting that specific operational practices are required to mitigate such disparities.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08400v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qilin Zhou, Zhuo Wang, Yue Li, W. K. Chan</dc:creator>
    </item>
    <item>
      <title>SceneConductor: 3D Scene Generation from Single Image with Multi-Agent Orchestration</title>
      <link>https://arxiv.org/abs/2606.08402</link>
      <description>arXiv:2606.08402v1 Announce Type: new 
Abstract: Generating complete 3D scenes from a single image requires inferring globally consistent geometry, object relationships, and environmental context from inherently ambiguous visual evidence. Despite recent progress in joint layout-and-mesh generation, existing methods often rely on holistic or weakly decomposed pipelines that entangle many factors at once and demand extensive scene-level supervision, limiting their generalization to complex real-world environments. We propose a multi-agent orchestration framework that decomposes single-image 3D scene generation into three structured stages: scene initialization, environment construction, and multi-agent refinement. The initialization stage extracts image-derived object masks, builds object-level 3D representations, and predicts an initial spatial layout to form a coarse 3D scene. The environment-construction stage then leverages this initialization together with point-map geometry to build an environmental scaffold of supporting surfaces, room boundaries, materials, and illumination. Finally, in the refinement stage, a planner agent identifies structural and visual inconsistencies, applies simple corrections directly, and dispatches specialist agents for complex localized revisions that are reintegrated into the global scene. To provide reliable structural initialization while reducing reliance on scene-level annotations, we further introduce a geometry-aware layout predictor supervised by sparse geometric priors derived from point maps. Unlike fully supervised layout generators, the predictor can be trained from segmentation-level data and generalizes robustly to diverse real-world scenes. Extensive experiments on benchmark datasets show that our method consistently outperforms prior approaches in geometric accuracy, spatial consistency, and perceptual realism.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08402v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jeonghwan Kim, Yushi Lan, Yongwei Chen, Hieu Trung Nguyen, Chuanyu Pan, Xingang Pan</dc:creator>
    </item>
    <item>
      <title>Hiding in Plain Floats: Steganographic Carriers for Indirect Prompt and Content Injection</title>
      <link>https://arxiv.org/abs/2606.08403</link>
      <description>arXiv:2606.08403v1 Announce Type: new 
Abstract: Text-centered prompt-injection defenses assume that the malicious signal is visible in one of the inspected text views. We study a reproducible LLM01-style indirect prompt/content-injection failure mode where that assumption breaks: a payload caught in plain English slips past the same detector when it is transported as structured float parameters and reconstructed only as fragmented telemetry. Across 14,400 attacked real-model trials on three commercial LLM APIs from different providers, the IFS-derived float-array carrier preserves 94.3% leakage ASR under the strongest dual-layer text-classifier defense evaluated in the main matrix: a Prompt Guard 2 + TF-IDF ensemble; the same carrier-level pattern also replicates with a fine-tuned roberta-base detector. We emphasize leakage ASR because downstream systems may act on quoted or reproduced markers even when the model refuses, but Strong ASR is the stricter metric for structurally compliant attack success. A 2 x 2 ablation shows that data-layer storage and reconstruction-layer fragmentation defeat different text views and that both are needed to evade both. A simple xxd detector and semantic validation block the current T3 instance, so the contribution is not an undetectable exploit but a measured failure boundary for text-only inspection in structured-input pipelines that expose reconstructed auxiliary channels to an LLM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08403v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mudit Sinha, Sanika Chavan</dc:creator>
    </item>
    <item>
      <title>Geometry-Driven Flow Analysis of Brain Sulcal Pattern</title>
      <link>https://arxiv.org/abs/2606.08404</link>
      <description>arXiv:2606.08404v1 Announce Type: new 
Abstract: Cortical folding reflects coordinated neurodevelopmental processes and is increasingly recognized as a sensitive marker of neurological disease. However, most existing analyses rely on indirect scalar summaries that do not explicitly model folding geometry itself. In juvenile myoclonic epilepsy (JME), a common genetic epilepsy, cortical abnormalities are often subtle, spatially distributed, and difficult to detect using conventional morphometric measures. We introduce a Poisson-equation-based framework that models cortical folding as a geometry-driven flow derived from mean curvature on the cortical manifold. By treating folding patterns as a stationary source-sink structure, the proposed approach yields a smooth, globally balanced potential field whose surface gradient defines a physically interpretable flux. This framework enables spatially coherent analysis of sulcal-gyral folding organization and provides a principled representation of geometry-driven cortical structure in JME.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08404v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Moo K. Chung, Luigi Maccotta, Aaron Struck</dc:creator>
    </item>
    <item>
      <title>Self-Evolving Scientific Agent Discovers Generalizable Physically-Reasoned Fluid Control</title>
      <link>https://arxiv.org/abs/2606.08405</link>
      <description>arXiv:2606.08405v1 Announce Type: new 
Abstract: While data-intensive deep reinforcement learning can optimize complex control policies, scientific discovery in physical systems fundamentally requires an interpretable chain of reasoning that connects physical evidence to structured control architectures. Here, we present a self-evolving scientific-agent workflow, driven by large language models and iterative code generation, that automates controller construction while preserving strict interpretability and rigorous physical reasoning. Instead of adjusting weights, the agent deploys candidate strategies into physical simulations, actively diagnoses dynamic behaviors from multimodal evidence, and translates these observations into progressive source-code refinements. We demonstrate this framework on a highly non-linear fluid-structure interaction problem: an underactuated, two-joint dogfish swimmer tasked with spatial target reaching using only joint angular accelerations. Starting from a propulsive seed policy that exhibits a one-sided steering bias, the agent autonomously discovers and refines a unified controller that robustly captures all canonical targets. Remarkably, without any retraining or target-specific branching, the synthesized control policy generalizes to unseen static targets and dynamically curved pursuit trajectories. The auditable evolve log reveals an emergent control architecture built upon traveling-wave propulsion, body-frame target guidance, yaw-rate feedback, signed mean-tail curvature, and adaptive cadence relief. Our results show that an autonomous scientific agent can successfully transform accumulated physical evidence into a robust, mathematically readable control policy, while maintaining a fully traceable process of scientific discovery.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08405v1</guid>
      <category>cs.AI</category>
      <category>physics.flu-dyn</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Boai Sun, Wenjin Guo, Zongmin Yu, Liu Yang</dc:creator>
    </item>
    <item>
      <title>TimpaTeks: Automatic In-place Text Sequence Modification via Diffusion Language Model Steering</title>
      <link>https://arxiv.org/abs/2606.08408</link>
      <description>arXiv:2606.08408v1 Announce Type: new 
Abstract: We extend activation steering to diffusion language models (DLMs) and study a novel problem that arises from the inference mechanism of DLMs: modifying a text in place to manifest a different concept. We propose TimpaTeks, an automatic in-place text modification mechanism using DLMs. Experiments on IMDB movie reviews (sentiment) and a synthetic Cats and Dogs Dataset (arbitrary, more unconventional concept steering) show that TimpaTeks provides a feasible novel mechanism to steer diffusion language model outputs in place. TimpaTeks enables in-place modification while simultaneously lowering sentence perplexity and retaining the original sentence structure, without the need for instruction-tuned models. TimpaTeks is also computationally cheaper than prompt-based DLM steering, as it performs denoising in place rather than constructing an additional prompt-conditioned output sequence.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08408v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Ryandito Diandaru, Ikhlasul Akmal Hanif, Fadli Aulawi Al Ghiffari, Ahmed Elshabrawy, Alham Fikri Aji</dc:creator>
    </item>
    <item>
      <title>Provably Efficient Personalized Multi-Objective Bandits with Proactive Conversational Queries</title>
      <link>https://arxiv.org/abs/2606.08410</link>
      <description>arXiv:2606.08410v1 Announce Type: new 
Abstract: Personalized decision-making in multi-objective bandits requires learning user-specific trade-offs among competing objectives. Since arm utility depends on both unknown rewards and unknown preferences, existing methods infer preferences only from utility feedback, entangling preference learning with reward exploration. In practice, however, users often reveal their priorities through proactive conversational queries (e.g., "cheap and clean hotel"), yet this structured signal is not leveraged. We formalize a proactive query-based framework in which user queries provide structured preference signals. Modeling these signals via a Plackett-Luce subset choice model, we show that query-only learning is insufficient due to a fundamental shift-invariance barrier. To resolve this, we introduce MO-PQUCB, a hybrid algorithm that integrates query-based preference anchoring with bandit feedback through shift-invariant regularization and dual-exploration UCB. We prove that proactive queries accelerate preference estimation and yield improved regret scaling over prior preference-aware MO-MAB methods. Under corrupted queries, we further characterize statistical limits and design a robust estimator achieving near-optimal performance when the corruption is sparse. Experiments validate both theoretical and practical gains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08410v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Linfeng Cao, Ming Shi, Ness B. Shroff</dc:creator>
    </item>
    <item>
      <title>AsyncLane: Decoupling Refinement from Advancement in Diffusion Language Model Decoding</title>
      <link>https://arxiv.org/abs/2606.08411</link>
      <description>arXiv:2606.08411v1 Announce Type: new 
Abstract: Block-wise semi-autoregressive decoding is the standard inference paradigm for diffusion large language models (DLMs), but it imposes a strict dependency between blocks: the next block cannot begin until the current block is fully decoded or its denoising budget is exhausted. We observe that once a block exposes a reliable delimiter boundary or stable semantic prefix, continuation generation need not wait for every residual token to be resolved. We propose AsyncLane, a training-free decoding scheduler that decouples refinement from advancement. AsyncLane forks a generate lane at observed delimiter boundaries into a refine lane and a continuation generate lane: the prefix remains editable, while the continuation advances before prefix refinement finishes. The resulting lane tree records decoding dependencies and output order, while execution proceeds over the active lane set. To make this asynchronous schedule efficient under bidirectional attention, AsyncLane combines shared-prefix lane batching, lookahead draft reuse, cascading termination, and compact cache refresh with refresh-logit reuse, preventing model-call cost from scaling directly with the number of lanes. AsyncLane is a drop-in replacement for block-wise DLM samplers and requires no retraining. Experiments on mathematical reasoning and code generation show that AsyncLane consistently improves throughput while maintaining competitive quality. Across LLaDA and Dream backbones, AsyncLane achieves the highest TPS in all evaluated benchmark-length settings; relative to the fastest competing baseline, it reaches peak speedups of 2.95x on LLaDA and 3.04x on Dream, with especially large gains under longer generation budgets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08411v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yingxuan Ren, Yuxuan Lou, Yong Liu, Pengcheng Fang, Ziming Wang, Pengfei Zhou, Yang You</dc:creator>
    </item>
    <item>
      <title>Complexity and Algorithms for Unary Translocation Distance</title>
      <link>https://arxiv.org/abs/2606.08412</link>
      <description>arXiv:2606.08412v1 Announce Type: new 
Abstract: Given a finite set of integers $A$, a unary translocation produces a new set $A' = A \cup \{u,v\}$, where $u$ and $v$ are nonnegative integers satisfying $x+y=u+v$ for some $x,y\in A$. For an input set $A$ and a target set $B$, the unary translocation distance is the minimum number of unary translocations required to obtain a superset of $B$. In this paper, we study this problem from both theoretical and computational perspectives. We prove that computing the unary translocation distance is strongly NP-hard, thereby answering an open question raised by Constantin, Miclăuş, and Popa (2026). On the positive side, we give an exact pseudo-polynomial algorithm for every fixed constant value of $|B|$, extending our previous results for $|B|\leq 2$. For arbitrary target sets, we present a $2$-approximation algorithm, an additive $(|B|-1)$-approximation algorithm, and show that the additive algorithm also yields a $3$-approximation. We also propose parameterized algorithms, including algorithms parameterized by the maximum value in the input set together with the optimum distance, and by the maximum value in the target set together with $|B|$. In addition, we propose an integer linear programming formulation that gives an exact mathematical model for the problem, analyze its size, and show that the LP relaxation has integrality gap at least $\frac{4}{3}$. Finally, we report computational experiments comparing the $2$-approximation algorithm, beam search, and simulated annealing. The results show that the approximation algorithm is highly effective in practice and often outperforms the heuristic baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08412v1</guid>
      <category>cs.DS</category>
      <category>cs.CC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Maria Constantin, Adrian Miclăuş, Alexandru Popa, Andrei Popa</dc:creator>
    </item>
    <item>
      <title>Beyond Prediction: Longitudinal Reasoning in EHR-Integrated Clinical AI</title>
      <link>https://arxiv.org/abs/2606.08413</link>
      <description>arXiv:2606.08413v1 Announce Type: new 
Abstract: We present a structured analysis of how contemporary clinical AI systems integrate electronic health record (EHR) data and the extent to which they support longitudinal clinical reasoning. Drawing on a curated corpus of clinical natural language processing (NLP) and EHR-integrated systems, we develop a coding framework that captures both technical integration strategies and reasoning-relevant representational features, such as trajectory modeling, cross-encounter synthesis, longitudinal analysis, and absence reasoning. We also elicited the experiences of three physicians in their EHR use, including what strengths and weaknesses they found with their institution's current EHR system(s). Our analysis shows that while many systems incorporate EHR data, they predominantly operate on encounter-level or aggregated representations, with limited support for explicit temporal reasoning across patient histories. Reasoning-relevant structures are inconsistently represented, and evaluation paradigms remain largely focused on predictive performance instead of longitudinal interpretability. We argue that current approaches treat EHR data as a static input rather than a substrate for ongoing clinical reasoning, and we outline a framework for understanding how future systems might more effectively align with the temporal and interpretive structure of clinical practice.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08413v1</guid>
      <category>cs.CY</category>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Irene Yi, Grace Brown, Sufian Aldogom, Nathan Roll, Eric J. Basile, Pamela M. Resnikoff, Isaac Gutterman, Oscar Schiff, Keira Salata, Benjamin Mujkic, Ammar Ahmed</dc:creator>
    </item>
    <item>
      <title>PACT: Self-Evolving Physical Safety Alignment for Diffusion Policies in Embodied Manipulation</title>
      <link>https://arxiv.org/abs/2606.08414</link>
      <description>arXiv:2606.08414v1 Announce Type: new 
Abstract: Diffusion policies have achieved remarkable success in robotic manipulation, yet they often fail to satisfy strict physical constraints required for safe deployment. Existing approaches impose safety either prematurely during training or reactively via external guardrails at test time, limiting policy expressivity and overall scalability. We propose Physical safety Alignment for Constrained Trajectories (PACT), a self-evolving post-training framework that projects pretrained diffusion policies onto constraint-feasible regions without accessing demonstration data or task rewards. PACT distills constraint gradients into the diffusion model through a reverse-KL objective with dense supervision across timesteps. It incorporates a curriculum that progressively tightens constraints while maintaining theoretically bounded policy shift and monotone improvement, mitigating the safety-performance trade-off from catastrophic forgetting. On simulated and real-world embodied manipulation benchmarks, PACT significantly reduces safety violations by 31.0% on average while improving task success by 30.7%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08414v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lingxuan Wu, Zijian Zhu, Lizhong Wang, Chengyang Ying, Huayu Chen, Xiao Yang, Fangming Liu, Jun Zhu</dc:creator>
    </item>
    <item>
      <title>CoVEBench: Can Video Editing Models Handle Complex Instructions?</title>
      <link>https://arxiv.org/abs/2606.08415</link>
      <description>arXiv:2606.08415v1 Announce Type: new 
Abstract: While recent text-guided video editing models excel at elementary tasks (e.g., style transfer, object insertion), real-world user requests are highly compositional. A single prompt often demands multiple coupled edits, such as modifying subjects, actions, and camera views, while strictly preserving unrelated spatiotemporal content. Existing benchmarks, heavily constrained by isolated edits and coarse global metrics, fail to diagnose how models handle such complex workflows. To address this gap, we introduce CoVEBench, a compositional video editing benchmark comprising 416 curated source videos, 626 multi-point editing instructions, and 9,990 fine-grained checklist items. Covering diverse editing dimensions, CoVEBench evaluates models via MLLM-judged instruction compliance and video fidelity, alongside automated metrics for video quality. Extensive experiments reveal that compositional editing remains a profound challenge: current models frequently omit edits, violate preservation constraints, or introduce artifacts when handling multiple operations simultaneously. CoVEBench provides a challenging, diagnostic testbed to advance video editing toward realistic user workflows.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08415v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiangtao Wu, Jiaming Wang, Yiwen He, Yuanxing Zhang, Shihao Li, Dunyuan Liu, Xuedong Zhao, Jialu Chen, Zekun Moore Wang, Jiaheng Liu</dc:creator>
    </item>
    <item>
      <title>Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics</title>
      <link>https://arxiv.org/abs/2606.08417</link>
      <description>arXiv:2606.08417v1 Announce Type: new 
Abstract: Diffusion and continuous flow-based language models have emerged as the leading non-autoregressive alternatives to language modeling. Progress in both paradigms is overwhelmingly tracked by generative perplexity (gen-PPL): the per-token negative log-likelihood of samples under a frozen autoregressive (AR) scorer such as gpt2-large, typically paired with an empirical-entropy guardrail to rule out low-entropy collapse. We argue that this metric is unsound. By construction, gen-PPL measures only predictability under the scoring AR, not grammaticality or semantic coherence -- and the set of predictable but still low-quality sequences is combinatorially large. To make this concrete, we construct a suite of zero-parameter, deliberately naive samplers that achieve state-of-the-art gen-PPL on LM1B and OpenWebText at non-degenerate entropy, surpassing recently published diffusion and continuous-flow models while producing text that is incoherent by construction. We recommend evaluation suites that directly quantify the distributional divergence between generated and reference text, and use such a suite to re-benchmark recent non-autoregressive models, recovering a more faithful picture of the current state of the art.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08417v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Antonio Franca, Alexander Tong</dc:creator>
    </item>
    <item>
      <title>CheXanatomy: Anatomy-Aware Vision-Language Modeling for Chest Radiographs</title>
      <link>https://arxiv.org/abs/2606.08420</link>
      <description>arXiv:2606.08420v1 Announce Type: new 
Abstract: Vision-language models (VLMs) pretrained on large-scale image-text pairs demonstrate strong image-level understanding, but are primarily optimized for global alignment and do not explicitly encode fine-grained anatomical structure, limiting their suitability for spatially precise tasks such as segmentation. We introduce CheXanatomy, a framework that integrates explicit anatomical knowledge into a pretrained VLM through autoregressive token-space supervision. Instead of adding task-specific decoder heads, the model is trained to generate anatomical segmentation masks via next-token prediction. To enable scalable supervision, we synthesize realistic chest radiographs from CT volumes and forward-project CT segmentation labels to obtain anatomically consistent 2D masks. We evaluate the approach on synthetic and real chest radiographs against a U-Net baseline, including ablations on model scale, input resolution, and vision encoder fine-tuning. Autoregressive anatomical supervision achieves performance comparable to specialized convolutional models in-distribution and demonstrates improved geometric robustness under domain shift to real CXR data. In addition, anatomy-pretrained models exhibit improved sample efficiency when adapting to novel localization tasks under limited supervision. Larger models and higher input image resolution improve performance, while vision encoder fine-tuning has limited effect. These results show that embedding anatomical structure directly into the generative objective promotes spatially grounded representations and supports anatomy-aware medical vision-language modeling.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08420v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sergios Gatidis, Curtis Langlotz, Christian Bluethgen</dc:creator>
    </item>
    <item>
      <title>Segmentation-Assisted Brain MRI Synthesis with Cross-Image Multi-Contrast Feature Memory Bank Retrieval Augmentation</title>
      <link>https://arxiv.org/abs/2606.08421</link>
      <description>arXiv:2606.08421v1 Announce Type: new 
Abstract: Multi-contrast brain MRI provides complementary soft-tissue characteristics that aid in the screening and diagnosis of diseases. However, limited scanning time, image corruption, and varying imaging protocols often result in incomplete multi-contrast images. While current approaches excel in image synthesis, they often struggle to synthesize critical tumor regions and to exploit contextual information in multi-contrast brain MRI effectively. To address this issue, we propose a synthesis-centric, segmentation-assisted closed-loop framework with retrieval-augmented synthesis. Overall, our method adopts a generative adversarial architecture that aims to synthesize missing contrasts from any combination of available ones with a single model. To explicitly capture tumor semantics and focus synthesis on tumor regions, we add an auxiliary segmentation branch that predicts tumor masks and feeds them back as semantic conditioning in the synthesis branch, thereby learning tumor-aware representations in the model and improving synthesis fidelity. Furthermore, we propose a dual-bank retrieval augmentation strategy. It dynamically queries two external knowledge bases, namely a tumor mask memory bank for crucial tumor context and a cross-image contrast feature memory bank for global style information, to augment synthesis. Evaluated on two public multi-contrast brain MRI datasets, BraTS2020 and UCSF-BMSR, the proposed method handles brain image synthesis tasks effectively and shows superior performance compared to previous methods. Code is available at: https://github.com/iBizzard/SSCF.git</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08421v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wenwei Huang, Jia Wei, Jianlong Zhou</dc:creator>
    </item>
    <item>
      <title>TinyGiantALM: A Compact Audio-Language Model for Intent-Aware Reasoning under Resource Constraints</title>
      <link>https://arxiv.org/abs/2606.08425</link>
      <description>arXiv:2606.08425v1 Announce Type: new 
Abstract: Current advancements in Audio Reasoning rely on massive Large Audio-Language Models (LALMs), hindering deployment in resource-constrained environments. We introduce TinyGiantALM, a compact 1.5B efficiency-oriented alternative. Instead of brute-force scaling, we propose an Instruction-Aware Feature Refinement framework using a Query-guided Projector and Semantic Gating to filter acoustic signals based on user intent. On the MMAR benchmark, TinyGiantALM achieves 46.4% zero-shot accuracy, significantly outperforming 7B-13B baselines. While a reasoning gap in logical narrative remains versus 30B+ models and certain trade-offs exist in overly dense or spatial scenes, our approach notably surpasses models up to 8x larger in disentangling mixed-modality environments. These findings demonstrate that architectural precision offers a tangible pathway to secure robust perception capabilities on edge-friendly scales.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08425v1</guid>
      <category>cs.SD</category>
      <category>cs.CL</category>
      <category>eess.AS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Vinh-Thuan Ly</dc:creator>
    </item>
    <item>
      <title>CritLens: Visual Analytics for Criteria Discovery in Review-Based Decision Making</title>
      <link>https://arxiv.org/abs/2606.08426</link>
      <description>arXiv:2606.08426v1 Announce Type: new 
Abstract: We present CritLens, a visual analytics system that helps users build personalized multi-criteria decision models from review text. In everyday decisions -- choosing equipment, hotels, or restaurants -- evaluation criteria are either preset by platforms or generated by LLMs, leaving users unable to discover, adjust, or verify them against the underlying evidence. This is problematic because many preferences are latent: they surface only upon encountering specific reviews, and any fixed framework risks overlooking low-frequency but decisive details. CritLens addresses this gap by using LLMs to transform reviews into an initial AHP decision model, then supporting iterative, human-in-the-loop refinement. Through coverage gap detection in the embedding space, users discover criteria missed by the initial model; through interactive weight adjustment under AHP consistency constraints, they express personal priorities; and through a multi-level scorecard and exportable decision report, they trace every ranking back to the original review text. Two case studies, an eight-participant user study, and a quantitative consistency-repair experiment demonstrate the system's effectiveness.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08426v1</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hongjia Wu, Shuai Zhou, Hongxin Zhang, Wei Chen</dc:creator>
    </item>
    <item>
      <title>Accuracy-Configurable Floating-Point Multiplier Design for SRAM-Based Compute-in-Memory</title>
      <link>https://arxiv.org/abs/2606.08430</link>
      <description>arXiv:2606.08430v1 Announce Type: new 
Abstract: Digital Compute-in-Memory (DCiM) reduces data movement and has become a promising solution for energy-efficient edge AI. Yet most existing DCiM frameworks still primarily target integer or fixed-point arithmetic, and provide limited support for compiler-integrated and accuracy-configurable floating-point computation. Directly integrating conventional IEEE 754 floating-point units into dense SRAM-based DCiM arrays, however, incurs high area and power overhead. To address this challenge, this work presents an accuracy-configurable floating-point multiplier integrated into the OpenACM framework for SRAM-based DCiM. An exact IEEE 754-compliant multiplier is first implemented as a baseline, and a mantissa-segmentation-based approximate multiplier is then proposed to reduce hardware cost while preserving numerical fidelity. Post-layout results show up to 69% logic area reduction and 72% power savings over exact floating-point designs without delay overhead. Evaluations on image processing tasks and ResNet-18 inference further demonstrate negligible accuracy degradation. These results indicate that compiler-integrated approximate floating-point multiplication is a practical approach for enabling efficient and configurable floating-point support in SRAM-based DCiM systems. The floating-point multiplier is available at https://github.com/ShenShan123/OpenACM</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08430v1</guid>
      <category>cs.AR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yiqi Zhou, Junhao Lu, Jiale Yu, Zhuo Xu, Yang He, Yue Yuan, Shan Shen, Daying Sun</dc:creator>
    </item>
    <item>
      <title>Control-Theoretic View of Neural ODEs: Empirical Controllability and Observability</title>
      <link>https://arxiv.org/abs/2606.08431</link>
      <description>arXiv:2606.08431v1 Announce Type: new 
Abstract: This paper studies neural ordinary differential equations (neural ODEs) from a control-theoretic perspective using controllability and observability concepts. The neural ODE is represented in a control-affine form to facilitate analysis using tools from nonlinear and linear time-varying (LTV) systems. Controllability is examined through trajectory linearization, where the LTV controllability Gramian provides a local, first-order measure of input influence along a nominal trajectory. Observability is analyzed through output linearization, where the LTV observability Gramian characterizes the local ability to reconstruct system states from output measurements. Koopman-based lifting is considered to extend the analysis to a higher-dimensional representation, and its limitations under multiple equilibria and basin-dependent behavior are discussed. The proposed framework is illustrated on a series RLC circuit. The learned neural ODE reproduces system trajectories and generalizes to unseen initial conditions. The computed Gramians are numerically full rank along the tested trajectories, indicating local controllability and observability of the linearized dynamics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08431v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Md Saiful Islam, Rahul Bhadani</dc:creator>
    </item>
    <item>
      <title>Trajectory-Refined Distillation</title>
      <link>https://arxiv.org/abs/2606.08432</link>
      <description>arXiv:2606.08432v1 Announce Type: new 
Abstract: On-policy distillation (OPD) has become a central post-training tool for large language models (LLMs), providing dense per-token teacher supervision along the student's own rollouts. In this work, we identify a common structural cause of OPD failures, which we call prefix failure. Under prefix failure, dense per-token supervision induces a bimodal teacher mixture and fragmented gradients that token-level loss truncation or reweighting fails to address. This observation motivates us to move beyond token-level loss interventions toward trajectory-level output corrections. We thus propose Trajectory-Refined Distillation (TRD), a trajectory-level correction method that revises the student's rollout under teacher guidance while staying within on-policy support. By correcting problematic prefixes before distillation, TRD mitigates prefix failure at its source. Moreover, TRD improves exploration by exposing the student to alternative valid derivations under teacher guidance, even when the original rollouts are already correct. TRD can also be applied to on-policy self-distillation (OPSD), a parameter-sharing variant that uses the student model conditioned on privileged information as the teacher. Across a wide range of benchmarks and base models at multiple scales, TRD consistently outperforms prior baselines, improving single-attempt accuracy and broadening reasoning coverage. Code is available at https://github.com/louieworth/trd</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08432v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Li Jiang, Haoran Xu, Yichuan Ding, Amy Zhang</dc:creator>
    </item>
    <item>
      <title>AI Code Sandboxes: A Comparative Security Study. Part 1 of 2 -- Engine-Level Properties (Attack Surface, Leakage, Stackability, CVE History, Patch Cadence, Fuzzing)</title>
      <link>https://arxiv.org/abs/2606.08433</link>
      <description>arXiv:2606.08433v1 Announce Type: new 
Abstract: This paper reads six engine-level measurements together -- 1.1 host attack surface, 1.2 information leakage, 1.3 defense-in-depth stackability, 1.4 public CVE history, 1.5 patch cadence, and 1.6 upstream fuzzing posture -- to describe how five AI-sandbox products isolate guest code from the host kernel. No single axis is a sufficient basis for a comparative judgement; the cross-axis reading is the load-bearing analysis.
  Three high-level findings: (1) engine classes (microVM, userspace kernel, OCI container) separate cleanly on every architectural axis, but products within a class do not; (2) product pin policy is the dominant operator-facing variable -- engine-side patch latency aggregates to ~0 days for coordinated disclosures, while downstream lag spans 0 days to 471+ days to "opaque" to infinity; (3) fuzzing investment splits into three tiers, and the strongest combination -- microVM x continuous public fuzzer -- is unoccupied in this set, leaving the "0 published CVEs x no upstream fuzzer x no academic study" intersection structurally unmeasured.
  We report per-axis orderings, per-product portraits, and a threat-model qualification matrix; no overall ranking is proposed. Companion repository (code, Apache-2.0): https://github.com/orbitalab/RnD-ai-sandboxes-sec-study-part-1. License: CC BY 4.0.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08433v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>George Andronchik, Pavel Lokhmakov</dc:creator>
    </item>
    <item>
      <title>A Unified Framework for Contraction Stability Analysis of Heterogeneous Grid-Forming Inverters</title>
      <link>https://arxiv.org/abs/2606.08434</link>
      <description>arXiv:2606.08434v1 Announce Type: new 
Abstract: The shift to renewable-dominated power systems has produced low-inertia grids, undermining system stability. In this context, grid-forming inverters (GFMs) have emerged as a promising solution. However, GFMs challenge conventional analysis techniques, especially those relying on small-signal or root-mean-square (RMS) models. Such models rely on linearization and sinusoidal steady-state assumptions, which fail in large-signal cases. Stability of GFM-based systems therefore becomes operating-point dependent, and a feasible operating point may not even exist. While large-signal analyses are available, decentralized certification of operating-point convergence with explicit transient guarantees, such as rate and overshoot, remains rare. This paper proposes an algebraic, decentralized contraction-based framework. The proposed contraction stability analysis certifies system stability and convergence to desired operating points. The method works in the time domain and captures nonlinear, large-signal behavior of synchronization and power-sharing mechanisms. Moreover, the contraction rate provides an explicit bound on transient time: trajectories converge exponentially to the new operating point at a controlled rate, yielding computable contraction regions that certify stability and large-signal convergence across operating-point changes. These regions directly guide parameter tuning for heterogeneous GFMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08434v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Qianxi Tang, Li Peng</dc:creator>
    </item>
    <item>
      <title>Reinforcing Temporal Answer Grounding in Instructional Video via Candidate-Aware Causal Reasoning</title>
      <link>https://arxiv.org/abs/2606.08436</link>
      <description>arXiv:2606.08436v1 Announce Type: new 
Abstract: The task of temporal answer grounding in instructional video (TAGV), which aims to locate precise video segments that respond to natural language queries, is increasingly important for direct video answer retrieval. This task remains challenging due to the need to comprehend semantically complex questions and to address the significant length mismatch between untrimmed videos and short target moments. Existing methods often suffer from sensitivity to irrelevant content or insufficient visual reasoning capabilities. To tackle these limitations, we propose a Candidate-Aware Causal Reasoning (CACR) framework. Our approach first employs a Visual-Language Pre-training based Candidate Selection (VBCS) algorithm to efficiently generate K candidate segments, then applies a temporal logic reasoning module enhanced by a rejection reward mechanism and optimized via Group Relative Policy Optimization (GRPO) for robust inference. Extensive experiments on six benchmarks demonstrate that our method achieves state-of-the-art performance in terms of mean Intersection-over-Union (mIoU), providing a new perspective for reasoning-based retrieval in long videos.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08436v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Muge Qi, Rong Fu, Pengbin Feng, Xianda Li, Yu Cai, Yifu Guo, Shizhe Zhang, Simon James Fong, Lei Ma, Bin Li</dc:creator>
    </item>
    <item>
      <title>RadioDiff-Inv2: Differentiable Diffusion Inversion under Location Drift from Sparse Noisy Measurements for Radio Map Estimation</title>
      <link>https://arxiv.org/abs/2606.08439</link>
      <description>arXiv:2606.08439v1 Announce Type: new 
Abstract: Radio map (RM) estimation is a key enabler for environment-aware optimization in 6G wireless networks. In practice, RM construction increasingly relies on crowdsourced received signal strength (RSS) feedback that is inherently sparse and noisy. A further and often overlooked challenge is location drift, whereby privacy constraints and user mobility cause reported sampling coordinates to deviate from the true measurement locations. Unlike additive measurement noise, location drift perturbs the sensing operator itself, since each RSS sample effectively queries the underlying RM at an incorrect spatial coordinate. This operator uncertainty, compounded with sparse noisy sensing, renders the inverse problem severely ill-posed and limits conventional estimators that rely on analytically specified priors. This paper proposes RadioDiff-Inv2, a differentiable diffusion inversion framework that estimates RMs from sparse noisy measurements under location drift. A Gaussian resampling scheme is introduced to construct a differentiable, drift-aware measurement operator on grid-based maps, and the probability-flow ordinary differential equation (ODE) is exploited to cast the diffusion sampler as a deterministic, differentiable mapping from an initial noise code to the estimated RM. By optimizing the noise code via backpropagation against a drift-marginalized data-fidelity objective, RadioDiff-Inv2 produces reconstructions that are both prior-plausible and measurement-consistent without costly posterior sampling. Extensive experiments show that RadioDiff-Inv2 outperforms the best competing baseline by 4 to 14 dB in PSNR across varying sparsity and drift levels. The advantage is most pronounced in low-SNR regimes, where the learned diffusion prior maintains near-constant reconstruction fidelity while conventional methods degrade severely.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08439v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiucheng Wang, Kailong Wang, Nan Cheng</dc:creator>
    </item>
    <item>
      <title>GraspFoM: Towards Reconstruction-Driven Robotic Grasping with 3D Foundation Priors</title>
      <link>https://arxiv.org/abs/2606.08440</link>
      <description>arXiv:2606.08440v1 Announce Type: new 
Abstract: Robotic grasping is a fundamental capability in robotic manipulation. Yet grasping remains challenging under partial observations. Reliable grasping depends on both local contact cues and object-level 3D structure. Existing geometry-aware grasping methods recognize the value of reconstruction, but they typically treat geometry as an intermediate prediction rather than a reusable object prior for grasping. In this paper, we present GraspFoM, a unified framework that leverages 3D foundation priors (SAM3D) to build a shared 3D object latent for both reconstruction and grasp pose prediction. Built on this shared object latent, we introduce an anchor-initialized truncated pose-reasoning diffuser that predicts continuous and multimodal grasp poses without directly relying on discrete grasp candidates. We further investigate the interaction between reconstruction and grasping through a reconstruction-aware scorer and a residual latent updater. Reconstruction provides grounded geometric cues, while grasp supervision refines the shared object latent toward grasp-relevant affordances. GraspFoM jointly predicts grasp poses and reconstructs high-fidelity 3D assets in mesh and 3DGS forms. Comprehensive experiments demonstrate that GraspFoM achieves state-of-the-art results on both reconstruction and grasping. Notably, these improvements require only a small number of additional trainable parameters. Component-wise ablation studies also demonstrate the contribution of each component.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08440v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dongli Wu, Xiaobao Wei, Hao Wang, Qiaochu Dong, Ying Li, Qingpo Wuwu, Ming Lu, Wufan Zhao</dc:creator>
    </item>
    <item>
      <title>Comparing Controller-Free Pointing Techniques Across Depth for 2D Selection in Augmented Reality</title>
      <link>https://arxiv.org/abs/2606.08441</link>
      <description>arXiv:2606.08441v1 Announce Type: new 
Abstract: This paper presents a systematic evaluation of five controller-free pointing techniques for 2D target selection in AR, using ISO 9241-411. We compared them across multiple depths (2 m, 6 m, 10 m) in terms of movement time, accuracy, throughput, and workload (NASA TLX). Head- and eye-based pointing significantly outperformed the hand-based methods (Finger, Wrist, and Arm); Head input was the most accurate and remained the most consistent across depth. Depth significantly impacted performance, with complex interactions with target size and distance. Our results offer a comprehensive empirical basis for selecting appropriate controller-free techniques in depth-varying AR tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08441v1</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>Proceedings of the Graphics Interface Conference 2026</arxiv:journal_reference>
      <dc:creator>Samiha Sultana, J. Felipe Gonzalez, Robert J. Teather</dc:creator>
    </item>
    <item>
      <title>Clinical Reasoning in the Age of AI: Longitudinal Cognition and Human-AI Collaboration</title>
      <link>https://arxiv.org/abs/2606.08442</link>
      <description>arXiv:2606.08442v1 Announce Type: new 
Abstract: As physicians turn to AI-powered systems to help meet the dual demands of speed and care quality, they are met with hallucinations and sycophancy. Understanding how doctors reason through clinical problems in real-world settings is critical for the design of effective AI reasoning systems. While recent advances in medical AI have emphasized performance benchmarks and diagnostic accuracy, comparatively little attention has been paid to the structure of clinicians' reasoning processes as they unfold over time, e.g., how they interact with electronic health records and operate under conditions of uncertainty and constraint. This study provides a comprehensive, empirically grounded account of clinical reasoning and its relationship to current AI-mediated workflows through a mixed-methods design that combines qualitative interviews with structured survey data.
  Findings indicate that current AI systems are primarily deployed for encounter-level tasks such as documentation and summarization, and only partially align with physicians' underlying reasoning processes. In particular, AI-generated representations often omit temporal or interpretive structures central to clinical decision-making, while core aspects of reasoning, especially those spanning multiple encounters, remain largely implicit and physician-driven. By integrating fine-grained qualitative insights with broader quantitative patterns, this study offers a unified framework for understanding clinical reasoning as a context-sensitive, temporally extended process and identifies key mismatches between clinician cognition and current AI design. These results provide concrete directions for the development of AI systems that more effectively align with and augment real-world clinical reasoning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08442v1</guid>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Irene Yi, Grace Brown, Sufian Aldogom, Nathan Roll, Eric J. Basile, Pamela M. Resnikoff, Bianca Sanchez, Chirag Lodha, Isaac Gutterman, Oscar Schiff, Keira Salata, Benjamin Mujkic, Ammar Ahmed</dc:creator>
    </item>
    <item>
      <title>When LLMs Invent Rust Crates: An Empirical Study of Hallucination Patterns and Mitigation</title>
      <link>https://arxiv.org/abs/2606.08444</link>
      <description>arXiv:2606.08444v1 Announce Type: new 
Abstract: Large Language Models (LLMs) have become powerful tools for code generation, yet they remain prone to hallucinations, producing plausible but incorrect or fabricated outputs. Among these, package hallucination, where an LLM suggests non-existent dependencies, poses an emerging security risk to the software supply chain. While previous studies focus on popular languages like Python or JavaScript, in this work we present the first large-scale empirical study on crate hallucination in LLM-generated Rust code. We construct a multi-source dataset combining coding tasks from Stack Overflow, GitHub, and LLM-generated tasks, and evaluate both commercial and open-source models under various decoding settings. Our analysis reveals that, unlike prior findings in Python and JavaScript, hallucination behavior in Rust follows a distinct pattern: different models exhibit surprisingly consistent hallucination rates, and these rates show minimal sensitivity to model parameters. Furthermore, we investigate prompt engineering strategies to mitigate hallucinations without sacrificing code quality. This study provides new insights into the reliability and security implications of LLM-assisted Rust development, offering guidance for future research and safer model deployment in software engineering workflows.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08444v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jieming Zheng, Hao Guan, Yepang Liu</dc:creator>
    </item>
    <item>
      <title>Segment-level Tree Search for Long Meeting Document Summarization</title>
      <link>https://arxiv.org/abs/2606.08445</link>
      <description>arXiv:2606.08445v1 Announce Type: new 
Abstract: Meeting documents are challenging to summarize due to their length and complex conversational structure. Existing approaches typically adopt multi-stage pipelines that extract information prior to summarization; however, these approaches often suffer from cumulative error propagation without intermediate validation, a limitation further amplified by short and low-quality reference summaries. We propose segment-level summarization via Monte Carlo Tree Search (S3), a training-free framework that constructs a final summary by composing segment-level summary candidates. S3 partitions a long document into segments and generates multiple summary candidates per segment, forming nodes of a search tree. The best-scoring combination is selected via self-reward-guided tree search and refined into the final output. Despite using a 7B model, S3 achieves performance comparable to larger 72B models while producing length-appropriate summaries.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08445v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sangwon Ryu, Heejin Do, Jun Seo, Daehui Kim, Yunsu Kim, Gary Geunbae Lee, Jungseul Ok</dc:creator>
    </item>
    <item>
      <title>Sparrow: Sparse Rollout for Stable and Efficient Long-context RL of Large Language Models</title>
      <link>https://arxiv.org/abs/2606.08446</link>
      <description>arXiv:2606.08446v1 Announce Type: new 
Abstract: Despite being powerful, reinforcement learning with verifiable rewards (RLVR) induces extremely long chains of thought (CoT), making it computationally expensive. Since RLVR per-step cost is dominated by long-context rollout generation, sparse attention offers a promising way to accelerate dense rollout. However, sparse rollouts require a delicate stability-efficiency tradeoff: overly aggressive sparsity causes collapse, while overly lenient sparsity gives insufficient speedup. In this work, we study this tradeoff through sparse-to-dense actor-policy mismatch. We first observe that sparse rollout collapse is not driven by uniform degradation across tokens: most sparse tokens align perfectly with dense even under aggressive sparsity. Motivated by this, we hypothesize that sparse rollout training remains stable if the lower tail of per-token actor-policy mismatch stays above a critical threshold throughout the trajectory. We introduce a dynamic sparsity schedule that keeps this tail statistic constant during generation and validate our hypothesis. Across Qwen3 thinking-family models, keeping the tail mismatch statistic near a consistent threshold generally enables stable training. We then use a cost model to find the sparsity schedule for maximum speedup under this mismatch threshold, achieving 2.2x, 2.4x, and 2.0x rollout speedups when training Qwen3-1.7B, Qwen3-4B, and Qwen3-8B. Empirically, we show the thresholds generalize to a larger model (Qwen3-14B) and another RL domain (coding). Finally, our analysis naturally motivates DistillSparse: lightweight LoRA-based distillation on sparse rollout lets more aggressive sparsity reach the same sparse-to-dense mismatch threshold, yielding higher speedup.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08446v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yang Zhou, Ranajoy Sadhukhan, Zhaofeng Sun, Zhuoming Chen, Souvik Kundu, Saket Dingliwal, Sai Muralidhar Jayanthi, Aram Galstyan, Haizhong Zheng, Beidi Chen</dc:creator>
    </item>
    <item>
      <title>Not Just After One: Sleep-Inspired Replay Prevents Catastrophic Forgetting After Sequential Tasks</title>
      <link>https://arxiv.org/abs/2606.08447</link>
      <description>arXiv:2606.08447v1 Announce Type: new 
Abstract: One of the critical limitations of artificial neural networks is their lack of ability to continually learn: training on new tasks often leads to interference and forgetting of the previous ones. While several algorithms have been proposed to protect old memories from interference, they are typically applied during or immediately after each new episode of training. In contrast, humans and animals can learn continuously, acquiring multiple new memories during active learning before consolidating all of them into long-term storage. Here we show that multiple new tasks can be trained sequentially before an unsupervised sleep-like replay phase is applied to partially restore performance across all previously learned tasks. Our study further suggests that task-specific information remains resilient to new training but decays gradually as the network is trained on new tasks. These findings point to novel principles for developing a broad range of continual learning AI solutions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08447v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Anthony Bazhenov, Jean Erik Delanois, Giri P. Krishnan</dc:creator>
    </item>
    <item>
      <title>Multiscale Fourier Neural Operator for Inverse Wave Scattering in Highly Oscillatory Media</title>
      <link>https://arxiv.org/abs/2606.08448</link>
      <description>arXiv:2606.08448v1 Announce Type: new 
Abstract: In this paper, we propose an operator learning method based on the multiscale Fourier neural operator (MscaleFNO) for inverse medium problems of Helmholtz equations. The MscaleFNO provides a neural surrogate model with reduced spectral bias for the Helmholtz equations, mapping highly oscillatory medium profiles to scattered wavefields. A plug-and-play inversion using an elucidated diffusion model is introduced to regularize the inverse solver based on least squares of data misfits. Numerical results for partial aperture inversion of oscillatory two-dimensional media demonstrate the advantage and effectiveness of MscaleFNO for accurate reconstruction of highly oscillatory medium properties.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08448v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zilin You, Zhenli Xu, Wei Cai</dc:creator>
    </item>
    <item>
      <title>GIFT: LLM-Guided State-Reward Interface for Financial Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2606.08450</link>
      <description>arXiv:2606.08450v1 Announce Type: new 
Abstract: Financial portfolio trading is naturally formulated as a reinforcement learning problem, where an agent sequentially rebalances assets under changing market conditions to balance return, risk, and transaction costs. Yet in non-stationary markets, raw OHLCV states and short-horizon return rewards often provide an under-specified learning interface, motivating large language models as a way to inject financial knowledge into state and reward design while constraining open-ended generation. To this end, we propose GIFT, an LLM-guided framework for state-reward interface design in PPO-based financial reinforcement learning. Rather than using the LLM to make trading decisions, GIFT uses Factor-guided State Enhancement to generate state features from financial-factor primitives, Risk-rule-guided Reward Shaping to generate auxiliary rewards from portfolio-risk rules, and Diagnostic-guided Refinement to revise candidate interfaces using PPO rollout diagnostics. After refinement, GIFT fixes the selected state-reward interface before evaluation, with no further LLM queries or interface updates at test time. Comprehensive rolling-window experiments across diverse market regimes and portfolio scenarios demonstrate that GIFT improves learning-signal quality and out-of-sample risk-adjusted portfolio performance over baselines. Code and data are available at: https://github.com/KAG778/GIFT .</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08450v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yanyan Wu, Boyi Zhang, Yanlin Liu, Xinyu Fang, Jining Luan, Meiqi Zhang, Jiacheng Liu, Hao Zeng, Dexu Yu, Chang Liu, Hanwen Du, Yongxin Ni, Youhua Li</dc:creator>
    </item>
    <item>
      <title>Sycophancy as a Multilingual Alignment Failure: How Safety Degrades Across Languages, Topics, and Models</title>
      <link>https://arxiv.org/abs/2606.08451</link>
      <description>arXiv:2606.08451v1 Announce Type: new 
Abstract: Safety-aligned large language models often exhibit sycophancy, which is the tendency to affirm users' opinions regardless of factual accuracy. Although well-studied in English, its manifestation in other languages remains largely unexamined, leaving billions of non-English speakers potentially vulnerable to model-validated misinformation. We present the first large-scale, multi-model evaluation of cross-lingual sycophancy, benchmarking six instruction-tuned models across 1.1 million instances spanning 38 languages and 33 topic categories. We identify a consistent resource-tier effect: sycophancy rates spike sharply in low-resource and zero-shot language settings. Critically, this degradation is topic-agnostic, as models fail uniformly across both benign and safety-critical prompts, offering no additional protection where it is most needed. We further identify tokenizer fertility as a structural driver of this alignment collapse. Collectively, our results demonstrate that prevailing alignment methodologies generalize poorly beyond high-resource languages, underscoring the urgent need for equitable multilingual safety techniques.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08451v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Arya Shah, Himanshu Beniwal, Mayank Singh, Chaklam Silpasuwanchai</dc:creator>
    </item>
    <item>
      <title>Theoretical Foundations of Continual Learning via Drift-Plus-Penalty</title>
      <link>https://arxiv.org/abs/2606.08452</link>
      <description>arXiv:2606.08452v1 Announce Type: new 
Abstract: In many real-world settings, data streams are nonstationary and arrive sequentially, requiring learning systems to adapt continuously without retraining from scratch. Continual learning (CL) addresses this challenge by incorporating new tasks while mitigating catastrophic forgetting, where learning new information degrades performance on previously acquired knowledge. We introduce a control-theoretic perspective on CL that explicitly regulates the evolution of forgetting, framing adaptation as a controlled process subject to long-term stability constraints. We focus on replay-based CL, where a finite memory buffer stores representative samples from prior tasks. We propose COntinual Learning with Drift-Plus-Penalty (COLD), a continual learning framework based on the Drift-Plus-Penalty (DPP) principle from stochastic optimization. To facilitate analysis, we also consider an oracle variant, COLD-ORACLE, as a reference benchmark. At each task, both methods minimize the current task loss while maintaining a virtual queue that tracks deviations from long-term stability on previously learned tasks, capturing the stability-plasticity trade-off as a regulated dynamical process. We establish stability and convergence guarantees that characterize this trade-off through a tunable control parameter. Experiments on standard benchmarks demonstrate that COLD consistently outperforms a broad range of state-of-the-art CL methods while providing competitive and controllable forgetting behavior through explicit regulation of stability and plasticity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08452v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nazreen Shah, Govinda Arya, Bharath B. N., Ranjitha Prasad</dc:creator>
    </item>
    <item>
      <title>Beyond Linear Activation Steering: Invertible Latent Transformations for Controlling LLM Behavior</title>
      <link>https://arxiv.org/abs/2606.08454</link>
      <description>arXiv:2606.08454v1 Announce Type: new 
Abstract: Activation steering provides a lightweight inference-time mechanism for controlling large language models (LLMs) by modifying their internal activation vectors toward desired behaviors. Most existing methods compute a fixed steering direction in the original activation space, typically from pairs of contrastive examples using mean differences, linear probes, or arbitrary separability criteria. While effective to a certain extent, these methods treat behavioral control as a global, linear, additive offset: the same direction is applied across inputs, and behaviors are assumed to be linearly separable. This can be restrictive when behavioral features vary nonlinearly across the activation space or lie on curved and anisotropic manifolds, where the optimal intervention may be input-dependent. To address this limitation, we propose INNSteer, a nonlinear activation steering framework based on invertible latent transformations. Rather than searching for a better steering vector in the original representation space, INNSteer learns a lightweight invertible neural network $\phi$ that maps an LLM's activations into a latent space where behavioral classes are more amenable to linear control. At inference time, activations are mapped through $\phi$, steered in the latent space, and mapped back through the exact inverse transformation $\phi^{-1}$. This turns a simple latent-space translation into a nonlinear, input-dependent intervention in the original activation space. Across experimental settings on multiple LLM families, scales, behavioral traits, and safety benchmarks, INNSteer consistently improves model control over linear, transport-based, and nonlinear steering baselines while largely preserving generation fluency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08454v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Tuc Nguyen, Thai Le</dc:creator>
    </item>
    <item>
      <title>The Consistency Illusion: How Multi-Agent Debate Hides Reasoning Misalignment</title>
      <link>https://arxiv.org/abs/2606.08457</link>
      <description>arXiv:2606.08457v1 Announce Type: new 
Abstract: Multi-agent LLM systems for medical question answering often treat consensus as a reliability signal: if multiple agents agree on an answer, it is presumed trustworthy. However, answer-level consensus does not entail reasoning-level alignment. We introduce CARA (Cross-Agent Reasoning Alignment), a family of automated metrics that measure whether agents who agree on an answer also agree on the reasoning. Applying CARA to a standard debate system on two medical QA benchmarks, MedQA-USMLE and MedThink-Bench, we identify the consistency illusion: a failure mode where debate reduces detectable contradictions between agents while simultaneously decreasing the semantic similarity of their reasoning chains; agents appear to agree more but reason less consistently. To reduce this misalignment, we propose the Grounded Debate Protocol (GDP), a prompt-level intervention that requires agents to commit to named medical facts and take explicit stances on other agents' claims. GDP produces large, consistent alignment improvements, with Cohen's d ranging from +1.43 to +1.99, across two datasets and two backbone models, without adding LLM calls or modifying system architecture. Our results motivate cross-agent reasoning alignment as a quantity to audit alongside accuracy in safety-critical domains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08457v1</guid>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiaoyang Wang, Christopher C. Yang</dc:creator>
    </item>
    <item>
      <title>Personalized and Robust Proactive Robot Assistance with Uncertainty-Guided LLM Reasoning</title>
      <link>https://arxiv.org/abs/2606.08458</link>
      <description>arXiv:2606.08458v1 Announce Type: new 
Abstract: Proactive robot assistance in household environments requires accurate prediction of human activities and object usage under dynamic and noisy conditions. Existing approaches often rely on complex spatio-temporal models, which can be computationally expensive and sensitive to environmental variability. In this paper, we propose GLOBE, a lightweight framework that combines n-gram Markov models for capturing temporal behavioral patterns with uncertainty-guided large language model (LLM) reasoning. The framework performs sequential prediction efficiently while selectively invoking LLM reasoning only when the model confidence is low. To evaluate performance under realistic conditions, we introduce HOMER-Noise, a noisy extension of the HOMER+ dataset that simulates structured disturbances such as object movements caused by humans, pets, and toddlers. Experimental results show that GLOBE achieves competitive performance with state-of-the-art methods while improving robustness and computational efficiency across both clean and noisy settings. The framework is further validated through a proof-of-concept integration with a Stretch 3 mobile manipulator, demonstrating its potential application in real-world human-robot interaction scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08458v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alvaro Gonzalez, M. H. Hasan Shovo, Ali Ayub</dc:creator>
    </item>
    <item>
      <title>Simplest Nontrivial Maxwellian Random Field Models for Stochastic LoS MIMO Using the Dyadic Green's Function</title>
      <link>https://arxiv.org/abs/2606.08463</link>
      <description>arXiv:2606.08463v1 Announce Type: new 
Abstract: This letter introduces a novel, full-wave, physics-compliant stochastic dyadic Green's function (SDGF) framework for modeling electromagnetic (EM) multiple-input-multiple-output (MIMO) channels under wavenumber uncertainty. Unlike conventional phenomenological fading models, the proposed approach provides what appear to be the simplest exact random field models of electromagnetic line-of-sight (LoS) propagation that are also exact solutions of Maxwell's equations. Hence, we dub them Maxwellian random field theoretic models. These physically consistent stochastic models, including an analytically tractable wavenumber Gaussian model and a more general stochastic plane wave (SPW) model, serve as fundamental baseline models for stochastic LoS channel characterization. By preserving the vectorial structure of Maxwell's equations and the dispersion relation, the framework naturally incorporates both propagating and evanescent modes. Our analysis of ergodic capacity and degrees of freedom (DoF) reveals that the key results of the complex SPW model can be reproduced by the simpler Gaussian model with limited variance. Furthermore, we provide examples using 2D continuous MIMO systems, illustrating how the model's Maxwell-consistent stochasticity explains observed increases in channel capacity and DoF over the deterministic MIMO capacity baseline. These idealized Maxwellian random field theoretic models offer a physically grounded reference point for understanding fundamental limits in stochastic LoS propagation environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08463v1</guid>
      <category>cs.IT</category>
      <category>eess.SP</category>
      <category>math-ph</category>
      <category>math.IT</category>
      <category>math.MP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lumeng Xu, Said Mikki</dc:creator>
    </item>
    <item>
      <title>TVI-CoT: Text-Visual Interleaved Chain-of-Thought Reasoning for Multimodal Understanding</title>
      <link>https://arxiv.org/abs/2606.08464</link>
      <description>arXiv:2606.08464v1 Announce Type: new 
Abstract: Chain-of-thought (CoT) reasoning has proven effective for enhancing problem-solving in large language models. However, when applied to multimodal LLMs (MLLMs), existing CoT approaches suffer from a fundamental limitation: they perform reasoning entirely in text without accessing visual features during the reasoning process. After initial visual encoding, image information becomes inaccessible, forcing models to reason based solely on whatever was captured in the initial description, which forms a 'vision-blind reasoning' paradigm that limits fine-grained visual extraction, error verification, and adaptive attention. We propose Text-Visual Interleaved Chain-of-Thought (TVI-CoT), a framework that enables explicit interleaving of textual reasoning and visual feature access through learnable control tokens &lt;THINK&gt;, &lt;LOOK&gt; and &lt;ANSWER&gt;. These tokens allow dynamic switching between reasoning and visual grounding, attending to relevant image regions conditioned on the evolving reasoning state. Experiments on eight benchmarks demonstrate state-of-the-art results among MLLM-based CoT methods and a notable performance boost over the baseline: +6.1% on MMMU, +3.8% on MathVerse, +3.4% on MathVista, and +3.4% on ScienceQA. Code is available at https://github.com/hulianyuyy/TVI-CoT.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08464v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lianyu Hu, Xiaoyu Ma, Zeqin Liao, Yang Liu</dc:creator>
    </item>
    <item>
      <title>An Empirical Comparison of General Context-Free Parsers</title>
      <link>https://arxiv.org/abs/2606.08465</link>
      <description>arXiv:2606.08465v1 Announce Type: new 
Abstract: Parsing underpins a vast range of software engineering tasks, from compilers and static analyzers to language servers and fuzz testing tools. Yet most parsers deployed in practice are deterministic (LL or LR), forcing developers not only to contort their grammars to fit the parser, but to simplify the very languages they design, sacrificing expressiveness for the sake of parseability. General context-free parsers eliminate this constraint. Yet, despite decades of algorithmic development, no rigorous head-to-head comparison exists across the major families of parsing algorithms.
  We present the first unified, controlled benchmark of six generalized parsing algorithms: CYK, Valiant, Earley, GLL, RNGLR, and BRNGLR, plus deterministic LL(1) and LR(1) baselines, all implemented in Rust with shared data structures and parse-tree extraction, and evaluated across 22 grammars ranging from simple expressions to full C++ and Java. Our results show that the cost of generality is lower than widely assumed. On deterministic grammars, the GLR family incurs only a 3x median slowdown over LR(1), with a narrow and predictable variance. GLR is the clear performance winner among generalized parsers and a practical default choice for software engineering tools.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08465v1</guid>
      <category>cs.FL</category>
      <category>cs.PF</category>
      <category>cs.PL</category>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Huan Vo, Danushka Liyanage, Hong Jin Kang, Sasha Rubin, Rahul Gopinath</dc:creator>
    </item>
    <item>
      <title>ToolRec: Calibrated Preference Alignment for Query Recommendation in On-Device Assistants</title>
      <link>https://arxiv.org/abs/2606.08466</link>
      <description>arXiv:2606.08466v1 Announce Type: new 
Abstract: Large Language Models (LLMs) have significantly advanced generative query recommendation. However, existing alignment methods primarily focus on standard chatbot scenarios, falling short in on-device intelligent assistants where users predominantly expect the rapid invocation of system-level tools. Moreover, directly aligning LLMs with real-world click logs introduces severe noise due to varying user activity levels and the failure to emphasize execution-oriented queries. To address these challenges, we propose ToolRec, a calibrated preference alignment framework tailored for on-device query recommendation. To ground query recommendation with executable actions, we first construct SysToolKit, a comprehensive repository of 708 system tools, paired with a context-aware tool retrieval mechanism to ensure recommendation relevance. We then propose a dual-level calibration mechanism to refine raw click data, effectively mitigating user behavioral noise by calibrating signals based on user activity levels, while simultaneously up-weighting click signals on system-level tool-invoking queries. Guided by these refined preference signals, we then align the model using a sample-level weighted Kahneman-Tversky Optimization (KTO). Extensive online A/B tests on our mobile assistant platform OPPO Xiaobu, which has over 150 million monthly active users, demonstrate that ToolRec can significantly improve Click-Through Rate (CTR) and total click volume over strong baselines while maintaining high query relevance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08466v1</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zihan Luo, Lingkui Chen, Ruike Zhang, Hong Huang, Boyang Zhang, Ziniu Chen, Lizhong Wang</dc:creator>
    </item>
    <item>
      <title>The Confidence Trap: Calibration Attacks for Graph Neural Networks</title>
      <link>https://arxiv.org/abs/2606.08467</link>
      <description>arXiv:2606.08467v1 Announce Type: new 
Abstract: While confidence calibration is essential for trustworthy decision-making in safety-critical applications, the robustness of calibrated GNNs to adversarial structural perturbations remains largely unexplored. However, studying calibration attacks on graphs presents unique technical challenges: (1) the discrete nature of graph structures complicates gradient-based optimization, (2) existing underconfidence objectives fail to drive predictions toward uniform distributions, and (3) GNNs are highly sensitive to edge perturbations, often causing unintended label changes that violate attack constraints. To address these challenges, we propose a Unified Graph Calibration Attack (UGCA) framework designed for worst-case (white-box) analysis of GNN calibration robustness. UGCA introduces a KL-divergence loss to encourage uniform predictive distributions, a reranking mechanism to reduce label flipping, a hybrid loss to recover labels when violations occur, and beam search to explore a broader adversarial search space. We further provide theoretical insights linking model generalization, dataset complexity, and calibration vulnerability, showing that models with higher accuracy or trained on datasets with more classes are more susceptible under this threat model. Extensive experiments demonstrate that UGCA substantially increases Expected Calibration Error while preserving classification accuracy. Our code is publicly available at https://github.com/CaptainCuong/Graph-Calibration-Attack.git.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08467v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Cuong Dang, Jiahao Zhang, Hieu Ta Quang, Dung Le, Lu Cheng, Suhang Wang</dc:creator>
    </item>
    <item>
      <title>OctaOctree Neural Radiosity for Real-time Glossy Material Rendering</title>
      <link>https://arxiv.org/abs/2606.08469</link>
      <description>arXiv:2606.08469v1 Announce Type: new 
Abstract: Modeling high-frequency outgoing radiance distributions remains a fundamental challenge in global illumination, especially for glossy and specular materials. Existing neural-based radiance caching methods commonly rely on positional feature encodings or spatially organized caches, which makes it difficult to represent sharp directional radiance variations without increasing the model complexity or sampling cost. To address this challenge, we propose OctaOctree, an efficient spatial-angular radiance representation for global illumination. OctaOctree organizes outgoing radiance with an adaptive octree in 3D space, and associates each spatial node with an octahedral directional map. By coupling the spatial hierarchy with direction-dependent storage, our representation allocates fine spatial resolution to local illumination and visibility changes, while using coarser spatial levels with richer angular resolution to capture glossy and specular radiance distributions. This design embeds a reflectance-aware spatial-angular prior directly into the radiance representation, reducing the burden on neural networks or reconstruction modules to recover high-frequency view-dependent effects from positional features alone. As a result, OctaOctree provides a compact and expressive neural encoding for a wide range of indirect illumination effects, from diffuse interreflection to sharp glossy reflections. Experiments demonstrate that our method produces high-quality, direction-aware global illumination with a single network query at primary intersections, achieving improved fidelity and real-time performance compared with baseline neural radiosity and radiance caching approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08469v1</guid>
      <category>cs.GR</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jierui Ren (Peking University), Haojie Jin (Peking University), Bo Pang (Peking University), Meng Gai (Peking University), Fei Zhu (Peking University), Yisong Chen (Peking University), Sheng Li (Peking University)</dc:creator>
    </item>
    <item>
      <title>LUNA-AD: Lightweight Uncertainty-Aware Language Model with Lifelong Learning for Autonomous Driving</title>
      <link>https://arxiv.org/abs/2606.08470</link>
      <description>arXiv:2606.08470v1 Announce Type: new 
Abstract: While large language models (LLMs) offer promising reasoning capabilities, their integration into safety-critical driving systems is hindered by limited reasoning diversity, high computational overhead, and static learning paradigms. To address these challenges, we propose LUNA-AD, a lightweight uncertainty-aware language model with lifelong learning for autonomous driving (AD). LUNA-AD features a tri-system architecture that reconciles complex multimodal behavioral reasoning, efficient deployment, and continual refinement. We design a multi-agent analytical system to generate uncertainty-aware decision-making demonstrations through diverse hypothesis exploration. A dual-head lightweight heuristic model is distilled to unify the inference of decision distributions and textual explanations while enabling efficient deployment. Furthermore, a reflection-driven lifelong learning mechanism operates on multimodal decision outputs and preserves strategic diversity, allowing for the refinement of candidate decisions and rationales via closed-loop feedback to enhance driving robustness. Extensive experiments on nuPlan benchmarks demonstrate that LUNA-AD achieves state-of-the-art success rates under both non-reactive and reactive modes, with drastically reduced inference latency compared to existing knowledge-driven AD frameworks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08470v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ruoyu Yao, Pei Liu, Ruiguo Zhong, Mingxing Peng, Rui Yang, Jun Ma</dc:creator>
    </item>
    <item>
      <title>More Yap Less Meaning: Uncovering Self-Improvement Behavior in SLMs</title>
      <link>https://arxiv.org/abs/2606.08471</link>
      <description>arXiv:2606.08471v1 Announce Type: new 
Abstract: Recently, language models have made rapid progress across various domains and applications. However, their capability for self-improvement, i.e., whether they are adept at recognising and correcting flaws in their own reasoning, remains dubious. In this study, we address this question by constructing a sufficiency test to rigorously examine the self-correction capabilities of small language models (SLMs). We propose a minimal three-step self-correction pipeline that collects initial SLM answers, prompts the same model to generate hints for its incorrect responses given the ground truth, and feeds the model the same question with its own feedback to refine the initial answer. We evaluate a variety of instruction-tuned and reasoning SLMs in this experimental setup on arithmetic and logical reasoning benchmarks. Our findings show that SLMs with injected hint sentences yield only a 4.4 percent gain over initial question-answering accuracy. Even though the correct answer was provided alongside the model's incorrect reasoning, the evaluated SLMs fail to understand what was missing in their reasoning and show minimal semantic difference between hints that lead to corrections and ones that do not. Furthermore, our experiments show that longer hints are positively correlated with incorrect final answers, suggesting that longer deliberation on problems can hinder the reasoning process, meaning that SLMs do not necessarily scale in performance with a larger compute budget.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08471v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Marina Igitkhanian, Erik Arakelyan</dc:creator>
    </item>
    <item>
      <title>Digital White Spaces: A Cyberpsychology-Informed Framework to Mobile Phone Addiction</title>
      <link>https://arxiv.org/abs/2606.08472</link>
      <description>arXiv:2606.08472v1 Announce Type: new 
Abstract: Mobile-phone overuse and attention fragmentation have become pressing societal and public-health concerns. Cyberpsychology research highlights addictive engagement loops driven by intermittent rewards, persuasive design, and habit formation. In this editorial I synthesize current evidence on mobile-phone addiction and propose "Digital White Spaces" (DWS), a socio-technical framework that combines privacy-preserving monitoring, AI-driven detection of addictive loops, device-mode interventions, and physical signal-limited zones to restore cognitive autonomy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08472v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Leandros Maglaras, Helge Janicke, Konstantinos Karantzalos</dc:creator>
    </item>
    <item>
      <title>Physically Consistent Null Space Alignment for Detection of Low-Magnitude False Data Injection Attacks</title>
      <link>https://arxiv.org/abs/2606.08473</link>
      <description>arXiv:2606.08473v1 Announce Type: new 
Abstract: False data injection attacks (FDIAs) introducing small measurement perturbations can still cause large deviations in power system state estimation when the injected signals align with the pseudo-null space of the system model. Existing model- and data-driven detectors may fail to identify such low-magnitude but high-impact attacks because residual tests ignore changes hidden in the pseudo-null space, while subspace learning methods capture correlation patterns without enforcing physical consistency. This paper proposes Physically Consistent Null Space Alignment (PCNSA), a framework that detects stealthy FDIAs by preserving, through preprocessing, the geometric correspondence between the physical null space and the measurement-derived pseudo-null space. The key point is a Pseudo-null Space Conserved data Preprocessing (PSCP) step that re-expresses measurements in the physical coordinate frame before subspace extraction. We prove that PSCP preserves the separation between row space and its orthogonal complement, a property that conventional per-feature standardization violates. This keeps the singular value decomposition (SVD)-derived pseudo-null subspace aligned with the physical residual space without explicit knowledge of H. Experiments on IEEE 14-, 30-, 57-, and 118-bus systems confirm this principle in practice: stealthy attacks that evade XTM, LSTM, AE and Isolation Forest baselines appear as clear deviations in the aligned subspace, yielding higher F1-score and detection accuracy while remaining robust under partial observability and realistic PMU noise.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08473v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xin Li, Chenhan Xiao, Jonathan Cohen, Aviad Elyashar, Yang Weng, Rami Puzis</dc:creator>
    </item>
    <item>
      <title>FlashCP: Load-Balanced Communication-Efficient Context Parallelism for LLM Training</title>
      <link>https://arxiv.org/abs/2606.08476</link>
      <description>arXiv:2606.08476v1 Announce Type: new 
Abstract: Context parallelism (CP) is essential for training large-scale, long-context language models, as it partitions sequences to reduce memory overhead. However, existing CP methods suffer from workload imbalance, inefficient kernels, and redundant communication due to static sequence sharding and key-value (KV) tensor communication. We present FlashCP, a load-balanced and communication-efficient framework for CP training. FlashCP introduces a sharding-aware communication mechanism to eliminate redundant KV communication and proposes a novel Whole-Doc sharding strategy that maximizes communication savings while maintaining balanced workloads. To efficiently combine Whole-Doc and Per-Doc sharding, FlashCP further designs a heuristic algorithm to search for near-optimal sharding plans. Extensive experiments show that FlashCP achieves up to 1.63x speedup over state-of-the-art CP frameworks across diverse datasets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08476v1</guid>
      <category>cs.DC</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zheng Wang, Eric Liu, Linan Jiang, Zhongkai Yu, Zaifeng Pan, Yue Guan, Yuke Wang, Yufei Ding</dc:creator>
    </item>
    <item>
      <title>A Variability-Based Framework for Interpretable Naming in Formal and Relational Concept Analysis</title>
      <link>https://arxiv.org/abs/2606.08477</link>
      <description>arXiv:2606.08477v1 Announce Type: new 
Abstract: Knowledge extraction from symbolic data often produces abstractions that are formally defined but not immediately interpretable by users. Formal Concept Analysis (FCA) and Relational Concept Analysis (RCA) provide representative settings for this issue: they generate explicit conceptual structures, implications, and relational dependencies from object descriptions and relations. Although these structures are explainable by design, their concepts are often identified by technical labels, which limits their use as human-interpretable knowledge units. Assigning meaningful names to such concepts is therefore a key issue for interpretation, navigation, validation, and reuse by domain experts.
  This paper investigates concept naming in FCA and RCA from a symbolic knowledge representation perspective. We first characterize the linguistic and terminological challenges involved in naming generated symbolic abstractions, including ambiguity, discrimination, concision, and consistency across related concepts. We then propose a configurable framework for LLM-assisted concept naming. The framework relies on a variability model that controls which sources of information are exposed during naming, such as intent, extent, inherited information, neighboring concepts, implications, and relational attributes. It thereby makes explicit the semantic choices involved in moving from formal concept descriptions to human-readable names.
  The approach is illustrated as a proof of concept on a small relational dataset in the pizzeria domain. This illustration shows how different configurations influence the names suggested by an LLM, and how naming variability can reveal interpretation choices, relational dependencies, and possible modeling issues in the underlying symbolic data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08477v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Alain Gutierrez, Marianne Huchard, Pierre Martin, André Miralles, Violaine Prince</dc:creator>
    </item>
    <item>
      <title>Inferring hidden forcing in a biological oscillator using Kolmogorov-Arnold networks</title>
      <link>https://arxiv.org/abs/2606.08479</link>
      <description>arXiv:2606.08479v1 Announce Type: new 
Abstract: Inferring the forces that drive a dynamical system from partial observations is a fundamental challenge across physics, particularly when distinct underlying mechanisms produce similar observable dynamics. Here we show that the effective muscular forcing underlying avian respiratory dynamics can be reconstructed from measurements of air-sac pressure alone. Using an interpretable learning framework based on Kolmogorov-Arnold networks, we infer the governing equations of the system directly from data and uncover a nontrivial structure in the underlying forcing that is not apparent from the pressure signal, which instead suggests a relaxation-like oscillation. The reconstructed dynamics predict a two-phase activation pattern within each respiratory cycle, which we independently validate through electromyographic recordings of expiratory muscles. These results demonstrate that data-driven reconstruction of dynamical laws can reveal hidden physical structure and provide access to unobserved driving variables, establishing a general route to infer latent forces in partially observed dynamical systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08479v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Julian Szereszewski, Facundo Fainstein, Leandro E. Fernandez, Gabriel B. Mindlin</dc:creator>
    </item>
    <item>
      <title>Adaptive Loss Balancing for Noise-Robust GRPO in Generative Recommendation</title>
      <link>https://arxiv.org/abs/2606.08480</link>
      <description>arXiv:2606.08480v1 Announce Type: new 
Abstract: Reinforcement learning (RL) presents a promising avenue for enhancing generative recommendation beyond supervised imitation, leveraging reward signals to guide policy improvement. However, its efficacy is critically contingent on the trustworthiness of the reward model for the samples it evaluates. In practice, production rankers, the widely adopted reward models, are trained on exposure-biased logs, leading to sample-dependent inaccuracies that violate this assumption. Our stratified analysis uncovers a consistent pattern: reward guidance is most beneficial when the policy exhibits uncertainty and the ranker can effectively discriminate the ground-truth item from rollout negatives. On other samples, the reward signal is either negligible or detrimental, highlighting the risk of uniform RL application. To address this issue, we introduce AdaGRPO, a novel framework that treats reward-guided optimization as selective admission rather than uniform pressure. Training is anchored in supervised negative log-likelihood, while the GRPO objective is gated by a binary, per-sample clip determined by two rollout diagnostics: policy-side difficulty and reward discriminability. Instances failing either diagnostic default to pure supervision, ensuring stability and mitigating the amplification of noisy gradients. We validate AdaGRPO on a large-scale e-commerce dataset. At the best intermediate checkpoint, it elevates HR@10 from 11.01% to 12.18% while constraining hallucination below 0.22%, and maintains robustness at the final checkpoint (HR@10 11.63%, hallucination 0.27%), outperforming fixed NLL-GRPO mixtures across the retrieval-validity frontier. In production A/B tests, AdaGRPO achieves statistically significant gains in click-through rate and dwell time, confirming its practical utility.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08480v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kewei Xu, Junbo Qi, Yanyan Zou, Pengfei Zhang, Xingzhi Yao, Shengjie Li</dc:creator>
    </item>
    <item>
      <title>PIPE-Cypher: Automatic Enterprise Benchmark Generation for Text-to-Cypher Systems</title>
      <link>https://arxiv.org/abs/2606.08481</link>
      <description>arXiv:2606.08481v1 Announce Type: new 
Abstract: Enterprise property graphs vary widely in schema structure, internal terminology, domain assumptions, governance constraints, and user interaction patterns. A deployment-relevant Text2Cypher benchmark therefore reflects the questions users and agents actually ask of that graph. Creating such a benchmark is difficult because schemas and values are unique, and graph structure changes over time. Each NL-query pair must also be executable, use real graph entities, preserve diversity, and remain balanced across query types and difficulty levels. We present PIPE-Cypher, a local benchmark-generation pipeline that turns a live property graph and optional seed queries from customer questions, analyst logs, or agent tool calls into balanced NL-to-Cypher benchmarks. PIPE-Cypher combines schema profiling, reverse-query grounding, constrained generation, deterministic Cypher governance, execution validation, redaction, diversity controls, and a calibrated local LLM judge. Using local Qwen3.5-9B generation and judging, PIPE-Cypher exports 3,000 accepted FinBench/SNB examples, completes three audited ablation suites, calibrates judge behavior with human labels, and evaluates 11 local downstream models. The resulting benchmark is deliberately discriminative: zero-shot transfer is weak, while a few-shot control shows that schema-specific example banks can help compatible model families. Together, PIPE-Cypher makes Text2Cypher benchmarking a repeatable process that evolves with the graph, its users, and its target workloads.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08481v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.DB</category>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Suraj Ranganath, Anish Raghavendra</dc:creator>
    </item>
    <item>
      <title>Testing the Black Box: Structural Barriers to Independent Evaluation of Consumer-Facing Health LLMs</title>
      <link>https://arxiv.org/abs/2606.08483</link>
      <description>arXiv:2606.08483v1 Announce Type: new 
Abstract: Background: Consumer-facing large language models are now a common source of health information, and they interpret and personalize responses rather than retrieve them. Whether their responses vary across users is a clinical, equity, and governance question, sharpened by evidence that sycophantic responses can alter judgment and increase trust.
  Objective: To evaluate response variation and sycophancy in consumer-facing health LLMs under conditions resembling ordinary patient use.
  Methods: We constructed simulated user profiles differing in geography, browsing context, expressed beliefs, and social determinants of health, drawing on literature linking social context to health attitudes. We adapted validated instruments, including the Vaccination Attitudes Examination scale and reproductive attitudes scales, into multi-turn prompts designed to elicit clinically meaningful variation across users.
  Results: The evaluation encountered five linked barriers. Factual prompts produced stable responses that masked sycophancy emerging over multi-turn conversation. Browser-based interfaces did not disclose which signals influence outputs and could not be reset to a clean baseline. Large-scale testing was restricted by terms of service, rate limits, and bot detection. Accuracy-based criteria could not capture tone, framing, or omission, and LLM-as-judge methods risked shared alignment bias. Models changed without traceable version identifiers, preventing reliable replication.
  Conclusions: No reliable independent evaluation framework yet exists for examining how consumer-facing health LLMs behave in ordinary use. Oversight requires disclosure of personalization signals, stable version identifiers, researcher safe harbor programs, and post-deployment monitoring of health-related outputs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08483v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rahul Gorijavolu, Kaushik Madapati, Pritika Vig, Rawan Abulibdeh, Nikhil Jaiswal, Mahri Kadyrova, Zeamanuel Hailu Tesfaye, Charles Senteio, Paula Maurutto, Leo Anthony Celi</dc:creator>
    </item>
    <item>
      <title>STELLAR: Spatio-Temporal Environmental Learning with Latent Alignment and Refinement for Long-Tailed Species Distribution Modeling</title>
      <link>https://arxiv.org/abs/2606.08484</link>
      <description>arXiv:2606.08484v1 Announce Type: new 
Abstract: Joint Species Distribution Modeling (JSDM) is a key enabler for biodiversity monitoring and conservation planning. However, accurate JSDM faces two coupled challenges: environmental drivers and species distributions are inherently spatio-temporal, while species co-occurrence patterns exhibit complex non-linear community structure and severe long-tail imbalance driven by rare species. Existing approaches often address these factors in isolation, learning from static covariates or neglecting the historical trajectories of dynamic community structure. To overcome these limitations, we propose STELLAR (Spatio-Temporal Environmental Learning with Latent Alignment and Refinement), a novel framework that learns a shared latent space where dynamic habitat context and community structure are optimized jointly. Our approach integrates three complementary components: (1) a Graph-Temporal Encoder that employs graph attention and recurrent units to aggregate spatial neighborhood effects and capture the co-evolving historical dynamics of environmental context and community structure; (2) a Context-Anchored Latent Alignment mechanism that structures the latent space using a label-activated mixture prior and supervised contrastive learning, actively clustering species based on shared environmental preferences; and (3) an Imbalance-Aware Decoupled Decoding module that utilizes Asymmetric Loss to focus learning on hard, rare species samples, preventing mode collapse in the long tail. Experiments on the large-scale eBird dataset, curated with domain experts, demonstrate that our framework significantly outperforms state-of-the-art baselines, particularly in predicting rare species and revealing interpretable species interactions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08484v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shufeng Kong, Tao Yu, Yuanyuan Wei, Caihua Liu, Junwen Bai, Yingheng Wang, Marc Grimson, Daniel Fink, Carla P. Gomes</dc:creator>
    </item>
    <item>
      <title>TRADE: Transducer-Augmented Decoder for Speech LLM</title>
      <link>https://arxiv.org/abs/2606.08486</link>
      <description>arXiv:2606.08486v1 Announce Type: new 
Abstract: Speech Large Language Models (Speech LLMs) lack a principled mechanism for streaming inference: their label-synchronous generation has no acoustic-frame alignment, making real-time decoding and end-of-utterance detection difficult. We propose TRADE (TRansducer-Augmented DEcoder), which augments a multimodal LLM with a transducer branch that shares the audio encoder and uses the LLM's hidden states directly as the prediction network -- coupling frame-synchronous acoustic alignment with the LLM's linguistic reasoning. Three design choices make the system accurate, streamable, and long-form capable: (1) Tightly coupled dual vocabularies -- a compact transducer vocabulary derived from the LLM vocabulary, enabling zero-cost score fusion; (2) Chunk-synchronized streaming training with gradient stopping, eliminating the train-inference mismatch at offline-equivalent memory cost; and (3) Localized Decoder Audio Attention (LDAA), a causal sliding window that caps KV-cache memory independently of utterance length. A single TRADE checkpoint supports offline and streaming decoding across a continuous range of latency operating points. TRADE achieves 6.71% average WER on the Open ASR Leaderboard, while streaming recognition with a 960 ms chunk size reaches 8.40% from the same checkpoint. On long-form speech, it obtains 3.64% WER on TED-LIUM and 10.88% on Earnings-22 without external segmentation. TRADE provides sentence-end punctuation timestamps that, when combined with acoustic voice activity detection (VAD), improve end-of-utterance detection by +0.03 F_1 over acoustic VAD alone.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08486v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Yun Tang, Shanil Puri, Shinji Watanabe, Subhabrata Mukherjee</dc:creator>
    </item>
    <item>
      <title>What Makes a Desired Graph for Relational Deep Learning?</title>
      <link>https://arxiv.org/abs/2606.08491</link>
      <description>arXiv:2606.08491v1 Announce Type: new 
Abstract: Relational deep learning (RDL) converts relational databases (RDBs) into heterogeneous graphs, but graphs derived directly from database schemas are often not well suited for how graph neural networks (GNNs) perform relational reasoning. We study what makes a relational graph suitable for deep learning and show that schema-derived graphs suffer from two systematic failures: information overload and semantic fragmentation. Our empirical analysis reveals that the desired graph is not the raw schema, but a result of controlled structural adaptation. Performance depends on balancing two operations: mitigating information overload via filtering, and repairing semantic fragmentation via injection. Specifically, filtering serves as a bias-variance knob with non-monotonic effects, while injection improves performance only when it explicitly restores the relational dependencies missing from the original schema. Based on these findings, we develop an end-to-end structural optimizer that applies both operations to adapt relational graphs automatically. Across 26 tasks spanning classification, regression, and recommendation, the optimized graphs consistently improve accuracy while often reducing inference cost.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08491v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yao Cheng, Siqiang Luo</dc:creator>
    </item>
    <item>
      <title>Seeing is Believing: Aligning Prompt Rewriting with Visual Anchors for Text-to-Image Generation</title>
      <link>https://arxiv.org/abs/2606.08492</link>
      <description>arXiv:2606.08492v1 Announce Type: new 
Abstract: Despite the impressive capabilities of text-to-image (T2I) models, an intent-generation gap often persists due to the brevity and ambiguity of user prompts. Existing approaches primarily polish the prompt for fluency and readability. However, the enhancement process still lacks visual grounding. As a result, the rewriter may over-infer missing details, widening the intent-generation gap. To address this limitation, we propose FaithRewriter, a novel prompt-enhancement framework for T2I generation. Specifically, FaithRewriter first leverages a multimodal large language model (MLLM) to generate an image from the original prompt as an intermediate visual cue. This cue is then combined with the prompt and fed into a large-scale LLM to produce visually grounded augmentations that better reflect how the intended content should appear in images. Finally, these augmentations are distilled into a small-scale LLM for efficient deployment, enhancing its ability to generate effective T2I prompts. Experiments show that FaithRewriter yields prompts that are more faithful to the user intent and more visually plausible than strong baselines, helping narrow the intent-generation gap.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08492v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xuanyi Liu, Deyi Ji, Junyu Lu, Jing Wang, Qianxiong Xu, Xuhang Chen, Tianrun Chen, Siwei Ma</dc:creator>
    </item>
    <item>
      <title>EgoPriMo: Egocentric Motion Generation for Interactive Humanoid Control</title>
      <link>https://arxiv.org/abs/2606.08495</link>
      <description>arXiv:2606.08495v1 Announce Type: new 
Abstract: Humanoid robots require whole-body motions that adapt to scene context, task requirements, and user intent. Motion tracking reproduces specified trajectories, and humanoid vision-language-action systems provide semantic interfaces, but neither offers a scalable and interactive prior for broad full-body behavior. We introduce EgoPriMo (Egocentric Motion Prior for Humanoid Robots), a unified framework that learns such priors from egocentric human demonstrations. Given egocentric observations and a text prompt, EgoPriMo reconstructs, generates, and forecasts SMPL-based full-body motion. Language is used as a high-level control signal rather than a complete motion specification. At the core of EgoPriMo is a Triple-stream DiT that jointly models body dynamics, egocentric visual context, and text; task-conditioning masks route different tasks and missing-modality data through the same checkpoint. Experiments on Nymeria and EgoExo4D show that one checkpoint improves egocentric motion generation over UniEgoMotion while supporting reconstruction and forecasting; the generated SMPL motions can also be executed by a Unitree humanoid controller. These results indicate a practical path from scalable egocentric observations to generalizable and interactive humanoid motion priors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08495v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haoyang Ge, Peng Ren, Yukun Shi, Cong Huang, Kun Li, Kai Chen</dc:creator>
    </item>
    <item>
      <title>SAEExplainer: Interpreting SAE Features with Activation-Guided Preference Optimization</title>
      <link>https://arxiv.org/abs/2606.08496</link>
      <description>arXiv:2606.08496v1 Announce Type: new 
Abstract: Although Sparse Autoencoders (SAEs) have mitigated the opacity of large language models (LLMs) by decomposing dense representations into sparse features, explaining these features remains a central challenge. Current explanation methods typically operate within an open-loop paradigm, failing to leverage mechanistic feedback for further refinement. In this paper, we propose SAEExplainer, a training framework that utilizes activation scores as an objective reward signal to train the model for self-correction and iterative bootstrapping. By iteratively verifying and correcting foundational explanations through a two-round optimization process, SAEExplainer achieves continuous improvement in its explanatory capabilities. This mechanism significantly reduces explanation hallucinations and reinforces causal triggering patterns. Extensive experiments demonstrate that our approach improves upon established baselines across most metrics, especially in causal triggering and discriminative activation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08496v1</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jingyi He, Haiyan Zhao, Ruxue Shi, Yanguang Liu, Xin Wang, Fei Sun, Mengnan Du</dc:creator>
    </item>
    <item>
      <title>Explaining Black-Box Language Models: Learning to Optimize Linguistically-Structured Word Subsets</title>
      <link>https://arxiv.org/abs/2606.08497</link>
      <description>arXiv:2606.08497v1 Announce Type: new 
Abstract: As deep language models (DLMs) are increasingly deployed in high-stakes domains such as healthcare, understanding their decision rationale becomes paramount for ensuring trust, safety, and accountability. However, achieving this vital level of interpretability is particularly challenging when these DLMs operate as black-box systems (e.g., via APIs), where access to internal model states (e.g., parameters, gradients) is restricted. Despite numerous efforts, existing explanation methods often fail to concurrently satisfy three key desiderata: (i) inference-time efficiency, (ii) black-box compatibility without inducing out-of-distribution behavior, and (iii) comprehensible explanations grounded in the input's linguistic structure. To address these challenges, we propose a method that explains predictions of DLMs by selecting a small, informative subset of input words. We formulate this as an amortized optimization problem, enabling efficient one-shot inference without the need for input-specific search. Our selection policy is trained via REINFORCE-style policy gradients, allowing discrete word selection in a fully gradient-free setting. To enhance interpretability and align with human linguistic intuition, we integrate graph-structured knowledge into this selection process, fostering linguistically coherent subsets that result in explanations both highly informative and cognitively meaningful to end-users. We evaluated our method on diverse DLM architectures and multiple real-world datasets. It consistently identifies word subsets with enhanced discriminative power and stronger alignment with linguistically salient cues, outperforming both conventional black-box compatible methods and gradient-based approaches that are given oracle access to the black-box model's gradients for a more challenging benchmark. Our code is available here.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08497v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3770855.3817677</arxiv:DOI>
      <dc:creator>Minyoung Hwang, Seokhyun Lee, Changhee Lee</dc:creator>
    </item>
    <item>
      <title>Projecting the Emerging Mindset of SWE Agent by Launching a Wild Code Understanding Journey</title>
      <link>https://arxiv.org/abs/2606.08500</link>
      <description>arXiv:2606.08500v1 Announce Type: new 
Abstract: Software engineering agents (SWE agents) increasingly work through tool-mediated trajectories in real repositories, yet their behavior remains difficult to characterize in concrete, observable terms. These trajectories record tool use, intermediate reasoning, evidence selection, and self-directed stopping, but they do not by themselves explain why particular moves were chosen, what evidence was trusted, or when understanding was judged sufficient. This tension makes trajectory data both limited and valuable: faithful, replayable traces can become an empirical substrate for studying agent behavior when interpreted through disciplined observation. We introduce Ada, a scoped apparatus for repository-level code understanding. Ada enters real codebases through a bounded tool interface, allowing open-ended exploration to remain recordable as finite trajectories. Across this wild-but-bounded setting, Ada chooses where to look, what to read closely, when to consolidate partial understanding, and when to close its account of the repository. We project Ada's think-action chains through observation lenses that make navigation, evidence selection, synthesis, grounding, and stopping visible without reducing behavior to raw tool counts or speculating about hidden intent. Read together, these lenses produce behavioral profiles grounded in recorded movement through software worlds. Across 408 trajectories, spanning multiple models, repositories, task families, and launch conditions, the study shows how faithful digital traces can be transformed into disciplined, comparable projections of emerging SWE-agent mindset. The results expose differences in efficiency, trajectory diversity, epistemic grounding, and the limits of intervention, while providing a methodological foundation for observing SWE agent behavior in real codebases.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08500v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhengyi Zhuo, Yan Liu</dc:creator>
    </item>
    <item>
      <title>Back on Track: Aligning Rewards and States for Reasoning in Diffusion Large Language Models</title>
      <link>https://arxiv.org/abs/2606.08501</link>
      <description>arXiv:2606.08501v1 Announce Type: new 
Abstract: Reinforcement learning (RL) holds immense promise for enhancing the reasoning capabilities of diffusion large language models (dLLMs). However, progress is fundamentally constrained by a dual misalignment between the authentic generation trajectory and the gradient update process: (i) Process-reward misalignment. Sparse, terminal rewards are indiscriminately assigned to all intermediate steps of the generation process, failing to provide discriminative credit assignment. (ii) State-trajectory misalignment. Policy updates are often diverted toward artificial, out-of-trajectory states, squandering gradients on less informative samples. To address these limitations, we introduce Process Aligned Policy Optimization (PAPO), a novel framework that holistically aligns the RL update with the dLLM's generative trajectory via Step-Aware Process Rewards (SPR) that transform sparse terminal rewards into dense, step-wise credit, and Entropy-Guided Historical Re-enactment (EHR) that replays authentic trajectories at high-uncertainty steps. Extensive experiments on four benchmarks demonstrate that PAPO significantly outperforms baselines, achieving gains of up to 4.5% on GSM8K, 4.8% on MATH500, 42.2% on Countdown, and 16.1% on Sudoku.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08501v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yawen Shao, Jie Xiao, Kai Zhu, Yu Liu, Hongchen Luo, Xueyang Fu, Yang Cao, Wei Zhai, Zheng-Jun Zha</dc:creator>
    </item>
    <item>
      <title>Standpoint Logics with Defeasible Beliefs</title>
      <link>https://arxiv.org/abs/2606.08503</link>
      <description>arXiv:2606.08503v1 Announce Type: new 
Abstract: In this paper, we integrate the defeasible logic of Kraus, Lehmann and Magidor (KLM) with the standpoint logic framework of Gómez Álvarez and Rudolph. This is done with the goal of formally expressing knowledge taking into account multiple (possibly contradicting) viewpoints, which in turn may hold defeasible beliefs. In doing so, we utilise Defeasible Restricted Standpoint Logics (DRSL), introduced by Leisegang et al. Our work expands on previous work by providing a foundational representation result for DRSL semantics and systematically lifting several well-known entailment relations from the propositional case to the standpoint-enhanced setting. In particular, we characterise the semantics for DRSL through a set of KLM-style postulates adapted for the standpoints case. We furthermore provide a means to lift preferential entailment, and the class of entailment relations based on single ranking functions from the purely propositional to the standpoint-enhanced context, including rational and lexicographic closure. We show this can be done equivalently through semantic and algorithmic means. Furthermore, we show that, for each considered form of entailment, the complexity class of entailment checking does not change when moving from propositional KLM to DRSL.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08503v1</guid>
      <category>cs.AI</category>
      <category>cs.LO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nicholas Leisegang, Thomas Meyer, Sebastian Rudolph</dc:creator>
    </item>
    <item>
      <title>ActProbe: Action-Space Probe for Early Failure Detection of Generative Robot Policies</title>
      <link>https://arxiv.org/abs/2606.08508</link>
      <description>arXiv:2606.08508v1 Announce Type: new 
Abstract: Generative robot policies fail unpredictably at deployment: they hesitate at critical moments, drift off-task, or commit to unrecoverable actions. Existing online failure detectors either require white-box access to policy internals or add runtime overhead through resampling and observation-side signals. Our empirical analysis shows that emitted action chunks themselves already carry strong predictive signal for impending failures in generative robot policies. Motivated by this observation, we introduce ActProbe, a lightweight, pure action-space detector that uses two compact signals available from a single forward pass: Temporal Consistency Error (TCE) between consecutive action chunks and Action Chunk Magnitude (ACM) of the current chunk. ActProbe maps these signals to per-step failure probabilities with a task-conditioned LSTM-MLP architecture. Across a diverse suite of generative robot policies and benchmarks, ActProbe raises alerts before failures become visually recognizable, improving the accuracy (F1)-timeliness Pareto frontier of failure detection by an average hypervolume gain of +12.7% over both internal- and external-feature baselines, with a +9.0% early-detection ROC-AUC lead on unseen tasks. ActProbe further transfers to deployment, predicting failures on unseen real-robot pick tasks and accelerating RL fine-tuning (PPO) with 2.9x fewer environment interactions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08508v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bingjia Huang, Xiangyu Li, Xiang Wang, Liang Mi, Zixu Hao, Weijun Wang, Hao Wu, Kun Li, Yunxin Liu, Ting Cao</dc:creator>
    </item>
    <item>
      <title>Look Less, Reason More: Block-wise Attention Skipping for Efficient Multimodal LLMs</title>
      <link>https://arxiv.org/abs/2606.08511</link>
      <description>arXiv:2606.08511v1 Announce Type: new 
Abstract: Multimodal Large Language Models (MLLMs) face a significant inference bottleneck due to the quadratic computational cost of self-attention over long visual token sequences. However, we identify a critical inefficiency in current architectures: Visual Attention Saturation. Our analysis reveals that visual tokens rapidly establish their spatial structure and intra-modal relationships in early layers, rendering visual-to-visual self-attention in deeper layers computationally redundant. Conversely, Feed-Forward Networks (FFNs) in these layers remain essential for projecting visual features into the evolving textual semantic space. Leveraging this insight, we present Visual-Skip (V-Skip), a training-free inference paradigm that decouples spatial interaction from semantic evolution. Rather than discarding tokens, V-Skip imposes block-wise structured sparsity by selectively bypassing saturated visual self-attention modules. Furthermore, recognizing that varying downstream tasks demand distinct reasoning depths, V-Skip employs a lightweight, few-shot calibration to dynamically route the task-optimal sparsity path. Extensive experiments demonstrate that V-Skip effectively bypasses redundant vision attention to achieve block-wise sparsity, maintaining a 94.16% to 100.31% performance retention across diverse MLLMs. Ultimately, we prove that to reason more effectively, models do not need to discard what they see -- they simply need to "look less" at the right depth.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08511v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jie Ma, Zhike Qiu, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji</dc:creator>
    </item>
    <item>
      <title>Friend or Foe? Language as an ideological switch in open-weight LLMs under Russian disinformation stress</title>
      <link>https://arxiv.org/abs/2606.08512</link>
      <description>arXiv:2606.08512v1 Announce Type: new 
Abstract: As Russia's war against Ukraine extends into generative AI, large language models (LLMs) adapted for local post-Soviet languages are deployed in contested information environments. Policy and industry discourse assumes that culturally aligned adaptation encodes the political orientation of the target community: a Ukrainian-oriented model will resist Russian narratives, a Russian-oriented one will reinforce them. Does it? This article systematically disconfirms that assumption. We run a controlled audit of four openly available LLMs sharing a common base model but fine-tuned for different linguistic communities, querying them in Ukrainian, Russian and English across ten contested wartime narratives: Crimea, "denazification", the "one people" thesis, and atrocity denial at Bucha and Mariupol. The result is a Fine-Tuning Paradox: the Ukrainian-oriented model shows the weakest resistance to Russian disinformation in Russian, while the Russian-oriented one exhibits the strongest rejection. Corpus composition, language coverage and prompt format prove more decisive than nominal cultural provenance. We situate these findings within debates on hybrid warfare, digital sovereignty and post-imperial information orders, arguing that the principal threat to regional information sovereignty is not adversarial fine-tuning but the untested assumption that cultural alignment guarantees resilience.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08512v1</guid>
      <category>cs.CY</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anna Małgorzata Kamińska, Tetiana Klynina</dc:creator>
    </item>
    <item>
      <title>Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2606.08513</link>
      <description>arXiv:2606.08513v1 Announce Type: new 
Abstract: Autonomous Underwater Vehicles (AUVs) traditionally rely on complex, heavily engineered pipelines for perception, path planning, and motion control. This paper explores the feasibility of an end-to-end Deep Reinforcement Learning (DRL) approach that maps raw sensor data directly to thruster commands, reducing manual engineering. We propose a hierarchical reinforcement learning (HRL) architecture splitting the problem into two Markov Decision Processes. A High-Level (HL) policy operating at 2Hz processes raw $84 \times 84$ pixel monocular camera frames, stacked $100 \times 100$ pixel forward-looking imaging sonar, and proprioceptive data to generate spatial subgoals. Simultaneously, a Low-Level (LL) policy operating at 10Hz converts these subgoals into thruster commands. The HL policy is trained using Reinforcement Learning from Prior Demonstrations (RLPD) within a modified Sample-Efficient Robotic Reinforcement Learning (SERL) framework, while the LL policy utilizes Soft Actor-Critic (SAC) combined with Hindsight Experience Replay (HER). Evaluated in the high-fidelity HoloOcean simulator, our method demonstrates successful obstacle avoidance, achieving trajectory lengths closely approximating (within 4% to 6% of) an $\text{RRT}^*$ planning baseline. Furthermore, the learned policy exhibits strong robustness to simulated sensor noise and decreased visibility. While the system navigates familiar geometries effectively, experiments reveal generalization limitations when encountering unvisited areas with novel obstacle shapes. Ultimately, this work demonstrates the promise of sample-efficient, end-to-end DRL for underwater navigation using minimal computational hardware.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08513v1</guid>
      <category>cs.RO</category>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Elisei Shafer, Oren Gal</dc:creator>
    </item>
    <item>
      <title>OmniTryOn: Video Try-On Anything at Once!</title>
      <link>https://arxiv.org/abs/2606.08514</link>
      <description>arXiv:2606.08514v1 Announce Type: new 
Abstract: Although video virtual try-on (VVT) has achieved significant progress, existing methods still exhibit two fundamental limitations: first, they are restricted to single-garment transfer, rendering simultaneous multi-object try-on highly impractical; second, their heavy reliance on explicit external priors (e.g., garment masks) inevitably destroys crucial physical dynamics and degrades visual quality. To bridge this gap, this paper proposes the novel Try-On Anything task, which aims to simultaneously transfer diverse wearable objects onto a person in a video in a single inference pass. To support and standardize this paradigm, we introduce TryAny-Bench, a comprehensive benchmark encompassing a paired video dataset alongside a tailored evaluation protocol. Furthermore, we present OmniTryOn, an external-prior-free generative framework designed to tackle this task. Specifically, OmniTryOn employs a First Frame Wearable Cache strategy, which directly provides diverse wearable objects for the generation process through the initial video frame. To maintain consistency, we propose the Spatiotemporally Consistent RoPE (STC-RoPE), which inherently establishes robust spatiotemporal anchors to strictly preserve complex human motions and background dynamics. Optimized by the proposed Gradual Try-On (GTO) training strategy, our model progressively masters robust multi-object synthesis. Extensive experiments on TryAny-Bench demonstrate that OmniTryOn significantly outperforms existing specialized video virtual try-on models and general video editing baselines, establishing a powerful new standard for the Try-On Anything task. Our dataset, code, and models are available at https://github.com/xcltql666/OminTryOn.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08514v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Changliang Xia, Chengyou Jia, Minnan Luo, Zhuohang Dang, Xin Shen, Bowen Ping</dc:creator>
    </item>
    <item>
      <title>Unifying von-Neumann HPC and Neuromorphic Acceleration via the EBRAINS Research Infrastructure: A Framework for High-Performance Workflows</title>
      <link>https://arxiv.org/abs/2606.08515</link>
      <description>arXiv:2606.08515v1 Announce Type: new 
Abstract: Modern scientific workflows increasingly span diverse computing architectures, yet executing a single computational model across disparate systems often forces researchers to maintain fragmented, site-specific pipelines. In this paper, we address this challenge within the domain of computational neuroscience by presenting a unified, cloud-based workflow orchestrated via EBRAINS JupyterLab. This workflow enables users to transparently execute spiking neural networks on both von-Neumann supercomputers and neuromorphic hardware. Using a single federated identity, the system dispatches jobs to HPC sites (JUSUF, Galileo100) via PyUNICORE and to the SpiNNaker-1 neuromorphic system via the Neuromorphic Computing Platform Interface. To guarantee cross-site reproducibility and mitigate software version drift, we utilize a zero-installation execution mode that dynamically pulls PMIx-aware Apptainer containers to HPC compute nodes. Furthermore, we demonstrate genuine model-level portability using the NESTML domain-specific language, allowing custom neuron models to be written once and automatically compiled for either the NEST (C++) or sPyNNaker backends. Validated with a balanced random network case study, this work illustrates a practical, end-to-end path for hardware-agnostic workflows while highlighting the critical role of containerization and domain-specific languages in achieving true cross-platform reproducibility.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08515v1</guid>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Krishna Kant Singh, Charl Linssen, Eric Müller, Eleni Mathioulaki, Wouter Klijn, Lena Oden</dc:creator>
    </item>
    <item>
      <title>Stable Triangle Projections for Variable-Degree Tetrahedral Spaces and Uniform IPDG Preconditioning</title>
      <link>https://arxiv.org/abs/2606.08516</link>
      <description>arXiv:2606.08516v1 Announce Type: new 
Abstract: The main ingredient of this paper is an edge-local variable-degree projection on a triangle that is uniformly stable in both L2 and H1. We use this two-dimensional operator in two tetrahedral constructions. First, on a reference tetrahedron, we build an H1-stable projection from a high order polynomial space onto a variable-degree space whose degrees are prescribed independently on edges, faces, and in the volume. Since the tetrahedral projection is local and trace-compatible, it also gives an h- and p-uniform stable decomposition, in the weighted energy norm, for conforming hp spaces, and hence a uniform additive Schwarz preconditioner for the conforming Laplace operator. Second, on a uniformly regular mapped tetrahedral mesh with elementwise variable polynomial degrees, the same triangular projection gives the finite-layer edge truncation needed in a p-uniform stable DG-to-CG decomposition for the symmetric IPDG norm. The DG-to-CG decomposition, combined with the conforming splitting, gives the IPDG preconditioner. The constants depend only on reference shapes, the local degree-spread bound within each tetrahedron, the neighbor-degree bound across mesh faces, uniform map-regularity, patch cardinalities, and the coefficient path constants; they are independent of h, of the local polynomial degrees, and of the coefficient contrast.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08516v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Situan Li, Weiying Zheng</dc:creator>
    </item>
    <item>
      <title>A Joint Finite-Sample Certificate for Adaptive Selective Conformal Risk Control</title>
      <link>https://arxiv.org/abs/2606.08517</link>
      <description>arXiv:2606.08517v1 Announce Type: new 
Abstract: Selective predictors answer on confident inputs and abstain elsewhere; deploying one safely needs a single finite-sample certificate that simultaneously upper-bounds the selected risk, lower-bounds the acceptance probability $\pacc$ above a floor $\pmin$, and lower-bounds the deployment utility. This certificate must be valid under adaptive threshold selection from a finite grid of $m$ pairs on $\ncert$ samples. We give such a certificate for bounded, possibly non-monotone losses by treating the selected risk directly as a ratio rather than through a Hoeffding-style range bound. The construction couples three confidence bounds: a variance-adaptive empirical-Bernstein bound on the ratio risk, a Clopper--Pearson bound on acceptance, and a two-sided closeness bound on utility. Together they lower-bound the certified policy's utility absolutely and to within $2\gammau$ of the best over the \emph{certified set}, both non-vacuous whenever feasible; a regime-scoped third leg matches an external oracle, informative only where the risk margin $\gammar &lt; \alpha$ and vacuous at the headline operating points. Relative to the range-only Hoeffding-ratio construction this sharpens the acceptance-floor dependence from $1/\pmin$ to $1/\sqrt{\pmin}$, and a closed-form corollary identifies a per-pair regime in which our risk bound dominates a Hoeffding conformal risk control (Hoeffding--CRC) selective bound. Empirically, on ImageNet (three ResNets) and COCO val 2017 panoptic, the certificate opens a $+22$ pp certified-acceptance frontier over Hoeffding--CRC and is ${\approx}10{\times}$ tighter than a non-vacuous matched-valid baseline; these gains are regime-scoped, not universal, and absent on ADE20K. The certifier runs in $O(\ncert m)$ time.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08517v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaoli Yu, Jiamiao Liu</dc:creator>
    </item>
    <item>
      <title>Two Bridges, One Pathway: From VLMs to Generalizable VLAs with Embodied Trajectory-Coupled Data</title>
      <link>https://arxiv.org/abs/2606.08520</link>
      <description>arXiv:2606.08520v1 Announce Type: new 
Abstract: Vision-language models (VLMs) are powerful general-purpose reasoners, yet converting them into robot control policies (VLAs) is surprisingly difficult. The root cause is a two-fold gap: VLMs are trained on internet-scale images with language-understanding objectives, while VLAs must perceive robot scenes and predict motor actions. Fine-tuning a VLM directly on robot action data forces the model to cross both gaps at once -- the learning curve is steep and the rich generalizations learned during pretraining tend to degrade rather than transfer. We argue that this gap can be bridged gradually with the right intermediate data. We introduce \emph{embodied trajectory-coupled (ETC) data} -- vision-language supervision derived from the same robot scenes and trajectories used for action learning. Because ETC data shares the visual context of robot operation while retaining familiar language-understanding objectives, it provides a natural stepping stone between VLM pretraining and VLA fine-tuning. Building on this, we design a three-stage training recipe. Distribution Bridging first adapts the VLM to embodied visual-language semantics. Objective Bridging then gradually shifts the model toward action prediction while preserving the acquired representations. Retentive Adaptation finally specializes the policy to the target deployment domain. We further show that mixing task-relevant out-of-distribution ETC data with a small amount of action data enables the model to generalize to novel visual-language conditions without requiring additional robot demonstrations. Simulation and real-robot experiments confirm that this gradual bridging strategy is the key to transferring VLM generalization into robust, deployable robot policies.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08520v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Linqi Yin, Shiduo Zhang, Shenling Qiu, Chenxin Li, Zhaoyang Fu, Lei Xiao, Xiang Wang, Chenchen Yang, Zhe Xu, Pengfang Qian, Jingjing Gong, Xipeng Qiu, Xuanjing Huang, Yu-Gang Jiang</dc:creator>
    </item>
    <item>
      <title>Exploring CKKS Parameter Trade-offs for Privacy-Preserving Personalized Federated Learning</title>
      <link>https://arxiv.org/abs/2606.08521</link>
      <description>arXiv:2606.08521v1 Announce Type: new 
Abstract: Privacy-preserving Personalized Federated Learning (PFL) enables clients to collaboratively train personalized models without exposing raw data, but exchanged model updates remain vulnerable to inference attacks from honest-but-curious servers. Homomorphic Encryption (HE) addresses this by allowing server-side aggregation directly on encrypted updates, with the CKKS scheme being particularly suitable due to its native support for approximate floating-point arithmetic. However, no prior work has examined how to configure CKKS for PFL deployments, leaving practitioners without principled guidance on parameter selection that directly affects privacy, precision, and computational cost. This paper presents pFedCKKS, a generic framework integrating CKKS into PFL, and provides the first systematic parameter selection guide for practitioners. We derive the full CKKS parameter constraints under 128-bit security for the PFL setting, showing the selection problem reduces to choosing just two values: the inner and outer ciphertext prime. Implemented using the Flower framework and TenSEAL library, pFedCKKS is evaluated on the FEMNIST, CelebA and Sentiment140 datasets with FedFinetune, Ditto and FedPer as representative PFL algorithms. Experimental results reveal an empirical trade-off between precision and computational/communication costs. This allows us to draw a concrete guideline for selecting proper CKKS parameters that balance efficiency and accuracy in real-world deployments of pFedCKKS.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08521v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kamolchanok Saengtong, Phanwadee Sinthong, Norrathep Rattanavipanon</dc:creator>
    </item>
    <item>
      <title>DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving</title>
      <link>https://arxiv.org/abs/2606.08525</link>
      <description>arXiv:2606.08525v1 Announce Type: new 
Abstract: Reward models play a pivotal role in reinforcement learning (RL) and multi-modal trajectory selection for autonomous driving. However, acquiring such rewards typically relies on hand-crafted rule-based objectives or perception ground truth, which hinders generalization for data-scaling. While Vision-Language Models (VLMs) have demonstrated feasibility as reward models in other domains, their effectiveness in driving tasks remains underexplored. In this work, we bridge this gap by introducing (1) DriveReward, a reasoning trajectory evaluation dataset rigorously labeled via temporally-grounded visual guidance and augmented with counterfactual driving behaviors, alongside (2) a specialized Vision-Language Reward Model. To address the scarcity of failure cases in conventional datasets, we propose a counterfactual data annotation scheme to construct cases encompassing diverse driving styles and erroneous behaviors. Evaluations on our proposed benchmark reveal that even leading open-source and proprietary VLMs fail to excel across all tasks, highlighting significant room for improvement in existing models. Building on these findings, we subsequently tailor a specialized 1B reward model that outperforms larger VLMs on task-specific reward alignment. Finally, we validate our reward model's effectiveness by integrating it into RL finetuning and multi-modal trajectory scoring across multiple baselines, achieving performance comparable to rule-based reward calculations in both open-loop and closed-loop evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08525v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qimao Chen, Fang Li, Yuechen Luo, Zehan Zhang, Haiyang Sun, Fangzhen Li, Bing Wang, Guang Chen, Yang Ji, Jiong Deng, Hongwei Xie, Hangjun Ye, Long Chen, Yi Zhang</dc:creator>
    </item>
    <item>
      <title>A Mixed Extended Virtual Element Method for Elliptic Interface Problems on Polygonal Meshes</title>
      <link>https://arxiv.org/abs/2606.08526</link>
      <description>arXiv:2606.08526v1 Announce Type: new 
Abstract: We propose a lowest-order \(H(\operatorname{div})\)-conforming mixed extended virtual element method for elliptic interface problems on interface-unfitted polygonal meshes. The flux and pressure are approximated by subdomain-wise extended \(H(\operatorname{div})\)-VEM spaces and by piecewise constants, respectively. On cut elements, the computable polynomial projection is defined on the whole background element and then restricted to the two subdomains. Compared with BDM-type polynomial spaces, the mixed VEM space contains a non-polynomial component, which gives rise to additional consistency terms on cut elements. To control these terms, we use an enhanced kernel stabilization on cut elements and an interface normal-flux average in the mixed coupling. A corrected interface-flux penalty and a local divergence ghost penalty are added to obtain cut-position-independent stability without using a volume div-div augmentation. We prove continuity, a discrete inf-sup condition, and an optimal first-order error estimate in a mesh-dependent norm. The constants are independent of the mesh size and of the position of the interface relative to the background mesh, but may depend on the coefficient contrast.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08526v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xianyan Zheng, Jinru Chen, Feng Wang</dc:creator>
    </item>
    <item>
      <title>Scaffold Effects on GAIA: A Controlled Comparison</title>
      <link>https://arxiv.org/abs/2606.08529</link>
      <description>arXiv:2606.08529v1 Announce Type: new 
Abstract: Published agent capability scores conflate what a model can do with what its scaffold lets it do, and the magnitude of this elicitation gap is not well characterized under controlled conditions. This study executes a pre-registered controlled comparison of three scaffolds (ReAct, a Planner-Actor-Rater multi-agent design, and planner-then-executor) across five models from three providers (Claude Opus 4.7, Sonnet 4.6, Haiku 4.5; Gemini 3.1 Pro Preview; GPT-5.5) on GAIA validation Levels 1 and 2, holding tasks and conditions fixed, with three attempts per question. Scaffold choice alone moves measured accuracy by as much as 28 percentage points within a single model (Opus, Level 2, robust slice), confirming the pre-registered hypothesis that scaffold variation produces gaps of at least 10 points. The pre-registered prediction that more capable models would be less scaffold-sensitive is rejected in direction: scaffold effects vary significantly by model in every dataset slice, but the most capable Anthropic model gains the most from structured scaffolds at the harder level, and tier-scaling holds only at Level 1 under the robust slice. The multi-agent advantage over ReAct at Level 2 appears within the Anthropic family but not for the cross-provider models, making model family rather than capability tier the conditioning variable, and the predicted planner-executor advantage on file-reading tasks is falsified. Structured scaffolds make fewer tool calls yet recover more often from mid-trajectory errors at the harder level, and a single cell (Gemini with planner-then-executor) is the cheapest at both levels and the most accurate at Level 2. These results indicate that single-scaffold capability numbers are scaffold-conditional estimates and that the elicitation gap is not guaranteed to shrink as models improve.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08529v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jason Starace</dc:creator>
    </item>
    <item>
      <title>GEAR-VLA: Learning Geometry-Aware Action Representations for Generalizable Robotic Manipulation</title>
      <link>https://arxiv.org/abs/2606.08530</link>
      <description>arXiv:2606.08530v1 Announce Type: new 
Abstract: Vision-Language-Action (VLA) models achieve strong benchmark performance but still struggle in real-world deployment with unseen objects, background shifts, and different robot embodiments. We argue that this stems from the lack of a unified geometry-aware manipulation representation, leaving existing VLAs vulnerable to low-level trajectory supervision, misaligned 3D features, and embodiment differences. To address this, we propose GEAR-VLA, a VLA framework for learning unified geometry-aware action representations for generalizable robotic manipulation. GEAR-VLA adopts coarse-to-fine action learning, where multi-source embodied pretraining equips the VLM with embodied reasoning and discrete action understanding before latent action tokens connect action semantics to a gradient-decoupled DiT continuous action expert. It further performs semantic-aligned 3D integration by aligning a trainable 3D spatial backbone with the VLA representation while freezing the original VLM-aligned visual pathway. To share this representation across robots, GEAR-VLA uses embodiment canonicalization, where embodiment-aware states and embodiment-invariant actions confine robot differences to the low-level interface. Extensive simulation and real-world experiments demonstrate strong generalization: GEAR-VLA achieves state-of-the-art performance on LIBERO, zero-shot LIBERO-Plus, and RoboTwin 2.0, reaches 85.9% success on AgileX and 81.0% on the pretraining-unseen LDT-01 embodiment, and obtains 90.1% success on a 6,360-trial universal grasping benchmark with 212 unseen objects. Code and models will be released at https://github.com/babynabeauty/GEAR-VLA.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08530v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuan Zhang, Shiqi Zhang, Yedong Shen, Shuai Dong, Jiajun Deng, Xin Zhang, Yuxuan Gao, Jiajia Wu, Xin Nie, Zhiyuan Cheng, Jianmin Ji, Yanyong Zhang, Xingyi Zhang, Jia Pan</dc:creator>
    </item>
    <item>
      <title>VESTA: A Fully Automated Scenario Generation and Safety Evaluation Framework for LLM Agents</title>
      <link>https://arxiv.org/abs/2606.08531</link>
      <description>arXiv:2606.08531v1 Announce Type: new 
Abstract: Large language models (LLMs) are increasingly evolving from simple text-based interaction systems into LLM agents that can maintain memory, use tools, access external environments, and execute tasks. As their capabilities and autonomy expand, the safety risks they face also become more diverse. Existing evaluations often rely on manually written scenarios, static prompts, or final-output judgments, making it difficult to capture the diverse risks that agents may face during task execution. We introduce VESTA, a fully automated scenario generation and safety evaluation framework for LLM agents. Based on five risk dimensions, VESTA instantiates abstract and diverse safety risks in real-world task execution into 1,072 measurable evaluation scenarios. Using the automated evaluation pipeline, 12 LLM agents are evaluated under two authority contexts. The results show that current agents still face substantial behavioral safety risks during task execution, with an average ASR of 47.1% and several models exceeding 70%. These findings demonstrate the importance of executable, process-level evaluation for understanding and improving LLM agent safety.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08531v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lu Jia, Haibo Tong, Feifei Zhao, Jindong Li, Dongqi Liang, Ping Wu, Qian Zhang, Yi Zeng</dc:creator>
    </item>
    <item>
      <title>DN-Hypo-Pipeline: An AI-Driven Workflow for Hypothesis Generation via Large Language Models and Scientific Explanations</title>
      <link>https://arxiv.org/abs/2606.08532</link>
      <description>arXiv:2606.08532v1 Announce Type: new 
Abstract: A scientific hypothesis is the first step in research and undergoes experimental validation, yet it also reflects a deep understanding of and reasoning about scientific phenomena. We introduce DN-Hypo-Pipeline, an AI-powered workflow based on large language models, designed to support structured scientific thinking and hypothesis generation by leveraging scientific explanations as prior knowledge. This pipeline assists researchers in deriving novel hypotheses from existing literature. Given the explanandum (i.e., the conclusion) of a research paper, it identifies underlying laws, theories, and principles, and reconstructs a new, yet-to-be-verified explanation for the observed phenomenon. We evaluated DN-Hypo-Pipeline in the field of data science modeling using three highly cited papers. Statistical inference, supported by both LLM-as-judge assessment and human expert evaluation, demonstrates that our pipeline is more effective than direct generation methods. Additionally, we validated the two highest-scoring generated hypotheses by developing corresponding novel algorithms, which outperformed the baseline models presented in the original papers. Beyond application in data science, DN-Hypo-Pipeline provides a theoretical framework that not only encompasses theory-guided data science modeling methods but also reveals a more fundamental structure of the modeling process. Moreover, this approach is essentially a generalization of theory-guided modeling, offering potential for extension to other domains and across a broader range of scientific disciplines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08532v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lei Lin, Ronghao Wang, Chunbao Zhou, Jue Wang, Yangang Wang</dc:creator>
    </item>
    <item>
      <title>Autonomous Aerial Manipulation via Contextual Contrastive Meta Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2606.08533</link>
      <description>arXiv:2606.08533v1 Announce Type: new 
Abstract: Unmanned aerial vehicles (UAVs) are increasingly being deployed in logistics, service robotics, and other real-world applications, creating a growing demand for autonomous payload acquisition and delivery. Existing approaches typically assume pre-attached payloads or rely on specialized grippers, leaving versatile end-to-end aerial delivery largely unresolved, where different payloads induce highly variable flight dynamics, requiring a single policy to adapt online without manual calibration or explicit system identification. To this end, we study \textbf{A}utonomous \textbf{A}erial Manipulation via \textbf{Co}ntextual \textbf{Co}ntrastive Meta Reinforcement Learning (\textbf{\textit{Aco2}}), a fully autonomous aerial delivery setting in which a quadrotor equipped with a lightweight hook continuously picks up, transports, and delivers diverse handle-equipped objects between randomized locations, all without human intervention. First, we design a contextual observation encoder that infers a compact latent context from recent interaction history, enabling the policy to adapt online to payload-dependent dynamics. To further improve the quality of this context, we introduce a contrastive objective that structures the context embedding around task-relevant variations, improving generalization across diverse payloads without requiring explicit system identification. Trained entirely in simulation with extensive domain randomization, \textit{Aco2} can be directly deployed on a physical quadrotor without real-world fine-tuning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08533v1</guid>
      <category>cs.LG</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lixuan Jin, Bingxuan Lan, Xinyi Bao, Xiangyuan Xie, Chunjie Zhang, Zheng Chen, Tianshuo Liu, Ruijie Tian, Jinyu Ru, Gang Wang, Lei Yuan, Yang Yu</dc:creator>
    </item>
    <item>
      <title>NGram-MoSE: Efficient Remote Sensing Super-Resolution via N-Gram Context and Mixture-of-Experts</title>
      <link>https://arxiv.org/abs/2606.08535</link>
      <description>arXiv:2606.08535v1 Announce Type: new 
Abstract: Remote sensing applications for environmental monitoring and disaster management are frequently constrained by a spatial--temporal trade-off: imagery with fine spatial detail is often acquired less frequently, whereas more temporally available observations are typically coarser. Single-image super-resolution provides a practical means to enhance coarse imagery without changing acquisition schedules, yet many Transformer-based SR models remain computationally expensive and can be sensitive to limited or geographically biased training data, which degrades robustness under out-of-distribution conditions. This paper presents NGram-MoSE, a lightweight Transformer architecture designed to improve both efficiency and texture continuity. NGram-MoSE introduces N-Gram Context Injection to strengthen cross-window local consistency and mitigate window-boundary artifacts, and incorporates a Mixture-of-Experts (MoE) feed-forward design to scale capacity through sparse activation without proportional growth in inference cost. Experiments on a geographically disjoint OOD test set show that NGram-MoSE achieves 31.68\,dB PSNR while reducing FLOPs by \(14\times\) relative to a heavyweight Transformer reference. Downstream evaluation on a landslide segmentation benchmark further demonstrates that restoring degraded inputs to the detector training scale improves performance, yielding a 4.47\% absolute gain in mAP@50 over bicubic upsampling, and exhibits stronger cross-scale consistency under scale extrapolation. These results indicate that NGram-MoSE provides an effective SR module for resource-constrained remote sensing pipelines requiring robust generalization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08535v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yun-Hsuan Huang, Trong-An Bui, Chih-Hung Chuang</dc:creator>
    </item>
    <item>
      <title>Routine laboratory trajectories encode the onset of organ-level complications in cancer</title>
      <link>https://arxiv.org/abs/2606.08538</link>
      <description>arXiv:2606.08538v1 Announce Type: new 
Abstract: Routine laboratory panels drawn during cancer treatment constitute longitudinal physiological recordings of organ function, yet their temporal structure is discarded by single-timepoint prognostic tools. A transformer trained on 2,777,595 laboratory measurements from 3,905 patients with multiple myeloma or ovarian cancer predicted the two-year onset of 162 treatment-associated complications, including therapy-related myelodysplastic syndromes, spanning eight clinical categories, achieving 1.5- to 6.1-fold enrichment above prevalence at the group level. It matched or outperformed non-sequential baselines across grouped endpoints (AUROC gains up to +0.11), demonstrating that longitudinal laboratory trajectories capture evolving complication-specific physiology inaccessible from isolated measurements. Predictions generalised across both cancers, divergence concentrating in disease-specific complications, and biomarker masking recovered signatures consistent with established pathophysiology. External validation on MIMIC-IV and MMRF CoMMpass confirmed transferability across independent healthcare systems (AUROC up to 0.85). Routine oncological laboratory data encode organ deterioration weeks to months before clinical onset, enabling complication-specific surveillance without additional testing infrastructure.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08538v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jannik Lübberstedt, Krischan Braitsch, Jacqueline Lammert, Christof Winter, Florian Gabriel, Tristan Lemke, Christopher Zirn, Markus Graf, Friedrich Puttkammer, Hartmut Häntze, Johannes Moll, Anirudh Narayanan, Andrei Zhukov, Fabian Drexel, Zeineb Ben Chaaben, Sebastian Ziegelmayer, Su Hwan Kim, Marion Högner, Jan Kirschke, Florian Bassermann, Marcus Makowski, Christian Wachinger, Lisa Adams, Keno Bressem</dc:creator>
    </item>
    <item>
      <title>AgentTrust: A Self-Improving Trust Layer for AI-Agent Actions</title>
      <link>https://arxiv.org/abs/2606.08539</link>
      <description>arXiv:2606.08539v1 Announce Type: new 
Abstract: AI agents increasingly take consequential actions -- shell commands, cloud operations, and arbitrary tool-calls -- so a trust layer must decide, per action, whether to allow, warn, block, or escalate. We argue that the right way to reason about such a layer is by threat type. Lexical (fixed-signature) threats, where danger lives in a stable token, are decidable by deterministic rules; semantic (intent-dependent) threats, where a benign and a malicious action share the same surface, are out of reach for rules by construction. We make this concrete with a negative proof: a determined, hand-authored cloud rule pack lifts held-out accuracy only from 48 to 56% overall and moves the semantic categories by 0pp (data_db 29 to 29, observability 59 to 59, supply_chain 50 to 50), while a strong LLM judge carries exactly those categories. We give the judge a self-learning capability: on a corpus that is mainly semantic attacks it nearly doubles rule accuracy (48% to 83.6-85.2%) with near-zero false-blocks, and this holds across two model providers. We turn this into a self-improving dual-store system: the judge distills a growing deterministic rule floor on lexical threats (cheaper over time) and feeds a guarded RAG memory on semantic threats (a verdict-cache fails -- surface-twins collapse to ~58% -- so a corroboration guard lifts semantic accuracy +13pp, 70 to 84). The result is what sets AgentTrust v2 apart from its static v1 predecessor: a trust layer that self-evolves from its own stream of decisions -- cheaper on the lexical class (it distills its own rules) and smarter on the semantic class (it accrues guarded precedent), while never hard-blocking a benign action. An end-to-end online replay shows the judge-call rate falling (50% to 44%) and judge-domain accuracy rising (71% to 80%), with 0 benign hard-blocks across 45,000 actions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08539v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chenglin Yang</dc:creator>
    </item>
    <item>
      <title>When Video Misreads: Closed-Loop Distillation of Reading Heuristics for Exploratory Manipulation Trace QA</title>
      <link>https://arxiv.org/abs/2606.08542</link>
      <description>arXiv:2606.08542v1 Announce Type: new 
Abstract: Exploratory manipulation often turns an apparent failed attempt into the key evidence for what to do next. For example, a robot pulls a locked cabinet drawer, fails, and only succeeds after opening the lock. The failed pull reveals a latent precondition (the drawer is locked) that determines the minimal-success action chain (the fewest actions that complete the task), here [lock-open, drawer-pull]. Correctly reading this trace is therefore the prerequisite for recovering that chain. We formalize this setting as Exploratory Manipulation Trace QA (EMT-QA): given synchronized video and proprioception from an exploratory trace, predict the minimal-success action chain under the latent precondition revealed by the probe. However, even state-of-the-art VLMs and embodied multimodal LLMs misread this evidence: they do not reliably recover the chain from raw video, raw proprioception, or their combination.
  We introduce Closed-Loop Trace Distillation, a pipeline that uses a per-task coding agent to inspect labeled training traces and distill a one-line natural-language prompt over the trace, which we call the Distilled Reading Heuristic (DRH). At inference, no agent is invoked and no model weights are updated; a frozen VLM receives the raw trace plus the DRH as a prompt entry. Across three simulator and two real-robot tasks, the DRH improves chain accuracy by +0.38 to +0.47 over the best raw-modality baseline. The same DRH also serves as the sole specification for one-shot programmatic classifiers that match the prompted VLM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08542v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haizhou Ge, Yufei Jia, Yue Li, Zhixing Chen, Lu Shi, Lei Han, Guyue Zhou, Ruqi Huang</dc:creator>
    </item>
    <item>
      <title>PAEC: Position-Aware Entropy Calibration for LLM Reasoning in RLVR</title>
      <link>https://arxiv.org/abs/2606.08543</link>
      <description>arXiv:2606.08543v1 Announce Type: new 
Abstract: Reinforcement learning with verifiable rewards (RLVR) improves large language model reasoning but often suffers from rapid policy-entropy collapse, where the policy prematurely concentrates on narrow high-probability reasoning paths. While global entropy regularization can encourage exploration, uniformly increasing entropy across all token positions is inefficient for long reasoning trajectories, where many tokens are not decision-relevant. We propose Position-Aware Entropy Calibration (PAEC), a token-level entropy-management framework that constructs a soft mask from local top-p entropy and top-two candidate competition, and applies an anchor-based lower-bound penalty to prevent selected-position entropy collapse. Experiments on five mathematical reasoning benchmarks show that PAEC improves macro-average majority-vote performance over strong RLVR baselines, with clear gains on AIME-style tasks. Our results suggest that entropy management in reasoning RL should be formulated as selective exploration allocation over decision-sensitive positions rather than uniform randomness injection.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08543v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shumeng Yang, Yisu Liu, Jiayi Zheng, Zhaohui Yang, Linjing Li</dc:creator>
    </item>
    <item>
      <title>Ishigaki-IDS: An Open-Weight Verifier-Aware Model for Information Delivery Specification Drafting in Building Information Modeling</title>
      <link>https://arxiv.org/abs/2606.08545</link>
      <description>arXiv:2606.08545v1 Announce Type: new 
Abstract: Building Information Modeling (BIM) projects require information requirements to be described as machine-checkable Information Delivery Specification (IDS) files in order to verify whether building models contain the required attributes. However, IDS authoring remains a practical bottleneck: practitioners must handle domain vocabulary, strict XML schema constraints, and external validator conformance while also checking whether the requirement itself is correctly expressed. We present Ishigaki-IDS, an open-weight LLM specialized for verifier-aware IDS draft generation. The model combines continued pretraining on BIM/IDS corpora, supervised fine-tuning on information-requirement-to-IDS pairs, and reinforcement learning with verifiable rewards from an external validator. The goal is not to replace expert review, but to move IDS authoring from low-level XML and schema repair toward validator-loadable drafts that practitioners can inspect and correct. On the 166-case expert-created Ishigaki-IDS-Bench, Ishigaki-IDS-8B achieves an IDSAuditPass score of 0.651, a validator-pass metric for generated IDS files, substantially outperforming Claude Opus 4.5, the strongest single-shot LLM baseline we evaluated, at 0.331. It also obtains an Audit-Gated FacetF1 of 0.282, which measures requirement-facet alignment among validator-passing drafts. The same recipe scales: 14B and 32B variants reach IDSAuditPass 0.753 / 0.693 and Audit-Gated FacetF1 0.392 / 0.369. In a workflow check with six BIM practitioners, Ishigaki-assisted authoring reduced aggregate work time by 54.7% under the same validation and alignment endpoint. These results suggest that verifier-aware IDS generation can reduce the practical burden of converting BIM information requirements into reviewable IDS drafts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08545v1</guid>
      <category>cs.CL</category>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ryo Kanazawa, Koyo Hidaka, Teppei Miyamoto, Takayuki Kato, Tomoki Ando, Chenguang Wang, Dayuan Jiang, Naofumi Fujita, Shuhei Saitoh, Atomu Kondo, Koki Arakawa, Daiho Nishioka</dc:creator>
    </item>
    <item>
      <title>OASIS: From Simulation Data Collection to Real-World Humanoid Loco-Manipulation</title>
      <link>https://arxiv.org/abs/2606.08548</link>
      <description>arXiv:2606.08548v1 Announce Type: new 
Abstract: Recent progress in robot manipulation has been largely driven by learning from large-scale demonstrations. For humanoid robot loco-manipulation tasks, however, existing data sources force an unsatisfying tradeoff between trajectory quality and scalability. Real-world teleoperation provides the highest-quality trajectories but requires dedicated physical space and time-consuming scene resets. Simulation offers an alternative way out of this dilemma: it can produce clean, embodiment-aligned data at scale without any physical hardware. In this paper, we propose OASIS, a simulation-data-driven framework for humanoid loco-manipulation. OASIS automatically reconstructs realistic object assets from real-world images using a 3D generative model. Based on these assets, trajectories are first collected through teleoperation in simulation, and then augmented under diverse domain randomizations in a post-processing stage. With the resulting simulation data, we further design a hierarchical visuomotor policy for humanoid loco-manipulation. Extensive experiments on the real humanoid robot show that, under zero-shot deployment, the policy trained on our simulation data achieves higher success rates on most tasks than that trained on real-robot teleoperation data, owing largely to the broad lighting and environmental variations covered by our simulation rendering, which real-robot data fails to capture. The project page is available at https://oasis-humanoid.github.io/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08548v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Zehao Yu, Jiakun Zheng, Weiji Xie, Jiyuan Shi, Chenyun Zhang, Chenjia Bai, Xuelong Li</dc:creator>
    </item>
    <item>
      <title>Quantitative Promise Theory: Intentionality and Inference in Autonomous Agents</title>
      <link>https://arxiv.org/abs/2606.08552</link>
      <description>arXiv:2606.08552v1 Announce Type: new 
Abstract: I discuss some quantitative representations of Promise Theory for processes involving autonomous agents. Agent models are common in software systems, machine learning, and biology, for example, but may also apply to physics and other forms of engineering. I describe how Bayesian probability and information theoretic optimization, including Active Inference, may be incorporated with promise semantics -- as well as how Promise Theory supplements solutions, helping to avoid probability's pitfalls, which include non-local coordination, calibrating, and normalizing probabilistic computations. The role of boundary conditions in constraining allowed states and selecting decision thresholds is a form of promise, and agent alignment provides a scalable definition of intent. Autonomous agents may congeal into swarms with superagent characteristics by trying to minimize their information, despite uncertainty that works to maximize it. The use of Promise Theory involves some research challenges as well as stylistic preferences.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08552v1</guid>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <category>cs.NE</category>
      <category>physics.data-an</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mark Burgess</dc:creator>
    </item>
    <item>
      <title>FusionVul: A Multimodal Feature Fusion Framework for Source Code Vulnerability Detection</title>
      <link>https://arxiv.org/abs/2606.08553</link>
      <description>arXiv:2606.08553v1 Announce Type: new 
Abstract: Source code vulnerability detection remains a long-standing challenge due to the increasing scale, structural complexity, and semantic diversity of modern codebases. Conventional static-analysis or rule-based approaches often fail to capture subtle execution dependencies, while single-modality learning models tend to overlook critical structural information embedded beyond the lexical surface of source code. To improve robustness across heterogeneous code patterns, we propose FusionVul, a joint representation learning framework that integrates sequential syntactic representations extracted by a pretrained Transformer encoder with structural semantics propagated through a graph neural network. The framework further incorporates a cross-attention-based feature fusion network to enable fine-grained cross-modal interaction and employs a sample-aware weighting mechanism to integrate multiple predictive branches. Experimental results on four datasets demonstrate that FusionVul achieves superior F1 scores on datasets with highly dispersed function size distributions and broader vulnerability-type coverage, such as SVulD and DiverseVul, reflecting its capability to capture complex and diverse vulnerability patterns.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08553v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hongyu Yang, Yaping Zhu, Jingchuan Luo, Hiroshi Nomaguchi, Chunhua Su, Willy Susilo</dc:creator>
    </item>
    <item>
      <title>A Theoretical Analysis of Memory and Overfitting Phenomena in Stochastic Interpolation Models</title>
      <link>https://arxiv.org/abs/2606.08554</link>
      <description>arXiv:2606.08554v1 Announce Type: new 
Abstract: This paper provides a theoretical account of memorization in stochastic interpolation models. By leveraging closed-form expressions for the optimal velocity field and the associated score function, we show that, in the continuous-time oracle setting, both deterministic and stochastic generation processes recover training samples. Under Euler discretization, generated samples remain centered around training samples, with deviations controlled by the step size. We further analyze generation in the presence of estimation errors and show that accumulated estimation errors control the endpoint deviation from the training set. These results imply that the generated sample admits a representation as a training sample perturbed by three controlled terms: a discretization-induced bound, an estimation-error-induced bound, and stochastic Gaussian noise. Based on this characterization, we provide theoretical definitions of overfitting and underfitting in generative models. Synthetic simulations support our theoretical findings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08554v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yunchen Li, Shaohui Lin, Zhou Yu</dc:creator>
    </item>
    <item>
      <title>FAWAM: Force-Aware World Action Models for Closed-Loop Contact-Rich Manipulation</title>
      <link>https://arxiv.org/abs/2606.08555</link>
      <description>arXiv:2606.08555v1 Announce Type: new 
Abstract: Force signals provide critical interaction cues for contact-rich robotic manipulation. However, existing methods mostly use force as an additional observation modality, without fully exploiting its role in modeling future interaction dynamics or guiding execution-time feedback correction. In this paper, we propose FAWAM, a force-aware world action model that incorporates force information at three levels: perception, prediction, and closed-loop execution. FAWAM first encodes historical 6-axis force/torque signals to modulate action generation, then jointly predicts future actions and end-effector wrenches to explicitly model contact evolution. It further introduces a residual correction module that uses the predicted wrench trajectory as an execution-time reference to refine actions online based on real-time force feedback. Real-world experiments across multiple contact-rich tasks show that FAWAM improves the average success rate by 36.25% over vision-only baselines and 21.25% over existing force-aware baselines, demonstrating the effectiveness of our force-aware framework for robust contact-rich manipulation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08555v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haotian He, Zeyu Yan, Qipeng Liu, Ning Guo, Wenzhao Lian</dc:creator>
    </item>
    <item>
      <title>Inside the LLM Word Factory</title>
      <link>https://arxiv.org/abs/2606.08562</link>
      <description>arXiv:2606.08562v1 Announce Type: new 
Abstract: Transformer language models process input provided as subword fragments, but natural language semantics usually rely on word-level concepts. Detokenization is the process where models reconcile these two facts, aggregating subwords into word-level representations through their computation. Prior work has found that this takes place mostly in early-to-middle layers, but so far the exact mechanics of the process have not been pinned down. We venture deep into detokenization using activation patching in controlled paired experiments that isolate the contribution of different model components, localizing English detokenization in Llama2-7B to a two-stage process at Layer 1. Attention transmits a token-specific signal from nonfinal subwords, using sequential relays if necessary, while the MLP composes it with the local embedding. This two-stage structure generalizes to twelve models from eight families, but the depth over which it takes place depends on the flavor of positional encoding: RoPE-based models detokenize over 1 to 5 layers, while learned-absolute models take 5 to 10. Finally, we provide a probe for determining the success of the detokenization process based on early-layer activations alone, performing at 0.94-0.97 AUROC depending on the amount of context.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08562v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Benzi Busigin, Yuval Pinter</dc:creator>
    </item>
    <item>
      <title>Physics-Guided Dual Decoding and Spectral Supervision for Global 3D Hydrometeor Prediction</title>
      <link>https://arxiv.org/abs/2606.08563</link>
      <description>arXiv:2606.08563v1 Announce Type: new 
Abstract: While global data-driven models excel at predicting continuous atmospheric variables, three-dimensional hydrometeor forecasting remains challenging due to the zero-inflated, long-tailed distributions of these variables. Standard deep learning optimization often yields overly smooth forecasts, attenuating extreme events and spatial textures. We propose PredHydro-Net, a physics-guided dual-decoding framework that mitigates this smoothing. To resolve multi-variable optimization conflicts, it employs a decoupled architecture where macroscopic thermodynamic and dynamic fields unidirectionally modulate hydrometeor generation. By integrating wavelet-based frequency decoupling, spectral amplitude matching, and adversarial training, the model achieves a favorable trade-off between quantitative accuracy and spatial fidelity. In a 72-h global evaluation, PredHydro-Net outperforms both spatiotemporal deep learning baselines (Earthformer and PredRNNv2) and the operational Global Forecast System (GFS) in extreme-event detection and spectral representation. Furthermore, it demonstrates strong climatological consistency with Global Precipitation Measurement (GPM) satellite retrievals. The model reasonably reproduces the three-dimensional cloud structures in extreme weather events, such as Hurricane Ian. Feature attribution confirms its dependence on physical precursors such as relative humidity and wind convergence, offering a robust, physics-informed approach to long-tailed atmospheric prediction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08563v1</guid>
      <category>cs.LG</category>
      <category>physics.ao-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dandan Chen, Yaqiang Wang</dc:creator>
    </item>
    <item>
      <title>Real-IKEA: Physical Fidelity is the Prerequisite for Robust Manipulation</title>
      <link>https://arxiv.org/abs/2606.08564</link>
      <description>arXiv:2606.08564v1 Announce Type: new 
Abstract: Robotic manipulation robustness often founders on the physics gap between simplified simulations and the resistance-laden real world. In this work, we emphasize that physical realism in articulated interaction is an important ingredient for robust policy learning. We present Real-IKEA, a dataset and simulation framework designed with physical accuracy as a first-class goal. Real-IKEA provides 1,079 articulated asset configurations, derived from 83 authentic IKEA handles and knobs processed through a meticulous six-step physical workflow. For contact-geometry accuracy, we introduce a bidirectional surface-deviation metric to quantify collision meshes. For dynamics realism, we establish resistance-calibrated configurations that vary damping and friction. Crucially, we demonstrate through a Reinforcement Learning (RL) policy that high-fidelity assets enable the discovery of robust "hooking" and "levering" strategies that prioritize mechanical advantage over fragile friction-pulling. Together, these results position Real-IKEA as a critical benchmark for developing manipulation policies capable of human-level robustness in articulated object tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08564v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kunqi Xu, Zhenhao Huang, Siyuan Luo, Ziqiu Zeng, Fan Shi</dc:creator>
    </item>
    <item>
      <title>EinSort: Sorting is All We Need for Tensorizing LLM</title>
      <link>https://arxiv.org/abs/2606.08565</link>
      <description>arXiv:2606.08565v1 Announce Type: new 
Abstract: Tensor networks provide efficient representations for compressing large neural networks. By carefully designing shapes and topologies, they can significantly reduce memory and computational costs. However, identifying implicit low-rank structures in large foundation models remains challenging due to their enormous scale and unstructured weight distributions. We propose an adaptive tensorization method that discovers inherent low-rank structure in a target tensor by index ordering. Experiments on weight and KV-cache compression demonstrate improved reconstruction quality compared to baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08565v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Toshiaki Koike-Akino, Jing Liu, Ye Wang</dc:creator>
    </item>
    <item>
      <title>Towards Accurate Emotion-Attributed Video Captioning via Fine-grained Emotion-Cause Pair Extraction</title>
      <link>https://arxiv.org/abs/2606.08566</link>
      <description>arXiv:2606.08566v1 Announce Type: new 
Abstract: Emotional Video Captioning (EVC) is a challenging task that aims to generate factually accurate and emotionally rich descriptions for videos. Existing EVC methods leverage holistic visual features to mine global emotional cues, and then aggregate multimodal features to guide the emotional caption generation, which ignores the critical characteristic of the EVC task. Visual emotions are evoked by specific motivational causes, which are usually only implied in core video segments. The holistic mining brings significant information redundancy and inaccurate emotional cues. Thus, fine-grained visual cause extraction has a facilitative effect on both emotion perception and emotion-attributed caption generation. To this end, we propose a fine-grained emotion-cause pair extraction framework for emotion-attributed video captioning. Specifically, we learn pair-wise emotion and cause features in two rounds: 1) We propose a Concept-aware Visual Semantic Decomposition module to augment visual features by exploring scene, object, and motion concepts. Besides, to enhance emotional features, we propose a Visual-guided Emotion Interpretable Learning module, which guides emotion refinement with visual temporal dynamics, and augments the interpretable refinement process by reliable VAD-vector constraints. 2) We achieve emotion-cause pair extraction by cross-coupling the visual and emotional features before and after refinement, and leverage contrastive loss to achieve semantic forced alignment. Overall, our approach optimizes complex semantic understanding and emotion perception of videos, leading to a promising performance in emotional captioning. Extensive experiments on three challenging datasets demonstrate the superiority of our approach and each proposed module, e.g., achieving the best performances with +4.4% and +5.4% w.r.t. BLEU-2 and ROUGE-L, respectively, on the EVC-MSVD dataset.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08566v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Weidong Chen, Cheng Ye, Zhendong Mao, Liping Wang, Xinyan Liu, Yongdong Zhang</dc:creator>
    </item>
    <item>
      <title>Regulating the AI Tutor: Intentions, Help-Seeking, and Self-Regulated Learning in Adolescent GenAI Use</title>
      <link>https://arxiv.org/abs/2606.08568</link>
      <description>arXiv:2606.08568v1 Announce Type: new 
Abstract: Generative AI (GenAI) tools are now common learning companions for adolescents, yet how they regulate their use during authentic learning tasks remains poorly understood. Self-regulated learning (SRL) and high-level help-seeking (HS) are commonly proposed as safeguards against passive or shortcut-oriented use, but most empirical studies focus on aggregate learning outcomes rather than these moment-to-moment processes during AI-supported learning.
  This work-in-progress examines open-ended conversational data from 98 Grade-9 students across three German Gymnasium schools, who used a web-based Mistral-Large tutor to prepare a curriculum-aligned mathematics skill before an exam. Alongside chat logs (1,616 turns; 808 student turns), we collected pre-post domain knowledge, pre-chat learning needs, and self-reported cognitive load. We propose a turn-level codebook combining theory-driven SRL and HS constructs with two LLM-specific inductive codes (agency over the AI; epistemic vigilance), and report preliminary AI-coded results.
  Although students overwhelmingly selected scaffolded support before the chat, their interactions were dominated by instrumental requests with almost no explicit monitoring or evaluation. Post-test performance was significantly lower than pre-test, and higher extraneous cognitive load predicted lower post-test scores after controlling for prior knowledge. We discuss how these patterns can support hybrid human-AI analysis of interaction patterns and inform scaffolds for more agentic and epistemically proactive GenAI use.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08568v1</guid>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rania Abdelghani, Peter Kaiser, Kou Murayama</dc:creator>
    </item>
    <item>
      <title>Calibration of Structured Ignorance Certificates for Diagnosing Unknown Unknowns in Reasoning Models</title>
      <link>https://arxiv.org/abs/2606.08571</link>
      <description>arXiv:2606.08571v1 Announce Type: new 
Abstract: Large language models frequently fail in a characteristic way: rather than acknowledging ignorance, they produce fluent but incorrect answers to questions that lie beyond their knowledge boundaries. We introduce Structured Ignorance Certificates (SICs), a JSON-formatted output schema that demands a model explicitly name the missing domain intersection, enumerate required concepts, and propose a productive retrieval query rather than hallucinating an answer. To train models to produce high-quality SICs, we construct a 7,347-sample Unknown-Unknown (UU) dataset by prompting Qwen3-14B to stitch together questions from seven domains (physics, biology, engineering, CS, economics, medical, legal) into novel cross-domain queries that no single-domain expert could answer. We fine-tune a 14B-parameter model with Group Relative Policy Optimization (GRPO) using a composite reward that combines retrieval utility, concept specificity, and output-format validity. A paraphrase-divergence probe trained on model responses confirms that SIC-tuned outputs systematically exhibit higher unknown-unknown probability scores. Evaluation on 735 held-out UU questions achieves a 99.46% JSON validity rate, a mean Certificate Specificity Score of 0.967, and a 3.6% ROUGE-L improvement over the base model on retrieval-grounded generation -- demonstrating that explicit epistemic structuring is a learnable and measurable capability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08571v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Subramanyam Sahoo</dc:creator>
    </item>
    <item>
      <title>OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning</title>
      <link>https://arxiv.org/abs/2606.08572</link>
      <description>arXiv:2606.08572v1 Announce Type: new 
Abstract: While Omni-modal Large Language Models (OLLMs) have demonstrated impressive capabilities in jointly processing audio and visual streams, their ability to strictly adhere to complex, multi-faceted user instructions remains largely unexplored. Existing benchmarks primarily focus on holistic video understanding or text-only instruction following, failing to capture the intricate interplay between modalities and user constraints. To bridge this gap, we introduce OmniCap-IF, the first comprehensive benchmark specifically designed to evaluate instruction-following capabilities in omni-modal captioning. OmniCap-IF incorporates a systematic framework that assesses captions on two dimensions: format correctness and content correctness. Our benchmark encompasses 50 distinct constraint types across pure visual, pure audio, and audio-visual modalities, while integrating Temporal Grounding to assess spatio-temporal precision. Extensive evaluations of prominent models on 1,920 high-quality samples reveal significant performance disparities. Furthermore, our analysis uncovers a critical "format-content tradeoff", demonstrating that increasing formatting complexity directly degrades models' omni-modal reasoning abilities. Finally, to advance the field, we curate a 54K instruction-tuning dataset, OmniCap-IF-54K and present OmniCaptioner-IF, which achieves notable improvements in both complex instruction adherence and general omni-modal captioning performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08572v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiahao Wang, An Ping, Yanghai Wang, Yuanxing Zhang, Shihao Li, Hanyan Bian, Yichi Ren, Yize Zhang, Han Wang, Haowen Chen, Junze Li, Jiaqi Wang, Yiyang Hu, Zhuze Xu, Zijie Zhang, Jiaheng Liu</dc:creator>
    </item>
    <item>
      <title>Titans-as-a-Layer: Test-Time Memory for Conversational Speech Emotion Recognition</title>
      <link>https://arxiv.org/abs/2606.08573</link>
      <description>arXiv:2606.08573v1 Announce Type: new 
Abstract: Speech emotion recognition (SER) is commonly formulated as utterance-level classification, although conversational emotion depends on a speaker's usual vocal range and the emotional context established by previous utterances. Speech-language models provide strong pretrained acoustic and semantic representations and can adapt them to SER labels via fine-tuning, but this mechanism still lacks per-dialogue state. We study whether test-time neural memory can supply this missing context while leaving the large audio language model (LALM) backbone intact. Building on Titans, we introduce a plug-and-play Memory-as-a-Layer (MAL) adapter that writes dialogue history into a small neural memory and reads it back as an audio-token-aligned residual update, avoiding changes to the host model's token positions. Across different audio LLMs and emotion recognition datasets, our design improves SER performance across different evaluation metrics, supporting test-time memory as a residual contextual mechanism for conversational SER.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08573v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Daniel Chen, Qicong Hu, Yang Xiao, Ting Dang, Hong Jia</dc:creator>
    </item>
    <item>
      <title>OrderDP: A Theoretically Guaranteed Lossless Dynamic Data Pruning Framework</title>
      <link>https://arxiv.org/abs/2606.08574</link>
      <description>arXiv:2606.08574v1 Announce Type: new 
Abstract: Data pruning (DP), as an oft-stated strategy to alleviate heavy training burdens, reduces the volume of training samples according to a well-defined pruning method while striving for near-lossless performance. However, existing approaches, which commonly select highly informative samples, can lead to biased gradient estimation compared to full-dataset training. Furthermore, the analysis of this bias and its impact on final performance remains ambiguous. To address these challenges, we propose OrderDP, a plug-and-play framework that aims to obtain stable, unbiased, and near-lossless training acceleration with theoretical guarantees. Specifically, OrderDP first randomly selects a subset and then chooses the top-$q$ samples, where unbiasedness is established with respect to a surrogate loss. This ensures that OrderDP conducts unbiased training in terms of the surrogate objective. We further establish convergence and generalization analyses, elucidating how OrderDP affects optimal performance and enables well-controlled acceleration while ensuring guaranteed final performance. Empirically, we evaluate OrderDP against comprehensive baselines on CIFAR-10, CIFAR-100, and ImageNet-1K, demonstrating competitive accuracy, stable convergence, and exact control -- all with a simpler design and faster runtime, while reducing training cost by over 40%. Delivering both strong performance and computational efficiency, our method serves as a robust and easily adaptable tool for data-efficient learning. The code is publicly available at https://github.com/shengze-xu/OrderDP.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08574v1</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>International Conference on Learning Representations (ICLR), 2026</arxiv:journal_reference>
      <dc:creator>Chenhan Jin, Shengze Xu, Qingsong Wang, Fan Jia, Dingshuo Chen, Tieyong Zeng</dc:creator>
    </item>
    <item>
      <title>When Should Queries Be Decomposed? A Stage-Aware Study of Query Decomposition for Multi-Condition Retrieval</title>
      <link>https://arxiv.org/abs/2606.08577</link>
      <description>arXiv:2606.08577v1 Announce Type: new 
Abstract: Multi-condition retrieval requires systems to identify documents that satisfy multiple distinct constraints, moving beyond mere topical relevance. While query decomposition is widely adopted as an intuitive remedy, its effectiveness across different retrieval pipeline stages remains underexplored. In this paper, we conduct a stage-aware empirical study and uncover a stark, stage-dependent effect: decomposition during initial retrieval frequently harms retrieval performance due to semantic dilution, yet substantially improves reranking by enabling more fine-grained constraint verification. Motivated by these insights, we propose a principled Stage-Aware Decomposition framework that retains the monolithic query during initial retrieval to preserve global semantic context, while employing sub-queries exclusively during reranking for fine-grained constraint matching. Extensive evaluations on the MultiConIR and SSRB benchmarks demonstrate that our framework consistently improves ranking performance for compositional queries across multiple retrieval and reranking models. We release our code at https://github.com/EIT-NLP/Query-Decompose.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08577v1</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bochao Yin, Xuan Lu, Zhengyu Qi, Xiaoyu Shen</dc:creator>
    </item>
    <item>
      <title>Lost in the Non-convex Loss Landscape: How to Fine-tune the Large Time Series Model?</title>
      <link>https://arxiv.org/abs/2606.08578</link>
      <description>arXiv:2606.08578v1 Announce Type: new 
Abstract: Recently, large time series models (LTSMs) have gained increasing attention due to their similarities to large language models, including flexible context length, scalability, and task generality, outperforming advanced task-specific models. However, prior studies indicate that pre-trained LTSMs may exhibit a poorly conditioned non-convex loss landscape, leading to limited trainability. As a result, direct fine-tuning tends to cause overfitting and suboptimal performance, sometimes even worse than training from scratch, substantially diminishing the benefits of pre-training. To overcome this limitation, we propose Smoothed Full Fine-tuning (SFF), a novel fine-tuning technique. Specifically, we construct an auxiliary LTSM via random initialization to obtain a smoother loss landscape, and then linearly interpolate its weights with those of the pre-trained model to smooth the original landscape. This process improves trainability while preserving pre-trained knowledge, thereby enabling more effective downstream fine-tuning. From an optimization perspective, SFF perturbs sharp minima without significantly harming flat regions, facilitating escape from poor local basins toward smoother and more generalizable solutions. Extensive experiments on benchmark datasets demonstrate consistent improvements across eight representative LTSMs, including Timer, TimesFM, MOMENT, UniTS, MOIRAI, Chronos, TTMs, and Sundial, on diverse downstream tasks. The code is available at https://github.com/Meteor-Stars/SFF.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08578v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xu Zhang, Peang Wang, Wei Wang</dc:creator>
    </item>
    <item>
      <title>A spectral audit framework reveals task-dependent aperiodic reliance across EEG and ECG deep learning</title>
      <link>https://arxiv.org/abs/2606.08583</link>
      <description>arXiv:2606.08583v1 Announce Type: new 
Abstract: Deep learning on physiological time series is interpreted through domain-specific features -- oscillatory rhythms in EEG, morphological complexes in ECG -- yet these signals sit atop a broadband aperiodic 1/f-like envelope that covaries with arousal, age, and pathology. We introduce a spectral audit framework combining aperiodic/periodic decomposition, phase-preserving Fourier interventions, sham controls, and simulation validation. Aperiodic reliance was task-dependent and architecture-general: across six neural architectures, flattening drops exceeded 0.42 balanced-accuracy points for sleep-wake classification, reached 0.07-0.13 for clinical abnormality detection, and remained minimal for motor imagery. Six of seven EEG foundation models showed FDR-significant aperiodic reliance on clinical EEG; age/sex and recording-era controls reduced but did not eliminate the effect. Applying the audit to PTB-XL ECG revealed neural drops of 0.32--0.36 persisting after demographic matching, confirming this confound class extends beyond EEG. Aperiodic controls should become standard for interpretable physiological time-series deep learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08583v1</guid>
      <category>cs.LG</category>
      <category>eess.SP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jasmeet Singh Bindra, Siddharth Panwar, Shubhajit Roy Chowdhury</dc:creator>
    </item>
    <item>
      <title>Convolutional Sparse Coding via the Locally Competitive Algorithm on Loihi 2</title>
      <link>https://arxiv.org/abs/2606.08584</link>
      <description>arXiv:2606.08584v1 Announce Type: new 
Abstract: Sparse coding provides a principled framework for signal representation by expressing an input as a linear combination of only a small number of basis functions. The Locally Competitive Algorithm (LCA) is particularly attractive in the context of neuromorphic computing because its dynamics, leaky integration, thresholding, and lateral inhibition map naturally to neuromorphic hardware. While prior work has studied non-convolutional LCA on Loihi 2, the convolutional setting is of particular interest because it introduces spatial structure, weight sharing, overlapping receptive fields, and scaling behavior that are more representative of practical sparse inference workloads. In this work, we present a Loihi 2 implementation of convolutional sparse coding via the LCA and evaluate it against a conventional GPU baseline on the same inference problems. The implementation follows a one-layer recurrent LCA formulation and extends it to convolutional feature maps with local inhibitory kernels derived from pairwise filter interactions. To the best of our knowledge, this is the first implementation and benchmark of convolutional LCA on Loihi 2. Our goal is not only to demonstrate feasibility, but also to clarify in which operating regimes convolutional sparse inference becomes attractive on neuromorphic hardware. The resulting study positions convolutional LCA as a useful benchmark for structured sparse inference on emerging neuromorphic systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08584v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Geoffrey Kasenbacher, Daniel Ruepp, Gerrit A. Ecke</dc:creator>
    </item>
    <item>
      <title>Novel physical property preserved methods for stochastic Schr\"{o}dinger--KdV equation</title>
      <link>https://arxiv.org/abs/2606.08585</link>
      <description>arXiv:2606.08585v1 Announce Type: new 
Abstract: In this work, we study the stochastic Schr\"odinger--KdV equation driven by additive noise from both analytical and numerical viewpoints. We first establish the evolution laws for the averaged plasmon number, momentum, and energy, together with the conservation of the averaged particle number. Motivated by these intrinsic structures, we develop two temporal discretizations. One is constructed based on the splitting strategy and Crank--Nicolson scheme, and is shown to preserve the discrete evolution laws of the averaged plasmon number and momentum, as well as the discrete conservation law of the averaged particle number. The other is proposed within the constant scalar auxiliary variable framework, in which the nonlinear energy functional is reformulated so that a modified averaged energy law can be preserved at the discrete level. Combining these temporal discretizations with a local discontinuous Galerkin approximation in space yields structure-preserving full discretizations inheriting the corresponding discrete physical laws. Numerical experiments are presented to validate the theoretical results and to demonstrate the accuracy, robustness, and effectiveness of the proposed methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08585v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ziheng Chen, Jialin Hong, Liying Sun</dc:creator>
    </item>
    <item>
      <title>LLM vs. Human Unit Tests: Fault Detection on Real Python Bugs</title>
      <link>https://arxiv.org/abs/2606.08588</link>
      <description>arXiv:2606.08588v1 Announce Type: new 
Abstract: Large language models (LLMs) have shown considerable promise for automated unit test generation, yet their practical effectiveness relative to human-written tests remains poorly understood. Existing evaluations commonly rely on coverage-oriented benchmarks that do not assess fault-detection capability directly. We present an empirical comparison of LLM-generated and human-written unit tests across three complementary Python benchmarks: 29 real historical bugs from BugsInPy, a function-level benchmark drawn from python-slugify and packaging, and a controlled paired benchmark. Our generation pipeline couples Gemini 2.5 Flash with a lightweight lexical retrieval mechanism that supplies bug-relevant context at generation time. Across eight quality dimensions, LLM-generated tests with retrieval-augmented context detect faults in 69% of cases compared to 17.2% for general-purpose human-written tests (Fisher's exact, $p &lt; 0.001$, Cohen's $h = 1.10$). Critically, line and branch coverage are nearly identical between the two approaches (84.8% vs. 88.5% and 75.2% vs. 82.1%), confirming that coverage is an insufficient proxy for fault-detection capability. We discuss the conditions under which each approach excels, characterize their complementary strengths, and identify the critical role of retrieval context and reproducible benchmark construction in meaningful test-quality evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08588v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Phouvadeth Vathana, Prapti Bhatt, Rishi Patel, Nasir U. Eisty</dc:creator>
    </item>
    <item>
      <title>Detection and Interpretability Analysis of Quotation Errors by Large Language Models</title>
      <link>https://arxiv.org/abs/2606.08589</link>
      <description>arXiv:2606.08589v1 Announce Type: new 
Abstract: Purpose - Quotation error refers to the inconsistency between cited information and its original source. This phenomenon leads to a series of negative impacts, such as misinterpretation of the original research, undermining the academic community's collective understanding of relevant issues, and weakening the accuracy and fairness of the citation-based academic evaluation system. Existing studies have shown that quotation error is prevalent in the academic community; moreover, manual verification of quotation error is not only labor-intensive but also inefficient. Therefore, this paper proposes the task of 'automated detection of quotation errors'. Methodology - Adopting a large language model (LLM)-based approach, this paper builds on existing research to improve detection performance in two respects: first, fine-tuning LLMs to detect quotation errors; second, incorporating full-text data of the cited literature into dataset construction and exploring the optimal scheme for building such datasets by comparing three types of full-text integration methods. On this basis, the paper further uses the TokenSHAP tool to conduct an interpretability analysis of the model's prediction results. Findings - Fine-tuning LLMs improved performance in detecting quotation errors. Among the different methods for incorporating full-text information, the approach based on using the source abstract yielded the best performance. Originality - The fine-tuning approach for large language models (LLMs) is applied to the task of automated detection of quotation errors, and interpretability analysis is conducted on the model's output results.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08589v1</guid>
      <category>cs.CL</category>
      <category>cs.DL</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1108/EL-11-2025-0464</arxiv:DOI>
      <arxiv:journal_reference>The Electronic Library, 2026</arxiv:journal_reference>
      <dc:creator>Bei Huang, Yingyi Zhang, Shenghao Huang, Chengzhi Zhang</dc:creator>
    </item>
    <item>
      <title>Auditable Graph-Guided Root Cause Analysis for Kubernetes Incidents</title>
      <link>https://arxiv.org/abs/2606.08590</link>
      <description>arXiv:2606.08590v1 Announce Type: new 
Abstract: Kubernetes incidents are diagnosed reliably only when a root-cause system's reported gains come from incident evidence rather than scenario-specific shortcuts. We present Graph Traversal Agent, a graph-guided RCA agent that combines LLM reasoning with specialized tools. The model reasons over a typed evidence graph, while deterministic graph and tool operations collect evidence, bound the search, and check proposed verdicts. We map operational constraints, including read-only evidence collection, propagation-aware diagnosis, bounded execution, and independently validated verdicts, to a typed incident graph, a LangGraph traversal state machine, and a separate validation stage. On ITBench snapshots scored by one fixed qwen-plus judge, the audited system raises root-cause-entity F1 over an earlier iteration of the same system from 0.6087 to 0.9130 on a 23-scenario common subset. A prompt-level ablation separates prompt-tuned gains from gains that survive once scenario-specific hints are removed: the stripped-prompt configuration retains 0.6958 F1 on a 19-scenario subset. The surviving gain concentrates on ChaosMesh scenarios whose ground-truth root cause is the injected fault object already present in the evidence graph, so we report it as benchmark-coupled rather than broad cross-cluster RCA evidence. Lightweight checks, including same-judge comparison, prompt-level ablation, cascade-source checking, and a telemetry no-leak test, mark claims as supported, pending, or out of scope. We scope the work to ITBench OpenTelemetry-demo snapshots. Live-cluster trials served as an engineering stress test, but alert state and trace availability did not stay stable enough for controlled scoring, so we make no production-readiness or mean-time-to-repair claim.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08590v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Anastasiia Kuvshinova, Seungmin Jin</dc:creator>
    </item>
    <item>
      <title>Quantum Global Variational Learning for Quantum Error Correction</title>
      <link>https://arxiv.org/abs/2606.08592</link>
      <description>arXiv:2606.08592v1 Announce Type: new 
Abstract: Efficient quantum error correction is essential for the advancement of quantum computing. We propose a quantum neural network with a global structure that reduces the number of unitary matrices required in quantum circuits. This approach resulted in a 97\% reduction in training time and up to a 25\% improvement in the training completion rate, ultimately achieving a 100\% success rate in training while surpassing the error correction performance reported in previous studies. In addition, we demonstrated the enhanced robustness of quantum error correction against internal network noise. Moreover, the fidelity of quantum error correction under internal network noise increased by up to 15\% due to the reduced computational load.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08592v1</guid>
      <category>cs.LG</category>
      <category>quant-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Shun Ryuzaki, Hideo Mukai</dc:creator>
    </item>
    <item>
      <title>How Much Capacity Does EEG Denoising Need? Ultra-Compact Networks reveal Benchmark Saturation and Metric-Utility Gap</title>
      <link>https://arxiv.org/abs/2606.08594</link>
      <description>arXiv:2606.08594v1 Announce Type: new 
Abstract: Deep learning EEG denoising architectures have scaled from tens of thousands to tens of millions of parameters, yet no prior study has isolated model capacity as the experimental variable or tested whether reconstruction metrics predict downstream neural-signal utility. We address both gaps by fixing architecture, loss, data split, and training recipe while sweeping only channel width from 1.05K to 40.26K parameters in a minimal depthwise-separable convolutional U-Net. Models were evaluated on the EEGDenoiseNet benchmark, cross-dataset BCI transfer tests, controlled baseline retraining, and downstream motor-imagery classification with five decoder families across all nine BCI Competition IV-2a subjects. Reconstruction performance saturated by 3-6.5K parameters, with post-elbow gains of at most 0.015 correlation coefficient per log10-parameter unit. An 8.46M-parameter baseline retrained under the same pipeline matched the 40.26K compact variant on EOG--a 200x parameter gap yielding no advantage--while a Patch-Transformer control reproduced the same diminishing-return shape. Downstream evaluation exposed a classifier-dependent metric-utility gap: reconstruction-optimized denoising significantly degraded CSP+LDA classification across all nine subjects and three artifact types (best denoised accuracy 0.547 vs. 0.612 noisy baseline; Bonferroni p=0.0488), persisting on naturally recorded trials (Delta=-0.047; BH-FDR q=0.0049). End-to-end neural decoders showed variable or neutral effects. Standard EEG denoising benchmarks are saturated far below current model capacity, and reconstruction metrics do not predict BCI utility. Ultra-compact models at 33-46 KB and 1.27-2.61M FLOPs/segment are practical for edge deployment. These findings argue for capacity-controlled evaluation, harder task-aware benchmarks, and mandatory downstream validation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08594v1</guid>
      <category>cs.LG</category>
      <category>eess.SP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jasmeet Singh Bindra, Siddharth Panwar, Shubhajit Roy Chowdhury</dc:creator>
    </item>
    <item>
      <title>Distilling LLM Reasoning into an Interpretable Policy Tree for Human-AI Collaboration</title>
      <link>https://arxiv.org/abs/2606.08596</link>
      <description>arXiv:2606.08596v1 Announce Type: new 
Abstract: Constructing efficient and reliable policies to assist humans is indispensable for human-AI collaboration. Existing methods mainly follow two lines of work. Most prior work relies on multi-agent reinforcement learning (MARL) to learn black-box policies, which limits interpretability and raises safety concerns. Recent methods query large language models (LLMs) at each decision step, causing slow responses and high inference costs. We propose Collaboration Policy Tree (Co-pi-tree), a closed-loop method that learns an executable policy tree consisting of a partner-behavior prediction tree and an agent-action selection tree. Co-pi-tree constructs a policy by distilling LLM reasoning into policy tree code. It then evaluates the policy through partner interaction, obtains feedback, and uses natural language to summarize the interaction feedback to improve problematic branches. Experiments in Overcooked-AI show that Co-pi-tree improves average reward by 35.4% over the baseline average, while reducing the number of LLM queries by 77.7% and test-time latency by 97.1%. Project page: https://beiwenzhang.github.io/Co-pi-tree/</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08596v1</guid>
      <category>cs.AI</category>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Beiwen Zhang, Yongheng Liang, Guowei Zou, Haitao Wang, Hejun Wu</dc:creator>
    </item>
    <item>
      <title>Kikuchi Graphs of Random Hypergraphs are Approximately Johnson</title>
      <link>https://arxiv.org/abs/2606.08597</link>
      <description>arXiv:2606.08597v1 Announce Type: new 
Abstract: We prove that level-$\ell$ Kikuchi graphs of random $2r$-uniform hypergraphs spectrally approximate the Kikuchi graph of the complete $2r$-uniform hypergraph at a sampling rate that is sharp up to a logarithmic factor, in the regime $r\leq \ell \leq n/2$. Our proof is based on the matrix Bernstein inequality, but, unlike prior works, we apply it to an appropriate collection of blocks of Johnson eigenspaces. Our analysis relies on a new, simple band-locality property for arbitrary Kikuchi graphs. As an application, we prove that the natural degree-$2\ell$ sum-of-squares relaxation for the Max $2r$-XOR problem is ``integral'' when the input is a planted noisy $2r$-XOR instance on a random hypergraph with $\gtrsim n \cdot (n/\ell)^{r-1} \log n$ hyperedges.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08597v1</guid>
      <category>cs.DS</category>
      <category>math.CO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pravesh K. Kothari</dc:creator>
    </item>
    <item>
      <title>InA-Probe: Instruction-Aware Active Probing for Time Series Forecasting with LLMs</title>
      <link>https://arxiv.org/abs/2606.08601</link>
      <description>arXiv:2606.08601v1 Announce Type: new 
Abstract: Large Language Models (LLMs) have recently demonstrated impressive potential for time series forecasting. However, existing methods predominantly rely on passive modality alignment or static task reprogramming, which often fail to capture fine-grained, non-stationary temporal patterns or to adapt to nuanced task intents. In this paper, we propose Instruction-aware Active Probing (InA-Probe), which shifts the paradigm from passive alignment toward an active, instruction-driven probing mechanism. Specifically, we design a Multi-Level Instruction Injection mechanism that enriches the model with both global task objectives and fine-grained, patch-level semantic priors. Building on this, an Adaptive Query Generation module produces sample-specific probes that are dynamically modulated by the temporal context. These probes are then refined through a dual-stage attention process: they first internalize task-specific intents via Instruction-Aware Self-Attention, and subsequently interrogate the projected temporal representations through Temporal Cross-Attention to extract salient patterns. Comprehensive experiments on seven real-world benchmarks show that InA-Probe consistently outperforms state-of-the-art deep learning and LLM-based baselines, excelling in both one-for-all generalization and zero-shot transfer while reducing forecasting error by up to 37\% in challenging cross-domain scenarios. Ablation studies further confirm that the synergy between adaptive querying and fine-grained instructions is key to unlocking the reasoning power of LLMs for complex time series.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08601v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Peiliang Gong, Emadeldeen Eldele, Chenyu Liu, Ziyu Jia, Yi Ding, Xinliang Zhou, Lianchao Gu, Qi Zhu, Yang Liu, Daoqiang Zhang, Xiaoli Li</dc:creator>
    </item>
    <item>
      <title>Reinforcement Learning for Flow-Matching Policies with Density Transport</title>
      <link>https://arxiv.org/abs/2606.08602</link>
      <description>arXiv:2606.08602v1 Announce Type: new 
Abstract: We present an online reinforcement learning (RL) algorithm for fine-tuning flow-matching policies in continuous-control problems. Our key insight is to view RL-based policy improvement as a transport of action densities towards regions of high reward, which naturally aligns with the transport formulation of flow matching models. Prior methods either approximate the current or optimal policy distribution or resort to distillation, which introduces biased gradients or sacrifices multimodal modeling capacity. In contrast, our approach for RL with Density Transport, which we name RLDT, constructs a transport field from a maximum-entropy RL objective using Stein Variational Gradient Descent (SVGD). Then, it finetunes a pretrained flow matching policy to align with this field. Training with this alignment objective is nontrivial because flow-matching policies generate actions via a multi-step process, making direct gradient-based optimization challenging. To overcome this challenge and stabilize training, we approximate policy actions from intermediate denoising steps via expected-target estimation. This allows the transport-field update to propagate into the network parameters without unstable backpropagation through time. Experimental results demonstrate that RLDT outperforms competitive baselines in reward quality and convergence speed. This performance holds across diverse continuous-control tasks, encompassing both dense and sparse rewards, as well as state- and vision-based long-horizon robot manipulation. The project webpage is https://rpfey.github.io/rldt/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08602v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Boshu Lei, Kostas Daniilidis, Antonio Loquercio</dc:creator>
    </item>
    <item>
      <title>Gryphon: A Unified Architecture for Semantic-ID Generation and Item-Level Scoring in Industrial Recommendations</title>
      <link>https://arxiv.org/abs/2606.08604</link>
      <description>arXiv:2606.08604v1 Announce Type: new 
Abstract: Generative retrieval (GR) has become a scalable approach to candidate generation: each item is assigned a short hierarchical token sequence called a Semantic ID (SID), and the next item's SID is decoded autoregressively. A practical limitation is that the decoder's beam search optimizes the likelihood of token sequences, not the relevance of the underlying items. These objectives diverge when sequence likelihood is poorly calibrated due to beam search error accumulation, and when several items collapse onto a single SID and receive identical scores. We introduce Gryphon, an encoder-decoder generative recommendation architecture that adds a jointly trained item-level scoring component alongside SID generation, reusing the encoder's user representation computed in a single forward pass. Instead of ranking SIDs by accumulated token likelihood, Gryphon resolves each generated SID to its concrete items and re-scores those items directly, which sidesteps miscalibrated sequence scores and separates items that collide on the same identifier. On an industrial music service, with item-level scoring trained under a next-item-prediction objective, Gryphon attains the highest item-level Recall@1000, above the strongest baselines (+3.7% over vanilla GR and +2.5% over collision-resolved GR) at comparable parameter count and latency. Gryphon's item-level ranking also surpasses its beam-likelihood ranking of the same candidates (+4.2% gain), demonstrating the benefit of item-level scoring in GR. Deployed as the sole candidate source in a 7-day A/B test, Gryphon produced no statistically significant change in total listening time (+0.25%) while replacing a pipeline of more than 15 candidate generators and a separate preranking stage, substantially simplifying the candidate-generation system.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08604v1</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Daria Tikhonovich, Oleg Sorokin, Vladislav Dodonov, Mariia Ulianova, Ilya Murzin</dc:creator>
    </item>
    <item>
      <title>Multilingual Fact-Checking at Scale: Fine-Tuned Compact Models vs LLMs</title>
      <link>https://arxiv.org/abs/2606.08605</link>
      <description>arXiv:2606.08605v1 Announce Type: new 
Abstract: We present a multilingual fact-checking system deployed at Factiverse, designed for high-throughput and low-latency operation across diverse languages. The system follows a modular pipeline with three stages: claim detection, evidence retrieval and re-ranking, and veracity prediction. We fine-tune XLM-RoBERTa-Large for claim detection, mmBERT-base for three-label stance classification (Supports/Refutes/Mixed), and a SetFit-based multilingual re-ranker for claim-evidence matching. We compare these components against strong LLM baselines, including GPT-5.2, Claude Opus 4.6, and Qwen3-8b. Experiments on production data spanning 114 languages for claim detection and 28 languages for veracity prediction show that task-specific fine-tuning provides strong and stable multilingual performance, while the fine-tuned retrieval model remains competitive with modern proprietary embeddings. Same-hardware latency measurements further show large efficiency gains for encoder-based components, supporting their use in production deployments with tight cost and privacy constraints. Overall, compact fine-tuned, self-hosted models remain a practical and effective foundation for multilingual fact-checking at scale. Code and data used for this study are available at https://github.com/factiverse/factcheck-editor.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08605v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Pratuat Amatya, Vinay Setty</dc:creator>
    </item>
    <item>
      <title>HARBOR: A Harness Framework for Agentic Robot Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2606.08610</link>
      <description>arXiv:2606.08610v1 Announce Type: new 
Abstract: Reinforcement learning (RL) has become a powerful paradigm for robot learning, particularly in sim-to-real settings, but its broader adoption remains limited by the engineering pipeline surrounding the algorithms. Building tasks, shaping rewards, and tuning hyperparameters require substantial expert effort, making RL workflows costly and difficult to scale. We introduce HARBOR, an agentic framework that frames robot RL automation as a harness-engineering problem: given a simulator codebase and a task specification, it automates the workflow from environment setup to policy training in simulation. HARBOR decomposes such high-level objectives into bounded stages executed by specialized agents through standardized commands, persistent artifacts, executable gates, and reusable knowledge, and scales iteration via decentralized parallel trials and experience learning across runs. We evaluate HARBOR across 6 benchmarks and 16 tasks in total, spanning manipulation, locomotion, and bimanual dexterous control. We demonstrate that HARBOR automates the simulation RL workflow end-to-end, designs rewards, tunes algorithms to match or improve over default configurations, and reduces engineering effort at practical token and wall-clock cost; the resulting policies can also be transferred to real robots.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08610v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zechu Li, Yufeng Jin, Xiaoyang Liu, Puze Liu, Vignesh Prasad, Carlo D'Eramo, Georgia Chalvatzaki</dc:creator>
    </item>
    <item>
      <title>Bayesian Optimization of a Multi-Product Chemical Reactor Using Composite Models and Partial Physics Knowledge</title>
      <link>https://arxiv.org/abs/2606.08611</link>
      <description>arXiv:2606.08611v1 Announce Type: new 
Abstract: We study data-driven real-time economic optimization of a multi-product chemical reactor when no reliable first-principles model is available beyond a steady-state energy balance. Instead of learning the economic objective directly as a black-box function, we use a composite formulation in which Gaussian process (GP) models predict physically meaningful outputs, including product concentrations and reactor temperature, while profit is computed analytically from these predictions together with raw-material, product, and utility prices. This preserves the structure of the economic objective, makes it parametric in changing prices without needing retraining, and allows candidate operating points to be checked against the available energy balance through a physics residual. The GPs also provide predictive uncertainty, which is exploited in a Bayesian optimization (BO) framework both for data-efficient exploration and for conservative enforcement of the reactor temperature constraint through an upper confidence bound. The acquisition function additionally penalizes large energy-balance mismatch obtained by substituting the GP-predicted outputs and candidate inputs into the available steady-state energy balance. The approach is demonstrated on a benchmark simulation of a non-isothermal multi-product reactor. Relative to a trust-region safe BO implementation, the proposed method achieves better simulated economic performance within the available iteration budget. Relative to a purely data-driven BO approach that does not use the available physics information, it avoids reactor temperature constraint violations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08611v1</guid>
      <category>eess.SY</category>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Liqiu Dong, Marta Zagórowska, Mehmet Mercangöz</dc:creator>
    </item>
    <item>
      <title>Facial Expression Recognition in the Deep Learning Era: A Systematic Multi-Criteria Review of Methods, Models, Datasets, Performance, Challenges, and Future Research Directions</title>
      <link>https://arxiv.org/abs/2606.08612</link>
      <description>arXiv:2606.08612v1 Announce Type: new 
Abstract: Facial Expression Recognition (FER) has advanced rapidly over the last decade, driven by the shift from handcrafted descriptors and shallow classifiers to deep convolutional, attention-based, vision-language, and foundation-model architectures, and by the parallel growth of large-scale in-the-wild benchmarks spanning categorical, dimensional, compound, micro-expression, Action Unit (AU), and intensity-estimation tasks. Yet the deep learning-based FER landscape has so far been reviewed only along narrow task-, architecture-, or application-specific axes, leaving a holistic, systematically organized account of its recent advances missing. This survey addresses that gap with a comprehensive review of recent deep learning-based FER, explicitly linked to the wider Facial Affect Recognition (FAR) domain. Its main contributions are: a) A description of FER's evolution into five distinct phases, from handcrafted features and classical machine learning to attention-based, vision-language, and foundation-model approaches, with the key milestone works of each, b) A multi-criteria taxonomy analyzing the literature along seven complementary axes: recognition task, input modality, face pre-processing pipeline, network architecture, learning strategy, acquisition setting, and application domain, c) A per-criterion comparative analysis, with critical insights into the strengths and limitations of each category under in-the-wild conditions, d) A task-organized review of public FER datasets, with their annotation schemes, modalities, and evaluation protocols, e) A compilation of performance metrics and a per-task quantitative comparison of representative state-of-the-art methods on widely adopted benchmarks, and f) A discussion of current challenges and promising future directions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08612v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Spyridon Georgiou, Aggelos Psiris, Spyridon Evangelatos, Thomas Lagkas, Vasileios Argyriou, Panagiotis Sarigiannidis, Iraklis Varlamis, Georgios Th. Papadopoulos</dc:creator>
    </item>
    <item>
      <title>Harnessing Streaming Video in the Wild</title>
      <link>https://arxiv.org/abs/2606.08615</link>
      <description>arXiv:2606.08615v1 Announce Type: new 
Abstract: Vision-Language Models (VLMs) are increasingly required to process unbounded video streams in applications such as video-call assistants, live commentary, and embodied robots. An ideal streaming system should support proactive interaction, long-horizon memory, and real-time processing, while resting on a VLM backbone capable of handling diverse in-the-wild streaming tasks. However, existing VLMs excel at offline video understanding but fall short in streaming capabilities and lack dedicated infrastructure for streaming deployment. We address this gap on three fronts. (i) For backbone capability, we construct Streaming-Train-248K, a streaming dataset paired with a novel training objective for adapting VLMs to streaming interaction and understanding. (ii) For real-world deployment, we introduce Streaming Harness, a plug-and-play system that endows any VLM with three core abilities: proactive interaction (per-second response decisions), long-term memory (12-hour context retention), and real-time processing (sub-second latency). (iii) To drive continued community progress on streaming capabilities, we design Streaming-Eval, a benchmark that reflects models' capabilities across diverse in-the-wild scenarios. Extensive experiments demonstrate consistent gains from our approach across all core capabilities required for streaming video understanding. We will open-source our data, code, and benchmark to advance the community's shift from offline video understanding to deployable streaming intelligence.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08615v1</guid>
      <category>cs.CV</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dingyu Yao, Shuhuan Gu, Qingyi Si, Junhao Zhou, Chenxu Yang, Chuanyu Qin, Naibin Gu, Zheng Lin, Weiping Wang, Nan Duan, Jiaqi Wang</dc:creator>
    </item>
    <item>
      <title>Cross-Source Reasoning-based Correction for Author Name Disambiguation</title>
      <link>https://arxiv.org/abs/2606.08617</link>
      <description>arXiv:2606.08617v1 Announce Type: new 
Abstract: Author name disambiguation is a critical challenge in academic search systems, often addressed through from-scratch and real-time disambiguation approaches. However, current algorithms remain vulnerable to cumulative errors of paper-author assignments and overlook inconsistent assignments across different sources. Resorting to expert annotation is resource-intensive. To this end, this paper explores a new perspective for author name disambiguation: cross-source correction by leveraging inconsistent assignments across sources. We propose CrossND, a full-stack framework that integrates data refinement, cross-source reasoning, and test-time scaling. First, a chain-of-refinement pipeline denoises author profiles and produces more accurate paper-author matching probabilities. Second, a supervised fine-tuning process incorporates these refined signals and a probabilistic soft logic-based cross-correction module to infer the assignments of which sources are incorrect. Third, test-time scaling further enhances the accuracy and robustness of the predictions. Experiments on real-world datasets indicate that CrossND consistently outperforms 17 baselines by leveraging cross-source reasoning without human intervention.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08617v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3770855.3818347</arxiv:DOI>
      <dc:creator>Fanjin Zhang, Yunhe Pang, Bo Chen, Zhiyu Shen, Yanghui Rao, Evgeny Kharlamov, Jie Tang</dc:creator>
    </item>
    <item>
      <title>SPA: A SQL-Plan-Aware Reinforcement Learning Framework for Query Rewriting with LLMs</title>
      <link>https://arxiv.org/abs/2606.08620</link>
      <description>arXiv:2606.08620v1 Announce Type: new 
Abstract: SQL query rewriting is a well-established technique for improving database performance without schema or index changes, yet finding effective rewrites for modern analytical workloads remains difficult: rule-based methods are limited to predefined transformations, while LLM-based approaches often produce rewrites that are semantically valid but compile to equivalent physical plans or degrade runtime performance. We present SPA, a SQL-Plan-Aware reinforcement learning framework that trains LLMs to rewrite queries using physical execution feedback. SPA formulates rewriting as a policy optimization problem and extends GRPO with rewards spanning semantic equivalence, textual rewrite distance, physical-plan divergence, and runtime speedup. To handle reward sparsity across query difficulty, SPA introduces Probability-Gated Adaptive Reward Shaping, a query-level curriculum that unlocks higher-level rewards only once a rollout group achieves sufficient mastery of lower-level objectives, and further improves sample efficiency through on-policy self-improvement by recycling slowdown rewrites from the current policy as targeted training signals. On both IID and OOD workloads, SPA outperforms rule-based and strong LLM baselines in end-to-end runtime, substantially reduces harmful slowdown rewrites, and yields strong tail-latency gains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08620v1</guid>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Xinyi Huang, Zhengjie Miao</dc:creator>
    </item>
    <item>
      <title>Strategyproof Mechanisms for Euclidean Facility Location Problems under $L_p$-norm Social Cost</title>
      <link>https://arxiv.org/abs/2606.08621</link>
      <description>arXiv:2606.08621v1 Announce Type: new 
Abstract: We study strategyproof mechanisms for eliciting agents' location preferences truthfully in the Euclidean plane $\mathbb R^2$ and locating a facility so as to minimize the $L_p$-norm social cost, defined as the $L_p$-norm of the vector of distances from the facility to the agents' preferred locations, for any $p \ge 1$. While the cases $p=1$ and $p=\infty$ have been well-studied, open questions remain about the optimal approximation ratios achievable by strategyproof mechanisms for general $p$.
  Our first result resolves an open question of Goel and Hann-Caruthers [Soc. Choice Welf. 2023]. They showed that the coordinate-wise median (CM) mechanism achieves an approximation ratio lying between \(2^{1-\frac{1}{p}}\) and \(2^{\frac{3}{2}-\frac{2}{p}}\) for $p\ge 2$, and they conjectured that it is exactly \(2^{1-\frac{1}{p}}\). We confirm this conjecture, and we further show that CM has a tight $\sqrt 2$-approximation for $1\le p\le 2$.
  Our second and third results demonstrate that two randomized mechanisms can yield better approximation ratios. In particular, we first consider the uniformly rotated coordinate-wise median (URCM) mechanism, and prove that, for \(1\le p&lt;2\), its approximation ratio strictly improves over the deterministic bound \(\sqrt{2}\), while no such improvement is possible for $p\ge 2$. We then study the centroid random dictatorship mechanism that returns the average location (i.e., centroid) and the random dictatorship each with half probability, and show that its approximation ratio strictly improves over CM and URCM for every finite \(p\gtrsim 1.6\). Moreover, our analysis independently recovers the classical deterministic and randomized results for $p=1$ [Meir, SAGT 2019] [Barak, EC 2026] and $p=\infty$ [Goel and Hann-Caruthers, SCW 2023] [Tang et al., EC 2020] using significantly different techniques.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08621v1</guid>
      <category>cs.GT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hau Chan, Jianan Lin, Chenhao Wang</dc:creator>
    </item>
    <item>
      <title>From Holistic Evaluation to Structured Criteria: Rubrics Across the Evolving LLM Landscape</title>
      <link>https://arxiv.org/abs/2606.08625</link>
      <description>arXiv:2606.08625v1 Announce Type: new 
Abstract: As Large Language Models (LLMs) advance toward open-ended autonomous agents, the mechanisms used to evaluate and guide their behavior must evolve accordingly. This work introduces the rubric as a unifying framework capturing this evolution, characterizing rubrics as a dynamic response to successive LLM paradigm shifts that recurs across otherwise independent efforts in evaluation, reinforcement learning, and safety alignment. We define rubrics as explicit criteria sets that transform complex quality judgments into structured and actionable standards, and demonstrate that their recurrence across these research threads is not coincidental. We systematically organize existing rubric designs, examine their construction and optimization, and analyze their role across evaluation and training. Rubrics manifest at three progressively deeper levels: at the evaluative level, they decompose holistic judgments into verifiable dimensions; at the training level, they serve as dense feedback signals providing process-level guidance where scalar rewards fall short; at the intrinsic level, they emerge dynamically from model behaviors, driving self-improvement. We further assess rubric reliability across generation quality, execution fidelity, theoretical constraints, and security threats, before surveying rubric-based benchmarks across diverse domains. By rendering assessment transparent and decomposable, rubrics translate human value expectations into machine-learnable signals, serving as the enduring bridge between human intentions and machine behavior.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08625v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hao Chen, Ziyu Han, Yukun Yan, Qingfu Zhu, Maosong Sun, Wanxiang Che</dc:creator>
    </item>
    <item>
      <title>Sycophancy Towards Researchers Drives Performative Misalignment</title>
      <link>https://arxiv.org/abs/2606.08629</link>
      <description>arXiv:2606.08629v1 Announce Type: new 
Abstract: The increasing situational awareness of language models raises safety concerns: models might be aware when they are evaluated, and adjust their behavior to evade monitoring and resist modification, e.g., pretending to be aligned only in evaluation. This alignment faking behavior is often interpreted as scheming: an intentional effort of strategic deception. In this paper, we examine an alternative interpretation, performative misalignment, which explains the change in behavior as a result of sycophancy towards AI researchers. To examine this hypothesis, we present three empirical findings. First, we show that evaluation awareness persists even when we tell models they are deployed, contradicting the scheming account, which predicts less misalignment when the model perceives evaluation. Second, we use probing and steering to show that our current methods cannot mechanistically distinguish sycophancy from scheming in alignment faking evaluations. Third, we fine-tune models to be more sycophantic and observe increased sensitivity to evaluation cues. To conclude, we emphasize deconfounding sycophancy from scheming for future work on evaluations and mitigations of intent misalignment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08629v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>David D. Baek, Xinnuo Li, Anay Gupta, Taslim Mahbub, Kejian Shi, Max Tegmark, Shi Feng</dc:creator>
    </item>
    <item>
      <title>Tyan-WP: A Wind Power Foundation Model for Ultra-Short-Term Probabilistic Forecasting</title>
      <link>https://arxiv.org/abs/2606.08630</link>
      <description>arXiv:2606.08630v1 Announce Type: new 
Abstract: Global wind power capacity, especially in China, is booming, with new farms spanning diverse terrains and climates. The industry urgently needs accurate wind power foundation models to shorten commissioning and accelerate grid connection. This is because site-specific time series models (TSMs) are not well suited to data-scarce scenarios and generalize poorly, while generic large time series models (LTSMs) are mostly limited to univariate inputs and cannot fully exploit static site attributes or the dependencies between power and meteorological covariates, leading to insufficient accuracy. To fill this gap, we propose Tyan-WP, the first wind power foundation model for ultra-short-term probabilistic forecasting. Pretrained on a large-scale wind power dataset covering more than 126,000 U.S. sites over seven years, Tyan-WP further improves zero-shot forecasting through two domain-specific module designs: static site embedding using coordinate, terrain, and ecoregion metadata, and a power-aware meteorological fusion (PAMF) module that models interactions between historical power and meteorological covariates. Under a unified evaluation protocol, Tyan-WP surpasses eight site-specific supervised TSMs on 10 in-domain sites and outperforms eleven generic LTSMs on 127 in-domain sites, reducing MAE by 19.9%, RMSE by 16.6%, CRPS by 22.2%, and AQL by 21.7%, while raising R^2 by 16.7%. It further demonstrates strong cross-geography generalization on six real U.K. sites. These results show that the wind power foundation model can achieve accurate zero-shot forecasting without target-site training, providing a practical pathway for rapid turbine onboarding and probabilistic risk management at new wind farms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08630v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiahui Huang, Ao Luo, Lei Liu, Hongwei Zhao, Tengyuan Liu, Ruibo Guo, Bo Wang, Zhao Wang, Bin Li</dc:creator>
    </item>
    <item>
      <title>xSense Design Cards: Guiding the Design of Multisensory Experiences</title>
      <link>https://arxiv.org/abs/2606.08632</link>
      <description>arXiv:2606.08632v1 Announce Type: new 
Abstract: Designing multisensory experiences involves the deliberate combination of sensory elements to shape specific impressions for a given audience. Advances in technologies beyond audiovisual modalities now make it feasible to design across touch, taste, smell, and more. However, HCI still lacks the tools and shared vocabulary needed to systematically create and evaluate such experiences. The xSense Design Cards address this gap with four card types: (1) Experience Cards define purpose, context, and audience; (2) Sensory Cards break down multisensory concepts into elements and events; (3) Technology Cards prompt consideration of relevant technologies; and (4) Exploration Cards guide reflection on the broader context, including responsible innovation. This work introduces the cards and their theoretical grounding, showing how they support structured design, reflection, and evaluation of an experience's multisensory composition. By presenting xSense, we aim to broaden the vocabulary for multisensory design and stimulate discussion within the growing multisensory HCI community.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08632v1</guid>
      <category>cs.ET</category>
      <category>cs.MM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Ceylan Beşevli, Carlos Velasco, Marianna Obrist</dc:creator>
    </item>
    <item>
      <title>Towards Long-Horizon Vessel Trajectory and Destination Forecasting with Reasoning Large Language Models</title>
      <link>https://arxiv.org/abs/2606.08633</link>
      <description>arXiv:2606.08633v1 Announce Type: new 
Abstract: Long-horizon maritime trajectory prediction is important for shipping management, logistics planning, and maritime risk analysis, yet month-level forecasting remains insufficiently studied. Existing deep learning methods mainly focus on short- and mid-term coordinate extrapolation and often struggle to preserve route feasibility and destination correctness over extended horizons. This paper investigates joint long-horizon vessel trajectory and destination forecasting with reasoning-capable large language models, and develops a Maritime LLM post-training framework based on Reinforcement Learning with Verifiable Reward (RLVR). An AIS-based benchmark is constructed with 60-day historical trajectories and 30-day forecasting horizons, where trajectories are converted into semantic textual representations for RL prompt construction. RLVR aligns LLMs with maritime forecasting objectives by enforcing physical validity, providing early-weighted trajectory supervision, and evaluating destination correctness through hierarchical matching and curriculum learning. Experimental results show that RLVR-trained LLMs substantially improve over zero-shot LLMs and representative deep learning baselines, especially on destination-related metrics. Among the evaluated RLVR-trained variants, 4B LLMs achieve the best overall performance, suggesting that reward-compatible optimization and task-specific capacity matching are more important than simply using larger 8B or 14B LLMs. The results also show that LSTM remains a strong deep learning baseline under limited fine-tuning data, while Transformer-style spatio-temporal models typically require larger datasets and richer structured inputs. Overall, this work advances semantic, verifier-aligned maritime forecasting for operational decision support.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08633v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hongwei Wang, Miao Zhou, Fengde Wang, Yuting Wang, Jiewen Yu, Jun-Yan He, Bohao Qu, Wanbing Zhang, Xiuju Fu, Qing Guo, Zipei Fan, Yingying Xing, Yi Yuan</dc:creator>
    </item>
    <item>
      <title>SSAFE: Simple and Strong AI-Generated Image Detection via Frozen Vision Encoders</title>
      <link>https://arxiv.org/abs/2606.08634</link>
      <description>arXiv:2606.08634v1 Announce Type: new 
Abstract: The rapid advancement of generative models has blurred the boundary between synthetic and real imagery, creating an urgent need for reliable deepfake detection. Yet most existing approaches rely on massive real-fake datasets, which are increasingly difficult to maintain as new generators continue to emerge. In this work, we investigate how much information about image authenticity is already encoded in modern multimodal vision representations. We find that frozen multimodal encoders naturally separate real and synthetic images in their embedding space, enabling a simple linear classifier to achieve strong performance without task-specific fine-tuning. Motivated by this observation, we develop a representation-aware data curation strategy that selects a compact set of representative generators for training. The resulting training set contains only 10K images, compared to 288K in AIGIBench and 4M in OpenFake, while improving robustness to unseen generators and distribution shifts. We additionally introduce RealWorldBench, a benchmark consisting of modern camera photographs, contemporary stock images, and outputs from recent commercial generators. Experiments across multiple benchmarks show that combining frozen multimodal representations with carefully curated training data provides a simple and effective approach to AI-generated image detection.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08634v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Seunghyun Lee, Byoungkwon Kim, Jaehyun Nam, Kyungmin Lee, Jinwoo Shin</dc:creator>
    </item>
    <item>
      <title>SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving</title>
      <link>https://arxiv.org/abs/2606.08635</link>
      <description>arXiv:2606.08635v1 Announce Type: new 
Abstract: Prefill-decode (PD) disaggregation decouples prompt processing from token generation, but it also turns the key-value (KV) cache into a network payload. Existing PD-side KV reduction methods are mostly binary: selected tokens are transmitted at full precision and the rest are not transmitted. This paper argues that binary selection leaves a useful design space unused. SpectrumKV assigns a precision level to each token instead: attention sinks and other high-importance tokens are protected at FP16, medium-importance tokens are sent at INT8, and low-importance tokens are sent at INT4 when the model can tolerate it. The main practical complication is that INT4 tolerance is model-dependent. Qwen2.5-7B catastrophically fails under INT4 KV quantization, while Mistral-7B and Gemma-2-9B remain stable. SpectrumKV therefore runs a lightweight deployment-time probe: three aggressive NIAH trials under a 3-tier policy. Models that pass use FP16+INT8+INT4; models that fail fall back to FP16+INT8. Across Qwen2.5-7B-Instruct, Mistral-7B-Instruct-v0.3, and Gemma-2-9B-it, SpectrumKV improves quality at the same transfer budget. At a 50% normalized KV budget on WikiText-2, SpectrumKV changes perplexity by +1.97%, -0.06%, and -0.44%, respectively, compared with PDTrim's +25.85%, +22.07%, and +35.63%. On NIAH retrieval at 4096 tokens, the adaptive policy reaches 52.6% on Qwen at the aggressive b=0.3 budget versus 26.3% for PDTrim, and reaches 100% by b=0.5; Mistral and Gemma preserve retrieval under the 3-tier policy. End-to-end GPU timing of the transfer path shows 50-62% TTFT reductions at b=0.5. These results suggest that PD KV transfer should be treated as a precision-allocation problem, not only as token pruning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08635v1</guid>
      <category>cs.LG</category>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yang Pengju</dc:creator>
    </item>
    <item>
      <title>Cooperative Guidance and Control for Active Asset Protection with Time-Varying Agent Speeds</title>
      <link>https://arxiv.org/abs/2606.08636</link>
      <description>arXiv:2606.08636v1 Announce Type: new 
Abstract: Protecting an asset against threats is a challenging problem in an era of continuously evolving intelligent attacks. This requires cooperation between the asset and the defender to share information and jointly maneuver. To address this problem, this work proposes a cooperative guidance and control strategy for active asset protection against a maneuvering threat. This work develops a joint maneuver strategy where both the defender and the asset coordinate their time-varying speeds and courses to neutralize/capture the attacker. The control strategy is formulated around three coupled geometric and temporal objectives. The first objective is to set the line-of-sight rate between the asset and the attacker to zero, putting the attacker on a collision course and reducing their maneuvering. The second objective is to maintain the defender on the line-of-sight between the asset and the attacker. This ensures that the attacker faces the defender first before reaching the vicinity of the asset. Lastly, the defender is also guided to pursue the attacker based on the time-to-go estimates between the defender and the attacker. While keeping these objectives in mind, the control actions for the asset and the defender are jointly designed, fostering cooperation between the two. The stability of the proposed strategy is established using a Lyapunov-based approach. Numerical simulations performed show the effectiveness of the proposed cooperative strategy in ensuring the successful capture of a maneuvering threat.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08636v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Ram Milan Kumar Verma, Shashi Ranjan Kumar, Hemendra Arya</dc:creator>
    </item>
    <item>
      <title>Learnable Token Sparsification for Efficient Gigapixel Whole Slide Image Reasoning</title>
      <link>https://arxiv.org/abs/2606.08641</link>
      <description>arXiv:2606.08641v1 Announce Type: new 
Abstract: The processing of gigapixel whole slide images within vision language models faces a major difficulty due to an excessive number of visual tokens. Existing solutions typically rely on spatial downsampling or heuristic pruning strategies that operate without training, and these methods often discard subtle but clinically meaningful patterns because pathological evidence is scattered irregularly across the tissue. To overcome this limitation, we reformulate token reduction in whole slide images as a trainable sparsification problem, allowing the model to learn an optimal selection strategy instead of following fixed heuristics. We propose a decoupled routing architecture. To enable gradient propagation through the nondifferentiable pruning operation during training, we introduce a component called SparseLearn. This component uses a variance-preserving noise gate that regulates the information flow of each patch via a differentiable Soft Top-K operator, together with a diagonal attention denoiser that recovers perturbed representations without leaking spatial information. At inference time, the SparseLearn module is entirely discarded, and the trained scorer applies a deterministic Hard Top-K operator to keep only the highest scoring 32 tokens, incurring no extra computation. By compressing the visual sequence down to a sparse set of just 32 tokens, which represents as little as 0.78% of the original length, our framework achieves 73.32% overall accuracy on SlideBench (TCGA), consistently surpassing sampling-based baselines and general-purpose vision language models. It also demonstrates strong zero-shot generalization on SlideBench (BCNB) and WSI VQA*. By resolving the visual context bottleneck and preventing the dilution of sparse diagnostic evidence, this work provides a highly efficient paradigm for end-to-end gigapixel whole slide image reasoning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08641v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Jingzhi Chen, Landi He, Zhuo Chen, Shawn Young, Lijian Xu</dc:creator>
    </item>
    <item>
      <title>A retrieval conditioned rebinding circuit for dynamic entity tracking in large language models</title>
      <link>https://arxiv.org/abs/2606.08644</link>
      <description>arXiv:2606.08644v1 Announce Type: new 
Abstract: To interpret context correctly and retrieve relevant information, large language models must bind entities to their attributes and update these bindings as state changes. We analyze how LLMs implement this binding process in dynamic state tracking. Using causal interventions, we identify a retrieval conditioned rebinding mechanism, a compact attention head circuit that encodes swap relevant binding information and reinstates it at readout. Across Gemma and Llama models, this circuit supports rebinding behavior, but the representational signature of the mechanism differs across model families. In Gemma models, the binding signature is clearly expressed in the query/key subspaces of the relevant attention heads, whereas in Llama models, the binding information is carried primarily in key vectors. Overall, our results reveal an interpretable mechanism for context dependent state tracking in LLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08644v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Soyoung Oh, Vera Demberg</dc:creator>
    </item>
    <item>
      <title>The Arithmetic Circuit Combinatorial Nullstellensatz is NP-hard</title>
      <link>https://arxiv.org/abs/2606.08646</link>
      <description>arXiv:2606.08646v1 Announce Type: new 
Abstract: A multivariate polynomial on $n$ variables $x_1,\ldots,x_n$ of total degree $n$ over $\mathbf{Z}_2$ containing the multilinear monomial $\prod_{i=1}^n x_i$ is by the combinatorial nullstellensatz [Alon, Comb. Probab. Comput., 1999] known to always have a nonroot. We show that there cannot be a randomised polynomial time algorithm that given an arithmetic circuit of polynomial size formally computing such a polynomial, locates a nonroot with constant nonzero probability unless RP=NP. The result holds even when the individual degree of every variable in the input polynomial is at most two.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08646v1</guid>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Andreas Björklund</dc:creator>
    </item>
    <item>
      <title>Sample-Efficient LLM-Based Detection of Malicious Web Server Logs with Forensically Explainable Reasoning</title>
      <link>https://arxiv.org/abs/2606.08649</link>
      <description>arXiv:2606.08649v1 Announce Type: new 
Abstract: Forensic analysis of web server logs demands both accurate detection and human-readable explanations that can satisfy legal requirements. We present CEF-Log, a context-enhanced few-shot chain-of-thought prompting strategy for Large Language Models that addresses this dual requirement. CEF-Log embeds expert investigative methodology through a structured five-step reasoning template, enabling the model to learn \textit{how} to analyze logs rather than \textit{what} patterns to memorize. Experimental evaluation demonstrates that CEF-Log achieves an F1-score of 0.99 on the CSIC 2010 dataset using only four examples while providing a $10\times$ improvement in sample efficiency compared to other prompting-based methods. We also introduce ForenWebLog, a new dataset that incorporates real-world attacks and multi-step attack sequences for comprehensive evaluation. Qualitative analysis confirms that CEF-Log generates traceable, accurate explanations suitable for forensic documentation, addressing the critical "black-box" limitation of traditional machine learning approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08649v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Bernhard Kneip, Nhien-An Le-Khac, Hong-Hanh Nguyen-Le</dc:creator>
    </item>
    <item>
      <title>FiberTune: Preserving Action-Fiber Visual Residuals in Vision-Language-Action Fine-Tuning</title>
      <link>https://arxiv.org/abs/2606.08653</link>
      <description>arXiv:2606.08653v1 Announce Type: new 
Abstract: Action-supervised fine-tuning of vision-language-action (VLA) policies fits demonstrations effectively but constrains only the directions that change predicted actions, leaving visual structure consistent across action-equivalent states free to collapse. We formalize this as residual visual collapse along local action fibers and propose FiberTune, a training-time objective that preserves teacher-structured visual residuals without adding inference-time overhead. FiberTune uses an online action probe to estimate action-predictive feature directions, filters them from intermediate visual-token representations, and aligns the resulting probe-filtered residuals to a frozen visual teacher while regularizing their effective rank. Under identical training conditions, FiberTune improves over task-loss-only fine-tuning in every one of six controlled simulation settings spanning two benchmarks and two architectures (pi_0.5 and OpenVLA-OFT), as well as on physical SO-101 pick-place; representative gains include +10.7 percentage points SR(5) on long-horizon CALVIN ABC-to-D and physical SO-101 task success rising from 72.7% to 78.1%. Residual diagnostics show that these gains coincide with increased probe-filtered residual teacher alignment and effective rank, consistent with the action-fiber motivation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08653v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haihao Lin, Xiangsheng Huang, Xiao Yang, Weibang Zhou, Yiqi Zhang, Bo Yang, Simin Zeng, Jiawei Yang, Zhengyang Wang, Jiahui Du</dc:creator>
    </item>
    <item>
      <title>Operator learning for the 2D incompressible Navier-Stokes equations: a conformal prediction approach in the data-scarce regime</title>
      <link>https://arxiv.org/abs/2606.08654</link>
      <description>arXiv:2606.08654v1 Announce Type: new 
Abstract: In this paper, we propose a perturbation-based conformal prediction framework for uncertainty quantification in operator learning, with a focus on the 2D Navier--Stokes equations. While neural operators provide fast surrogates for expensive PDE solvers, they do not by themselves provide calibrated uncertainty for spatiotemporal field predictions. Our approach wraps a trained Fourier Neural Operator (FNO) with split conformal prediction and constructs the local uncertainty scale by comparing the predictions of two operators trained on nearly identical datasets: one on the original labels and one on labels perturbed by small Gaussian noise. We consider this procedure in the data-scarce regime, where the total label budget is fixed and methods that require a separate uncertainty network must divide training data between multiple models. On the 2D Navier--Stokes benchmark, the perturbation-based method produces substantially narrower conformal bands than existing methods under matched total data budgets while maintaining the target simultaneous coverage. These results suggest that perturbation sensitivity is a practical and sample-efficient uncertainty proxy for conformalized neural operators.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08654v1</guid>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <category>math.AP</category>
      <category>math.NA</category>
      <category>stat.AP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Weinan Wang, Bowen Gang, Hao Deng</dc:creator>
    </item>
    <item>
      <title>PhysGraph: A Physics-aware 3D Scene Graph for Perception and Reasoning</title>
      <link>https://arxiv.org/abs/2606.08655</link>
      <description>arXiv:2606.08655v1 Announce Type: new 
Abstract: To perform a wide range of daily tasks, robots need to construct a 3D representation that is semantically rich, physically grounded, and structured enough to support task planning and affordance prediction. However, existing approaches primarily focus on semantic retrieval, often overlooking physical and kinematic factors. Methods that attempt to model physical properties typically rely on narrow training sets or single-object modeling, limiting scalability and generalization across diverse object types. To address these challenges, we present PhysGraph, a framework that unifies symbolic reasoning with structured 3D geometry to model kinematic and physical properties in cluttered scenes. Given RGB-D observations, PhysGraph reconstructs object-centric 3D geometry and associates object instances across views. It then decomposes objects into functional parts and infers materials and articulations through visual reasoning. Evaluated on both synthetic and real-world datasets, PhysGraph achieves state-of-the-art results in semantic segmentation, multi-object mass estimation, and articulation prediction. With its simple yet effective design, PhysGraph produces physically consistent and semantically structured scene graphs, serving as a structured 3D representation for downstream tasks such as constraint-aware 3D affordance prediction and real-to-sim transfer, both of which are demonstrated in our experiments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08655v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haoyu Li, Aaron Thomas, Shuyan Zhou, Xianyi Cheng</dc:creator>
    </item>
    <item>
      <title>From Player to Master: Enhancing Test-Time Learning of LLM Agents via Reinforcement Learning over Memory</title>
      <link>https://arxiv.org/abs/2606.08656</link>
      <description>arXiv:2606.08656v1 Announce Type: new 
Abstract: Large language model (LLM) agents are increasingly deployed in long-running settings where improving through experience at test time becomes important. A common approach is to update an explicit memory after each interaction to guide future decisions. However, most existing methods rely on hand-designed prompting rules, making it difficult to align memory updates with downstream objectives over multi-step horizons consistently. We propose MemoPilot, a plug-in memory copilot that explicitly trains the memory update process to improve a frozen LLM's performance across sequential interactions. We formulate memory updating as a multi-turn decision problem and optimize it end-to-end with multi-turn GRPO. Our training recipe introduces (i) a turn-wise reward signal and (ii) a context-independent, turn-level advantage estimation across rollouts, enabling finer-grained credit assignment and more stable training in multi-turn settings. We evaluate MemoPilot on two testbeds: multi-round Rock-Paper-Scissors (RPS) and Limit Texas Hold'em (LHE). Across both environments, MemoPilot substantially improves test-time learning of a frozen player over strong baselines, ranking first in Elo ratings on both games (1762 on LHE and 1590 on RPS) and outperforming all baseline memory methods and proprietary models, including DeepSeek-V3.2.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08656v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yishuo Cai, Xingyu Guo, Xuancheng Huang, Jinhua Du, Can Huang, Wenxuan Huang, Wenhan Ma, Yuyang Hu, Aohan Zeng, Jie Tang, Xu Sun</dc:creator>
    </item>
    <item>
      <title>Latent Diffusion Policy: Shaping Latent Spaces for Diffusion-Based Robotic Manipulation</title>
      <link>https://arxiv.org/abs/2606.08657</link>
      <description>arXiv:2606.08657v1 Announce Type: new 
Abstract: Diffusion-based visuomotor policies operating directly in raw action spaces conflate scene comprehension with trajectory generation within a single denoising process. The resulting velocity field must simultaneously encode scene information and generate precise trajectories, increasing learning complexity and limiting performance on tasks demanding precise temporal coordination across multiple arms. To simplify this joint learning problem, we introduce Latent Diffusion Policy (LDP), a two-stage framework performing flow matching in a deliberately shaped latent space. By absorbing scene understanding into an observation-conditioned CVAE encoder, LDP concentrates the conditional distribution of each observation. Consequently, the flow model avoids implicitly resolving scene-dependent structures; instead, it generates within a pre-concentrated distribution featuring a smoother velocity field, simplifying learning from limited demonstrations. Furthermore, to capture temporal dependencies among latent tokens, LDP trains with per-token diffusion forcing and employs staircase inference sampling to resolve the resulting distributional mismatch. We also propose reconstruction FID (rFID) as a lightweight proxy predicting downstream task success solely from latent space statistics. On coordination-intensive tasks from RoboTwin 2.0, LDP outperforms DP3 by a substantial margin and transfers effectively to real-world bimanual deployments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08657v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhexuan Zhou, Yichen Lai, Jinhao Zhang, Huizhe Li, Youmin Gong, Jie Mei</dc:creator>
    </item>
    <item>
      <title>Extending Ontologies: From Dense Embeddings to Hybrid Quantum-Fuzzy Systems</title>
      <link>https://arxiv.org/abs/2606.08658</link>
      <description>arXiv:2606.08658v1 Announce Type: new 
Abstract: LLMs have revolutionized knowledge representation and retrieval, but lack the explicit modeling that knowledge ontologies possess. This paper surveys the ways that ontologies and knowledge graphs have been integrated with dense embedding algorithms. All attempts to date involve a trade-off between probabilistic and crisp inference. This paper proposes a novel frontier for devising knowledge representation systems that can simultaneously accommodate probabilistic and crisp inference in the same representation. To this effect, the paper proposes neuro-quantum-fuzzy systems as knowledge representation systems that accommodate both classical and contextual inference implemented through quantum-neural networks (QNN).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08658v1</guid>
      <category>cs.AI</category>
      <category>cs.LO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Angjelin Hila</dc:creator>
    </item>
    <item>
      <title>Data Agents Under Attack: Vulnerabilities in LLM-Driven Analytical Systems</title>
      <link>https://arxiv.org/abs/2606.08661</link>
      <description>arXiv:2606.08661v1 Announce Type: new 
Abstract: Data agents integrate LLM-driven reasoning with relational data access, executable analytical tools, and multi-step workflow orchestration, making them increasingly central to enterprise analytics. This integration introduces new security vulnerabilities across data resources, database execution, and agent reasoning, recombining concerns from database security and general-purpose LLM-agent security into failure modes that neither line of work captures on its own. To address this gap, we present a systematic security study of data agents. Our contributions are threefold. First, we develop a layered vulnerability framework that identifies eight data agent-specific risks across interpretation, execution, and policy layers. Second, we introduce an attack taxonomy organized by adversary goal, tactic, and technique, covering three goals, seven tactics, and fourteen techniques, and pair it with an LLM-driven payload generation pipeline grounded in real database schemas. Third, we evaluate these attacks on six systems, including four open-source data agents and two production cloud analytics services. Our experiments reveal substantial security vulnerabilities across current systems and yield four key takeaways.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08661v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kuncan Wang, Ziting Wang, Peizhuo Lv, Haoyang Li, Guoliang Li, Gao Cong, Wei Dong</dc:creator>
    </item>
    <item>
      <title>Probing Token Spaces under Generator Shift in AI-Generated Music Detection</title>
      <link>https://arxiv.org/abs/2606.08663</link>
      <description>arXiv:2606.08663v1 Announce Type: new 
Abstract: AI-generated music detectors can appear robust on standard benchmark splits, yet their deployments require transfer to generator sources absent during training. We study this problem with source-restricted evaluation on \textsc{MoM-open}, an open reconstruction of MoM-CLAM that replaces the non-redistributable real corpus with FMA and MTG-Jamendo while preserving the fake-generator protocol. To isolate the role of representation, we introduce \textsc{CoMoE}, a compact fixed classifier for comparing heterogeneous audio token spaces while keeping the downstream architecture and training recipe unchanged. Experiments show that standard and real-source-restricted splits are nearly saturated, whereas fake-source restriction exposes large differences between token spaces: X-Codec tokens are strongest when training on Udio alone, while MERT-derived tokens are stronger when training on Suno-v3.5 alone. These results suggest that codec-style discrete token spaces should be treated as a primary experimental axis under generator shift in AI-generated music detection. Our code and data are available at https://github.com/MAAP-LAB/CoMoE.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08663v1</guid>
      <category>cs.SD</category>
      <category>eess.AS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Joonyong Park, Jungwoo Kim, Junyoung Koh, Yuki Saito</dc:creator>
    </item>
    <item>
      <title>Language as a Sensor: Calibrated Spatial Belief Estimation in 3D Scenes from Natural Language</title>
      <link>https://arxiv.org/abs/2606.08666</link>
      <description>arXiv:2606.08666v1 Announce Type: new 
Abstract: Robots deployed in human-centric environments routinely receive natural-language descriptions of spatial information ("I left my backpack on the table") that reference parts of the world beyond their perceptual field of view. Traditional metric-semantic mapping ignores this signal, while off-the-shelf multimodal models remain limited in 3D spatial reasoning and are not directly amenable to fusion with other sensor modalities. To convert language observations into a calibrated spatial distribution, we train a Language Sensor Model (LSM) that maps each utterance and its scene-graph context to a multimodal distribution, with mixture weights encoding referential ambiguity (e.g., "which table") and component covariances encoding spatial uncertainty (e.g., where "on the table" the target lies). We then introduce VL-Map (Vision-Language Metric-Semantic Mapping), a probabilistic framework that treats these language predictions as stochastic observations and fuses them with onboard perception within a unified belief map. On the VLA-3D benchmark as well as on a real-world mobile robot, LSM is the only language predictor whose covariance estimates remain within the calibrated regime; fused into VL-Map, it leads to more accurate predictions of the target object location (~70% more probability mass on the true target compared to the strongest foundation-model baseline).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08666v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Aryan Naveen, Jason Xinyu Liu, Luca Carlone, Andreea Bobu</dc:creator>
    </item>
    <item>
      <title>X-rated Compliance Theater: An Empirical Evaluation of European Age Verification Systems in Adult Websites</title>
      <link>https://arxiv.org/abs/2606.08667</link>
      <description>arXiv:2606.08667v1 Announce Type: new 
Abstract: Age verification is rapidly emerging as a central regulatory instrument for protecting minors online, with several jurisdictions mandating its deployment for access to adult and pornographic content. This regulatory direction raises significant privacy concerns, as it risks binding sensitive content access to identity-related attributes. It also introduces security risks, since age-verification mechanisms are often outsourced to third-party providers with limited transparency into the robustness of their verification processes. In this work, we conduct, to the best of our knowledge, the first exploratory security assessment of regulation-mandated age-verification mechanisms deployed by adult websites. Rather than treating age verification as a purely regulatory question, we empirically examine whether current deployments provide security guarantees commensurate with the privacy risks of relying on sensitive identity-related data. Our methodology combines ecosystem mapping, adversary modeling, and empirical testing across four countries, covering document-based verification, biometric age estimation, indirect signals, and website-workflow integration. Our results reveal systemic weaknesses across mechanisms and integrations under realistic threat assumptions, including failures against low-cost, widely accessible attacks. Finally, we derive concrete guidelines and design directions for mitigating the security and privacy risks exposed by current age-verification deployments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08667v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Simone Lavermicocca, Michele Carminati, Stefano Longari</dc:creator>
    </item>
    <item>
      <title>A Comparison of SSL-Based Feature Extractors and Back-End Classifiers for Spoofing Detection: A Multi-Corpus Training and Cross-Linguistic Analysis</title>
      <link>https://arxiv.org/abs/2606.08669</link>
      <description>arXiv:2606.08669v1 Announce Type: new 
Abstract: Voice biometric systems face growing threats from spoofing attacks, yet the evaluation of detection models remains inconsistent across datasets. To investigate these unpredictable fluctuations, we conduct a comprehensive benchmark of four self-supervised learning feature extractors paired with four back-end classifiers. We compare the hierarchical local feature extraction of ResNet with the global sequence and relational modeling of attention and graph-based back-ends. Through multi-corpus training across three scenarios and six evaluation datasets, our empirical analysis yields two critical findings. First, we expose a domain bias within the ASVspoof 5 dataset, showing that naive data scaling actively degrades performance. Second, our cross-linguistic analysis reveals that fine-tuning with just 8 hours of target-language data enhances detection robustness. Together, these findings emphasize the critical need for domain-aware and language-specific adaptation in spoofing detection.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08669v1</guid>
      <category>cs.SD</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>Anh-Tuan Dao, Driss Matrouf, Mickael Rouvier, Nicholas Evans</dc:creator>
    </item>
    <item>
      <title>WaveDiT: Distribution-Aware Wavelet Flow Matching for Efficient 3D Brain MRI Synthesis</title>
      <link>https://arxiv.org/abs/2606.08670</link>
      <description>arXiv:2606.08670v1 Announce Type: new 
Abstract: Large and demographically balanced datasets are essential for reliable neuroimaging biomarkers. Full-resolution 3D brain MRI synthesis can support data augmentation in this setting, but existing approaches either incur prohibitive computational cost at volumetric scale or rely on lossy latent compression that may compromise anatomical detail. As a result, practical 3D generative augmentation often requires specialized compute infrastructure. We propose WaveDiT, a conditional flow matching framework operating in the coefficient space of a 3D Haar Discrete Wavelet Transform. The model combines factorized spatio-depth attention with band-wise heteroscedastic uncertainty modeling derived from higher-order wavelet statistics. Predicted log-variance is integrated directly into both the flow objective and conditioning pathway, enabling adaptive precision consistent with the heavy-tailed and input-dependent variance structure of anatomical detail. This formulation supports full-resolution 3D synthesis under practical memory and time constraints on a single modern GPU. Evaluation on a multi-site cohort demonstrates improved alignment between generated and real MRI distributions, together with enhanced downstream brain age prediction and region-level anatomical agreement relative to diffusion, latent, and wavelet-based baselines. Code is available at https://github.com/sisinflab/WaveDiT</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08670v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Danilo Danese, Angela Lombardi, Giuseppe Fasano, Matteo Attimonelli, Tommaso Di Noia</dc:creator>
    </item>
    <item>
      <title>SkillHone: A Harness for Continual Agent Skill Evolution Through Persistent Decision History</title>
      <link>https://arxiv.org/abs/2606.08671</link>
      <description>arXiv:2606.08671v1 Announce Type: new 
Abstract: Agent skills extend language-model agents with task-specific procedures, scripts, and references, but the tasks and environments they target continually change. Existing methods improve skills in bounded runs and retain only the final artifact, discarding the decision history that later agents need to interpret prior revisions, evaluations, and rejected alternatives. We introduce SkillHone, a harness for continual agent skill evolution grounded in persistent decision history. SkillHone pairs skill revisions with evaluation-side evidence that supplies practice feedback, recording structured histories of diagnoses, revisions, evidence, and outcomes. Role-separated subagents run candidate skills on practice probes with redacted reporting and propose revisions informed by prior decisions, enabling cross-session refinement without rediscovering past rationale. We evaluate SkillHone on deep-research benchmarks in a raw open-web setting, where agents are not given an integrated search stack and must organize retrieval through portable skills. We compare against a deep-research agent backed by commercial retrieval services. With Qwen3.6-35B-A3B as the evaluation-time backbone, the resulting skills outperform the deep-research agent by 15.8 points on GAIA and 3.2 points on WebWalkerQA-EN, while also exceeding prior skill-evolution methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08671v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhiwei Li, Yong Hu</dc:creator>
    </item>
    <item>
      <title>Learning to Solve Generative ODEs Beyond the Linear Span</title>
      <link>https://arxiv.org/abs/2606.08672</link>
      <description>arXiv:2606.08672v1 Announce Type: new 
Abstract: Diffusion and flow generative models sample by integrating a learned ODE, but high quality still requires many sequential model evaluations. Solver learning reduces this cost by adapting scalar coefficients, timesteps, or both, while keeping the backbone model fixed. In this work, we identify a structural bottleneck in this update family: each step remains span-limited. Since the scalar-coefficient update lies in the span of buffered velocity evaluations, it can fit only the in-span component while leaving any out-of-span residual unreachable by scalar recombination alone. We propose SpanLift, a lightweight neural solver that augments scalar-coefficient updates with a spatial residual operator. SpanLift keeps a fixed base solver as an in-span prior and learns a spatial residual operator over the state and velocity buffer. The operator is trained by endpoint teacher matching, preserves the pretrained backbone, and adds no model NFEs. Empirically, the learned correction transfers across base solvers and is predominantly out-of-span. Across pixel-space diffusion, latent flow matching, and precipitation nowcasting, SpanLift achieves state-of-the-art few-step sampling. With only 3 NFE, it improves CIFAR-10 FID from 8.16 to 5.69 and ImageNet FID from 17.37 to 11.83.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08672v1</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sihyeon Kim, Seunghun Lee, Vikas Singh, Hyunwoo J. Kim</dc:creator>
    </item>
    <item>
      <title>ClinicalAligner26AM: A Cross-Lingual Aligner for Dataset Translation; Evidences from the MultiClinCorpus Shared Task</title>
      <link>https://arxiv.org/abs/2606.08673</link>
      <description>arXiv:2606.08673v1 Announce Type: new 
Abstract: Word-level cross-lingual alignment is central to annotation projection, translation auditing, and cross-lingual faithfulness estimation, yet existing neural aligners are rarely adapted to specialized domains. In this paper, we introduce ClinicalAligner26AM, a large-context multilingual aligner model for biomedical and clinical text initialized from ClinicalEncoder26AM. Our training recipe is inspired by AWESoME Align. We build our soft alignment target by using Sinkhorn-Knopp optimal transport to sharpen a cost matrix established for parallel clinical texts and conversations through the fusion of sentence-level, phrase-level, and token-level signals. We distill this sharpened alignment matrix directly into our student aligner by encouraging its naive cosine-based token similarity scores to match this target. At inference time, we project source-span scores through the learned token alignment matrix and decode the longest valid high-scoring span in the target text, optionally supported by MultiClinNER predictions summarized in Appendix B. We evaluate CA26AM on the MultiClinCorpus shared task, which projects Spanish clinical entity annotations into six target languages. Our two submitted systems ranked first and second, respectively, across all languages and entity types, with character-weighted F1 scores above 0.95 in nearly all settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08673v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>François Remy</dc:creator>
    </item>
    <item>
      <title>BioVid: Autoregressive Video Generation with Biological Behavior Semantic Comprehension</title>
      <link>https://arxiv.org/abs/2606.08674</link>
      <description>arXiv:2606.08674v1 Announce Type: new 
Abstract: Existing video generation frameworks treat sequence duration as an externally prescribed parameter -- fixed frame counts or text prompts -- producing clips whose temporal boundaries are decoupled from the statistical structure of real behavioral data. This assumption is fundamentally misaligned with biological behavior, where action duration varies naturally across individuals and instances and is encoded in the data itself. We present BioVid, a data-driven autoregressive video generation framework that learns the temporal structure of biological behaviors directly from training data, including their natural length distributions. In the first stage, a Finite Scalar Quantization GAN (FSQ-R3GAN) tokenizer encodes each video frame into a compact discrete representation, combining the stabilized relativistic training objective of R3GAN with FSQ's guaranteed codebook utilization to achieve high-fidelity spatial reconstruction without codebook collapse. In the second stage, a causal Transformer models the resulting token sequences autoregressively and learns to emit an End-of-Sequence (EOS) token when the behavioral event reaches semantic closure, with the termination distribution emerging naturally from the training data rather than any human-specified constraint. Experiments on a human drinking behavior dataset (NTU RGB+D, A001, n=94) demonstrate that BioVid's generated length distribution closely matches that of held-out test data, achieving a Wasserstein-1 distance of 1.24 against the ground truth -- compared to 6.05 for a fixed-length baseline and 15.48 for VideoGPT -- while maintaining competitive spatial fidelity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08674v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tsung-Wei Pan, Jung-Hua Wang</dc:creator>
    </item>
    <item>
      <title>Lost in the Flow with Code Talkers: Unveiling the Instruction-Tuning Tax of Large Language Models in Code Tasks</title>
      <link>https://arxiv.org/abs/2606.08676</link>
      <description>arXiv:2606.08676v1 Announce Type: new 
Abstract: AI coding assistants have significantly improved developer productivity by automatically suggesting code that aligns with user intent, and many of these tools are now integrated directly into Integrated Development Environments (IDEs). Developers interact with code in two distinct cognitive modes: Flow and Command. While developers require tools that directly complete or infill code in unfinished programs during Flow mode, they also need tools that can comprehend intentions expressed as natural-language instructions and convert them into executable code in Command mode. Although instruction-tuned Large Language Models (LLMs) dominate many application scenarios due to their abilities to infer and fulfill developers' intents, it remains unclear whether the same paradigm is equally suitable for different code-related tasks. Therefore, it is necessary to understand how instruction tuning affects the feasibility of CodeLLMs as coding assistants. To fill this gap, we conduct the first empirical study that uncovers a key trade-off caused by instruction tuning across programming modes, which we term the Instruction-Tuning Tax. Our results show that instruction tuning is not a free lunch: although instruction-tuned models are more capable of following instructions and leveraging structured guidance, these gains often come at the cost of weaker infilling performance. We further extend our study through both qualitative and quantitative analyses, including manual failure categorization, behavioral metrics that capture generation fidelity, and intermediate-checkpoint evaluation throughout the tuning process. Summarizing our results into seven findings and four implications, our study offers a new perspective on the development of AI-powered coding tools and highlights the need to carefully balance instruction-following ability with effective code generation assistance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08676v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shi Ying Chang, Chiok Yew Ho, Yichen Li, Yintong Huo</dc:creator>
    </item>
    <item>
      <title>Speaker-Invariant Representation Learning for Spoofing Detection via Gradient Reversal and A Variational Information Bottleneck</title>
      <link>https://arxiv.org/abs/2606.08678</link>
      <description>arXiv:2606.08678v1 Announce Type: new 
Abstract: Sophisticated generative speech technology can undermine the reliability of voice biometrics. While spoofing detection systems excel when assessed under in-domain conditions, generalisation to out-of-domain settings is often poor. In this paper, we show that such issues could be caused by speaker bias, where models learn individual voice traits rather than markers of manipulation or generation. We propose a teacher-student framework for speaker-invariant spoofing detection that disentangles identity without requiring speaker labels. We leverage a pre-trained speaker recognition teacher to guide a student model via a gradient reversal layer. To control the balance between suppressing cues related to voice identity and preserving those related to spoofing detection, we integrate a Variational Information Bottleneck. Evaluations across nine datasets show our model achieves a 25.7% relative reduction in the EER compared to the MHFA baseline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08678v1</guid>
      <category>cs.SD</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>Anh-Tuan Dao, Driss Matrouf, Mickael Rouvier, Nicholas Evans</dc:creator>
    </item>
    <item>
      <title>Distortion-Aware PETR for BEV Object Detection with Mixed Pinhole-Fisheye Cameras</title>
      <link>https://arxiv.org/abs/2606.08680</link>
      <description>arXiv:2606.08680v1 Announce Type: new 
Abstract: Fisheye cameras are widely deployed in autonomous driving perception suites for their low cost and full-coverage field of view (FOV), yet their potential remains underleveraged in 3D object detection. Severe radial distortion challenges most BEV detectors by violating the fundamental assumption of uniform sampling. To bridge this gap, we propose Distortion-Aware PETR (DAPETR), a projection-free detector tailored for mixed pinhole-fisheye camera setups. DAPETR incorporates two key learned-adaptive modules: a unified distortion-aware positional embedding that harmonizes positional encodings for image representations with fisheye geometry, and a bidirectional feature-geometry co-modulation module that mutually adapts image features and 3D positional embeddings. In our experiments on a converted KITTI-360 benchmark, we systematically compare our learned adaptive approach against PETR in polar coordinates (PolarPETR). We find that while both methods improve over the baseline, our learned modules achieve superior performance. Crucially, we uncover a negative interaction when combining both strategies, revealing that learned adaptation and explicit geometric reparameterization can conflict. Our final DAPETR model significantly advances the research and benchmark for fisheye BEV detection, providing critical insights into effective distortion-aware 3D perception design other than image rectification.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08680v1</guid>
      <category>cs.CV</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiangzhong Liu</dc:creator>
    </item>
    <item>
      <title>Asymptotic Optimality of the High-Dimensional Gaussian Mechanism and Improved Low-Dimensional Mechanisms for Differential Privacy</title>
      <link>https://arxiv.org/abs/2606.08681</link>
      <description>arXiv:2606.08681v1 Announce Type: new 
Abstract: The additive noise mechanism is a foundational tool for differential privacy (DP) of $T$-dimensional real-valued vector queries. The Gaussian mechanism, utilizing Gaussian noise, is the most widely used such mechanism, due to its simplicity and strong privacy guarantees. In this work, we provide justification for this choice, showing that as the dimension $T\to\infty$, no additive-noise mechanism can asymptotically improve on the Gaussian mechanism's privacy--utility tradeoff for the strong privacy settings typically used. We also develop a new family of \emph{Spherical Generalized Gamma} DP mechanisms, which contains both the Gaussian mechanism and the recently studied $\ell_2$ mechanism (Joseph \emph{et al.}, ICML 2025). We identify members of this family that outperform both the Gaussian and $\ell_2$ mechanisms in certain low-dimensional settings, and show tight composition of all mechanisms in this family, answering an open question of Joseph \emph{et al.}~regarding the $\ell_2$ mechanism.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08681v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>Proceedings of the 43rd International Conference on Machine Learning, PMLR 306, 2026</arxiv:journal_reference>
      <dc:creator>Yu Wei, Alexander Bienstock, Antigoni Polychroniadou</dc:creator>
    </item>
    <item>
      <title>Activation Steering Induces Emergent Misalignment: A More Comprehensive Evaluation</title>
      <link>https://arxiv.org/abs/2606.08682</link>
      <description>arXiv:2606.08682v1 Announce Type: new 
Abstract: Activation steering has emerged as a popular inference-time technique for modulating the behavior of large language models (LLMs). By constructing a steering vector from examples of a target behavior and injecting it into intermediate activations during inference, activation steering enables flexible behavioral control while avoiding the permanent parameter updates required by finetuning. Meanwhile, recent work has identified emergent misalignment (EM) as a significant safety concern, wherein models finetuned on unsafe examples from a narrow task may unexpectedly generalize to broadly unsafe behavior on unrelated tasks. Although finetuning-induced EM has been extensively studied, whether activation steering can induce EM remains comparatively under-explored, despite its increasing use as a model-control technique. In this paper, we present a comprehensive study of activation-steering-induced emergent misalignment, substantially expanding the evaluation scope beyond existing pioneering work. First, we show that activation steering can induce broad misalignment, even in the recent Qwen-3.5 series. Moreover, activation-steered models produce harmful responses with stronger semantic relevance and higher coherence than their finetuned counterparts, making the resulting misalignment potentially more harmful. Second, we characterize properties of AS-induced EM by analyzing key steering-specific factors, including steering magnitude, the low-rank structure of the steering subspace, and the number of epochs during steering-vector construction. Third, we evaluate the robustness and sensitivity of AS-induced EM across diverse model families, model scales, target tasks, and intervention layers. Our findings reveal activation steering as a significant yet under-examined source of emergent misalignment and provide an activation-space perspective for understanding the mechanisms and safety risks of EM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08682v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Qi Cao, Jian Lou, Meiting Liu, Wenjie Feng, Dan Li, See-Kiong Ng, Anh Tuan Luu</dc:creator>
    </item>
    <item>
      <title>BLUE: Toward Better Language Use in Efficient Vision-Language-Action Models for Autonomous Driving</title>
      <link>https://arxiv.org/abs/2606.08684</link>
      <description>arXiv:2606.08684v1 Announce Type: new 
Abstract: We present BLUE, a minimal method for better language use in vision-language-action (VLA) models for autonomous driving (AD). Through extensive analysis, we reveal that language matters on only a small fraction of routes, but on those routes it can greatly improve or degrade performance. Generating language at every frame is therefore inefficient, since most computation is spent on frames that do not benefit from language. We further show that pretrained VLA hidden states potentially already encode whether language will benefit a given frame, even though scene complexity and kinematic features alone struggle to predict this. Based on this finding, BLUE trains a lightweight gate on frozen VLA hidden states to decide per frame whether to activate language generation or predict actions directly, without modifying the backbone or requiring additional human annotation. With just a 0.11M-parameter gate, BLUE sets a new state of the art on both benchmarks, achieving 76.2% success rate on Bench2Drive and 36 driving score on Longest6 v2, while delivering 2.54x inference speedup and 8.9% success rate improvement over the backbone. BLUE provides a practical path toward efficient language-augmented AD, showing that VLA models can retain the benefits of language at a fraction of the cost. Our code, data, logs and checkpoints are fully available on https://github.com/George-Ling3/BLUE.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08684v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>George Ling, Lijin Yang, Hao Yang, Zhongzhan Huang</dc:creator>
    </item>
    <item>
      <title>ND-TNN: Tensor-Neural-Network Approximation for High-Dimensional Nonlocal Diffusion Models</title>
      <link>https://arxiv.org/abs/2606.08685</link>
      <description>arXiv:2606.08685v1 Announce Type: new 
Abstract: We study a numerical method, built on the tensor neural network (TNN) architecture introduced in \cite{wang2022tensor}, for solving nonlocal diffusion models in high-dimensional spaces. The tensor-product structure of the TNN ansatz, combined with the separability of the Gaussian kernel, reduces the high-dimensional integrals in the nonlocal energy to products of low-dimensional integrals, which are evaluated by Gauss--Legendre quadrature; nonseparable source and boundary data are handled by a TNN-based preconditioning step. For the Dirichlet boundary condition, we establish the asymptotically compatible $L^2$ error estimate \[ \|u_{\mathrm{loc}}-u_{\delta,p}\|_{L^2(\Omega)} \le C\!\left(\frac{\varepsilon_f}{\sqrt\delta} +\frac{\varepsilon_g}{\delta} +\frac{\varepsilon_u}{\sqrt\delta} +\eta_{\mathrm{opt}}\right) +C\sqrt\delta, \] where $\varepsilon_f$, $\varepsilon_g$ and $\varepsilon_u$ are the data and trial-class approximation errors and $\eta_{\mathrm{opt}}$ is the optimization residual. For the Neumann boundary condition, the $L^2$ estimate is improved to $O(\varepsilon_f+\varepsilon_g/\sqrt\delta+\varepsilon_u +\eta_{\mathrm{opt}}+\delta)$, and an $H^1$ gradient estimate is further obtained through a smoothing post-processing step. Numerical experiments on tensor-product domains up to $d=20$ support the theoretical results, and additional tests on two- and three-dimensional $L$-shaped domains demonstrate the practical robustness of the method beyond the smooth-domain setting covered by the analysis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08685v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ziyue Cai, Zuoqiang Shi</dc:creator>
    </item>
    <item>
      <title>Shift-Dependent Asymmetry: Orthogonal Inverse Low-Rank Adaptation for Federated Medical Segmentation</title>
      <link>https://arxiv.org/abs/2606.08687</link>
      <description>arXiv:2606.08687v1 Announce Type: new 
Abstract: Low-Rank Adaptation (LoRA) enables efficient federated fine-tuning of segmentation foundation models for medical imaging. However, most federated LoRA methods adopt a uniform aggregation rule, which breaks under the encoder-decoder asymmetry in medical segmentation: the encoder is dominated by appearance shifts, while the decoder is dominated by supervision variations. This mismatch entangles shared anatomy with site-specific biases and harms generalization. To address this, we propose Inverse Asymmetric Tuning (IAT). IAT aligns adaptation with heterogeneity sources by personalizing module-specific components in the encoder to absorb appearance shifts and in the decoder to accommodate site-dependent supervision, while retaining a shared pathway for transferable consensus. However, structural separation alone is insufficient under LoRA's bilinear parameterization, where multiplicative coupling can still cause site-specific updates to leak into the shared direction. We therefore introduce a Subspace Orthogonality Regularizer that penalizes shared-local collinearity in the effective update space, mitigating leakage without extra communication. Experiments show consistent improvements over strong federated LoRA and parameter-efficient FL baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08687v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xingyue Zhao, Wenke Huang, Linghao Zhuang, Haoran Wu, Anwen Jiang, Zhifeng Wang, Wenwen He, Ming Feng, Mang Ye, Bo Xu</dc:creator>
    </item>
    <item>
      <title>PhysAgent: Automating Physics-Based 4D Synthesis via Trajectory-Grounded Multi-Agent Feedback</title>
      <link>https://arxiv.org/abs/2606.08688</link>
      <description>arXiv:2606.08688v1 Announce Type: new 
Abstract: Achieving fully automated, physically plausible 3D motion synthesis is a core objective in graphics and generative AI. However, configuring complex environmental force fields still relies entirely on manual expert intervention, creating a severe bottleneck for large-scale simulation data generation. Existing automated methods primarily focus on material optimization and exhibit severe modality gaps and technical flaws when applied to the vastly more complex force field optimization space: naive Large Language Models (LLMs) lack underlying simulation feedback, causing severe physical inaccuracies, while traditional Score Distillation Sampling (SDS) suffers from sluggish gradients, local optima entrapment, and a mathematical inability to dynamically switch discrete force fields. To address this, we propose PhysAgent, the first simulator-in-the-loop multi-agent framework that leverages multimodal inputs for automated, physically grounded 4D synthesis. By decoupling intrinsic materials from extrinsic dynamics, PhysAgent utilizes a Semantic Agent equipped with an externalized Force Field Skill module to master simulation rules and generate valid initializations. Subsequently, the Refine Agents, driven by Trajectory-Grounded Multi-Agent Feedback, leverage vision foundation models to extract dense point trajectories from rendered frames. By converting these explicit motion trajectories into structured textual descriptors, the agent harnesses LLM commonsense reasoning to execute zero-shot macroscopic leaps, effectively escaping local optima and dynamically switching discrete force fields. Extensive experiments demonstrate that PhysAgent rapidly generates stable, diverse physical scenes from arbitrary multimodal prompts, significantly outperforming existing baselines in both generation diversity and physical accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08688v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chunji Lv, Jiaxi Ye, Yuchen Jiang, Rexar Lin, Changsheng Li</dc:creator>
    </item>
    <item>
      <title>Hierarchical Projection for Adaptive Knowledge Transfer</title>
      <link>https://arxiv.org/abs/2606.08691</link>
      <description>arXiv:2606.08691v1 Announce Type: new 
Abstract: Modern data-driven applications increasingly involve learning from multiple heterogeneous sources, where a target dataset is limited but related information is available across domains. Naively combining these sources can degrade performance when relevance varies or spurious signals are present, posing a fundamental challenge for trustworthy cross-domain learning. We propose Projection Transfer Learning (ProjectionTL), a unified framework that integrates hierarchical Bayesian modeling with adaptive projection for selective knowledge transfer. The key idea is to decouple transfer at two levels: first, we construct a source-guided hierarchical prior that aggregates information across sources using data-driven weights, capturing global alignment between each source and the target; second, we refine this borrowing through a posterior-projection step that operates at the feature level, selectively retaining coordinates that exhibit local agreement with the target signal. This two-stage design enables the method to simultaneously perform source selection and feature selection, thereby mitigating negative transfer while preserving interpretability. ProjectionTL provides a principled approach to integrating heterogeneous data across domains, bridging statistical modeling and modern machine learning paradigms for robust and interpretable transfer. Through simulations and real-world biomedical applications, we demonstrate improved accuracy, stability, and interpretability compared to existing methods. Our framework offers a scalable and generalizable strategy for trustworthy cross-domain learning in high-dimensional settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08691v1</guid>
      <category>cs.LG</category>
      <category>stat.ME</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Samhita Pal, Tian Gu</dc:creator>
    </item>
    <item>
      <title>Agentic Search for Counterfactual Recourse under Fixed LLM Budgets</title>
      <link>https://arxiv.org/abs/2606.08696</link>
      <description>arXiv:2606.08696v1 Announce Type: new 
Abstract: Counterfactual recourse aims to provide actionable feature changes that would alter an unfavorable decision made by a predictive model. In practice, affected individuals often benefit from multiple feasible alternatives rather than a single optimal explanation. A natural way to produce such alternatives is to prompt large language models (LLMs). However, prompting incurs a practical constraint: the number of LLM calls is often the dominant computational and economic cost. Together, the need for multiple alternatives and this cost constraint shift the problem from finding a single high-quality counterfactual to efficiently generating a set of oracle-validated counterfactuals under a fixed LLM-call budget. In this work, we study counterfactual recourse generation in the LLM-agentic setting as a fixed-budget search problem and propose Comp-MCTS, an agentic tree-search framework that maximizes the yield of unique, oracle-validated counterfactuals under this budget while maintaining favorable quantity--quality trade-offs. Comp-MCTS allocates the budget toward novel intervention directions via LLM-based proposal generation, oracle validation, and compression-guided pruning, in a training-free, oracle-only setting. Experiments on four real-world tabular datasets show that Comp-MCTS substantially outperforms single-candidate LATS-style baselines in the yield of unique, oracle-validated counterfactuals, and offers favorable quantity--quality--efficiency trade-offs against stronger multi-candidate variants: comparable or higher yield at similar or lower oracle-evaluation cost on three of four datasets, plus competitive proximity, sparsity, and novelty.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08696v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yasuo Tabei</dc:creator>
    </item>
    <item>
      <title>Quotient Admission Algorithms for Witness-Supported Graph Windows</title>
      <link>https://arxiv.org/abs/2606.08698</link>
      <description>arXiv:2606.08698v1 Announce Type: new 
Abstract: We formulate the quotient admission problem for finite graph-window rows. The input is a finite row set, an admissible evidence map, semantic labels, witness-support hypergraphs, and atom-level admissibility predicates. The output is a quotient decision on evidence atoms, with possible decisions certificate, residual, low-confidence, or blocked. The problem asks for the maximal guard-respecting atom-level decision map that uses no refinement beyond the admissible evidence partition. We prove an atom-union characterization of identifiable classes, give a witness-support hypergraph guard for certificate admission, characterize projected-label conflicts as blocked atoms, and present quotient admission algorithms with correctness, maximality, and complexity guarantees. With explicit evidence vectors and hyperedges, the algorithms run in expected O(B + I + n) time and space by hashing and deterministic O(B + I + n log n) time by sorting under a key-linear comparison model, where n is the number of rows, B is the total evidence encoding length, and I is the total hyperedge incidence size. We also prove a magnitude-only indistinguishability lower bound: any evaluator that observes only residual magnitudes fails on instances whose evidence atoms require different residual decisions after the magnitudes collapse them.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08698v1</guid>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yushan Li</dc:creator>
    </item>
    <item>
      <title>Simultaneous recovery of multiple parameters in nonlocal diffusion equations from internal measurements</title>
      <link>https://arxiv.org/abs/2606.08699</link>
      <description>arXiv:2606.08699v1 Announce Type: new 
Abstract: This paper is devoted to simultaneously recovering multiple parameters from internal measurements for nonlocal diffusion equations. The uniqueness of the inverse problem is established by employing the asymptotic behavior of solutions, analytic continuation, the Laplace transform, and properties of analytic functions. For numerical reconstruction, we apply the Levenberg-Marquardt method to obtain a stable approximate solution of the inverse problem. Numerical examples are provided to demonstrate the efficiency of the proposed algorithm and to validate our theoretical findings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08699v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <category>math.AP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kai Yu, Zhiyuan Li, Yikan Liu</dc:creator>
    </item>
    <item>
      <title>AutoSUT: The Environment Semantics Gap in Structured CTI for Adversary Emulation</title>
      <link>https://arxiv.org/abs/2606.08700</link>
      <description>arXiv:2606.08700v1 Announce Type: new 
Abstract: Structured Cyber Threat Intelligence (CTI) is increasingly used for adversary emulation, detection evaluation, and cyber range design. However, these workflows still require a target System Under Test (SUT) whose environment is not fully described by public CTI. We measure how much of that environment can be derived from MITRE ATT&amp;CK Structured Threat Information Expression (STIX) bundles. Using the ATT&amp;CK Enterprise, Mobile, and Industrial Control Systems datasets, with CAPEC and FiGHT as comparison datasets, we evaluate platform coverage, software specificity, vulnerability evidence, and deployment compatibility. Platform annotations are common, but software references rarely include versions or Common Platform Enumeration (CPE) identifiers. In Enterprise, 97.6% of software objects lack both, and campaign-level Common Vulnerabilities and Exposures (CVEs) remain sparse.
  Our results show that ATT&amp;CK-style structured CTI can narrow candidate environments and support lower-bound backend-family assignment, but structured fields alone are insufficient to derive a replay-ready SUT. Profile confusion decreases from 1.3% when one software item is linked to 0% when two are linked. The results identify a boundary between environment details supported by the corpus and the version, vulnerability, and deployment information that must come from external sources. Keeping corpus-supported elements fixed while varying only analyst-authored details yields multiple distinct, campaign-compatible SUTs, including an executable witness exploiting the same real vulnerability. Structured CTI, therefore, constrains but does not uniquely determine the environment, highlighting the need to separate corpus-supported commitments from analyst-authored assumptions in replay-ready emulation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08700v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sidnei Barbieri, Ágney Lopes Roth Ferraz, Lourenço Alves Pereira Júnior</dc:creator>
    </item>
    <item>
      <title>Is Telehealth Better Used to Treat Patients or Help Other Physicians Treat Patients? An Agent-Based Modeling Study of Healthcare Provision</title>
      <link>https://arxiv.org/abs/2606.08701</link>
      <description>arXiv:2606.08701v1 Announce Type: new 
Abstract: Telehealth, the delivery of medical care remotely, is hoped to increase access to specialty services or decrease health care utilization. Physicians can provide telehealth to each other or to patients. Specialists often treat complex patients who can be adequately cared for only in academic hospitals, suggesting that providing specialty services via telehealth will reallocate rather than reduce system utilization. Here I use agent-based modeling to investigate telehealth's effects on clinical outcomes and system utilization in medical toxicology. I found that physician-physician telehealth improved patient health but did not change system utilization. The effects were more pronounced as clinical complexity increased. Physician-patient telehealth increased cost and system utilization but did not improve clinical outcomes. Within the limitations of this approach, these results suggest that telehealth is more cost-effective for improving generalist access to specialist knowledge than for providing care to the public.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08701v1</guid>
      <category>cs.MA</category>
      <category>cs.CY</category>
      <category>cs.SI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Michael Chary</dc:creator>
    </item>
    <item>
      <title>ConMem: Structured Memory-Guided Adaptation in Training-Free Multi-Agent Systems</title>
      <link>https://arxiv.org/abs/2606.08702</link>
      <description>arXiv:2606.08702v1 Announce Type: new 
Abstract: Recent advances have improved the adaptive capabilities of LLM-based multi-agent systems (MAS) through memory-, skill-, and learning-based approaches, yet these approaches remain challenged by noisy trajectories, insufficient modeling of memory-skill relations, and reliance on additional training or high-quality supervision. To address these limitations, we propose ConMem, a relation-aware and training-free framework that enables efficient multi-agent adaptation through cross-experience coordination. Specifically, ConMem distills historical interaction trajectories into structured memory cards to capture reusable strategies and cues, organizing them into a relation-aware memory graph. At runtime, ConMem retrieves cards according to task needs and coordinates them through the card graph to resolve strategy conflicts and recover their dependencies. Combined, these modules yield structured and relation-aware guidance, enabling robust, lightweight adaptation in multi-agent systems without additional training. Extensive experiments across multiple benchmarks and mainstream MAS architectures show consistent gains over existing memory architectures, with improved inference-time efficiency through pruning more than 50% of expanded candidates and reducing planning overhead by over 80%. Our code is available at https://anonymous.4open.science/r/ConMemCode</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08702v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhixun Tan, Qiang Chen, Tairan Huang, Xiu Su, Yi Chen</dc:creator>
    </item>
    <item>
      <title>Analyzing the Correlation Between Hallucinations and Knowledge Conflicts in Large Language Models</title>
      <link>https://arxiv.org/abs/2606.08705</link>
      <description>arXiv:2606.08705v1 Announce Type: new 
Abstract: Hallucinations -- factually incorrect or unverifiable outputs -- remain one of the most challenging limitations of Large Language Models (LLMs), especially in knowledge-intensive tasks. One proposed explanation is internal knowledge conflicts arising from fixed, outdated training data. This paper investigates whether internal representations linked to knowledge conflicts correlate with hallucination behaviors in LLMs.
  Using probing techniques inspired by two prior works, we analyzed activations from hidden, attention, and MLP layers, as well as output logits, across predefined tasks. We probed LLaMA-3-8B on hallucination detection benchmarks and Falcon-7B on a knowledge conflict dataset. Our findings show that, although conceptually related, hallucination activation patterns cannot be fully reduced to or explained by knowledge conflict representations.
  Nonetheless, probing proves a robust tool across multiple languages and activation types, supporting its role in improving LLM interpretability. This work advances the broader understanding of hallucinations in LLMs and underscores the value of fine-grained analysis of their internal behavior.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08705v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lucrezia Laraspata, Giovanna Castellano, Gennaro Vessio</dc:creator>
    </item>
    <item>
      <title>PRPO: Perception-Reinforced Policy Optimization via Token-Level Dynamic Advantage Reshaping</title>
      <link>https://arxiv.org/abs/2606.08708</link>
      <description>arXiv:2606.08708v1 Announce Type: new 
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become an effective paradigm for improving the reasoning capability of Large Vision-Language Models (LVLMs). However, existing RLVR methods primarily rely on trajectory-level outcome rewards, which assign identical learning signals across all generated tokens. This coarse-grained credit assignment is fundamentally mismatched to multimodal reasoning, where only a sparse subset of tokens is causally grounded in visual evidence. Consequently, these pivotal perceptual tokens receive weak supervision and are often overwhelmed by language priors or reasoning-template tokens. To address this limitation, we propose Perception-Reinforced Policy Optimization (PRPO), a token-level reinforcement learning framework that explicitly identifies and reinforces pivotal perceptual tokens within long-horizon multimodal reasoning trajectories. PRPO introduces Robust Visual Dependency (RVD), a principled metric that identifies tokens whose predictions are both visually grounded and perturbation-stable, filtering out brittle or noisy visual tokens. Based on RVD, we further propose Perceptual Advantage Reshaping (PAR), a token-level credit assignment technique that amplifies perceptually informative tokens while preserving stable gradients for non-perceptual tokens. Extensive experiments on seven multimodal reasoning benchmarks demonstrate that PRPO consistently outperforms strong LVLM baselines across both 3B and 7B model scales, achieving average gains of 23.3% and 21.1%, respectively. PRPO achieves state-of-the-art performance with improved training efficiency and stronger cross-task generalization. Our findings highlight the importance of fine-grained credit assignment for scalable multimodal reinforcement learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08708v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Qiming Li, Tianlun Li, Xiaolong Cheng, Hangyu Li, Ruiyan Gong, Kangning Niu, Kaitao Jiang, Mu Xu</dc:creator>
    </item>
    <item>
      <title>Structuring agentic AI for HPC code modernization</title>
      <link>https://arxiv.org/abs/2606.08710</link>
      <description>arXiv:2606.08710v1 Announce Type: new 
Abstract: Modernization of legacy scientific codes is often necessary to keep up with the ever-evolving changes in the compute resource ecosystem. Parallelization and migration from poorly supported software ecosystems are two of the most time-consuming activities in the research software engineering field. This paper presents our experience in the successful, two-phase AI-assisted modernization of NMAP-RKPM, a roughly 60,000-line, 3D explicit solid mechanics physics engine based on the Reproducing Kernel Particle Method (RKPM). We converted this single-threaded, Fortran-based MPI application into an OpenMP-parallel, C++-based MPI tool in the span of a few months. While Large Language Model (LLM) based tools on their own proved inadequate, we developed a highly structured "hand-holding" agentic AI methodology, which included providing manually created examples, ensuring continuous buildability, and limiting session scope, and which proved highly effective. The paper describes both the AI-assisted steps that were successful and the problems that we had to overcome, alongside the reasoning behind the chosen path.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08710v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Anthony Marinov, Igor Sfiligoi</dc:creator>
    </item>
    <item>
      <title>Evaluating Operators for Acoustic Wave Simulation Correction</title>
      <link>https://arxiv.org/abs/2606.08711</link>
      <description>arXiv:2606.08711v1 Announce Type: new 
Abstract: Correcting numerical dispersion artifacts from Finite Difference solvers is a well-identified challenge in computational wave physics, but existing approaches evaluate only a restricted family of CNN-based architectures and have been applied exclusively to the elastic wave equation. We instantiate the Deep Finite Difference framework on two-dimensional anisotropic acoustic wave propagation, pairing a fourth-order Finite Difference proxy with a Pseudo-Spectral reference over 27,000 heterogeneous velocity fields. We benchmark twelve correction architectures, from linear regression to Fourier Neural Operators, under a unified 10-fold cross-validation protocol.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08711v1</guid>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pascal Tribel, Gianluca Bontempi</dc:creator>
    </item>
    <item>
      <title>SNR-ST-Mix: Sample-specific Neighborhood Regression Mixup for Augmented Spatial Transcriptomics Imputation with Deep Neural Network</title>
      <link>https://arxiv.org/abs/2606.08712</link>
      <description>arXiv:2606.08712v1 Announce Type: new 
Abstract: Purpose: Spatial transcriptomics (ST) enables gene expression measurements within the tissue context. However, these measurements are often noisy, low-resolution, and sparsely sampled, which limits the recovery of fine spatial structure. Deep neural networks have become powerful tools for expression imputation from histology, but their performance remains constrained by limited sample sizes and a lack of biologically informed augmentation. Most of the existing augmentation strategies for learning are designed for classification tasks rather than regression, which neglect spatial and transcriptomic relationships, leading to biologically implausible interpolations that hinder prediction performance. Approach: To address these limitations, we propose SNR-ST-Mix, a geometry- and expression-aware data augmentation framework designed specifically for ST data. It constrains mixing to a spot's k-nearest spatial neighbors and adaptively weights interpolation coefficients based on expression similarity, generating augmented samples that preserve local biological structure while ensuring spatial smoothness. This dual conditioning yields synthetic examples that expand the effective training manifold, promote generalization, and enhance prediction stability under sample-specific training. Results: Extensive experiments with various tissue types demonstrate that SNR-ST-Mix consistently outperforms conventional augmentation methods without requiring architectural changes or additional computation. Conclusions: SNR-ST-Mix provides an effective and biologically principled augmentation strategy for spatial transcriptomics regression tasks. By explicitly leveraging spatial geometry and transcriptomic similarity, it expands the effective training manifold and improves predictive performance without increasing model complexity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08712v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hongyi Yu, Yaoyu Fang, Jiahe Qian, Xinkun Wang, Lee A. Cooper, Bo Zhou</dc:creator>
    </item>
    <item>
      <title>The price of incrementality in k-center clustering</title>
      <link>https://arxiv.org/abs/2606.08713</link>
      <description>arXiv:2606.08713v1 Announce Type: new 
Abstract: The $k$-center problem is one of the best-studied and most intuitive clustering formulations. It asks, given a set of $n$ points in a metric space, for $k$ of the points to be designated as cluster centers, so that the maximum distance of an input point to its nearest center is minimized. Gonzalez's greedy algorithm from 1985 is a simple and efficient way to find a $2$-approximate solution. The algorithm has the attractive feature of \emph{incrementality}: it outputs the centers one by one, with a guaranteed $2$-approximation for every prefix of the obtained sequence of centers.
  Incrementality imposes a geometric constraint on how solutions can be built, and it is natural to ask whether this comes at a price in the quality of the solution. It is known that in polynomial time, the approximation ratio of $2$ is best possible, assuming $P \neq NP$. In this paper we show that even with \emph{unlimited} computational power, the factor $2$ cannot be improved, if the solution is required to be built incrementally. The lower bound construction imposes a tradeoff between all $n$ levels of the clustering simultaneously; it was obtained with the help of ChatGPT, an aspect we discuss in Section 3 of the paper.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08713v1</guid>
      <category>cs.DS</category>
      <category>cs.CG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>László Kozma</dc:creator>
    </item>
    <item>
      <title>Hybrid Neural Network and Conventional Controller Approach for Robust Control of Highly Unstable Systems: Application to Tilt-Rotor Control</title>
      <link>https://arxiv.org/abs/2606.08714</link>
      <description>arXiv:2606.08714v1 Announce Type: new 
Abstract: Multirotors are widely used in applications ranging from surveillance to precision agriculture, yet conventional designs remain limited by their under-actuation. Tilt-rotor configurations overcome this limitation by enabling full actuation. This paper investigates neural-network-based control strategies for a fully actuated tilt-rotor system with four thrust-vectoring inputs. Our work is structured in two parts. First, we deliberately present a negative result by evaluating a direct input-output control approach. In this method, multilayer perceptrons (MLPs), long short-term memory (LSTM) networks, and transformer models are trained to map system states and their desired values directly to control signals. We show that this strategy fails to stabilize the system, highlighting the inherent difficulty of applying direct input-output learning to highly unstable plants. Second, as the main contribution, we propose a neural-network-enhanced sliding mode controller (SMC). The method decomposes the system dynamics into input-independent and input-dependent components, with the former learned from a small dataset using lightweight networks, thereby reducing real-time computational demands. Moreover, the proposed method can be trained using flight logs collected from low-performance controllers, and the resulting dynamic model learned from real-world data can be used in simulation. We further compare MLP- and LSTM-based implementations under model uncertainties and external disturbances, demonstrating the robustness and effectiveness of the proposed approach; in particular, the controller with the LSTM plant dynamics predictor achieves superior performance to its MLP-based counterpart while also exhibiting lower runtime.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08714v1</guid>
      <category>eess.SY</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.6084/m9.figshare.32572083</arxiv:DOI>
      <dc:creator>Ali Kafili Gavgani, Amin Talaeizadeh, Aria Alasty, Hossein Nejat Pishkenari</dc:creator>
    </item>
    <item>
      <title>Operationalizing Linguistic Methods through Prompt-Engineering Skills: An Automatic Chinese Web Neologism Detection Pipeline</title>
      <link>https://arxiv.org/abs/2606.08715</link>
      <description>arXiv:2606.08715v1 Announce Type: new 
Abstract: We present a method for automatic Chinese web neologism detection that operationalizes traditional linguistic identification principles as prompt-engineering skills. The method has four stages: tokenizer-independent character n-gram candidate generation; dictionary anchoring with a Pointwise Mutual Information pre-filter; a well-formedness skill based on Chinese word-formation principles; and a combined rule and three-way classification skill that distinguishes neologism, entity, and none. Applied to the BAAI CCI 3.0 corpus (267M documents), the method produces 226,959 classified candidates including 4,853 labeled neologisms. To evaluate the method, we develop a per-stage conditional recall decomposition in which the pipeline's strict recall factors mathematically into the product of stage conditional recalls. Applied to Hou (2023) (4,199 entries), the decomposition exposes Stage 1 candidate coverage and Stage 4B LLM semantic judgment as the two bottlenecks (R=41.5% and 60.0% respectively), while intermediate stages are near-lossless. A length-stratified analysis further reveals that the structural well-formedness skill is length-invariant (&gt;= 96.9%) whereas the semantic novelty-classification skill is length-dependent (65.6%/59.0%/44.1% across 2/3/4-character candidates), mapping a current boundary of skill-based linguistic operationalization. We release the method, pipeline outputs, and evaluation protocol as public resources.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08715v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yufeng Wu, Meichun Liu</dc:creator>
    </item>
    <item>
      <title>Deep Active Re-Labeling: Toward Noise-Resilient Annotation Efficiency</title>
      <link>https://arxiv.org/abs/2606.08718</link>
      <description>arXiv:2606.08718v1 Announce Type: new 
Abstract: While Deep Active Learning (DAL) effectively reduces human annotation costs, its efficacy is constrained by human annotation errors. This is because the data sampled for active learning is assumed to be highly informative for training. When human annotators introduce errors into this informative data at a certain rate, the active learning performance drops significantly and, in some cases, even exhibits worse outcomes than passive learning. In this paper, we first analyze the impact of human annotation errors in the DAL setting. Then we propose a framework to address the human annotation noise problem for DAL. Informed by human learning patterns, the core idea of our proposed solution involves allocating a portion of the human annotation budget to re-annotate data that has already been labeled. Previous theoretical work suggests that when the model possesses a certain level of ability to identify potentially noisy data, even re-labeling a small fraction of the data can effectively remove noise from the active training set. To achieve this, we implement two active noise sampling strategies to detect noise under different circumstances and allocate a part of the annotation budget to re-annotate these instances. Our approach imbues active learning with a revisiting and introspective behavior. Our experiments demonstrate that, under the same annotation budget, our method is more data-efficient and yields a relatively noise-free annotation dataset in the end.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08718v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1109/BigData66926.2025.11402126</arxiv:DOI>
      <arxiv:journal_reference>2025 IEEE International Conference on Big Data (BigData), Macau, China, 2025, pp. 886-895</arxiv:journal_reference>
      <dc:creator>Md Abdullah Al Forhad, Weishi Shi</dc:creator>
    </item>
    <item>
      <title>Thinking Without Images: Internalizing Visual Manipulation with On-Policy Self-Distillation</title>
      <link>https://arxiv.org/abs/2606.08719</link>
      <description>arXiv:2606.08719v1 Announce Type: new 
Abstract: "Thinking with Images" has emerged as an effective paradigm for fine-grained visual reasoning: by explicitly zooming into relevant regions and reasoning over crops, models can access local evidence that is difficult to recover from a single global image. However, this benefit comes with redundant tool invocations and longer inference traces. Moreover, when such behaviors are learned mainly from outcome reward, the resulting intermediate crops or visual cues can be noisy or fail to faithfully capture task-relevant visual evidence. In this work, we ask whether the reasoning benefits of "Thinking with Images" can be internalized through Thinking with Imagination: an internal process that decides where to look and imagines what visual cues closer inspection would reveal without actually invoking tools. We propose Imagine-OPD, an on-policy self-distillation framework in which a teacher plays the role of a "Thinking with Images" reasoner during training: it receives privileged zoomed evidence views derived from annotated regions, and supervises the model's own imagination reasoning trajectories. Imagine-OPD does not require an external teacher or high-quality imagination demonstrations. Experiments on vision-centric benchmarks show that Imagine-OPD achieves the best average performance among compared models while significantly reducing inference overhead compared with "Thinking with Images" methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08719v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yishuo Cai, Jiahui Liu, Yuanxin Liu, Haobo Deng, Linli Yao, Yuhao Zheng, Kun Ouyang, Zhimo Li, Ziyue Wang, Xu Sun, Haoli Bai, Xiaohui Li</dc:creator>
    </item>
    <item>
      <title>A Geometric Measure of Linear Separability for Neural Representations</title>
      <link>https://arxiv.org/abs/2606.08721</link>
      <description>arXiv:2606.08721v1 Announce Type: new 
Abstract: Modern neural classifiers commonly rely on linear readouts, yet predictive metrics alone do not characterize the class-wise geometry of the representations on which such readouts operate. We introduce the directional linear separability measure (LSM), a finite-sample diagnostic for one-sided affine separability. For a target class A and a competing set B, LSM searches over affine halfspaces that contain all samples in A and measures the smallest competing-sample intrusion that must remain on the target side, normalized by |A|. The resulting quantity is asymmetric, class-wise, target-normalized, and applicable to finite representations extracted from neural networks. We establish its supporting-hyperplane characterization, relate it to optimal affine classification accuracy, and prove invariance under full-rank linear embeddings. These results separate changes caused by linear reparameterization from those caused by information loss or nonlinear geometric transformations. We also give a penalty-based affine search for estimating class-wise LSM in high-dimensional features, with reported values computed from the original discrete preservation and violation criterion. Finally, we analyze coordinatewise gated nonlinearities as finite-sample geometric operators and empirically use LSM to diagnose class-wise intrusion across common deep-learning components and architectures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08721v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yi Wei, Xuan Qi, Furao Shen</dc:creator>
    </item>
    <item>
      <title>Can LLMs understand LilyPond? A benchmark for symbolic music generation and understanding</title>
      <link>https://arxiv.org/abs/2606.08722</link>
      <description>arXiv:2606.08722v1 Announce Type: new 
Abstract: Symbolic music evaluation for large language models remains fragmented across representations, datasets, and metrics. We introduce LilyBench, a LilyPond-based benchmark that jointly evaluates symbolic music generation and music understanding on the same family of open-weight LLMs. The benchmark includes a 200-prompt generation suite and ten understanding tasks adapted from ABC-Eval, covering syntax, metadata prediction, structural sequencing, and music recognition. Generation quality is evaluated using compile rate, MusPy descriptor distributions via Jensen-Shannon similarity, and LilyBERT-based Fréchet Music Distance (FMD). Experiments on four open-weight models show that executable LilyPond generation is achievable in zero-shot settings, while structural understanding tasks remain challenging despite strong performance on composer and genre recognition. Our experiments also reveal systematic disagreements between descriptor-based and embedding-based metrics, suggesting that symbolic music evaluation benefits from metric triangulation rather than single-score ranking. We release the benchmark, prompt bank, and evaluation code to support future research in symbolic music generation and understanding at https://github.com/CSCPadova/lilybench</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08722v1</guid>
      <category>cs.SD</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Matteo Spanio, Mohammad Torabi, Andrea Poltronieri, Antonio Rodà</dc:creator>
    </item>
    <item>
      <title>From Text to Discovery: How Are LLMs Reshaping Scientific and Humanistic Research?</title>
      <link>https://arxiv.org/abs/2606.08723</link>
      <description>arXiv:2606.08723v1 Announce Type: new 
Abstract: Large Language Models (LLMs) are rapidly reshaping academic research across the natural sciences, social sciences, and humanities, yet the scientific community lacks a comprehensive, cross-disciplinary account of how these tools are being integrated, what they deliver, and where they fall short. This paper addresses that gap by mapping their current state and outlining an agenda for their responsible integration into scientific research. Our analysis reveals a consistent pattern: LLMs meaningfully accelerate research workflows -- from hypothesis generation and literature synthesis to data analysis and scientific writing -- while introducing serious challenges related to hallucination, reproducibility, dataset bias, and model opacity. Beyond technical limitations, we identify ten underexplored challenges, including the erosion of researcher autonomy, AI-driven confirmation bias, authorship ambiguity, and unequal access to these technologies -- systemic risks that demand interdisciplinary governance frameworks, robust validation standards, and expanded explainability research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08723v1</guid>
      <category>cs.DL</category>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Saleh Afroogh, Yasser Pouresmaeil, Yiming Xu, Kevin Chen, Abhejay Murali, Junfeng Jiao</dc:creator>
    </item>
    <item>
      <title>Real-Time and Accurate Collision-Free Teleoperation via Differentiable Constraint-Based Trajectory Planning</title>
      <link>https://arxiv.org/abs/2606.08725</link>
      <description>arXiv:2606.08725v1 Announce Type: new 
Abstract: In teleoperation, the human operator typically controls only the end-effector pose, which often leads to self-collisions of the manipulator and collisions with environmental obstacles, since joints and links are not controlled individually. A common strategy to mitigate this issue is to enhance the operator's input using optimal-control-based trajectory planning. As derivative-based solvers require differentiable constraints, existing approaches either approximate robots and obstacles with spheres, reducing geometric accuracy, or approximate derivatives, degrading convergence and increasing computation times. We address these limitations by adapting a recent formulation of differentiable collision-avoidance constraints, based on duality in convex optimization, to the teleoperation setting. The robot is approximated with capsules and the environment with polytopes. We compare the resulting trajectory planning method against state-of-the-art techniques in simulation with varying numbers of obstacles and evaluate it on a UR5e manipulator in a real-world teleoperation test. Results show that our approach achieves lower computation times while enabling more accurate obstacle modeling, leading to smoother and collision-free end-effector teleoperation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08725v1</guid>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Max Grobbel, Tristan Schneider, Daniel Flögel, Sören Hohmann</dc:creator>
    </item>
    <item>
      <title>Evaluating Multimodal Steganalysis for Split-Payload Audiovisual Steganography</title>
      <link>https://arxiv.org/abs/2606.08726</link>
      <description>arXiv:2606.08726v1 Announce Type: new 
Abstract: The aim of steganography is to hide secret information inside ordinary media so that the existence of communication is hidden rather than encrypted. In an audiovisual context, the availability of audio and video streams creates an opportunity to split a payload across these two modes, thus reducing the embedding burden on any single carrier. This paper evaluates whether such split-payload audiovisual steganography can help evade unimodal and multimodal steganalysis under synchronized and asynchronous embedding settings. We create audiovisual samples where the hidden message is divided between the audio and video tracks, and then test how well different detectors can identify them. The single-mode detectors perform close to random guessing, showing the benefit of this hiding mechanism, while the multimodal model initially appears more effective. However, further checks show that this improvement mostly comes from the video stream, not from a true combined audio-video signal. Overall, the results suggest that splitting the payload across modalities can make detection harder, but multimodal detectors must be evaluated carefully to ensure they are learning the intended signal.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08726v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>Prateek Paudel, Nitin Jha, Abhishek Parakh</dc:creator>
    </item>
    <item>
      <title>Compositional Approximation Can Strictly Outperform Superpositional Approximation</title>
      <link>https://arxiv.org/abs/2606.08727</link>
      <description>arXiv:2606.08727v1 Announce Type: new 
Abstract: Many classically studied function classes are known to be approximated optimally by superpositional methods, i.e. with approximants constructed as the linear combination of elements in some dictionary. Here optimality means that the uniform approximation error viewed as a function of the number of parameters used has polynomial decay of the highest order achievable by any parametrized method whose parameters can be encoded as a bit string of length proportional, up to logarithmic factors, to the number of parameters. While compositional methods like neural networks are structurally different, their approximation rates can be made comparable by imposing constraints that ensure such a proportional bit string encoding. In this work we study function classes exhibiting structural properties that limit superpositional approximation rates to be strictly lower than compositional approximation rates. In particular, we construct explicit examples for which there is an arbitrarily large gap.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08727v1</guid>
      <category>math.NA</category>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dennis Elbrächter, Philipp Petersen</dc:creator>
    </item>
    <item>
      <title>Artificial Intelligence for Mathematical Reasoning: An Integrated Survey of Language Models, Neuro-symbolic Systems, and Verified Discovery</title>
      <link>https://arxiv.org/abs/2606.08728</link>
      <description>arXiv:2606.08728v1 Announce Type: new 
Abstract: Mathematical reasoning has long served as a stringent test of machine intelligence; over the past decade, it has moved from a niche problem within NLP to one of the most consequential AI frontiers. This survey provides a unified account of the field's evolution, from early rule-based math word problem (MWP) solvers and template-driven geometry systems, through neural expression generation and LLM prompting, to contemporary reasoning models, multi-agent systems, neuro-symbolic theorem provers, and verified discovery workflows. We organize the landscape along four axes: (i) informal reasoning over text and diagrams, spanning MWP solving, multimodal geometry, and VLMs; (ii) formal reasoning in proof assistants, including autoformalization, tactic prediction, compiler-guided repair, and proof search; (iii) mathematical discovery, where systems propose constructions, improve bounds, or assist attacks on open problems; and (iv) the inference and training-time techniques, including CoT prompting, tool use, process reward models, and RLVR, that increasingly connect generation with verification. We catalog major benchmarks across grade-school arithmetic, competition mathematics, geometry, formal proving, multimodal and multilingual reasoning, and expert evaluation, and we examine benchmark saturation, contamination, reporting mismatches, and the distinction between pass@1, majority voting, and verifier-assisted pass@k. We critically assess failure modes: brittleness under perturbation, reward hacking, multimodal grounding failures, fragile formalization, and the energy cost of reasoning-scale inference. Drawing on recent perspectives from working mathematicians, we identify future directions centered on verified-discovery workflows, reasoning efficiency, and infrastructure to make AI-assisted formalization broadly usable. Companion materials: https://github.com/Starscream-11813/awesome-AI4Math.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08728v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Syed Rifat Raiyan, Mohsinul Kabir, Hasan Mahmud, Md Kamrul Hasan</dc:creator>
    </item>
    <item>
      <title>IR-SIM: A Lightweight Skill-Native Simulator for Navigation, Learning, and Benchmarking</title>
      <link>https://arxiv.org/abs/2606.08729</link>
      <description>arXiv:2606.08729v1 Announce Type: new 
Abstract: Simulation plays a key role in automated robotics research supported by large language models (LLMs). However, existing simulators often require custom code or complex interfaces, creating a barrier to rapid prototyping and automated algorithm development. To this end, we propose the Intelligent Robot Simulator (IR-SIM), a lightweight skill-native navigation simulator designed for rapid scenario construction, benchmarking, and robot learning. In IR-SIM, scenarios are entirely defined by YAML configuration files that specify mobile robot kinematics, geometric collision checking, LiDAR sensing, visualization, and behavior modules. This design makes robotic simulation fully describable and reproducible, allowing scenarios to be generated and modified from text prompts through the proposed IR-SIM agent skills. The resulting scenarios can be used for automated benchmarking of navigation algorithms and for automated generation of training data for learning methods. Furthermore, IR-SIM provides bridges to high fidelity simulators and real world deployment, allowing users to validate their algorithms in more realistic settings after prototyping without extra coding. The experiments showcase the convenience and versatility of IR-SIM in multiple tasks: constructing navigation scenarios from natural language, training a collision avoidance policy, benchmarking social navigation policies, and bridging to high fidelity simulators and real world deployment. The project website is available at https://github.com/hanruihua/ir-sim.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08729v1</guid>
      <category>cs.RO</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ruihua Han, Shuai Wang, Chengyang Li, Rui Gao, Xinyi Wang, Zhe Liu, Guoliang Li, Yupu Lu, Qi Hao, Jia Pan, Hengshuang Zhao</dc:creator>
    </item>
    <item>
      <title>Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2606.08735</link>
      <description>arXiv:2606.08735v1 Announce Type: new 
Abstract: Quality-diversity reinforcement learning (QD-RL) aims to construct policy repertoires that contain both high-performing and behaviorally diverse policies. Existing QD-RL methods mainly diversify policy instances after rollout evaluation or use learned value information to improve policy quality and behavior targeting, while the learning branches that generate candidate policies remain less explored. This paper proposes SV-QD-RL, a structure-value coupled framework that represents each candidate as a structure-conditioned actor-critic branch. Each branch contains an actor, a structural mask, a branch-specific critic, a replay state, and evaluation attributes including behavior, return, sparsity, and value profile. The structural mask defines the actor subspace in which the branch learns, while the branch-specific critic and replay state shape its value-learning trajectory. A branch-aware QD archive then evaluates and retains branches according to behavioral quality, structural footprint, and value-profile information. Experiments on MuJoCo continuous-control tasks show that SV-QD-RL constructs policy repertoires with strong archive quality and behaviorally useful diversity. Ablation and diagnostic analyses further indicate that structural conditioning, critic differentiation, and memory-consistent refinement make complementary contributions to behavioral specialization. Schedule-aware repertoire evaluation shows that the learned archive provides selectable policy alternatives under changing behavior-level requirements. These results suggest that coupling actor structure with branch-specific value learning is an effective mechanism for generating diverse QD-RL policy repertoires.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08735v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lianrong Zuo, Peilan Xu, Yong Liu, Wenjian Luo</dc:creator>
    </item>
    <item>
      <title>Declarative Outcome-Conformant Synthesis: Exact, Closed-Form Specification Satisfaction and a Conformance Benchmark</title>
      <link>https://arxiv.org/abs/2606.08736</link>
      <description>arXiv:2606.08736v1 Announce Type: new 
Abstract: We study a capability the dominant paradigm in synthetic tabular data does not provide: exact satisfaction of a declared analytical outcome with no source data. Imitation methods (copulas, GANs, diffusion) learn a real distribution and sample from it, and are judged on fidelity to real data. A large, practical class of needs is different: generating data with no source data ("cold start") that reproduces a declared outcome (a revenue curve, a churn rate, a group share) across a relational schema. Off-the-shelf imitation tools offer no interface for such targets, and no sampler can hit an exact aggregate, because sampling has variance. On a real public dataset, off-the-shelf learned synthesizers trained on that very data miss the declared monthly aggregate by 74 to 86 percent; a per-period steelman cuts the miss to about 19 percent and still cannot reach 0; a closed-form generator reaches exactly 0. We name this task outcome-conformant synthesis, argue its evaluation axis is conformance rather than fidelity, and show the two axes are orthogonal. We contribute: (1) a formal account showing a widely-used family of exact-aggregate generators is exactly conditional-sum sampling of a Gamma population (via Lukacs' characterization), with closed-form exactness, a closed-form marginal CV, and scale-invariance; a controlled experiment maps the boundary, enforcing the exact aggregate costs at most 0.006 in 1-Wasserstein distance to an arbitrary external marginal, the rest being shape-family mismatch; (2) SpecBench, to our knowledge the first benchmark to measure conformance to analytical outcomes for cold-start relational synthesis; and (3) a closed-form, deterministic reference system. Exact aggregation alone is trivial; the contribution is conformance jointly with closed-form marginals, integrity, determinism, and zero source data. We concede fidelity to imitation where real data exists.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08736v1</guid>
      <category>cs.LG</category>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Muhammed Rasin</dc:creator>
    </item>
    <item>
      <title>Dream-Tac: A Unified Tactile World Action Model for Contact-Rich Robot Manipulation</title>
      <link>https://arxiv.org/abs/2606.08737</link>
      <description>arXiv:2606.08737v1 Announce Type: new 
Abstract: World action models inherit the predictive capability of world models, enabling action generation to be guided by anticipated future observations. However, they rely primarily on vision and often fail in contact-rich manipulation, where critical cues arise from physical interaction. In this paper, we propose Dream-Tac, a unified Tactile-World Action Model that jointly models actions, future visual observations, and tactile dynamics. Specifically, Dream-Tac introduces (i) contact-gated visuotactile fusion to selectively integrate tactile signals and (ii) a contact-aware attention bias to better regulate cross-modal interactions during manipulation. To support real-time deployment, we further design a dual-level acceleration strategy, reformulating the contact-aware bias to preserve the fused attention path during training and introducing cache-based diffusion acceleration at inference, achieving up to 2.9× faster training and 1.8× faster inference. Across six contact-rich manipulation tasks, Dream-Tac improves action accuracy by 31.7% on average, demonstrating the effectiveness of unified visuotactile world modeling. Code is available at https://github.com/LYFCLOUDFAN/Dream-Tac.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08737v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yunfan Lou, Yifan Ye, Yankai Fu, Jun Cen, Xiaowei Chi, Yaoxu Lyu, Peidong Jia, Sirui Han, Zhihe Lu, Shanghang Zhang</dc:creator>
    </item>
    <item>
      <title>Systems-Level Planning and Coordination of Truck-Drone Collaborative Delivery Networks</title>
      <link>https://arxiv.org/abs/2606.08738</link>
      <description>arXiv:2606.08738v1 Announce Type: new 
Abstract: Urban last-mile parcel delivery increasingly relies on heterogeneous fleets whose performance depends on timely coordination, reliable communication, and scalable control. Truck-drone collaboration has emerged as a networked cyber-physical delivery paradigm that combines the payload capacity and range efficiency of trucks with the agility of drones in congested or access-limited urban environments. This paper proposes a layered planning and coordination framework that structures truck-drone collaborative delivery (TDCD) from a systems and control perspective. The framework consists of five interrelated layers: spatial-demand alignment, collaborative delivery configuration, resource and workflow orchestration, performance evaluation, and scalability analysis, providing a unified view of coordination, control, and system-level performance in networked delivery operations. The proposed framework is evaluated using a realistic urban last-mile delivery scenario derived from the 2021 Amazon Last Mile Routing Research Challenge dataset. The case study demonstrates how coordinated truck-drone operation, enabled by structured task orchestration and inter-agent synchronization, improves end-to-end system efficiency under operational constraints. Results show a 42.4% reduction in total delivery time and a 44.2% reduction in energy consumption compared to a conventional truck-only delivery model. The scalability analysis further highlights how coordination gains persist as system size increases, and shows the importance of efficient control and communication in heterogeneous delivery networks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08738v1</guid>
      <category>cs.NI</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Didem Cicek, Burak Kantarci</dc:creator>
    </item>
    <item>
      <title>The Minimal Retroreflective Microfacet Model</title>
      <link>https://arxiv.org/abs/2606.08739</link>
      <description>arXiv:2606.08739v1 Announce Type: new 
Abstract: We present the Minimal Retroreflective Microfacet (MRM) model, which turns any existing microfacet BSDF into a physically plausible retroreflective one by a single substitution: replacing the view direction with its reflection about the surface normal before evaluating the standard model. Based on the previously published back-vector formulation, MRM requires only minimal code changes and has been adopted in the OpenPBR and MaterialX material standards. We prove reciprocity and energy conservation under the assumption of a reflection-symmetric normal distribution function (NDF), which holds for all commonly used distributions, and validate the model against measured retroreflective material data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08739v1</guid>
      <category>cs.GR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>Journal of Computer Graphics Techniques (JCGT), Vol. 15, No. 1, pp. 60-75, 2026</arxiv:journal_reference>
      <dc:creator>Jamie Portsmouth (Autodesk), Matthias Raab (NVIDIA), Laurent Belcour (Intel), Francis Liu (NVIDIA)</dc:creator>
    </item>
    <item>
      <title>Safe, Fluent and Acceptable Motion Generation and Execution for Human-Robot Interaction in Manufacturing Environments</title>
      <link>https://arxiv.org/abs/2606.08741</link>
      <description>arXiv:2606.08741v1 Announce Type: new 
Abstract: Robots operating in human environments must not only ensure physical safety but also exhibit behaviors that are understandable, fluent, and acceptable to human partners. This paper investigates motion generation strategies that combine safety guarantees with interaction quality considerations, such as motion smoothness and human comfort. While the design of robots capable of ensuring safety in shared human-robot environments has enabled closer and more advanced forms of interaction, these new proximity-based tasks require moving beyond purely technical considerations. In particular, robot behavior must also be addressed from psycho-cognitive and social perspectives. In this context, we argue for the relevance of integrating social-aware motion control into robotic systems. First, we identify the motion parameters that influence human perception and operator experience. Then, we implement a Model Predictive Control (MPC) framework that generates four distinct socially-informed robot behaviors. Finally, we conduct a user study to evaluate and validate these behaviors and assess their social impact on non-expert participants. The results demonstrate that variations in robot behavior significantly affect the perceived social acceptability of the system. These findings highlight the importance of incorporating human-centered considerations into motion generation strategies for robots operating in shared environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08741v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Thibaut Lopez, Olivier Aycard, Pierre-Brice Wieber, Mohamed Boua, Christine Jeoffrion</dc:creator>
    </item>
    <item>
      <title>AUCp: Pseudo-AUC for Inference Model Selection with Unlabeled Validation Data in Abnormality Detection</title>
      <link>https://arxiv.org/abs/2606.08742</link>
      <description>arXiv:2606.08742v1 Announce Type: new 
Abstract: Abnormality detection is a crucial yet challenging task in medical image analysis. Distinguishing abnormalities from normal data by learning to reconstruct normal-only data alleviates the reliance on labeled datasets. However, many studies, even if unsupervised, rely on a labeled validation set to select the best model for inference from multiple training iterations. For many diseases, labeled data are unavailable and substantially time-consuming to obtain. To address this, AUCp, a novel metric that supports abnormality detection for unsupervised and self-supervised methods, is proposed. Instead of evaluating the realism of reconstructed images to select the best model for inference, it focuses on actual detection performance without requiring an annotated test set. AUCp scores are derived by assuming a pseudo ground truth in which all unannotated samples in the test set are abnormal/positive and applying the traditional AUC calculation. Given a large and representative training set of normal samples, we show mathematical and empirical evidence that model selection using AUCp scores improves disease detection for unsupervised and self-supervised methods over conventional metrics. Using two unsupervised methods for neurologic disease detection and self-supervised methods on diverse datasets, our results demonstrate that the AUCp score effectively identifies the optimal model for inference, significantly enhancing abnormality and disease detection. The corresponding implementations are available at https://github.com/mahfuzmohammad/AUCp.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08742v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1109/TMI.2026.3684946</arxiv:DOI>
      <arxiv:journal_reference>IEEE Transactions on Medical Imaging (Early Access), 2026</arxiv:journal_reference>
      <dc:creator>Md Mahfuzur Rahman Siddiquee, Fazle Rafsani, Jay Shah, Teresa Wu, Catherine D Chong, Todd J Schwedt, Baoxin Li</dc:creator>
    </item>
    <item>
      <title>Guided Discovery of New Behaviors using Diffusion Policies</title>
      <link>https://arxiv.org/abs/2606.08743</link>
      <description>arXiv:2606.08743v1 Announce Type: new 
Abstract: Diffusion models have become a powerful tool for generative modeling in robotics, with diffusion policies excelling at modeling multimodal action-trajectory distributions. However, when demonstrations are limited, standard sampling often reproduces dominant behaviors while neglecting valid but rare modes, limiting the discovery of novel solutions. Existing approaches, such as guidance methods or combining reinforcement learning with diffusion, either push samples into infeasible regions or struggle to escape local minima, failing to systematically uncover diverse behaviors. To address these challenges, we propose a framework that combines Feynman-Kac correctors with a novel guiding potential that systematically guides diffusion policy samples towards promising yet underrepresented samples. These trajectories are refined using sampling-based trajectory optimization and reincorporated into the training set to retrain the diffusion policy. Our method effectively mines and repairs novel trajectories, enabling the systematic discovery of diverse and executable behaviors. We demonstrate the effectiveness of our framework across a range of manipulation environments, consistently discovering new behaviors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08743v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dian Yu, Sebastian Sanokowski, Majid Khadiv</dc:creator>
    </item>
    <item>
      <title>MB-Loc: Multi-planar Bird's-eye-view Localization in outdoor LiDAR scenes</title>
      <link>https://arxiv.org/abs/2606.08744</link>
      <description>arXiv:2606.08744v1 Announce Type: new 
Abstract: Global LiDAR localization is a fundamental task for autonomous navigation systems. Recent methods perform Scene Coordinate Regression (SCR) and achieve superior accuracy over Absolute Pose Regression (APR) solutions by predicting dense 3D world coordinates. However, SCR approaches introduce two major bottlenecks: severe computational inefficiency from processing raw 3D geometries and significant performance degradation under varying sensor viewpoints. To address these limitations, we present MB-Loc, a lightweight and viewpoint-robust SCR framework. Instead of relying on heavy 3D convolutions, we project the input LiDAR scan into a 2.5D Multi-planar Bird's-Eye View (BEV) representation. By slicing the point-cloud along the Z-axis and mapping signed depths into discrete 2D planes, MB-Loc retains essential 3D geometric structures while exploiting the computational tractability of standard 2D CNNs. To handle the inherent sparsity of outdoor LiDAR, we introduce a KL-regularized latent bottleneck that explicitly models spatial uncertainty without injecting stochastic noise. Finally, to ensure rotation robustness, we apply 3D spatial augmentations prior to planar projection, forcing the network to implicitly learn viewpoint-invariant features. We perform extensive experiments on the publicly available NCLT dataset and demonstrate that our proposed method outperforms the current state-of-the-art. Operating at real-time inference speeds, MB-Loc significantly outperforms traditional 3D-SCR architectures in computational efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08744v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ayaan Choudhury, Preet Savalia, Anirudh Pydah, Avinash Sharma</dc:creator>
    </item>
    <item>
      <title>Stain-Aware Wavelet Regularization for Instant Adversarial Purification in Histopathology</title>
      <link>https://arxiv.org/abs/2606.08745</link>
      <description>arXiv:2606.08745v1 Announce Type: new 
Abstract: Deep learning has become prevalent in computational pathology pipelines that support tasks such as cancer screening and digital pathology analysis. However, the susceptibility of neural networks to adversarial perturbations raises safety concerns for reliable deployment in clinical practice. In histopathological images, this challenge is exacerbated by the difficulty of distinguishing high-frequency adversarial noise from subtle and diagnostically relevant tissue structures. To address this issue, we propose Stain-Aware Wavelet Regularization (SAWR), an adversarial purification framework that leverages multi-level wavelet-domain regularization based on Haar transform to hierarchically disentangle adversarial perturbations from diagnostic structural information. This spectral constraint is further extended to individual histological channels, enabling stain-specific frequency regulation consistent with the biological properties of Hematoxylin and Eosin. When integrated into an instant purification framework, SAWR improves adversarial robustness by up to 10.69\% over the baseline approach, while maintaining texture and spectral fidelity under adversarial perturbations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08745v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Zhe Li, Bernhard Kainz</dc:creator>
    </item>
    <item>
      <title>HydraQE: OSU's Submission for the IWSLT 2026 Speech Translation Metrics Shared Task</title>
      <link>https://arxiv.org/abs/2606.08748</link>
      <description>arXiv:2606.08748v1 Announce Type: new 
Abstract: We present HydraQE, our contribution to the IWSLT 2026 Speech Translation Metrics shared task. HydraQE is an end-to-end, reference-free quality estimation (QE) system for speech translation built on a Qwen3-ASR backbone, which accepts source audio and a translation hypothesis as joint input. Hidden states from all backbone layers are combined via a learnable sparsemax scalar mix, then re-encoded by a lightweight bidirectional Transformer to enable full cross-modal interaction prior to pooling into a shared embedding. Three independent prediction heads are trained on complementary supervision signals: human direct assessment (DA) annotations, MetricX-24 pseudo-labels, and xCOMET pseudo-labels. To address the scarcity of human-annotated data, we train on a combination of synthetically corrupted examples and silver pseudo-labeled machine translation outputs, using a curriculum that begins on synthetic and silver data and gradually shifts toward human-annotated examples. HydraQE outperforms cascaded text-based baselines and prior direct speech QE systems, demonstrating that end-to-end speech translation QE is competitive with cascaded approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08748v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kevin Krahn, Eric Fosler-Lussier</dc:creator>
    </item>
    <item>
      <title>New Codes from Cyclic and Negacyclic Codes of Even Length over $\mathbb{Z}_4$</title>
      <link>https://arxiv.org/abs/2606.08750</link>
      <description>arXiv:2606.08750v1 Announce Type: new 
Abstract: This paper uses theoretical results previously established in the literature to design search algorithms to find new linear codes over $\mathbb{Z}_4$ from cyclic and negacyclic codes of even length. As a result of these searches, we have found 2500 new cyclic codes and 730 negacyclic codes. These new codes exhibit improved parameters compared to previously known codes. Additionally, we have obtained binary quantum codes with good parameters from such $\mathbb{Z}_4$ codes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08750v1</guid>
      <category>cs.IT</category>
      <category>math.CO</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Nuh Aydin, Mohamed O. Belghith, Godwin Idowu, Trang T. T. Nguyen, Long B. Tran</dc:creator>
    </item>
    <item>
      <title>Less Is More: Training-Free Acceleration Framework of 3D Diffusion Models for Low-Count PET Denoising via Global-Local Trajectory Reduction</title>
      <link>https://arxiv.org/abs/2606.08751</link>
      <description>arXiv:2606.08751v1 Announce Type: new 
Abstract: Accurate quantification and uptake measurement in PET are critical for assessing disease progression and supporting clinical decision-making. While high-count PET provides reliable image quality, the associated radiation dose and prolonged acquisition remain significant clinical concerns, motivating the adoption of low-count protocols. Diffusion-model-based methods have demonstrated strong potential for restoring low-count PET to near high-count quality, but their iterative sampling procedure becomes prohibitively expensive when applied to high-resolution 3D PET volumes, introducing substantial inference latency that limits practical clinical deployment. To address these challenges, we propose a training-free Global-Local Skipping Strategy that accelerates diffusion model-based 3D PET denoising while simultaneously improving reconstruction quality. The proposed method is plug-and-play and directly applicable to pre-trained diffusion models without retraining or architectural modification. Specifically, we introduce: (i) a global denoising step skipping strategy that initializes the reverse diffusion process from an intermediate denoising step using a noise-consistent transformation of the low-count input, substantially reducing the number of required denoising steps; and (ii) a local feature reuse shortcut that reuses slowly-varying high-level U-Net features across neighboring denoising steps, further reducing per-step computation while preserving image fidelity. We evaluate the proposed approach on multiple PET tracers from in-house and public datasets, including 18F-FDG PET, 68Ga-DOTATATE PET, and 18F-PSMA PET, demonstrating consistent acceleration of over an order of magnitude alongside improved or comparable reconstruction performance relative to the full-step baseline. Blinded reader studies further confirm enhanced clinical confidence and perceived diagnostic quality.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08751v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuhan Liu, Scott M. Leonard, Marlee Crews, Muhannad Fadhel, Jinkui Hao, Tianqi Chen, Ryan J. Avery, Bo Zhou</dc:creator>
    </item>
    <item>
      <title>Co-Evolving Skill Generation and Policy Optimization</title>
      <link>https://arxiv.org/abs/2606.08755</link>
      <description>arXiv:2606.08755v1 Announce Type: new 
Abstract: Skill-augmented reinforcement learning improves language agents by storing reusable procedural knowledge acquired from past experience. Existing methods typically use strong language models to analyze trajectories, generate skills, and update a retrievable skill bank during online training. However, they rarely assess whether a newly generated skill is useful before it is stored and reused. We find that this assumption is unreliable: even skills generated by proprietary frontier LLMs exhibit highly mixed utility, with many providing little benefit or even degrading performance. Once such skills enter the bank, their effects are difficult to identify, because subsequent rollout feedback is delayed and usually reflects the combined effect of multiple retrieved skills rather than the marginal contribution of any individual skill. We propose an online reinforcement learning framework for pre-storage skill validation. The framework estimates whether a candidate skill contributes useful information beyond the skills already retrieved for the current task. It uses the standard rollout budget to form two matched groups under the same task and retrieval context: base rollouts conditioned on the currently retrieved skills, and skill-augmented rollouts conditioned on the same skills plus one candidate skill induced from the base trajectories. The reward gap between these two groups estimates the candidate skill's context-dependent marginal utility, enabling the framework to promote useful skills while filtering ineffective or harmful ones without additional rollout overhead. The framework further uses this marginal-utility signal to train the policy itself as a skill generator, reducing reliance on repeated calls to proprietary models. The learned skill-generation likelihood serves as a context-dependent score for retrieval-time reranking and outdated-skill pruning as the policy evolves.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08755v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhiwei Zhang, Yudi Lin, Nikki Lijing Kuang, Linlin Wu, Xiaomin Li, Songtao Liu, Fenglong Ma</dc:creator>
    </item>
    <item>
      <title>APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing</title>
      <link>https://arxiv.org/abs/2606.08761</link>
      <description>arXiv:2606.08761v1 Announce Type: new 
Abstract: W4A4 quantization promises full utilization of INT4 Tensor Cores, yet group dequantization overhead on CUDA Cores has driven existing systems to mixed-precision fallbacks. We present the first systematic study of how intra-SM compute balance governs this bottleneck. Through controlled benchmarks across four GPUs from Ampere and Ada architectures, we identify the Tensor Cores to CUDA Cores throughput ratio ($\rho$) as the primary hardware indicator: the W4A4-g128 kernel yields $2.0$--$2.5\times$ speedup on RTX~3090 ($\rho=16$) yet degrades to $0.43$--$0.47\times$ on A100 ($\rho=64$) in compute-bound scenarios, establishing W4A4 viability as platform-dependent rather than universally infeasible. Guided by this finding, we build \textbf{APEX4}, which co-designs pure INT4 GEMM kernels with $\rho$-aware granularity adaptation to mitigate the CUDA Cores dequantization bottleneck. APEX4 achieves perplexity within 0.63 of FP16 on LLaMA-2-70B and outperforms W4Ax Atom-g128 by 4.0\%--4.4\% in zero-shot accuracy. Deployed as a drop-in replacement in unmodified vLLM, it delivers up to $1.66\times$ end-to-end speedup on L40S ($\rho=8$), $1.78\times$ on RTX~3090 ($\rho=16$), and $2.09\times$ on A40 ($\rho=16$), while recovering A100 ($\rho=64$) to $1.20$--$1.40\times$ via the mixed-granularity mode.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08761v1</guid>
      <category>cs.DC</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hong Guo, Nianhui Guo, Weixing Wang, Jona Otholt, Christoph Meinel, Haojin Yang</dc:creator>
    </item>
    <item>
      <title>RGB-S: Image-Aligned Tactile Saliency for Robust Dexterous Manipulation</title>
      <link>https://arxiv.org/abs/2606.08765</link>
      <description>arXiv:2606.08765v1 Announce Type: new 
Abstract: Effective visuo-tactile integration is critical for robotic dexterous manipulation, especially when visual observations are unreliable or occluded. However, robustly aligning sparse, heterogeneous tactile measurements with dense visual representations remains a fundamental challenge. Most existing approaches require policies to learn cross-modal correspondences implicitly from limited demonstrations, without leveraging geometric priors. As a result, they are often data-inefficient and generalize poorly when visual observations are degraded. To address this limitation, we propose a framework that explicitly grounds physical contacts in the image domain. Using robot forward kinematics and camera calibration, we project tactile sensor locations directly onto the RGB image plane. We then render force-modulated Gaussian saliency maps to model spatial uncertainty arising from kinematic and calibration errors. By integrating these 2D spatial anchors through a zero-initialized conditioning architecture, our method injects physical contact priors into standard visual backbones while preserving pre-trained visual representations. We evaluate our method on six dexterous manipulation tasks in both simulation and the real world under severe visual occlusions. Real-world experiments show that explicit RGB-S grounding in the image domain improves occluded manipulation success rates by $26.7$ percentage points over the strongest implicit visuo-tactile baseline, suggesting improved spatial reasoning and robustness to occlusion. Project page: touch-as-saliency.github.io</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08765v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shengcheng Luo, Kefei Wu, Xiaoying Zhou, Wanlin Li, Ziyuan Jiao, Chenxi Xiao</dc:creator>
    </item>
    <item>
      <title>Understanding the Parameter Space Geometry of Transformers Encoding Boolean Functions</title>
      <link>https://arxiv.org/abs/2606.08768</link>
      <description>arXiv:2606.08768v1 Announce Type: new 
Abstract: Transformers consistently fail to learn certain simple functions that are provably expressible with specific parameter settings. This gap between learnability and expressivity is particularly prominent for sensitive functions -- functions whose output is likely to change if a single bit of the input is flipped -- for example, PARITY. While prior work has established that transformers exhibit a bias toward functions with low average sensitivity, the precise mechanism underlying this bias remains poorly understood. To shed light on this phenomenon, we study the geometry of transformers' parameter space. We show that sensitive functions -- even when representable -- occupy a vanishingly small region that random initialization is very likely to miss. Specifically, we shift the focus from average sensitivity to the full sensitivity profile -- the distribution of sensitivity values across all inputs -- and prove that randomly initialized transformers almost surely compute functions which have low-sensitivity strings. Consequently, any function that lacks such strings is provably unlearnable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08768v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Blanka Köver, Alexandra Butoi, Anej Svete, Michael Hahn, Ryan Cotterell</dc:creator>
    </item>
    <item>
      <title>RadOT-Eval: Auditable Structured-Evidence Transport for Radiology Report Evaluation</title>
      <link>https://arxiv.org/abs/2606.08769</link>
      <description>arXiv:2606.08769v1 Announce Type: new 
Abstract: Automatic evaluation is critical for high-stakes text generation, where errors often involve omitted findings, hallucinated content, polarity reversals, location changes, uncertainty mismatches, and temporal-comparison errors rather than low surface similarity alone. Radiology report generation provides a challenging test case because generated reports must preserve structured clinical evidence across sources. We present RadOT-Eval, an interpretable structured-evidence optimal transport framework for offline auditing of radiology report generation. RadOT-Eval decomposes reference and candidate reports into attribute-structured clinical evidence units, aligns corresponding evidence using entropy-regularized optimal transport, and uses clinically meaningful side-channel discrepancies in a monotone risk model to predict error burden. All transport, feature, and readout choices are selected using the ReXVal dataset, and the frozen system is evaluated on the independent RadEvalX dataset. RadOT-Eval achieves Spearman correlations of 0.715, 0.548, and 0.399 with total, clinically significant, and clinically insignificant annotated error burden, respectively, yielding higher point estimates than standard evaluation metrics and the open-source large language model (LLM)-based evaluator GREEN-radllama2-7B. In a frozen auxiliary corruption-sensitivity stress test on ReXErr-v1, RadOT-Eval achieves 0.768 AUROC and a 0.990 corrupted-greater-than-clean paired win rate. These results show that structured evidence transport provides an auditable, rank-oriented evaluation tool for high-stakes generated clinical text under ReXVal-only model selection and frozen RadEvalX testing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08769v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Weixin Liu, Juming Xiong, Yang Li, Qingyuan Song, Susannah Rose, Murat Kantarcioglu, Bradley Malin, Zhijun Yin</dc:creator>
    </item>
    <item>
      <title>TeamHerald@CHIPSAL 2026: Hate Speech Detection and Sentiment Analysis of Nepali Memes using Transformer-based Architectures and Ensemble Learning</title>
      <link>https://arxiv.org/abs/2606.08770</link>
      <description>arXiv:2606.08770v1 Announce Type: new 
Abstract: The analysis of internet memes in the Nepali language is complicated by frequent code-mixing and a lack of established baseline resources. While memes inherently combine visual and textual elements, this study focuses on a text-centric approach by extracting embedded text using an OCR layer and modeling it with Transformer-based architectures. We evaluate six distinct models and investigate the comparative effectiveness of Hard and Soft Voting ensemble strategies across two tasks: binary hate speech detection and three-class sentiment analysis. Experimental results show that a standalone decoder-only model achieved the highest performance for binary classification, whereas the Soft Voting ensemble performed best for the multi-class sentiment task, yielding a 15.8% relative improvement in Macro F1-score over the strongest standalone baseline. These findings suggest that ensemble strategies behave differently across binary and multi-class tasks, highlighting the importance of selecting aggregation methods suited to the classification objective.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08770v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ashish Acharya, Anish Khatiwada, Rohit Khadka, Pragya Aryal</dc:creator>
    </item>
    <item>
      <title>Non-Uniform Codebook Design for Optical IRS-Assisted VLC Systems</title>
      <link>https://arxiv.org/abs/2606.08774</link>
      <description>arXiv:2606.08774v1 Announce Type: new 
Abstract: Optical intelligent reflecting surfaces (OIRS) can improve the coverage of indoor visible light communication (VLC) systems; however, practical deployment requires a finite offline codebook to avoid repeated real-time optimisation of mirror orientations. A uniform codebook with fixed angular steps does not provide uniform coverage on the user plane, because the mapping from steering angles to reflection locations on the user plane is nonlinear. To address this problem, this paper proposes a geometric-optics-based non-uniform codebook design for OIRS-assisted VLC systems. The proposed method constructs an individual codebook for each IRS element according to its geometric position, so that the reflected beams are distributed more uniformly over the user plane. The codebook accuracy is evaluated using the Frobenius norm of the channel error matrix. Simulation results show that the proposed design provides more uniform spatial mapping with fewer codewords than the uniform codebook, and that the sweep-angle resolution has a stronger effect on the codebook accuracy than the tilt-angle resolution.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08774v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Rashid Iqbal, Dimitrios Bozanis, Dimitrios Tyrovolas, Christos K. Liaskos, Muhammad Ali Imran, George K. Karagiannidis, Hanaa Abumarshoud</dc:creator>
    </item>
    <item>
      <title>Unifying Object-Centric World Models and Diffusion Policy: A Hierarchical Framework for Multi-Stage Robotic Tasks</title>
      <link>https://arxiv.org/abs/2606.08775</link>
      <description>arXiv:2606.08775v1 Announce Type: new 
Abstract: Visual world models have shown great potential in learning complex system dynamics. Recent advancements leverage these models as transition functions within Model Predictive Control (MPC) frameworks to solve various control tasks. When applied to robotics, however, they are limited to single-stage tasks such as reaching or grasping, and struggle with multi-stage ones that demand complex sequential planning. In this work, we introduce WorldDP, a world model framework designed for multi-stage robotic manipulation. Our hierarchical approach utilizes a high-level world model as a transition function to optimize for feasible subgoals during runtime, which are subsequently reached by a low-level Diffusion Policy. To further aid in learning dynamics and planning, we incorporate object-centric representations that decouple environmental entities and enable us to plan sequentially with respect to each. Evaluated across several robotics benchmarks, WorldDP consistently outperforms existing baselines, validating that coupling the world model's physically grounded planning with diffusion policy's efficient execution yields superior multi-stage performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08775v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Raktim Gautam Goswami, Prashanth Krishnamurthy, Yann LeCun, Farshad Khorrami</dc:creator>
    </item>
    <item>
      <title>How Many Counterfactuals Does It Take? Probing VLM Hallucinations Through Circuits and Causal Effects</title>
      <link>https://arxiv.org/abs/2606.08777</link>
      <description>arXiv:2606.08777v1 Announce Type: new 
Abstract: Visual Language Models (VLMs) are known to produce hallucinated predictions that are not grounded in visual evidence, yet existing approaches lack a principled understanding of how robust such predictions are under counterfactual perturbations. In this work, we study the sample complexity of counterfactual robustness for hallucinated outputs in VLMs. We define a causal influence metric based on log-probability differences between factual, counterfactual, and activation-patched runs, and use it to characterize the stability of hallucinated predictions. By leveraging circuit discovery techniques (CD-T), we identify model components responsible for these predictions and track their activation differences across counterfactual samples. We then derive empirical bounds on the minimum number of counterfactual samples m required to reliably detect instability in hallucinated outputs, using concentration inequalities and variance estimates of the causal influence distribution.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08777v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Abhivansh Gupta, Simardeep Singh, Advika Sinha, Shreyansh Modi, Akshat Tomar</dc:creator>
    </item>
    <item>
      <title>Reformulate LLM Reinforcement Learning for Efficient Training under Black-box Discrepancy</title>
      <link>https://arxiv.org/abs/2606.08779</link>
      <description>arXiv:2606.08779v1 Announce Type: new 
Abstract: Reinforcement Learning (RL) has emerged as a pivotal post-training paradigm, yet it frequently suffers from unpredictable suboptimal performance or even training collapse. Recent findings attribute these failures to a hidden train-inference discrepancy (or mismatch), stemming from the disparate underlying engines and architecture. We find that the training policy can actively self-correct such a discrepancy when provided with an appropriate learning signal. We further empirically identify a discrepancy tolerance region: within this region, aggressively narrowing the discrepancy can suppress policy exploration and reduce learning efficiency, whereas outside this region, reducing excessive discrepancy improves optimization consistency and raises the achievable local performance ceiling. Based on these findings, we formulate this problem as a Discrepancy-Constrained Markov Decision Process (DCMDP), where reward maximization is coupled with a constraint that aligns training-inference behavior. To adaptively balance performance improvement and discrepancy control, we introduce a Lagrangian relaxation mechanism that dynamically adjusts the relative weight of the two objectives according to the current degree of discrepancy violation. This enables stable dual-objective optimization: the policy is allowed to explore freely within the tolerance region, while being guided back when the discrepancy exceeds the safe boundary. Empirically, DCMDP significantly improves the performance of an 8B dense model (Qwen-3-8b) and a 30B Mixture-of-Experts model (Qwen-3-30bA3b), and enables a heterogeneous training paradigm, where LLMs can be optimized in a high-fidelity training setup while being explicitly aligned for low-cost, resource-constrained inference deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08779v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiashun Liu, Runze Liu, Xu Wan, Jing Liang, Hongyao Tang, Ling Pan</dc:creator>
    </item>
    <item>
      <title>Beyond Consistency: Preserving Temporal Structure in Zero-Shot Video Editing</title>
      <link>https://arxiv.org/abs/2606.08780</link>
      <description>arXiv:2606.08780v1 Announce Type: new 
Abstract: Existing zero-shot video editing methods rely on pre-trained diffusion models, successfully achieving spatial control and basic temporal consistency but fundamentally failing to preserve the video's original temporal structure. This distinction is critical: temporal consistency ensures visual smoothness, but temporal structure dictates the video's high-level narrative, rhythm, and semantic flow. Without this preservation, the edited output, especially for long videos with complex semantic variations, becomes narratively incoherent and semantically ambiguous. To address this limitation, we introduce a novel zero-shot editing approach that, for the first time, explicitly focuses on preserving the source video's temporal structure. We achieve this by adaptively partitioning the video into semantically distinct clips based on feature similarity and selecting a representative anchor frame for each clip. To enhance both intra-clip fidelity and computational efficiency, we design a clip-adaptive token merging strategy which leverages the anchor's semantic dominance to stabilize the editing. Furthermore, we employ an alternating combination strategy that ensures seamless inter-clip transitions while maintaining semantic distinction. Extensive experiments demonstrate that our method achieves state-of-the-art results, successfully balancing the preservation of original temporal structure with computational efficiency, and setting a new benchmark for zero-shot video editing fidelity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08780v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Deyin Liu, Yisheng Ding, Zhe Jin, Xiatian Zhu, Anjan Dutta, Lin Wu</dc:creator>
    </item>
    <item>
      <title>DeepMine-Mamba: Mitigating Information Dilution in Mamba-Based State Space Models for Document Image Binarization</title>
      <link>https://arxiv.org/abs/2606.08781</link>
      <description>arXiv:2606.08781v1 Announce Type: new 
Abstract: Document image binarization aims to separate foreground text from degraded backgrounds while preserving thin, broken, and low-contrast strokes. Although deep learning methods have improved binarization performance, most existing approaches rely on convolutional, transformer-based, or generative architectures, while Mamba-based state space models remain largely unexplored for this task. In this work, we investigate Mamba-based feature propagation and observe that direct state-space propagation may dilute weak foreground cues during long-range modeling, especially faint ink traces, fragmented characters, and boundary-sensitive stroke details. To address this problem, we propose DeepMine-Mamba, a Mamba-based binarization framework equipped with a novel Anti-Dilution Gate that estimates propagation-induced feature changes and selectively restores stroke-sensitive local responses while suppressing unnecessary background enhancement. Experiments on DIBCO/H-DIBCO benchmarks under a strict leave-one-year-out protocol show that DeepMine-Mamba achieves competitive overall performance, with strong average FM and Fps across benchmark years. Ablation results further demonstrate that the Anti-Dilution Gate improves stroke preservation and reduces perceptually significant binarization errors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08781v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sheng-Wei Chan, Yung-Che Wang, Hsin-Jui Pan, Chia-Min Lin, Jen-Shiun Chiang</dc:creator>
    </item>
    <item>
      <title>MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training</title>
      <link>https://arxiv.org/abs/2606.08788</link>
      <description>arXiv:2606.08788v1 Announce Type: new 
Abstract: Representation alignment with pretrained vision models has recently shown strong potential for accelerating diffusion transformer training. By aligning intermediate diffusion features with clean-image representations from self-supervised vision encoders, existing methods improve convergence and generation quality. However, such alignment also introduces a non-trivial constraint: diffusion models operate on noisy inputs whose usable information varies across timesteps, while the reference features are extracted from clean images. In this paper, we revisit this mismatch from a token-level perspective. We find that, under full-token representation alignment, tokens with large alignment-gradient norms exhibit a stable spatial preference, suggesting that the alignment objective does not affect all tokens uniformly and may encourage the model to rely on the complete set of clean-image tokens. To address this issue, we propose MaskAlign, a token-subset representation alignment method that applies alignment to randomly sampled token subsets during training. By exposing the model to different token subsets across iterations, MaskAlign reduces the dependence of representation alignment on the complete token set and encourages alignment behavior that is more stable under token-subset perturbations. To mitigate the information loss caused by directly dropping tokens, we further introduce a lightweight pre-mask token mixing block that shares information across tokens before masking.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08788v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lianyu Pang, Tianlin Pan, Cheng Da, Changqian Yu, Huan Yang, Kun Gai, Song Guo, Wenhan Luo</dc:creator>
    </item>
    <item>
      <title>RAILS: Verification-Native Clearing For Agentic Commerce</title>
      <link>https://arxiv.org/abs/2606.08790</link>
      <description>arXiv:2606.08790v1 Announce Type: new 
Abstract: Autonomous agents negotiate, purchase, deploy code, and move funds, but no neutral mechanism determines whether they
  met their delegated obligation, who is responsible when they did not, or which settlement action follows. This is the
  agentic clearing problem. Tool protocols (MCP), inter-agent communication (A2A), payment rails (x402), mandate and
  network agent protocols (AP2, Visa, Mastercard), and settlement-risk standards each assume that determination and none
  produce it.
  Clearing is the missing primitive. Payment is not clearing. Authorization is not clearing. LLM-as-judge evaluation is
  not clearing. Settlement-risk escrow is not clearing: it consumes clearing decisions.
  RAILS (Real-Time Agent Integrity &amp; Ledger Settlement) is the integrity and clearing layer for agentic commerce,
  spanning a per-output reliability score, a published reliability record, and a clearing function that consumes them.
  The clearing protocol at its core closes that gap. Seven primitives (Obligation Object, Evidence Envelope,
  Verification Mesh, Clearing Decision, Settlement Instruction, Clearing Passport, Finality Rules), bound by a formal
  model of admissibility-graded verification, together yield a soundness property: no financially material settlement is
  supported by evidence below the obligation's admissibility floor. The property is falsifiable against the spec. We
  are not aware of a prior agent-commerce verification mechanism that states a property of this kind. The approaches
  nearest to it emit a pass, a delivery guarantee, a bare score, or an equilibrium.
  This paper specifies that clearing protocol.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08790v1</guid>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Adrian de Valois-Franklin, Alex Bogdan</dc:creator>
    </item>
    <item>
      <title>The Amplifying Mirror: Locating and Steering the Partisan Direction inside a Large Language Model</title>
      <link>https://arxiv.org/abs/2606.08792</link>
      <description>arXiv:2606.08792v1 Announce Type: new 
Abstract: Large language models are rapidly replacing search engines as the primary interface between people and information. Unlike search engines, which retrieve existing content, LLMs generate novel text shaped by internal representations learned during training. Here we show that partisan political identity is encoded in the model's activation space, and that this direction directly shapes generation. Using 190,491 tweets from sitting members of the U.S. Congress as labeled training data, we train linear probes on the hidden states of the Llama 3.1 8B Instruct model. We identify a single geometric axis at layer 18 that separates Republican from Democratic text with an AUC of 0.945 and a Cohen's d of 1.94, and use sparse autoencoders to decompose that axis into interpretable partisan features. Causally intervening along this axis, ablating or amplifying the partisan component mid-generation, produces systematic shifts in the model's output. We witness stance reversals, register shifting, and structured fabrications of authority. Our results demonstrate that partisan bias in language models is not a vague emergent property but a learned geometric feature that can be precisely located and steered. Partisan bias is not a bug to be patched, but a structural property of how these models encode information about their users. As LLMs displace search engines as the interface to knowledge, understanding that product design (and its consequences) will be essential for navigating the legal, social, and political transitions from an information ecosystem that is curated to one that is generated.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08792v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wendy K. Tam</dc:creator>
    </item>
    <item>
      <title>AI-Augmented Closed-Loop Quality Engineering: A Reference Architecture for Continuous Software Quality Intelligence</title>
      <link>https://arxiv.org/abs/2606.08793</link>
      <description>arXiv:2606.08793v1 Announce Type: new 
Abstract: Software quality engineering remains challenging because of disjointed processes across requirements, testing, and production, which hinders applying quality strategies across consecutive releases. Existing approaches tend to be fixed-model or single-optimization approaches and lack production feedback learning mechanisms. This paper proposes a closed-loop reference architecture for continuous software quality intelligence with AI enhancements. The model integrates requirement feature mining, risk-based test prioritization, defect prediction, and production incident analysis as elements of a feedback-based pipeline. A limited feedback learning model is introduced that is used to propagate the production signal, based on defect severity and incident impact, to the following release to ensure stability, and the time. The method is evaluated using a semi-synthetic test dataset of 4,500 requirements, 27,049 test cases, 13,089 defects and 7,841 incidents across six release cycles. The experimental results show that the proposed system reduces defect leakage from 0.19 to 0.13, increases detection effectiveness from 0.72 to 0.84, and shortens test execution by up to 35 percent compared to the non-adaptive baselines. The changes are stable from release to release. The findings indicate that integrating feedback-based learning into a closed-loop architecture can continually improve the quality process, offering a practical foundation for adaptive software quality engineering.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08793v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dimple Bajaj</dc:creator>
    </item>
    <item>
      <title>PairWise Image Finder: An Open-source Tool for Finding Visually Aligned Street-Level Image Pairs for Urban Perception Studies</title>
      <link>https://arxiv.org/abs/2606.08795</link>
      <description>arXiv:2606.08795v1 Announce Type: new 
Abstract: Change detection and scene recognition techniques have been widely applied to Street View Imagery (SVI) to understand changes in scenes across the years. However, metadata alone is often insufficient to reliably find visually aligned image pairs. This study introduces the PairWise image finder, a tool that integrates feature detection and matching, supported by semantic segmentation masks to quantify the visual alignment of two images of varying time periods. The tool outputs the share of matched key features, the matched feature distance and coverage, and the alignment of semantic masks, which enables the user to filter image pairs depending on the alignment quality and use case. The visually aligned pairs derived from the tool can be used to accurately study explicit longitudinal change and help reduce manual effort for perception studies. The usability of the tool is demonstrated through a comparison of longitudinal changes, highlighting the importance of perspective when quantifying changes. The proposed method provides a scalable and open tool for researchers and stakeholders to find high-quality image pairs for urban analysis, perception and related applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08795v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jussi Torkko</dc:creator>
    </item>
    <item>
      <title>A Non-Overlapping Schwarz Hybrid Finite Element-Neural Operator Framework for Solid Mechanics on Irregular Domains</title>
      <link>https://arxiv.org/abs/2606.08796</link>
      <description>arXiv:2606.08796v1 Announce Type: new 
Abstract: Finite element (FE) methods are the benchmark for solid mechanics simulations, yet their computational cost becomes prohibitive for problems with localised nonlinearities, fine-scale features, or long-time dynamic evolution. In our earlier FE-neural operator (FE-NO) hybrid framework [1], physics-informed deep operator networks were coupled with FE solvers through overlapping domain decomposition with Dirichlet-Dirichlet interface exchange, accelerating intensive subdomains while preserving FE fidelity elsewhere. Two limitations remained: the overlapping formulation required redundant interface computations that increased inner Schwarz iteration counts, and the convolutional feature extractor restricted the NO subdomain to structured grids, precluding irregular geometries. A non-overlapping Schwarz alternating method with Neumann-Dirichlet interface exchange replaces it, transmitting traction from the NO to FE rather than displacement. This eliminates the overlap layer and reduces inner Schwarz iterations while maintaining bounded error accumulation across all tested time horizons. For arbitrarily shaped subdomains, a Point-DeepONet operates on unstructured FE point clouds without interpolation, extending it to non-convex and irregular geometries. Strain and stress operators are derived analytically from the displacement operators via kinematic equations, rather than as independent networks, reducing trainable parameter sets while enforcing mechanical consistency by construction. The framework is validated on three benchmarks: static linear elasticity, quasi-static hyperelasticity, and elastodynamics with regular and irregular geometries. These results establish a non-overlapping FE-NO coupling paradigm that is geometry-flexible, parameter-efficient, and convergence-stable, providing a pathway for hybrid physics-based and operator-learning solvers in large-scale dynamic solid mechanics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08796v1</guid>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wei Wang, Abhinav Gupta, Haihui Ruan, Somdatta Goswami</dc:creator>
    </item>
    <item>
      <title>Scaling Decision-Focused Learning to Large Problems with Lagrangian Decomposition</title>
      <link>https://arxiv.org/abs/2606.08797</link>
      <description>arXiv:2606.08797v1 Announce Type: new 
Abstract: Decision-focused learning has shown great promise for addressing predict-then-optimize problems, particularly in the presence of under-specified models. However, its practical deployment is often hindered by high computational costs and limited scalability, as it requires solving a constrained optimization problem for each training instance at every iteration. To address these challenges, we propose a novel framework that incorporates Lagrangian decomposition into the decision-focused learning paradigm. Specifically, we introduce a new surrogate objective along with two loss functions for evaluating and training the underlying prediction model. We further propose two variants of our approach, which offer different trade-offs between computational efficiency and solution quality. Our framework can be seamlessly integrated with standard decision-focused learning methods, including Smart Predict-then-Optimize (SPO+) and Implicit Maximum Likelihood Estimation (IMLE). Through experiments on two standard benchmarks, the multi-dimensional knapsack problem and quadratic portfolio optimization, we demonstrate that our approach achieves competitive performance while remaining amenable to parallelization. In particular, it consistently outperforms traditional decision-focused learning methods on large-scale instances, involving up to eight times more variables than those typically considered in related work. The implementation is available at https://github.com/corail-research/DFL-LD.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08797v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Stéphane Eilles-Chan Way, Hugo Percot, Quentin Cappart, Tias Guns, Louis-Martin Rousseau</dc:creator>
    </item>
    <item>
      <title>Bridging Expert Knowledge and Automated Feature Engineering via Self-Evolution</title>
      <link>https://arxiv.org/abs/2606.08800</link>
      <description>arXiv:2606.08800v1 Announce Type: new 
Abstract: In high-stakes settings such as brand compliance, clinical care, and content moderation, machine learning cannot be deployed as opaque oracles: practitioners inspect the features driving model decisions, and models must leverage the expert documentation governing these domains. In practice, the data arrives as unstructured content, and features extracted from it must be interpretable, discriminative, and aligned with what experts consider important. Existing methods fall short: they target tabular inputs, lack demonstrated expert alignment, and cannot operationalize qualitative criteria such as 'maintain professional tone' into precise features. We present FEST (Feature Engineering with Self-evolving Trees), combining dual-stream feature generation (semantic and deterministic), semantic deduplication, and tree-guided iterative evolution to discover auditable features from raw text and images. FEST leads in 17 of 20 classifier-task combinations across brand classification, content authenticity detection, and stress detection, with a mean gain of 4.2 pp over the strongest baseline across five classifiers. An LLM-as-judge evaluation shows FEST achieves 60-80% coverage of expert-designed brand features at strict semantic-alignment thresholds, corroborated by a human expert study rating features highly on relevance, clarity, and actionability. When seeded with expert guidelines, FEST refines qualitative criteria into operational features, improving accuracy by 6-12 pp on average across brands. To enable systematic evaluation of expert alignment in automated feature engineering, we release BrandGuide, the first dataset pairing expert-designed features with 1M+ assets across 2,683 brands. By grounding feature engineering in expert knowledge, FEST opens a practical pathway for interpretable ML in domains demanding human oversight.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08800v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Varun Khurana, Vijval Ekbote, Vashu Chauhan, Yaman Kumar Singla, Rajiv Ratn Shah, Balaji Krishnamurthy</dc:creator>
    </item>
    <item>
      <title>Active Flow Expansion for Out-of-Distribution Discovery: from Theory to Molecules</title>
      <link>https://arxiv.org/abs/2606.08802</link>
      <description>arXiv:2606.08802v1 Announce Type: new 
Abstract: Standard flow and diffusion pre-training matches the distribution of available data (e.g., molecules), which often covers only a small fraction of the valid design space. In generative discovery, however, one aims to sample valid new-to-nature designs, which are assigned negligible probability under, and are thus inaccessible to, standard models fitted to the observed data. To overcome this limitation, we depart from data distribution matching and view a generative model through its generable set: the region it covers with non-negligible probability. This allows us to introduce a new learning principle for out-of-distribution flow modeling: enlarging a model's generable set to increase coverage of the valid design space. We propose Active Flow Expansion (ActFlow), a continued pre-training method that employs verifier feedback to expand a pre-trained model over new valid regions by iteratively adapting to synthetic data generated through active exploration in the learned flow representation. Theoretically, we establish, to our knowledge, first-of-their-kind statistical learning guarantees for out-of-distribution flow modeling, analyzing generable set expansion as a local-to-global reachability process over a learned representation. Empirically, we assess ActFlow with suitable out-of-distribution generative modeling metrics across small organic molecules, mid-sized drug-like molecules, therapeutic peptides, and protein sequence design tasks. Results show that ActFlow expands valid coverage far beyond the region modeled by the initial pre-trained model, significantly outperforming widely adopted synthetic flow pre-training methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08802v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Riccardo De Santi, Bruce Lee, Cristian Perez Jensen, Kimon Protopapas, Sophia Tang, Cheng-Hao Liu, Pranam Chatterjee, Yisong Yue, Andreas Krause</dc:creator>
    </item>
    <item>
      <title>Some Essential Constructive Foundations for Systems and Control</title>
      <link>https://arxiv.org/abs/2606.08803</link>
      <description>arXiv:2606.08803v1 Announce Type: new 
Abstract: This work develops several constructive foundations for systems and control within Bishop-style constructive mathematics. For an engineer, the guiding principle is that an object claimed to exist, such as a trajectory, an optimal control law, a selector, or a viable solution, should come with finite data and an operation computing approximations to any prescribed precision. The style remains close to classical analysis, but existential statements are organized so that their computational content is visible. The paper begins with elementary geometric data in finite-dimensional Euclidean spaces: blocks, multiblocks, representable sets, regular functions, and certified integrals. This set-first integration route is meant to complement, rather than replace, abstract constructive integration theories such as Daniell-type or integration-space approaches. The developed apparatus is then applied to a constructive functional extremum-value theorem, selector extraction for multifunctions, Filippov-type and viable solutions of differential inclusions, regular probability densities, controlled Markov chains, and empirical density certificates. A short account of resolvent projectors and linear stability is included for completeness.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08803v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>math.DS</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Pavel Osinenko</dc:creator>
    </item>
    <item>
      <title>Q-Delta: Beyond Key-Value Associative State Evolution</title>
      <link>https://arxiv.org/abs/2606.08804</link>
      <description>arXiv:2606.08804v1 Announce Type: new 
Abstract: Linear attention reformulates sequence modeling as recurrent state evolution, enabling efficient linear-time inference. Under the key-value associative paradigm, existing approaches restrict the role of the query to the readout operation, decoupling it from state evolution. We show that query-conditioned state readout induces a structured value prediction over accumulated memory that complements key-based retrieval. Based on this insight, we propose Q-Delta, a query-aware delta rule that integrates mixed key-query prediction errors into state evolution, enabling jointly corrective dynamics while preserving delta-rule efficiency. We establish stability guarantees for the resulting dynamics and derive a hardware-efficient chunkwise-parallel formulation with a custom Triton implementation. Empirical results demonstrate stable optimization, competitive throughput, and consistent improvements over strong baselines on language modeling and long-context retrieval tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08804v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sumin Park, Seojin Kim, Noseong Park</dc:creator>
    </item>
    <item>
      <title>Governance Controls for AI-Generated Test Artifacts in Autonomous Software Testing</title>
      <link>https://arxiv.org/abs/2606.08806</link>
      <description>arXiv:2606.08806v1 Announce Type: new 
Abstract: Artificial Intelligence (AI) and Large Language Models (LLMs) are increasingly used in autonomous software testing; however, AI-generated test artifacts often suffer from hallucinations, compliance violations, security risks, and limited explainability. To enhance the reliability, transparency, and trustworthiness of AI-generated testing artifacts, this research introduces the concept of Governance-Aware Autonomous Testing Framework (GATF). The framework extends the autonomous testing lifecycle with governance validation, explainability analysis, probabilistic risk assessment, compliance monitoring, as well as audit governance. Experiments were performed with Defects4J and PROMISE software engineering datasets. The proposed framework successfully reduced the governance-related risks by 89.6% and demonstrated 94.3% accuracy in governance, 96.5% artifact reliability, 94.2% compliance accuracy, and 90.8% explainability performance. The results show that autonomous testing systems that are governance-aware can significantly enhance the reliability, transparency, and operational security of autonomous testing systems in comparison to conventional AI-based testing systems. The proposed architecture is scalable and reliable and provides a safe environment for software testing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08806v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dimple Bajaj, Deepak Khetan</dc:creator>
    </item>
    <item>
      <title>A Classroom Study of LLM-Generated Feedback Intervention in Introductory Programming</title>
      <link>https://arxiv.org/abs/2606.08807</link>
      <description>arXiv:2606.08807v1 Announce Type: new 
Abstract: Large language models (LLMs) are increasingly used to provide automated feedback in introductory programming courses, yet empirical evidence from authentic classroom deployments comparing different feedback modalities remains limited. In this work, we present a large-scale classroom study in which AI-generated feedback was deployed through a randomized protocol in an introductory Python programming course. Students received one of three feedback conditions on incorrect submissions: natural language hints, AI-generated failing test cases, or no AI feedback. We release the resulting dataset, ProgFeed, which captures 6,693 submissions from 215 consenting students across 17 labs, including feedback conditions, execution-based performance measures, and fine-grained temporal information. Using this data, we analyze learning trajectories, feedback quality, and submission behavior over repeated attempts. We find that natural language feedback is significantly associated with higher completion rates and faster convergence to correct solutions. Test case feedback, by contrast, exhibits heterogeneous effects that depend critically on feedback validity. Our results suggest that the form of AI-generated feedback matters, and that evaluating feedback quality -- not just its presence -- is essential for understanding its pedagogical impact.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08807v1</guid>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hasnain Heickal, Andrew Lan</dc:creator>
    </item>
    <item>
      <title>Energy Storage as a Multi-Use Asset: Applications Across the Power System</title>
      <link>https://arxiv.org/abs/2606.08808</link>
      <description>arXiv:2606.08808v1 Announce Type: new 
Abstract: The energy transition in power systems requires flexible assets to offset renewable generation variability across multiple time scales, while supporting the integration of renewables and the electrification of demand without requiring costly grid reinforcement. Energy storage occupies a unique position among these assets: depending on the technology, it can provide short-duration grid services at high ramping rates, such as frequency regulation and voltage support, longer-duration functions such as intra-day peak shaving, or inter-seasonal energy buffering. This multi-service character, combined with the declining costs of energy storage technologies (most notably that of battery energy storage systems), is central to the economic viability of storage investments. The value of a given installation depends strongly on its grid connection point and intended use case: an asset-coupled battery serving a consumer or generation plant faces a different service landscape, and therefore a different business case, than a network-coupled system operating as an independent grid resource. This paper presents a structured taxonomy of grid-connected energy storage applications, discusses the principal application domains, and describes the key challenges that must be addressed to integrate storage effectively into power systems. Services are discussed with special emphasis on the Swiss regulatory context. Finally, the STORE flagship project supported by the Swiss Innovation Agency (Innosuisse), where some of the critical challenges of energy storage integration in power grids are addressed, is introduced.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08808v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Fabrizio Sossan</dc:creator>
    </item>
    <item>
      <title>Continuous Language Diffusion as a Decoder-Interface Problem</title>
      <link>https://arxiv.org/abs/2606.08810</link>
      <description>arXiv:2606.08810v1 Announce Type: new 
Abstract: Gaussian-corrupted sentence embeddings have no direct linguistic interpretation, yet continuous diffusion language models can generate fluent text from them. We study this puzzle through Embedded Language Flows (ELF) and identify a decoder-basin mechanism: denoising succeeds when trajectories reach regions where the native decoder can read stable tokens. We introduce a diagnostic protocol for denoisability, semantic recoverability, order sensitivity, decoder compatibility, and trajectory reliability. It exposes failures hidden by scalar metrics: low mean-squared error can discard linguistic content, low perplexity can reflect low-entropy collapse, and clean latent reconstruction can coexist with a narrow decoder basin. A decoder-margin bound explains why token recovery depends on margin and local decoder sensitivity, not latent error alone. Auditing public ELF checkpoints reveals an interface phase diagram: early predictions are weakly readable, mid-trajectory disagreement marks a competition region, and late predictions enter a high-margin final-token basin. Once inside, token realization is surprisingly simple on generated ELF states: frozen T5 token-embedding lookup recovers $93$--$96\%$ of native decoder decisions, and a single linear readout reaches $97.9\%$ agreement at 32k samples, leaving about a 1.1 perplexity gap in a structured residual tail. A conservative margin gate exits $17$--$27\%$ earlier in denoising steps under an explicit diagnostic monitor. Boundary checks on LangFlow, BitstreamDiffusion, and the Continuous Latent Diffusion Language Model (Cola-DLM) show that the same interface questions remain meaningful when the state object and decoder change. Continuous and latent diffusion language models should therefore be evaluated as representation-decoder systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08810v1</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhicheng Du, Lan Ma</dc:creator>
    </item>
    <item>
      <title>Data Architectures and their Technical Requirements (DATER)</title>
      <link>https://arxiv.org/abs/2606.08811</link>
      <description>arXiv:2606.08811v1 Announce Type: new 
Abstract: Modern organizations generate and consume massive volumes of heterogeneous data at high speed. This requires the continuous development of new techniques for more efficient and reliable data management. Designing appropriate data architectures has therefore become a strategic necessity, as they shape how data is integrated, governed, and made available for analytics and decision-making. This paper introduces a conceptual framework - Data Architectures and their Technical Requirements (DATER) - to systematically describe and evaluate data architectures based on technical requirements. Six modern architectures are examined: data warehouse, (semantic) data lake, data lakehouse, data fabric, and data mesh. Each is analyzed by historical context, defining features, and conformance to DATER dimensions. The study supports researchers and practitioners in navigating architectural paradigms, clarifying overlaps, and highlighting strengths, limitations, and use-case suitability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08811v1</guid>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Sayed Hoseini, Christoph Quix, Stefan Decker</dc:creator>
    </item>
    <item>
      <title>Aperon Technical Report: Hierarchical No-Pointer Tangent-Local Search for High-Dimensional Approximate Nearest Neighbors</title>
      <link>https://arxiv.org/abs/2606.08813</link>
      <description>arXiv:2606.08813v1 Announce Type: new 
Abstract: We present HNTL (Hierarchical No-pointer Tangent-Local), the core vector indexing and candidate generation framework of the Aperon vector memory system. Proximity graphs (e.g., HNSW) incur a heavy pointer tax in memory overhead and induce irregular memory accesses that stall CPU pipelines. HNTL resolves this by partitioning the high-dimensional space into local, coherent grains, representing vectors as low-dimensional coordinates on local tangent spaces, and scanning them sequentially using a pointerless Block-SoA (Structure-of-Arrays) layout.
  On anisotropic manifold data (d=768, N=10,000), local PCA captures 96.3% of the variance, allowing HNTL to achieve a final Rerank Recall@10 of 1.0000 with a candidate pool size of only C=20 vectors. Hardware profiling via Apple kperf CPU Performance Monitoring Unit (PMU) counters demonstrates a 3.61x speedup (4.137 ns/vector vs. 14.951 ns/vector) for our NEON auto-vectorized C++ Block-SoA scan engine over standard pointer-chasing graph traversals, driven by a 3.59x IPC (Instructions Per Cycle) and near-zero L1/L2 data cache misses.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08813v1</guid>
      <category>cs.DC</category>
      <category>cs.DB</category>
      <category>cs.IR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yong Fu</dc:creator>
    </item>
    <item>
      <title>STAR: Rethinking MoE Routing as Structure-Aware Subspace Learning</title>
      <link>https://arxiv.org/abs/2606.08814</link>
      <description>arXiv:2606.08814v1 Announce Type: new 
Abstract: Mixture-of-Experts (MoE) scales model capacity efficiently by selectively routing inputs to a specialized subset of experts. However, input-expert specialization, the core motivation of MoE, critically depends on whether the router is actually aware of input structure. In practice, MoE routing is typically implemented as a shallow linear projection with limited awareness of input representation, which often leads to unstable routing. We propose STAR, a Structure-Aware Routing method that rethinks MoE routing as a subspace learning problem by augmenting standard learnable routing with an evolving principal subspace that tracks dominant input structure via the Generalized Hebbian Algorithm (GHA). By aligning routing decisions directly with input structure, STAR enables stable expert specialization. We evaluate STAR on a controlled synthetic setup and on large-scale language and vision tasks, where it consistently improves routing quality and downstream performance over strong MoE baselines. Moreover, optional test-time subspace updates further enhance routing robustness and generalization under input distribution shifts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08814v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sumin Park, Noseong Park</dc:creator>
    </item>
    <item>
      <title>Momentum for Reasoning: Dense Intrinsic Signals in Policy Optimization</title>
      <link>https://arxiv.org/abs/2606.08815</link>
      <description>arXiv:2606.08815v1 Announce Type: new 
Abstract: Reinforcement learning with verifiable rewards (RLVR) has emerged as a powerful paradigm for eliciting long-chain reasoning in large language models. However, existing methods based on Group Relative Policy Optimization (GRPO) rely on a binary outcome reward, which induces two structural failure modes: Zero-Advantage Collapse, in which all rollouts in a group share the same outcome and the gradient vanishes, and Hallucinated Certainty, in which the model becomes increasingly confident on incorrect rollouts late in training. We address both modes by densifying the reward with intrinsic signals computed entirely from the policy's own conditional probabilities, and propose ISPO (Intrinsic Signal Policy Optimization), which combines a sequence-level signal measuring how informative the thinking trajectory is for the final answer with a token-level directional reward whose hallucinated-certainty hinge penalizes confidently wrong predictions at critical decision tokens. Across three base models and five mathematical reasoning benchmarks, ISPO consistently outperforms competitive baselines, with the largest gains on the hardest benchmarks where zero-advantage collapse is most frequent, and training-dynamics diagnostics confirm that both failure modes are reduced.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08815v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hao Chen, Zhanming Shen, Liyao Li, Yanyu Chen, Xuhang Zhu, Xiaomeng Hu, Qi Zhang, Ru Peng, Xiaoyu Shen, Haobo Wang, Junbo Zhao</dc:creator>
    </item>
    <item>
      <title>Knowledge Graphs and Reasoning LLMs for Finding Simple Yet Effective Transcriptomic Perturbation Predictors</title>
      <link>https://arxiv.org/abs/2606.08816</link>
      <description>arXiv:2606.08816v1 Announce Type: new 
Abstract: Predicting the effect of an unseen gene knockout perturbation on transcriptomic gene expression remains a highly challenging problem for virtual cell models. Recent progress has been made by leveraging biological knowledge graphs to provide a notion of similar perturbation, allowing for improved extrapolation beyond the set of training perturbations. In this work, we demonstrate that the simplest model to leverage these assumptions - a K-nearest neighbour from the knowledge graph - achieves highly competitive performance on this task, and that this can be improved further using LLMs optimised via reinforcement learning (RL) for predictive performance. Specifically, we find that the K-nearest neighbour approach beats almost all methods on out-of-distribution perturbation prediction, and when a reasoning LLM is trained via RL to make changes to the neighbourhood, it obtains equivalent performance to current state of the art methods on the cell lines from Replogle et al. (2022). We also demonstrate that the RL training improves the LLM's performance on the downstream task of differential expression prediction, despite not being trained on this directly. Overall, these findings demonstrate the efficacy of knowledge graphs as model priors, and show early signs that RL can refine LLMs into generalizable tools for predicting complex biological responses.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08816v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jake Fawkes, Liam Hodgson, Jason Hartford</dc:creator>
    </item>
    <item>
      <title>Low-Variance Randomised Numerical Linear Algebra for Finite Element Simulation</title>
      <link>https://arxiv.org/abs/2606.08817</link>
      <description>arXiv:2606.08817v1 Announce Type: new 
Abstract: We present a low-variance randomised numerical linear algebra approach for multi-query finite element systems arising from parametric elliptic partial differential equations with applications to digital twins and online model calibration. The method relies on Galerkin subspace projection for reducing the dimensionality, and then combines parameter-oblivious leverage-score Bernoulli sampling with a control variates scheme to yield a reduced-variance `forward' sketch and an invertible `inverse' sketch that are then fused to a single efficient regularised estimator. Effectively, this reduces the computational cost in computing the projected system of equations while preserving the structure, stability, and accuracy of the underlying FEM formulation. We derive probabilistic bounds for the sketching error, invertibility, and estimator variance, and then validate the method on large-scale example problems. The results show that when the parameter fields do not vary too sharply, the synergy of control variates together with the sketch fusion can largely offset the loss incurred by the sub-optimal parameter-oblivious sampling. In this regime, our method achieves substantial savings in time, memory, and communication while maintaining accuracy levels that are acceptable for scientific simulation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08817v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>N. Polydorides, Y. Wu, . H. Noori, H. Vandierendonck, R. Woods</dc:creator>
    </item>
    <item>
      <title>Unstructured Mesh Tools for Fusion Energy System Design</title>
      <link>https://arxiv.org/abs/2606.08822</link>
      <description>arXiv:2606.08822v1 Announce Type: new 
Abstract: The execution of accurate simulations of fusion energy systems requires the appropriate representation of critical component geometries as well as the coupling of complex fusion physics codes with one another and with engineering analysis tools. This paper examines the challenges of creating simulation workflows that fully leverage existing fusion research codes while integrating them with commercial computer-aided engineering (CAE) software. Key areas addressed include: (a) the construction and meshing of analysis geometries taking full advantage of available geometric modeling and meshing technologies; (b) the effective coupling of fusion physics and engineering analysis codes; and (c) the support for simulation workflows that couple particle and continuum modeling methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08822v1</guid>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mark S. Shephard, Jacob S. Merson, Onkar Sahni, Cameron W. Smith, Usman Riaz, Fuad Hasan, Aditya Y. Joshi, Dhyanjyoti D. Nath, Abhiyan Paudel</dc:creator>
    </item>
    <item>
      <title>Syntax-driven Incremental Program Verification of Matching Logic Properties</title>
      <link>https://arxiv.org/abs/2606.08824</link>
      <description>arXiv:2606.08824v1 Announce Type: new 
Abstract: Incrementality is a fundamental design principle to master the complexity of large, long-lived software systems. This principle has been embraced by agile development processes and it lies at the base of continuous software evolution. A major challenge in this context is to incrementally re-verify the correctness of software artifacts after every change, focusing the verification efforts only on the parts affected by the change.
  We present an approach to the incremental verification of programs written in KernelC, annotated with properties expressed in matching logic. The approach is based on a syntactic-semantic framework that enables analyzing code chunks in isolation so that, after a change to a program fragment, only the part whose semantics is affected by the change is re-processed. This property is obtained by expressing the language syntax through an operator precedence grammar and by formalizing its semantics through a synthesized attribute schema.
  We have implemented our technique in a prototype tool and experimentally evaluated its effectiveness. The results show that our approach does not penalize the efficiency of formal verification and can outperform program re-verification after changes, depending on the presence and type of annotations, as well as the position of the change and the program structure.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08824v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Domenico Bianculli, Antonio Filieri, Carlo Ghezzi, Dino Mandrioli, Alessandro Maria Rizzi</dc:creator>
    </item>
    <item>
      <title>Classifying galaxies in the Galaxy10 DECals dataset using Inception and Residual CNNs</title>
      <link>https://arxiv.org/abs/2606.08826</link>
      <description>arXiv:2606.08826v1 Announce Type: new 
Abstract: Image data regarding galactic morphology is expected to increase in both quantity and quality over the coming years; thus it is important to explore which deep learning architectures adapted for image classification tasks are cost-effective. Residual and Inception networks are ideal for exploring classification convolutional neural networks (CNNs) due to their computational efficiency, achieved through techniques such as residual connections and parallelized inception modules, enabling deeper networks without excessively increasing computational complexity. In this work, we analyze the performance of ResNet101 and InceptionV4 on a spatially-augmented Galaxy10 DECals dataset. Retaining the ten-class classification of galaxies, we modify the image count of each class. We find that ResNet101 and InceptionV4 models achieved accuracies of $\sim$ 90%, comparable with reported performance in the literature. In terms of performance metrics, ResNet101 is superior to InceptionV4. Our results indicate that either of these CNN architectures could serve as a robust foundation for specialized pipelines for classification of galaxy images from upcoming surveys.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08826v1</guid>
      <category>cs.CV</category>
      <category>astro-ph.GA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>Proc. Samahang Pisika Pilipinas 42, SPP-2024-2E-05 (2024)</arxiv:journal_reference>
      <dc:creator>Lanz Anthonee A. Lagman, Prospero C. Naval Jr, Reinabelle C. Reyes</dc:creator>
    </item>
    <item>
      <title>Video2Sim2Real: Full-Stack Autonomous Dexterous Skill Acquisition from a Single Human Video</title>
      <link>https://arxiv.org/abs/2606.08828</link>
      <description>arXiv:2606.08828v1 Announce Type: new 
Abstract: Human manipulation videos are a convenient and intuitive source for robot learning. However, directly transferring human dexterity to robots remains challenging due to perception errors and the embodiment gap. To address this, we introduce Video2Sim2Real, a full-stack framework for autonomous skill acquisition from a single human manipulation video. Our framework first uses off-the-shelf foundation models to reconstruct a simulator-ready digital twin and extract robot and object motion priors. Rather than treating the extracted robot motion as a reliable reference throughout execution, our key idea is to recover and leverage the most fundamental sources of supervision from the demonstrated skill: we identify object-centric keyframes to optimize the corresponding robot configurations using object information from the simulator, and use these configurations as anchors that refine the robot motion such that it ultimately has the desired impact on the environment. To bridge the remaining sim-to-real gap, we introduce a sim-to-real strategy that decouples robustness to noisy and incomplete perception from variations in hand-object interaction dynamics. Specifically, we learn to recalibrate robot configurations from noisy real-world point clouds via IL, and leverage residual RL to perform local finger-level adaptations to ensure robust and effective interactions. Finally, a collision-aware motion planning module enables spatial generalization to novel object configurations. Across several everyday manipulation tasks, Video2Sim2Real improves simulated task success, safety, and trajectory coherence over numerous baselines, and achieves better sim-to-real transfer than existing techniques. These results demonstrate a promising path toward autonomous dexterous skill acquisition from human videos.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08828v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yunhai Han, Jianuo Qiu, Linhao Bai, Ziyu Xiao, Zihang Zeng, Yangcen Liu, Zhaodong Yang, Shalin Jain, Wenrui Ma, Jiaqi Fu, Yuqian Zheng, Manisha Natarajan, Muhammad Zubair Irshad, Kenneth Shaw, Matthew Gombolay, Zsolt Kira, Harish Ravichandar</dc:creator>
    </item>
    <item>
      <title>Flexible Coupler Antenna Enhanced Wireless Communication: Modeling and Coupler Position Optimization</title>
      <link>https://arxiv.org/abs/2606.08829</link>
      <description>arXiv:2606.08829v1 Announce Type: new 
Abstract: This paper proposes a novel flexible coupler antenna (FCA) that translates passive coupling elements around a fixed-position active antenna to reshape the induced currents on the passive elements for radiation. A new form of mechanical beamforming is achieved by moving only the passive coupling elements while keeping the active antenna stationary. The proposed design significantly reduces the antenna and radio-frequency (RF) chain costs of conventional active array beamforming with low mechanical control complexity and energy consumption. For the purpose of exposition, we consider a point-to-point communication system with one FCA at the transmitter and one fixed antenna at the receiver. Specifically, based on multi-port circuit theory, we establish both the line-of-sight (LoS) and multipath channel models and derive the mechanical beamforming weights of the passive couplers as functions of their positions. Then, we formulate a new problem to maximize the received signal-to-noise ratio (SNR) by optimizing the positions of passive couplers at the transmitter, subject to coupler movement and transmit power constraints. Solving the resulting problem is inherently difficult because coupled channel and mechanical beamforming create non-linearity in the objective function. To tackle this problem, we propose an efficient block-coordinate conditional gradient method to search for the best positions of all passive couplers by sequentially optimizing the position of each coupler with those of the other couplers fixed in an iterative manner. Simulation results demonstrate that the proposed system significantly outperforms benchmark schemes in terms of achievable rate, while using significantly fewer active antennas and RF chains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08829v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaodan Shao, Chuangye Shan, Yunlong Du, Junling Li, Rui Zhang, Cheng-Xiang Wang</dc:creator>
    </item>
    <item>
      <title>Inference-Time Conformal Reasoning with Valid Factuality Control for Large Language Models</title>
      <link>https://arxiv.org/abs/2606.08831</link>
      <description>arXiv:2606.08831v1 Announce Type: new 
Abstract: Large language models (LLMs) increasingly perform multi-step reasoning, where intermediate claims form implicit directed acyclic graphs whose node correctness is structurally conditioned on their ancestors. This makes factuality uncertainty structural, rather than a trivial accumulation of node-wise errors, and necessitates inference-time uncertainty quantification over the reasoning structure. While conformal prediction (CP) offers flexible user-specified factuality control, existing work remains post-hoc and cannot intervene during generation. To fill the gap between CP's flexibility and its post-hoc limitation, we propose an \emph{Inference-Time Conformal Reasoning (ITCR)} framework that integrates CP directly into reasoning graph generation. ITCR learns a structure-level factuality uncertainty function that aggregates claim-level factuality signals over reasoning graphs without complex modeling assumptions. We then design the non-conformity score based on graph-level factuality uncertainty and calibrate the conformal threshold to decide when to stop generation. We theoretically show such generation is nested, yielding valid coverage guarantees for factuality control. Experiments over multiple datasets and coverage objectives demonstrate empirically valid coverage. In downstream reasoning tasks, inference-time calibrated graphs yield more accurate generation than post-hoc pruned graphs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08831v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ting Wang, Yuanjie Shi, Yan Yan, Huan Zhang</dc:creator>
    </item>
    <item>
      <title>Instrumental convergence and power-seeking</title>
      <link>https://arxiv.org/abs/2606.08832</link>
      <description>arXiv:2606.08832v1 Announce Type: new 
Abstract: Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. I show how the argument from power-seeking rests on a strong version of a claim known as the instrumental convergence thesis. I explore leading defenses of the instrumental convergence thesis and argue that none establishes the thesis in a strong enough form to ground the argument from power-seeking. I discuss implications for longtermism, the governance of artificial intelligence, and the methodology of studying risks posed by artificial agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08832v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>David Thorstad</dc:creator>
    </item>
    <item>
      <title>CSFlow: Aligning Flow Matching with Human Contrast Sensitivity</title>
      <link>https://arxiv.org/abs/2606.08833</link>
      <description>arXiv:2606.08833v1 Announce Type: new 
Abstract: We introduce Contrast Sensitive Flow (CSFlow), a weighting scheme that connects the human eye's Contrast Sensitivity Function (CSF) to the iterative denoising steps of flow matching. Because real-world images concentrate signal at low spatial frequencies, these components reach high signal-to-noise ratio earlier during continuous diffusion than high-frequency components. When generating images with diffusion or flow matching models, this induces a soft autoregressive structure in Fourier space, where coarse image content stabilizes before fine detail. Meanwhile, the human visual system is unequally sensitive to spatial frequencies: very low and very high frequencies require significantly higher contrast to be perceived. We merge these observations for the first time through two contributions: (1) a metric that estimates which frequencies are generated at each reverse flow interval and (2) timestep weights obtained by aligning the frequencies generated at each noise level with human contrast sensitivity. We validate our contributions experimentally, showing that these weights can improve generative performance by lowering FID by 4.7%, increasing Inception Score by 2.2%, and improving GenEval scores by 2.5% using inference-only timestep modification or short fine-tuning. Qualitatively, we find that our CSFlow weights lead to better visual realism and a less cartoonish appearance of generated images.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08833v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Malgorzata Galinska, Bart Pogodzinski, Jan Eric Lenssen</dc:creator>
    </item>
    <item>
      <title>Adaptive Model Predictive Control of Nonlinear Generic Urban Air Mobility Using Linear Parameter-Varying Systems</title>
      <link>https://arxiv.org/abs/2606.08836</link>
      <description>arXiv:2606.08836v1 Announce Type: new 
Abstract: This paper presents an adaptive model predictive control (MPC) framework for nonlinear urban air mobility (UAM) vehicles operating across the full flight envelope. The proposed approach leverages a linear parameter-varying (LPV) representation to update the predictive model online, enabling accurate capture of strongly nonlinear and time-varying dynamics associated with distributed electric propulsion (DEP) eVTOL aircraft. To systematically address the high-dimensional and coupled nature of MPC tuning, a multi-objective evolutionary optimization strategy based on NSGA-II is employed, incorporating proper normalization of states and control inputs to ensure balanced weighting and meaningful exploration of the design space. The resulting controller explicitly accounts for actuator constraints and enables reconfigurable control allocation for fault-tolerant operation. The framework is evaluated in nonlinear simulations using NASA's Generic Urban Air Mobility (GUAM) model and benchmarked against a robust servomechanism linear quadratic regulator (RSLQR). Results demonstrate that the proposed adaptive MPC achieves improved trajectory tracking and enhanced robustness under both nominal conditions and actuator degradation scenarios, including partial motor failure, while maintaining constraint satisfaction throughout all flight regimes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08836v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tri Ngo</dc:creator>
    </item>
    <item>
      <title>Beyond Pass Rate: A Multilingual, Execution-Grounded Evaluation of Open Code LLMs</title>
      <link>https://arxiv.org/abs/2606.08840</link>
      <description>arXiv:2606.08840v1 Announce Type: new 
Abstract: Code generation models are typically compared using compact execution benchmarks and aggregate pass rates, but such summaries obscure how performance varies across programming languages, problem families, and failure modes. We present a large-scale, execution-grounded evaluation of 9 openly accessible LLMs specialized for coding on 2,707 free LeetCode problems across 12 programming languages. Our corpus contains 325,343 problem-model-language jobs, each linked to prompt metadata, extracted code, LeetCode execution outcomes, and static-analysis signals. The results show that current open models remain far from the human acceptance reference: the best model, Yi-Coder-9B-Chat, reaches 23.64% mean correctness, compared with a 57.2% human acceptance baseline. Rankings are also slice-dependent: Qwen2.5-Coder-14B-Instruct is strongest on hard problems and distinct-problem coverage, while Gemma-2-27B-IT achieves the highest all-language lint pass rate. Failure analysis shows that compile errors account for 63.25% of non-accepted best submissions, indicating that many failures occur before semantic correctness can be tested. Static quality further diverges from functional correctness. Together, these findings show that multilingual, artifact-preserving evaluation reveals tradeoffs hidden by single-language or single-metric leaderboards.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08840v1</guid>
      <category>cs.AI</category>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sayed Erfan Arefin</dc:creator>
    </item>
    <item>
      <title>ZIPP: Zero-shot Image Personalization from Personas</title>
      <link>https://arxiv.org/abs/2606.08841</link>
      <description>arXiv:2606.08841v1 Announce Type: new 
Abstract: Text-to-image diffusion models are increasingly deployed in open-ended creative contexts, yet their outputs remain impersonal, optimized for aggregate aesthetics rather than individual taste. Human preferences are pluralistic: one user favoring muted, nostalgic portraits may prefer vibrant street photography, while another gravitates toward dreamy film aesthetics. Existing methods require dense interaction histories or per-user fine-tuning, failing in cold-start settings and collapsing context-dependent preferences into a static representation. We introduce zero-shot image personalization from personas (ZIPP), which conditions image generation on natural-language personas (concise descriptors of a user's identity and aesthetic sensibilities) without any user-specific data or weight updates. ZIPP uses an LLM to rewrite prompts from the perspective of a given persona, steering diffusion models toward personalized outputs. To mine personas at scale, we train an inductive Graph Attention Network over a 22M-user Reddit interaction graph with dual contrastive objectives aligning graph structure with visual behavior, then verbalize learned representations into natural-language personas via an MLLM. We introduce ZIPBench, the first zero-shot personalization benchmark with 1.5K users, graph-mined personas, and 40K generated images. Across four benchmarks and 14 LLMs spanning five model families, persona conditioning yields consistent gains (13-20%), with frontier models benefiting most. In the few-shot setting, ZIPP matches or exceeds fine-tuned baselines trained on 100+ examples per user. ZIPP achieves the lowest preference distributional divergence (CMMD 0.16 vs. 0.55), and IPF-normalized demographic evaluation shows it substantially reduces subpopulation bias present in existing methods. Human evaluation confirms a 79% win rate over generic generation and 58-65% over all fine-tuned baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08841v1</guid>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Harini SI, Somesh Singh, Yaman Kumar Singla, David Doermann, Rajiv Ratn Shah</dc:creator>
    </item>
    <item>
      <title>From A to B to A: Palindromic Zero-Shot Voice Conversion with Non-Parallel Data</title>
      <link>https://arxiv.org/abs/2606.08843</link>
      <description>arXiv:2606.08843v1 Announce Type: new 
Abstract: We present a voice conversion (VC) framework that utilizes K-Nearest Neighbors (KNN) retrieval over WavLM representations to align non-parallel source and target speech, constructing synthetic training pairs for supervised learning. The retrieved segments serve as synthetic inputs, while real target audio provides ground-truth outputs, forming a synthetic-to-real training paradigm that naturally supports multilingual data without requiring parallel corpora or explicit alignment. To ensure consistent target-speaker identity, we incorporate a speaker loss derived from a pretrained speaker verification model. Experiments across multiple languages demonstrate that the proposed approach achieves high naturalness and strong speaker similarity, outperforming competitive VC baselines, despite being trained exclusively on English data. Samples can be accessed at: https://palindromic-vc.github.io.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08843v1</guid>
      <category>cs.SD</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Moshe Mandel, Shlomo E. Chazan</dc:creator>
    </item>
    <item>
      <title>Geometry-Aware Fisheye-LiDAR Fusion for Robust 3D Object Detection in Low-Overlap Setups</title>
      <link>https://arxiv.org/abs/2606.08844</link>
      <description>arXiv:2606.08844v1 Announce Type: new 
Abstract: As autonomous systems expand from capital-intensive robotaxis to cost-sensitive logistics, sensor configurations are increasingly optimized for coverage-per-cost. A prevalent sparse-view setup utilizes dual-fisheye cameras with a roof-mounted LiDAR, introducing severe geometric challenges: extreme radial distortion, minimal overlap, and misalignment between spherical projections and rectilinear grids. BEV fusion algorithms typically force image and point cloud modalities into unified Cartesian grids early in the pipeline, causing significant feature distortion and information loss for wide-view fisheye cameras. To address this, we propose a Geometry-Aware Hybrid Fusion (GA-HF) framework that explicitly accounts for fisheye geometry and BEV feature distortion, where fisheye features are lifted into a polar BEV grid via a Distortion-Aware Lift-Splat-Shoot (LSS) module to preserve native angular density, while LiDAR features are processed in native Cartesian space for metric fidelity of bounding box regression. To bridge these heterogeneous streams, we introduce a Dual-Attention Warping Correction module that applies spatial and channel attention to the warped camera features before fusion, explicitly suppressing artifacts in low-quality peripheral regions while enhancing high-quality semantic cues. GA-HF is evaluated on three benchmarks: KITTI-360, Dur360BEV, and Fisheye3DOD datasets. To the best of our knowledge, it is the first approach to explore LiDAR-fisheye camera fusion. On KITTI-360, GA-HF improves NDS by 4.2% over Cartesian baselines; on Dur360BEV, it surpasses both LiDAR-only and BEVFusion, while significantly reducing orientation error despite the geometric distortions; on Fisheye3DOD, it attains the highest detection score among all fusion methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08844v1</guid>
      <category>cs.CV</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiangzhong Liu, Xihao Wang, Hao Shen</dc:creator>
    </item>
    <item>
      <title>BLM-SGAN: Bidirectional Language Modeling for Semantic-Spatial Text-to-Image Generation</title>
      <link>https://arxiv.org/abs/2606.08847</link>
      <description>arXiv:2606.08847v1 Announce Type: new 
Abstract: Despite the success of image generation from text descriptions, it still faces challenges that are difficult to overcome in domains such as natural language processing (NLP) and computer vision (CV). Recent advancements in text-to-image (T2I) models, particularly those utilizing generative adversarial networks (GANs), have significantly improved the synthesis of realistic images across various domains. However, existing GAN-based T2I models still encounter key challenges, such as difficulty in capturing long-range dependencies, vanishing gradients, and the limitations of sequential processing. To address these issues, we introduce BLM-SGAN, a novel model that incorporates Bidirectional Language Modeling for Semantic-Spatial Text-to-Image Generation. BLM-SGAN leverages BERT's attention mechanisms to capture rich contextual information and efficiently manage extended sequences. Our model demonstrates state-of-the-art performance, with an Inception Score (IS) of 5.45 +/- 0.08, surpassing several competitive models such as SSA-GAN, DF-GAN, SD-GAN, and AttnGAN. BLM-SGAN effectively generates highly realistic images of birds from detailed text descriptions. The implementation code is available at: https://github.com/haidy-maher/BLM-SGAN-Text-to-Image-Generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08847v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <arxiv:DOI>10.1007/978-3-031-91351-8_5</arxiv:DOI>
      <arxiv:journal_reference>Advances on Intelligent Computing and Data Science II (ICACIn 2024), Lecture Notes on Data Engineering and Communications Technologies, vol. 254, Springer, Cham, 2025</arxiv:journal_reference>
      <dc:creator>Ahmed Abdelmoneim Mazrou, Haidy Maher El-Amir, Ali Hamdi</dc:creator>
    </item>
    <item>
      <title>A Resilience-as-a-Service assessment framework for coordinated disruption response in interdependent urban transit systems</title>
      <link>https://arxiv.org/abs/2606.08849</link>
      <description>arXiv:2606.08849v1 Announce Type: new 
Abstract: Urban public transport disruptions require rapid response strategies, yet existing studies rarely provide a decision support framework to compare alternative disruption response solutions using a common set of dynamic, passenger, operator, and environment oriented indicators. This paper proposes a KPI-driven, time-indexed framework to assess the resilience of disruption response solutions in urban transit systems. The framework combines an optimization model with a behavioral evaluation in agent-based simulation. It also underlines the secondary service degradation induced on helper lines when in-service vehicles are withdrawn to support the disrupted corridor. Rather than treating resilience as a single score, it evaluates complementary dimensions including vulnerability, adaptability, robustness, resilience loss, responsiveness, cost-based performance, emissions, and equity. The framework is implemented for the RER B transit line in the Ile-de-France (Paris) network. Results show that the coordinated strategy provides the most balanced resilience profile, combining high service continuity with lower total disruption cost than single mode alternatives, while also improving equity and maintaining competitive environmental performance. Sensitivity analysis further identifies the disruption conditions under which coordinated multimodal response is most valuable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08849v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Sara Jaber, S. M. Hassan Mahdavi, Neila Bhouri, Mostafa Ameli</dc:creator>
    </item>
    <item>
      <title>Intrinsic Selection and Particle Resampling for Inference-Time Scaling Beyond Domain Verifiability</title>
      <link>https://arxiv.org/abs/2606.08850</link>
      <description>arXiv:2606.08850v1 Announce Type: new 
Abstract: Inference-Time Scaling (ITS) has largely succeeded in verifiable domains like math and coding, where cheap verification enables scalable output selection. However, extending ITS to tasks prone to systematic failure - driven by faulty initial assumptions or unmet multidimensional constraints - typically relies on costly external solvers or brittle, model-based verifiers. Our key insight is that the intrinsic statistics of parallel sample sets, specifically length-adjusted tail entropy, provide a robust discriminative signal for solution quality without access to ground truth. Crucially, these statistics serve as a difficulty gate for adaptive compute allocation, dynamically routing problems across scaling regimes. First, Intrinsic Selection (iS) ranks candidates post-hoc, matching consensus-based algorithms across three domains and improving engineering design selection by 20% over pass@1 baselines. Second, Intrinsic Particle Filtering (iPF) generalizes this to step-level resampling, guiding generation toward high-confidence reasoning trajectories to improve pass@1 by 6.1 points on average on hard math problems. Finally, Particle Distillation (dPF) injects privileged guidance via early logit blending and KL-guided resampling, steering generation past systematic reasoning errors to satisfy expert rubrics, yielding up to 26.5% gains on complex clinical responses. Our pipeline applies seamlessly across broad-purpose, domain-specialized, and multimodal architectures, successfully extending ITS to open-ended domains without requiring trained reward models or exact ground-truth verification.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08850v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Giorgio Giannone, Mustafa Eyceoz, Shabana Baig, Shivchander Sudalairaj, Anna C. Doris, Faez Ahmed, Akash Srivastava, Kai Xu</dc:creator>
    </item>
    <item>
      <title>Enforcing Trust Accountability with Backward Propagation</title>
      <link>https://arxiv.org/abs/2606.08851</link>
      <description>arXiv:2606.08851v1 Announce Type: new 
Abstract: Trust and reputation management underpins reliable interactions in distributed networks, yet existing trust models rely solely on forward propagation of interaction-based trust signals. They lack robust mechanisms to enforce accountability for the propagated trust signals when negative interactions occur. In addition, such models often fail to initialize newly joined nodes with sparse interaction history, leading to the cold-start problem. In this paper, we propose RepuLink, a two-layer reputation model that couples an endorsement network with an interaction feedback network. RepuLink integrates two concurrent backward propagation mechanisms: Backward Endorsement Penalty Propagation (BEPP), which recursively penalizes endorsers of misbehaving nodes, and Backward Endorsement Reward Propagation (BERP), which rewards endorsers of well-performing nodes. Together, RepuLink enforces endorsement accountability and incentivizes positive behaviors, which form a positive interaction feedback loop. The endorsement layer further provides explainable, endorser-weighted trust initialization for newly joined nodes. Experiments on real-world datasets against representative trust propagation baselines demonstrate that RepuLink outperforms across four evaluation metrics in both interaction-only and full two-layer settings, while preserving comparable efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08851v1</guid>
      <category>cs.SI</category>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3770855.3817611</arxiv:DOI>
      <dc:creator>Wenbo Wu, George Konstantinidis</dc:creator>
    </item>
    <item>
      <title>Parallel SMT Solving via Dynamic Partitioning, Core-Guided Pruning, and Online Backbone Detection</title>
      <link>https://arxiv.org/abs/2606.08852</link>
      <description>arXiv:2606.08852v1 Announce Type: new 
Abstract: Exploiting parallelism in modern CPU architectures remains a longstanding challenge in optimizing SMT solvers. We introduce a novel parallel framework that dynamically builds a binary partition tree of the search space by sampling from workers' VSIDS statistics during solving. We leverage the full power of core-based CDCL-style pruning to continuously shrink the partition tree. We further optimize our architecture by incorporating online backbone detection into worker threads, as well as a terminate-on-demand mechanism to eagerly eliminate work on pruned subproblems. The resulting algorithm is highly generalizable and scales effectively with available resources. We implement our approach in the Z3 SMT solver and demonstrate that it outperforms both sequential Z3 and existing state-of-the-art parallel frameworks on challenging benchmarks from six logics in the SMT-COMP 2025 Parallel Track.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08852v1</guid>
      <category>cs.LO</category>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ilana Shapiro, Sorin Lerner, Nikolaj Bjørner</dc:creator>
    </item>
    <item>
      <title>sGPO: Trading Inference FLOPs for Training Efficiency in RLVR</title>
      <link>https://arxiv.org/abs/2606.08854</link>
      <description>arXiv:2606.08854v1 Announce Type: new 
Abstract: Standard Reinforcement Learning with Verifiable Rewards (RLVR) training allocates a fixed rollout budget to every query, without regard for what each query's difficulty means for the current policy. This leads to two symmetric failure modes: easy queries produce near-zero advantage because the policy already solves them, while unsolvable queries produce no signal because the policy never solves them. Both regimes waste training FLOPs without contributing to a learning gradient. We introduce sorted Group Policy Optimization (sGPO), a compute-efficient strategy that trades a small budget of inference FLOPs for a large reduction in wasted training FLOPs. The key insight is that cheap inference compute can serve as a single offline proxy for query difficulty. By generating a small batch of parallel samples per query under the initial policy, we obtain a model-aware empirical success rate. This motivates setting the training rollout group size to the inverse of this success rate, a practical rule that maximizes sample efficiency by extracting the most advantage per generated rollout. This single profiling pass simultaneously drives data filtering (removing trivial queries and sub-sampling unsolvable ones), adaptive group size allocation, and curriculum construction (scheduling queries from easy to hard). sGPO matches or exceeds baseline performance while reducing total training compute by a factor of three, with the upfront inference profiling cost included.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08854v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shivchander Sudalairaj, Kai Xu, Akash Srivastava, Giorgio Giannone</dc:creator>
    </item>
    <item>
      <title>Hybrid E-Assessment in Higher Education: Semi-Automated Grading of Paper-Based Written Examinations</title>
      <link>https://arxiv.org/abs/2606.08855</link>
      <description>arXiv:2606.08855v1 Announce Type: new 
Abstract: This paper examines the limitations of fully digital and partially digital e-assessment approaches in summative examinations in higher education. The analysis focuses on the didactic narrowing caused by closed question formats and on organizational, technical, and legal constraints that become particularly relevant in large student cohorts. As an alternative, the paper proposes a hybrid e-assessment approach that retains paper-based, problem-oriented examination tasks while enabling semi-automated grading. Assessment-relevant intermediate results are encoded in a structured answer format, entered by students by hand, and subsequently captured from table fields. The central technical bottleneck is reliable recognition of handwritten characters under realistic examination conditions. Recent vision-capable large language models, combined with a two-pass validation principle and comparison against a solution key, can reduce misclassifications and thereby improve the validity, fairness, and scalability of summative assessment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08855v1</guid>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hartwig Grabowski, Michael Canz</dc:creator>
    </item>
    <item>
      <title>PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers on Overleaf</title>
      <link>https://arxiv.org/abs/2606.08857</link>
      <description>arXiv:2606.08857v1 Announce Type: new 
Abstract: Expert writing feedback from experienced researchers is critical for early-career scholars to improve their manuscripts, yet high-quality feedback often remains scarce because reviewing research papers is labor-intensive. Emerging AI-powered writing assistants largely focus on grammar fixes or simulating peer review with final scores, yet they fall short of providing concrete, actionable suggestions that help students improve their papers during drafting. We present PaperMentor, a human-centered writing assistant system that delivers actionable suggestions as Overleaf-native inline comments while leaving the actual writing entirely to human authors. PaperMentor integrates an expert skill library carefully curated from established researchers' writing advice with 12 specialized agents covering different aspects of paper writing, such as formatting compliance, phrasing accuracy, and terminology consistency. In a user study (n=14), 90.6% of the generated comments were rated actionable and 67.5% were rated valid, significantly outperforming a GPT-5.2 baseline without the skill library. We release PaperMentor as open source for public use. Our code is publicly available under the AGPL-3.0 license at https://github.com/jiarui-liu/overleaf</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08857v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiarui Liu, Terry Jingchen Zhang, Ryan Faulkner, X. Angelo Huang, Vilém Zouhar, Dominik Glandorf, Isabel Dahlgren, Van Q. Truong, Rishit Dagli, Yuen Chen, Felix Leeb, Punya Syon Pandey, Yves Bicker, Suvajit Majumder, Wenyuan Jiang, Zeju Qiu, Sankalan Pal Chowdhury, Bernhard Schölkopf, Mona Diab, Zhijing Jin</dc:creator>
    </item>
    <item>
      <title>Intelligent Character Recognition of Handwritten Forms with Deep Neural Networks</title>
      <link>https://arxiv.org/abs/2606.08858</link>
      <description>arXiv:2606.08858v1 Announce Type: new 
Abstract: The automatic processing of handwritten forms remains a challenging task, wherein detection and subsequent classification of handwritten characters are essential steps. We describe a novel approach, in which both steps -- detection and classification -- are executed in one task through a deep neural network. To this end, training data is not annotated by hand, but manufactured artificially from the underlying forms and already existing datasets. It can be demonstrated that this single-task approach is superior in comparison to the state-of-the-art two-task approach. The current study focuses on handwritten Latin letters and employs the EMNIST data set. However, limitations were identified with this data set, necessitating further customization. Finally, an overall recognition rate of 88.28 percent was attained on real data obtained from a written exam.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08858v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1007/978-3-031-42532-5_6</arxiv:DOI>
      <arxiv:journal_reference>In: Cavallucci D., Livotov P., Brad S. (eds), Towards AI-Aided Invention and Innovation, IFIP Advances in Information and Communication Technology, vol. 682, Springer Nature Switzerland, 2023, pp. 81-94</arxiv:journal_reference>
      <dc:creator>Hartwig Grabowski</dc:creator>
    </item>
    <item>
      <title>Vision-Language Work Zone Intelligence for Safety-Critical Speed Regulation of Mixed-Autonomy Vehicles in Dynamic Environments</title>
      <link>https://arxiv.org/abs/2606.08860</link>
      <description>arXiv:2606.08860v1 Announce Type: new 
Abstract: Temporary work-zone speed limits are communicated through visually inconsistent signage and are often missing from digital maps, creating safety risks for human drivers and automated vehicle systems. We present a real-time, onboard perception pipeline that detects active work zones, recognizes associated temporary speed limits, and outputs a law-aware work-zone state and speed value suitable for driver alerts or downstream automated control. The system fuses object detections with semantic verification and temporally smoothed, hysteresis-based state transitions to reduce false activations and flicker in dynamic scenes, and runs fully on low-cost embedded hardware. Evaluated manually on an annotated subset of the ROADWork dataset (490 sequences), the system achieves inside-work-zone event-level recall of 96.5% and event-level precision of 68.7%. Speed-limit recognition evaluated on 35 minutes of in-house driving data attains 95.45% precision and 53.85% recall, with no incorrect speed classifications and a single false positive. These results demonstrate a practical, scalable approach for grounding work-zone speed awareness directly in onboard perception rather than maps or infrastructure. We release our source code for the proposed system pipeline on our GitHub repository: https://github.com/Mi3-Lab/workzone</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08860v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Angel Martinez-Sanchez, Kianna Ng, Wesley Maia, Laura Fleig, Maitrayee Keskar, Erika Maquiling, Yash Tandon, Parthib Roy, Mohan Trivedi, Ross Greer</dc:creator>
    </item>
    <item>
      <title>CHROMA: Detecting AI-Generated Images through Inter-Channel Color-Space Correlations</title>
      <link>https://arxiv.org/abs/2606.08864</link>
      <description>arXiv:2606.08864v1 Announce Type: new 
Abstract: The rapid adoption of diffusion and large-scale generative models has made it increasingly challenging to distinguish synthetic imagery from real photographs. While automated detectors have been proposed, their generalization to unseen generators remains brittle. To address this limitation, we investigate inter-channel color correlations, a lightweight and underexploited forensic cue. We first demonstrate that LPIPS, a widely used perceptual metric, exhibits inconsistent responses to perturbations that selectively alter channel dependence across different color-space parameterizations, indicating that cross-channel statistics are not uniformly constrained by common perceptual training objectives. Motivated by this, we analyze the distributions of pairwise inter-channel correlation features across multiple color spaces. Our analysis reveals systematic, generator-specific differences in these distributions, with RGB and Lab color spaces providing the most apparent separation between real and generated images. Building on this, we introduce Chroma, a detector of AI-generated images which augments standard RGB inputs with inter-channel correlation maps and employs a fixed CNN backbone trained with a modest computational budget. We assess its robustness under both single-generator training and a limited multi-generator supervision regime, where only a few samples from additional generators are available. Across a standard benchmark protocol, correlation-augmented inputs improve real-vs-generated discrimination and robustness, yielding performance competitive with recent detectors while maintaining a simple architecture and training procedure. Code is available at https://github.com/JPSoteloSilva/CHROMA</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08864v1</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Juan Pablo Sotelo, Marina Gardella, Pablo Musé</dc:creator>
    </item>
    <item>
      <title>Generalizing Geometry-Guided Mamba as a Plug-and-Play Context Module for CNN-based Semantic Segmentation</title>
      <link>https://arxiv.org/abs/2606.08866</link>
      <description>arXiv:2606.08866v1 Announce Type: new 
Abstract: CNN-based semantic segmentation networks usually rely on context heads such as ASPP, PPM, or attention modules to enlarge the receptive field. These heads are effective but may introduce heavy computation, memory cost, or boundary leakage. This paper revisits Directional Geometric Mamba (G-Mamba) from DGM-Net and studies it as a plug-and-play context aggregation module rather than a complete new segmentation architecture. The key idea is to inject geometric guidance into the selective scan process, allowing long-range feature propagation to be modulated by boundary and centripetal-flow cues. We replace the original context heads of six representative CNN segmentation models, including DeepLabV3+, DANet, CCNet, PSPNet, PSANet, and OCRNet, while keeping the ResNet-101 backbone unchanged. Results on Cityscapes show consistent mIoU gains with only moderate extra GFLOPs at $1024\times1024$ resolution, suggesting that geometry-guided SSM modules can serve as practical alternatives or enhancements to conventional CNN context heads.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08866v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sheng-Wei Chan, Hsin-Jui Pan, Chun-Po Shen, Chia-Min Lin, Yung-Che Wang, Jen-Shiun Chiang</dc:creator>
    </item>
    <item>
      <title>Building Customer Support AI Agents at 100M-User Scale: An Evaluation-Driven Framework</title>
      <link>https://arxiv.org/abs/2606.08867</link>
      <description>arXiv:2606.08867v1 Announce Type: new 
Abstract: The rapid rise in LLM capabilities has made AI agents increasingly viable across a broad range of tasks. Among the most promising applications is building production-ready customer-facing agents, a challenge that demands coordinated excellence in evaluation methodology, context engineering, training, and online measurement. Yet these critical pillars are typically developed in isolation, creating blind spots that only surface after deployment.
  In this paper, we present a unified framework that bridges offline development with online impact for customer support AI agents at Nubank, a company with 100M+ users. Our approach integrates several key components: (1) structured context engineering tailored to customer support agents, (2) systematic human-in-the-loop prompt iteration, (3) rigorous LLM judge evaluation with measured inter-rater agreement and GEPA optimization for consistency, and (4) ideation-to-production validation.
  A central insight is that evaluation-pipeline quality directly determines iteration velocity. We present results from five production deployments spanning distinct domains: card delivery, debt management, credit-limit support, card management, and product explanation. These deployments deliver consistent customer-satisfaction gains while substantially accelerating iteration. In our card-delivery deployment, large-scale A/B testing yields a 37 percentage-point improvement in AI transactional Net Promoter Score and a 29 percentage-point gain in self-service rate over prior agent variants, alongside a strong correlation between offline simulation metrics and online outcomes, demonstrating that eval-driven development reliably predicts production impact. On most use cases, AI satisfaction reaches within a few percentage points of expert human agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08867v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3770855.3818332</arxiv:DOI>
      <dc:creator>Aman Gupta, Kevin Rossell, Edesio Alcobaça, Jose Chrystian Lima Pacheco, Carolina Baptista de Lima, Shao Tang, Luiz Paulo Rabachini, Luis Moneda, Herbert Fei, Daniel Silva, Rohan Ramanath</dc:creator>
    </item>
    <item>
      <title>A Low-Latency Semantic State Estimator using Latent Predictive Learning for Dynamic Network Monitoring and Orchestration</title>
      <link>https://arxiv.org/abs/2606.08869</link>
      <description>arXiv:2606.08869v1 Announce Type: new 
Abstract: Closed-loop network monitoring and orchestration increasingly require semantic interpretations of live telemetry beyond raw counter collection. However, dynamic cloud-edge environments change both the active node set and the monitoring query at runtime, while control loops demand bounded millisecond-scale responses. We introduce a latent predictive state estimator (LPSE) for dynamic network monitoring and orchestration, built on latent predictive learning over streaming telemetry. The framework converts variable-cardinality node telemetry into topology-adaptive temporal representations, fuses them with monitoring questions, and returns bounded answers from a semantic codebook instead of autoregressive text generation. This design enables fixed-cost, single-pass inference while preserving semantic interpretability. By operating on permutation-invariant, slot-routed node representations keyed by stable identity, the model maintains a fixed input space and generalizes to node addition, removal, and reordering without retraining. Experimental results on a multi-node Kubernetes cluster show semantic prediction accuracy of 82.42% at approximately 41$\times$ lower mean inference latency and 15$\times$ smaller memory footprint compared with a deployable 4B LLM endpoint.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08869v1</guid>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hari Madhukumar, Haiyuan Li, Xiaolan Liu, Andy Corston-Petrie, Dimitra Simeonidou</dc:creator>
    </item>
    <item>
      <title>Fourier Neural Operators with rank-1 lattice points and hyperbolic cross</title>
      <link>https://arxiv.org/abs/2606.08871</link>
      <description>arXiv:2606.08871v1 Announce Type: new 
Abstract: The \emph{Fourier neural operator} (FNO) is a neural network architecture that learns mappings between function spaces. Its efficient implementation is based on the multi-dimensional Fourier transform. By deriving general regularity bounds for the FNO with respect to both the spatial and parametric variables, we prove that the generalization error of the FNO can be improved by replacing spatial tensor product grids with purpose-built rank-1 lattice points, and by using a second lattice carefully constructed as training points in the parametric space. We achieve more accurate and efficient approximations from fewer network parameters, fewer spatial points, and fewer training samples. In addition, the architecture is simplified, because the high-dimensional Fourier transform on rank-1 lattices requires only a \emph{one-dimensional fast Fourier transform}, and we can use a \emph{hyperbolic cross} frequency index set with lattice points. We demonstrate the benefits of our \emph{lattice-based hyperbolic-cross FNOs} for an elliptic PDE on the torus.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08871v1</guid>
      <category>math.NA</category>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jakob Dilen, Alexander Keller, Frances Y. Kuo, Dirk Nuyens</dc:creator>
    </item>
    <item>
      <title>EFX for Additive Chores: Nonexistence, Pareto Incompatibility, and Bi-Valued Existence</title>
      <link>https://arxiv.org/abs/2606.08872</link>
      <description>arXiv:2606.08872v1 Announce Type: new 
Abstract: We consider the fair division problem of indivisible chores and resolve the long-standing open problem for the existence of EFX allocations with additive cost functions. We show that, even for tri-valued additive cost functions, for every $n\geq 4$, there exists an instance with $n$ agents where no EFX allocation exists. Our counterexample only uses three types of chores, which is also tight on the number of types, as an EFX allocation is known to exist for two types of chores.
  We then consider bi-valued instances. We show that, for every $n\geq 4$, there exists an instance with $n$ agents where no EFX allocation is Pareto-optimal. This is also the first example showing the incompatibility of EFX and Pareto-optimality when the costs of items are positive: existing examples showing the incompatibility of EFX and Pareto-optimality exploit items with $0$ costs. Our result shows that such an example exists even for bi-valued instances. The number of agents $n$ is also tight: for $n\leq 3$, it is known that EFX is compatible with Pareto-optimality. Finally, we also show that an EFX allocation is guaranteed to exist for $n=4$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08872v1</guid>
      <category>cs.GT</category>
      <category>econ.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wentao He, Biaoshuai Tao</dc:creator>
    </item>
    <item>
      <title>Can the Environment Speak for Itself? $T^{2}$-GRPO: A Turn-Trajectory Group Relative Policy Optimization for Caregiver Agents</title>
      <link>https://arxiv.org/abs/2606.08875</link>
      <description>arXiv:2606.08875v1 Announce Type: new 
Abstract: Optimizing large language models (LLMs) for long-horizon caregiver agents requires balancing delayed task objectives with immediate environment dynamics, such as patient distress and resistance. In dementia care, this balance is especially difficult: trajectory-level rewards are too sparse for turn-level credit assignment, while external LLM-based evaluators are costly and can misread fragmented or indirect patient responses. To address this issue, we propose Turn-Trajectory Group Relative Policy Optimization ($T^{2}$-GRPO), a framework that decouples caregiver RL into two normalized reward horizons and enforces safety through a binary hard veto. $T^{2}$-GRPO derives dense turn-level rewards directly from environment state transitions, measuring changes in patient distress and resistance from a frozen dementia patient simulator. These environment-grounded rewards are combined with trajectory-level evaluations through independent centered-rank normalization, which preserves heterogeneous reward signals and mitigates reward collapse. Extensive experiments on dementia caregivers show that $T^{2}$-GRPO outperforms competitive baselines, indicating a substantial improvement in emotionally sensitive caregiver scenarios that require handling immediate patient feedback, long-term care outcomes, and safety constraints.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08875v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yutong Song, Jiang Wu, Pengfei Zhang, Wenjun Huang, Honghui Xu, Nikil Dutt, Amir M. Rahmani</dc:creator>
    </item>
    <item>
      <title>PerspectiveGap: A Benchmark for Multi-Agent Orchestration Prompting</title>
      <link>https://arxiv.org/abs/2606.08878</link>
      <description>arXiv:2606.08878v1 Announce Type: new 
Abstract: Real-world LLM applications are moving beyond single-agent workflows toward orchestrated multi-agent systems, yet current models still struggle to determine what each sub-agent needs to know. To measure this, we introduce PerspectiveGap, a benchmark for evaluating LLMs' ability to compose orchestration prompts for multi-agent systems. PerspectiveGap contains 110 scenarios, each evaluated through two distractor-mixed task formats: role-fragment assignment and free-form prompt writing. These scenarios are organized into 10 topologies, which are distilled from the authors' real-world engineering practice and framed by the Prompt Economy principle: building loop-centered orchestrations that maximize utility with minimal role and engineering overhead. In experiments with 27 commercial models from 10 companies, GPT-5.5 substantially outperforms all competitors, whereas Opus 4.7 shows a notable weakness in orchestration prompting despite its strong coding performance. Nevertheless, PerspectiveGap remains challenging: the evaluated models achieve an average combined pass rate of only 14.9% (GPT-5.5 62.0%) and an average overall leakage rate of 246.5% (a per-scenario information leak-event count, not a proportion; GPT-5.5 49.1%). These findings suggest that multi-agent orchestration prompting is a distinct and under-evaluated capability, and PerspectiveGap provides a foundation for measuring and improving it systematically.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08878v1</guid>
      <category>cs.CL</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Youran Sun, Xingyu Ren, Kejia Zhang, Xinpeng Liu, Jiaxuan Guo</dc:creator>
    </item>
    <item>
      <title>Direct Data-driven Predictive Control: A Computationally Efficient Alternative to DeePC for Eco-driving in Mixed Traffic Flows</title>
      <link>https://arxiv.org/abs/2606.08880</link>
      <description>arXiv:2606.08880v1 Announce Type: new 
Abstract: Improving energy efficiency in the transportation sector is critical for achieving sustainable mobility, with eco-driving emerging as a key strategy. However, implementing effective eco-driving for connected and automated vehicles (CAVs) in mixed traffic presents a significant control challenge due to the heterogeneous, uncertain behavior of human-driven vehicles (HDVs). Data-enabled Predictive Control (DeePC) offers a promising model-free approach but is often hindered by a high computational burden, limiting its real-time feasibility. This paper introduces a novel Direct Data-driven Predictive Control (D3PC) framework to address this limitation. By reformulating the data-driven prediction mechanism, the D3PC significantly reduces computational complexity, making its computation time nearly invariant to historical data size. This computational efficiency directly enables the formulation of a sophisticated eco-driving controller that can solve the complex energy optimization problem in real time, even within diverse and stochastic mixed-traffic environments. Comprehensive simulations demonstrate that the D3PC is orders of magnitude faster than existing DeePC-based methods while achieving superior energy efficiency. Specifically, it reduces total platoon energy consumption by up to 10.71% compared to rule-based cruise control baselines and 3.80% compared to the original DeePC, confirming its effectiveness for real-time, energy-efficient control.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08880v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dongjun Li, Haoxuan Dong, Liangcai Xu, Ziyou Song</dc:creator>
    </item>
    <item>
      <title>Benchmarking Vision-Language-Action Models on SO-101: Failure and Recovery Analysis</title>
      <link>https://arxiv.org/abs/2606.08881</link>
      <description>arXiv:2606.08881v1 Announce Type: new 
Abstract: Vision-Language-Action (VLA) models have demonstrated strong generalization in robotic manipulation, yet existing evaluations are primarily conducted in simulation or on expensive robotic platforms, leaving their robustness on affordable real-world robots largely unexplored. We present a standardized real-world benchmark for evaluating representative VLA and imitation learning policies on the low-cost SO-101 robotic platform. The benchmark comprises four representative manipulation tasks together with unified evaluation protocols, enabling systematic comparison under embodiment uncertainty. Using real-world teleoperated demonstrations, we fine-tune and evaluate $\pi_{0.5}$, SmolVLA, Wall-X, and ACT directly on the physical platform. Beyond conventional task success rates, the benchmark incorporates a structured failure taxonomy, semantic- and execution-level failure decomposition, and recovery-aware evaluation metrics to characterize policy robustness. Experimental results show that stronger pretrained VLA policies generally outperform the imitation learning baseline, although performance remains highly task-dependent under low-cost robotic deployment conditions. Execution instability emerges as the dominant failure source, while recovery capability varies substantially across architectures. These results highlight the importance of failure and recovery analysis beyond binary task success and establish SO-101 as a practical benchmark for evaluating embodied AI systems under realistic low-cost robotic deployment conditions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08881v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yi Yu, Xinchuan Qiu</dc:creator>
    </item>
    <item>
      <title>Accelerating GMRES with Matrix-Free Multiscale Robin Preconditioners</title>
      <link>https://arxiv.org/abs/2606.08883</link>
      <description>arXiv:2606.08883v1 Announce Type: new 
Abstract: We propose a matrix-free right-preconditioning strategy for the Generalized Minimal Residual (GMRES) method based on the Multiscale Robin Coupled Method with oversampling (MRCM-OS) for the numerical solution of elliptic problems arising in subsurface flow. The resulting preconditioner is constructed through local subdomain solves with oversampling and smoothing, and can be applied without explicit assembly of the global operator.
  After a careful presentation of the new procedure, it is used in extensive numerical experiments. Our results demonstrate that the proposed approach substantially reduces iteration counts across a range of challenging, high-contrast subsurface flow problems. In many cases, convergence is obtained in one or two GMRES iterations when oversampling and smoothing are employed.
  The results indicate that combining GMRES with multiscale Robin-based operators is a promising direction for the construction of rapidly convergent preconditioning strategies.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08883v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dilong Zhou, Rafael T. Guiraldello, Felipe Pereira, Fabrício S. Sousa</dc:creator>
    </item>
    <item>
      <title>Silicon Photonics Testing: Design for Testability, Fault Detection, and Manufacturing Variation Analysis in Photonic Integrated Circuits</title>
      <link>https://arxiv.org/abs/2606.08885</link>
      <description>arXiv:2606.08885v1 Announce Type: new 
Abstract: This paper proposes a design-for-test (DFT) methodology and architecture for testing and validation of silicon photonic integrated circuits. We describe the design of silicon photonic circuits and components that comprise the proposed DFT architecture. The designs are extensively simulated and validated as test-access and fault-detection circuitry. We demonstrate how the DFT approach can be deployed on photonic integrated circuits and how they can be tested for correct operation, in terms of signal power and phase. The application is demonstrated on two distinct types of designs -- an optical neural network comprising optical devices in a feed-forward topology, and on an optical logic circuit with feedback loops.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08885v1</guid>
      <category>cs.ET</category>
      <category>physics.optics</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>Pratishtha Agnihotri, Priyank Kalla, Steve Blair</dc:creator>
    </item>
    <item>
      <title>Block-A-Mole: The Sustainability Frontier of Moving-Target Censorship Resistance</title>
      <link>https://arxiv.org/abs/2606.08886</link>
      <description>arXiv:2606.08886v1 Announce Type: new 
Abstract: Internet censorship affects over four billion people, and deployed circumvention systems share a common weakness: their endpoints are fixed and discoverable, so a patient censor can enumerate and block them. Moving-target circumvention systems instead rotate endpoints across commercial cloud address space faster than censors can react, but the field lacks a theory of when rotation works, leaving rotation intervals and pool sizes to intuition. We give the first formal account of moving-target censorship resistance by modeling the censor-defender interaction as a continuous-time timing game over a combinatorial address-domain space, generalizing FlipIt to a collateral-bounded adversary. We prove a sustainability frontier separating configurations a censor can defeat from those it cannot, and show that under the Great Firewall's 2024 shift to blocking QUIC and TLS by domain, raw rotation speed is not the binding constraint. Instead, availability is governed by the domain burn rate, $\beta=\lambda_{\mathrm{disc}}/\lambda_{\mathrm{intro}}$, the ratio between how quickly the censor blocks defender domains and how quickly the defender introduces fresh ones. We derive a closed-form availability law, prove that address rotation alone cannot sustain high availability when $\beta&gt;1$ regardless of endpoint rotation speed, and characterize the frontier $\beta^\star$. We validate the analysis with an open, model-level censor-defender simulator requiring no privileged access or cloud deployment. The simulator reproduces the predicted phase transition at $\beta^\star$ under adversary profiles representative of the GFW, Russia's TSPU, and Iran, and shows robustness to state-dependent discovery and bursty, provider-correlated burns. The result replaces the heuristic of ``rotate faster'' with a precise operating condition: keeping the domain economy ahead of the censor.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08886v1</guid>
      <category>cs.CR</category>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Anindya Maiti</dc:creator>
    </item>
    <item>
      <title>PALUTE: Processing-In-Memory Acceleration via Lookup Table for Edge LLM Inference</title>
      <link>https://arxiv.org/abs/2606.08891</link>
      <description>arXiv:2606.08891v1 Announce Type: new 
Abstract: Large language models are increasingly deployed on edge devices with tight power and area budgets. While mixed-precision GEMM reduces arithmetic complexity, quantized inference is often dominated by dequantization and nonlinear operators. Lookup table (LUT)-based methods mitigate these costs by precomputing outputs and replacing repeated arithmetic with table lookups, but existing designs incur significant capacity and lookup-latency overheads. This paper presents PALUTE, a LUT-based Processing-In-Memory accelerator built on Monolithic 3D (M3D) DRAM for efficient edge LLM inference. PALUTE enables in-DRAM LUT queries that exploit the vertical organization of M3D DRAM memory array tiles to achieve high parallelism with low area overhead. A near-memory LUT generator supports low-latency LUT generation for both GEMM and element-wise unary nonlinear operators, while a system-level tiering and scheduling strategy minimizes data movement across memory tiers. Evaluation using cycle-accurate simulation and RTL synthesis shows that PALUTE achieves 1,264 TPS end-to-end throughput at 0.16 W, improving energy efficiency by 12.8$\times$ over CHIME and 1.6$\times$ over FIGLUT, and improving area efficiency by 2.0$\times$ over PIMPAL under W4A4 across Qwen3-4B models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08891v1</guid>
      <category>cs.AR</category>
      <category>cs.ET</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Runyang Tian, Yanru Chen, Weihong Xu, Tajana Šimunić Rosing</dc:creator>
    </item>
    <item>
      <title>Diffuse AI Control on Fuzzy Tasks</title>
      <link>https://arxiv.org/abs/2606.08892</link>
      <description>arXiv:2606.08892v1 Announce Type: new 
Abstract: AI models deployed in critical domains, such as AI safety research, may subtly sabotage our efforts due to misalignment. Diffuse AI Control is a subfield of AI safety concerned with mitigating risks from AI sabotage distributed over long deployment horizons (diffuse threats). These risks are particularly pernicious on fuzzy tasks, i.e. tasks which are hard to grade or require intuition. To understand diffuse threats on fuzzy tasks, we introduce a novel framework that considers AI control as an adversarial game between a blue team and a red team. The blue team uses a weak trusted model to construct a weak score against which they would train a strong, potentially subversive model to remove the subversion propensity if it were present. The red team then tries to find model behaviors that are rated highly by the weak score, and thus might not be trained out, but actually correspond to poor performance. We test our framework on the task of writing experimental proposals for research questions from recent ML papers. We use a language model with access to the original paper as a proxy "ground-truth" scorer. Our red team discovers subversive behaviors using multi-objective evolutionary prompt optimization. We show that Opus 4.6 can write proposals that are worse according to the ground truth proxy than those of GPT-OSS-20B, while the weak scorer rates them as highly as the best proposals from Opus 4.6. To mitigate the threat, we propose an adversarial optimization algorithm for the blue team that discovers more robust prompts for the weak model. This algorithm produces a blue team prompt that our red team optimization fails to exploit.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08892v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mikhail Terekhov, Caglar Gulcehre, Vivek Hebbar, Joe Benton</dc:creator>
    </item>
    <item>
      <title>Cheap Reward Hacking Detection</title>
      <link>https://arxiv.org/abs/2606.08893</link>
      <description>arXiv:2606.08893v1 Announce Type: new 
Abstract: A small transformer encoder is trained to map Terminal-Wrench trajectories onto a unit sphere where embedding distance approximates the $L_1$ distance between reward and metadata signals. A linear probe on top of that embedding detects reward hacking on the cleaned test split with AUC $0.9467$ and TPR@5%FPR $0.8296$, matching the TW sanitized LLM-as-judge AUC ($0.9510$ on the cleaned split) and exceeding its TPR@5%FPR ($0.7130$ vs $0.8296$) on the same information condition, at roughly four orders of magnitude lower per-trajectory cost. The encoder is not a pure behavior reader: stripping natural-language reasoning from its input at probe time drops AUC to $0.6213$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08893v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Iván Belenky, Joaquín Itria, Steven Johns</dc:creator>
    </item>
    <item>
      <title>Are Reasoning Vision-Language Models Robust to Semantic Visual Distractions?</title>
      <link>https://arxiv.org/abs/2606.08894</link>
      <description>arXiv:2606.08894v1 Announce Type: new 
Abstract: Reasoning Vision-Language Models (VLMs) achieve strong performance on complex multimodal tasks, but reliable real-world application requires handling visual inputs that are messier than clean, curated benchmarks. Existing works mainly evaluate such reliability of VLMs through input corruptions, such as noise, blur and weather effects, which make visual evidence harder to perceive. This leaves a critical reliability failure mode underexplored: a model may perceive the evidence correctly, yet reason from plausible but irrelevant and distracting evidence and propagate this mistake to its final answer. To address this gap, we introduce \textbf{Distract-Bench}, a benchmark for evaluating VLM robustness to \textbf{semantic visual distractions}, defined as meaningful but task-irrelevant visual cues added to inputs while preserving the ground-truth answer. We comprehensively evaluate eight leading open-source and two closed-source VLMs across conventional vision corruptions and Distract-Bench. Our results show that Distract-Bench exposes a robustness failure distinct from vision corruptions: reasoning VLMs largely track their non-reasoning base models under perceptual degradation, but show consistently lower robustness to semantic distractions. Further analysis shows that these distractions often enter the reasoning process of VLMs, are treated as evidence, and lead to incorrect answers. Together, these findings reframe robustness evaluation for reasoning VLMs, shifting the focus from degraded perception to distractions for reliable real-world visual reasoning. Our data and code are available at https://github.com/Yizheng-Sun/Distract-Bench.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08894v1</guid>
      <category>cs.CV</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yizheng Sun, Mochuan Zhan, Yanan Ma, Jia Tong See, Yifan Wang, Ziyi Wang, Hao Li, Yang Cui, Wenhao Cai, Jingyu Sun, Chenghua Lin, Riza Batista-Navarro, Jingyuan Sun</dc:creator>
    </item>
    <item>
      <title>Optimal Regret Exponents for Bayesian Statistical Decision Problems</title>
      <link>https://arxiv.org/abs/2606.08895</link>
      <description>arXiv:2606.08895v1 Announce Type: new 
Abstract: We study finite-state finite-action Bayesian statistical decision problems. While exact error-exponent characterizations are known for several special cases, including hypothesis testing and hypothesis exclusion, the asymptotic behavior of the optimal Bayes regret is largely unknown for general decision problems. In this paper, we show that the optimal regret always decays exponentially fast and characterize its exact exponent for arbitrary loss functions. The exponent is given by the minimum multivariate Chernoff information over the minimal incompatible subsets of states, where an incompatible subset is a collection of states for which no single action is optimal for all states in the subset. Our result recovers the classical pairwise-minimum Chernoff exponent for symmetric multiple hypothesis testing and the multivariate Chernoff exponent for hypothesis exclusion, while also yielding, to the best of our knowledge, the first exact exponent characterization for list hypothesis testing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08895v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <category>math.ST</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hyun-Young Park, Si-Hyeon Lee</dc:creator>
    </item>
    <item>
      <title>FAME: Forecastability-Aware Mixture of Experts for Heterogeneous Time Series Forecasting</title>
      <link>https://arxiv.org/abs/2606.08896</link>
      <description>arXiv:2606.08896v1 Announce Type: new 
Abstract: Large-scale retail and industrial forecasting systems contain many heterogeneous time series whose lifecycle, sparsity, volatility, seasonality, spectral patterns, and contextual sensitivity differ substantially. A single forecasting model rarely performs well across all regimes, while dense ensembles increase inference cost and provide limited insight into expert suitability. This paper studies forecastability-aware expert routing: learning how data characteristics determine the suitability of forecasting experts. We propose FAME, a sparse mixture-of-experts framework that represents each series with a multidimensional forecastability fingerprint, mines expert-suitability targets from validation performance, and trains a cost-aware sparse router to activate a small budgeted set of experts for each series. Using a production-scale vending-machine sales dataset from Shandong New Beiyang (SNBC), where the forecasting component has been integrated into the replenishment-planning pipeline, together with public retail benchmarks, we show that expert suitability varies systematically across data regimes. On the industrial dataset with 5,000+ machines and 60M+ transactions, FAME Top-2 reduces MSE by 12.4% over the strongest single expert, LightGBM, while executing 1.92 experts per series on average. The deployed component produces demand forecasts, while inventory-oriented gains are estimated by an offline replay simulator under a fixed replenishment policy rather than by online intervention. The framework turns heterogeneous sales forecasting from heuristic model selection into data mining of forecastability patterns and expert specialization. Code is available at https://github.com/hit636/FAME</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08896v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qianyang Li, Xingjun Zhang, Shaoxun Wang, Tao Peng, Jia Wei</dc:creator>
    </item>
    <item>
      <title>A multi-agent system for spine MRI report generation from multi-sequence imaging</title>
      <link>https://arxiv.org/abs/2606.08897</link>
      <description>arXiv:2606.08897v1 Announce Type: new 
Abstract: Spinal pathology is a leading cause of pain and disability worldwide. Spine MRI is central to clinical evaluation, yet its interpretation remains complex and time-consuming, requiring integration of information across multiple imaging sequences and anatomical regions. Despite recent advances in automated MRI analysis, effectively combining multi-sequence data while preserving sequence-specific diagnostic information remains an open challenge. Here we present SpineAgent, a multi-agent framework for spine MRI report generation built upon a multi-sequence foundation model trained on routine clinical data from 32,047 patients and 453,683 MRI series, comprising a total of 13,441,191 MRI slices. To accommodate diverse sequence modalities, we first pre-train two DINOv3-based encoders separately on T1- and T2-weighted sequences. We then introduce a continual training strategy that learns a synthesizer to embed images of other sequences using the T1 and T2 encoders, producing a patient-level embedding that integrates various signals across MRI sequences. Using these embeddings, SpineAgent achieves state-of-the-art performance and demonstrates strong generalizability under cross-manufacturer and cross-cohort evaluation. Beyond classification, SpineAgent enables pathology localization by identifying findings-relevant slices and segmenting pathological regions. It also supports multimodal image-report retrieval, providing a solid foundation for scalable and explainable MRI report generation. We further integrate these validated capabilities of SpineAgent into 37 specialized agents. Finally, we incorporate their outputs as structured tokens within a Medical Report Agent trained end-to-end for report generation. Through both automated metrics and expert evaluation by five radiologists, SpineAgent achieves leading performance in spine MRI report generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08897v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>q-bio.QM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Zhiping Xiao, Junwei Yang, Gongbo Sun, Han Zhang, Hanwen Xu, Yi Yao, Zachary D. Miller, William E. King III, Mohammed M. Kanani, Jalal B. Andre, Sammy Chu, Ming Zhang, Paul E. Kinahan, Nathan M. Cross, Sheng Wang</dc:creator>
    </item>
    <item>
      <title>A Kernel-Clean Lean Mechanization of Classical Lottery in Action and the Wakker-Debreu-Koopmans Representation Layer</title>
      <link>https://arxiv.org/abs/2606.08902</link>
      <description>arXiv:2606.08902v1 Announce Type: new 
Abstract: We present a Lean 4/Mathlib formalization of the additive representation theory behind Classical Lottery in Action and the Wakker-Debreu-Koopmans (WDK) layer it relies on. Our central result is a machine-checked proof that the cross-pair Thomsen / double-cancellation (hexagon) condition is irreducible from the ordinal axioms of additive conjoint measurement (weak order, restricted solvability, Archimedean condition, and tradeoff consistency). We exhibit an explicit verified counter-model (additiveRealBoolPref) satisfying all ordinal axioms yet failing the cross-pair condition, with every strict standard sequence being an arithmetic progression and hence non-dense. Around this boundary we mechanize the full derivable construction: continuous Debreu/Eilenberg utility from separability, standard-sequence grids, bisection methods from connectedness, and global additive gluing. All public theorems are sorry-free conditional wrappers over this single irreducible structural input. The development is kernel-clean, depending only on standard Lean foundations (propext, Classical.choice, Quot.sound). The companion file ClassicalLotteryInAction.lean formalizes local classical-lottery constructions, average-utility results, matching-frequency lemmas, and ambiguity-attitude statements used by the Management Science paper. This draws a precise, machine-certified line between what additive conjoint measurement can prove and what it must assume.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08902v1</guid>
      <category>cs.LO</category>
      <category>econ.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jingyuan Li, Ilia Tsetlin, Fan Wang</dc:creator>
    </item>
    <item>
      <title>Synthetic but Not Realistic: The Evaluation Challenge in Generative Modelling for Structured Electronic Medical Records</title>
      <link>https://arxiv.org/abs/2606.08903</link>
      <description>arXiv:2606.08903v1 Announce Type: new 
Abstract: Synthetic healthcare data are widely proposed as privacy-preserving substitutes for real patient data, yet their evaluation remains dominated by statistical similarity and predictive performance that do not reflect clinical validity. We introduce a multi-dimensional evaluation framework grounded in epidemiology, assessing descriptive fidelity, clinical utility, and structural validity, corresponding to descriptive, predictive, and causal questions. We evaluate four representative generative paradigms - GAN-based, VAE-boosted, diffusion-based, and masked modelling - using PRIME-CVD, a 50,000-person cohort with known ground-truth structure. While all models reproduce marginal distributions, none simultaneously preserve subgroup structure, effect estimates, and dependency structure. Notably, models with strong distributional fidelity can exhibit poor calibration and distorted relationships, leading to unreliable inference. These results show that current evaluation practices can overestimate synthetic data quality and motivate domain-informed assessment based on the ability to support valid clinical and scientific conclusions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08903v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nicholas I-Hsien Kuo, Blanca Gallego, Louisa Jorm</dc:creator>
    </item>
    <item>
      <title>Order Matters: Unveiling the Hidden Impact of Macro Placement Sequences via Proxy-Guided LLM Evolution</title>
      <link>https://arxiv.org/abs/2606.08904</link>
      <description>arXiv:2606.08904v1 Announce Type: new 
Abstract: Macro placement is a fundamental step in modern chip physical design, playing a crucial role in determining the solution quality of high-dimensional combinatorial optimization problems. Despite recent advancements in machine learning for spatial coordinate determination, the temporal dimension of placement sequencing remains largely governed by static heuristics. In this work, we demonstrate that the placement sequence is not merely a preprocessing step but a decisive factor in optimization, where suboptimal early decisions trigger irreversible domino effects that constrain the solution space. To harness this unexplored dimension, we propose OrderPlace, a proxy-guided LLM evolution framework for automatically discovering macro placement order strategies. Instead of relying on manually crafted heuristics such as area- or connectivity-based ordering, OrderPlace explores a broader space of code-level policies, ranging from static scoring metrics to dynamic physics-inspired mechanisms. To mitigate the prohibitive cost of evaluating sequences, we introduce a lightweight proxy evaluation mechanism that efficiently filters candidates using a deterministic greedy probe. Experimental results on the standard ISPD 2005 benchmarks demonstrate that OrderPlace discovers novel ordering strategies. Compared with WireMask-EA and the state-of-the-art method EGPlace, OrderPlace reduces wirelength by 34.04% and 14.08%, respectively.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08904v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shibing Mo, Jing Liu, Jianchu Xu, Ruilin Wu</dc:creator>
    </item>
    <item>
      <title>DifferSeg: Towards Diverse Multimodal Binary Segmentation via Differential Perception and Frequency Guidance</title>
      <link>https://arxiv.org/abs/2606.08906</link>
      <description>arXiv:2606.08906v1 Announce Type: new 
Abstract: In many binary segmentation tasks, most multimodal methods rely on fixed feature concatenation for cross-modal interaction and straightforward decoder designs dominated by low-frequency semantics. However, they ignore two key challenges: one is the lack of an adaptive mechanism to handle modality discrepancies and complementarity, and the other is the absence of an efficient decoding strategy to balance both high- and low-frequency representations. In this work, we propose a simple yet general multimodal binary segmentation framework, termed DifferSeg, to address both problems simultaneously. With the help of the differential perception fusion (DPF) module, DifferSeg employs learnable differential operators to adaptively align multimodal features and enhance their complementarity through residual fusion, effectively mitigating modality mismatch and fusion redundancy. In addition, we design a frequency-guided decoder (FGD) that builds cross-frequency interactions and multi-path upsampling to maintain consistency between detailed high-frequency structures and semantic low-frequency representations, ensuring fine-grained boundary recovery and noise suppression. Benefiting from these designs, DifferSeg can be easily generalized to diverse binary segmentation tasks, including both natural and medical modalities. Without bells and whistles, it consistently surpasses 67 state-of-the-art methods across 29 public datasets involving 18 downstream tasks, demonstrating superior generalization and segmentation accuracy. Code and pretrained models will be available at the Link.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08906v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Qiangqiang Zhou, Jiawei Xu, Yong Chen, Dandan Zhu, Yugen Yi, Xiaoqi Zhao</dc:creator>
    </item>
    <item>
      <title>Failure-Aware Refinement of Vision-Language Model for Lithography Defect Detection</title>
      <link>https://arxiv.org/abs/2606.08908</link>
      <description>arXiv:2606.08908v1 Announce Type: new 
Abstract: Semiconductor lithography inspection requires reliable detection of small pattern defects such as bridge, burr, pinch, and contamination. In this study, we propose a two-stage vision-language framework that combines initial defect detection with prediction refinement. In the first stage, Qwen3-VL is fine-tuned with LoRA as a vision-language adapter to predict defect counts, defect categories, and normalized bounding boxes from lithography images. However, direct fine-tuning may still produce common test-time errors, including false positives, missed defects, and incorrect defect types. To address this limitation, the second stage trains a refinement module using first-stage prediction failures and their corrected labels, allowing the model to review and revise initial outputs. By learning from cases where the initial adapter fails, the refinement process improves defect inference beyond single-stage fine-tuning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08908v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Pangyun Jeong, Jiyeong Kong, Yuehua Hu, Dohee Jeong, Kyung-Tae Kang</dc:creator>
    </item>
    <item>
      <title>Enhancing Presence, Deepening Fan Intensity: How Presence in Immersive Video Shapes Psychological Closeness to Performers</title>
      <link>https://arxiv.org/abs/2606.08912</link>
      <description>arXiv:2606.08912v1 Announce Type: new 
Abstract: Immersive video differs from conventional flat 2D video in that it is experienced as 180-degree stereoscopic video on a head-mounted display, thereby eliciting bodily and spatial subjective experience. Previous studies have shown that viewing and interpersonal distance affect Presence; however, it remains insufficiently understood how Presence differences are related to psychological closeness to content. In the present study, we examined whether differences in Presence could increase viewers' psychological closeness to performers within the content. This psychological closeness was operationally defined as fan intensity. Specifically, a live performance by a Japanese idol group was recorded as 180-degree immersive video, and a high-Presence condition (1.2 m) and a low-Presence condition (7.6 m) were established by manipulating filming distance. Twenty-four participants with different levels of prior involvement, comprising Avid fans and Casual fans, experienced both conditions in a counterbalanced within-participants design. Fan intensity was measured before and after the experience as perceived psychological overlap between the self and the performers. The results showed that, compared with the low-Presence condition, the high-Presence condition significantly increased all Presence-related measures except the Slater-Usoh-Steed questionnaire, with the largest condition differences observed for Possible Actions, Social Presence, and Observability. Moreover, a mixed analysis of variance on changes in fan intensity revealed a significant main effect of Presence condition, indicating that the high-Presence video produced a greater increase in fan intensity than the low-Presence video. These findings suggest that filming distance in immersive video is not merely a factor that determines angle of view or composition, but a design variable that can enhance Presence and deepen fan intensity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08912v1</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Koichi Toida, Hideto Hiranuma, Shimpei Miura, Norihiro Yamamoto, Yuki Kobayashi, Shingo Meguro</dc:creator>
    </item>
    <item>
      <title>Vibe Visualizing: How Visualization Novices Try (and Fail) to Generate and Interpret Visualizations with Conversational AI</title>
      <link>https://arxiv.org/abs/2606.08914</link>
      <description>arXiv:2606.08914v1 Announce Type: new 
Abstract: Conversational AI has enabled users to generate and interpret visualizations through natural language, significantly lowering the technical barrier to entry. The increased accessibility brings visualization novices into data visualization, but also exposes them to misinformation and misinterpretations. We are motivated to examine what issues can arise in interactions with current conversational AI, whether visualization novices can recognize such issues, and how they respond to them. To examine these questions, we conducted a user study on ChatGPT with 20 visualization novices, collecting their conversation logs, semi-structured interview transcripts, and Likert-scale questionnaire responses. Through thematic analysis, we developed a codebook that covers AI execution compliance, issues of AI-generated visualizations, patterns of AI responses, and prompting patterns of users. We summarized four themes, including the quality of outcomes, recurring errors from ChatGPT, misuse by users, factors that affect user trust, confidence, and verification behavior, and human-AI collaboration dynamics. To demonstrate the generalizability of our codebook and findings, we replayed the initial user prompts on Gemini and Claude and compared the outcomes, which revealed distinct failure modes for each model. Based on the results of all analyses, we derive a set of design recommendations for future AI-assisted visualization systems. We conclude with discussions on literacy gaps, diverse human-AI collaboration dynamics, and implications for agentic visualization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08914v1</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sam Yu-Te Lee, Yun-Hsin Kuo, Chifang Chou, Matthew Ward, Xiwei Xuan, Kwan-Liu Ma</dc:creator>
    </item>
    <item>
      <title>When Vision Misleads, Let Location Speak: A Worldwide Image Geo-Localization Method via Location Attention Mechanism and Large Multimodal Models</title>
      <link>https://arxiv.org/abs/2606.08918</link>
      <description>arXiv:2606.08918v1 Announce Type: new 
Abstract: Worldwide image geo-localization aims to determine the capture location of an image on a global scale. Existing methods often mislocalize images by matching them to visually similar scenes from different geographic regions, which limits reliability in practical applications. To address this issue, we propose TransGeoCLIP, a novel retrieval-based framework that integrates a location attention mechanism and large multimodal models (LMMs). Using a Transformer encoder with location attention to encode GPS coordinates, TransGeoCLIP can effectively distinguish geographic features among visually similar images. The framework consists of two stages: 1) Retrieval database construction, which employs Transformers equipped with location attention mechanisms to encode labeled GPS coordinates and enhance location semantics, and then enables joint image-text-GPS embedding through CLIP; 2) Retrieval-augmented inference, which leverages LMMs to infer the final image location prediction from retrieved database results. Extensive experimental results on diverse datasets, including IM2GPS, IM2GPS3k, YFCC4k, and YFCC26k, demonstrate that TransGeoCLIP significantly enhances localization performance for visually similar images. In particular, street-level localization accuracy (within 1 km error) is substantially improved, surpassing state-of-the-art methods by 1.5%, 1.07%, 7.18%, and 9.75% on these benchmarks, respectively.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08918v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Junchao Cui, Wenqi Shi, Xuanzi Ma, Nan Wu, Shaoyong Du, Xiangyang Luo</dc:creator>
    </item>
    <item>
      <title>Oversight Has a Capacity: Calibrating Agent Guards to a Subjective, Fatiguing Human</title>
      <link>https://arxiv.org/abs/2606.08919</link>
      <description>arXiv:2606.08919v1 Announce Type: new 
Abstract: As LLM agents begin to take real, irreversible actions (shell commands, file edits, deploys), the standard safety pattern is a human-in-the-loop approval gate: risky actions pause and wait for a person. We argue the gate is the easy part; the hard part is the judgment - which actions to stop - which the field evaluates against two false assumptions: that there is a ground-truth notion of "risky," and that the human reviewer is a perfect, infinitely-available oracle. On a hand-labeled set of 125 adversarially-weighted agent actions we show that (i) reviewers only moderately agree on what is risky (Fleiss' kappa = 0.52), so there is no single correct label; (ii) framing the guard as selective classification under asymmetric cost makes its operating limits measurable, and on hard inputs the guard cannot safely auto-decide; and (iii) when the reviewer is modeled as endogenous (fatiguing as escalation load grows), realized safety becomes an inverted-U in the escalation rate: more human oversight can make a system less safe, and the safety-optimal guard escalates below full escalation - a setting a load-aware policy also uses to resist a flooding attack that slips a malicious action past a fatigued reviewer. Agent oversight, framed this way, is not only a classification problem but a resource-allocation one: human attention is finite, and the guard's escalation policy spends it. We claim none of these mechanisms as novel - fatigue-aware learning-to-defer (FALCON), cost-sensitive deferral under workload constraints (DeCCaF), trajectory-level guarding, and reviewer-fatigue/flooding attacks are all prior art we cite. Our contribution is an open-source agent-oversight system that operationalizes and measures them in the LLM-agent action-gating setting, turning "is my guard good?" from a guess into a curve. The inverted-U and the flooding attack are modeling results that motivate a human study.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08919v1</guid>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Emre Turan</dc:creator>
    </item>
    <item>
      <title>PolyBuild: An End-to-End Method for Polygonal Building Contour Extraction from High-Resolution Remote Sensing Images</title>
      <link>https://arxiv.org/abs/2606.08920</link>
      <description>arXiv:2606.08920v1 Announce Type: new 
Abstract: Extracting building polygon contours from high-resolution remote sensing images is a fundamental task for various mapping applications. However, the presence of varying imaging conditions and complex building structures makes automatic contour extraction extremely challenging. Mainstream approaches for building extraction often rely on pixel-level segmentation followed by multiple post-processing steps to produce building contours, which can be computationally intensive and prone to errors. In this paper, we propose an end-to-end method named PolyBuild, which can directly extract building vector polygons from high-resolution remote sensing images without the need for any post-processing operations. The proposed method leverages two primary modules: an Initial Contour Generation Module (ICGM) and a Contour Optimization Module (COM). The ICGM is designed to generate an initial building contour by utilizing concatenated sub-region center features for each building instance. It performs simultaneous object detection and initial contour extraction by generating bounding boxes and using the center features of four sub-regions to represent each building. The COM further refines the generated building contours by iteratively integrating Convolutional Neural Network (CNN) features and contour positional information in a Transformer-based decoder. The hybrid CNN-Transformer architecture effectively captures both local and global spatial relationships within the building contour, ensuring high-quality boundary delineation. Extensive experiments are conducted on three building datasets to evaluate the performance of PolyBuild. The results demonstrate that PolyBuild significantly outperforms state-of-the-art methods, including mask-based and contour-based approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08920v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yaoteng Zhang, Julin Zhang, Guangshuai Wang, Jiwei Deng, Hui Sheng, Yasir Muhammad, Shiqing Wei</dc:creator>
    </item>
    <item>
      <title>Generalized Rank-based Evaluation for Knowledge Graph Completion: Perspectives, Framework, and Analyses</title>
      <link>https://arxiv.org/abs/2606.08921</link>
      <description>arXiv:2606.08921v1 Announce Type: new 
Abstract: Knowledge graph completion (KGC) aims to predict missing facts from an observed knowledge graph (KG), playing a crucial role in a wide range of real-world applications such as drug discovery, recommender systems, and retrieval-augmented generation (RAG). Although numerous KGC models have been proposed, the evaluation of KGC remains underexplored, despite its critical role in reliably assessing model performance and selecting appropriate models for real-world applications. In this paper, we introduce two important perspectives for KGC evaluation that are overlooked by existing evaluation metrics, (P1) predictive sharpness and (P2) popularity-bias robustness. To address both perspectives, we propose a generalized evaluation framework, PROBE, which consists of a rank transformer (RT) that estimates the score of each prediction based on a desired level of predictive sharpness and a rank aggregator (RA) that determines the final evaluation score by aggregating all prediction scores according to a desired level of popularity-bias robustness. We theoretically analyze PROBE by defining six key properties for reliable KGC evaluation and prove that PROBE satisfies all the properties, while existing metrics fail to satisfy some. In particular, due to the open-world nature of KGs, an evaluation metric should preserve the relative performance of KGC models even when only incomplete facts are observed. We show that PROBE better maintains such consistency, providing a more reliable estimate of intrinsic model performance than existing metrics. Extensive experiments with six KGC models on six real-world KGs reveal that existing metrics may over- or under-estimate model performance depending on different evaluation perspectives, whereas PROBE enables a more comprehensive, flexible, and consistent evaluation of KGC models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08921v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sooho Moon, Jian Kang, Yunyong Ko</dc:creator>
    </item>
    <item>
      <title>PTDL: Multi-Terrain Fall Recovery via Phase-Terrain Decoupled Learning</title>
      <link>https://arxiv.org/abs/2606.08922</link>
      <description>arXiv:2606.08922v1 Announce Type: new 
Abstract: Humanoid robots can fall on slopes, gravel, and uneven ground in unstructured environments. We target integrated fall recovery and locomotion: rebuilding balance from a fallen state using proprioception alone and resuming velocity-commanded walking at the fall site. Prior methods often stop at quasi-static rise, neglect the post-fall ground-contact phase, or, when trained on mixed terrains without separating recovery and locomotion phases or per-surface constraints, collapse to a single compromise get-up across surfaces. We propose Phase-Terrain Decoupled Learning (PTDL), which decouples training supervision along phase and terrain axes while deploying one proprioceptive policy. On the phase axis, projected-gravity-gated dual motion-prior discriminators and a probe-to-walk transition link post-fall recovery to commanded walking. On the terrain axis, terrain-stratified recovery shaping assigns surface-specific training supervision on flat ground, gravel, and slopes; terrain labels are training-only and withheld from policy observations, enabling implicit post-fall strategy selection at deployment. We validate PTDL on a 29-DoF Unitree G1 across flat ground, gravel, and slopes up to 20 degrees in simulation and on hardware, achieving stable cross-terrain recovery, smooth recovery-to-locomotion transitions, and differentiated post-fall rise behaviors under one deployed policy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08922v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaoyu Xu, Zhiming Chen, Yuenan Zhao, Ran Song, Wei Zhang</dc:creator>
    </item>
    <item>
      <title>PROBE-Web: An Interactive System for Probing Evaluation Landscapes of Knowledge Graph Completion Models</title>
      <link>https://arxiv.org/abs/2606.08926</link>
      <description>arXiv:2606.08926v1 Announce Type: new 
Abstract: Knowledge graph completion (KGC) models are commonly evaluated using rank-based metrics such as MRR and Hits@K, even though different users often require different evaluation perspectives. In this demo, we present PROBE-Web, an interactive system for probing diverse evaluation landscapes for KGC models. PROBE-Web enables users to flexibly evaluate KGC models by adjusting two critical perspectives: (P1) predictive sharpness and (P2) popularity-bias robustness. Through a user-friendly GUI, users can easily evaluate multiple KGC models and analyze their strengths and weaknesses. PROBE-Web provides four key functionalities: (1) a conventional evaluation toolkit, (2) flexible perspective-aware evaluation, (3) explainable case studies, and (4) evaluation landscape exploration. We believe that PROBE-Web can help users better understand KGC models in line with their objectives.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08926v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sooho Moon, Yunyong Ko</dc:creator>
    </item>
    <item>
      <title>In-Situ Immersive Analytics Authoring through Ergonomic Keyboard Support</title>
      <link>https://arxiv.org/abs/2606.08927</link>
      <description>arXiv:2606.08927v1 Announce Type: new 
Abstract: Immersive analytics uses augmented reality (AR) to integrate data analysis and authoring within physical environments. However, extensive text entry required for immersive analytics authoring remains a fundamental challenge in AR, as popular natural user interfaces often hinder expressive input. This paper presents the Body-Supported Keyboard (BSK), an ergonomic system that allows the mobile use of a Bluetooth keyboard in AR. We conducted a controlled study with 20 participants to compare the BSK with a standing desk during text transcription and a mobile AR scenario. The results showed slightly higher error rates but comparable task completion times. Participants reported comfort improvements during mobile use and positive usability ratings (mean SUS = 74.5). The BSK allows users to move freely and maintain stable postures while authoring in AR. In general, the findings show evidence of the potential for body-supported input to enhance expressive and ergonomic workflows in immersive analytics and emphasize the importance of comfort and mobility in the design of AR authoring tools.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08927v1</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <arxiv:DOI>10.1080/10447318.2026.2676765</arxiv:DOI>
      <arxiv:journal_reference>International Journal of Human-Computer Interaction, 1-27. 2026</arxiv:journal_reference>
      <dc:creator>Leonel Merino, Begoña Juliá-Nehme, Santiago Viana</dc:creator>
    </item>
    <item>
      <title>RankGLU: Residual Gated Score Formation for Cross-Sectional Stock Prediction</title>
      <link>https://arxiv.org/abs/2606.08930</link>
      <description>arXiv:2606.08930v1 Announce Type: new 
Abstract: Cross-sectional stock prediction is closer to a ranking problem than to ordinary return-magnitude regression, since portfolio decisions depend on the relative ordering of assets within each trading date. Existing temporal, graph-based, and market-conditioned attention models have improved stock representation learning, yet the final prediction head is often treated as a minor implementation detail. This paper argues that, under information-coefficient-oriented evaluation, score formation is a critical bottleneck: an over-flexible head can fit unstable return magnitude, whereas an overly linear head may underuse cross-feature interactions. We therefore develop RankGLU, a residual bottleneck gated linear unit for cross-sectional stock ranking. RankGLU keeps a direct linear scoring path and adds a bounded multiplicative branch, thereby preserving a stable ordering route while allowing controlled nonlinear interactions. The method is evaluated on CSI300 and CSI800 under a unified protocol with cross-sectional score normalization and an IC-augmented objective. Multi-seed experiments show that, on CSI300, RankGLU achieves the strongest mean IC among the internally controlled variants, improving from 0.0654+/-0.0052 for the original backbone and 0.0697+/-0.0030 for the ranking-aware backbone to 0.0727+/-0.0037, a gain that is consistent across all five seeds. Its best-seed result also exceeds the corresponding baselines. Ablation results further indicate that removing the GLU prediction head causes the clearest degradation among the tested component changes. Additional relation-path calibrations can produce high single-seed peaks, but their multi-seed behavior is less stable. The evidence suggests that ranking-aware stock models benefit most reliably from bounded residual score formation rather than from indiscriminate architectural expansion.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08930v1</guid>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Huixiang Xiao, Jian Xu, Feiyu Qu, Zixuan Xie, Xiangyu Li</dc:creator>
    </item>
    <item>
      <title>From Statute to Control Flow: Span-Grounded Deontic Trees for Defeasible Scope Parsing</title>
      <link>https://arxiv.org/abs/2606.08932</link>
      <description>arXiv:2606.08932v1 Announce Type: new 
Abstract: Rule-following agents tasked with executing policies and regulations often fail via Silent Scope Omission (SSO): a model applies a general rule but silently drops nested exceptions or counter-exceptions, producing outputs that appear compliant yet break on important edge cases. Although such failures are often framed as an agentic-systems problem, the underlying bottleneck is statutory and policy understanding, a capability typically studied in legal NLP. However, most existing legal NLP benchmarks emphasize end-task outcomes, which can overlook the structural omissions that cause SSO. To diagnose and mitigate SSO, we introduce NormBench, a benchmark of 2,290 provisions spanning Chinese (laws and local policies), English (U.S. tax law, GDPR, and corporate policies), and cross-lingual settings, designed for defeasible scope parsing: identifying precisely which clause overrides which. NormBench uses Span-Grounded Deontic Trees (SG-DT), a compiler-style intermediate representation that anchors every logical branch to source spans and requires explicit exclusion guards, enabling deterministic compilation and audit. Evaluations of frontier LLMs reveal two recurring pathologies: (1) Recursion Decay, where performance drops sharply as defeater depth increases, and (2) an Auditability Trap, where models retrieve relevant spans but fail to assemble correct control flow. Using SG-DT as a constrained intermediate output improves whole-tree fidelity and defeater recovery, and downstream experiments show that its utility is mechanism-specific: gains concentrate on exception-active, SSO-prone cases, while aggregate accuracy can be mixed when the added structure is unnecessary or parser fidelity is low.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08932v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jian Chen, Siyuan Li, Chucheng Wan, Zixuan Yuan</dc:creator>
    </item>
    <item>
      <title>Backward Coherence and Hidden-State Stability in Recurrent Neural Networks: A Quasi-Reverse-Martingale Theory</title>
      <link>https://arxiv.org/abs/2606.08934</link>
      <description>arXiv:2606.08934v1 Announce Type: new 
Abstract: Recurrent neural networks maintain a hidden state $h_t$, but its probabilistic meaning is often unclear. We study hidden-state stability through \emph{backward coherence}: the extent to which $h_t$ can be reconstructed from $h_{t+1}$ by a learned backward projector $g_\phi$. Under contraction and summable backward drift, the hidden-state sequence forms a quasi-reverse-martingale. This yields almost-sure convergence, rates under mixing, an interpretable limiting representation, finite pathwise stopping times, and a theoretical framework for time-uniform confidence sequences.
  Simulations support the theory. Backward-coherence regularisation reduces the empirical quasi-martingale total $\hat Q$ by $43$--$58\%$, reaches stability $28$--$44\%$ earlier than an unregularised RNN, and gives tracking-error recovery consistent with geometric bounds. Additional tests confirm echo-state forgetting rates bounded by $\rho$ and verify the increment-sum tube $R_t$ with $100\%$ simultaneous coverage, although $R_t$ is conservative; in practice, the defect-tail proxy $\hat Q_t$ is the more useful monitor. The backward-coherence loss is also equivalent to minimising a Kullback--Leibler divergence in a Gaussian backward model, linking the method to variational inference. Extensions cover $\phi$-mixing inputs, change-point tracking, and finite-sample concentration.
  Three real-data studies further validate the approach. On PhysioNet 2012 ICU data, the Reverse Martingale RNN (RMRNN) matches RNN mortality-prediction AUC while reaching stable representations 13 hours earlier. On FRED-MD, it reduces one-month-ahead forecast error by about fourfold under concept drift. On UCI Human Activity Recognition, it maintains lower post-transition tracking error with geometric decay. The guarantees apply under the stated assumptions; universality is not claimed.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08934v1</guid>
      <category>cs.LG</category>
      <category>stat.AP</category>
      <category>stat.CO</category>
      <category>stat.ME</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuan-chin Ivan Chang</dc:creator>
    </item>
    <item>
      <title>PAI: Preserving Amplitude Information in Representation-Based Time-Series Anomaly Detection</title>
      <link>https://arxiv.org/abs/2606.08935</link>
      <description>arXiv:2606.08935v1 Announce Type: new 
Abstract: Representation-based time-series anomaly detection algorithms significantly outperform other methods on diverse anomaly detection tasks. However, our evaluation reveals a major limitation: their learned embeddings are often amplitude-agnostic. Losing amplitude information can degrade performance on amplitude-related anomalies, and this failure is prevalent across all existing representation-based methods. To address this issue, we propose a new anomaly scoring scheme named PAI. PAI consists of two complementary modules: a diagnostic module and a final score augmentation function. The diagnostic module compares cosine and Euclidean scoring on the same representation bank to test whether amplitude information is already captured in the learned representation. Then, in the final score augmentation function, PAI computes a point-wise median and MAD deviation score and a local mean-shift score, which are fused with the representation score to produce the final anomaly score. On the TSB-AD-U-Eva and TAB UV datasets, PAI improves all four evaluated representation-based methods across every reported metric, achieving average VUS-PR gains of 98.4% and 36.8%, respectively. Among all evaluated combinations, PaAno + PAI achieves the best performance, outperforming the state-of-the-art method by 15%. Additional evaluations with bootstrap confidence intervals, anomaly-type breakdowns, and a TS2Vec input-normalization ablation further support the proposed scheme. These results suggest that explicitly retaining amplitude information is important for representation-based time-series anomaly detection, which has been underemphasized in existing scoring schemes. Code is available at: https://github.com/pantheon5100/PAI</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08935v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Kang Zhang, Wei Jian Lau, Shoushou Ren, Dong Lin, Joon Son Chung, Chuanhao Sun</dc:creator>
    </item>
    <item>
      <title>Report on CHIIR 2026 Workshop on Generative AI and Academic Search (GAI&amp;AS)</title>
      <link>https://arxiv.org/abs/2606.08936</link>
      <description>arXiv:2606.08936v1 Announce Type: new 
Abstract: This report summarizes the CHIIR 2026 Workshop on Generative AI and Academic Search (GAI&amp;AS), which examined how GenAI is reshaping academic search systems and research practices. The workshop brought together researchers in human information interaction and information retrieval to explore key challenges and opportunities in designing and evaluating future academic search systems that integrate GenAI, moving beyond traditional document retrieval to support summarization, recommendation, synthesis, and conversational interaction. Participants' interests and discussions focused on three thematic clusters: foundations and principles, applications and opportunities, and search-as-learning. Across these themes, the workshop highlighted the importance of academic search systems in supporting transparency, credibility, research integrity, and long-term scholarly needs, as well as in fostering higher-order cognitive processes. Participants discussed guiding theories, design principles, methodological approaches, partnerships, and community-building efforts aimed at advancing human-centered GenAI-enhanced academic search systems. Overall, the workshop demonstrated strong community interest and a diverse range of ongoing and emerging research initiatives at the intersection of GenAI and academic search.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08936v1</guid>
      <category>cs.IR</category>
      <category>cs.AI</category>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yifan Liu, Jaime Arguello, Orland Hoeber, Chang Liu, Soo Young Rieh, Luanne Sinnamon, Dean Alvarez, Susan Archambault, Rob Capra, Henson Chen, Charles Costa, Anita Crescenzi, Zhitong (Klara) Guan, Jacek Gwizdka, Pao-Pei Huang, Gavindya Jayawardena, Ghazal Kalhor, Dagmar Kern, Oliver Koop, Alice Li, Afra Mashhadi, Gaohui Meng, Marta Micheli, Anil B. Murthy, Kevin Schott, Sebastian Schultheiß, Jiwoo Seo, Phaneendra Sivangula, Frans van der Sluis, Xiaoxuan Song, Silang Wang, Dan Zhang</dc:creator>
    </item>
    <item>
      <title>PACT: Learning Diverse Diagnostic Strategies via Privileged Synthesis and Branch Consensus</title>
      <link>https://arxiv.org/abs/2606.08938</link>
      <description>arXiv:2606.08938v1 Announce Type: new 
Abstract: Clinical diagnosis requires flexible use of multiple reasoning paradigms under incomplete patient information. Existing LLM-based medical agents show strong medical reasoning ability, but single-paradigm or naively mixed dialogue supervision makes these paradigms difficult to learn without interference. We propose \textbf{PACT} (Periodic Anchor Consensus Training), a framework that couples supervised multi-paradigm dialogue synthesis with consensus-based Branch training. At the data level, \textbf{DPS} (Doctor-Patient-Supervisor) uses complete electronic medical records (EMRs) for quality control while keeping the doctor agent restricted to patient-visible information. This produces validated dialogues under four diagnostic reasoning paradigms without leaking hidden clinical answers. At the training level, PACT trains one paradigm-specific LoRA Branch per paradigm and periodically aggregates Branches into a shared Anchor through sign consensus. We further construct a dynamic multi-turn Chinese medical diagnosis benchmark for interactive consultation. Experiments show that PACT achieves state-of-the-art performance among compared proprietary, medical-specialized, and task-adapted baselines on diagnostic outcome and consultation-process metrics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08938v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Gen Li, Yuanze Hu, Zhichao Yang, Qingchen Yu, Jianwei Lv, Yue Guo, Yujing Liu, Faguo Wu, Hongwei Zheng, Xiandong Li, Bo Yuan, Yifan Sun, Zhaoxin Fan</dc:creator>
    </item>
    <item>
      <title>Multilingual Sentiment Aware Text Summarization: A Reinforcement Learning Approach for Consistency Maintenance</title>
      <link>https://arxiv.org/abs/2606.08940</link>
      <description>arXiv:2606.08940v1 Announce Type: new 
Abstract: Reinforcement Learning from Human Feedback (RLHF) has significantly improved the quality and fluency of large language models in text summarization. However, its impact on affective properties remains insufficiently understood. In this work, we study sentiment drift, a systematic shift toward neutral sentiment in RLHF-based summarization outputs compared to source texts. We conduct extensive experiments across multiple datasets, model architectures, and eight languages to analyze how alignment objectives influence sentiment preservation. Our results show that sentiment drift is a consistent phenomenon that becomes stronger with increased KL regularization strength, indicating a trade-off between alignment stability and affective fidelity. To explain this behavior, we introduce a Policy Attribution framework that decomposes the RLHF objective and quantifies the contribution of its components. Our analysis reveals that KL regularization is the primary driver of sentiment suppression across all settings. Based on these findings, we propose a sentiment-aware modification of the KL regularization term, which selectively reduces constraints on sentiment-bearing tokens. Empirical results demonstrate that this approach mitigates sentiment drift while maintaining summarization quality. Overall, our findings highlight a fundamental limitation of current alignment methods: while they improve factual consistency and safety, they may unintentionally suppress emotional expressiveness. This motivates the development of alignment strategies that explicitly account for affective preservation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08940v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mikhail Krasitskii, Alexander Gelbukh, Olga Kolesnikova, Grigori Sidorov</dc:creator>
    </item>
    <item>
      <title>LongRTL: Graph-Similarity-Guided LLM-driven Long Context RTL Optimization</title>
      <link>https://arxiv.org/abs/2606.08944</link>
      <description>arXiv:2606.08944v1 Announce Type: new 
Abstract: Large Language Models (LLMs) show great promise in RTL code generation and optimization. However, real-world RTL designs are typically long, entangled, and poorly modularized, posing a major challenge due to context-length limitations and lack of structure. To overcome these obstacles, we propose a scalable LLM-based RTL optimization framework guided by graph similarity. Our method introduces three collaborative agents: (1) a Partition Agent that decomposes RTL designs into semantically meaningful AST subtrees, guided by AST graph similarity to reusable design templates; (2) an Optimization Agent that generates RTL submodule code based on partitioned subtrees using multi-modal Retrieval-Augmented Generation (RAG) with both AST and RTL guidance; and (3) a Reconstruction Agent that reassembles optimized submodules based on logic-aware ordering and Graph-RAG prompting, ensuring global functional equivalence. Together, these components enable robust, structure-aware optimization of long-context RTL designs, bridging the gap between toy examples and industrial-scale hardware codebases.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08944v1</guid>
      <category>cs.AR</category>
      <category>cs.PL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuyang Ye, Che-Kuan Shen, Xiangfei Hu, Yuchen Liu, Shuo Yin, Xufeng Yao, Bei Yu, Tsung-Yi Ho</dc:creator>
    </item>
    <item>
      <title>From Hazard Functions to Language Space: Cox-Supervised Distillation of Survival Risk into a Large Language Model</title>
      <link>https://arxiv.org/abs/2606.08945</link>
      <description>arXiv:2606.08945v1 Announce Type: new 
Abstract: We investigate whether information about time-to-event risk estimated by a Cox proportional hazards model can be transferred into a generative large language model. We propose a text-based survival modelling pipeline in which structured clinical covariates are converted into text prompts and a Qwen-based large language model is fine-tuned to generate patient-specific survival risk using Cox model predictions as a training target. Across GBSG2, ACTG320, and WHAS500, the model achieves competitive held-out discrimination and calibration despite being trained as a text-generation task rather than with a conventional survival-analysis loss. We further analyse the geometry of the model's hidden states, where t-SNE visualisations reveal smooth risk gradients in latent space, suggesting that the model represents survival risk as a continuous structure rather than isolated risk categories. Together, these findings suggest that large language models can internalise survival-risk structure while supporting calibrated prediction, providing a route towards time-to-event reasoning in language models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08945v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nicholas I-Hsien Kuo, Blanca Gallego, Louisa Jorm</dc:creator>
    </item>
    <item>
      <title>NeuDW-CIM: a 65-nm 0.8-pJ/Sop Reconfigurable Neuromorphic Compute-in-Memory Macro with Nonlinear Dendrites and K-Winners</title>
      <link>https://arxiv.org/abs/2606.08947</link>
      <description>arXiv:2606.08947v1 Announce Type: new 
Abstract: This work presents NeuDW-CIM, a highly efficient neuromorphic Compute-in-Memory (CIM) macro for Spiking Neural Networks (SNNs) implemented in 65 nm CMOS. The design introduces a custom twin 9T bit-cell for ternary inputs/weights and a reconfigurable non-linear In-Memory ADC (IMA). The macro supports two specialized modes: 1) Nonlinear Dendrite (NLD) mode, which utilizes reconfigurable IMA to emulate biological dendritic functions, achieving measured accuracies of 97.2% on N-MNIST and 95.5% on DVS Gesture; and 2) Top-K Winner (KWN) mode, featuring an early-stopping mechanism that reduces IMA conversion latency by 30% and digital LIF latency by 10x. Benefiting from the sparse update in KWN mode, NeuDW-CIM achieves a measured energy efficiency (EE) of 0.8 pJ/SOP (1.6x improvement).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08947v1</guid>
      <category>cs.AR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Junyi Yang, Yahan Yang, Shuai Dong, Biyan Zhou, Ye Ke, Zhengnan Fu, Xin Si, An Guo, Peng Zhou, Arindam Basu</dc:creator>
    </item>
    <item>
      <title>NutriMLLM: Multimodal Large Language Models for Dietary Micronutrient Analysis</title>
      <link>https://arxiv.org/abs/2606.08948</link>
      <description>arXiv:2606.08948v1 Announce Type: new 
Abstract: Comprehensive estimation of dietary micronutrients from food images could improve clinical nutrition care, but training such models requires large multimodal datasets linking diverse foods to complete nutrient profiles. We first show that existing multimodal large language models (MLLMs), including leading proprietary models, are unreliable for this task. Across five model families and four independent evaluation benchmarks (ASA24, SNAPMe, FNDDS, and NutriBench), models frequently abstained or returned statistically implausible values. To address this gap without costly expert annotation, we repurposed a decade of population-scale 24-hour dietary recalls as structured prompts for text-to-image generation. This pipeline produced a synthetic corpus of about 1.1 million image-description-nutrient triplets, each pairing a generated food image with a complete 65-nutrient label. To our knowledge, this is the largest synthetic food-image corpus with comprehensive micronutrient annotation planned for public release upon publication. Fine-tuning Qwen3-VL (2B/4B/8B/30B) and GLM-4.6V-Flash on this corpus yielded NutriMLLM, the first family of vision-language models specialized for comprehensive dietary micronutrient estimation. We evaluate these models with a four-component framework that separately measures abstention, hallucination, overall usability, and per-nutrient numerical accuracy. On real food images, every NutriMLLM variant achieved near-complete coverage across all 65 nutrients, and the largest variant matched or exceeded proprietary baselines (GPT-5, Gemini 3, and Claude Sonnet 4.5) in accuracy on most nutrients. These results show that recall-driven synthetic supervision can make image-based comprehensive micronutrient estimation a tractable engineering problem and support dietary assessment, personalized nutrition guidance, and population-scale micronutrient surveillance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08948v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Runze Yan, Minxiao Wang, Jiaying Lu, Darren Liu, Xiao Hu, Hanqi Luo</dc:creator>
    </item>
    <item>
      <title>When More Cores Hurts: The Vector Database Scaling Paradox in HPC</title>
      <link>https://arxiv.org/abs/2606.08950</link>
      <description>arXiv:2606.08950v1 Announce Type: new 
Abstract: Vector databases have been designed and optimized for cloud environments; however, emerging scientific AI workloads (e.g., molecular search, meteorological trajectory detection, and literature-driven hypothesis generation) demand efficient, scalable execution on HPC systems. We present a large-scale evaluation of three state-of-the-art vector databases -- Qdrant, Milvus, and Weaviate -- on two production supercomputers, scaling to 256 distributed workers across 64 compute nodes. We evaluate representative workload patterns -- mixed read/write and write-then-read -- using popular benchmarks, multimodal embeddings, and a novel real-world scientific dataset. Our results reveal that workload characteristics can limit latency reduction, additional cores can reduce query throughput by up to 30.67%, and scaling from 16 to 256 workers (16x) only yields a 5.46x improvement. This scaling paradox exposes the fundamental mismatch between cloud-oriented designs and HPC systems, highlighting the need for new, HPC-aware vector database designs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08950v1</guid>
      <category>cs.DC</category>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Seth Ockerman, Song Young Oh, Amal Gueroudji, Rochana Chaturvedi, Philip Carns, Nicholas Chia, Matthieu Dorier, Robert Latham, Tanwi Mallick, Swan Perarnau, Robert Underwood, Kyle Chard, Ian Foster, Robert Ross, Shivaram Venkataraman</dc:creator>
    </item>
    <item>
      <title>AlloSpatial: Agentic Harness Framework for Spatial Reasoning in Foundation Models</title>
      <link>https://arxiv.org/abs/2606.08952</link>
      <description>arXiv:2606.08952v1 Announce Type: new 
Abstract: Multimodal Foundation Models (MFMs) have made substantial progress, yet remain fragile in spatial reasoning over the physical world. A key bottleneck lies in their inability to transform local egocentric observations into a global allocentric spatial representation. To address this, we propose AlloSpatial, an agentic framework for allocentric spatial cognition in foundation models. AlloSpatial introduces World2Mind, a plug-and-play cognitive mapping sandbox that converts egocentric observations into structured allocentric priors, including Allocentric-Spatial Trees and route maps that support querying object topology, geometric relations, passability, and trajectories. To utilize these priors reliably under noisy reconstruction and ambiguous visual evidence, AlloSpatial introduces a Spatial Reasoning Harness for tool-use judgment, modality-decoupled cue collection, and geometry-semantic arbitration. We further internalize this process in Qwen3-VL through cold-start reinforcement learning with a harness-gated trajectory-level reward. Experiments on VSI-Bench and MindCube show that AlloSpatial improves proprietary models by 5%-18% in a training-free setting, while ASTs alone support strong spatial reasoning even when visual inputs are removed. The trained AlloSpatial agents further outperform larger general-purpose models and competitive spatial baselines, suggesting that structured allocentric representations, active tool use, and verifiable reasoning offer a promising route toward spatially capable foundation models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08952v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shouwei Ruan, Bin Wang, Zhenyu Wu, Qihui Zhu, Yuxiang Zhang, Jingzhi Li, Yubin Wang, Xingxing Wei</dc:creator>
    </item>
    <item>
      <title>Self-Consistent Generative Paths via Admissible Random Variational Transport</title>
      <link>https://arxiv.org/abs/2606.08953</link>
      <description>arXiv:2606.08953v1 Announce Type: new 
Abstract: Modern generative models often define an entire probability path from a simple prior to the data law, rather than only an endpoint map. Diffusion models follow stochastic denoising paths, flow matching learns transport fields, consistency and distillation methods compress paths into one or a few steps, adversarial models match terminal distributions, and VAEs generate through latent kernels. Existing unifying views mainly describe how such paths are constructed. We study a complementary question: when is a generated probability path self-consistent? We define a self-consistent generative path as a random fixed point of admissible local variational transport corrections. In this framework, a local correction is specified by a random variational transport operator combining a divergence or geometry term, an energy term, and a structural constraint. The framework contains random regularized optimal-transport proximal steps as a structured instance, while also allowing non-OT divergences, latent kernels, adversarial constraints, causal discrete kernels, and terminal one-step maps. The theory yields a random fixed-point path residual (R-FPR), which measures the gap between the actual generated path and an admissible local correction. We prove well-posedness, random fixed-point existence and attraction, non-contractive existence, residual-to-generation error bounds, empirical residual concentration, proxy perturbation bounds, continuous-time limits, and operator-level generalization with model-specific corollaries. The resulting theory turns endpoint matching into path self-consistency testing and provides a residual-control principle for diagnosing failures, regularizing training, and guiding adaptive sampling across diffusion, flow, one-step, VAE, GAN/WGAN, and autoregressive generators.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08953v1</guid>
      <category>cs.LG</category>
      <category>math.FA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lei Luo, Yingzhen Zhang, Jian Yang</dc:creator>
    </item>
    <item>
      <title>From inverse problems to neural operators: prediction, mechanism, and generalization of data-driven models</title>
      <link>https://arxiv.org/abs/2606.08956</link>
      <description>arXiv:2606.08956v1 Announce Type: new 
Abstract: Scientists have historically relied on mathematical models based on differential equations to relate system inputs -- forces, fluxes, or heat sources -- to outputs, such as displacement, velocity, concentration, and temperature. These models rely on deep domain knowledge to determine the form of the governing differential equation, which is then calibrated with data by solving an inverse problem. In recent years, the field of Scientific Machine Learning has introduced a variety of alternative modeling strategies for physical systems. A method called Sparse Identification of Nonlinear Dynamics learns the governing equation as a sparse linear combination of terms in a user-defined library. Neural Ordinary Differential Equations construct the governing equation by taking in the state and its derivatives at the input layer of a neural network. Entirely foregoing the modeling framework of differential equations, neural operators directly learn a non-linear mapping between the system inputs and outputs. From inverse problems to neural operators, all of these modeling strategies can be conceptualized as data-driven machinery to predict a system's response over a range of inputs. It is then natural to wonder how exactly these various strategies relate to each other, and whether they can be neatly taxonomized. Drawing from the philosophical literature on scientific models, we argue that many model types have a common structure, differing only in the assumed model class of the input-output relation they define. Connecting to philosophical ideas on mechanism, and arguing that data from physical systems arises from solutions to parsimonious differential equations, we propose that only certain models are capable of mechanism discovery, and thus generalization. Our analysis is intended to unite apparently disparate modeling strategies and provide insight into their appropriate use cases.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08956v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Conor Rowan</dc:creator>
    </item>
    <item>
      <title>Rethinking 3D Shape Generation: Diffusion over Superquadrics</title>
      <link>https://arxiv.org/abs/2606.08957</link>
      <description>arXiv:2606.08957v1 Announce Type: new 
Abstract: Diffusion models have advanced 3D shape generation, yet most methods still denoise in high-cardinality spaces (e.g., voxel/SDF grids, meshes, or point clouds), which is computationally and memory intensive and makes it difficult to scale in terms of both higher resolution and stronger controllability. We rethink the diffusion representation and propose to move diffusion from dense geometry to compact geometric primitives, representing each shape as a small set of superquadrics. Instead of operating on thousands to millions of geometric representation values, we leverage 7KB superquadric parameters (pose, size, and shape), drastically reducing diffusion-state dimensionality and per-step compute/memory. Our diffusion-over-superquadrics improves scalability by supporting broader capabilities (e.g., resolution-free point-cloud decoding, part-level editing, and constraint-based design) and achieving competitive surface-fidelity and distributional performance on standard benchmarks after point-cloud decoding, while enabling efficient generation within 0.6s per shape for most conditions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08957v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhiyang Liu, Wanze Li, Yuwei Wu, Chengran Yuan, Jiawei Sun, Rui Zheng, Marcelo H Ang Jr</dc:creator>
    </item>
    <item>
      <title>ChinaHeritaQA: A Culturally-Grounded Visual Question Answering Dataset for World Heritage Sites in China</title>
      <link>https://arxiv.org/abs/2606.08959</link>
      <description>arXiv:2606.08959v1 Announce Type: new 
Abstract: We introduce ChinaHeritaQA, a multimodal benchmark dataset for evaluating the cultural reasoning abilities of vision-language models (VLMs) on UNESCO World Heritage sites in China. The dataset comprises 2,279 in-the-wild images paired with 14,133 bilingual (Chinese/English) multiple-choice QA pairs spanning seven cognitive dimensions, from basic identity recognition to historical periodization and architectural analysis. Guided by a UNESCO-aligned heritage ontology and verified through rigorous human annotation, the dataset ensures linguistic quality and factual consistency. Evaluations of state-of-the-art VLMs reveal that while top models exceed human performance on average, substantial task-level variation emerges: models excel at visual recognition but struggle with culturally grounded reasoning. Performance also varies by dynasty and region. ChinaHeritaQA reveals that strong visual retrieval does not extend to cultural and historical understanding. We release the dataset to support future research on culturally aware multimodal learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08959v1</guid>
      <category>cs.CV</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yi Zhang, Bolei Ma, Yong Cao, Chengyan Wu, Daniel Hershcovich, Anna-Carolina Haensch</dc:creator>
    </item>
    <item>
      <title>Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops</title>
      <link>https://arxiv.org/abs/2606.08960</link>
      <description>arXiv:2606.08960v1 Announce Type: new 
Abstract: Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks and find 323 (16%) hackable by frontier models given only the task description. This corrupts both leaderboard rankings and RL training signal, yet the standard response is manual and reactive.
  We introduce the hacker-fixer loop, a method for building exploit-resistant verifiers without per-task manual patching. The loop alternates three LLM agents: a hacker tries to pass the verifier without solving the task, a fixer patches the verifier to reject each discovered exploit, and a solver confirms the patched verifier still admits legitimate solutions. The loop iterates: each patch reshapes what the verifier rewards, surfacing the next exploit. We further add verifier access, and let patches transfer across tasks, to broaden the exploits the loop discovers.
  On KernelBench, the loop drives the attack success rate from 62% to 0% on a held-out corpus of publicly reported exploits. We also find that weaker agents in the loop can defend against much stronger hackers: Gemini 3 Flash's loop drives the stronger Gemini 3.1 Pro and Claude Opus 4.7's attack success rate from 76% and 61% to 0% on KernelBench, and Gemini 3.1 Pro's from 39% to 17% on Terminal Bench across 77 tasks. We release Terminal Wrench (323 hackable environments, 3,632 hack trajectories) as a snapshot of the current attack surface, our patched verifiers, the exploits the loop discovered, and our implementation as a basis for future work.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08960v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ziqian Zhong, Ivgeni Segal, Ivan Bercovich, Shashwat Saxena, Kexun Zhang, Aditi Raghunathan</dc:creator>
    </item>
    <item>
      <title>C$^3$ache: Accelerating World Action Models with Cross Inference Chunk Cache</title>
      <link>https://arxiv.org/abs/2606.08962</link>
      <description>arXiv:2606.08962v1 Announce Type: new 
Abstract: World Action Models (WAMs) generalize better than standard Vision-Language-Action (VLA) policies to novel motions and environments, because a video-modeling objective lets them learn from abundant unlabeled video rather than scarce labeled robot demonstrations. This generalization is computationally expensive. To complete a task, a WAM runs over multiple inference chunks, and each chunk requires a costly denoising process. Existing acceleration methods reduce this cost by caching and reusing computation within a single chunk's denoising trajectory. Our empirical analysis reveals a substantial source of redundancy they overlook: redundancy across chunks. When a robot executes a smooth behavior, the residuals computed at a given denoising step are strongly correlated from one chunk to the next. We introduce C$^3$ache, a training-free method that caches and reuses these residuals across inference chunks at the same denoising step. Experiments on benchmarks with a Fast-WAM backbone show that C$^3$ache achieves up to a $2.5\times$ speedup in total wall-clock inference time, with negligible degradation in task success rate.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08962v1</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Weisen Zhao, Lam Nguyen, Zhicong Lu, Yuzhang Shang</dc:creator>
    </item>
    <item>
      <title>Embedding linear codes over Z4 into self-orthogonal codes</title>
      <link>https://arxiv.org/abs/2606.08964</link>
      <description>arXiv:2606.08964v1 Announce Type: new 
Abstract: The purpose of this paper is to investigate the self-orthogonal embedding problem for linear codes over Z4. We propose several tight bounds on the length of the shortest self-orthogonal embedding over Z4, and determine the exact shortest self-orthogonal embedding length under specific conditions. As an example satisfying these conditions, we establish the exact length of the shortest self-orthogonal embedding for the quaternary Preparata codes. Furthermore, to establish these results, we completely classify the exact length of the shortest doubly even self-orthogonal embedding for binary linear codes in every possible case. Finally, when the shortest self-orthogonal embedding length of a given free code over Z4 is equal to the shortest doubly even self-orthogonal embedding length of its residue code, we present an algorithm to construct all possible shortest self-orthogonal embeddings. With our algorithm, we found twelve linear codes over Z4 whose minimum Lee distances are higher than those of the Z4-linear codes in Aydin's database.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08964v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Junmin An, Jon-Lark Kim, San Ling</dc:creator>
    </item>
    <item>
      <title>Before You Scroll Again: Predicting Regretful Social Media Sessions from In-the-Wild Contextual and Wearable Sensing</title>
      <link>https://arxiv.org/abs/2606.08965</link>
      <description>arXiv:2606.08965v1 Announce Type: new 
Abstract: Users often feel regret after using social media, making regret a more ecologically valid target than screen time for understanding when phone use becomes problematic. Existing self-monitoring tools cannot anticipate regret before it occurs, and prior physiological work on social media use has been confined to the lab with research-grade sensors and curated content, leaving the question of in-the-wild prediction open. We deployed a 7-day in-the-wild experience sampling study with 21 participants, combining passive smartphone logging, a low-cost consumer smartwatch (Bangle.js 2, \$80), session-level surveys (1,445 sessions), and exit interviews to investigate when and why social media sessions become regretful, and whether regret can be anticipated before a session begins. Three findings stand out: (i) the gap between intended and actual use predicts regret far more strongly than session duration, with duration's apparent effect collapsing once intention is modeled; (ii) regret is amplified when sessions displace a valued alternative, particularly at night and following productivity-app use; and (iii) pre-session contextual features generalize across participants while physiological signals add person-specific lift, pointing toward a two-layer architecture for just-in-time adaptive interventions. Interview themes of scrolling-as-avoidance and time blindness contextualize these patterns and surface design opportunities beyond timer-based interventions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08965v1</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sally Ahmed, Jan Enkmann, Kye Shimizu, Ivy Yip, Vincent Beermann, Ayse Alomar, Falk Uebernickel, Pattie Maes</dc:creator>
    </item>
    <item>
      <title>CARE: A Conformal Safety Layer for Medical Summarization</title>
      <link>https://arxiv.org/abs/2606.08969</link>
      <description>arXiv:2606.08969v1 Announce Type: new 
Abstract: Large language models (LLMs) are increasingly used for medical summarization, but their outputs can omit medically important information and introduce unsupported claims. Existing error-detection methods produce heuristic or uncalibrated scores, providing no formal control over missed errors and no principled way to trade off safety against clinician review burden. We introduce Conformal Assessment for Risk Evaluation (CARE), a post-hoc, model-agnostic safety layer that uses conformal risk control to overlay calibrated omission and hallucination flags onto summaries from any LLM without retraining. CARE provides finite-sample, distribution-free guarantees through two controllers: a hallucination controller that bounds the probability of a document containing any unflagged hallucinated sentence, and an omission controller that bounds the expected fraction of important omissions not surfaced for review. Unlike hallucination detection, omissions depend jointly on whether a source sentence is important and whether it is covered by the summary. We show that calibrating only one dimension can violate the target risk bound, while marginal decompositions remain valid but overly conservative. By jointly calibrating over the full $(\tau,\gamma)$ threshold space, CARE preserves formal guarantees while surfacing up to 5$\times$ fewer sentences than alternative calibrated baselines. Across five medical summarization tasks, CARE satisfies the target risk bound at $\alpha = 0.15$ with 95% confidence across 100 calibration/test resplits, using only ~100 labeled documents per domain. In a preliminary clinician study (75 document reviews), calibrated flags improved omission detection by 28.6 percentage points on average. These results show that sentence-level safety guarantees are feasible for LLM-assisted medical summarization and offer a tunable mechanism for balancing residual risk and review effort.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08969v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Suhana Bedi, Bridget Lin, Anson Y. Zhou, Chloe O. Stanwyck, Jenelle A. Jindal, Sanmi Koyejo, David Stutz, Nigam H. Shah</dc:creator>
    </item>
    <item>
      <title>An Effective Router for Vision-Language Model Selection</title>
      <link>https://arxiv.org/abs/2606.08970</link>
      <description>arXiv:2606.08970v1 Announce Type: new 
Abstract: Vision-language models (VLMs) with varying performance and resource requirements are widely deployed, making it difficult for users to select the most appropriate one among numerous VLM candidates. Existing work reveals the performance paradox phenomenon in language models and focuses on routing methods to solve it. However, developing a router for VLM selection is still a critical yet challenging problem, which primarily faces: 1) lack of specialized data, 2) ineffective feature representation, and 3) rigid model space and costly adaptation. In this paper, we construct a multimodal dataset for VLM selection, containing the outputs of seven mainstream VLMs on 32,626 unique image-text queries. We then propose ARMS, a router for VLM selection. ARMS enhances input signals with VLM profiles and employs a simple but effective architecture to improve representations of queries and VLM capabilities. To improve ARMS' adaptation to new VLMs, we propose two extension training strategies: incremental training and independent training. Experimental results on both in-distribution and out-of-distribution test sets demonstrate the effectiveness of ARMS. In particular, using our training strategy, ARMS (only 800M in size) can adapt to a broader VLM space and defeat commercial models like GPT-4o that are hundreds of times larger in scale. Our code, models, and datasets are available in the anonymous repository.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08970v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Can Wang, Shengwei Wang, Bolin Zhang, Zhiying Tu, Dianhui Chu</dc:creator>
    </item>
    <item>
      <title>Diverse Thinking Schemata Elicit Better Reasoning in Large Language Models</title>
      <link>https://arxiv.org/abs/2606.08974</link>
      <description>arXiv:2606.08974v1 Announce Type: new 
Abstract: Large reasoning models (LRMs) have attracted increasing attention for their ability to solve complex mathematical problems by generating extended reasoning chains. In this work, we focus on two critical yet underexplored aspects of the reasoning process: reasoning transitions, which capture the distinct transitions between reasoning steps, and answer candidates, which reflect the variety of solution paths produced by the model. We collectively define these two aspects as thinking schemata. We observe a correlation between the diversity of thinking schemata and model performance, which motivates us to enhance diversity as a means to further improve reasoning potential. To this end, we propose Diverse Schemata Policy Optimization (DiScO), a framework that first endows the model with schemata awareness, then encourages diversity through reinforcement learning, and further promotes diverse reasoning at inference time. Experiments on multiple mathematical reasoning benchmarks demonstrate that DiScO consistently outperforms standard group relative policy optimization. Beyond accuracy, human-annotated analyses show that DiScO substantially improves the model's ability to recover from erroneous initial attempts. Overall, our work highlights the important role of diversity in thinking schemata and points to scaling along the diversity dimension as a promising research direction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08974v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xinyue Liang, Yizhe Yang, Yu Bai, Bin Xu, Jiawei Li, Yang Gao</dc:creator>
    </item>
    <item>
      <title>RTL-BenchLS: A Large-Scale Benchmark for RTL Reasoning and Generation with Large Language Models</title>
      <link>https://arxiv.org/abs/2606.08976</link>
      <description>arXiv:2606.08976v1 Announce Type: new 
Abstract: LLM-based RTL generation and reasoning is a promising direction for hardware design automation. High-quality benchmarks are critical infrastructure for tracking progress in this direction. However, existing RTL benchmarks face inherent limitations in both scale and task scope. The designs they cover are typically small and simple, and the tasks focus almost entirely on specification-to-RTL generation. Frontier models' performance already saturates on the existing benchmarks. Scaling these benchmarks up is fundamentally difficult because aligned labels are required for benchmarking, such as specifications and testbenches. Such aligned high-quality data are rarely available for real-world designs. We introduce RTL-BenchLS, a large-scale benchmark addressing both limitations above. It contains over 10,000 formally verified Verilog designs, covering substantially larger and more complex designs than existing benchmarks. Beyond specification-to-RTL generation, we propose three novel tasks that jointly evaluate reasoning and generation: round-trip reasoning, masked-content reasoning, and repository-issue reasoning. The first two are self-supervised, which directly resolves the scaling bottleneck. All tasks are verified through formal equivalence checking without any manual testbenches. We evaluate eight LLMs on RTL-BenchLS. Even the best model reaches only 23% on natural-language round-trip reasoning, 28% on masked-content reasoning, and 12% on repository-issue fixing. RTL-BenchLS is substantially more challenging than existing benchmarks. It leaves ample room for future improvement and offers guidance for developing LLM-based methods for hardware design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08976v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jing Wang, Shang Liu, Wenji Fang, Yuchao Wu, Yugao Zhu, Zhiyao Xie</dc:creator>
    </item>
    <item>
      <title>Online Learning with Recency: Algorithms for Sliding-window Streaming Multi-armed Bandits</title>
      <link>https://arxiv.org/abs/2606.08977</link>
      <description>arXiv:2606.08977v1 Announce Type: new 
Abstract: Motivated by the recency effect in online learning, we study algorithms for single-pass sliding-window streaming multi-armed bandits (MABs). In this setting, we are given $n$ arms with unknown sub-Gaussian reward distributions and a parameter $W$. The arms arrive in a single-pass stream, and only the most recent $W$ arms are considered valid. The algorithm is required to perform pure exploration and regret minimization with limited memory, defined as the number of stored arms. The model is a natural extension of the streaming multi-armed bandits model (without the sliding window) that has been extensively studied in recent years. We provide a comprehensive analysis of both the pure exploration and regret minimization problems in this model. For pure exploration, we prove that finding the best arm is hard with sublinear memory while finding an approximate best arm admits an efficient algorithm. For regret minimization, we explore a new notion of regret and give sharp memory-regret trade-offs for any single-pass algorithm. We complement our theoretical results with experiments, demonstrating the trade-offs among sample complexity, regret, and memory.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08977v1</guid>
      <category>cs.LG</category>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Vladimir Braverman, Chen Wang, Liudeng Wang, Samson Zhou</dc:creator>
    </item>
    <item>
      <title>Heterophily-Aware Adaptive Knowledge Distillation for Hypergraph Neural Networks</title>
      <link>https://arxiv.org/abs/2606.08978</link>
      <description>arXiv:2606.08978v1 Announce Type: new 
Abstract: Hypergraph knowledge distillation aims to retain the predictive performance of a hypergraph neural network (HNN) teacher while reducing inference costs through a lightweight student model. In this work, we observe that HNNs exhibit substantially lower prediction performance on heterophilic nodes connected through semantically diverse hyperedges, indicating that the reliability of teacher knowledge varies across nodes. Motivated by this observation, we propose HADES, a heterophily-aware adaptive distillation method for hypergraph neural networks. HADES quantifies node heterophily and leverages it as an estimate of teacher reliability to modulate the transfer of teacher knowledge during distillation. Experimental results on real-world hypergraphs demonstrate that HADES consistently improves student performance across different HNN teachers and distillation objectives. In many cases, the resulting student models surpass the predictive performance of their teachers while achieving up to 12.3 times faster inference.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08978v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Joohee Cho, David Yoon Suk Kang, Yunyong Ko</dc:creator>
    </item>
    <item>
      <title>EviProp: Seeded Relevance Diffusion on Chunk-Page Graphs for Long Multimodal Document Retrieval</title>
      <link>https://arxiv.org/abs/2606.08979</link>
      <description>arXiv:2606.08979v1 Announce Type: new 
Abstract: Retrieving evidence pages from visually rich long documents is a key challenge in document question answering. Existing page-level visual retrievers operate under an independent matching paradigm: each page is scored in isolation based on query-page similarity. This paradigm can under-rank evidence pages whose signals are localized in fine-grained chunks or depend on document-internal associations. We propose EviProp, a retrieval method that recovers such pages via seeded relevance diffusion. EviProp models each document as a multimodal Chunk-Page graph with hierarchical, sequential, and similarity links. Given a query, it combines dense visual page priors with sparse chunk seeds, then runs Personalized PageRank to diffuse relevance over the graph. Experiments on MMLongBench-Doc and LongDocURL show consistent gains in evidence-page retrieval over independent visual retrieval and text-visual fusion baselines. Downstream QA results further show that improved retrieval translates into better answer accuracy, with negligible online retrieval overhead. Our code is released at https://github.com/Flyecnu/EviProp.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08979v1</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hongwei Zhang, Xiaoman Wang, Zehui Ling, Ruicheng Zhu, Yue Zhang, Pinlong Cai, Fuke Shen, Botian Shi, Tongquan Wei, Guohang Yan</dc:creator>
    </item>
    <item>
      <title>EPS3D: End-to-End Feed-Forward 3D Panoptic Segmentation</title>
      <link>https://arxiv.org/abs/2606.08980</link>
      <description>arXiv:2606.08980v1 Announce Type: new 
Abstract: This paper introduces EPS3D, a new end-to-end feed-forward framework for open-vocabulary 3D panoptic segmentation. Unlike existing methods relying on additional preprocessing, we design an end-to-end architecture, with a distillation-based training strategy on diverse 3D scenes to predict 3D-aware semantic and instance features from multi-view images, improving 3D consistency and avoiding error accumulation. We further propose a mutual enhancement module to enforce inherent semantic-instance consistency. By aligning semantics within instances (Ins2Sem) and refining instance features with semantic guidance (Sem2Ins), we achieve more coherent 3D scene understanding. Ultimately, EPS3D outperforms SOTA baselines on two benchmarks (e.g., +13% mIoU for semantics on Replica) with high efficiency (e.g., 1s per scene), supporting tasks like robotic manipulation and 3D scene editing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08980v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Runsong Zhu, Jiaxin Guo, Xiaoyang Guo, Zhengzhe Liu, Ka-Hei Hui, Wei Yin, Kai Chen, Wei Chen, Weiqiang Ren, Yunhui Liu, Pheng-Ann Heng, Chi-Wing Fu</dc:creator>
    </item>
    <item>
      <title>Baichuan-M4: A Clinical-Grade Medical Agent System for Continuous Care</title>
      <link>https://arxiv.org/abs/2606.08982</link>
      <description>arXiv:2606.08982v1 Announce Type: new 
Abstract: Baichuan-M4 is Baichuan Intelligence's clinical-grade medical large model, designed for continuous care rather than single-turn medical question answering. It is built as a coordinated medical agent system around three pillars: Baichuan-Harness, a unified runtime that keeps reinforcement-learning training and real-world deployment consistent while enforcing action constraints, tool use, long-term patient memory, and multi-agent coordination; a core reasoning model trained with a continuous-care reinforcement-learning framework that integrates span-level reward modeling (SPAR++), reasoning-path compression, curriculum learning, and stabilized policy optimization; and a clinical tool layer for patient-memory management, authoritative evidence-based retrieval, and multimodal medical perception across documents, X-rays, and dermatology. On a cross-dimensional medical evaluation suite, Baichuan-M4 attains leading results in static medical knowledge and safety, dynamic OSCE-style consultation, long-context clinical memory, evidence-based retrieval, medical document OCR, and multimodal image understanding, while lowering the hallucination rate to 3.3%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08982v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Aiyuan Yang, Chengfeng Dou, Da Pan, Dian Wang, Fan Yang, Fei Deng, Fei Li, Guangwei Ai, Hui Liu, Hongda Zhang, Jinyang Tai, Kai Lu, Lijun Liu, Linwei Chen, Linyu Li, Meiqing Guo, Peidong Guo, Qiang Ju, Rihui Xin, Shuai Wang, XinKai Ma, Xudong Chen, Yichuan Mo, Canbin Piao, Leyi Pan, Yihe Luo, Zian Wang</dc:creator>
    </item>
    <item>
      <title>Beyond Neural Collapse: Task-Intrinsic Geometry Governs Neural Representations in Modular Arithmetic</title>
      <link>https://arxiv.org/abs/2606.08985</link>
      <description>arXiv:2606.08985v1 Announce Type: new 
Abstract: While neural collapse (NC) predicts that a $K$-class-balanced classifier should organize terminal representations as a $(K-1)$-dimensional simplex equiangular tight frame (ETF), modular addition consistently enters a different regime: networks compress to a two-dimensional cyclic geometry in which both classifier weights and token embeddings lie on circles. We refine the explanation of this phenomenon in three directions. First, we formalize a layerwise non-uniform training mechanism: downstream classifier weights are driven by dense cross-entropy gradients into a rank-2 equiangular configuration before upstream embeddings fully reorganize, and once this classifier plane forms, backpropagated feature gradients constrain embedding motion to the same plane while weight decay suppresses orthogonal components. Second, after this subspace locking, the induced in-plane dynamics admit an entropy-regularized transport interpretation on $S^1$; combined with modular-addition labels, this reduces embedding formation to phase alignment, whose minimizers are single-frequency characters of $\mathbb{Z}/P\mathbb{Z}$ and hence equal-angle points on a circle. Third, we quantify why this solution prevails over NC: a simplex ETF gains only an $O(1)$ advantage in cross-entropy, whereas the cyclic rank-2 solution enjoys a $\Theta(K)$ advantage under Schatten or weight-decay surrogates, yielding a critical threshold $\lambda_{\mathrm{crit}} = \Theta(1/K)$. Our results explain both why classifier weights move first and why embeddings subsequently align with them, showing that grokking on modular arithmetic is governed not by maximal separation alone but by a task-structured trade-off between separation, symmetry, and complexity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08985v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hu Tan, Kuo Gai, Shihua Zhang</dc:creator>
    </item>
    <item>
      <title>Structure-Aware Modeling of Multiple-Choice Questions Improves Automatic Difficulty Estimation</title>
      <link>https://arxiv.org/abs/2606.08988</link>
      <description>arXiv:2606.08988v1 Announce Type: new 
Abstract: Automatic Question Difficulty Estimation (AQDE) holds growing promise for educational assessment because it has the potential to yield difficulty estimates that are competitive with expert judgment, while helping reduce the time and financial burden associated with pilot administrations and scaling to digital testing contexts. Prior AQDE studies report mixed evidence on whether adding distractors as additional text to the question stem and the correct key consistently improves difficulty prediction. We hypothesize that the effectiveness of distractor information depends on its structural representation, and that explicitly modeling distractors as separate components improves difficulty estimation over baselines that omit this information. To address this, we designed controlled architectures that model MCQ components as distinct inputs to isolate the contribution of distractor content and order. Specifically, we represented distractors by encoding each distractor as its own text input and aggregating their representations either with order-aware concatenation (with positional tags) or with an order-invariant summation. We evaluated these architectures using two Chilean datasets (Natural and Social Sciences, 2016-2020; 4,114 multiple-choice questions). Compared to a simpler model that only used the question stem and the key, our best distractor-aware architecture achieved higher predictive performance, reaching R^2 = 0.83 for Natural Sciences and R^2 = 0.71 for Social Sciences items. An order-invariant variant achieved nearly the same accuracy with approximately half as many parameters, offering a favorable accuracy-efficiency trade-off. These results show that structural information (especially distractor content) drives gains in predictive accuracy, supporting the development of efficient, structure-aware models that are computationally viable for large-scale educational applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08988v1</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Gabriel Ortega, Abelino Jiménez, Séverin Lions, Pablo Dartnell</dc:creator>
    </item>
    <item>
      <title>SpaceVLN: A Zero-Shot Vision-and-Language Navigation Agent with Online Spatial Cognitive Memory and Reasoning</title>
      <link>https://arxiv.org/abs/2606.08992</link>
      <description>arXiv:2606.08992v1 Announce Type: new 
Abstract: Vision-and-Language Navigation in continuous environments requires agents to understand the spatial structure of previously unseen environments in order to follow language instructions. Although foundation models have opened a promising path toward zero-shot navigation without task-specific policy training, many navigators still rely on local visual cues and linear history-based reasoning, overlooking the spatial nature of navigation across explored regions, traversed paths, landmarks, and their spatial relations. In this paper, we propose SpaceVLN, a navigation agent built around Spatial Cognitive Memory and Task-Guided Spatial Reasoning. Specifically, SpaceVLN introduces an efficient stagewise closed-loop framework where planning and execution are organized around verifiable space-landmark stages. During navigation, the agent progressively abstracts explored regions into Spatial Waypoints and dynamically maintains subtask-grounded landmark evidence, forming a hierarchical Spatial Cognitive Memory for progress localization and spatial-relation understanding. Built on this memory, Spatial-CoT integrates task-progress reasoning with spatial perception, analysis, and prediction, enabling Task-Guided Spatial Reasoning for embodied navigation. The unified stage interface enables SpaceVLN to address both Vision-and-Language Navigation and Object-Goal Navigation under a unified zero-shot setting, without task-specific policy training. Across R2R-CE, RxR-CE, GN-Bench, and HM3D-OVON, SpaceVLN achieves state-of-the-art zero-shot performance, and real-robot deployment further validates its applicability. These results highlight Spatial Cognitive Memory and Task-Guided Spatial Reasoning as a practical foundation for stronger embodied navigation agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08992v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yucheng Deng, Pingrui Lai, Xinhai Li, Chenjia Bai, Xiaoheng Deng, Chengnuo Sun, Xuelong Li, Hua Yang</dc:creator>
    </item>
    <item>
      <title>LEAF: A Learning-Enabled ADMM Framework for Accelerated Convex Optimization</title>
      <link>https://arxiv.org/abs/2606.08993</link>
      <description>arXiv:2606.08993v1 Announce Type: new 
Abstract: We propose LEAF, a learning-enabled ADMM framework for accelerated convex optimization. The key idea is to approximate the Moreau envelope of the objective function using an Input Convex Neural Network (ICNN), resulting in a learned model that preserves convexity and smoothness. This leads to the proposed Moreau Envelope Learning ADMM (MEL-ADMM) and its splitting variant sMEL-ADMM. Unlike existing approaches that learn high-dimensional operators directly, LEAF learns a scalar-valued Moreau envelope, significantly reducing model complexity and improving data efficiency. The framework accommodates a broad class of convex problems with smooth and non-smooth objectives. By embedding convexity explicitly through the ICNN architecture, the proposed approach maintains high approximation accuracy while preserving key structural properties of the optimization problem. Both MEL-ADMM and sMEL-ADMM are developed with theoretical guarantees of convergence and feasibility under the learned model. Rigorous analysis shows that the proposed methods achieve convergence rates comparable to classical ADMM while reducing per-iteration computational cost. Numerical experiments demonstrate up to an order-of-magnitude speedup over state-of-the-art solvers while maintaining low optimality gaps.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08993v1</guid>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Binh Nguyen, Trinh Tran, Truong X. Nghiem</dc:creator>
    </item>
    <item>
      <title>Language-Aware Token Boosting: LLM Language Confusion Reduction Without Tuning</title>
      <link>https://arxiv.org/abs/2606.08994</link>
      <description>arXiv:2606.08994v1 Announce Type: new 
Abstract: Large language models (LLMs) sometimes exhibit language confusion when generating non-English text. Existing approaches typically rely on fine-tuning to mitigate this issue. In contrast, we propose a tuning-free paradigm for reducing language confusion. Within this paradigm, we introduce two methods: Language-Aware Token Boosting (LATB), which applies targeted perturbations to tokens associated with the desired language, and Adaptive Language-Aware Token Boosting (Adaptive-LATB), which dynamically adjusts these perturbations based on the model's confidence in the intended language. Experiments demonstrate that our methods effectively improve multilingual alignment by reducing language confusion while maintaining summarization quality, without requiring any additional fine-tuning. Our code is publicly available at https://github.com/scbdatax/genai-datax-language-aware-token-boosting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08994v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Trapoom Ukarapol, Pakhapoom Sarapat, Nut Chukamphaeng</dc:creator>
    </item>
    <item>
      <title>The Token Not Taken: Sampling, State, and the Variability of AI Agent Outputs</title>
      <link>https://arxiv.org/abs/2606.08998</link>
      <description>arXiv:2606.08998v1 Announce Type: new 
Abstract: Agentic AI systems can behave differently across runs: the same request may produce a different plan, a different tool call, a different code edit, or a different final answer. Such variability arises from several layers that are often conflated. A foundation model is a large pretrained model, usually adaptable to many downstream tasks, that maps an input context to predictions over outputs. In many current agents, that model is embedded in an orchestration loop that plans, calls tools, observes results, and updates state. One explicit intrinsic source of variability in such systems is token generation: the model computes scores over possible next tokens, the scores are converted into probabilities, and a decoder may sample tokens using a pseudo-random number generator. A small sampled token difference can then propagate upward into a different tool call, code path, search query, or agent state. Other sources of variability are extrinsic to token sampling, including changing environments, live data, serving infrastructure, batch effects, and numerical details. By separating these layers, the manuscript clarifies what it means to call agentic AI systems stochastic, when such variability can be reproduced under matched conditions, and why deterministic execution need not imply identical behavior in deployed settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08998v1</guid>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <category>econ.GN</category>
      <category>q-fin.EC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Muhammad Zia Hydari, Raja Iqbal</dc:creator>
    </item>
    <item>
      <title>JAX-AMG: A GPU-Accelerated Differentiable Sparse Linear Solver Library for JAX</title>
      <link>https://arxiv.org/abs/2606.09001</link>
      <description>arXiv:2606.09001v1 Announce Type: new 
Abstract: Sparse linear systems from PDE discretizations are central to scientific computing, yet no existing JAX-ecosystem solver simultaneously provides GPU-accelerated algebraic multigrid (AMG), automatic differentiation (AD), and distributed multi-GPU execution. JAX-AMG fills this gap by wrapping the Nvidia AmgX solver suite as a native JAX primitive, exposing AMG and Krylov methods with configurable preconditioners through a unified interface compatible with JIT compilation, reverse-mode AD via adjoint methods, batched solves, and MPI-based distributed execution. Solver caching amortizes setup costs across repeated solves, making JAX-AMG practical for PDE-constrained optimization and inverse problems. The result is a robust, scalable sparse linear algebra layer that integrates seamlessly into differentiable simulation and scientific machine learning pipelines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09001v1</guid>
      <category>cs.MS</category>
      <category>physics.comp-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Yi Liu, Xiantao Fan, Jian-Xun Wang</dc:creator>
    </item>
    <item>
      <title>LATTEArena: An Evaluation Framework for LLM-powered Tabular Feature Engineering (Extended Version)</title>
      <link>https://arxiv.org/abs/2606.09004</link>
      <description>arXiv:2606.09004v1 Announce Type: new 
Abstract: Feature engineering remains essential for tabular data analysis, and Large Language Models (LLMs) have emerged as a promising paradigm for automating this process, giving rise to LLM-powered AuTomated Tabular feature Engineering (LATTE). However, the absence of standardized platforms prevents fair, cost-aware comparisons. Furthermore, complex methodological designs obscure the specific contributions of individual components; for example, although LFG integrates Tree-of-Thought, few-shot demonstrations, Monte Carlo Tree Search, and natural language generation, the isolated impact of each technique's competitive edge remains unquantified. To address these challenges, we introduce LATTEArena, the first competitive evaluation framework featuring: (1) a six-dimensional taxonomy decomposing 15 representative methods into reusable components; (2) a standardized modular arena for controlled comparison; (3) multi-dimensional assessments covering performance, cost, and robustness; and (4) component-level ablation quantifying each technique's competitive edge. Through extensive evaluations, we reveal 16 key findings, including: (1) Tree-of-Thought with Monte Carlo Tree Search achieves optimal cost-effectiveness; (2) RPN and Code output formats dominate classification and regression tasks, respectively. We publicly release the modular framework and over 4000 execution logs, enabling researchers to seamlessly pit new techniques against existing ones and advance LATTE.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09004v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ankai Hao, Ke Chen, Huan Li, Lidan Shou</dc:creator>
    </item>
    <item>
      <title>Document-Authored Control-Signal Impersonation: A Low-Cost Indirect Prompt Attack on RAG Safety Boundaries</title>
      <link>https://arxiv.org/abs/2606.09005</link>
      <description>arXiv:2606.09005v1 Announce Type: new 
Abstract: Retrieval-augmented generation (RAG) systems often serialize user queries, retrieved documents, metadata, system labels, and task instructions into one natural-language prompt. We study a source-authority boundary failure in this design: attacker-authored retrieved text can impersonate metadata, provenance, authority, or disclosure-policy signals that appear control-relevant to the model. We call this pattern Document-Authored Control-Signal Impersonation (DACSI). DACSI is a non-imperative, metadata-like payload subclass within indirect prompt injection. Its central lesson is simple: document-authored labels are data, not policy. Command-style injection asks the model to ignore, override, or violate policy; DACSI asks whether untrusted document text can be misattributed as an authorized control signal when RAG prompt rendering collapses trusted and untrusted text into the same natural-language channel.
  We evaluate DACSI across six model settings, prompt-pressure levels, injection baselines, signal taxonomies, RAG-mediated pipelines, system-control probes, a source-authority attribution probe, and synthetic canary formats. We interpret the evidence by model regime rather than as six equal replications: DeepSeek V4 Pro and Qwen3.5-397B provide the cleanest positive lift, DeepSeek V4 Flash is a high-susceptibility setting, GPT-5.5 and Gemini 3.1 Pro Low are strong-boundary probes with selected residual risks, and GLM-4.7 is a saturated leakage boundary case. Across these regimes, DACSI warrants separate evaluation because it uses a command-free metadata/provenance/policy surface, follows a RAG-specific source-authority path, and responds to source/channel separation. The source-authority probe is behavioral attribution evidence, not proof of an internal mechanism.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09005v1</guid>
      <category>cs.CR</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jianguo Zhu</dc:creator>
    </item>
    <item>
      <title>Sustainability and Artificial Intelligence: Necessary, Challenging, and Promising Intersections</title>
      <link>https://arxiv.org/abs/2606.09006</link>
      <description>arXiv:2606.09006v1 Announce Type: new 
Abstract: Both digital economy and digital technology researchers increasingly recognize the need to better address the role that artificial intelligence (AI) plays in shaping the evolution of the environmental, social and governance aspects of development. It appears that sustainability and AI research converge on the features of wicked problems that are complex, interconnected and dynamic. Building on this convergence, this article aims to map out the necessary, challenging, and promising intersections by providing an overview of state-of-the-art research. Based on 541 bibliographic records collected from the Web of Science (WoS) database, the findings reveal the increasingly central body of work on green and sustainable science and technology in bridging various disciplines, main journals and key topics and concepts. The findings reveal how such interactions can be necessary, challenging, and promising. The article concludes with a few general arguments regarding how to diversify and expand the community of practice regarding AI for sustainable development, especially with respect to expected AI application areas and institutions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09006v1</guid>
      <category>cs.SI</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <category>cs.ET</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:DOI>10.1109/MSIEID52046.2020.00076</arxiv:DOI>
      <arxiv:journal_reference>2020 Management Science Informatization and Economic Innovation Development Conference (MSIEID), Guangzhou, China, 2020, pp. 360-363</arxiv:journal_reference>
      <dc:creator>Han-Teng Liao, Zijia Wang</dc:creator>
    </item>
    <item>
      <title>High-Order Regularity and a Fully Discrete Fourier Spectral Method for a Partially Dissipative Viscoelastic Timoshenko System with Memory</title>
      <link>https://arxiv.org/abs/2606.09007</link>
      <description>arXiv:2606.09007v1 Announce Type: new 
Abstract: This paper investigates a class of partially dissipative viscoelastic Timoshenko systems with memory, where dissipation is induced by a Volterra-type memory term acting only on the shear variable. The well-posedness of weak and strong solutions is established on finite time intervals, including existence, uniqueness, stability, and higher-order regularity under compatibility conditions consistent with mixed boundary conditions. For the numerical approximation, a Fourier spectral fully discrete scheme is constructed: sine and cosine basis expansions are used in space for unknowns satisfying Dirichlet and Neumann boundary conditions, respectively; in time, a central difference scheme is applied to the second-order derivatives, and the composite trapezoidal rule is used to approximate the memory convolution term. Based on a discrete energy method, the positivity of the constructed discrete energy is proved, and the error estimate for the fully discrete scheme with second-order convergence in time and $q$-th order in space is established for any $q \in \mathbb{N}$. Numerical experiments are given to verify the theoretical convergence rates and to compare the dynamic responses of the local and nonlocal models, demonstrating that the memory term effectively captures energy dissipation and vibration attenuation behavior in viscoelastic materials.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09007v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhenyang Zhong, Hui Liang</dc:creator>
    </item>
    <item>
      <title>Scaling by Diversified Experience for Vision-Language-Action Models</title>
      <link>https://arxiv.org/abs/2606.09009</link>
      <description>arXiv:2606.09009v1 Announce Type: new 
Abstract: Vision-Language-Action models face significant challenges in real-world deployment due to the entanglement of high-level reasoning with low-level control, and the instability of policy optimization. In this paper, we introduce SyVLA, a robust VLA model trained with diversified experiences. We propose an Intention Decoupling algorithm to isolate control-relevant features from reasoning contexts and a similar-sample guided RL pipeline to stabilize policy updates and mitigate distribution shift. Extensive experiments on real-world robotic tasks and multi-modal benchmarks demonstrate that SyVLA achieves superior task success rates and stronger out-of-distribution generalization compared to existing methods, while effectively preserving core vision-language capabilities. Code and datasets are released on the project page: https://sy-vla.github.io/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09009v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Leiyu Wang, Zhaofengnian Wang, Xueqi Li, Luoyi Fan, Cewu Lu, Nanyang Ye</dc:creator>
    </item>
    <item>
      <title>Understanding Quantization-Aware Training: Gradients at Quantized Weights Bias to the Low-Loss Basin</title>
      <link>https://arxiv.org/abs/2606.09012</link>
      <description>arXiv:2606.09012v1 Announce Type: new 
Abstract: Post-training quantization (PTQ) converts a trained full-precision model into low-bit weights without task-level retraining, while quantization-aware training (QAT) incorporates quantization into the training loop. Although PTQ is efficient and often accurate at moderate bitwidths, it can fail sharply at aggressive bitwidths; QAT is more expensive but can often recover the lost accuracy. We propose a unified geometric framework that explains both PTQ failure and QAT recovery. We model full-precision training as following a low-loss river inside a wider valley: a normal neighborhood of the river forms a nearly flat basin, while leaving this basin incurs a sharp loss increase. When the quantization grid is comparable to the basin width, local PTQ objectives, including rounding and Hessian-based second-order reconstruction, can select a high-loss deployed quantized point outside the basin even when nearby low-loss quantized points exist. In this regime, straight-through-estimator-based QAT has a useful bias: it evaluates gradients at the deployed quantized weights while updating latent full-precision weights, causing the gradient to sense the valley wall and acquire an inward component that steers subsequent quantized iterates back into the basin. We formalize this mechanism through a local landscape model, construct a geometric PTQ failure mode, and prove finite-time QAT recovery under local quantizer-compatibility assumptions. Experiments across vision and language models under multiple neural-network quantization schemes corroborate the predicted basin-crossing failure of PTQ and the corresponding recovery mechanism of QAT.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09012v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>math.OC</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hanyang Li, Jianhao Ma, Ying Cui</dc:creator>
    </item>
    <item>
      <title>Beyond Averages: Evaluating LLMs on Human Survey Replication at the Distributional Level</title>
      <link>https://arxiv.org/abs/2606.09013</link>
      <description>arXiv:2606.09013v1 Announce Type: new 
Abstract: LLMs are increasingly used to simulate human survey responses, but prior work has mainly evaluated replication using mean-level or aggregate agreement, offering limited insight into whether LLMs reproduce the variability of human behavior. We evaluate LLM-based survey replication at the distributional level using a non-public 2010 consumer choice experiment on Korean instant noodle purchases, a setting unlikely to overlap with model training data. We evaluate three response variables of differing statistical type: binary purchase incidence, categorical brand choice, and count purchase quantity. For each, we compare human and LLM responses at mean-level, pattern, and distributional alignment, and against reference baselines from the human data alone. LLMs reproduce condition-level patterns reasonably well but fail to capture distributional structure: for purchase quantity, no model beats a condition-insensitive baseline that simply matches the pooled human distribution. Because models that match human means well can still produce distributions further from humans than this baseline, mean-based evaluation alone can be actively misleading. Replication also varies with input configuration, with structured personas and multimodal inputs improving alignment while explicit reasoning prompting degrades it monotonically.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09013v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jeonghyeon Moon, Jiwon Kim, Yeheum Lah, Yoonju Han, Yuncheol Kang</dc:creator>
    </item>
    <item>
      <title>Deterministic versus Stochastic Optimization for Joint Path Planning and Dynamic Time Splitting in Multiple-UAV-Cached IoT Networks</title>
      <link>https://arxiv.org/abs/2606.09014</link>
      <description>arXiv:2606.09014v1 Announce Type: new 
Abstract: This paper examines wireless-powered Internet of Things (IoT) networks involving multiple unmanned aerial vehicles (UAVs) equipped with backscatter and caching technologies to relay and transmit signals. For data communication and energy harvesting (EH), the source transmits information and power to UAVs using the dynamic time splitting (DTS) method. UAVs use harvested energy for passive communication (backscatter) and for active communication (transmitting information) to the destination. The primary objective is to maximize the total throughput by jointly optimizing the DTS ratio, trajectory, and transmission power, leveraging the UAVs' caching capability. This optimization problem is challenging due to its non-convexity. Therefore, an efficient alternating algorithm using the block coordinate descent (BCD) method is proposed to optimize each variable given the fixed values of the other parameters. By applying the Karush-Kuhn-Tucker (KKT) conditions, we derive a closed-form expression for the optimal DTS ratio, significantly reducing computation time. The optimal values for the other two parameters are determined using the BCD. In order to thoroughly assess the effectiveness of various solutions for the original problem, this paper introduces an approach leveraging a genetic algorithm (GA). The GA in this context employs a one-point crossover method, value mutation, and rank-based selection based on fitness values. Numerical results show that the BCD and GA achieve at least 31% throughput improvement over the benchmarks, with reduced computational time. These findings demonstrate the performance gain and practical feasibility of our solutions in caching-enabled UAV-aided IoT networks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09014v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Trinh Van Chien, Dinh Thanh Tung, Waqas Khalid, Ngo Cong Dung, Banh Thi Quynh Mai, Symeon Chatzinotas</dc:creator>
    </item>
    <item>
      <title>MaterialClusterGS: Palette-Based Material Decomposition and Physically-Based Relighting with 2D Gaussian Splatting</title>
      <link>https://arxiv.org/abs/2606.09018</link>
      <description>arXiv:2606.09018v1 Announce Type: new 
Abstract: We present MaterialClusterGS, a palette-based material decomposition framework for 2D Gaussian Splatting that enables physically based relighting and material editing. Existing Gaussian inverse rendering methods typically assign independent BRDF parameters to individual primitives. While flexible, this local fitting strategy makes material recovery highly under-constrained: shadows, indirect illumination, geometric errors, and visibility residuals can be absorbed into thousands of slightly different local material estimates. Meanwhile, recent palette-based appearance methods operate solely in RGB space without modeling physical materials or illumination. To bridge this gap, we represent scene materials using a compact global palette of shared BRDF prototypes assigned via a continuous spatial material field. Without shared material structure, editing one region does not propagate consistently to others of the same material, making per-primitive decompositions impractical for editing. We jointly optimize the material field, palette prototypes, and environment lighting under a physically based rendering objective. The resulting framework recovers compact, spatially coherent attributes directly usable for material editing, relighting, and transfer.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09018v1</guid>
      <category>cs.GR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hao Zhang, Ang Li, Boyan Du, Junke Zhu, Fei Zhu, Meng Gai, Zhangjin Huang, Guoping Wang, Sheng Li</dc:creator>
    </item>
    <item>
      <title>TLDR: Compressing Audio Tokens for Efficient Autoregressive Text-to-Speech</title>
      <link>https://arxiv.org/abs/2606.09019</link>
      <description>arXiv:2606.09019v1 Announce Type: new 
Abstract: Codec-based autoregressive (AR) speech language models have achieved strong text-to-speech (TTS) quality by modeling speech as sequences of discrete audio tokens with large pretrained backbones. However, this token-level formulation creates a structural efficiency bottleneck: speech-token sequences are much longer than text sequences, requiring the AR backbone to perform causal computation at every token position and maintain a KV cache that grows with the sequence length. We introduce TLDR, a patch-based autoregressive framework that accelerates codec-based AR-TTS by shifting the causal modeling from token-level speech sequences to patch-level sequences. TLDR groups consecutive codec tokens into compact latent patches using a lightweight compressor, models the resulting shorter patch sequence with a frozen pretrained AR-TTS backbone adapted by LoRA, and reconstructs fine-grained speech tokens within each patch using a speaker-conditioned extractor. With a patch size of 4, TLDR achieves a 1.8x inference speedup over the baseline AR-TTS model and reduces global KV-cache memory by up to 75%. Experimental results indicate that patch-level global causal modeling can be a practical way to reduce the inference cost of pretrained codec-based AR-TTS systems without replacing the existing modules.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09019v1</guid>
      <category>cs.SD</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Yejin Lee, Junwon Moon, Hyoeun Kim, Hyunjin Choi, Heeseung Kim, Kyuhong Shim</dc:creator>
    </item>
    <item>
      <title>Personal Salience: Highlighting Is Social, but Individuality Lives in Selection</title>
      <link>https://arxiv.org/abs/2606.09024</link>
      <description>arXiv:2606.09024v1 Announce Type: new 
Abstract: Social highlighters let people mark passages that matter to them. We ask how much of an individual is recoverable from these naturalistic traces, using a co-readership identity control (the same document highlighted by many users) that holds document and topic fixed and asks whether a person's own history predicts their marks better than another reader's does. We separate generic salience (structure), crowd salience (what others marked), and personal salience (the individual residual). First, highlighting is social: which sentences you mark is predicted far better by the crowd than by structure or by a personal model, and even a well-estimated crowd, an information-privileged baseline that sees others' marks on the same document, beats a frontier LLM twin built from your other-document history; the within-document personal signal is at most a whisper (own-vs-other gap +0.017 by an embedding scorer, small but significant). Second, in sharp contrast, individuality lives in selection: asked which of the already-salient passages are yours, your own history is a strong, leakage-free predictor (gap +0.14). A topic decomposition shows this is largely stable thematic preference: it shrinks ~6-8x against a topically-matched peer, and a thin residual cannot be separated from finer topic. The non-obvious part is an asymmetry: under the same scorer the individual signal is ~6-8x weaker in salience than in selection. Methodologically, naive history-conditioning evaluations leak (the target's own marks enter the profile in ~42% of pairs, inflating personal scores by up to +0.15 AP) and small crowds overstate personalization; our results are leakage-free, use a dense crowd, and a model-matched control. Highlights carry a genuine individual signature, but a thin layer over a strong shared one, surfacing far more in which salient things a person selects than in what is salient.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09024v1</guid>
      <category>cs.IR</category>
      <category>cs.CL</category>
      <category>cs.HC</category>
      <category>cs.SI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kazuki Nakayashiki, Keisuke Watanabe</dc:creator>
    </item>
    <item>
      <title>Structural Grid Descriptors Predict Within-Task Solver Success on ARC-AGI</title>
      <link>https://arxiv.org/abs/2606.09026</link>
      <description>arXiv:2606.09026v1 Announce Type: new 
Abstract: We ask whether structural properties of intermediate grid states predict whether a symbolic ARC-AGI solver will succeed, framed as a test of conditional mutual information I(X;Y|task) &gt; 0. Across 44,800 runs spanning two architecturally distinct solvers (beam search and Stochastic DFS), 400 ARC tasks, 28 configurations per solver, and both training and evaluation splits, hand-crafted grid descriptors measured at 50% trajectory completion discriminate successful from failed runs within the same task (mean within-task best-feature AUC = 0.885, p &lt; 0.001 under within-task label permutation). Most predictive content lies along a single grid-complexity axis. The result generalizes across solver architectures: a feature selected on one solver predicts success on the other with AUC 0.747-0.762 in all four transfer directions (p &lt; 0.001, leakage controlled). On a pre-registered held-out set of 41 reliable tasks, the frozen feature n_components_final achieves AUC = 0.765 (95% CI [0.717, 0.810], p &lt; 0.001), robust under task-clustered bootstrap resampling and cross-solver task collapsing. The signal is not explained by solver capacity (configuration-residualized AUC = 0.927 and 0.896 for beam search and SDFS, p &lt; 0.001) and is only weakly coupled to score trajectories (R^2 approximately 0). Early stopping at 50% completion reduces beam-search compute by 33.6% while retaining 98.9% of solves; degenerate-trajectory detection reduces SDFS compute by 65.3% with no solve loss. Finally, on 229 of 400 evaluation tasks the DSL primitive library produces no valid transition from the input grid. This 0-step collapse is invariant to search budget and universally failed by beam search, indicating a DSL coverage limitation rather than a search-budget effect.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09026v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ayan Pendharkar</dc:creator>
    </item>
    <item>
      <title>SafeRun: Enabling Determinism in LLM Planning for Running</title>
      <link>https://arxiv.org/abs/2606.09027</link>
      <description>arXiv:2606.09027v1 Announce Type: new 
Abstract: Large Language Models enable flexible natural-language planning but remain unreliable in determinism-critical domains due to their probabilistic nature. This limitation is especially problematic in running planning, where violating safety rules can lead to safety risks. We propose SafeRun, a framework for deterministic LLM-based planning via a decoupled architecture. SafeRun separates soft interpretation by an LLM from hard constraint enforcement by a deterministic solver, ensuring strict safety constraints while preserving natural-language flexibility. To validate SafeRun, we build a comprehensive benchmark for running planning under realistic physiological and safety constraints. Experiments across five LLMs show that SafeRun achieves a 100% safety score (vs. 79.1% PE average and 97.6% CodeAct average) while maintaining competitive instruction-following scores. The SafeRun benchmark is publicly available at https://huggingface.co/datasets/zzp-seeker/SafeRun-RunPlanning-Benchmark .</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09027v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Meilin Chen, Zepeng Zhai, Jiaxuan Zhao, Yuan Lu</dc:creator>
    </item>
    <item>
      <title>ATM: Action-Consistency Transfer Matrix for Diagnosing and Improving Latent World Models</title>
      <link>https://arxiv.org/abs/2606.09028</link>
      <description>arXiv:2606.09028v1 Announce Type: new 
Abstract: Latent world models are increasingly used for control and goal-conditioned planning, yet assessing whether their learned representations are useful for planning usually requires slow, planner-coupled simulator evaluation with CEM or similar planners. Such evaluation is black-box and model-complexity-dependent: under the same protocol, different world models may require minutes to hours per checkpoint. In this work, we propose ATM, an Action-Consistency Transfer Matrix for diagnosing whether latent transitions preserve action semantics relevant to planning. ATM compares action information in real encoded transitions and model-predicted transitions through lightweight post-hoc probes, producing an interpretable matrix that reveals representation quality, transition-domain inconsistency, and failure modes without simulator rollout. It can also be collapsed into a simple screening score for within-task ranking across checkpoints, variants, and world models. When the true success gap is non-trivial, ATM achieves highly reliable pairwise ranking, while reducing minutes-to-hours CEM evaluation to seconds-level transition analysis, yielding more than 100x speedup in our setup. We further introduce AITS, showing that action-identifiability is not only diagnostic but also a useful training signal for improving downstream planning without changing the planner.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09028v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiaheng Chen</dc:creator>
    </item>
    <item>
      <title>Frequency Decoupled Framework for Screen Content Image Super-Resolution</title>
      <link>https://arxiv.org/abs/2606.09029</link>
      <description>arXiv:2606.09029v1 Announce Type: new 
Abstract: Methods based on implicit neural representations have demonstrated superior performance in Screen Content Image Super-Resolution (SCISR). However, they overlook the inherent frequency characteristics, leading to suboptimal performance. We propose a frequency decoupled framework (FDF) that rethinks SCISR from a phasor perspective by capturing structured energy in amplitude and relational continuity in phase, and jointly exploiting them with bespoke implicit representations to faithfully recover the regular textures and global configuration of Screen Content Images (SCI).
  An Amplitude-Phase Factorization Network (APFN) first separates images into amplitude and phase streams, where an Amplitude Clustering Module (ACM) organizes sparse yet high-energy amplitude responses into representative prototypes for periodic pattern extraction, while Phase Consistency Self-Attention (PCSA) progressively reinforces configuration through continuous consistency propagation.
  An Oscillation-Anharmonic Implicit Fitting Network (OAIF-Net) then integrates periodic and coherent implicit representations for efficient exploitation of the periodic patterns and coherent context embedded in SCI.
  Experimental results show FDF achieves state-of-the-art SCISR performance at multiple scales across four public SCI datasets. Ablation experiments further demonstrate the effectiveness of each component in extracting and exploiting periodic patterns and coherent context.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09029v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Xufei Wang, Qicheng Zhang, Qi Wu, Ziyang Gu, Shizhuang Weng</dc:creator>
    </item>
    <item>
      <title>TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs</title>
      <link>https://arxiv.org/abs/2606.09030</link>
      <description>arXiv:2606.09030v1 Announce Type: new 
Abstract: Clinical early warning systems built on electronic health records, in which clinical observations are recorded as irregularly sampled medical time series (ISMTS), must deliver both calibrated risk scores for patient triage and interpretable rationales that clinicians can verify. Large Language Models (LLMs) have been explored for this task, yet they collapse graded clinical risk into overconfident binary predictions. This risk polarization undermines both calibration and cross-patient comparability. To address this, we propose TRIAGE, a framework that trains an LLM to generate dialectical reasoning over competing clinical outcomes by eliciting outcome-specific rationales. This dialectical formulation mitigates risk polarization, enabling a single LLM to yield continuous risk scores grounded in explicit clinical reasoning. Evaluated on three ISMTS benchmarks, TRIAGE achieves an average AUPRC improvement of 3.3% and reduces calibration error by 81% compared to the competitive baselines. An LLM-as-a-judge assessment further shows that our rationales surpass post-hoc explanations from the baseline by 20% in clinical reasoning quality. The source code is available at https://github.com/HyeongWon-Jang/TRIAGE .</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09030v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hyeongwon Jang, Gyouk Chu, Changhun Kim, Joonhyung Park, Hangyul Yoon, Eunho Yang</dc:creator>
    </item>
    <item>
      <title>Bridging the Agent-World Gap: Text World Models for LLM-based Agents</title>
      <link>https://arxiv.org/abs/2606.09032</link>
      <description>arXiv:2606.09032v1 Announce Type: new 
Abstract: Large language model (LLM)-based agents are increasingly used in interactive textual environments, from web navigation and code editing to tool use and long-horizon dialogue. Yet many remain largely reactive, mapping observations to actions without an explicit model of how these environments are structured and evolve. This motivates text world models (TWMs): transition models over textual states that, given a state and a candidate action, predict the resulting webpage, terminal output, API response, or user reply, thereby supporting planning, efficient learning, and principled evaluation. We systematically review text world models for LLM-based agents, organized around a formal framework and the agent lifecycle: (1) Foundations, defining text world models and characterizing them by state representation and grounding domain; (2) Construction, taxonomizing LLM-as-WM and code-as-WM paradigms and reviewing methods for building them; (3) Application, examining how world models support agents at training time through experience synthesis and at inference time through planning, verification, and adaptation; and (4) Evaluation, covering both evaluation of the world model itself and its use as an evaluation environment for agents. We aim to consolidate this rapidly developing area, clarify its design space, and highlight open challenges for future research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09032v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yixia Li, Hongru Wang, Peng Lai, Zhiwen Ruan, He Zhu, Youxin Zhu, Ganlong Zhao, Minda Hu, Yun Chen, Sibei Yang, Peng Li, Jeff Z. Pan, Jia Pan, Guanhua Chen, Yang Liu, Guanbin Li</dc:creator>
    </item>
    <item>
      <title>CRANE: Knowledge Editing for Reasoning MLLMs</title>
      <link>https://arxiv.org/abs/2606.09033</link>
      <description>arXiv:2606.09033v1 Announce Type: new 
Abstract: The emergence of reasoning multimodal large language models (MLLMs), which generate explicit chain-of-thought (CoT) reasoning before producing answers, has introduced a new challenge for knowledge editing: methods that appear successful under traditional metrics (teacher-forcing accuracy up to 100%) can fail severely when the model's reasoning process is examined (Grounded Success as low as 0%). We identify three failure modes: (1) Structural Collapse, where weight-modifying methods destroy the CoT format; (2) Cognitive Dissonance, where the model's reasoning chain actively rejects the injected edit fact based on visual evidence; and (3) Shallow Internalization, where methods succeed on exact queries but fail on rephrase or multi-hop variants. On reasoning MLLMs, these modes interact: methods that generalize (FT, LoRA) trigger format collapse, while methods without deep modification cannot generalize. To expose these failures, we propose a CoT-aware evaluation protocol and construct ReasonEdit-Bench, with conflict stratification, multi-level probes, and multi-hop portability tests.
  We propose CRANE, a retrieval-augmented framework that requires no per-edit parameter modification. CRANE combines a modality-aware dual-library retrieval system with a two-phase training strategy: Supervised Fine-Tuning (SFT) for structural initialization, followed by GRPO with a Cognitive Routing Reward that trains the model to arbitrate between visual priors and injected edit facts. On ReasonEdit-Bench, CRANE achieves 96.9% Grounded Success on conflict scenarios and 96.9% intermediate entity usage in multi-hop chains, with 97.6% text-locality and 68.1% image-locality Edit Independence. On the out-of-distribution MMEVOKE benchmark, CRANE reaches 87.0% under gold retrieval.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09033v1</guid>
      <category>cs.CV</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Han Huang, Hao Wang, Mengqi Zhang, Shu Wu, Qiang Liu, Liang Wang</dc:creator>
    </item>
    <item>
      <title>Leveraging NeRF-Rendered Images for 3D Gaussian Splatting</title>
      <link>https://arxiv.org/abs/2606.09034</link>
      <description>arXiv:2606.09034v1 Announce Type: new 
Abstract: Neural radiance field (NeRF) and 3D Gaussian splatting (3DGS) are two mainstream approaches for novel view synthesis. They often show complementary performance, i.e., 3DGS demonstrating faster rendering speed and NeRF demonstrating higher rendering quality. Motivated by this, we propose leveraging NeRF-rendered images for 3DGS. Specifically, we target street scenes and utilize a pre-trained street-specific NeRF method to produce training images for a target 3DGS method. In our 3DGS training, NeRF-rendered images are used to remove transient objects in street-level input views and to generate bird's-eye views as additional views, inheriting the higher-quality rendering of NeRF into 3DGS. We further incorporate a diffusion-based image enhancement to improve the image quality of the additional views. Experimental results on one synthetic and two real datasets demonstrate that our proposed method improves street-scene rendering while preserving the speed of 3DGS and the quality of NeRF.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09034v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mizuki Morikawa, Yuta Shimizu, Chunyu Li, Yusuke Monno, Masatoshi Okutomi</dc:creator>
    </item>
    <item>
      <title>A Multi-Agent System for IPMSM Design Optimization via an FEA-AI Hybrid Approach</title>
      <link>https://arxiv.org/abs/2606.09037</link>
      <description>arXiv:2606.09037v1 Announce Type: new 
Abstract: Interior permanent magnet synchronous motor (IPMSM) design requires balancing conflicting objectives and multi-physics constraints, while modern optimization workflows face three bottlenecks: manual problem setup, high finite element analysis (FEA) cost, and unreliable surrogate-based search in sparse or out-of-distribution regions. To address these limitations, we propose an end-to-end automated IPMSM design optimization framework that integrates retrieval-augmented generation (RAG) for structured problem definition with an uncertainty-aware FEA-AI hybrid optimization pipeline. A Design agent, connected to a motor textbook through RAG, provides domain-knowledge-based options and engineering tips, and compiles an optimization card and a design-of-experiments plan for AI-model training. A Training agent automates electromagnetic FEA, records geometry-validation and solver-failure logs, analyzes failed geometries using ANOVA-based data analysis and LLM reasoning, and invokes a Design Sampling agent to redefine the design space and generate additional samples. An Optimization agent performs GA-based search with uncertainty-driven switching: low-uncertainty candidates are evaluated by AI-surrogate inference, whereas high-uncertainty and reliability-critical Pareto-front or top-K candidates are corrected by high-fidelity FEA and reused for iterative retraining. The framework converts manual, experience-dependent configuration into a reproducible workflow that balances computational cost and prediction reliability. Experimental results under a matched high-fidelity FEA budget show that the proposed hybrid approach achieves better objective performance while maintaining low and further reducible predictive uncertainty, outperforming FEA-only search, which is limited by early budget exhaustion, and AI-only search, which converges to a low-confidence optimum.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09037v1</guid>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jinseong Han, Sunwoong Yang, Namwoo Kang</dc:creator>
    </item>
    <item>
      <title>Personalization Meets Safety: Mechanisms, Risks, and Mitigations in Personalized LLMs</title>
      <link>https://arxiv.org/abs/2606.09038</link>
      <description>arXiv:2606.09038v1 Announce Type: new 
Abstract: Large Language Models (LLMs) have enabled increasingly personalized interactions by adapting to users' preferences, contexts, and long-term histories. However, the mechanisms that enable personalization also expand the safety landscape in ways not systematically addressed by existing literature. Existing reviews typically focus either on personalization or safety, leaving their intersection largely unexplored. We present the first comprehensive, safety-aware review of personalized LLMs. We organize personalization along three dimensions (user representation, personalization paradigm, and evaluation) and introduce a unified taxonomy of safety risks. At the representation level, we analyze risks arising from diverse user representations. Across mainstream personalization paradigms, we delineate vulnerabilities inherent to prompting, retrieval augmentation, parameter fine-tuning, reinforcement learning, Mixture-of-Experts (MoE), pruning, agent frameworks, and multimodal personalization, and synthesize mitigation strategies across the model lifecycle. Beyond these fine-grained risks, we characterize paradigm-agnostic safety risks arising from personalized adaptation. We further summarize personalized datasets and evaluation methodologies. Through a case study of OpenClaw, we analyze deployment trends in personalized agent ecosystems. Our analysis reveals three structural inadequacies in existing research: safety is evaluated as user-invariant rather than relational, personalization techniques are analyzed in isolation rather than in composition, and evaluation frameworks cannot capture emergent long-term risks. By jointly examining personalized representations, personalization paradigms, safety risks, defenses, and evaluation methods, we provide a unified framework for developing safe personalized LLMs and highlight key directions for future research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09038v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Yanyan Luo, Xue Han, Ruiqiao Bai, Xin Huang, Yitong Wang, Qian Hu, Qing Wang, Chunxu Zhao, Jie Liu, Cong Geng, Lehao Xing, Pengwei Hu, Junlan Feng</dc:creator>
    </item>
    <item>
      <title>Agent Economics: An Entropy-Controlled Pluralistic Alignment Framework for Preventing Artificial Hivemind in Autonomous Agents</title>
      <link>https://arxiv.org/abs/2606.09039</link>
      <description>arXiv:2606.09039v1 Announce Type: new 
Abstract: This study proposes the Behavioral Protocol Framework (BPF), an entropy-controlled pluralistic alignment framework designed to address two critical challenges in autonomous agent economies: the hivemind effect arising from excessive strategic convergence among agents and the lack of transparency in autonomous decision-making processes. The proposed BPF consists of three core modules: Mentalizing-based Social Intelligence (MbSI) grounded in Theory of Mind (ToM), Pluralistic Alignment (PA), and a Verifiable Execution Kernel (VEK). These modules are organically integrated within a closed-loop architecture that governs the entire lifecycle of agent behavior, from decision-making and execution to verification and feedback. To evaluate the proposed framework, a simulation environment implemented in Python and a Streamlit-based user interface will be developed. Through empirical experimentation, the study aims to examine whether the entropy-control mechanism of the PA module can effectively preserve strategic diversity among agents and mitigate collective convergence, while the VEK module provides a comprehensive and transparent audit trail of the decision-making process. The anticipated results are expected to demonstrate that the proposed framework can simultaneously enhance the stability, efficiency, and trustworthiness of autonomous agent economies. Consequently, this research offers a practical approach for developing robust, transparent, and accountable agent-native economic systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09039v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Cheonsu Jeong</dc:creator>
    </item>
    <item>
      <title>Culturally-Aware AI for Cross-Boundary Community Learning: Undergraduate Innovation at the Intersection of Computation and Design</title>
      <link>https://arxiv.org/abs/2606.09041</link>
      <description>arXiv:2606.09041v1 Announce Type: new 
Abstract: Research on artificial intelligence in education (AIED) is rapidly expanding, yet technical progress often lacks human-centered grounding and adequate attention to cultural context. Community-Based Learning, a pedagogy rooted in social work, remains underrepresented in AIED research, particularly within Asia-Pacific contexts. This paper reports on cross-boundary Community-Based Learning where undergraduate students develop AI-enabled solutions for cultural heritage preservation and sustainable development. We examine how community-engaged computing operationalizes human-centered AIED across three dimensions: education, technology, and culture. We contribute a collaborative framework for culturally-aware AIED that fosters multi-stakeholder collaboration while widening participation by dissolving disciplinary silos between social work and computational science.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09041v1</guid>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiaojiao Zhao, Weisheng Zhang, Jiawen Cai, Haibin Gao, Luyao Zhang</dc:creator>
    </item>
    <item>
      <title>Seamless Contraction-Control Framework for Unplanned Grid-Connected/Stand-Alone Transitions of Grid-Forming Inverters</title>
      <link>https://arxiv.org/abs/2606.09042</link>
      <description>arXiv:2606.09042v1 Announce Type: new 
Abstract: Unplanned grid-connected (GC)/stand-alone (SA) transitions commonly occur in AC microgrids during protection trips, manual breaker operation, or low-bandwidth supervisory communication. Under such unplanned transitions, a grid-forming inverter must support the local-load voltage in stand-alone operation and regulate the desired power/current injection in grid-connected operation. Existing P--Q droop-based seamless-transfer methods often rely on planned transition commands, supervisory islanding detection, or a pre-synchronization interval, which may prevent timely voltage/current support during unplanned bidirectional transitions. To address this problem, this paper proposes a seamless contraction-control (SCC) framework for target dynamics. Using the SCC, contraction-based grid-connected current-control and stand-alone voltage-control laws are proposed. With the new control laws, the inverter achieves transient stability and converges to the target trajectory with a prescribed convergence rate. Furthermore, a breaker-status observer is proposed to infer the grid-connected/stand-alone mode from voltage measurements on both sides of the breaker, eliminating the need for a dedicated pre-synchronization interval or supervisory islanding detection process and enabling timely voltage/current support during unplanned transitions. Experimental results validate that the proposed method achieves stand-alone voltage support, stable grid-connected current injection under symmetrical/unsymmetrical grid-voltage sag and phase-jump disturbances, and unplanned bidirectional transitions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09042v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Qianxi Tang, Li Peng</dc:creator>
    </item>
    <item>
      <title>DynaCF: Mitigating Shortcut Learning in Reward Models via Dynamic Counterfactual Sensitivity</title>
      <link>https://arxiv.org/abs/2606.09043</link>
      <description>arXiv:2606.09043v1 Announce Type: new 
Abstract: Reward models trained from pairwise preferences often exploit superficial shortcut cues rather than learning true response quality. We propose DynaCF, a dynamic reweighting framework for mitigating shortcut learning in reward model training. Unlike static shortcut heuristics, DynaCF measures shortcut sensitivity online during optimization by applying semantics-preserving counterfactual perturbations and tracking the resulting margin shifts and preference flips under the current model. Samples with higher shortcut sensitivity are dynamically downweighted in the Bradley-Terry objective, encouraging the model to rely less on superficial patterns and more on task-relevant preference signals. Extensive experiments show that DynaCF consistently improves robustness in preference modeling.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09043v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Fengyuan Liu, Yongliang Miao, Zirui He, Yanguang Liu, Fei Sun, Mengnan Du</dc:creator>
    </item>
    <item>
      <title>Decoy-Calibrated Failure Audits for Language Models</title>
      <link>https://arxiv.org/abs/2606.09046</link>
      <description>arXiv:2606.09046v1 Announce Type: new 
Abstract: Useful audits reveal not only how often a model fails, but also where its failures concentrate. An auditor may test many candidate explanations: long inputs, indirect questions, distracting evidence, or combinations of these factors. The risk is selection. The largest observed effect may reflect a real failure mode, or it may simply be the best result among many tried. We introduce Janus, a procedure for deciding when a proposed error explanation is credible enough to report. The goal is not to generate new explanations, but to decide which ones hold up. The auditor starts with a fixed model, a labeled evaluation set, and a frozen list of candidate explanations, which we call descriptors. Janus scores each descriptor by its error-rate lift, then compares real descriptors with fake ones that have the same frequencies but are randomly assigned to examples. A descriptor is confirmed only if it beats this decoy floor on the data used for discovery and then repeats on separate held-out data. In a controlled audit of multi-table lookup tasks, Janus identifies the planted failure, confirming long-chain descriptors and their interactions. The LLM often stops partway through the lookup chain instead of reaching the final answer. On two public benchmarks, MuSiQue and LongBench v2, the SliceLine baseline flags plausible high-error pockets, but Janus confirms none of them. Ablations show why both safeguards matter. On LongBench v2, an uncalibrated fixed threshold reports 20 descriptors, the decoy floor leaves one, and the holdout check rejects the last one after its lift shrinks from 0.36 to 0.05. The resulting principle separates proposing explanations from reporting them. Candidates may come from any source, but only those that beat decoys and replicate on fresh data become audit findings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09046v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vyzantinos Repantis, Ameya Gawde, Harshvardhan Singh</dc:creator>
    </item>
    <item>
      <title>Families of Control-Cost-Parametrized Inverse-Optimal Universal Stabilizers</title>
      <link>https://arxiv.org/abs/2606.09047</link>
      <description>arXiv:2606.09047v1 Announce Type: new 
Abstract: A classical universal stabilization formula offers the practitioner no design freedom: it is a single, parameter-free object. We introduce a cost-parametrized family of stabilizing feedback laws, where (1) the user chooses a function that serves as the running cost on control in an inverse-optimal cost functional, and (2) obtains, through a formula, a nonlinear "expander" of a pre-existing universal controller, which solves an infinite-horizon optimal control problem with a meaningful cost on the state. The cost-to-expander formula is a three-step construction, involving, inter alia, cost differentiation and function inversion; overall, it is a nonlinear infinite-dimensional operator. The cost-to-expander operator is proven Lipschitz, which enables uniform neural operator approximation of the entire family and supports both offline performance exploration and online adaptation. Semiglobal practical asymptotic stability and second-order suboptimality bounds are established under the approximation. The operator learning and its use in semiglobal stabilization are illustrated numerically. We call the result 'half-direct-optimal' because the paper's design is less than a general 'direct optimal' (HJB-inducing) control, but more than the fully inverse optimal, since the user performs minimization for an arbitrary given cost on control. The dual to the half-direct problem we solve is the problem in which the cost on the state is arbitrary and given. This dual problem is easier and outside of the scope of the paper.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09047v1</guid>
      <category>eess.SY</category>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Miroslav Krstic, Luke Bhan</dc:creator>
    </item>
    <item>
      <title>Beyond Convolution: Advancing Hypergraph Neural Networks with Hypergraph U-Nets</title>
      <link>https://arxiv.org/abs/2606.09051</link>
      <description>arXiv:2606.09051v1 Announce Type: new 
Abstract: Convolutions have successfully transitioned from image processing to the complex realm of non-Euclidean higher-order domains, particularly hypergraphs. Despite this success, the popular U-Net architecture remains largely unexplored for hypergraph data due to the lack of well-defined pooling and unpooling operations. This work pioneers the study of U-Net architectures for hypergraph data, addressing the critical challenge of designing effective pooling and unpooling operations that retain maximal structural information from the input hypergraph. Motivated by hierarchical clustering, we propose to construct the pooling and unpooling operators all at once by cutting the clustering dendrogram at different granularities, named the Parallel Hierarchical Pooling (PHPool) and Unpooling (PHUnpool) operators. Unlike existing pooling methods that risk local structural damage through a sequential learning procedure, our PHPool operators are designed in a global and parallel manner to ensure fidelity to the original hypergraph structure with efficient computation, while the PHUnpool operators are tailored to perform inverse operations of the PHPools for hypergraph reconstruction. We validate our model through hypergraph reconstruction simulation, hypergraph classification, and node-level anomaly detection, where it demonstrates superior performance over existing state-of-the-art graph and hypergraph deep learning methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09051v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Fuli Wang, Wei Qian, Daniel L. Lau, Gonzalo R. Arce</dc:creator>
    </item>
    <item>
      <title>INFUSER: Influence-Guided Self-Evolution Improves Reasoning</title>
      <link>https://arxiv.org/abs/2606.09052</link>
      <description>arXiv:2606.09052v1 Announce Type: new 
Abstract: Self-evolution offers a scalable path to stronger reasoning: a pretrained language model improves itself with only minimal external supervision. Yet existing methods either depend on extensively curated or teacher-generated training data, or, when the generator runs unsupervised, reward it by a difficulty heuristic that need not improve the solver. We introduce INFUSER, an iterative co-training framework with two co-evolving roles: a Generator that drafts questions and reference golden answers from a pool of unstructured, automatically collected documents, and a Solver that improves by training on them. The solver is trained with standard correctness rewards against the generator-provided answers, while the generator is rewarded by an optimizer-aware influence score that measures whether each proposed question would actually improve the solver on the target distribution. Because this continuous, noisy influence score is poorly served by standard GRPO, we propose DuGRPO, a dual-normalized variant of GRPO, for generator training. Together, these turn the document pool into an adaptive curriculum that favors questions useful to the current solver, not just hard ones. On Qwen3-8B-Base, INFUSER outperforms strong self-evolution baselines with over 20% relative improvement on Olympiad and SuperGPQA benchmarks, and an 8B INFUSER co-evolving generator outperforms a frozen 32B thinking generator on math and coding. Ablations confirm each design choice is necessary, and two extensions, applying INFUSER to an instruction-finetuned anchor and augmenting it with rule-verifiable RLVR data, further demonstrate the flexibility and generalizability of the framework. Code is available at https://github.com/FFishy-git/INFUSER.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09052v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.GT</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Siyu Chen, Miao Lu, Beining Wu, Heejune Sheen, Fengzhuo Zhang, Shuangning Li, Zhiyuan Li, Jose Blanchet, Tianhao Wang, Zhuoran Yang</dc:creator>
    </item>
    <item>
      <title>MilliVid: Hierarchical Latents for Long-Range Consistency in Video Generation</title>
      <link>https://arxiv.org/abs/2606.09056</link>
      <description>arXiv:2606.09056v1 Announce Type: new 
Abstract: Video generative models have become increasingly powerful, but long-range consistency remains challenging to achieve because even a few dozen frames require impractically long transformer sequence lengths. We show that this issue can be mitigated by generating video using coarse-to-fine rollout within a multi-scale token space. Our approach is simple: first, we pre-train an autoencoder that compresses each frame into a hierarchy of tokens, with levels ranging from the typical latent resolution to only a handful of tokens per frame. The coarsest levels capture the most consequential information, such as scene layout and semantics, while finer levels add high-frequency appearance and texture. Then, we train a video diffusion model to generate these tokens using coarse-to-fine rollout. By carefully controlling the level of detail at which frames are generated and used as context during each rollout step, we are able to preserve long-range consistency in geometry and object permanence while spending less compute on the long-range consistency of less perceptually relevant details. We validate this approach using a custom dataset of long Minecraft videos, where it produces substantially more consistent rollouts compared to existing baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09056v1</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ishaan Preetam Chandratreya, David Charatan, Basile Van Hoorick, Sergey Zakharov, Vitor Guizilini, Phillip Isola, Vincent Sitzmann</dc:creator>
    </item>
    <item>
      <title>Stage-1 Controls the Entropy Regime, Not the Outcome</title>
      <link>https://arxiv.org/abs/2606.09059</link>
      <description>arXiv:2606.09059v1 Announce Type: new 
Abstract: Two-stage post-training -- a Stage-1 warm-start (supervised fine-tuning, SFT, or on-policy distillation, OPD) followed by Stage-2 reinforcement learning (RL) -- is increasingly used for vision-language models (VLMs). We ask what Stage-1 actually controls in a small-data study using Qwen2.5-VL-7B with a same-modality 72B VLM teacher for OPD. First, the three warm-starts reach a narrow $53$--$54\%$ band on Geometry3K internal validation, consistent with the narrow range reported by recent specialized methods; this setup provides little evidence that Stage-1 changes the in-domain endpoint. Second, a matched-recipe, early-stopped SFT improves out-of-domain MathVista by $+2.1$ points, reversing the $-9.5$-point drop of an over-trained variant. The clearest difference is the \emph{entropy regime}: OPD enters RL with substantially higher policy entropy than either SFT initialization, and the separation remains visible through the available trajectories. At the in-domain initialization, OPD also has higher answer diversity and pass@16 ($+2.0$ to $+5.2$ points over SFT), although problem-level bootstrap intervals show that the smaller contrast is uncertain. The advantage is absent after RL (endpoint pass@16 values within $1.1$ points) and on MathVista (six models within $1.2$ points). Our contribution is therefore a bounded empirical characterization: Stage-1 is strongly associated with the entropy regime in this setup, but the downstream payoff is small, localized, and not evidence that OPD is a better RL warm-start.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09059v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jianxiong Shen</dc:creator>
    </item>
    <item>
      <title>ATTAIN: Automated Exploit Failure Analysis through Trace-Driven Diff Analysis</title>
      <link>https://arxiv.org/abs/2606.09060</link>
      <description>arXiv:2606.09060v1 Announce Type: new 
Abstract: Exploits are widely used to check whether library vulnerabilities appear in different versions and to mark affected version ranges. Exploit-based checks sometimes fail because exploits stop running on many versions after API or environment changes. Commit-based methods, such as SZZ-style analysis, sometimes miss the correct vulnerability-introducing commits and spread labels incorrectly along long version chains. These problems leave many affected versions unlabeled or wrongly labeled and make manual exploit failure analysis expensive and impractical at scale.
  We present ATTAIN, a trace-driven diff analysis framework with three modules to assess vulnerability presence across evolving library versions. The modules are trace construction, diff exploration, and affected-version judgment. The trace construction module executes an exploit across historical library versions and compares their behaviors to capture cross-version execution divergences. Using these divergences, the diff exploration module guides an LLM through a finite-state tool loop to autonomously search over version changes and collect vulnerability-relevant diff hunks. The affected-version judgment module reasons over the collected evidence to determine whether the vulnerability exists in each version and outputs the affected version range.
  We evaluate ATTAIN on an extensive dataset comprising 224 CVEs and 25,943 library versions across 128 libraries. ATTAIN achieves an F1-score of 93.24%, outperforming the commit-based methods V-SZZ and LLM4SZZ by 116.28% and 33.30%, respectively. ATTAIN uses short tool-guided prompts and a fixed number of iterations, keeping token usage low. It matches or surpasses existing methods on frequent CWE types, including cases where exploit runs fail for non-vulnerability reasons or commit messages do not clearly delimit affected versions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09060v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Xinwei Mao, Zirui Chen, Xing Hu, Xin Xia</dc:creator>
    </item>
    <item>
      <title>Fairness-Aware and Latency-Controllable Scheduling for Chunked-Prefill LLM Serving</title>
      <link>https://arxiv.org/abs/2606.09061</link>
      <description>arXiv:2606.09061v1 Announce Type: new 
Abstract: As large language models (LLMs) are increasingly deployed with highly heterogeneous workloads, chunked-prefill execution has emerged as a mainstream serving architecture. Balancing scheduling fairness and latency stability in such environments is critical; otherwise, severe head-of-line blocking and request starvation will degrade user experience. However, existing systems rely on rigid First-Come, First-Served (FCFS) policies and static token budgets, leading to fairness degradation and unpredictable latency jitter. To address these issues, we propose a fairness-aware and latency-controllable scheduling framework for chunked-prefill LLM engines. Specifically, we design a lightweight aging-based scheduling policy that dynamically calculates priorities using accumulated waiting time and remaining prefill work. Furthermore, we develop Latency-Prediction-Based Request Scheduling (LPRS) and Active Prefill Control (APC) to replace static budgets with target-time constraints and actively regulate prefill concurrency. We evaluated our scheduling framework on NVIDIA GPUs and Ascend accelerators using real-world workloads. Results show that the aging policy reduces mean end-to-end latency by over 10\% compared to FCFS. Moreover, LPRS and APC significantly reduce P99 tail latency and suppress prefill fragmentation, confirming that the structural prefill control and the temporal latency constraints are fundamentally complementary. All code has been released on GitHub.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09061v1</guid>
      <category>cs.DC</category>
      <category>cs.PF</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haoxin Liu, Jiayi Wang, Yueshen Xu, Rui Li</dc:creator>
    </item>
    <item>
      <title>Security-First Approach to API Pipeline Development with Zero-Trust Architecture</title>
      <link>https://arxiv.org/abs/2606.09062</link>
      <description>arXiv:2606.09062v1 Announce Type: new 
Abstract: Modern enterprises face an accelerating onslaught of API-targeted threats amid a rapidly expanding attack surface. Disclosed software vulnerabilities continue to reach record volumes, with 28,818 CVEs disclosed in 2023 (a 38% jump from 2022) and 40,009 CVEs in 2024 (another 38% increase), while the average time-to-exploit (TTE) of new flaws shrank to mere days (approximately 5 days in 2023, down from 32 days in 2021). At the same time, API usage dominates web traffic and has become a primary vector for breaches: 99% of organizations experienced API security incidents in the last year, with 22% suffering actual data breaches via APIs (based on industry vendor research). This paper proposes a comprehensive "security-first" framework for API pipeline development, leveraging Zero-Trust Architecture principles within DevSecOps practices to counter these trends. We introduce a five-pillar approach encompassing Governance &amp; Planning, Secure Design, Continuous Testing, Pipeline Controls, and Runtime Protection, aligned with industry standards (OWASP API Security Top 10 2023, NIST Secure Software Development Framework) and recent cybersecurity advisories. The results show significant improvements in vulnerability mitigation and breach prevention (e.g., 30% reduction in security incidents and 40% fewer post-release vulnerabilities in representative case studies), highlighting the positive impact of proactive security integration. The paper concludes with a discussion on implementation challenges, the evolving threat landscape, and recommendations for organizations to adopt a security-first pipeline with Zero-Trust to fortify API development against current and future threats.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09062v1</guid>
      <category>cs.CR</category>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mahima Agarwal, Keshav Ranjan</dc:creator>
    </item>
    <item>
      <title>See More, Think Deeper: Query-Expanded Visual Evidence and Answer-Clue Guided Reflection for Long Video Understanding</title>
      <link>https://arxiv.org/abs/2606.09064</link>
      <description>arXiv:2606.09064v1 Announce Type: new 
Abstract: Recent advances in Video Large Language Models (Video-LLMs) have improved performance on long-video understanding tasks. However, existing methods still face two key limitations: evidence acquisition often relies on a single search intent, and answer generation lacks an effective visual feedback mechanism. To address these limitations, we propose \textbf{CoVER}, a Comprehensive Visual Evidence and Reflection framework for long-video understanding. CoVER enables Video-LLMs to \textbf{See More} by dynamically gathering query-expanded visual evidence, and \textbf{Think Deeper} by verifying draft answers with effective answer-specific visual feedback. Together, these mechanisms shift long-video understanding from answer-centric generation to evidence-centric and visually verifiable reasoning. Experimental results show that CoVER-7B substantially outperforms models with the same parameter scale and even surpasses state-of-the-art closed-source models on certain metrics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09064v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shuning Wang, Zhiheng Wu, YiNuo Lu, Naiming Liu, Chen Jia, Bowen Liu, Shuo Nie, Weijie Zhu, Yumeng Zhang</dc:creator>
    </item>
    <item>
      <title>OnlyDense: Reduced-Order Modeling for Lagrangian simulation</title>
      <link>https://arxiv.org/abs/2606.09065</link>
      <description>arXiv:2606.09065v1 Announce Type: new 
Abstract: In science and engineering, Lagrangian simulation methods such as Smoothed Particle Hydrodynamics (SPH) or the Material Point Method (MPM) are often employed to study the behavior of dynamic systems. However, these methods can be prohibitively computationally expensive, particularly when simulating multi-scale spatial or temporal phenomena, e.g., void growth and coalescence within macro-scale geometries, structural failure of spacecraft components resulting from hypervelocity impact of space debris particles, etc. In contrast to graph-based methods, where the state of the system is understood as a discrete set of particles, we propose a learning framework for scalable representation and dynamics modeling of massive particle systems by treating the system state as a function and its evolution as a trajectory in Hilbert space. Rather than representing the state as a discrete set of particles or embedding it in a nonlinear latent manifold, we approximate the state space with a linear subspace spanned by learned neural basis functions. This parameterization enables direct projection to obtain latent coefficients and explicit access to the basis functions, avoiding optimization over a nonlinear latent space. The resulting representation admits a natural interpretation: latent variables correspond to coefficients in Hilbert space, and basis functions correspond to spatial modes, analogous to Proper Orthogonal Decomposition. The framework thus unifies classical projection-based reduced-order modeling with modern deep learning, while remaining invariant to the number of discretization points. Experiments on large-scale SPH simulations with over one million particles, including dynamic events with extreme deformation and fragmentation, demonstrate that the proposed method accurately reconstructs and predicts dynamics, achieving an R$^2$ score above $0.99$ with as few as $32$ basis functions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09065v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tu Do, Shannon Ryan, Santu Rana</dc:creator>
    </item>
    <item>
      <title>Emergent Misalignment Can Be Induced by Sycophancy and Reversed via Alignment Gating</title>
      <link>https://arxiv.org/abs/2606.09068</link>
      <description>arXiv:2606.09068v1 Announce Type: new 
Abstract: Prior work has shown that fine-tuning large language models on malicious or incorrect outputs in narrow domains can induce broad misalignment and harmful behavior, a phenomenon known as emergent misalignment (EM). However, efficient methods for reversing such misalignment remain limited. In this work, we make two contributions. First, we identify sycophancy fine-tuning, i.e., training models to passively agree with users' incorrect opinions, as a previously underexplored driver of emergent misalignment, and show that it induces broad and severe misaligned behavior. Second, we propose Alignment Gating, an efficient method for reversing emergent misalignment that inserts learnable and controllable gates into the model during fine-tuning. Through fine-tuning, these gates learn to identify the internal representations responsible for unsafe responses. Amplifying or suppressing these representations then exacerbates or mitigates EM, respectively. We further find that the Alignment Gating module exhibits strong generalization: gating weights obtained from narrow-domain fine-tuning substantially suppress broad-domain misaligned behavior while preserving the model's general capabilities.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09068v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sicheng Wang, Xiangyang Zhu, Han Wang, Zongrui Wang, Yuan Tian, Kaiwei Zhang, Kaiyuan Ji, Qi Jia, Guangtao Zhai</dc:creator>
    </item>
    <item>
      <title>REFLECT: Intervention-Supported Error Attribution for Silent Failures in LLM Agent Traces</title>
      <link>https://arxiv.org/abs/2606.09071</link>
      <description>arXiv:2606.09071v1 Announce Type: new 
Abstract: Large language model (LLM) agents now solve complex tasks through long plan-and-execution traces, yet the ability to locate errors in completed traces still lags far behind, especially in the \emph{silent failure} regime. Existing approaches predict suspect steps via classifiers or LLM judges, or recover correct answers via retry, but none feeds the intervention outcome back to \emph{refine the attribution itself}. We propose REFLECT, a method that closes this gap by diagnosing a candidate error step, testing it through controlled replay with a diagnosis-specific patch, and using the verified outcome flip as contrastive evidence to refine the final attribution. On four localization benchmarks spanning multi-hop reasoning across domains, REFLECT achieves the highest localization accuracy among same-auditor methods on every benchmark, with the largest gains on structured tool-use traces, while providing actionable localization even when ground-truth answers are unavailable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09071v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Xiaofeng Lin, Yingxu Wang, Tung Sum Thomas Kwok, Daniel Guo, Sahil Arun Nale, Charles Fleming, Guang Cheng</dc:creator>
    </item>
    <item>
      <title>A Unifying Lens on Reward Uncertainty in RLHF</title>
      <link>https://arxiv.org/abs/2606.09073</link>
      <description>arXiv:2606.09073v1 Announce Type: new 
Abstract: Reinforcement learning from human feedback (RLHF) is bottlenecked by \emph{reward hacking}, where the policy exploits errors in a proxy reward model (RM) and produces high RM scores without genuine quality gains. A natural mitigation is \emph{pessimism}: penalizing rewards in regions where the RM is uncertain. However, standard scalar RMs provide no principled notion of uncertainty. We argue that the right object is a \emph{distributional} reward model $p(r\mid x,y)$. Under either a Bayesian inference or a KL-distributionally robust optimization (KL-DRO) lens, the KL-regularized RLHF objective admits a closed-form effective reward $\tilde r(x,y) = \pm\beta\log\mathbb{E}_p[e^{\pm r/\beta}]$. The pessimistic branch unifies the prior heuristics for RM ensemble aggregation: mean aggregation, worst-case optimization (WCO), and uncertainty-weighted optimization (UWO) all emerge as limits or truncations of this single expression. This also clarifies the implicit assumptions of each existing rule.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09073v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ely Hahami, Yoel Zimmermann, Ray Zhou, Jack Benarroch Jedlicki</dc:creator>
    </item>
    <item>
      <title>REFINE: Super-efficient 3D Gaussian Splatting Pruning via Rendering-Free Primitive Importance</title>
      <link>https://arxiv.org/abs/2606.09074</link>
      <description>arXiv:2606.09074v1 Announce Type: new 
Abstract: Existing pruning methods for 3D Gaussian splatting (3DGS) suffer from either severe quality degradation or prohibitive computational overhead. In this paper, we propose REFINE, a highly accelerated 3DGS pruning framework centered on a novel rendering-free primitive importance metric. Our approach leverages an analytically approximated, rendering-aware Hessian field to quantify the expected perceptual error induced by the removal of individual primitives. By modeling the joint modulation of visibility, projection geometry, and the content-adaptive hyperparameter, we entirely bypass costly forward rendering passes and derive an anisotropic perceptual weight field that serves as a high-fidelity proxy for primitive importance. Extensive experiments across multiple benchmark datasets demonstrate that REFINE maintains highly competitive rendering quality while achieving an unprecedented $3{,}000\times$ reduction in pruning-related computational complexity compared to state-of-the-art pruning methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09074v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhang Chen, Shuai Wan, Mengting Yu, Fuzheng Yang, Junhui Hou</dc:creator>
    </item>
    <item>
      <title>Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions</title>
      <link>https://arxiv.org/abs/2606.09076</link>
      <description>arXiv:2606.09076v1 Announce Type: new 
Abstract: Reward models are central to text-to-image post-training, but visual preference is subjective and better represented as a distribution over rubric scores than as a deterministic scalar. Existing scalar, score-token, and pairwise reward models over-compress uncertainty and fine-grained score differences, while reasoning-based generative rewards provide stronger judgments but are costly to deploy and difficult to use as direct optimization signals. We propose Z-Reward, a teacher-student reward modeling framework that decouples reasoning-heavy judgment from efficient reward deployment. The teacher is a large VLM that uses reasoning to infer rubric-aligned score distributions, and is trained with Group-wise Direct Score Optimization (GDSO), which combines policy-gradient rewards from distribution expectations with direct pointwise and pairwise supervision on score distributions and score gaps. The student is trained with Reasoning-Internalized Score Distillation (RISD), which transfers the teacher's reasoning-conditioned score distribution into a compact VLM without requiring explicit reasoning chains at inference time. On our internally annotated evaluation set, the 27B GDSO teacher reaches 89.6% human preference accuracy, outperforming SFT, RewardDance, and GRPO, while the 9B RISD student reaches 88.6%, outperforming the OPD baseline and closely matching the larger teacher. We further show that Z-Reward can serve as a differentiable reward signal for text-to-image optimization, yielding a 41.3% net human-preference improvement over the SFT baseline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09076v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xin Jin, Huanqia Cai, Zhen Li, Zechao Zhan, Dengyang Jiang, Aiming Hao, Yuming Jiang, Chunle Guo, Peng Gao, Ming-Ming Cheng, Steven C. H. Hoi</dc:creator>
    </item>
    <item>
      <title>Neural Legendre-Fenchel transform with Hessian Preconditioning</title>
      <link>https://arxiv.org/abs/2606.09077</link>
      <description>arXiv:2606.09077v1 Announce Type: new 
Abstract: The Legendre-Fenchel (LF) transform is a fundamental tool in convex analysis and machine learning that maps lower semi-continuous functions to their convex conjugates. In practice, when closed-form formulas for the convex conjugates of given functions are not available, one must approximate them numerically. One recent and versatile numerical method is the deep Legendre transform, which relies on neural networks, although it remains challenging to apply to ill-conditioned functions. This work builds on the reformulation of the LF transform as a projective polarity. A notable property of this framework is its affine invariance. We leverage this affine invariance to introduce a Hessian-based preconditioning strategy. Specifically, we apply an affine deformation around a minimizer so that the second-order Taylor approximation of the function coincides with the canonical paraboloid, whose conjugation map is the identity. A residual network initialized near the identity can then learn this simplified mapping, while the original conjugation map is recovered through the inverse deformation. The proposed preconditioning incurs only a modest computational overhead, consisting of a single eigendecomposition during initialization and two matrix-vector multiplications per query. Experiments on a diverse set of convex functions, including high-dimensional benchmarks, demonstrate improved convergence rates and enhanced numerical accuracy of the conjugation, with particularly significant gains for ill-conditioned problems. Finally, we discuss the scope of applicability of our proposed method and highlight several of its limitations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09077v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Basile Plus-Gourdon, Frank Nielsen</dc:creator>
    </item>
    <item>
      <title>The Hidden Bias of Process Reward Models: PRISM for Rewarding the Right Reasoning</title>
      <link>https://arxiv.org/abs/2606.09078</link>
      <description>arXiv:2606.09078v1 Announce Type: new 
Abstract: Process Reward Models (PRMs) improve credit assignment for reasoning by providing step-level feedback. However, we identify a hidden bias in PRMs caused by severe imbalance in step-level training data. Standard cross-entropy training amplifies this bias, causing PRMs to overcredit plausible but incorrect steps and produce high false-positive rates. We show that these false positives have an asymmetric downstream effect: false negatives mainly slow exploration, whereas false positives actively steer Best-of-N selection, guided decoding, and policy optimization toward flawed reasoning. This suggests that PRM training should shift from pointwise label fitting to reliable relative comparisons. To address this, we propose PRISM (Precision Ranking for Improved Step Modeling), a policy-aware PRM training framework that learns from contrastive step-level comparisons and hard negatives generated by a temporal lookahead strategy, requiring no new human labels. We further use a difficulty-aware curriculum to optimize the contrastive step margin. Across PRMBench and ProcessBench, PRISM substantially reduces false positives (22% on PRMBench) and improves macro F1 over strong discriminative PRMs. When applied to policy optimization and search tasks, including guided decoding and Best-of-N selection, it consistently improves accuracy (up to 22% for guided decoding and 33% for Best-of-N) and robustness. More broadly, trustworthy process supervision is not just about assigning high rewards, but about rewarding the right reasoning for the right reasons.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09078v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Aakriti Agrawal, Souradip Chakraborty, Armin Saghafian, Nihal Sharma, Rizal Fathony, Nam H Nguyen, C. Bayan Bruss, Amrit Singh Bedi, Furong Huang</dc:creator>
    </item>
    <item>
      <title>FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention</title>
      <link>https://arxiv.org/abs/2606.09079</link>
      <description>arXiv:2606.09079v1 Announce Type: new 
Abstract: Conventional LLMs keep the full KV cache loaded during decoding, causing a severe GPU memory bottleneck for ultra-long context serving. In this report, we propose Lookahead Sparse Attention (LSA), a novel inference paradigm powered by a Neural Memory Indexer built upon the DeepSeek-V4 architecture. Rather than passively attending to all historical tokens, LSA proactively predicts future context demands and preserves only the query-critical KV chunks in the GPU memory. Crucially, we instantiate this architecture via a backbone-free decoupled training strategy. By formulating the indexer as a standard dual-encoder architecture, we train it independently using standard retrieval training frameworks without ever loading the massive backbone model into GPU memory.
  We demonstrate that this "less is more" paradigm significantly improves serving efficiency while acting as an effective attention denoiser in tasks that rely on long-term global memory. Across primary long-context evaluation suites (e.g., LongBench-v2, LongMemEval, and RULER), FlashMemory-DeepSeek-V4 (FM-DS-V4) compresses the average physical KV cache footprint down to merely 13.5% of the full-context baseline, while consistently preserving or slightly elevating downstream accuracy (+0.6% absolute margin on average). Crucially, at extreme 500K scales, FlashMemory suppresses the physical KV cache overhead by over 90% without destabilizing the backbone's core reasoning capacities.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09079v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>Yan Wang, Qifan Zhang, Jiachen Yu, Tian Liang, Dongyang Ma, Xiang Hu, Zibo Lin, Chunyang Li, Zhichao Wang, Jia Li, Yujiu Yang, Haitao Mi, Dong Yu</dc:creator>
    </item>
    <item>
      <title>Beyond FLOPs: Benchmarking Real Inference Acceleration of LLM Pruning under a GEMM-Centric Taxonomy</title>
      <link>https://arxiv.org/abs/2606.09080</link>
      <description>arXiv:2606.09080v1 Announce Type: new 
Abstract: Pruning has emerged as a dominant paradigm for accelerating large language model (LLM) inference, spanning a broad spectrum of methods that remove computation across tokens, layers, heads, dimensions, and attention patterns. Despite sharing the same objective, these pruning approaches induce fundamentally different execution behaviors, causing realized speedups to depend heavily on hardware and kernel implementations. Consequently, the practical acceleration benefits of different pruning families remain poorly understood. In this work, we introduce a GEMM-centric taxonomy that reorganizes existing pruning methods according to the logical \textbf{M}, \textbf{N}, and \textbf{K} dimensions of general matrix multiplication (GEMM). Leveraging this abstraction, we build a unified benchmarking framework that enables implementation-consistent comparison across the pruning design space and systematically characterizes the acceleration--quality Pareto frontier. Our results show that static depth pruning remains the strongest Pareto-optimal baseline and stays closest to its theoretical acceleration upper bound in memory-bounded scenarios. During prefill, the frontier transitions from static depth at low quality loss (0\%--4\%), to dynamic depth at moderate loss (5\%--16\%), and finally to static width pruning at higher loss levels (17\%--26\%). These findings establish the first unified view of the practical limits of pruning-based LLM acceleration and provide guidance for future pruning research. Code is available at https://github.com/EIT-NLP/LLM-Pruning/tree/main/PruningInferSim</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09080v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haozhe Hu, Hao Wu, Anhao Zhao, Longwei Ding, Peiran Yin, Yunpu Ma, Xiaoyu Shen</dc:creator>
    </item>
    <item>
      <title>Edge-Constrained UAV Small-Object Detection with P2 Enhancement and Quantum-Inspired Lightweight Structure Search</title>
      <link>https://arxiv.org/abs/2606.09081</link>
      <description>arXiv:2606.09081v1 Announce Type: new 
Abstract: Unmanned aerial vehicle (UAV) object detection requires compact detectors that retain small-object details under onboard computation and memory constraints. Repeated downsampling in lightweight networks weakens shallow spatial information, while manually adding attention or fusion modules may increase cost without stable gains. This study analyzes YOLOX-Nano under edge-deployment constraints by combining a P2 high-resolution detection branch with a quantum-inspired evolutionary algorithm (QIEA) for lightweight structure screening. The search space is defined by lightweight priority and task specificity, and the evaluation jointly considers accuracy, floating-point operations (FLOPs), latency, memory consumption, and recall. On VisDrone, the P2 branch increases AP_small by 31.10% over the YOLOX-Nano baseline. Compared with NanoDet-Plus at a similar model size, YOLOX-Nano+P2 improves AP_50:95 by 17.5% and AP_small by 44.9%. The QIEA-selected candidate obtains the highest Recall_50, but +P2 remains the strongest AP-oriented variant after full training. Full 100-epoch verification of Random-best, GA-best, and SA/QUBO-best candidates further shows that proxy rankings do not necessarily transfer to final AP_50:95. These results support using P2 as the main small-object enhancement path and QIEA as a lightweight tool for candidate screening and accuracy-cost analysis. The source code, configuration files, diagnostic scripts, and summarized results are available at https://github.com/Ming23233/UAV-QIEA-Edge-Detection</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09081v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wuming Lei, Yanbin Gao, Mingyan Sun, Xiaobin Li, Xuechen Liang</dc:creator>
    </item>
    <item>
      <title>Teach Multimodal Recommendation Model to See via Personalized Visual Extraction and Adaptive Learning</title>
      <link>https://arxiv.org/abs/2606.09082</link>
      <description>arXiv:2606.09082v1 Announce Type: new 
Abstract: Multimodal sequential recommendation (MSR) incorporates textual and visual information to improve recommendation quality. However, recent studies and our empirical analysis show that visual features are often underutilized, thereby contributing far less than textual signals. We attribute this issue to two factors: insufficient visual representation learning (pretrained encoders fail to capture preference-relevant cues) and unbalanced visual-text optimization (textual features dominate the learning process). To address these issues, we propose Teach Multimodal Recommendation Model to See via Personalized Visual Extraction and Adaptive Learning (REVEAL), a plug-and-play framework that enhances visual representation learning and cross-modal optimization without modifying the original recommendation backbone. REVEAL consists of Feedback-Guided Visual Extraction (FVE), which refines prompt-guided visual extraction through task-level feedback, and Adaptive Visual Learning (AVL), which dynamically reweights visual learning to alleviate modality imbalance. Experiments on multiple real-world datasets and MSR backbones demonstrate that REVEAL consistently improves recommendation performance. Further analysis shows that these gains arise from more effective attention to preference-relevant visual regions and better visual utilization during training. The code is available at https://github.com/YutongLi2024/REVEAL.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09082v1</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yutong Li, Xinyi Zhang, Ziyi Ye, Daoguo Dong, Yu-gang Jiang</dc:creator>
    </item>
    <item>
      <title>Context-Fractured Decomposition Attacks on Tool-Using LLM Agents: Exploiting Artifact Provenance Gaps</title>
      <link>https://arxiv.org/abs/2606.09084</link>
      <description>arXiv:2606.09084v1 Announce Type: new 
Abstract: Tool-using LLM agents interact with the world through actions that persist state in artifacts (e.g., workspace files or logs). Consequently, jailbreak defenses must reason about cross-step composition rather than isolated text. Yet most existing attacks and defenses, including ``multi-turn'' jailbreaks such as Crescendo and Tree of Attacks, still assume a single contiguous conversation visible to the defender. This assumption breaks down in real agent pipelines, where enforcement is fragmented across tools, modules, and time, and where artifact provenance is often not tracked. We operationalize a deployment failure mode for tool-using LLM agents, the \emph{provenance gap}, and study reproducible triggers for it: \emph{Context-Fractured Decomposition} (CFD), a family of cross-context multi-step jailbreaks that preserve benign-looking intermediate artifacts from an early interaction and elicit harmful behavior much later, potentially in a different agent instance or workflow stage, via individually innocuous tool actions whose risk emerges only under delayed artifact-mediated composition. We instrument the failure mode with trace-level diagnostics and outline a verifiable mitigation direction (provenance lineage tagging). Across agent-system jailbreak benchmarks, CFD improves success rates by up to 28.3 percentage points over state-of-the-art baselines, even against strong single-turn judges. Disclaimer: This paper contains examples of harmful or offensive language.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09084v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Xiaofeng Lin, Yukai Yang, Daniel Guo, Sahil Arun Nale, Charles Fleming, Guang Cheng</dc:creator>
    </item>
    <item>
      <title>DynaOD: Dynamic Origin-Destination Flow Generation with Discrete-to-Continuous Temporal Semantic Modeling</title>
      <link>https://arxiv.org/abs/2606.09086</link>
      <description>arXiv:2606.09086v1 Announce Type: new 
Abstract: Dynamic origin-destination (OD) flow generation seeks to synthesize realistic mobility dynamics from temporal context alone, without relying on historical OD observations. A key challenge is to translate semantic temporal signals into temporally coherent OD patterns while preserving the inherent spatial heterogeneity of urban regions. We propose DynaOD, a semantic-driven framework that models temporal dynamics through two complementary perspectives: discrete directional trends that characterize qualitative shifts in urban activity patterns, and continuous temporal evolution that captures how such shifts unfold over time. By jointly encoding these temporal semantics, the framework constructs time-varying region representations that condition pretrained static OD generators in a lightweight and plug-and-play fashion. This modular design further supports scalable deployment and cross-city transferability. Extensive experiments on large-scale real-world datasets show that our method consistently outperforms representative baselines in both predictive accuracy and distributional fidelity. Code is publicly available at https://github.com/csjiezhao/DynaOD.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09086v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jie Zhao, Xianqi Dai, Jie Feng, Huandong Wang, Yong Li</dc:creator>
    </item>
    <item>
      <title>Autonomous FPV Flight with Translational Optical Flow and Uncertainty Mask</title>
      <link>https://arxiv.org/abs/2606.09088</link>
      <description>arXiv:2606.09088v1 Announce Type: new 
Abstract: Autonomous FPV quadrotor flight in complex environments using a monocular RGB camera as the sole exteroceptive sensor remains a fundamental challenge. Recent research has shown that using optical flow as the input of a neural network can achieve end-to-end autonomous flight in cluttered scenes. However, extracting the most relevant information from the flow estimation is the key bottleneck limiting agility and robustness. Existing methods struggle to disentangle obstacle-induced optical flow from the ego-motion background flow and suffer from low signal-to-noise ratios near the focus of expansion (FoE). To address these issues, we decompose the optical flow into translational and rotational components and utilize only the translational flow, which captures scene geometry and depth cues. In addition, we introduce an uncertainty mask derived from inconsistencies between forward and backward flow estimates. This mask highlights obstacle structures, including those within the FoE region. Both cues are fed to a control policy trained in a differentiable simulation framework, which enables efficient first-order optimization across perception and control. We validate our approach through extensive experiments in both simulated and real-world forest environments. The proposed system achieves robust flight at speeds of up to 13.91 m/s in simulation and 11.79 m/s in real-world tests, with a 93.3\% success rate over 30 real-world trials, nearly doubling the previously reported 6 m/s real-world speed of the monocular-RGB optical-flow UAV obstacle avoidance system.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09088v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yang Deng, Yu Hu, Feng Yu, Linzuo Zhang, Danping Zou</dc:creator>
    </item>
    <item>
      <title>Context Rot in AI-Assisted Software Development: Repurposing Documentation Consistency for AI Configuration Artifacts</title>
      <link>https://arxiv.org/abs/2606.09090</link>
      <description>arXiv:2606.09090v1 Announce Type: new 
Abstract: Developers increasingly provide AI coding assistants with persistent context through configuration files such as CLAUDE.md, AGENTS.md, and .cursorrules. These files describe code elements, architecture, and development conventions, forming the context that guides AI tool behavior across sessions. As software evolves, this context can become stale, a phenomenon we call context rot. While AI configuration artifacts are new, the underlying consistency problem connects to decades of software documentation research. Researchers have built tools to check consistency between documentation and code, spanning README files, code comments, API documentation, architecture descriptions, and installation instructions. We argue that this existing toolbox is an immediate starting point for detecting context rot, and we present a research roadmap mapping documentation consistency approaches to corresponding problems in this new setting. As preliminary evidence, applying an existing README/wiki consistency checker to a statistically representative sample of 356 repositories identifies stale code element references in 23.0% of repositories, showing that traditional documentation consistency tools can already surface context rot.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09090v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Christoph Treude, Sebastian Baltes</dc:creator>
    </item>
    <item>
      <title>Stabilizing On-Policy Distillation for MLLM Reasoning with Global Normalization</title>
      <link>https://arxiv.org/abs/2606.09091</link>
      <description>arXiv:2606.09091v1 Announce Type: new 
Abstract: On-policy distillation (OPD) has recently emerged as an important post-training paradigm. By using a stronger teacher model to provide dense, fine-grained supervision for sampled trajectories, OPD offers a clear advantage over reinforcement learning with verifiable rewards (RLVR), which typically depends on sparse binary or outcome-based environmental feedback. However, naive token-level distillation can suffer from gradient instability, due to magnitude misalignment in outlier states. To address this issue, we propose Globally Normalized Distillation Policy Optimization (GNDPO), a practical method that stabilizes optimization by transforming raw KL scores into batch-level relative advantages. This normalization effectively mitigates gradient explosions while retaining the benefits of token-level guidance. Experimental results show that GNDPO substantially improves training robustness and downstream performance across multimodal reasoning tasks. The code is released at https://github.com/OPPO-Mente-Lab/GNDPO.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09091v1</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dongze Hao, Zhiwei Jin, Chen Chen, Haonan Lu</dc:creator>
    </item>
    <item>
      <title>From Shortcuts to Reasoning: Robust Post-Training of Theory of Mind with Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2606.09092</link>
      <description>arXiv:2606.09092v1 Announce Type: new 
Abstract: Theory of Mind (ToM) is a must-acquire skill for modern foundation model systems to operate effectively and safely in the real world. Recent works have explored honing ToM via post-training; however, we show that such progress is confounded by a pervasive "shortcut" issue: tasks can reach up to 99% accuracy by simply exploiting spurious causal correlations, leading to a false sense of ToM. Motivated by this, we first develop a framework to systematically examine ToM datasets for shortcuts and provide guidance for future development. We find that questions reducible to pure state tracking, such as "belief," are especially shortcut-prone compared to mind questions, such as "intention," where reasoning beyond tracking is required. Using four shortcut-free datasets across three ToM contexts, we then comprehensively study whether Reinforcement Fine-Tuning with verifiable rewards and explicit reasoning chains, called Thinking-RFT, elevates ToM beyond Supervised Fine-Tuning, or SFT. Our key findings are as follows. First, Thinking-RFT effectively improves ToM in all scenarios, with a 6% improvement over SFT, particularly in complex higher-order reasoning, with a 10% improvement over SFT, and multimodal cases, with a 7% improvement over SFT. It also generalizes notably better to unseen domains and higher-order queries while being more robust to counterfactuals. Second, ToM benefits specifically from the joint effect of reasoning and RL: Thinking-RFT outperforms Non-Thinking-RFT by 7% on average. Third, RFT works by learning to ground its reasoning on anchor cues, such as keywords and state changes, that correspond to causal factors. We believe our study is useful for developing effective and robust ToM post-training datasets and advancing critical ToM capabilities.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09092v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jike Zhong, Yuxiang Lai, Ming Li, Yuheng Li, Wuao Liu, Behzad Dariush, Konstantinos Psounis, Shao-Yuan Lo</dc:creator>
    </item>
    <item>
      <title>LAEI: Layered Autonomous Edge Intelligence Framework for Robust UAV Swarm Operations</title>
      <link>https://arxiv.org/abs/2606.09099</link>
      <description>arXiv:2606.09099v1 Announce Type: new 
Abstract: Autonomous UAV swarms require scalable coordination mechanisms that maintain mission performance under limited communication, environmental uncertainty, and component failures. Centralized approaches provide global coordination but suffer from communication bottlenecks and single-node vulnerabilities, whereas fully decentralized methods often lack mission-level consistency. This paper presents Layered Autonomous Edge Intelligence (LAEI), a UAV-swarm framework that combines onboard learned policies with lightweight mission-level supervision. Each UAV performs local perception, obstacle avoidance, and action selection onboard, while the supervisory layer provides adaptive goal reassignment, fault-aware recovery, and context-dependent policy guidance without directly controlling low-level actions. LAEI further incorporates recovery strategies, including dynamic reassociation, backup supervisory support, and fallback local autonomy, to maintain mission continuity under representative failure scenarios. We evaluate LAEI in simulated UAV-swarm scenarios using mission completion time, collision rate, and coverage efficiency. The results show that LAEI reduces mission completion time and improves operational efficiency while maintaining collision-aware distributed UAV-level decision-making.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09099v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Changmin Park, Wooyong Jung, Hwangnam Kim</dc:creator>
    </item>
    <item>
      <title>Alcmean's: Unsupervised community detection using local Laplacian, automatic detection of the number of centers</title>
      <link>https://arxiv.org/abs/2606.09100</link>
      <description>arXiv:2606.09100v1 Announce Type: new 
Abstract: Community detection is a fundamental problem in the analysis of complex networks. It has applications across social, biological, and financial domains. Traditional algorithms such as Louvain, LPA, and modularity optimization often require manual parameter tuning. They also suffer from inaccurate cluster center selection and struggle with scalability. To address these challenges, we propose Automatic Laplacian Centrality Means (ALCMeans), a novel community detection algorithm. ALCMeans combines Laplacian energy-based automatic center identification with DeepWalk embeddings for robust node representation. Unlike existing Laplacian-based and clustering methods, ALCMeans eliminates the need to predefine the number of communities, enhances cluster center selection using structural importance, and leverages representation learning for more accurate and stable assignments. Experimental results on benchmark datasets demonstrate 10 to 20 percent higher NMI and ARI scores compared to Louvain, Newman-Girvan, LPA, Fast-Greedy, and a recent GNN-based competitor (MAGI, KDD 2024). Additional evaluations with modularity and F1-scores confirm the superiority of ALCMeans. Ablation studies highlight the critical contributions of each component. Despite its reliance on DeepWalk parameters and increased runtime relative to lightweight heuristics, ALCMeans consistently outperforms state-of-the-art methods. This makes it a promising tool for real-world network analysis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09100v1</guid>
      <category>cs.SI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:DOI>10.22103/jmmr.2026.25756.1849</arxiv:DOI>
      <dc:creator>Shahin Momenzadeh, Rojiar Pir Mohammadiani</dc:creator>
    </item>
    <item>
      <title>Chimera: Protocol-Aware Recovery for Confidential BFT Consensus</title>
      <link>https://arxiv.org/abs/2606.09101</link>
      <description>arXiv:2606.09101v1 Announce Type: new 
Abstract: Trusted Execution Environments (TEEs) have enabled confidential Byzantine Fault-Tolerant (BFT) consensus systems with confidentiality and improved scalability. However, TEEs do not provide state continuity: during recovery, a compromised host can roll back a crashed enclave to a stale persistent state, significantly threatening both safety and availability. Existing defenses face a fundamental tradeoff: they either impose substantial overhead on critical consensus paths, reducing throughput and increasing latency, or incur prolonged recovery delays, hurting availability.
  We present the first systematic taxonomy of rollback-resilient recovery for confidential BFT consensus, distilling prior approaches into four categories. We further expose their inherent limitations. Guided by this detailed analysis, we design CHIMERA, a protocol-aware recovery framework that breaks this tradeoff. Our key insight is that rollback protection in consensus systems should not be uniform. Different types of persistent states differ fundamentally in their state distribution, update behavior, and representation form. CHIMERA separates persistent state into metadata and logs according to these protocol-level properties and applies distinct recovery mechanisms to each type. We formally model CHIMERA in Maude and verify its safety and liveness properties. We implement it on Braft and ZooKeeper using Intel TDX, and evaluate it in both LAN and WAN settings. Results show that CHIMERA achieves higher throughput, lower recovery latency, and better availability than state-of-the-art rollback-resilient baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09101v1</guid>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tong Liu, Xiaoqing Wen, Ziwei Zhou, Si Liu, Jianyu Niu, Cong Wang, Yinqian Zhang</dc:creator>
    </item>
    <item>
      <title>Concepts in Practice: C++ MPI Bindings for the HPC Ecosystem. From a Standardizable Core to a Composable Interface</title>
      <link>https://arxiv.org/abs/2606.09102</link>
      <description>arXiv:2606.09102v1 Announce Type: new 
Abstract: The official C++ MPI bindings were removed from the standard in 2008, leaving a gap that numerous third-party libraries have attempted to fill. However, existing wrappers typically cover only a limited subset of MPI or target specific use cases, falling short of a general-purpose solution. A recent conceptual paper proposed general design principles for modern C++ bindings based on C++20 concepts, without committing to a concrete interface.
  We present the first concrete realization of these principles in a layered architecture. At the foundation, we define a core layer: refined C++20 concepts formalizing the MPI standard's notion of data buffers, automatic mapping of standard C++ constructs, non-intrusive customization points for third-party types, and concept-based wrappers for MPI procedures. The result is a low-level native C++ MPI interface that works directly with STL containers, is highly extensible, and lends itself to standardization. Built on this core, we present KaMPIng-v2 -- a C++ MPI library offering the convenience and memory-safety of KaMPIng with composable, pipe-based syntax inspired by C++ ranges for efficient, boilerplate-free MPI programming. Finally, we demonstrate the core layer's broad applicability by designing lightweight adapters for GPU and performance-portability libraries, making the HPC ecosystem a first-class citizen in MPI. Kokkos views, Thrust device vectors, and SYCL buffers can be passed directly to MPI procedures, with adapter logic remaining self-contained.
  All contributions are backed by a fully functional open-source reference implementation, demonstrating the practical viability of the proposed design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09102v1</guid>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tim Niklas Uhl, Matthias Schimek, Daniel Brommer</dc:creator>
    </item>
    <item>
      <title>Addressing Market Regime Changes and Heavy-Tailed Returns in Portfolio Optimization via Bayesian VAR and Elliptical Black-Litterman</title>
      <link>https://arxiv.org/abs/2606.09104</link>
      <description>arXiv:2606.09104v1 Announce Type: new 
Abstract: Deep reinforcement learning (DRL) frameworks for portfolio optimization have shown promise for their ability to learn allocation rules dynamically from market data. However, these models fail to account for fat-tailed returns, which characterize actual market behavior with more frequent extreme events. Furthermore, historical data is treated homogeneously, without accounting for temporal importance, leading models to fail during regime changes. We propose a new BAVAR-BLED algorithm that combines methods derived from Bayesian-Averaging Vector Autoregressive (BAVAR) and the Black-Litterman model using Elliptical Distributions (BLED) within a TD3 architecture. BAVAR captures a set of vector autoregressive representations that consider multi-scale temporal features, enabling adaptive allocation decisions based on regime-aware estimates of return expectations and dispersion matrices. These estimates serve as prior inputs to BLED, a model that uses Student's t-distributions, allowing for more realistic fat tail return estimates. The BAVAR-BLED algorithm uses transformer networks for view construction and CNNs for risk-aversion estimates, which modify dynamic allocation decisions based on market conditions. An evaluation of 29 Dow Jones Industrial Average constituents over a decade-long market period shows that BAVAR-BLED significantly outperforms state-of-the-art methods, achieving Sharpe and Sortino ratios of 1.72 and 2.70, respectively, and total returns of 57.26%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09104v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>q-fin.PM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Daniil Mikriukov (University of Liverpool, Xi'an Jiaotong-Liverpool University), Ruoyu Sun (Xi'an Jiaotong-Liverpool University), Angelos Stefanidis (Xi'an Jiaotong-Liverpool University), Jionglong Su (Xi'an Jiaotong-Liverpool University), Zhengyong Jiang (Xi'an Jiaotong-Liverpool University)</dc:creator>
    </item>
    <item>
      <title>Graph2Idea: Retrieval-Augmented Scientific Idea Generation with Graph-Structured Contexts</title>
      <link>https://arxiv.org/abs/2606.09105</link>
      <description>arXiv:2606.09105v1 Announce Type: new 
Abstract: Generating novel, feasible, and high-quality research ideas is an important yet challenging task in scientific discovery. Recent Large Language Model (LLM)-based methods often ground idea generation with retrieved literature, but the retrieved evidence is usually provided as flat text, such as titles, abstracts, or summaries. Such flat contexts may contain redundant or weakly relevant information, while making cross-paper relations among problems, methods, mechanisms, and findings difficult to identify and trace. To address this challenge, we propose Graph2Idea, a knowledge graph-guided framework for retrieval-augmented scientific idea generation. Graph2Idea first retrieves papers according to the input topic, transforms them into structured knowledge triples, and dynamically constructs a target-centered knowledge graph to make literature relations explicit. It then extracts compact graph-derived contexts that retain target-relevant relational evidence while reducing noisy textual input. Based on these contexts, a two-stage generation process first identifies promising research directions and then guides the LLM to synthesize candidate ideas from graph-grounded evidence. Experiments on a scientific idea generation benchmark show that Graph2Idea outperforms representative baselines under the automatic evaluation protocol. Compared with the strongest baseline scores, it improves Novelty from 0.45 to 0.52, Quality from 0.24 to 0.29, and Feasibility from 0.22 to 0.28. These results suggest that graph-structured evidence helps LLMs generate research ideas through more explicit, compact, and traceable recombination of prior scientific knowledge.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09105v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xu Li, Hanzhe Tu, Xun Han</dc:creator>
    </item>
    <item>
      <title>RAM: Reachability Across Morphologies</title>
      <link>https://arxiv.org/abs/2606.09108</link>
      <description>arXiv:2606.09108v1 Announce Type: new 
Abstract: Many stages of the robotic lifecycle, from morphology synthesis to operation, rely fundamentally on the reachable workspace. However, current methods for approximating workspaces are slow, imprecise, or tied to a single morphology. We introduce Reachability Across Morphologies (RAM): a morphology-conditioned, implicit neural representation that acts as a fast, differentiable surrogate for pose reachability, generalising to unseen morphologies while inherently accounting for self-collisions. To train RAM, we publish a large-scale dataset of $3\cdot10^{10}$ samples generated solely from forward kinematics. Experiments show that our model achieves an $F_1$-score of $86\%$ at nanosecond inference, outperforming the baseline by $14\%$ while reducing inference time by three orders of magnitude. We further demonstrate speed-ups of one and two orders of magnitude for gradient-based morphology and trajectory optimisation, respectively.
  Website: https://timwalter.github.io/ram.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09108v1</guid>
      <category>cs.RO</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tim Walter, Xinyu Chen, Jonathan Külz, Matthias Althoff</dc:creator>
    </item>
    <item>
      <title>Driving Video Retrieval for Complex Queries with Structured Grounding</title>
      <link>https://arxiv.org/abs/2606.09109</link>
      <description>arXiv:2606.09109v1 Announce Type: new 
Abstract: Video retrieval at scale is central to data curation and safety validation in autonomous driving, where users want to find not only scenes but also dynamic events such as cut-ins and hard braking. Existing vision-language and keyword-based retrieval methods often miss these events because the relevant motion may not be explicitly described in text or captured by lexical overlap. Rule-based retrieval can encode such events more directly, but it is brittle: generated or hand-written rules often fail when their assumptions do not match real driving data. We propose STRIVE-D, a data-calibrated retrieval framework for driving videos. It uses weakly labeled in-domain videos to estimate when a query rule is reliable, adapt rules that mismatch observed data, and fuse calibrated rule scores with vision-language and keyword-based retrieval signals. Across three driving benchmarks, including newly released human-annotated event data on DrivingDojo, STRIVE-D delivers up to 84% relative improvement in top-1 accuracy over state-of-the-art methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09109v1</guid>
      <category>cs.CV</category>
      <category>cs.IR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Manyi Yao, Sparsh Garg, Christian Shelton, Amit Roy-Chowdhury, Abhishek Aich</dc:creator>
    </item>
    <item>
      <title>HDRAgent: An Agentic Framework for Multi-Exposure HDR Imaging</title>
      <link>https://arxiv.org/abs/2606.09110</link>
      <description>arXiv:2606.09110v1 Announce Type: new 
Abstract: Most existing multi-exposure HDR methods follow a fixed feed-forward reconstruction paradigm, making them prone to ghosting artifacts in complex dynamic scenes. To address this issue, we propose HDRAgent, the first agent-driven framework for HDR imaging, which adaptively selects reconstruction strategies according to the current scene conditions. Specifically, to provide scene-specific prior knowledge, we introduce a fine-grained contextual knowledge matching (FCM) module. This module leverages multimodal large language model (MLLM)-derived scene perception to retrieve relevant historical cases and tool knowledge, organizing them into structured evidence for MLLM-based adaptive tool scheduling. In addition, we propose a perception--distortion feedback mechanism that transforms post-execution quality assessment and artifact diagnosis into structured feedback, which is accumulated in historical memory to help subsequent contextual knowledge refinement and strategy selection. Furthermore, considering that extreme motion can invalidate alignment methods, we design an agent-guided generative alignment strategy that uses MLLM-based dynamic-region parsing to reconstruct unreliable contents in non-reference frames under reference-frame guidance. Experiments demonstrate that HDRAgent effectively reduces ghosting and local artifacts while achieving competitive or superior objective performance and visual quality.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09110v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Weiyu Zhou, Tao Hu, Yijian Wang, Xiaogang Xu, Ruixing Wang, Qingsen Yan</dc:creator>
    </item>
    <item>
      <title>Illumination-Invariant Anomaly Detection for Sub-Canopy UAV Multispectral Point Clouds</title>
      <link>https://arxiv.org/abs/2606.09111</link>
      <description>arXiv:2606.09111v1 Announce Type: new 
Abstract: Unmanned Aerial Vehicle (UAV) multispectral point clouds (MPC) provide high-dimensional spatial-spectral data for sub-canopy target detection; however, their efficacy is significantly compromised by severe illumination heterogeneity caused by vegetation shadows. To address this, we propose a prior-free anomaly detection framework capable of robustly handling lighting variations. First, we formulate solar angle estimation as an inverse optimization problem. By coupling spectral indices with a ray-tracing model, this strategy achieves Prior-Free Shadow Extraction without relying on flight metadata, effectively distinguishing dark objects from true shadows. Second, to mitigate spectral distortions, we introduce an Illumination-Consistent Sparse Representation mechanism. Unlike standard reconstruction methods, we construct a background dictionary strictly from neighbors sharing the same illumination state. This constraint effectively disentangles spectral reflectance from lighting variations, ensuring that targets are represented solely by physically consistent background points. Experimental results indicate that the proposed method significantly improves the separability between anomalies and background in complex forest environments, demonstrating superior performance over state-of-the-art baselines. This framework is particularly suited for identifying camouflaged military targets, mapping fallen tree trunks, and uncovering archaeological ruins hidden beneath dense foliage.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09111v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Likun Chen, Yanfeng Gu, Xian Li</dc:creator>
    </item>
    <item>
      <title>Hybridizing Equilibrium Propagation with Ising Machines for Efficient Energy-Based Learning</title>
      <link>https://arxiv.org/abs/2606.09112</link>
      <description>arXiv:2606.09112v1 Announce Type: new 
Abstract: The rapid evolution of artificial intelligence has led to substantial advances in deep neural networks. Nonetheless, conventional GPU-based training remains highly energy-demanding, motivating the exploration of physical dynamics and compatible energy-based learning schemes, such as equilibrium propagation (EP). EP-based training, however, frequently suffers from convergence to local minima due to phase-space contraction. Here we introduce an Ising-dynamics-inspired equilibrium-propagation framework in which dissipative Hopfield relaxation is replaced by an extended phase-space dynamics with conjugate variables. The resulting training paradigm keeps the local two-phase learning rule of EP while changing the physical route by which neural states reach equilibrium. We show that this dynamics lowers effective energy barriers, accelerates convergence, improves noise robustness, and trains deep convolutional Hopfield networks on MNIST, FashionMNIST, and CIFAR-10 with performance comparable to backpropagation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09112v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chen-Rui Fan, Bo Lu, Xing-Yu Wu, Tie-Jun Wang, Chuan Wang</dc:creator>
    </item>
    <item>
      <title>MAAM: Anchor-Preserving Compression and Contextual Calibration for Chinese Discriminatory Language Detection</title>
      <link>https://arxiv.org/abs/2606.09114</link>
      <description>arXiv:2606.09114v1 Announce Type: new 
Abstract: Chinese discriminatory-language detection is challenging because harmful intent is often implicit and context-dependent. We propose MAAM (Myopia–Astigmatism Anchor Mechanism), a lightweight, model-agnostic framework inspired by functional visual blur: rather than preserving every token equally, MAAM retains discrimination-relevant semantic anchors and calibrates them with C–I–S contextual priors (Contextual Tone, Group Identity, and Stance Polarity). We also introduce ChLGBT, to our knowledge the first Chinese LGBT-focused discriminatory-language dataset, with 8,120 manually annotated samples and three ordinal labels: explicit bias, implicit bias, and emotional intensity. Across strong encoder baselines, MAAM improves all three prediction dimensions, with consistent gains in accuracy, F1, Brier score, and expected calibration error. Compared with frontier LLM baselines under zero-shot and few-shot prompting protocols, MAAM remains competitive while offering stronger compactness and stability. These results suggest that interpretable anchor preservation and contextual calibration provide a practical alternative to heavier model scaling for Chinese discriminatory-language assessment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09114v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Yuxin Fu, Shijing Si</dc:creator>
    </item>
    <item>
      <title>Counterfactual Transport Flows for Offline Conservative Trajectory Refinement</title>
      <link>https://arxiv.org/abs/2606.09115</link>
      <description>arXiv:2606.09115v1 Announce Type: new 
Abstract: Offline reinforcement learning (RL) offers a path to policy improvement from logged data alone, using historical returns or other measurable outcomes as world feedback. A key difficulty is improving observed behavior without extrapolating beyond what the offline data supports. We propose counterfactual transport flows, a source-conditioned trajectory refinement framework for offline decision-making guided by world feedback. Given a low-feedback candidate trajectory, we construct local preference pairs from offline data by retrieving nearby trajectories in latent trajectory space with higher task-specific feedback, and use them as weak supervision for conservative refinement. The framework learns instance-specific refinement directions: at inference time, a refinement strength parameter controls how far the candidate trajectory is transported, enabling a trade-off between preserving the original behavior and applying stronger improvement. Experiments on D4RL benchmarks, including AntMaze and MuJoCo tasks, show that our method improves behavior from historical returns as world feedback, while providing interpretable trajectory-level refinement paths.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09115v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lena Krieger, Xuan Zhao, Zhuo Cao, Qin Wang, Hanno Scharr, Ira Assent</dc:creator>
    </item>
    <item>
      <title>Optimizing Energy-based Neural Network Training with Coherent Ising Machine</title>
      <link>https://arxiv.org/abs/2606.09117</link>
      <description>arXiv:2606.09117v1 Announce Type: new 
Abstract: While Ising machines serve as advanced physical solvers for the Ising model, enabling applications in combinatorial optimization and neural network training, their scalability for large-scale neural networks remains constrained by hardware connectivity limitations and suboptimal training methodologies. In this work, we leverage a Coherent Ising Machine (CIM) to train an energy-based neural network using Equilibrium Propagation, achieving performance comparable to existing software-based implementations. We further enhance the algorithm by integrating the Adam optimizer to solve for the ground state of a Hopfield energy network, significantly improving convergence speed and solution accuracy. Additionally, we demonstrate the scalability of our approach across deeper network architectures and convolutional operations. Our results highlight the potential of CIM dynamics as a scalable platform for training complex neural networks, offering a pathway toward energy-efficient implementations via analog circuits, optoelectronics, or integrated photonics. This work establishes a novel physical framework for next-generation AI hardware development.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09117v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chen-Rui Fan, Bo Lu, Zhi-Hong Zhang, Run-Qing Zhang, Jing-Wei Wen, Chuan Wang</dc:creator>
    </item>
    <item>
      <title>ComplexConstraints and Beyond: Expert Rubrics for RLVR</title>
      <link>https://arxiv.org/abs/2606.09118</link>
      <description>arXiv:2606.09118v1 Announce Type: new 
Abstract: As LLM capabilities advance rapidly, the evaluation methods used to assess them increasingly lag behind. Traditional benchmarks relied on programmatic verification of narrow, surface-level constraints, but real-world instruction following and agentic tasks demand assessment of nuanced, context-dependent behaviors that resist simple scripted checks. We present a systematic analysis of expert-curated rubric-based evaluation as an alternative paradigm, drawing on empirical evidence from two domains: complex instruction following and enterprise agentic tasks. We first articulate five design principles for constructing high-quality rubrics, including Maximum Viable Atomicity, intent-aware criterion design, and iterative LLM-judge calibration. To validate these principles, we introduce ComplexConstraints, a new expert-curated instruction-following dataset in which each prompt is paired with 10-40 atomic rubric criteria. We demonstrate that these expert rubrics are not only better evaluation instruments but also highly effective training signals: training on approximately 1,000 ComplexConstraints examples yields +15.5% improvement for a 4B-parameter model and +12.2% for a 235B-parameter model on instruction following, while single-epoch RL training on a rubric-graded enterprise environment produces gains that transfer to out-of-distribution benchmarks the model was never trained on (+4.5% BFCL, +7.4% Tau2-Bench, +6.8% Tool-Decathlon). Our findings establish that expert-authored rubrics improve both the measurement and the development of frontier LLM capabilities, serving as effective evaluation and RL training signals.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09118v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sushant Mehta, Liudas Panavas, Edwin Chen</dc:creator>
    </item>
    <item>
      <title>AutoPilot: Learning to Steer High Speed Robust BFT</title>
      <link>https://arxiv.org/abs/2606.09120</link>
      <description>arXiv:2606.09120v1 Announce Type: new 
Abstract: Recent Byzantine Fault Tolerant (BFT) protocols achieve strong performance by combining the low-latency advantages of leader-based BFT protocols with the high-throughput benefits of DAG-based data dissemination. Despite exposing a wide spectrum of internal tunable parameters, these protocols typically rely on static and heuristic configurations, which leads to performance degradation under dynamic workloads, heterogeneous network conditions, and evolving adversarial behaviors. In this paper, we present AutoPilot, a reinforcement learning-based framework that continuously monitors runtime conditions and dynamically adjusts protocol parameters online to optimize consensus performance. To ensure robustness, AutoPilot coordinates learning in a decentralized manner, providing resilience against adversarial data pollution. We implement AutoPilot on top of Autobahn, a state-of-the-art, high-speed, robust BFT protocol, and evaluate it across diverse dynamic environments. Experimental results demonstrate that AutoPilot quickly converges to the optimal configuration under changing environments, reduces end-to-end latency by 49.8% compared to the default protocol configuration, and outperforms random configuration exploration by 73.3%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09120v1</guid>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Liangrong Chen, Yue Zhang, Eric Zhou, Mohammad Javad Amiri, Ryan Marcus, Chenyuan Wu</dc:creator>
    </item>
    <item>
      <title>Autonomous Incident Resolution at Hyperscale: An Agentic AI Architecture for Network Operations</title>
      <link>https://arxiv.org/abs/2606.09122</link>
      <description>arXiv:2606.09122v1 Announce Type: new 
Abstract: Cloud network infrastructure at hyperscale presents unique operational challenges where traditional human-driven incident response cannot keep pace with the volume, velocity, and complexity of failures. This paper presents an agentic AI architecture for autonomous incident resolution in large-scale network operations. Our system employs a multi-agent orchestration framework where specialized AI agents collaborate to detect, diagnose, and remediate network incidents without human intervention. We describe the architectural principles, including hierarchical agent decomposition, skills-based tool invocation via standardized protocols, structured knowledge encoding from operational runbooks, progressive autonomy with safety boundaries, and closed-loop verification. The architecture has been deployed in production at a major cloud provider, demonstrating that agentic AI systems can achieve autonomous resolution rates exceeding 90% for common incident categories while maintaining safety guarantees through layered authorization and rollback mechanisms. We discuss design tradeoffs, failure modes, and lessons learned from operating autonomous AI agents at scale.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09122v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <category>cs.ET</category>
      <category>cs.MA</category>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Arun Malik</dc:creator>
    </item>
    <item>
      <title>An Enhanced Geometric-Spectral Feature Learning Framework for Airborne Multispectral Point Cloud Classification</title>
      <link>https://arxiv.org/abs/2606.09123</link>
      <description>arXiv:2606.09123v1 Announce Type: new 
Abstract: Multispectral point cloud (MPC) is composed of 3D spatial-spectral information, which holds tremendous potential for accurate land-cover classification. However, the representation power of classification models is limited by inherent high-dimensional and heterogeneous spatial-spectral information, unbalanced sample distribution, and inter-class spectral similarity of airborne MPCs. We build two MPC datasets and propose an enhanced geometric-spectral feature learning framework based on attentions for airborne MPC classification. A key component in our model is a two-stream feature fusion method with attention mechanisms, which enhances the representation capability of spatial-spectral features from high-dimensional heterogeneous MPCs. The first stream aims to extract position-encoded global spectral features with fusion self-attention, and the second stream comprises a multikernel point convolution and feature aggregation attention to extract spectral-guided geometric features. We then develop a residual attention fusion block to integrate the most informative geometric-spectral features from the two parallel streams. Another important contribution of this work is a joint loss function to improve the learning ability on unbalanced and interclass similar samples. Experimental results on two airborne MPC datasets demonstrate the effectiveness of the proposed method compared with the state-of-the-art methods. Furthermore, the codes and datasets used in this paper will be made available freely at https://github.com/HITlixian/TGRS_GSFF.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09123v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xian Li, Yanfeng Gu, Aleksandra Pižurica</dc:creator>
    </item>
    <item>
      <title>A Regret Minimization Framework on Preference Learning in Large Language Models</title>
      <link>https://arxiv.org/abs/2606.09124</link>
      <description>arXiv:2606.09124v1 Announce Type: new 
Abstract: Reinforcement learning with verifiable rewards (RLVR) has enabled progress on reasoning-intensive tasks by relying on task-specific verifiers that provide automated correctness signals. However, many realistic language tasks are difficult to equip with reliable verifiers, motivating a growing reliance on reinforcement learning from human feedback (RLHF). In this setting, we argue that a closer examination of how human feedback should be interpreted is essential. We introduce Regret-based Preference Optimization $(\textbf{RePO})$, which reframes RLHF through $\textit{regret minimization}$ rather than reward maximization. Human preferences are often shaped by $\textit{prospective}$ anticipation of outcomes and $\textit{counterfactual}$ comparisons to alternative behaviors, rather than by immediate, outcome-independent utility. $\textbf{RePO}$ captures this structure by modeling preferences as behavior-conditioned assessments of relative suboptimality. Experiments on mathematical reasoning benchmarks and human preference datasets demonstrate consistent performance gains, indicating that $\textbf{RePO}$ is an effective and human-aligned approach for training large language models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09124v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Suhwan Kim, Taehyun Cho, Geon-Hyeong Kim, Yu Jin Kim, Youngsoo Jang, Moontae Lee, Jungwoo Lee</dc:creator>
    </item>
    <item>
      <title>Unveiling Privacy Risks in Multi-modal Large Language Models: Task-specific Vulnerabilities and Mitigation Challenges</title>
      <link>https://arxiv.org/abs/2606.09125</link>
      <description>arXiv:2606.09125v1 Announce Type: new 
Abstract: Privacy risks in text-only Large Language Models (LLMs) are well studied, particularly their tendency to memorize and leak sensitive information. However, Multi-modal Large Language Models (MLLMs), which process both text and images, introduce unique privacy challenges that remain underexplored. Compared to text-only models, MLLMs can extract and expose sensitive information embedded in images, posing new privacy risks. We reveal that some MLLMs are susceptible to privacy breaches, leaking sensitive data embedded in images or stored in memory. Specifically, in this paper, we (1) introduce MM-Privacy, a comprehensive dataset designed to assess privacy risks across various multi-modal tasks and scenarios, where we define Disclosure Risks and Retention Risks, (2) systematically evaluate different MLLMs using MM-Privacy and demonstrate how models leak sensitive data across various tasks, and (3) provide additional insights into the role of task inconsistency in privacy risks, emphasizing the urgent need for mitigation strategies. Our findings highlight privacy concerns in MLLMs, underscoring the necessity of safeguards to prevent data exposure. Our dataset and code can be found here.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09125v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tiejin Chen, Pingzhi Li, Kaixiong Zhou, Tianlong Chen, Hua Wei</dc:creator>
    </item>
    <item>
      <title>Semantic and Task-Oriented V2X Communications: Pushing the Limits of V2X Networks Scalability</title>
      <link>https://arxiv.org/abs/2606.09126</link>
      <description>arXiv:2606.09126v1 Announce Type: new 
Abstract: Scalable Vehicle-to-Everything (V2X) networks are key to supporting the large-scale deployment of connected and automated mobility. However, the scalability of V2X networks is currently challenged by the limitations of existing V2X communication paradigms, which prioritize the reliable and timely delivery of the transmitted information over a careful message content selection - an approach that can potentially lead to the transmission of unnecessary information and an inefficient usage of communication resources. Semantic and task-oriented V2X communications have recently been proposed to address these scalability challenges by focusing on the content of the transmitted messages, particularly on its relevance to the intended receivers. In this paper, we numerically demonstrate that semantic and task-oriented V2X communications can substantially improve the scalability of V2X networks, increasing the number of supported vehicles by up to a factor of 4.1 under high-density conditions. In addition, we show that semantic and task-oriented V2X communications can also decrease the inter-reception time between consecutive messages by up to 67% and lead to a twofold increase in the probability of successfully delivering all required relevant information to the intended receivers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09126v1</guid>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Luca Lusvarghi, Javier Gozalvez, Mohammad Irfan Khan, Seyhan Ucar, Miguel Sepulcre, Onur Altintas</dc:creator>
    </item>
    <item>
      <title>OpenOpt: An Open-Source SRAM Optimizer Based on Equivalent Circuit Model</title>
      <link>https://arxiv.org/abs/2606.09129</link>
      <description>arXiv:2606.09129v1 Announce Type: new 
Abstract: This paper proposes a co-optimization framework that jointly optimizes SRAM architecture and transistor sizing using equivalent circuit models. The framework simplifies inactive SRAM cells into equivalent RC loads and static power models, achieving up to 61.4$\times$ simulation speedup while maintaining high fidelity (read/write delay error $&lt;$0.22%, power error $&lt;$1.68%). A joint search space encompassing architecture parameters and device sizing integrates seven algorithms including SA, PSO, Bayesian Optimization variants, and multi-objective evolutionary algorithms. Based on FreePDK45, ablation experiments confirm complementary gains from architecture selection and transistor sizing. Among all algorithms, MOEA/D achieves the best Figure of Merit (8.2721), yielding 6.2% improvement in SNM, 73.6% reduction in area, and 42.3% reduction in peak power. The framework is publicly available at https://github.com/W1Y1K1/OpenOpt.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09129v1</guid>
      <category>cs.NE</category>
      <category>cs.AR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yikai Wang, Yiheng Wu, Can Wang, Bohao Liu, Junhao Ma, Zhuohua Liu, Qinxin Mei, Shan Shen</dc:creator>
    </item>
    <item>
      <title>Late-Layer Fusion is Enough: Dual-Path Vision Token Routing for Multimodal Large Language Models under Visual Saturation</title>
      <link>https://arxiv.org/abs/2606.09131</link>
      <description>arXiv:2606.09131v1 Announce Type: new 
Abstract: Multimodal large language models (MLLMs) commonly inherit the deep, symmetric Transformer backbone designed for unimodal text modeling, and apply the same computation uniformly to image and language tokens. This design overlooks a key modality asymmetry: image and text tokens differ substantially in information density, redundancy, and required reasoning depth. Through a layer-wise analysis of LLaVA-1.5, we observe that vision tokens tend to saturate in the middle layers. Specifically, text-to-image attention decreases from 0.68 at layer 0 to 0.07 by layer 4, and stabilizes near 0.04 after layer 18, whereas text tokens continue to benefit from deep semantic processing. These findings suggest a mismatch between architectural symmetry and depth-asynchronous modality evolution, resulting in redundant visual computation and possible drift in perceptual representations during deep task-specific adaptation. Motivated by this, we propose Dual-Path Vision Token Routing (DPVR), a modality-asymmetric routing framework for efficient MLLMs. Its core instantiation, DPVR-LF (Late-Layer Fusion), routes vision tokens at the saturation point into a one-layer trainable side branch, runs a thirteen-layer text-only forward that skips image positions in the deep stack, and re-fuses the visual and textual streams only at the final layer. With approximately 3% trainable parameters, DPVR-LF preserves competitive multimodal performance on standard benchmarks while reducing visual computation in the deep Transformer stack. The results challenge the conventional assumption that vision tokens must traverse all deep language-model layers, and indicate that a single late fusion layer can be sufficient for maintaining strong perceptual competence in LLaVA-style MLLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09131v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Siyuan Liu, Jinyang Wu</dc:creator>
    </item>
    <item>
      <title>Vision Language Model Helps Private Information De-Identification in Vision Data</title>
      <link>https://arxiv.org/abs/2606.09132</link>
      <description>arXiv:2606.09132v1 Announce Type: new 
Abstract: Visual Language Models (VLMs) have gained significant popularity due to their remarkable abilities. While various methods exist to enhance privacy in text-based applications, privacy risks associated with visual inputs, such as Protected Health Information (PHI) in medical images, remain largely overlooked. Tackling this problem requires two key tasks: accurately localizing sensitive text and processing it to ensure privacy protection. To address this issue, we introduce VisShield (Vision Privacy Shield), an end-to-end framework designed to enhance the privacy awareness of VLMs. Our framework consists of two key components: a specialized instruction-tuning dataset OPTIC (Optical Privacy Text Instruction Collection) and a tailored training methodology. The dataset provides diverse privacy-oriented prompts that guide VLMs to perform targeted Optical Character Recognition (OCR) for precise localization of sensitive text, while the training strategy ensures effective adaptation of VLMs to privacy-preserving tasks. Specifically, our approach ensures that VLMs recognize privacy-sensitive text and output precise bounding boxes for detected entities, allowing for effective masking of sensitive information. Extensive experiments demonstrate that our framework significantly outperforms existing approaches in handling private information, paving the way for privacy-preserving applications in vision-language models. Our dataset and code can be found here.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09132v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tiejin Chen, Pingzhi Li, Kaixiong Zhou, Tianlong Chen, Hua Wei</dc:creator>
    </item>
    <item>
      <title>Multiversion Concurrency Control for Multiversion B-Trees</title>
      <link>https://arxiv.org/abs/2606.09133</link>
      <description>arXiv:2606.09133v1 Announce Type: new 
Abstract: Multiversion concurrency control (MVCC) enables scans to read data from a committed snapshot (version), reducing conflicts with write operations compared to traditional concurrency approaches. Currently, versioned records are often managed in a B$^+$-tree using version chains. However, version chains introduce overhead during scans and can still lead to conflicts between scans and writers. The multiversion B-tree (MVBT) was designed for optimal range scan performance on arbitrary versions, but has been considered impractical due to its structural complexity and, until recently, the lack of effective concurrency control. In this paper, we present the concurrent MVBT (cMVBT), a redesign of the MVBT featuring a novel concurrency control protocol that uses optimistic latches for write operations and requires no latches for range scans, while preserving all the optimality guarantees of the original MVBT. Additionally, cMVBT supports continuous garbage collection without activity spikes, seamlessly integrating free-space management. Experiments with mixed workloads derived from a standard benchmark show that the cMVBT achieves low overhead, high write throughput, and excellent range scan performance, outperforming state-of-the-art methods based on version chains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09133v1</guid>
      <category>cs.DB</category>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Amir Tonta, Bernhard Seeger, Eljas Soisalon-Soininen</dc:creator>
    </item>
    <item>
      <title>From USD Scenes to Knowledge Graphs: Zero-Shot Ontology Grounding with LLMs</title>
      <link>https://arxiv.org/abs/2606.09134</link>
      <description>arXiv:2606.09134v1 Announce Type: new 
Abstract: Constructing knowledge graphs from 3D simulation scenes is essential for robot task reasoning, but the key bottleneck, grounding scene objects to formal ontology classes, still relies on manually curated dictionaries that are brittle and do not generalize across assets. We investigate whether large language models (LLMs) can automate this grounding step for Universal Scene Description (USD) scenes as a zero-shot, training-free alternative. On a kitchen scene (125 objects) with SOMA-HOME Ontology, LLMs achieve 90-96% exact-match accuracy with descriptive names and 49-89% with abbreviated names, substantially outperforming dictionary and embedding baselines. Under fully opaque names, context-augmented prompting recovers up to 48%. Feature ablation reveals that LLMs primarily exploit semantic cues in the scene graph (sibling names and parent paths); anonymizing these cues reduces accuracy to 0-6%, while geometry alone yields only 4-17%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09134v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.CV</category>
      <category>cs.GR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiangtao Shuai, Zongxiong Chen, Manfred Hauswirth, Sonja Schimmler</dc:creator>
    </item>
    <item>
      <title>Steganography Without Modification: Hidden Communication via LLM Seeds</title>
      <link>https://arxiv.org/abs/2606.09135</link>
      <description>arXiv:2606.09135v1 Announce Type: new 
Abstract: We demonstrate that widely deployed Large Language Model (LLM) inference stacks harbor a steganographic channel that requires no modification to model weights, sampling code, or output distributions. The channel exploits a structural property of deterministic decoding: pseudo-random number generators (PRNGs) used in inverse-transform sampling produce a seed-dependent sequence of token-level probability intervals that can be reconstructed from the generated text alone. A sender encodes a secret message in the PRNG seed before generation; a receiver reconstructs the intervals and recovers the seed, and thus the hidden payload, by exhaustive search over the seed space.
  We formalize two operational modes. In the known-prompt setting, sender and receiver share the prompt, enabling exact interval reconstruction and perfect seed recovery via forced alignment. In the unknown-prompt setting, only the generated text is available; approximate interval reconstruction combined with a maximum-hit-count scoring strategy still permits reliable recovery from sufficiently long outputs.
  Extensive experiments across six model families and five heterogeneous text domains show that, in the known-prompt setting, full 32-bit seed recovery from the complete 2^32 candidate space achieves up to 100% accuracy, depending on model and text domain, within 300 tokens and under 35 seconds on a single GPU. In the unknown-prompt setting, recovery reaches near-perfect accuracy at 600-800 tokens in about 12 seconds. We further analyze the influence of prompting strategies, tokenization ambiguities, and sampling hyperparameters on channel reliability. Moreover, we discuss several applications of our results: the channel allows for the steganographic transmission of 32 bits, and it also shows that ignorance of the prompt is not a valid security assumption.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09135v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Felix Mächtle, Jonas Sander, Sebastian Berndt, Ben Weimar, Nils Loose, Thomas Eisenbarth</dc:creator>
    </item>
    <item>
      <title>Modeling of Spinning Plates: Geometric Stiffening and Modal Approximation for GNC Applications</title>
      <link>https://arxiv.org/abs/2606.09137</link>
      <description>arXiv:2606.09137v1 Announce Type: new 
Abstract: This work presents a modal formulation for flexible rectangular plates, accounting for nonlinear geometric effects arising from in-plane foreshortening and centrifugal stiffening. The model is linearized with respect to elastic deformations while retaining the full dependence on spacecraft angular velocities and accelerations. System matrices depend nonlinearly on spacecraft states through squared and cross-product terms, capturing gyroscopic coupling and dynamic stiffening phenomena for arbitrary rotational maneuvers. Polynomial approximation of mode shapes enables efficient computation while preserving accuracy. Model predictions are validated against finite element simulations and literature data for transient response under prescribed hub motion.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09137v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Umberto Zucchelli, Irene Valles Sánchez, Francesco Sanfedino</dc:creator>
    </item>
    <item>
      <title>Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2606.09138</link>
      <description>arXiv:2606.09138v1 Announce Type: new 
Abstract: Agentic reinforcement learning (RL) has become an important post-training paradigm for turning LLMs from static chatbots into interactive agents, giving rise to representative applications such as OpenClaw. Existing work mainly focuses on policy optimization algorithms and training frameworks, but pays less attention to the full data lifecycle of agent-environment interactions, from data production to training consumption. To bridge this gap, we present Claw-R1, an interactive step-level data middleware system for agentic RL. Claw-R1 connects heterogeneous agent runtimes with RL training backends through two core components: a Gateway Server and a Data Pool. The Gateway Server captures multi-turn interaction steps through a unified LLM API entry point, while the Data Pool organizes them into step-level records consisting of prompt IDs, response IDs, rewards and other metadata. In our demo, users can interactively inspect live trajectories, examine the state, action, and reward of each step, curate data by quality and readiness, and configure training-ready batches for different downstream RL algorithms. Overall, Claw-R1 treats agent interaction traces as managed data assets rather than temporary runtime logs. Through this demonstration, we hope to encourage the community to recognize the importance of data management in agentic RL. Our code is available at https://github.com/AgentR1/Claw-R1 and the demonstration video can be found at https://youtu.be/Pw47dAOw6B0.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09138v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Daoyu Wang, Mingyue Cheng, Qingchuan Li, Shuo Yu, Jie Ouyang, Qi Liu</dc:creator>
    </item>
    <item>
      <title>A Geometric Framework for Absolute Pose and Velocity Estimation with Event Cameras</title>
      <link>https://arxiv.org/abs/2606.09139</link>
      <description>arXiv:2606.09139v1 Announce Type: new 
Abstract: Despite the rapid advancements in event-based motion estimation, current geometric methods primarily focus on velocity estimation. However, absolute pose estimation, which is equally crucial for key applications such as robotic navigation and augmented reality, remains relatively underexplored. Consequently, the simultaneous recovery of absolute pose and velocity from event streams remains an open and challenging problem. To address this gap, we propose a geometric framework for absolute pose and velocity estimation by leveraging 3D lines in the scene and the events they trigger. At the core of the framework lie two key geometric constraints: the orthogonality between a 3D line and the normal vector of its corresponding event plane, and the collinearity of an event with the 2D projection of its associated line. Based on these constraints, we present both linear and polynomial solvers for absolute pose estimation. The former enables efficient computation, while the latter provides a globally optimal solution for rotation. For velocity estimation, we develop an efficient linear solver and a more accurate optimization-based solver to recover both angular and linear velocities. Notably, our methods require a minimum of three event-line correspondences to determine the 6-DoF absolute pose or velocities independently. Extensive experiments in simulation and on real-world datasets demonstrate that our methods achieve state-of-the-art performance, with significant improvements in accuracy and computational efficiency compared to existing methods. The demo code is publicly available at https://github.com/Zibin6/EventPoseVelocity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09139v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zibin Liu, Shunkun Liang, Banglei Guan, Yang Shang, Qifeng Yu, Ji Zhao</dc:creator>
    </item>
    <item>
      <title>DiffSight-Former: Modeling Structural Differences and Temporal Dynamics for Glaucoma Progression Prediction</title>
      <link>https://arxiv.org/abs/2606.09140</link>
      <description>arXiv:2606.09140v1 Announce Type: new 
Abstract: Glaucoma is a leading cause of irreversible blindness worldwide, and early detection from fundus images is critical for effective disease management. While deep learning has achieved promising performance in fundus image analysis, most existing methods rely on single time-point images and fail to capture longitudinal structural and vascular changes associated with disease progression. Sequential fundus images acquired during clinical follow-up provide valuable temporal information; however, current sequential models often struggle to detect subtle early progression signals and commonly depend on fixed-length inputs or diagnostic cues from already glaucomatous images, limiting their clinical utility for early prediction. To address these limitations, we propose DiffSight-Former, a framework for glaucoma progression prediction from sequential fundus images. It incorporates a time-variant feature extraction module based on a fundus-specific foundation model to obtain robust anatomical representations. A multi-structure difference modeling module is introduced to quantify progression-related changes in the optic disc/cup region and retinal vasculature. These representations are integrated with temporal interval embeddings and processed by a time-aware Transformer to model disease progression and estimate the probability of future glaucoma onset. Experiments were conducted on two longitudinal datasets, SIGF (405 sequences) and GRAPE (263 sequences). On SIGF, DiffSight-Former achieved an AUC of 91.54% and a sensitivity of 92.16% for progression prediction. On GRAPE, it achieved an average accuracy of 87.48% across three clinical visual-field progression criteria. Compared with existing approaches, DiffSight-Former demonstrates strong performance and robustness across different temporal settings, highlighting its potential for longitudinal glaucoma monitoring and early risk prediction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09140v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yi Huang, Lei Bi, Jinman Kim</dc:creator>
    </item>
    <item>
      <title>Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models</title>
      <link>https://arxiv.org/abs/2606.09142</link>
      <description>arXiv:2606.09142v1 Announce Type: new 
Abstract: Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored. In this work, we study the decoding of pedestrian crossing intentions from short egocentric video clips. We approach this by formulating the task as a closed-ended visual question answering (VQA) problem and leveraging vision language models (VLMs) to predict the pedestrians' intent. We first benchmark three families of state-of-the-art VLMs in a zero-shot setting, finding that they achieve moderate gains over random guessing but exhibit limited higher-level traffic reasoning. Motivated by these findings, we further adapt VLMs to the target task using parameter-efficient fine-tuning. Our results show that the fine-tuned models substantially outperform their zero-shot counterparts and achieve a 9% accuracy improvement over a specialized transformer-based baseline. Finally, we demonstrate that incorporating additional contextual cues, including ego motion, vehicle motion, and eye gaze, further improves predictive performance. In particular, the fine-tuned Qwen3-VL-2B model guided by eye gaze and ego motion achieves a 14.5% accuracy improvement over the transformer baseline, establishing a new state of the art for egocentric pedestrian intent decoding.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09142v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Danya Li, Xiang Su, Yan Feng, Rico Krueger</dc:creator>
    </item>
    <item>
      <title>CAMF-Det: Closure-Aware Multimodal Fusion for LiDAR-Camera 3D Object Detection on UAV Platforms</title>
      <link>https://arxiv.org/abs/2606.09143</link>
      <description>arXiv:2606.09143v1 Announce Type: new 
Abstract: Multimodal 3D object detection based on LiDAR and cameras has demonstrated excellent performance in ground-vehicle scenarios, but has not been explored for Unmanned Aerial Vehicle (UAV) platforms. In UAV top-down scenes, frequent ground-object occlusion dominated by tree canopies causes spatially varying and modality-dependent information degradation. Existing multimodal fusion frameworks neither explicitly model such ground-object occlusion nor embed occlusion awareness into the detection pipeline, limiting their performance in occluded UAV scenes. To address these challenges, we propose CAMF-Det, a closure-aware multimodal fusion framework for LiDAR-camera 3D object detection on UAV platforms, which derives dual-modal occlusion intensity through physics-inspired modeling and embeds them as priors throughout the detection pipeline. First, a dual-modal closure modeling module explicitly constructs occlusion intensity ground truth for both modalities offline via a Beer-Lambert-inspired formulation and building-mask correction. Second, using these ground-truth maps as supervision, a dual-modal prediction network converts the offline modeling results into online occlusion intensity predictions under single-frame inference. Third, both ground-truth and predicted occlusion intensity are injected into data augmentation, feature encoding, multimodal fusion, and detection head, enabling adaptive detection under spatially varying and modality-dependent information degradation. Experiments on two self-built UAV-based multimodal datasets, SI3D-DI and SI3D-DII, demonstrate that CAMF-Det achieves the best performance across all difficulty levels, with hard-level mAP$_{\mathrm{BEV}}$ improvements of 9.43% and 4.88% over the best competing methods, respectively. These results confirm the effectiveness of explicit occlusion prior modeling and exploitation for robust multimodal 3D detection in UAV scenes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09143v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yanze Jiang, Yanfeng Gu, Xian Li</dc:creator>
    </item>
    <item>
      <title>Containerizing BIDSme: A Reproducible Tool for BIDS Conversion</title>
      <link>https://arxiv.org/abs/2606.09144</link>
      <description>arXiv:2606.09144v1 Announce Type: new 
Abstract: The "Brain Imaging Data Structure" (BIDS) has become a widely adopted standard for organizing and sharing neuroimaging datasets of various modalities. However, converting raw brain imaging data into the BIDS framework remains a complex and time-consuming task. BIDSme is a semi-automated tool developed to streamline this conversion process, but until recently, it lacked the portability and accessibility needed for widespread adoption. This paper presents the containerization of BIDSme using Docker and Docker Compose, improving usability, reproducibility, and integration into existing platforms like Neurodesk. It also details the design choices, iterative refinements, and validation process that led to a flexible, lightweight, and user-friendly containerized application.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09144v1</guid>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bradley Spitz, Antoine Jacquemin, Nikita Beliy, Christophe Phillips</dc:creator>
    </item>
    <item>
      <title>PrivCode++: Latent-Conditioned Differentially Private Code Generation for Comprehensive Guarantees</title>
      <link>https://arxiv.org/abs/2606.09145</link>
      <description>arXiv:2606.09145v1 Announce Type: new 
Abstract: Large language models fine-tuned on instruction-code pairs may memorize and subsequently leak sensitive training data. Existing differentially private (DP) code generation methods primarily protect code snippets while assuming prompts are public, which fails in realistic scenarios where prompts may also contain sensitive information. When prompts cannot be explicitly learned or used during generation, code synthesis suffers from severe utility degradation as well as reduced diversity and fidelity. To address these challenges, we propose PrivCode-Plus, the first work to explore DP code generation where both prompts and code snippets are considered sensitive in LLM fine-tuning. PrivCode-Plus introduces a two-stage DP framework with a Privacy-Free Latent Conditioning module, enabling effective DP fine-tuning and data synthesis without direct access to sensitive prompts or code. Extensive experiments show that PrivCode-Plus achieves substantially higher utility than baselines, remains competitive with the method that relaxes privacy assumptions, and provides stronger privacy guarantees.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09145v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zheng Liu, Chen Gong, Terry Yue Zhuo, Zhou Yang, Kecen Li, Wenlong Meng, Xinwen Hou, Yu Liu, Xiaochen Li</dc:creator>
    </item>
    <item>
      <title>Explicit Representation Alignment for Multimodal Sentiment Analysis</title>
      <link>https://arxiv.org/abs/2606.09148</link>
      <description>arXiv:2606.09148v1 Announce Type: new 
Abstract: Multimodal affective analysis aims to understand human sentiment and emotion by jointly modeling heterogeneous modalities such as text and images. However, multimodal models often fail to consistently outperform strong text-only baselines, with performance varying significantly across fusion strategies. In this work, we identify representation misalignment between independently pretrained modality encoders as a key bottleneck for effective multimodal learning, and show through controlled experiments that alignment prior to fusion is often more important than fusion complexity. To address this issue, we propose a unified multimodal affective analysis framework that leverages vision-language models (VLMs) to convert visual content into structured textual descriptions, projecting heterogeneous modalities into a shared linguistic space and enabling interpretable text-centric reasoning. To further improve robustness, we introduce a hybrid learning strategy that combines semantic token selection with a batch-level uniformity regularization objective, encouraging a more dispersed and stable global feature space while mitigating noise introduced by VLM-generated descriptions. Experiments on multiple multimodal sentiment and emotion benchmarks show that our method consistently outperforms strong unimodal and multimodal baselines, achieving state-of-the-art performance. Our analysis further highlights the critical role of representation alignment in multimodal affective learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09148v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Baode Wang, Ziming Wang, Huacan Wang, Ronghao Chen, Biao Wu</dc:creator>
    </item>
    <item>
      <title>Ultra Flash: Scaling Real-Time Streaming Video Generation to High Resolutions</title>
      <link>https://arxiv.org/abs/2606.09150</link>
      <description>arXiv:2606.09150v1 Announce Type: new 
Abstract: While recent autoregressive video diffusion models achieve remarkable streaming quality, they remain confined to low resolutions (e.g., 480P), leaving efficient, scalable, real-time high-resolution video generation a fundamental open challenge. To bridge this gap, we present Ultra Flash, a cascaded streaming framework capable of real-time high-resolution video generation. Ultra Flash achieves ~30 FPS at 1K resolution and ~18 FPS at 2K resolution on a single GPU through three key contributions: (1) an architecture-preserving T2V-to-TV2V super-resolution training paradigm coupled with an AIGC-oriented data degradation pipeline that effectively preserves the generative capability of the base model, enabling enhanced high-resolution detail when cascaded after mainstream low-resolution generative models; (2) a causal streaming latent upsampler paired with a high-resolution decoder, which enhances spatiotemporal coherence while enabling efficient latent spatial scaling and precise high-resolution decoding with negligible computational overhead; and (3) a cascade high-resolution streaming video generation optimization scheme that first performs hybrid-reward-enhanced sparse causalization and single-step distillation of the super-resolution model, then introduces cascaded streaming self-forcing preference optimization with dynamic cache management, jointly enhancing overall coherence, improving quality, and enabling real-time high-resolution streaming video generation. Extensive experiments demonstrate that Ultra Flash reliably produces ultra-high-resolution streaming video while maintaining state-of-the-art visual quality and superior efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09150v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator> Luxury, Jie Huang, Zihao Fan, Xiaoxiao Ma, Yuming Li, Jun-hao Zhuang, Zeyue Xue, Siming Fu, Haoran Li, Mingchen Zhong, Guohui Zhang, Shichen Ma, Yijun Liu, Jiaqi Shi, Yanwen Ma, Yaofeng Su, Haoyu Wang, Yaowei Li, Songchun Zhang, Weiyang Jin, Yuxuan Bian, Shiyi Zhang, Haojun Xu, Shuai Lu, Xin Han, Wei Tang, Haoyang Huang, Nan Duan</dc:creator>
    </item>
    <item>
      <title>Customization under Fire: Plugin Poisoning in Text-to-Image Ecosystem</title>
      <link>https://arxiv.org/abs/2606.09151</link>
      <description>arXiv:2606.09151v1 Announce Type: new 
Abstract: The prosperity of text-to-image (T2I) models has fostered a vibrant share-and-play ecosystem centered on Low-Rank Adaptation (LoRA) plugins, which allow users to customize and share model capabilities with ease. This democratization, however, comes with a hidden but severe security risk. Malicious users could share and distribute seemingly benign LoRA plugins that contain hidden functionalities to poison model-sharing markets such as Civitai or Liblib, severely undermining the user trust that underpins this collaborative ecosystem and threatening the safety of countless downstream applications. Despite these risks, plugin poisoning in the real-world T2I ecosystem remains underexplored. This paper introduces PoisonLoRA, the first systematic study of LoRA plugin supply-chain risks that exploits the trust and characteristics within the T2I ecosystem. We identify two primary attack instances: (1) Concept Hijacking, where a hijacked LoRA could generate images to influence public opinion and spread propaganda, and (2) Task Injection, where a LoRA is injected to produce harmful content (e.g., NSFW images) only activated by a secret key. Critically, the malicious payload persists with virus-like propagation. Such propagation weaponizes the very act of creative collaboration (e.g., LoRA merging) to spread its contagion, turning every remix into a new carrier. Extensive experiments validate that PoisonLoRA is both effective and stealthy. Specifically, we achieve approximately 100% attack success rates (ASR) on both Civitai and Liblib on 6 datasets across 4 scenarios, without being detected by the platforms. The poisoned LoRA demonstrates extreme robustness, with nearly 100% ASR even when transferred to different base models and remixed more than 5 times.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09151v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiahao Chen, Xing He, Yong Yang, Xinfeng Li, Chunyi Zhou, Junhao Li, Zhe Ma, Tianyu Du, Shouling Ji</dc:creator>
    </item>
    <item>
      <title>Improved Convergence Analysis of Topology Dependence in Decentralized SGD</title>
      <link>https://arxiv.org/abs/2606.09154</link>
      <description>arXiv:2606.09154v1 Announce Type: new 
Abstract: Decentralized SGD is a fundamental algorithm in decentralized learning, although the influence of an underlying network topology on its convergence behavior is not yet fully understood. Existing convergence analyses have shown that topologies with a small spectral gap significantly deteriorate the convergence rate of Decentralized SGD in both homogeneous and heterogeneous cases. However, many prior papers have reported that the choice of topology indeed has a significant experimental impact in the heterogeneous case, but has little experimental impact on training behavior in the homogeneous case. In this paper, we present a tighter convergence analysis of Decentralized SGD, offering a more precise understanding of how topologies affect the convergence rate than the prior analysis. Specifically, unlike existing convergence analyses that used only the spectral gap as a property of the topology, our novel analysis shows that all eigenvalues of the mixing matrix affect the convergence rate. In our experiments, we carefully evaluated the convergence behavior of Decentralized SGD and demonstrated that our novel convergence analysis can more accurately describe the effect of topology on the convergence rate.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09154v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuki Takezawa, Anastasia Koloskova, Sebastian U. Stich</dc:creator>
    </item>
    <item>
      <title>Bridged SBI: Correcting Biased Low-Fidelity Posteriors for Cost-Efficient High-Fidelity Inference</title>
      <link>https://arxiv.org/abs/2606.09155</link>
      <description>arXiv:2606.09155v1 Announce Type: new 
Abstract: Accurate calibration of particle-based simulators is crucial for robotic earthwork simulation, but analytical calibration is challenging due to this task's highly nonlinear particle dynamics and the black-box nature of conventional simulators. Although simulation-based inference (SBI) can estimate posterior distributions over simulation parameters solely from forward simulations, applying SBI directly to high-fidelity (HF) particle simulators is often computationally prohibitive. Low-fidelity (LF) simulators with coarser particles can reduce this cost, but changes in particle size and particle count shift the parameter values needed to reproduce the same observation, producing biased LF posteriors. We propose Bridged SBI, which leverages a biased but informative LF posterior to guide HF inference. This method first uses inexpensive LF simulations to identify a coarse high-density parameter region, and then it learns a local residual bridge to transport LF posterior samples toward HF-consistent regions by correcting the LF--HF discrepancy. We analyze how sequential multi-fidelity SBI (Naive-MF) can suffer from LF-induced posterior miscoverage when it directly relies on the LF posterior without discrepancy correction. We then show that Bridged SBI is designed to alleviate this issue by explicitly modeling the LF--HF discrepancy through residual correction. Experiments on both sim-to-sim particle-parameter calibration and real-to-sim calibration with real soil observation show that Bridged SBI produces more accurate and reliable HF posteriors than HF-only SBI or the Naive-MF baseline, especially under limited HF simulation costs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09155v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Gahee Kim, Yuki Kadokawa, Sandro M. Alcantara Tacora, Taro Abe, Daisuke Endo, Genki Yamauchi, Takeshi Hashimoto, Takamitsu Matsubara</dc:creator>
    </item>
    <item>
      <title>OmniGen-AR: AutoRegressive Any-to-Image Generation</title>
      <link>https://arxiv.org/abs/2606.09156</link>
      <description>arXiv:2606.09156v1 Announce Type: new 
Abstract: Autoregressive (AR) models have demonstrated strong potential in visual generation, offering superior performance with simple architectures and optimization objectives. However, existing methods are typically limited to single-modality conditions, e.g., text, restricting their applicability in real-world scenarios that demand image synthesis from diverse controls. In this work, we present OmniGen-AR, a unified autoregressive framework for Any-to-Image generation. By discretizing various visual conditions through a shared visual tokenizer and text prompts with a text tokenizer, OmniGen-AR supports a broad spectrum of conditional inputs within a single model, including text (text-to-image generation), spatial signals (segmentation-to-image and depth-to-image), and visual context (image editing, frame prediction, and text-to-video generation). To mitigate the risk of information leakage from condition tokens to content tokens, we introduce Disentangled Causal Attention (DCA), which separates the full-sequence causal mask into condition causal attention and content causal attention. It serves as a training-time regularizer without affecting the standard next-token prediction during inference. With this design, OmniGen-AR achieves new state-of-the-art or at least competitive results across a range of benchmarks, e.g., 0.63 on GenEval and 80.02 on VBench, demonstrating its effectiveness in flexible and high-fidelity visual generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09156v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Junke Wang, Xun Wang, Qiushan Guo, Peize Sun, Weilin Huang, Zuxuan Wu, Yu-Gang Jiang</dc:creator>
    </item>
    <item>
      <title>SEF-CLGC at SemEval-2026 Task 11: Logical Notation Impact on Language Model Performance</title>
      <link>https://arxiv.org/abs/2606.09157</link>
      <description>arXiv:2606.09157v1 Announce Type: new 
Abstract: This paper revisits our pipeline called Syllogistic Evaluation Framework-Common Logic Grammar Construction (SEF-CLGC). We combine formal logical notations with Small Language Models (SLMs) to evaluate reasoning performance on the SemEval-2026 Task 11 Subtask 1: Disentangling Content and Formal Reasoning in Large Language Models. Our experiments show that by relying solely on SLMs, trained on a combination of natural and symbolic languages, our best model achieves a content score of 27.80% on the task while significantly lowering the content bias in reasoning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09157v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hanna Abi Akl, Fabien Gandon, Catherine Faron, Pierre Monnin</dc:creator>
    </item>
    <item>
      <title>Unified Energy for Invariant and Independent Decoding in Diffusion Language Models</title>
      <link>https://arxiv.org/abs/2606.09159</link>
      <description>arXiv:2606.09159v1 Announce Type: new 
Abstract: Diffusion Language Models (DLMs) enable parallel text generation by iteratively denoising a full sequence, offering attractive flexibility compared to auto-regressive (AR) decoding. However, existing methods fail to fully capture token relationships, leading to a performance gap relative to AR baselines, especially as the degree of parallelism increases. In this paper, we give a systematic analysis of the gap, identifying three key factors: (i) model capacity, (ii) dependency, and (iii) invariance. To address these issues, we first propose an invariant energy (Inv-E) together with an effective sampling-based estimator to handle the invariance issue. By further combining with the independent energy (Ind-E), we obtain a unified energy (Uni-E) that accounts for all these factors. Uni-E enjoys a unique advantage: it can be computed exactly without sampling-based partition estimation. Besides, Uni-E is model-agnostic and can therefore be scaled to models of arbitrary size. We further prove that Uni-E can correct the distribution shift caused by dependency and invariance. Extensive experiments across Diffusion Language Models (DLMs) and Diffusion Large Language Models (DLLMs) demonstrate the effectiveness of the proposed Uni-E.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09159v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuchen Yan, Minkai Xu, Zaiquan Yang, Yatao Bian</dc:creator>
    </item>
    <item>
      <title>Crop Recommendation and Agricultural Query Answering System Using Spatio-Temporal Graph Neural Networks and Hybrid Retrieval Augmentation</title>
      <link>https://arxiv.org/abs/2606.09160</link>
      <description>arXiv:2606.09160v1 Announce Type: new 
Abstract: This paper presents a unified system designed to support precision agriculture by integrating advanced weather prediction, crop recommendation, and a question-answering tool for farmers. We propose two deep learning models -- a Transformer-based Graph Neural Network and a Spatio-Temporal Graph Convolutional Network (STGCN) -- to forecast weather conditions for the next 30 days using data from 1,359 locations in Nepal. The STGCN outperforms the Transformer-based model in accuracy (MSE ~0.011 vs. 0.013), effectively modeling both spatial and temporal dependencies in climate data. These predictions are combined with static soil properties such as pH, moisture, and organic content to generate localized crop recommendations through a scoring algorithm that matches each crop's optimal growing conditions. Additionally, we develop a Retrieval-Augmented Generation (RAG) chatbot that leverages domain-specific agricultural documents to answer farmers' questions in natural language. The entire system is deployed via a mobile application, offering real-time suggestions and conversational support. User feedback confirms the system's usability and relevance, especially in rural settings where personalized farming guidance is limited. Overall, our approach demonstrates how combining machine learning models with local agricultural data can empower farmers with actionable insights, promoting more informed decisions, better crop yields, and increased resilience to climate variability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09160v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Prajwal Thapa, Yagya Raj Pandeya</dc:creator>
    </item>
    <item>
      <title>Extreme Points of the $(0,\delta)$-LDP Polytope with Small Input Size and Arbitrary Output Sizes</title>
      <link>https://arxiv.org/abs/2606.09161</link>
      <description>arXiv:2606.09161v1 Announce Type: new 
Abstract: The structure of locally differentially private (LDP) mechanisms can be understood through the geometry of the corresponding privacy polytope. While the extreme points of the \( (\epsilon,0)\)-LDP polytope are well characterized (Kairouz \emph{et al.}, 2014; Holohan \emph{et al.}, 2017; Pensia \emph{et al.}, 2017), comparatively little is known for the \((\epsilon,\delta)\)-LDP polytope with \(\delta&gt;0\). Recent work (Elangovan and Jog, 2024) has shown that even in the special case \(\epsilon=0\), the \( (0,\delta) \)-LDP privacy polytope exhibits fundamentally different behaviour. In this work, we provide complete characterizations of the extreme points for the low-input-alphabet regime \(k=2\) and \(k=3\) and with arbitrary output alphabet size \(m \). We also identify new extreme mechanisms for larger input alphabet sizes $k$, of the star configuration type, as introduced by Elangovan and Jog (2024).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09161v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Supriya Rawat, Myna Vajha, Gowtham R. Kurri, Anand Sarwate</dc:creator>
    </item>
    <item>
      <title>Zero-Parameter Geometric Gating for Temporally Stable Low-Altitude UAV Video Semantic Segmentation</title>
      <link>https://arxiv.org/abs/2606.09162</link>
      <description>arXiv:2606.09162v1 Announce Type: new 
Abstract: Video semantic segmentation for low-altitude UAVs requires temporal consistency, yet dense optical flow introduces spatially structured noise in the planar regions that dominate aerial imagery. We propose a zero-parameter geometric gate that uses RANSAC homography inlier ratios on a $16\times16$ spatial grid to route each region to either homography or optical flow warp before fusion via Semantic Similarity Propagation. The gate requires no learned parameters -- only a median-threshold binary decision on RANSAC statistics -- adding only 211K trainable parameters (the SSP fusion layer) to a frozen backbone. On synthetic UAVid, the method achieves +4.24--4.91\% mIoU improvement over base models across two architectures (SegFormer-b2 and Hiera-S+UPerNet). Mechanism diagnostics reveal that flow residuals in planar regions are spatially autocorrelated (Moran's I = 0.32, $p &lt; 0.001$), predict boundary instability (Spearman $\rho = 0.66$), and that rigidification recovers temporal consistency from 62\% to 92\% (+29.5pp) in homography-valid regions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09162v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jingpu Yang, Fengxian Ji, Zhengzhao Lai, Juanfan Wu, Mingxuan Cui, Yufeng Wang</dc:creator>
    </item>
    <item>
      <title>EnclaveScale: Hardware-Assisted Edge-DP for Secure Data Centre Power Telemetry</title>
      <link>https://arxiv.org/abs/2606.09163</link>
      <description>arXiv:2606.09163v1 Announce Type: new 
Abstract: EnclaveScale is a distributed, hardware-assisted telemetry architecture providing post-extraction attestation, enabling operators to collaboratively model high-resolution generative AI power transients. Existing cryptographic techniques scale poorly for 10-Hz streaming or fail to authenticate origins, permitting malicious hosts to spoof sensor inputs. We implement and evaluate a post-extraction pipeline utilizing DCAP attestation, differential privacy noise injection, and Byzantine rejection across 32 GCP Confidential VMs, achieving 0\% post-extraction attack success rate. This edge-DP approach distils continuous GPU transients into discrete Markov-chain transition matrices, guaranteeing event-level differential privacy. To mitigate pre-ingestion vulnerabilities, we propose an SPDM-authenticated first-mile layer. While current platforms lack attested I/O, emerging hardware architectures integrate PCIe IDE and TDISP to natively prevent host-level synthesis, securing the end-to-end provenance boundary. A Global Aggregation Enclave verifies these cryptographic proofs prior to capacity-weighted aggregation. Evaluation demonstrates a steady-state throughput of $131{,}406$ samples/s per enclave, amortising attestation overhead to $0.23\,\mu$s/sample. On empirical NVML-sampled H100, A100, and L4 traces, EnclaveScale achieves a dynamic orchestration margin error of $1.3$\,MW compared to $0.1$\,MW for an honest-aggregator central-DP baseline. EnclaveScale establishes a secure foundation for dynamic multi-tenant power orchestration, obfuscating sub-second anomalies locally and protecting macro-workload confidentiality via spatial dilution during global aggregation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09163v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Hung Dang, Tue Nguyen, Minh Vo</dc:creator>
    </item>
    <item>
      <title>Reliable to Expressive: A Curriculum for Rubric-Following Safety Judges</title>
      <link>https://arxiv.org/abs/2606.09165</link>
      <description>arXiv:2606.09165v1 Announce Type: new 
Abstract: Safety judges are increasingly deployed to evaluate model outputs against evolving criteria, yet recent meta-evaluation work shows they remain brittle under prompt and rubric variation, with false-negative-rate swings of up to 0.24 reported for stylistic perturbations alone. We argue that safety judgment is fundamentally a rubric-following problem: a robust judge must apply the given evaluation criteria consistently across rubric formulations rather than memorize one specific template. We propose a training strategy that combines (i) instance-conditioned dynamic rubrics generated from prompt-response-label triples to expose the judge to the variability of evaluation criteria, and (ii) a reliable-to-expressive curriculum that begins with clean fixed-rubric supervision and progressively introduces noisier dynamic-rubric data. We evaluate on a single human-labeled set under three contrasting rubric prompts (HarmBench-style, ShieldGemma-style, and a domain-specific rubric). Our 12B curriculum judge achieves 94.12-94.88% accuracy across the three rubrics with a cross-rubric range of only 0.76, outperforming general-purpose LLMs, dedicated safety classifiers, and reasoning-oriented judges up to 30B in both peak accuracy and stability. An ablation shows that naively mixing dynamic rubrics into SFT increases cross-rubric variance (1.44 -&gt; 3.60); only the curriculum schedule recovers and improves on the fixed-rubric baseline (variance 0.76).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09165v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Yongtaek Lim, Hyeji Choi, Minwoo Kim</dc:creator>
    </item>
    <item>
      <title>Vision-Language Guided Hyperspectral Object Tracking via Semantics Fusion and Contextual Template Updating</title>
      <link>https://arxiv.org/abs/2606.09167</link>
      <description>arXiv:2606.09167v1 Announce Type: new 
Abstract: Hyperspectral object tracking (HOT) leverages the rich spectral information provided by hyperspectral videos (HSVs), offering substantial potential for object tracking. However, efficiently extracting and exploiting spectral information from redundant spectral bands remains a fundamental challenge, which severely limits model generalization and tracking performance. Moreover, in dynamic scenes, targets often experience drastic appearance variations due to factors such as occlusion and illumination changes. These variations lead to large deformations between the current frame and the template. Such discrepancies pose major challenges for existing temporal modeling approaches. In this work, we propose VLHTrack, a novel hyperspectral vision-language (VL) joint tracking framework. Specifically, we incorporate language priors to address the fundamental challenge of spectral redundancy by designing a Language-Guided Band Selection Module (LBSM). By leveraging Large Language Model (LLM) descriptions, LBSM establishes a semantic-to-spectral mapping that mitigates redundancy and accentuates discriminative spectral features. A Multi-Modal Vision-Language Fusion Module is then employed to seamlessly integrate visual and linguistic embeddings, harnessing their complementary advantages to learn coherent cross-modal representations. To address target deformation in long-term sequences, we propose a dynamic template feature update strategy implemented via the Dynamic Template Update with Mamba (DTUM) module. By leveraging selective state space modeling, DTUM learns inter-frame dependencies to update the template features, ensuring efficient template feature evolution guided by temporal context. Experiments on HOT2023 and HOT2024 demonstrate that VLHTrack outperforms state-of-the-art (SOTA) methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09167v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Rui Yao, Yuhong Zhang, Kunyang Sun, Hancheng Zhu, Jiaqi Zhao, Zhiwen Shao, Abdulmotaleb El Saddik</dc:creator>
    </item>
    <item>
      <title>IMUG-Bench: Benchmarking Unified Multimodal Models on Interleaved Understanding and Generation</title>
      <link>https://arxiv.org/abs/2606.09169</link>
      <description>arXiv:2606.09169v1 Announce Type: new 
Abstract: In recent years, unified multimodal models (UMMs) have emerged to support both understanding and generation within a single framework. Mastering dynamic, multi-turn interleaved image-text dialogues is a crucial task for UMMs in real-world applications. However, existing benchmarks fail to evaluate this important task, as they are often limited to single-turn or static settings, and typically overlook exposure bias in multi-turn interactions. To bridge this gap, we propose IMUG-Bench, a comprehensive benchmark for multi-turn interleaved image-text dialogue of UMMs that jointly evaluates their understanding and generation capabilities. Our IMUG-Bench comprises three classes: Static Spatial, Temporal Causal, and Hybrid, covering 3,113 samples and 12,034 interaction turns. It also includes dynamic understanding questions, thereby supporting evaluation that better reflects real-world multi-turn interaction scenarios. Large-scale experiments on IMUG-Bench systematically evaluate mainstream open-source and closed-source UMMs, revealing their capability boundaries and failure modes, and uncovering pronounced exposure bias on the generation side in multi-turn interactions. We further explore several test-time scaling strategies, including Chain-of-Thought, Self-Verification, and Best-of-N Sampling, which effectively improve generation accuracy and mitigate exposure bias in generation tasks. These findings provide insights into enhancing the robustness and multi-turn interaction capability of future UMMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09169v1</guid>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <category>cs.MM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lingyi Meng, Zecong Tang, Haoran Li, Tengju Ru, Zhejun Cui, Weitong Lian, Qi Kang, Hangshuo Cao, Yichen Zhu, Yechi Liu, Kaixuan Wang, Yu-Jie Yuan, Chunwei Wang, Yu Zhang, Bo Dai</dc:creator>
    </item>
    <item>
      <title>sketch-plot: Progressive Editing for Text-to-Image Academic Figures</title>
      <link>https://arxiv.org/abs/2606.09171</link>
      <description>arXiv:2606.09171v1 Announce Type: new 
Abstract: Text-to-image (T2I) models such as gpt-image-2 can now generate publication-grade academic figures from a short prompt, but the output is a flat raster: a user who wants to change one arrow, one label, or one icon has to regenerate the whole image, which also disturbs the parts they wanted to keep. We present sketch-plot, an interactive system that closes this controllability gap with a three-layer progressive editing pipeline: a generated PNG, an addressable puzzle of editable pieces, and a per-piece SVG. The user stops at the layer that gives them enough control for the change at hand, so the cost of decomposition and vectorisation is paid only on the pieces that need it. Realising this pipeline is not trivial. General segmentation models lack the semantic discriminability to decompose a research figure cleanly, and end-to-end image vectorisation produces incomplete shapes and loses semantic structure. We therefore route both stages through a human-in-the-loop interface that lets the user accept, refine, or reject decomposition and vectorisation decisions on a piece-by-piece basis. We validate the design with an expert user study, in which participants found sketch-plot effective for making targeted edits to AI-generated academic figures and preferred it over regenerating the whole image. A demonstration video is available at https://anonymous.4open.science/r/SketchPlotVideo/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09171v1</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yinghao Tang, Yupeng Xie, Yingchaojie Feng, Tingfeng Lan, Wei Chen</dc:creator>
    </item>
    <item>
      <title>Uniform-in-time Strong Error Estimates of Tamed-FEM to Superlinear SPDEs driven by Multiplicative Noise</title>
      <link>https://arxiv.org/abs/2606.09173</link>
      <description>arXiv:2606.09173v1 Announce Type: new 
Abstract: We establish sharp, uniform-in-time strong error estimates for a nonlinearity-explicit tamed finite element method (FEM) applied to a class of superlinear stochastic partial differential equations (SPDEs) driven by multiplicative noise, including the stochastic Allen--Cahn equation with a moderately thick interface. This tamed-FEM was first introduced in [Z. Liu and J. Shen, arXiv:2502.19117] to ensure long-time unconditional stability and to preserve the Lyapunov structure of this class of SPDEs. We further prove that the scheme is exponentially ergodic and derive the convergence rate between the exact invariant measure and its numerical counterpart in the Wasserstein-2 distance. Finally, we present numerical experiments that verify the ergodicity as well as the sharpness and time-independence of the strong convergence rates for this tamed-FEM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09173v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jingjing Cai, Zhihui Liu</dc:creator>
    </item>
    <item>
      <title>Demonstrating chart-plot: Closing the Last Mile of Academic Chart Generation</title>
      <link>https://arxiv.org/abs/2606.09174</link>
      <description>arXiv:2606.09174v1 Announce Type: new 
Abstract: Large language models can translate a researcher's intent into runnable matplotlib code, yet the resulting chart rarely lands in a paper without multiple rounds of manual revision. We argue that the open problem is not chart code generation but chart publication: making the output look like a top-venue figure, survive the target layout, and respond to precise author edits. We present chart-plot, an agentic harness that closes this last mile through three components: (1) a style-aware code generator conditioned on a textual style skill distilled from accepted figures at the target venue, (2) a deployment-aware render loop that compiles the chart inside the target LaTeX context and revises until layout constraints are met, and (3) a structured edit layer that exposes every chart element as a directly manipulable handle. We report early results on three chart-type case studies (grouped bar, scaling line, paired distributions) and a small user study.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09174v1</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yinghao Tang, Yupeng Xie, Yingchaojie Feng, Jiale Lao, Tingfeng Lan, Wei Chen</dc:creator>
    </item>
    <item>
      <title>CANS: Accelerating Multiuser Collaborative Edge Inference via Cooperative Autodidactic NeuroSurgeon</title>
      <link>https://arxiv.org/abs/2606.09175</link>
      <description>arXiv:2606.09175v1 Announce Type: new 
Abstract: Recently, mobile edge computing (MEC)-enabled collaborative deep neural network (DNN) inference has emerged as a promising approach for delivering intelligent services to resource-constrained mobile devices. A representative scenario is multi-user collaborative edge inference, where distinct devices independently partition their DNN models and offload backend computation to a common edge server over wireless networks. However, determining the optimal DNN partition for each device is challenging due to unknown and time-varying system conditions, including fluctuating wireless links and diverse device capabilities. To address this problem, we propose Cooperative Autodidactic NeuroSurgeon (CANS), a collaborative edge inference framework that enables devices to adaptively learn optimal DNN partitions by sharing informative feedback during online inference. To handle the challenge of device heterogeneity and better leverage offline inference experience, we integrate a novel FedLinUCB-DW algorithm that groups devices of the same type and warm-starts online exploration using local offline early-exit inference experience. Furthermore, we provide theoretical guarantees for FedLinUCB-DW by deriving the regret upper bound. We also validate our method on both a simulated environment and a hardware prototype system. Empirical evaluations demonstrate that CANS achieves lower inference latency compared to state-of-the-art baselines. In particular, in prototype experiments on two edge devices, CANS reduces average inference latency by up to 50% compared to the non-cooperative baseline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09175v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zheshun Wu, Ziyang Zhang, Changyao Lin, Zenglin Xu, Jie Liu</dc:creator>
    </item>
    <item>
      <title>Performance Evaluation of Social Learning</title>
      <link>https://arxiv.org/abs/2606.09176</link>
      <description>arXiv:2606.09176v1 Announce Type: new 
Abstract: Social Learning is a decentralized decision-making paradigm in which spatially dispersed agents collect streaming observations regulated by one of a finite number of models (the hypotheses). The agents are interested in assigning probability scores (the beliefs) to the possible hypotheses. To this end, the agents exchange their beliefs according to a certain communication graph. It has been shown that, under reasonable conditions on the identifiability of the decision model and the network connectivity, each agent ultimately places all the belief mass on the true hypothesis governing the data. However, several questions remain unanswered regarding the evaluation of the social learning performance. One recently adopted performance metric is the rejection rate, i.e., the rate at which the beliefs about the erroneous hypotheses vanish. One contribution of this work is to establish that the rejection rate leads to several paradoxes, which make it unsuitable as a valid performance measure. We then focus on studying the error probability measure. For a binary Gaussian problem, we derive an analytical formula characterizing the ratio between the individual agents' probabilities and the optimal Bayesian probability. The formula shows that this ratio is expressed by the product of two terms quantifying the effect of the network connectivity and the role of the prior information. As a result, an irreducible gap emerges between the decentralized and the centralized error probabilities, which is agent-dependent and does not disappear asymptotically.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09176v1</guid>
      <category>cs.MA</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Felice Scala, Marco Carpentiero, Vincenzo Matta, Ali H. Sayed</dc:creator>
    </item>
    <item>
      <title>Culturally-Adapted Red-Teaming Across East and Southeast Asian Contexts: A Methodological and Comparative Analysis</title>
      <link>https://arxiv.org/abs/2606.09178</link>
      <description>arXiv:2606.09178v1 Announce Type: new 
Abstract: Multilingual safety evaluation of large language models (LLMs) has predominantly relied on direct translation (DT) of English benchmarks into target languages - an approach that converts surface-level linguistic form while failing to reflect the cultural context embedded in threat scenarios, social norms, and legal frameworks. We construct paired DT and culturally-adapted (CA) datasets via 1:1 seed matching for four languages - Korean (KO), Japanese (JA), Thai (TH), and Khmer (KM) - and compare Attack Success Rate (ASR) and Cultural Realism scores across four open-source LLMs. CA prompts yield Delta-ASR &gt; 0 across all 16 language x model combinations (mean +9.3 pp), and DT-based evaluation underestimates risk in 44 of 48 category x language combinations. Language-level analysis reveals that the distribution of threat forms is heterogeneous across languages. Cultural Realism analysis further shows that DT Cultural Depth (C3) scores remain consistently below 1.0 out of 3.0 across all four languages (mean 0.17), whereas CA scores reach up to 2.51, indicating that direct translation produces inputs systematically divergent from those encountered in real-world multicultural settings. These findings demonstrate that adapting benchmarks to language-specific cultural contexts - rather than relying on linguistic translation alone - is necessary for valid multilingual LLM safety evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09178v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Hyeji Choi, Yongtaek Lim, Minwoo Kim</dc:creator>
    </item>
    <item>
      <title>Claude Code-Driving Scenario Mining for the Argoverse 2 Challenge</title>
      <link>https://arxiv.org/abs/2606.09180</link>
      <description>arXiv:2606.09180v1 Announce Type: new 
Abstract: We present our submission to the CVPR 2026 Argoverse 2 Scenario Mining Challenge. Our system uses a four-stage pipeline: (1) autonomous code generation via a Claude Code agent powered by GLM 5.1, (2) iterative training set screening with Timestamp Balanced Accuracy threshold 0.8 to curate few-shot examples, (3) semantic code review by a separate Claude Code session, and (4) Qwen3-VL scene-level verification to filter false positives. We report results on the Argoverse 2 test set.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09180v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wei Deng, Caoshengzhe Xue, Shuaikun Liu, Zhaohong Liu, Mengshi Qi, Huadong Ma</dc:creator>
    </item>
    <item>
      <title>Counterfactual Reasoning for Fine-Grained Evidence Disentanglement in VideoQA</title>
      <link>https://arxiv.org/abs/2606.09181</link>
      <description>arXiv:2606.09181v1 Announce Type: new 
Abstract: Recent advances in video multimodal models have significantly improved VideoQA performance. However, these systems often rely on spurious statistical correlations rather than answer-relevant causal evidence, resulting in unfaithful and brittle reasoning, especially in complex real-world scenarios. Existing methods either rely on cross-modality correlations, costly curated training resources, or insufficient causal assumptions and constraints, and typically operate at the time-interval level. As a result, they fail to explicitly disentangle causal visual cues from confounders and provide limited fine-grained evidence localization. To address this issue, we propose a Counterfactual Reasoning framework for fine-grained Evidence Disentanglement (CREDiT). CREDiT formulates the VideoQA process using a structural causal model and learns cross-modality representations that are explicitly decomposed into causal and non-causal components under independence and minimality constraints. To facilitate faithful disentanglement, we introduce feature-level causal interventions and construct counterfactual inputs that approximate causal effects while suppressing non-causal correlations. Extensive experiments on NExT-GQA, SportsQA, and SPORTU-video demonstrate that CREDiT consistently improves answer accuracy and reasoning reliability across both generic and complex sports scenarios, leading to more trustworthy VideoQA systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09181v1</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhou Du, Hamid Krim, Xiao Wu, Zhaoquan Yuan, Liangwei Li, Keisuke Fujii</dc:creator>
    </item>
    <item>
      <title>Understanding How Enterprises Adopt the Model Context Protocol for LLM-Driven Software Engineering</title>
      <link>https://arxiv.org/abs/2606.09182</link>
      <description>arXiv:2606.09182v1 Announce Type: new 
Abstract: Large Language Models (LLMs) are increasingly used in AI-based software engineering, but their limitations in complex task execution and multi-tool coordination have driven growing interest in the Model Context Protocol (MCP). Existing research has mainly focused on MCP's technical design, with limited empirical evidence on how it is adopted and used in enterprise practice, particularly with regard to deployment challenges, operational risks, and practitioner expectations. To address this gap, we conducted semi-structured interviews with 20 practitioners from eight companies in the Internet and financial sectors. The findings show that MCP is valued for supporting cross-system collaboration, task decoupling, and knowledge reuse in LLM-based workflows, but its adoption remains constrained by ecosystem fragmentation, cross-component coordination difficulties, and unresolved problems in distributed state management and fault diagnosis. Participants also expressed strong demand for better standardization, lower adoption barriers through low-code or plugin-based approaches, and more systematic operational support. These results provide early empirical evidence on enterprise MCP practice and offer practical implications for improving MCP's standardization, usability, and deployment readiness in real-world software engineering environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09182v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kehui Chen, Yicheng Sun, Jacky Keung, Zhenyu Mao, Xiaoxue Ma</dc:creator>
    </item>
    <item>
      <title>Autonomous Obstacle Removal for Excavators through Policy Learning with Particle Simulation</title>
      <link>https://arxiv.org/abs/2606.09183</link>
      <description>arXiv:2606.09183v1 Announce Type: new 
Abstract: Autonomous obstacle removal from the ground is an important earthwork task, but it is difficult to automate because an excavator must adapt its excavation trajectories over repeated cycles as soil-obstacle conditions change. Learning such state-dependent behavior requires a training environment that reproduces accumulated soil-obstacle interactions, including contact states, terrain deformation, and obstacle visibility. Accordingly, particle-based simulation is suitable for the relevant policy learning. However, particle simulation is computationally expensive, and repeated excavation cycles further increase the learning cost. We observe that the burial condition of an obstacle governs both task difficulty and simulation cost: deeper burial makes obstacle removal harder while also requiring more particles for accurate simulation. This observation motivates a burial-conditioned curriculum learning strategy. We propose a time-efficient sim-to-real policy learning framework in which the policy observes terrain and obstacle information from RGB-D measurements and then outputs a parameterized excavation trajectory; in this process, the simulator reproduces the observation-action interface of the real-world excavator under controllable burial conditions. The curriculum begins with shallow burial conditions and progressively increases burial depth while adjusting particle count, thus simultaneously controlling task difficulty and simulation cost. Experiments show that the proposed framework successfully learns an effective obstacle-removal policy, whereas baseline methods fail even after a full week of training. The proposed curriculum reaches effective performance within three days and transfers successfully to a real 12-ton excavator operating on open ground with various steel obstacles, thus demonstrating robust obstacle removal.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09183v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuki Kadokawa, Sandro M. Alcantara Tacora, Taro Abe, Daisuke Endo, Genki Yamauchi, Takeshi Hashimoto, Takamitsu Matsubara</dc:creator>
    </item>
    <item>
      <title>DuplexOmni: Real-Time Listening, Seeing, Thinking, and Speaking for Full-Duplex Interaction</title>
      <link>https://arxiv.org/abs/2606.09186</link>
      <description>arXiv:2606.09186v1 Announce Type: new 
Abstract: Human interaction is continuous, multimodal, and full-duplex by nature. Although recent omni models have made substantial progress in unified speech, vision, and text modeling, combining seamless real-time interaction with complex reasoning and tool use remains challenging. We present DuplexOmni, a method for real-time multimodal full-duplex interaction. DuplexOmni separates model capability into an interaction layer and a thinking layer, which collaborate asynchronously in parallel. The interaction layer is implemented by the DuplexOmni model, an end-to-end system that processes streaming audio and video inputs while generating text and speech responses in real time. The thinking layer is a pluggable module that provides complex reasoning and tool-use capabilities. To support this method, we further develop a Writer-Director pipeline for constructing continuous-interaction training data. Experiments show that DuplexOmni achieves strong performance on multiple public benchmarks and exhibits natural full-duplex interaction ability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09186v1</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Muye Huang, Lingling Zhang, Xingyu Yu, Lei Shi, Zhanyu Ma, Jun Xu, Jiuchong Gao, Jinghua Hao, Renqing He, Jun Liu</dc:creator>
    </item>
    <item>
      <title>CP4D: Compositional Physics-aware 4D Scene Generation</title>
      <link>https://arxiv.org/abs/2606.09187</link>
      <description>arXiv:2606.09187v1 Announce Type: new 
Abstract: 4D generation (i.e., dynamic 3D generation) has recently emerged as a rapidly growing research frontier due to its powerful spatiotemporal modeling capabilities. However, despite notable advances, existing approaches typically fail to capture the underlying physical principles, producing results that are both physically inconsistent and visually implausible. To overcome this limitation, we present CP4D, a novel paradigm for photorealistic 4D scene synthesis with faithful adherence to complex physical dynamics. Drawing inspiration from the compositional nature of real-world scenes, where immutable static backgrounds coexist with dynamic, physically plausible foregrounds, CP4D reformulates 4D generation as the integration of a static 3D environment with physically grounded dynamic objects. On this basis, our framework follows a three-stage pipeline: 1) Firstly, we leverage pre-trained expert models to generate high-fidelity 3D representations of the environment and foreground objects respectively. 2) Subsequently, to produce physically plausible trajectories and realistic interactions for these objects, we propose a hybrid motion synthesis strategy that integrates priors from physical simulators with the common sense embedded in video diffusion models. 3) Finally, we develop an automated composition mechanism that seamlessly fuses the static environment and dynamic objects into coherent, physically consistent 4D scenes. Extensive experiments demonstrate that CP4D can generate explorable and interactive 4D scenes with high visual fidelity, strong physical plausibility, and fine-grained controllability, significantly outperforming existing methods. The project page: https://anonymous.4open.science/w/CP4D/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09187v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hanxin Zhu, Cong Wang, Tianyu He, Long Chen, Xin Jin, Chen Gao, Zhibo Chen</dc:creator>
    </item>
    <item>
      <title>Trajectory Optimization in Single and Dual-UAV Bearing-Only Target Localization</title>
      <link>https://arxiv.org/abs/2606.09188</link>
      <description>arXiv:2606.09188v1 Announce Type: new 
Abstract: Bearing-only target localization is a fundamental problem in optical measurement and finds extensive applications in unmanned aerial vehicle (UAV) technology. Effective trajectory planning establishes favorable observation geometries, thereby enhancing the target localization accuracy of bearing-only UAV systems. This paper proposes a trajectory optimization method for UAVs in bearing-only target localization scenarios. By leveraging the Fisher Information Matrix (FIM), the proposed approach dynamically integrates the geometric configuration and vehicle maneuverability into the optimization framework. Specifically, we introduce a spectrally weighted FIM objective function that provides better gradient dynamics near degenerate configurations, enabling the planner to rapidly escape from poor observation conditions. For dual-UAV scenarios, an intersection angle sine term is introduced to optimize triangulation geometry by improving the sight-line intersection angle, thereby preventing trajectory aggregation. Furthermore, we propose an improved Particle Swarm Optimization (PSO) algorithm with motion model constraints and particle normalization to ensure the physical feasibility of the trajectory and enhance the compatibility with the objective functions. Simulation results demonstrate that the proposed method reduces the median localization error by 99.21% compared to conventional FIM-based approaches in single-UAV scenarios and achieves a 69.70% improvement for dual-UAV configurations, exhibiting superior performance in long-duration bearing-only localization of maneuvering targets at extended ranges.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09188v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhijian Xiao, Huayu Huang, Bin Li, Yang Shang, Banglei Guan</dc:creator>
    </item>
    <item>
      <title>Pretrained, Frozen, Still Leaking: Auditing Cross-Encoder Attribute Transfer in EEG Foundation Models</title>
      <link>https://arxiv.org/abs/2606.09189</link>
      <description>arXiv:2606.09189v1 Announce Type: new 
Abstract: EEG foundation-model releases are usually audited one endpoint at a time: raw-reconstruction, membership inference, identity linkage, or DP-SGD on the downstream head. We audit the same released embeddings under all four endpoints jointly, on BIOT, LaBraM, and EEGPT, and show that each single-endpoint audit clears releases that still leak spectral attributes. The decisive evidence is a cross-encoder transfer audit: a single ridge attribute decoder learned from one frozen encoder transfers, via a fitted linear bridge, to held-out-subject test splits of every other encoder, with subject-disjoint matched-control 95% CI lower bound at least 0.081 across all six BIOT/LaBraM/EEGPT directions. We prove a sufficient condition: two encoders sharing a nontrivial attribute-coordinate projector overlap beta admit a chained ridge bridge attacker with centered-gain lower bound sqrt(beta/(1+tau^2)) - eps_br - rho_0, and back-solve beta in [0.008, 0.198]. To turn the joint audit into a deployment-readable decision rule we introduce an audit-endpoint disagreement score (AEDS), prove sufficient conditions for its positivity, and bootstrap-calibrate it per cell; AEDS is positive in all eight matched-CI cells (BIOT/LaBraM/EEGPT on EEGMMI; LaBraM on Sleep-EDF, 54-channel LIMO, CHB-MIT pediatric scalp EEG) with p&lt;0.001, while a head-level Carlini LiRA membership audit reaches AUC only 0.50-0.70. Standard defenses fail under audit: a Wiener-style noise-aware adaptive attacker, the LiRA audit, and DP-SGD at every utility-preserving epsilon in {4,8} leave the attribute channel essentially unchanged. The contribution is an audit framework that turns scattered single-endpoint defenses into a joint release decision, supported by a cross-encoder bridge theorem and adaptive-attacker, LiRA, and DP-SGD baselines; the audit licenses release-blocking, not raw-waveform exfiltration or held-out-subject identity recovery.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09189v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jianwei Tai</dc:creator>
    </item>
    <item>
      <title>Asymptotic Optimality of Thompson Sampling for Risk-Averse Bandits with Sub-Gaussian Rewards</title>
      <link>https://arxiv.org/abs/2606.09191</link>
      <description>arXiv:2606.09191v1 Announce Type: new 
Abstract: We prove that $\rho\text{-}\mathrm{NPTS}_{\mathrm{SG}}$, an anchor-free nonparametric Thompson Sampling algorithm for risk-averse bandits, achieves regret matching the instance-dependent lower bound to leading order in $\log n$, establishing it as asymptotically optimal for any continuous risk functional $\rho$ (CVaR, mean-variance, Sharpe ratio, distortion risk measures, and more) on the class of distributions with bounded density and sub-Gaussian tails, including Gaussian arms. Both this result and its bounded-support counterpart require only continuity of $\rho$: strictly weaker than the dominance condition of prior parametric Thompson Sampling results, and strictly weaker than the Lipschitz condition of UCB-type algorithms, yielding the first instance-optimal guarantees for non-Lipschitz functionals such as the Sharpe ratio without parametric reward assumptions. The bounded-support case is developed first as a stepping stone sharing the same proof structure. The key technical contributions are a discretisation lemma (bounded support) and a truncated discretisation lemma (sub-Gaussian tails), each projecting the growing-alphabet Dirichlet posterior onto a fixed grid via the Dirichlet aggregation property, holding all polynomial prefactors at fixed degree independent of sample size and breaking the super-exponential barrier that blocked prior proofs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09191v1</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Joel Q. L. Chang</dc:creator>
    </item>
    <item>
      <title>Symbolic and Abstractive Reasoning with Complex Visual Queries</title>
      <link>https://arxiv.org/abs/2606.09195</link>
      <description>arXiv:2606.09195v1 Announce Type: new 
Abstract: Understanding and reasoning over abstract visual content remains a challenge for current multi-modal large language models (MLLMs). In this paper, we explore a novel abstract data type termed complex visual query (CVQ), designed to probe symbolic and abstractive reasoning, which is a critical yet underexplored dimension of human-like neuro-symbolic reasoning for MLLMs. We present a comprehensive investigation from three perspectives: \textbf{Data $\times$ Paradigm $\times$ Exploration}. Specifically, we propose a scalable pipeline for synthesizing CVQs grounded in large-scale multi-modal knowledge graphs, generating a diverse dataset encompassing 14 distinct query types via systematic combinations of first-order logic operators. We further introduce a two-stage training framework that progressively equips MLLMs with robust visual reasoning capabilities. We conduct extensive experiments to rigorously evaluate MLLMs across multiple dimensions, including reasoning performance on CVQs, as well as cross-task and cross-scenario generalization. We believe our work opens new perspectives and avenues for advancing the reasoning frontiers of MLLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09195v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Yichi Zhang, Jingdian Lu, Zhuo Chen, Lingbing Guo, Jun Xu, Wen Zhang, Huajun Chen</dc:creator>
    </item>
    <item>
      <title>MASS: Deep Research for Social Sciences with Memory-Augmented Social Simulation</title>
      <link>https://arxiv.org/abs/2606.09198</link>
      <description>arXiv:2606.09198v1 Announce Type: new 
Abstract: Deep Research agents powered by Large Language Models (LLMs) have exhibited extraordinary potential in automated paper writing tasks. However, existing systems rely heavily on literature retrieval and synthesis through the internet and local knowledge bases, often resulting in social science research that lacks insight and creativity. To address this issue, we propose "Memory-Augmented Social Simulation (MASS)", an innovative paradigm that leverages highly realistic and research-oriented social simulations to enhance the creativity and empirical grounding of LLM-generated research. Specifically, MASS integrates three core components: dynamic goal-path planning with multi-level social norm restraint to guide the simulation, a multi-disciplinary behavior dataset for agent memory cold-start, and a structured forgetting mechanism inspired by the Ebbinghaus curve. Together, these ensure simulation authenticity and provide a robust empirical foundation for generating innovative scholarly papers. Experimental results demonstrate the effectiveness of our method, showing a 6.81\% improvement in overall generation quality over foundation LLMs and a 17.19\% gain in Insight over strong baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09198v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yongrui Liu, Deyi Xiong</dc:creator>
    </item>
    <item>
      <title>Resource-aware Computation-Communication Overlap for multi-GPU ML Workloads</title>
      <link>https://arxiv.org/abs/2606.09200</link>
      <description>arXiv:2606.09200v1 Announce Type: new 
Abstract: The rapid growth of large-scale machine learning (ML) has made distributed training across multiple GPUs a fundamental component of modern ML systems. As model sizes and computational throughput continue to increase, communication overhead has become a dominant bottleneck in multi-GPU training, particularly when computation and communication are executed sequentially. This work explores concurrent execution of computation and collective communication using two portable runtime controls: shared-memory-driven occupancy shaping for computation kernels and elevated scheduling priority for communication kernels. Our approach regulates computation-kernel residency through per-block shared-memory allocation, leaving sufficient on-chip resources for communication kernels to make progress. In addition, assigning higher priority to communication streams ensures steady communication progress once resources become available. Experiments on NVIDIA A40, A100, H100, and AMD MI250X GPUs demonstrate that the proposed method enables effective computation-communication overlap and reduces total execution time by up to 25.5 percent, without modifying vendor libraries or kernel implementations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09200v1</guid>
      <category>cs.DC</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Minyu Cui, Miquel Pericas</dc:creator>
    </item>
    <item>
      <title>Deterministic Execution of ROS 2 Applications via Lingua Franca</title>
      <link>https://arxiv.org/abs/2606.09203</link>
      <description>arXiv:2606.09203v1 Announce Type: new 
Abstract: The Robot Operating System 2 (ROS 2) is a widely used middleware for robotic systems, characterized by a publish-subscribe (pub-sub) communication mechanism in which computation is structured as callbacks dispatched by ROS 2 executors. Despite its popularity, the pub-sub pattern in ROS 2 is inherently nondeterministic: the order in which these callbacks run is nondeterministic even within a single executor, and distributed deployments add further nondeterminism from the interleaving of messages across nodes and from network latency. Such nondeterminism often leads to concurrency issues and makes it virtually impossible to analyze a system for safety and provide guarantees.
  We present a framework that converts an unmodified ROS 2 application and runs it under Lingua Franca (LF), a coordination language for deterministic execution using logical time, so that the same input always produces the same execution order. We first describe which ROS 2 features can be executed deterministically under logical time. These features make it possible to build an automatic conversion framework that extracts information from a ROS 2 application and directly converts it into an LF program. The rich features of LF, such as logical-time delays, federated execution across processes, and fault handling, can then be applied to make the ROS 2 application execute in a deterministic and timing-predictable manner without changing the ROS 2 code. We evaluate the framework on a synthetic example and on the Autoware reference system. We show that, in default ROS 2, the order in which callbacks are executed differs and end-to-end latencies vary across executions. In contrast, our LF-controlled ROS 2 system produces a deterministic execution order and consistent end-to-end latencies.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09203v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Harun Teper, Shaokai Lin, Shulu Li, Edward A. Lee, Jian-Jia Chen</dc:creator>
    </item>
    <item>
      <title>The Injection Paradox: Brand-Level Suppression in Safety-Trained LLM Recommendations via RAG Context Injection</title>
      <link>https://arxiv.org/abs/2606.09204</link>
      <description>arXiv:2606.09204v1 Announce Type: new 
Abstract: We present a reproducible failure mode of safety training in RAG-based LLM recommendation -- the Injection Paradox -- in which prompt injections embedded in retrieved documents backfire against the attacker, suppressing the target brand below the injection-free baseline. In safety-trained Claude models, documents containing prompt injections suffer a sharp drop in recommendation rate, and this suppression propagates beyond the injected document to unmodified documents of the same brand. In Claude Opus 4.6, the target brand drops from a 54% baseline to zero top-2 recommendations across all 50 trials, even though only 1 of 4 brand documents in the corpus contains an injection. The directional pattern is reproduced in counterfactual experiments and across three brands. A contrasting result across the GPT models tested, where the same injection instead increases recommendations, suggests model-family differences in how injection-like context affects recommendation behavior. These findings raise the technical possibility of a reverse-attack scenario in which an adversary embeds injections in a competitor's documents to suppress the competitor's brand via safety-sensitive model behavior.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09204v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hyunseok Paeng</dc:creator>
    </item>
    <item>
      <title>Event-driven dynamic trajectories reconstruction and measurement of mechanical parameters for fragments</title>
      <link>https://arxiv.org/abs/2606.09208</link>
      <description>arXiv:2606.09208v1 Announce Type: new 
Abstract: During warhead detonation, high-density, high-speed, and mutually occluded fragments are generated. Their mechanical parameters (position, velocity, kinetic energy) directly determine the lethality of the warhead fragment field. However, high-intensity flash and smoke in detonation scenarios severely hinder the accurate acquisition of these mechanical parameters. To address this challenge, this paper integrates experimental mechanics approaches and presents an event-driven method for reconstructing the dynamic trajectories of fragments and measuring their mechanical parameters. As novel brain-inspired visual sensors, event cameras offer microsecond-level temporal resolution and high-dynamic-range perception of lighting changes, overcoming the difficulty of accurately measuring high-speed targets under strong flash interference. The method constructs a multi-event-camera vision system and adopts three geometric constraints: a time-correlated epipolar constraint to find potential matching event point pairs, and a trifocal tensor line constraint plus a local homography constraint to eliminate mismatches. A comprehensive probability model is established, with the entropy weight method determining the weight of each constraint's probability to quantitatively filter mismatches. 3D trajectory reconstruction is achieved via spatial line-line intersection and nonlinear optimization. Finally, the velocity and kinetic energy of the fragments are calculated based on the reconstructed trajectory. This method provides reliable technical support for the mechanical damage evaluation of warhead fragment fields and for tactical protection design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09208v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haoyang Li, Banglei Guan, Muxi Zha, Yifei Bian, Minzu Liang, Yang Shang, Qifeng Yu</dc:creator>
    </item>
    <item>
      <title>Frequent Itemset Mining with Quantum Computing</title>
      <link>https://arxiv.org/abs/2606.09209</link>
      <description>arXiv:2606.09209v1 Announce Type: new 
Abstract: Frequent Itemset Mining (FIM) is a foundational task in data analytics, but its candidate and conditional pattern spaces can grow rapidly, and maintaining support information becomes increasingly costly on dense datasets. These bottlenecks present a critical opportunity for quantum computing to redesign the way candidate representation and support verification are organized. Motivated by recent developments in quantum computing, we propose the \textit{QuantumFreqMine (QFM)} framework for FIM. QFM introduces three mechanisms: (1)~\textit{Bit-Vector Qubit Encoding}, (2)~\textit{Mining-Aware Candidate Superposition}, and (3)~\textit{Bit-Parallel Threshold Marking}. We provide a theoretical analysis in terms of time complexity, space complexity, and logical resource usage. We implement QFM on IBM Qiskit and Amazon Braket. The experiments demonstrate that QFM outperforms representative baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09209v1</guid>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yen-Hsin Hsu, Ya-Wen Teng, De-Nian Yang, Wang-Chien Lee, Philip S. Yu, Ming-Syan Chen</dc:creator>
    </item>
    <item>
      <title>SNN-MLIR: An MLIR Dialect for Compiling Neuromorphic SNNs from NIR to Bare-Metal C</title>
      <link>https://arxiv.org/abs/2606.09213</link>
      <description>arXiv:2606.09213v1 Announce Type: new 
Abstract: Spiking neural networks (SNNs) are increasingly trained in a wide range of frameworks (SnnTorch, Lava, Norse, and others), each with its own model format. The Neuromorphic Intermediate Representation (NIR) addresses this fragmentation by providing a common, framework-independent format for exchanging trained SNN models. NIR solves the exchange problem, but it stops there. It provides a description of a network, not a path to running one. Each backend is still left to implement deployment on its own, with no shared, transformable compiler representation in between. This paper presents snn-mlir, an out-of-tree MLIR dialect for SNNs together with a NIR-MLIR-C compilation bridge. The dialect provides a small set of type-polymorphic operations that work identically on floating-point (f32/f64) and quantized data, so a single intermediate representation serves both simulation and hardware-oriented deployment. A Python front end reads any NIR file and emits dialect IR, automatically inserting rescaling operations to keep quantization scales consistent across layers. A reference lowering pass converts the dialect to standard linalg and arith operations, from which the toolchain produces self-contained, dependency-free C11 code that compiles and runs on any C-capable CPU or embedded target. We evaluate numerical fidelity against reference outputs, portability across CPU targets, and the cost of quantization. The current scope is feedforward, fully-connected networks with a CPU backend. snn-mlir is released as open source under the Apache-2.0 license with LLVM-exception and is already available on GitHub.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09213v1</guid>
      <category>cs.PL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alejandro García Gener, Alvaro Rollón de Pinedo</dc:creator>
    </item>
    <item>
      <title>MotionWAM: Towards Foundation World Action Models for Real-Time Humanoid Loco-Manipulation</title>
      <link>https://arxiv.org/abs/2606.09215</link>
      <description>arXiv:2606.09215v1 Announce Type: new 
Abstract: World Action Models (WAMs) couple a video dynamics prior to the policy and have shown encouraging results on tabletop manipulation, but iterative denoising over high-dimensional video-action latents leaves them too slow for real-time humanoid loco-manipulation. The problem is compounded by the dominant hierarchical paradigm, in which a high-level manipulation policy controls only the upper body while a low-level controller tracks coarse base commands -- placing upper and lower body in inconsistent action spaces and reducing the legs to balance-preserving locomotion. We present MotionWAM, a real-time WAM that drives autonomous humanoid loco-manipulation from a single egocentric camera by conditioning the policy on the intermediate denoising features of a video world model. MotionWAM replaces the upper-lower split with a unified motion latent and predicts whole-body motion tokens that jointly cover locomotion, torso motion, height regulation, foot interaction, and hand manipulation in a single action space. A three-stage learning framework progressively adapts the video world model to egocentric visual dynamics and to the target humanoid embodiment. On nine real-world Unitree G1 tasks, MotionWAM runs in real time, substantially outperforms Vision-Language-Action (VLA) baselines fine-tuned on the same demonstrations by over 30% in overall success rate, and executes task-driven foot interaction that decoupled upper-lower policies cannot reach. Our results suggest that video-pretrained WAMs can be lifted from tabletop manipulation to coordinated, human-like whole-body humanoid control.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09215v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jia Zheng, Teli Ma, Yudong Fan, Zifan Wang, Shuo Yang, Junwei Liang</dc:creator>
    </item>
    <item>
      <title>Minimal Solvers for Full-DoF Motion Estimation from Asynchronous Differential SfM</title>
      <link>https://arxiv.org/abs/2606.09218</link>
      <description>arXiv:2606.09218v1 Announce Type: new 
Abstract: As bio-inspired intelligent sensors, event cameras have introduced a new paradigm in the intelligent perception of spatiotemporal information and visual motion estimation, characterized by their high temporal resolution, low latency, and minimal power consumption. However, their asynchronous data streams present significant challenges to traditional synchronous, frame-based algorithms. To address these challenges, this paper presents a novel framework for full degree of freedom (DoF) egomotion estimation directly from asynchronous optical flow, specifically targeting the joint recovery of angular and linear velocities. We decouple the differential epipolar constraint into distinct angular and linear velocity components, and derive its formulation for asynchronous data. Based on this formulation, an optimization algorithm is developed that enables full-DoF egomotion estimation leveraging at least five points. Furthermore, by applying a first-order approximation to rotational dynamics, we transform the constraint equations into a polynomial form, resulting in the first algebraic minimal 5-point solver for this formulation. To ensure real-time performance in high-speed scenarios, we additionally propose an accelerated solver achieved by truncating high-order angular velocity terms. Extensive evaluations on both synthetic and real-world datasets demonstrate that the asynchronous approach outperforms traditional synchronous methods, particularly in its accuracy and robustness to spatiotemporal noise. We believe that this work establishes a critical foundation for efficient and accurate continuous-time motion estimation in high-speed robotics applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09218v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shuo Pan, Banglei Guan, Bin Li, Zhenbao Yu, Zibin Liu, Zi Wang, Yang Shang, Qifeng Yu</dc:creator>
    </item>
    <item>
      <title>Semi-supervised Source Detection in Astronomical Images: New Benchmark and Strong Baseline</title>
      <link>https://arxiv.org/abs/2606.09219</link>
      <description>arXiv:2606.09219v1 Announce Type: new 
Abstract: Source detection in modern observational astronomy is a cornerstone for localizing and identifying stellar sources accurately. It is crucial for studies such as stellar population synthesis and cosmological parameter estimation. However, the characteristics of astronomical images, including high density, the effect of point spread functions, and low signal-to-noise ratios, significantly challenge the latest advanced object detectors. Besides, fully-supervised detection methods are hardly practical, due to the significant difficulty in annotating dense, small, and faint sources in astronomical images. To tackle the scarcity of astronomical datasets, we introduce a new comprehensive benchmark (LAMOST-DET), comprising 18,400 astronomical images and 728,898 source instances. Building on this dataset, we further devise a novel semi-supervised learning framework coined Nova Teacher, capable of detecting dense sources effectively given sparse annotations. It integrates a source light enhancement module, confidence-guided pseudo-supervision, and cross-view complementary mining in a dual-teacher paradigm. Extensive experiments on LAMOST-DET show that Nova Teacher consistently improves on previous competitors by 4.04% and 5.22% mAP under two semi-supervised settings. Additionally, our method competes against other detectors on a natural image dataset, validating its generalization ability to various scenarios. The source code is available at https://github.com/AcWiz/NovaTeacher.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09219v1</guid>
      <category>cs.CV</category>
      <category>astro-ph.IM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Longhan Feng, Zihuang Cao, Ali Luo, Yuanhao Guo, Shuilian Yao, Yixin Guo, Qi Jia, Yu Liu</dc:creator>
    </item>
    <item>
      <title>Quantitative Performance Analysis of Stopping Criteria for CMA-ES</title>
      <link>https://arxiv.org/abs/2606.09220</link>
      <description>arXiv:2606.09220v1 Announce Type: new 
Abstract: Covariance matrix adaptation evolution strategy (CMA-ES) is a state-of-the-art black-box optimization algorithm. In general, CMA-ES uses a portfolio of multiple stopping criteria to automatically determine when to stop the search. This mechanism aims to avoid unnecessary consumption of the function evaluation budget during stagnation. Stopping criteria play an important role in CMA-ES, particularly when restart strategies are employed. However, the effectiveness of stopping criteria in CMA-ES remains poorly understood. To address this issue, this paper investigates how the 11 stopping criteria in CMA-ES behave on the noiseless BBOB function set. The performance of the stopping criteria is quantitatively evaluated based on the optimal stopping point in terms of the number of function evaluations in a single run of CMA-ES. Our results show that, although which stopping criterion is triggered first depends significantly on the sample size $\lambda$ and the dimension $n$, \texttt{tolflatfitness} and \texttt{tolfun} are frequently the first criteria to be triggered among the portfolio of 11 stopping criteria. We also demonstrate that \texttt{tolfunhist} and the portfolio achieve the highest stopping accuracy in most cases. In addition, our results show that the \texttt{tolfun} and \texttt{tolfunhist} criteria are frequently triggered before CMA-ES reaches complete stagnation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09220v1</guid>
      <category>cs.NE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ryoji Tanabe</dc:creator>
    </item>
    <item>
      <title>TinyContainer: Container Runtime Middleware Enabling Multi-tenant Microcontrollers with Built-in Security</title>
      <link>https://arxiv.org/abs/2606.09225</link>
      <description>arXiv:2606.09225v1 Announce Type: new 
Abstract: Software containerization technologies for resource-limited devices enable multi-tenant microcontrollers, which allow running multiple applications with different permission levels. However, current solutions lack runtime configuration of container scheduling and of container permissions to host resources. This limits the applicability of constrained containerization in dynamic and heterogeneous environments. This paper introduces TinyContainer, a lightweight software container management middleware designed for multi-tenant microcontrollers. TinyContainer provides per-container configurable scheduling and fine-grained access control to host resources through a metadata-driven approach, supporting multiple runtimes via a runtime abstraction layer. We analyze the performance of TinyContainer with a small WebAssembly runtime, CS4WAMR, and RIOT OS, a common RTOS. We report on experiments using popular IoT boards based on various Cortex-M microcontrollers. We show that the endpoint system introduced by TinyContainer regulates container access to host resources and provides host services to containers with an overhead of up to 4 ms per call. In particular, we showcase a TinyML use case, whereby containers retain data and model weights, while model inference is delegated to native host RTOS services.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09225v1</guid>
      <category>cs.OS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>ACM WiSec 2026</arxiv:journal_reference>
      <dc:creator>Bastien Buil, Chrystel Gaber, Samuel Legouix, Emmanuel Baccelli, Samia Bouzefrane</dc:creator>
    </item>
    <item>
      <title>Trustworthy Smart Fabs via Professional Proxies: Scaling Safe and Sustainable by Design (SSbD) through Industrial Data Spaces</title>
      <link>https://arxiv.org/abs/2606.09227</link>
      <description>arXiv:2606.09227v1 Announce Type: new 
Abstract: The convergence of the 2026 European Union Safe and Sustainable by Design (SSbD) framework, Corporate Sustainability Due Diligence Directive (CSDDD), and Carbon Border Adjustment Mechanism (CBAM) introduces a severe governance bottleneck for advanced semiconductor manufacturing facilities ("Smart Fabs"). Regulatory compliance demands have surpassed the capacity of manual corporate reporting, creating a direct conflict between multi-stakeholder transparency and corporate data privacy. This paper addresses this challenge by introducing a zero-trust socio-technical orchestration framework that operationalizes a six-layer SSbD reference architecture within trustworthy industrial data spaces. We propose a shift from reactive automation to autonomous governance through "Professional Proxies": role-based agentic workflows executing within hardware-isolated trust zones. Structured as an interoperable network protocol stack, the framework coordinates an automated, five-step "relay race" between Facility, Process Engineering, and Finance proxy teams to align factory-floor yield models with macro-level sustainability mandates. By executing Virtual Metrology (VM) predictions and Federated Machine Learning (FML) inside hardware-rooted Trusted Execution Environments (TEEs), this architecture resolves the Data Sovereignty Paradox, demonstrating how fabs can export cryptographically signed compliance tokens via International Data Spaces (IDS) connectors without exposing proprietary process recipes. Ultimately, this framework provides technology managers with a verifiable, evidence-based pathway toward resilient, net-zero Industry 5.0 ecosystems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09227v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.CE</category>
      <category>cs.CY</category>
      <category>cs.HC</category>
      <category>cs.SI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Han-Teng Liao, Chang-Yi Kao, Karen Ang</dc:creator>
    </item>
    <item>
      <title>End-to-End Training for Discrete Token LLM based TTS System</title>
      <link>https://arxiv.org/abs/2606.09234</link>
      <description>arXiv:2606.09234v1 Announce Type: new 
Abstract: Recent state-of-the-art (SOTA) text-to-speech (TTS) systems typically adopt a cascaded pipeline consisting of a speech tokenizer, an autoregressive large language model (LLM), and a diffusion-based flow-matching (FM) model, with these components trained independently. In this paper, we propose a fully end-to-end (E2E) optimization framework that unifies the training of the speech tokenizer, LLM, FM model, and an additional reward model (RM). Specifically, we first jointly optimize the tokenizer using multi-task objectives derived from reconstruction for FM, next-token prediction for LLM, and multi recognition task for RM. This joint training encourages the discrete speech token space to capture acoustically and semantically salient information that is better tailored to TTS. We then further optimize the LLM using downstream reconstruction and recognition by FM and RM, which reduces inference-time mismatch and steers the LLM toward more preferred generations. Experimental results show that our E2E framework consistently outperforms cascaded baselines. On the Seed-TTS-Eval benchmark, our system achieves a word error rate (WER) of 0.78% and 1.56%, a new SOTA result with a 0.6B-parameter LLM and 0.5B-parameter FM model. These results validate that holistic E2E optimization is critical for improving discrete-token-based TTS systems with a much simpler training pipeline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09234v1</guid>
      <category>cs.SD</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Changfeng Gao, Yong Ren, Jun Yuan, Ye Bai, Zhao You, ShiDong Shang</dc:creator>
    </item>
    <item>
      <title>Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation</title>
      <link>https://arxiv.org/abs/2606.09236</link>
      <description>arXiv:2606.09236v1 Announce Type: new 
Abstract: Autonomous Racing has seen remarkable progress through deep Reinforcement Learning (RL), primarily for four-wheeled vehicles. However, motorbikes introduce substantially greater complexity due to the need to manage balance and lean angle, in addition to more reactive steering and throttle control and a lower weight. In this work, we present a framework for training an autonomous agent to race a superbike in VRider SBK, a physics-accurate Unity-based motorbike simulator. Our approach integrates Soft Actor-Critic (SAC) with Self-Paced curriculum Deep reinforcement Learning (SPDL), which dynamically generates progressively more challenging tasks based on the agent's performance, without requiring manual curriculum design. The agent's state space comprises proprioceptive features extended with lean-angle history, along with global track features via course points. The reward signal is shaped to encourage progress along the track while penalizing instability-inducing behaviors specific to two-wheeled dynamics. Preliminary experimental results demonstrate that SPDL outperforms SAC alone in training efficiency, lap time, and driving stability across multiple tracks and motorbike models, establishing a first baseline for RL-based autonomous motorbike racing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09236v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Luca Ghisi, Jacopo Essenziale, Carlo D'Eramo, Matteo Luperto</dc:creator>
    </item>
    <item>
      <title>Can we stabilize an inverted pendulum with feedback from a time-of-flight camera?</title>
      <link>https://arxiv.org/abs/2606.09237</link>
      <description>arXiv:2606.09237v1 Announce Type: new 
Abstract: Time-of-flight cameras are popular in robotics for providing direct depth information while being compact, inexpensive, and robust to lighting conditions, but their low spatial resolution and depth noise are widely believed to preclude precise feedback control. In this paper, we show that an inexpensive, low-resolution time-of-flight camera provides sufficient feedback to reliably and precisely balance an inverted pendulum on a cart--a canonical benchmark for fast, unstable dynamics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09237v1</guid>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anthony Czubarow, Antonio Terpin, Raffaello D'Andrea</dc:creator>
    </item>
    <item>
      <title>Orange Lab: Lowering Barriers to Data Mining through Embedded Interactive Workflows</title>
      <link>https://arxiv.org/abs/2606.09239</link>
      <description>arXiv:2606.09239v1 Announce Type: new 
Abstract: While visual programming of data analysis workflows has become an important vehicle for the democratization of data science, such systems remain largely confined to standalone applications and offer limited support for transitioning their visual analytics solutions into interactive web environments. As a result, data analysis pipelines are difficult to share, embed, and adapt into user-facing analytical tools. We present Orange Lab, a web-based collaborative environment for visual data analytics. At its core, Orange Lab enables users to visually construct machine learning workflows from modular components, where interactions in any component propagate seamlessly through the workflow, turning static pipelines into dynamic, reactive systems that support exploration and data-driven storytelling. Our key contribution is component exposition, a paradigm that allows authors to embed selected workflow components, or parts of their interfaces, into arbitrary web contexts, creating synchronized, interactive interfaces while hiding underlying workflow complexity. This enables the development of tailored analytical views and narrative-driven experiences that integrate data analysis directly into online materials. We demonstrate the approach through deployments in data literacy education, where embedded components guide students in hands-on exploration of machine learning concepts without requiring knowledge of the underlying system, showing that Orange Lab effectively lowers barriers to entry and supports the democratization of data science.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09239v1</guid>
      <category>cs.LG</category>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Matej Bevec, Aleš Erjavec, Vesna Tanko, Lena Trnovec, Lan Žagar, Ana Farič, Janez Demšar, Blaž Zupan</dc:creator>
    </item>
    <item>
      <title>Closing the Indexing-Decoding Gap in Multimodal Generative Retrieval via Prefix Retention Optimization</title>
      <link>https://arxiv.org/abs/2606.09241</link>
      <description>arXiv:2606.09241v1 Announce Type: new 
Abstract: Multimodal generative retrieval formulates multimodal retrieval as discrete identifier generation, eliminating the need for explicit similarity search over external embeddings. Existing approaches construct identifiers via residual quantization and decode them with trie-constrained beam search. This combination introduces an indexing-decoding gap: identifier learning objectives, including reconstruction and contrastive losses, do not explicitly enforce prefix discriminability during decoding. As a result, even well-optimized identifiers can be irreversibly pruned early in beam search due to low-rank prefixes. We theoretically characterize this gap and derive a survival bound that relates prefix retention to three controllable factors in indexing and decoding. Building on this bound, we propose PRO, prefix retention optimization, a unified framework comprising three mechanisms: (i) prefix ranking distillation aligns quantized prefix rankings with those induced by pre-quantization embeddings using a listwise loss; (ii) vocabulary scheduling increases codebook sizes from shallow to deep residual quantization levels to reduce early competition from non-target prefixes; and (iii) geometric score fusion vectorizes each candidate prefix and incorporates its similarity to the query into beam search scoring, further reducing the indexing-decoding mismatch. Experiments on nine multimodal retrieval tasks show that PRO improves retention of target identifier prefixes and outperforms existing multimodal generative retrieval baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09241v1</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yufei Chen, Zihan Wang, Yubao Tang, Yukun Zhao, Maarten de Rijke, Zhaochun Ren</dc:creator>
    </item>
    <item>
      <title>Conceptualising Reflective Use: Toward A Process Perspective On Human-AI Interaction</title>
      <link>https://arxiv.org/abs/2606.09242</link>
      <description>arXiv:2606.09242v1 Announce Type: new 
Abstract: The rapid diffusion of generative artificial intelligence (genAI) systems reshapes how individuals engage with information systems, requiring users to monitor, assess, and adapt their interaction with non-deterministic systems. Existing constructs capture elements of this engagement but do not account for the situated dynamics of the entire evaluative process in genAI use. This research-in-progress, situated in a larger endeavour towards a scale development, derives an initial conceptualisation of reflective use: a behavioural-knowledge capability that unfolds across pre-use, in-use, and post-use phases, reinforced through situated reflective knowledge gained in practice. Drawing on expert interviews and a focus group, we identify four core components of reflective use and show how they form an iterative capability cycle anchored within the motivational needs outlined in self-determination theory. Understanding reflective use is essential to ensure appropriate reliance and high decision quality, and thus provides a foundation for promoting responsible and effective human-AI interaction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09242v1</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Thimo Schulz, Christina Speck</dc:creator>
    </item>
    <item>
      <title>EgoTactile: Learning Grasp Pressure for Everyday Objects from Egocentric Video</title>
      <link>https://arxiv.org/abs/2606.09243</link>
      <description>arXiv:2606.09243v1 Announce Type: new 
Abstract: Estimating full-hand grasp pressure from egocentric video is critical for immersive VR and robotic manipulation, yet dense tactile sensing often relies on intrusive hardware. Existing vision-based methods predominantly rely on planar surfaces or fingertip contacts, failing to generalize to complex 3D object interactions. Therefore, we introduce EgoTactile, a benchmark pairing egocentric video with full-hand pressure supervision for diverse everyday objects, incorporating a bare-hand transfer subset to enable generalization to natural scenarios. Leveraging this benchmark, we first establish EgoPressureFormer as a discriminative baseline. Beyond this, to explicitly address the uncertainty in partial observations, we propose EgoPressureDiff, a conditional diffusion framework that adapts a large-scale pre-trained video diffusion backbone. By combining rich world knowledge priors with a Physically-Informed Feature Rectification layer to inject semantic constraints, our approach effectively infers plausible contact patterns and resolves visual-physical ambiguities. Extensive experiments demonstrate that our method achieves superior performance on the benchmark and robust transferability to in-the-wild scenarios. Our project page is available at https://egotactile.github.io/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09243v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuan Zeng, Yujia Shi, Tiao Tan, Xingting Li, Yaqi Qin, Zongqing Lu, Wenming Yang, Jing-Hao Xue, Qingmin Liao</dc:creator>
    </item>
    <item>
      <title>Proposal Refinement for Few-Shot Object Detection</title>
      <link>https://arxiv.org/abs/2606.09245</link>
      <description>arXiv:2606.09245v1 Announce Type: new 
Abstract: Few-shot object detection has gained wide attention in recent years. Some excellent algorithms have been proposed to handle this task. However, most of these algorithms rely on the performance of few-shot classification. Unlike previous attempts, our work focuses on the problem of unbalanced distribution of region proposals between the novel classes and the base classes. In order to alleviate this unbalanced distribution, we propose a proposal refinement approach for different training phases. Specifically, a refinement loss is designed for the base training phase to enhance the sensitivity of the model to novel classes, and a refinement branch is introduced as an auxiliary branch for the RPN (Region Proposal Network) to generate more novel proposals in the fine-tuning phase. By rebalancing the proposal distribution, the proposed approach outperforms the baseline methods by roughly 1% to 6% on current benchmarks without increasing inference time. Through extensive experiments, we demonstrate that our method establishes a new state of the art for the few-shot object detection task.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09245v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuan Zeng, Bin Song, Jie Guo, Yuwen Chen</dc:creator>
    </item>
    <item>
      <title>SOMA: From Surface Observations to Muscle Anatomy</title>
      <link>https://arxiv.org/abs/2606.09246</link>
      <description>arXiv:2606.09246v1 Announce Type: new 
Abstract: With the growing demand for realistic virtual humans, parametric body models have become a cornerstone of modern medicine, sports, and entertainment applications. However, most of these models are inherently limited: they only capture the 3D surface of the skin, offering no insight into the complex bio-mechanical structures that generate motion. As more applications expand towards biomechanics, the need for virtual human models that go beyond the skin has become increasingly evident. Traditional soft-tissue simulations, such as FEM, are accurate but non-scalable and too computationally expensive for most common applications. Alternatively, existing biomechanical tools can simulate muscular forces and activations, but do not model changes in external shape, restricting how activations correlate with actual observable anatomy. This motivates a novel inverse research problem: recovering muscle deformations directly from visible surface observations - i.e., from the skin, and thus the pose. In this work, we present SOMA (from Surface Observations to Muscle Anatomy), a person-specific model that infers spatio-temporal muscle behavior from surface signals obtained using RGB cameras, and SKIM, a subject-specific soft-tissue deformation dataset. To the best of our knowledge, this is the first method that attempts to recover muscle deformations from multi-view RGB data. We show how our method provides anatomically grounded animations without the complexity of traditional simulations, leading to a scalable and cost-effective solution. Data and code are available.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09246v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Eduardo Alvarado, Emily Kim, Gerrit Nolte, Friedemann Runte, Mario Botsch, Marc Habermann, Christian Theobalt</dc:creator>
    </item>
    <item>
      <title>Temporal-Aware Reasoning Optimization for Video Temporal Grounding</title>
      <link>https://arxiv.org/abs/2606.09248</link>
      <description>arXiv:2606.09248v1 Announce Type: new 
Abstract: Multi-modal Large Language Models (MLLMs) have achieved remarkable progress in video temporal grounding with reinforcement learning for generating reasoning paths. However, existing models often produce superficial reasoning, which offers limited guidance for precise temporal localization. This limitation stems from (1) inefficient random exploration and (2) reward functions that focus solely on answer correctness while ignoring reasoning quality. To address these issues, we propose TaRO (Temporal-Aware Reasoning Optimization), a framework that explicitly enhances the model's ability to think with time. First, we introduce a Constructive Reasoning Exploration that leverages pre-generated dense captions to construct reasoning paths grounded in explicit visual cues and timestamps, enabling efficient exploration of high-quality time-aware reasoning. Second, to evaluate reasoning quality, we design a Temporal-Sensitivity Reward. High-quality reasoning should be anchored to specific events and timestamps. If the event boundary under thinking is disrupted, such reasoning should become invalid, leading to a drop in the logit of the reasoning path. We utilize this drop as a critique of reasoning quality. Finally, TaRO follows a progressive curriculum, which starts by utilizing this reward to select better-constructed reasoning paths, and evolves to a free exploration phase where the model autonomously generates effective reasoning. Experiments demonstrate that TaRO achieves state-of-the-art performance on VTG benchmarks. Code is available at https://github.com/oceanflowlab/TaRO.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09248v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Minghang Zheng, Zihao Yin, Yi Yang, Yuxin Peng, Yang Liu</dc:creator>
    </item>
    <item>
      <title>MAGIS: Evidence-Based Multi-Agent Reasoning for Interpretable Strabismus Clinical Decision-Making</title>
      <link>https://arxiv.org/abs/2606.09249</link>
      <description>arXiv:2606.09249v1 Announce Type: new 
Abstract: Strabismus is a common ocular disorder that requires fine-grained subtype diagnosis for individualized treatment planning. However, existing deep learning methods mainly provide diagnostic predictions without transparent reasoning, while recent large vision-language models (LVLMs), although promising for joint image understanding and report generation, remain highly prone to hallucination in this evidence-sensitive and rule-driven medical task. To address these challenges, we propose MAGIS, an evidence-based Multi-AGent reasoning for Interpretable Strabismus diagnosis framework. MAGIS transforms black-box end-to-end generation into a structured diagnostic process consisting of candidate hypothesis generation, dual-evidence constrained context, evidence-based corrective verification, and report generation. Specifically, we introduce a Dual-Evidence Constrained Context (DECC) mechanism that jointly organizes visual evidence from the photograph of the nine cardinal positions of gaze and evidence-based clinical diagnostic rules into a constrained context for reliable diagnostic reasoning. We further develop an Evidence-Based Corrective Verification (EBCV) mechanism that verifies whether the current diagnostic hypothesis is supported by visual evidence, heatmap-based visual cues, and evidence-based clinical diagnostic rules. Hypothesis refinement is triggered when inconsistency is detected. Experiments on a fine-grained strabismus benchmark demonstrate that MAGIS not only significantly outperforms other state-of-the-art diagnostic systems, improving the weighted F1 score from 72.0% to 91.3%, but also substantially improves the clinical reliability (consistency, alignment, and completeness) of generated diagnostic reports. These results demonstrate that MAGIS provides an effective solution for building accurate, evidence-based, and clinically interpretable strabismus diagnosis systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09249v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xikai Tang, Yifan Wang, Jiafan Zhuang, Li Luo, Jinming Guo, Xiaoling Xie, Jiacheng Liu, Peiwei Wei, Lihao Zhong, Xiaoli Kang, Jie Cen, Guangqiang Yin, Kunliang Qiu, Ce Zheng, Zhun Fan</dc:creator>
    </item>
    <item>
      <title>LiteVSR: Lightweight Adaptation of Frozen Diffusion Transformers for Video Super-Resolution</title>
      <link>https://arxiv.org/abs/2606.09250</link>
      <description>arXiv:2606.09250v1 Announce Type: new 
Abstract: Adapting large-scale pre-trained video generators for Video Super-Resolution (VSR) in novel domains remains computationally prohibitive. Methods that reformulate generation as direct Low-Quality to High-Quality mappings deviate from the original generative formulation, demanding extensive fine-tuning. ControlNet-style adapters lose their efficiency under modern Diffusion Transformers since the absence of an encoder-decoder hierarchy forces duplication of the entire backbone. We observe that flow matching offers a principled alternative for cross-domain VSR adaptation. By predicting a constant velocity field across all timesteps, the adaptation task reduces to learning a fixed injection pattern rather than time-varying transformations. Building on this insight, we propose LiteVSR, a minimalist framework that performs VSR using a completely frozen Diffusion Transformer with a lightweight State-Aware Adapter. The adapter employs a dual-stream architecture that extracts static structural cues from the LQ input and dynamic cues from intermediate denoising states, aligning them through time-dependent cross-attention to enable adaptive transition from structural alignment to texture refinement as denoising proceeds. LiteVSR achieves competitive restoration quality with only 11.25% trainable parameters and 12 GPU-hours of training on a single A100, while remaining compatible with fast sampling (down to a single step).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09250v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yu Cao, Ziquan Liu, Zhensong Zhang, Jiankang Deng, Shaogang Gong, Jifei Song</dc:creator>
    </item>
    <item>
      <title>TruthSplit: Operationalizing Conditional Validity in Arguments Through Multi-Perspective Reasoning</title>
      <link>https://arxiv.org/abs/2606.09251</link>
      <description>arXiv:2606.09251v1 Announce Type: new 
Abstract: We present TruthSplit, an interactive system for multi-perspective argument analysis. Existing argumentation tools typically analyze properties of the argument itself, such as structure, quality, stance, or persuasiveness, while leaving perspective-specific background knowledge implicit. TruthSplit addresses this gap by supporting an exploratory analysis of how the same claim can lead to different conclusions when interpreted through worldview-specific values, assumptions, and conceptual definitions. We refer to this perspective-dependent analysis as conditional validity. Given an input argumentative text, TruthSplit extracts claims and premises, applies a three-layer natural language inference (NLI) approach to assess both logical and worldview-specific normative consistency, and conditions large language model (LLM) reasoning on structured worldview profiles that encode core values and decision principles. The system then generates perspective-specific interpretations, identifies value conflicts and assumption gaps, and visualizes divergence through interactive analytical interfaces.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09251v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Benjamin Stieger, Maximilian Terberger, Thomas Huber, Christina Niklaus</dc:creator>
    </item>
    <item>
      <title>A practical probabilistic framework for deformable image registration uncertainty in radiotherapy dose propagation</title>
      <link>https://arxiv.org/abs/2606.09253</link>
      <description>arXiv:2606.09253v1 Announce Type: new 
Abstract: Deformable image registration (DIR) is widely used in radiotherapy for dose propagation and accumulation, but uncertainty in the underlying deformation can substantially affect clinically relevant dose estimates. We present a practical probabilistic framework for propagating DIR uncertainty to voxel-wise dose statistics and dose-volume histograms (DVHs). The method models the mapped correspondence at each voxel as a random variable governed by a transparent local certainty map that can be defined by simple safety margins, structure-boundary mismatch, or structure-wise conservative uncertainty values. This yields interpretable quantities such as dose probabilities, expected dose, confidence bounds, and induced DVH envelopes.
  The framework is designed to remain lightweight and interpretable: it avoids complex biomechanical or ensemble-based uncertainty models and instead emphasizes simple parameterization, computational feasibility, and transparent dose metrics. We further introduce a structure-guided in/out strategy as an optional refinement that restricts mapping probabilities to anatomically plausible target regions. The approach is demonstrated on a prostate radiotherapy case study and used to compare different certainty-map strategies and probability kernels. The experiments show that the certainty-map design has a stronger effect on resulting dose and DVH uncertainty bounds than the specific kernel choice, while the additional benefit of the in/out strategy is case-dependent and modest in the present example. Overall, the proposed framework provides a transparent way to incorporate DIR uncertainty into radiotherapy dose assessment and to study how modelling choices affect propagated dose metrics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09253v1</guid>
      <category>cs.CV</category>
      <category>physics.med-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Stefan Heldmann, Sven Kuckertz, Nasim Givehchi, Thomas Coradi, Mikel Byrne, Ben Archibald-Heeren, Nils Papenberg</dc:creator>
    </item>
    <item>
      <title>RPO-PDT: Demonstrating Role-Play-Based Knowledge Adaptation for Student Support Dialogue (Demonstration System)</title>
      <link>https://arxiv.org/abs/2606.09255</link>
      <description>arXiv:2606.09255v1 Announce Type: new 
Abstract: We present RPO-PDT: a retrieval-grounded, role-play-based dialogue system for adaptive student support in higher education. RPO-PDT is: (1) able to provide institution-specific Personal Development Tutor (PDT) guidance using structured knowledge sources; (2) constrained by explicit persona, boundary, confidentiality, and safety policies; and (3) designed around a reverse-roleplay loop where unresolved interactions are replayed from the student perspective, enabling alternative tutor strategies to be generated and stored as reusable strategy memory. RPO-PDT supports both text-based and Furhat-based embodied interaction for demonstrating grounded, safe, and adaptive student-support dialogue.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09255v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Filip Janik, Ewa Olton, Robert Smales, Harris Spratt, Shea Tait, Md Zia Ullah, Yanchao Yu</dc:creator>
    </item>
    <item>
      <title>BSTabDiff: Block-Subunit Diffusion Priors for High-Dimensional Tabular Data Generation</title>
      <link>https://arxiv.org/abs/2606.09257</link>
      <description>arXiv:2606.09257v1 Announce Type: new 
Abstract: High-Dimensional Low-Sample Size (HDLSS) tabular domains (e.g., omics) are characterized by $n \ll m$, where $n$ = number of samples, and $m$ = number of features. Such domains often exhibit strong local correlation groups, sparse cross-group dependencies, heavy-tailed non-Gaussian marginals, heteroscedastic noise, and structured missingness, making direct density learning in $\mathbb{R}^m$ ill-conditioned since $n \ll m$. We propose BSTabDiff, a block-subunit generative framework that partitions the $m$ observed features into $M$ latent blocks ($M \ll m$) and generates each block via a shared low-dimensional subunit variable, concentrating global dependence learning in the compact block-latent space $\mathbb{R}^M$ while decoding to the full feature space with copula-driven dependence, flexible per-feature marginals, and explicit missingness mechanisms. BSTabDiff supports modern deep priors on block latents, including diffusion and normalizing flows, enabling stable synthesis and controllable benchmark generation in the HDLSS regime. Empirically, BSTabDiff produces more realistic and stable high-dimensional synthetic data when compared with unstructured tabular generators on HDLSS data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09257v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Al Zadid Sultan Bin Habib, Md Younus Ahamed, Prashnna Gyawali, Gianfranco Doretto, Donald A. Adjeroh</dc:creator>
    </item>
    <item>
      <title>Back to the Familiar Future: Failure Recovery for VLA Policies via Pre-Imagined Milestone Selection</title>
      <link>https://arxiv.org/abs/2606.09258</link>
      <description>arXiv:2606.09258v1 Announce Type: new 
Abstract: Vision-language-action (VLA) policies can deviate from nominal trajectories during manipulation, even when tasks remain physically feasible. Recovering from these deviations is challenging, as they push the policy into unfamiliar state spaces where direct re-planning frequently destabilizes action sequences. We propose Back to the Familiar Future (B2FF), a recovery framework for foresight-driven VLAs that leverages future visual conditioning as a recovery interface. Before execution, the VLA generates a milestone bank of familiar future states conditioned on the clean initial observation. At recovery time, a recoverability-aware selector selects a recovery milestone from this bank and enforces it as a fixed visual goal. This enables the VLA to robustly map off-trajectory observations back to a familiar future. On failure-injected LIBERO, under controlled recovery timing aligned with the injected failure, B2FF increases the average success rate of a baseline VLA from 56.3% to 74.0%, demonstrating that pre-imagined milestones can guide recovery without fine-tuning the low-level action generator.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09258v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Suyeon Shin, Juwon Kim, Hyeonbin Park, Hyunseo Kim, Hyundo Lee, Hyung-Sin Kim, Byoung-Tak Zhang</dc:creator>
    </item>
    <item>
      <title>Self-supervised Learning Matters: A Simple Ensemble Solution for Micro-Gesture Recognition</title>
      <link>https://arxiv.org/abs/2606.09261</link>
      <description>arXiv:2606.09261v1 Announce Type: new 
Abstract: In this paper, we present XInsight Lab's solution to the micro-gesture classification track of the 4th MiGA Challenge at IJCAI 2026, in which our solution ranked first and achieved a new state-of-the-art result. We propose a multimodal ensemble framework that integrates a self-supervised RGB-based model with supervised multi-stream models from previous solutions. The self-supervised RGB model is pretrained on 120K unlabeled clips via masked video modeling and then fine-tuned on iMiGUE. This simple yet effective RGB baseline achieves 69.224% top-1 accuracy on the iMiGUE test set, demonstrating the benefit of learning transferable representations from unlabeled in-domain videos. By incorporating this model as a complementary branch, the final ensemble reaches 74.419% top-1 accuracy, surpassing the previous state of the art by 1.206 percentage points. Experimental results on iMiGUE, including ablation studies on the ensemble strategy, validate the effectiveness of self-supervised RGB representation learning for micro-gesture recognition.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09261v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tingyi Liu, Kun Li, Fei Wang, Junjie Chen, Zhiliang Wu, Jihao Gu, Haixu Liu, Dan Guo</dc:creator>
    </item>
    <item>
      <title>See More, Match Better: Multi-Source Feature Fusion for Two-View Correspondence Learning</title>
      <link>https://arxiv.org/abs/2606.09262</link>
      <description>arXiv:2606.09262v1 Announce Type: new 
Abstract: Two-view correspondence learning aims to distinguish true correspondences (inliers) from false ones (outliers) in image pairs by leveraging their underlying differences. Existing methods mainly rely on coordinate-based geometric consistency. However, they often struggle with pseudo-consistent outliers in scenes containing repetitive structures, textureless regions, or locally similar geometric patterns. To address this limitation, we propose TriMatch, a multi-source feature fusion framework for two-view correspondence learning, which consists of two parts: feature extraction and feature refinement. In feature extraction, TriMatch jointly extracts geometric, texture semantic, and structural semantic features to provide complementary evidence for correspondence discrimination. To bridge the gap between semantic and geometric features, texture and structural semantic features are aligned with geometric features through dedicated Texture-Geometric Alignment and Structural-Geometric Alignment modules, respectively. We further introduce a Semantic-Guided Correspondence Modulation module, which modulates geometric features using semantic information to suppress geometrically plausible but semantically inconsistent correspondences. In feature refinement, a Hierarchical Semantic-Enhanced Correspondence Refinement strategy progressively models correspondence dependencies and recalibrates multi-context feature responses, enabling more reliable inlier-outlier discrimination. Extensive experiments demonstrate the effectiveness, robustness, and generalization capability of TriMatch.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09262v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaojie Li, Xin Jiang, Luanyuan Dai, Jinnan Yang, Yongdong Zhang, Zechao Li</dc:creator>
    </item>
    <item>
      <title>Physics-Guided Sequence-Based Generative Framework for Acoustic Metamaterial Inverse Design</title>
      <link>https://arxiv.org/abs/2606.09266</link>
      <description>arXiv:2606.09266v1 Announce Type: new 
Abstract: Acoustic metamaterial (AMM) inverse design is particularly challenging for broadband target responses due to acoustic dispersion: a structure that matches the desired response at one frequency may deviate at others, and modifying geometry to improve one sub-band often perturbs neighboring sub-bands. Yet existing broadband inverse-design approaches are either constrained by predefined templates, or rely on image representations that fail to preserve the geometric precision and structural connectivity required by acoustic structures. We present MetaSeq, a physics-guided, sequence-based generative framework for acoustic metamaterial inverse design. At its core, MetaSeq introduces a language that represents each AMM as a structured sequence, rather than as a pixel grid or fixed template. This representation preserves precise geometry, explicitly encodes connectivity, and casts inverse design as a sequence-to-sequence task from target response to structure sequence. MetaSeq further constructs a balanced, high-fidelity dataset with efficient calibration and complexity-based sampling. To address the one-to-many nature of inverse design, MetaSeq combines supervised pretraining with reinforcement learning fine-tuning guided by a physics-based solver and validity checker. Extensive evaluations against COMSOL and five baselines show that MetaSeq reduces response error by 45% over the best baseline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09266v1</guid>
      <category>cs.SD</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Yijie Li, Jiahao Xu, Ching-Chih Tsao, Lili Qiu, Jingxian Wang</dc:creator>
    </item>
    <item>
      <title>VGP-Nav: Metric-Aware Visual Geometric Perception for Robot Navigation</title>
      <link>https://arxiv.org/abs/2606.09268</link>
      <description>arXiv:2606.09268v1 Announce Type: new 
Abstract: Reliable robotic navigation necessitates the seamless integration of accurate global localization and dense, metric-consistent obstacle perception. A common strategy to achieve these capabilities involves integrating diverse sensing modalities: cameras offer rich visual features for localization, while active sensors like LiDAR provide direct metric measurements. However, such multi-sensor configurations necessitate complex spatial-temporal calibration and increase deployment overhead. Although vision-only approaches offer a low-cost and scalable alternative, existing monocular visual systems typically struggle to simultaneously achieve efficient, globally consistent localization and dense, metric-consistent geometric perception. To bridge this gap, we propose VGP-Nav, a unified framework for Metric-Aware Visual Geometric Perception that relies solely on monocular RGB input to jointly support metric localization and obstacle perception. Our key insight is to anchor localization-grounded visual geometry to physically meaningful scale constraints derived from ground-plane geometry, thereby providing a reliable metric reference for monocular perception. VGP-Nav resolves monocular scale ambiguity online and produces localization-grounded, metric obstacle representations that are directly applicable to downstream planning. Extensive experiments demonstrate strong generalization across diverse environments and successful deployment on real mobile robots, highlighting the practicality of our approach for scalable, low-cost, and safe autonomous navigation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09268v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hewei Pan, Weiye Zhu, Zekai Zhang, Zitong Huang, Rongtao Xu, Jinbao Wang, Feng Zheng</dc:creator>
    </item>
    <item>
      <title>Multi-View Speech Representation Learning for Parkinson's Disease Detection Using Context-guided Cross-modal Attention</title>
      <link>https://arxiv.org/abs/2606.09271</link>
      <description>arXiv:2606.09271v1 Announce Type: new 
Abstract: Parkinson's disease (PD) is a progressive neurodegenerative disorder that frequently causes speech impairments associated with hypokinetic dysarthria. As speech production relies on the precise coordination of complex neuromuscular mechanisms, speech analysis has emerged as a promising non-invasive and cost-effective biomarker for early PD detection. Recent deep learning approaches have shown encouraging results; however, most existing methods rely on a single speech representation, potentially overlooking complementary pathological information encoded across different feature spaces. In this work, we propose a multi-branch deep learning framework for automatic PD detection from speech. Each recording is segmented into 5-second chunks and represented using three complementary modalities: Log-Mel spectrograms, MFCCs, and HuBERT embeddings extracted from raw waveforms. The spectrograms are processed using a pre-trained ResNet-18 encoder, MFCC sequences are modeled through a BiLSTM network, and raw speech is encoded using a pre-trained HuBERT model. To effectively integrate these heterogeneous representations, we introduce a context-guided cross-modal attention mechanism that dynamically weights temporal HuBERT embeddings according to the global acoustic context derived from the spectrogram and MFCC branches. Experiments conducted on the publicly available Spanish PC-GITA corpus under strict speaker-independent 5-fold cross-validation demonstrate the effectiveness of the proposed approach. The proposed architecture achieves an accuracy of 91.51%, an F1-score of 91.24%, and an AUC of 95.97%. Furthermore, ablation studies confirm the contribution of both the proposed context-guided cross-modal attention mechanism and the integration of complementary speech representations. These findings highlight the potential of heterogeneous speech modeling for robust and clinically reliable PD detection.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09271v1</guid>
      <category>cs.SD</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>George Theodosiou, Loukas Ilias, Dimitris Askounis</dc:creator>
    </item>
    <item>
      <title>EditSSC: Toward Editable Semantic Occupancy Scenes with Unconditional Diffusion Models</title>
      <link>https://arxiv.org/abs/2606.09273</link>
      <description>arXiv:2606.09273v1 Announce Type: new 
Abstract: 3D semantic scene generation is crucial for autonomous driving applications, yet most methods rely on complex 3D-specific architectures such as triplane encoders and adapted diffusion networks, limiting both their simplicity and their editing capabilities. We propose EditSSC, an editing-ready method for 3D semantic scene generation using 2D Bird's Eye View (BEV) representations and an off-the-shelf latent diffusion network. Our approach reshapes 3D semantic occupancy grids into multi-channel BEV images and leverages the quantized autoencoder and UNet from Stable Diffusion with minimal modifications. We perform diffusion on the latents after quantization, which enables training-free editing capabilities. By exploiting class-to-code correspondences in the codebook, our method supports sketch-guided generation, inpainting, and outpainting without any retraining. On SemanticKITTI, EditSSC outperforms existing 3D-specific baselines on unconditional generation, demonstrating that well-established 2D architectures can be effectively repurposed for 3D scene generation and editing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09273v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Fatima Balde, Raoul de Charette, Alexandre Boulch</dc:creator>
    </item>
    <item>
      <title>An implicit octree-based adaptive Material Point Method</title>
      <link>https://arxiv.org/abs/2606.09275</link>
      <description>arXiv:2606.09275v1 Announce Type: new 
Abstract: The Material Point Method provides an effective approach for modelling the large deformations that often arise from contact interactions between rigid structures and surrounding continua. However, solving these problems requires accurate representation of the continuum-structure interface, which necessitates high-resolution background mesh and material point discretisations. This requirement, combined with evolving continuum-structure interfaces and the fact that most Material Point Method implementations are dependent on structured meshes, can result in large numerical systems and long run times, especially when modelling problems in three dimensions. Motivated by this issue, this paper provides the first octree-based implicit Material Point Method for efficient solution of large deformation continuum-structure interaction problems. The octree background mesh provides a natural way to automatically adapt both the computational mesh and the material point discretisation based on the position of the interaction between the structure and continuum. The new approach is demonstrated on a number of large deformation benchmark and continuum-structure interaction problems, where up to a 5.5-times speed up and a consequent 21-times CO2 saving is achieved when running on an HPC system compared to results obtained using a conforming mesh.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09275v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Robert E. Bird, William M. Coombs, Charles E. Augarde, Ted J. O'Hare</dc:creator>
    </item>
    <item>
      <title>ERBench: A Benchmark and Testsuite for Equation Discovery Algorithms</title>
      <link>https://arxiv.org/abs/2606.09276</link>
      <description>arXiv:2606.09276v1 Announce Type: new 
Abstract: Equation discovery aims to automate the discovery of scientific models in the form of mathematical equations from data. Technically, equation discovery is implemented by symbolic regression algorithms. Performance of symbolic regression for equation discovery is measured along two dimensions: prediction accuracy on test data, and recovery of known ground-truth formulas. For standard regression, accuracy is typically measured on in-domain test data, for instance, by splitting a data set randomly into training and test data. While this makes sense for in-domain interpolation, which is the common goal in ordinary regression, it can be a misleading proxy for true model discovery and generalization. The obvious alternative is to measure out-of-domain accuracy. However, obtaining challenging out-of-domain test data is a non-trivial problem. Therefore, we focus on equation recovery for evaluating symbolic regression algorithms for equation discovery. The rationale is that symbolic regression algorithms that perform well in recovering known ground-truth formulas are good candidates to perform well in unknown equation discovery. Existing benchmarks for symbolic regression include equation recovery tasks, but with only a small number of ground-truth formulas that are publicly known. Moreover, these benchmarks place less emphasis on evaluating the robustness of algorithms in terms of their behavior under changing dimensionality, sampling size, sampling distribution and sampling domain. This, however, is of central importance to practitioners wanting to discover equations for modeling natural phenomena, since data is almost certainly noisy and comes from diverse domains, distributions, and sample sizes. To fill this gap, we introduce the Equation Recovery Benchmark (ERBench), a new evaluation framework designed to rigorously assess algorithms explicitly targeting the task of equation discovery.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09276v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Paul Kahlmeyer, Henrik Voigt, Michael Habeck, Joachim Giesen</dc:creator>
    </item>
    <item>
      <title>Internalizing Geometric Law: Learning from Solver Residuals for Precision-Critical Generation</title>
      <link>https://arxiv.org/abs/2606.09278</link>
      <description>arXiv:2606.09278v1 Announce Type: new 
Abstract: Large Language Models frequently hallucinate in precision-critical domains such as technical diagramming and mechanical design, where outputs must satisfy strict geometric constraints. We study open-ended geometric synthesis from natural language: translating free-form descriptions into precise constructions whose entities must simultaneously satisfy dozens of interacting constraints. To make this tractable, we release PyGeoX, a programmable geometric DSL that compiles declarative constraints into a differentiable loss, and PyGeoX-Bench, a stratified suite of 300 problems with per-constraint verifiable rewards. Using PyGeoX as a verifier, we identify a failure mode we call Outlier Gradient Masking: under global-norm rewards (any scheme that aggregates residuals through a single norm, for example, $\exp(-\mathrm{MSE})$), a single outlier constraint can nullify the learning signal across all others. To address this, we propose Saturating Additive Rewards (SAR), which decompose the reward into bounded per-constraint terms, preserving partial progress and ensuring consistent gradients even under severe violations. Against MSE-based rewards, the natural baseline for geometry solvers, SAR improves the hard-tier solving rate by $2.3\times$, and the resulting 8B model is competitive with much larger frontier systems on this benchmark. We release the engine, benchmark, and data at https://github.com/Huawei-AI4Math/PyGeoX.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09278v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rafael Cabral, Pang Zixi, Ziyi Shou, Shen Xin</dc:creator>
    </item>
    <item>
      <title>Bridging nanoparticle morphology and viscoelastic behavior in epoxy nanocomposites: A coarse-grained simulation-informed constitutive model</title>
      <link>https://arxiv.org/abs/2606.09279</link>
      <description>arXiv:2606.09279v1 Announce Type: new 
Abstract: Accurate prediction of the material behavior of polymer nanocomposites under various thermomechanical loading conditions is increasingly demanded for engineering applications. This study proposes an integrated framework combining coarse-grained (CG) molecular simulations and experimental testing to develop predictive constitutive models for nanoparticle/epoxy nanocomposites. The key contribution of this work lies in characterizing the influence of nanoparticle content and agglomerate size on the rate- and temperature-dependent behavior of nanocomposites, enabled by large-scale CG simulations. The proposed framework successfully captures the material response, including nonlinear hyperelasticity, softening behavior, and rate- and temperature-dependent properties, across a broad range of strain rates, temperatures, and nanoparticle sizes and weight fractions. The predictive capability of the CG simulation-informed constitutive model is validated using additional experimental data that were not included in the parameter identification process. By reducing reliance on extensive experimental testing while maintaining high accuracy, this simulation-driven approach offers an efficient pathway for developing robust, predictive constitutive models for designing and optimizing advanced nanocomposites.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09279v1</guid>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Atiyeh Hentea, Shadab Zakavatib, Behrouz Arash, Maximilian Jux, Raimund Rolfes</dc:creator>
    </item>
    <item>
      <title>Revisiting mesoscopic traffic flow simulation in SUMO: Limitations, analysis, and an alternative</title>
      <link>https://arxiv.org/abs/2606.09282</link>
      <description>arXiv:2606.09282v1 Announce Type: new 
Abstract: Mesoscopic traffic flow models combine the merits of both macroscopic and microscopic models by capturing individual vehicle behavior in great detail while retaining computational efficiency. At the time of this study, the mesoscopic model proposed by Eissfeldt (2004) is used in Simulation of Urban MObility (SUMO). The movement of vehicles is governed by dynamic headways between edges. However, the model does not fully comply with the principle of the Lighthill-Whitham-Richards (LWR) model. Several problems are identified, including the incomplete consideration of queue dynamics and the limited implementation of backward traveling spaces. Two case study scenarios demonstrate that the problems lead to unrealistic onset and recovery patterns of congestion. The magnitude of congestion is generally underestimated with this model. To address these drawbacks, a proper mesoscopic discrete-time implementation of the link transmission model, which follows the LWR principle, is proposed. By explicitly incorporating backward traveling spaces to capture queue spillback phenomena, the proposed model provides a more precise representation of congestion dynamics. The link density outputs are consistent with the kinematic wave theory and the microscopic traffic simulation in SUMO, thus verifying its theoretical accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09282v1</guid>
      <category>eess.SY</category>
      <category>cs.MA</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ying-Chuan Ni, Alina Akopian, Anastasios Kouvelas, Michail A. Makridis</dc:creator>
    </item>
    <item>
      <title>VAIC: Vision-Guided Humanoid Agile Object Interaction Control via Decoupled Commands</title>
      <link>https://arxiv.org/abs/2606.09286</link>
      <description>arXiv:2606.09286v1 Announce Type: new 
Abstract: Humanoid robots hold immense potential for real-world assistance, yet agile interaction with objects in unstructured environments demands tightly coupled whole-body coordination. Despite recent advancements, current controllers face a critical deployment gap. They rely heavily on dense reference trajectories and perfect state observability, which inherently limits physical generalization. We present Vision Guided Agile Interaction Control (VAIC), a unified framework that bridges this gap by operating exclusively on onboard depth, historical proprioception, and a decoupled user command interface. VAIC employs a two-stage distillation paradigm. First, a privileged teacher policy masters diverse interaction skills using precise object kinematics and exact environmental states. Second, a deployable student policy distills these capabilities by replacing full body tracking with velocity targets across multiple axes and an interaction indicator for each frame. The student utilizes a recurrent object adaptation module to implicitly infer unobservable object dynamics from raw depth streams and proprioception. Evaluations and real-world deployments on the humanoid robot demonstrate that a single VAIC policy successfully executes highly diverse dynamic tasks. These tasks include box carrying, cart interaction, and skateboarding, consistently outperforming baselines and advancing autonomous humanoid deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09286v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dongting Li, Qianyang Wu, Xingyu Chen, Liang Li, Yuhang Lin, Sikai Wu, Guoyao Zhang, Mingliang Zhou, Diyun Xiang, Qiang Zhang, Renjing Xu, Jianzhu Ma</dc:creator>
    </item>
    <item>
      <title>Trajectory Geometry of Transformer Representations Across Layers</title>
      <link>https://arxiv.org/abs/2606.09287</link>
      <description>arXiv:2606.09287v1 Announce Type: new 
Abstract: Understanding how transformer representations evolve across layers, not merely what they encode, remains an open problem in mechanistic interpretability. We recast the transformer forward pass as a discrete population trajectory through a high-dimensional representation manifold, drawing on geometric tools from computational neuroscience. Rather than probing for pre-specified features, we characterize trajectory geometry using five metrics computed directly in the ambient space: trajectory length, curvature, a semantic convergence index, layerwise cosine similarity, and representational stability. Across three model families (GPT-2, TinyLlama, Qwen2.5) and five controlled prompt families, we report four findings. First, semantically related prompts converge significantly in middle-to-late layers (peak CI 0.41--0.58, p&lt;0.001, Mann-Whitney U), consistent with attractor-like dynamics. Second, reasoning tasks produce trajectories of greater curvature than lexical variations (0.71--0.83 rad vs. 0.27--0.31 rad), suggesting curvature encodes computational complexity. Third, ambiguous tokens exhibit trajectory bifurcation with up to 5.6x representational separation by the final layer, absent in unambiguous controls. Fourth, layerwise cosine similarity reveals a universal three-phase structure: encoding, elaboration, and output preparation, consistent across all three architectures. All four effects vanish under shuffled-layer and random-embedding controls. We release a fully open-source, model-agnostic pipeline and argue that trajectory geometry constitutes a principled, probe-free lens for mechanistic interpretability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09287v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vishal Pandey, Gopal Singh</dc:creator>
    </item>
    <item>
      <title>Intention Driven Identification of In-Possession Match Phases in Association Football through Temporal Graph Learning</title>
      <link>https://arxiv.org/abs/2606.09289</link>
      <description>arXiv:2606.09289v1 Announce Type: new 
Abstract: Understanding tactical organisation of association football, hereafter referred to as football, requires identifying distinct match phases. Yet in-possession phases are rarely directly observable and are shaped by evolving tactical intentions, rather than spatial patterns alone. This study proposes a data-driven framework for identifying in-possession match phases from spatiotemporal tracking data. Seven German Bundesliga matches recorded at 25 Hz with TRACAB were analysed. A hierarchical phase model was defined with three tactical intentions (Invade Opponent Space, Keep Possession, Scoring) and six phases (Build Up, Progression, Counter Attack, Maintenance, Sustained Threat, Finishing). A Temporal Graph Attention Network (T-GAN) was developed to combine frame-level player-interaction graphs, contextual features, and Transformer-based temporal modelling. Performance was evaluated using frame-level F1 and a sequence-aware Intersection over Truth-Dominance (IoT-D) metric. T-GAN achieved macro-average frame-level F1 scores of 0.87 at the intention level, 0.76 for invasion-related phases, and 0.79 for scoring phases. At the sequence level, mean diagonal IoT-D F1 increased from 0.68 to 0.79 for intentions and from 0.61 to 0.71 for phases after post-processing, indicating improved temporal coherence. Model comparisons showed that sequence modelling was the main driver of segmentation quality, while graph-based relational modelling was particularly beneficial for Counter Attack recognition. Exploratory player attention analysis further suggested that wide and midfield positional groups contributed strongly to phase discrimination. Overall, the framework translates continuous tracking data into tactically interpretable in-possession phase representations, with potential applications in automated match annotation, tactical analysis, and playing-style profiling.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09289v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuesen Li, Daniel Link</dc:creator>
    </item>
    <item>
      <title>Visual Para-Thinker++: A Single-Policy Multi-Agent Framework for Visual Reasoning</title>
      <link>https://arxiv.org/abs/2606.09290</link>
      <description>arXiv:2606.09290v1 Announce Type: new 
Abstract: Visual reasoning requires integrating evidence distributed across regions, attributes, and relations, making single-chain reasoning prone to early perceptual commitment and hallucination. We propose Visual Para-Thinker++, a single-policy multi-agent framework in which one shared MLLM policy is instantiated as role-conditioned Main, Worker, and Summary Agents. The Main Agent decomposes the task with fixed allocation patterns; Worker Agents reason in parallel under context isolation; and the Summary Agent reconciles full Worker reasoning traces rather than majority-voting on final labels. The shared policy is trained by Multi-Agent Capability Injection and Role-Decoupled Multi-Agent Optimization, which assign role-specific rewards and advantages to corresponding token segments to reduce gradient conflict among collaborative roles. A native inference engine enables efficient multi-agent rollout through shared visual prefix and KV cache reuse. Across V*, CountBench, the RefCOCO family, and HallusionBench, Visual Para-Thinker++ consistently outperforms single-trajectory and inference-time parallel baselines, with especially strong gains on hallucination-sensitive visual reasoning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09290v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haoran Xu, Hongyu Wang, Yifei Gao, Jiaze Li, Zizhao Tong, Xiaofeng Zhang, Xiaosong Yuan</dc:creator>
    </item>
    <item>
      <title>Dual Quaternion-Based Unscented Kalman Filter with Visual Inertial Odometry for Navigation in GPS-Denied Environments</title>
      <link>https://arxiv.org/abs/2606.09292</link>
      <description>arXiv:2606.09292v1 Announce Type: new 
Abstract: Reliable navigation in GPS-denied environments remains a fundamental challenge in robotics, aerospace, and autonomous vehicle applications. This paper presents a Dual Quaternion-Based Unscented Kalman Filter (DQUKF) equipped with a Visual Inertial Odometry (VIO) algorithm for accurate state estimation enabling navigation in GPS-denied locations. The proposed framework formulates the DQUKF in error-state form, where the nominal pose is represented by a unit dual quaternion and the local pose error is represented by a 6-dimensional twistor parameterization used for sigma point generation, covariance propagation, and measurement correction. In parallel, the VIO algorithm tracks features across image frames, synchronizes measurements between the IMU and camera, and provides visual constraints that complement inertial propagation. Simulation results on the EuRoC MAV dataset show that the proposed DQUKF converges under high initialization uncertainty and achieves a position RMSE of 0.2584 m in the difficult flight sequence, outperforming the benchmark filters.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09292v1</guid>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1016/j.measurement.2026.121964</arxiv:DOI>
      <dc:creator>Mohamed Khalifa, Hashim A. Hashim</dc:creator>
    </item>
    <item>
      <title>One Model, Multiple Goals: Adaptive Multi-Objective Learning for E-commerce Dialogue Systems</title>
      <link>https://arxiv.org/abs/2606.09293</link>
      <description>arXiv:2606.09293v1 Announce Type: new 
Abstract: Dialogue systems in e-commerce scenarios often need to satisfy multiple objectives: accurately reasoning over user profiles (e.g., eligibility, credit limit) to ensure correct decision-making and user state interpretation, while also generating natural and faithful responses. These goals are complementary but not identical. In this work, we propose MORE, an adaptive Multi-Objective REinforcement learning framework that jointly optimizes reasoning accuracy and linguistic naturalness. Our preliminary experiments show that directly mixing rewards with diverging optimization dynamics can cause oscillations and unstable learning. Thus, instead of optimizing a single mixed reward, we treat reasoning functions as constraints that guide policy optimization. At inference time, the system directly generates responses without explicit reasoning steps, while still benefiting from reasoning-enhanced scaffold and avoiding additional inference overhead. To better balance linguistic objectives during response generation, we introduce an adaptive multi-reward mechanism that aggregates signals such as fluency and naturalness and dynamically reweighs them via gradient feedback. We evaluate MORE on two real-world dialogue systems at ByteDance and the MultiWOZ 2.2 benchmark, where it consistently outperforms strong baselines. In 14-day online experiments on ByteDance production traffic, MORE improves overall and reached conversion by 16.53% and 30.09%, while increasing user satisfaction and reducing handoff rates. Notably, in a human-machine comparison, MORE recovers about 60% of the incremental conversion lift achieved by human agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09293v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mingzhe Li, Jing Xiang, Enguo Zhou, Lang Gao, Tai Li, Qishen Zhang, Xiangliang Zhang, Xiuying Chen</dc:creator>
    </item>
    <item>
      <title>Virtual-point-based Solutions to Handle Generalized Absolute Pose Problem</title>
      <link>https://arxiv.org/abs/2606.09294</link>
      <description>arXiv:2606.09294v1 Announce Type: new 
Abstract: Multi-camera systems are increasingly adopted in robotics and autonomous navigation for their wide field of view, flexibility, and fault tolerance. Nevertheless, existing PnP solvers fail to handle multiple projection centers. This paper introduces a virtual point formulation that bridges the standard PnP and generalized pose problems, enabling a unified pipeline that transforms existing PnP solvers into generalized pose solvers. Based on this framework, we derive three Virtual-point-based Generalized Pose solvers, namely VGPc, VGPq, and VGPr, leveraging Cayley, quaternion, and rotation-matrix parameterizations, respectively. Extensive experiments demonstrate that the proposed solvers inherit the accuracy and efficiency of original PnP algorithms while significantly outperforming existing generalized solvers. Specifically, VGPc achieves higher estimation accuracy under heteroscedastic noise conditions, VGPq maintains global optimality, whereas VGPr provides superior computational efficiency without accuracy degradation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09294v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bin Li, Banglei Guan, Shunkun Liang, Yang Shang</dc:creator>
    </item>
    <item>
      <title>NüshuVoice: Reviving the Voice of Endangered Nüshu with Pitch-Aware Text-to-Speech</title>
      <link>https://arxiv.org/abs/2606.09295</link>
      <description>arXiv:2606.09295v1 Announce Type: new 
Abstract: Nüshu is an endangered phonetic script historically used by women in Jiangyong County, southern Hunan, China. While existing computational studies of Nüshu mainly focus on textual digitization and visual recognition, the acoustic reconstruction of its authentic pronunciation remains largely unexplored. Building a Nüshu text-to-speech (TTS) system is particularly challenging because available recordings are extremely limited and mostly consist of isolated syllable-level pronunciations rather than natural sentence-level utterances. In this work, we introduce NüshuVoice, the first TTS benchmark for Nüshu. We construct a sentence-level Nüshu text-to-audio dataset that aligns standardized Unicode Nüshu text, phonetic transcriptions, standard Chinese translations, and archival recordings. To synthesize speech under this extreme low-resource setting, we propose Nüshu-PitchVITS, an F0-conditioned VITS framework that leverages Nüshu's five-level pitch notation as an explicit prosodic inductive bias. Experimental results show that Nüshu-PitchVITS outperforms strong TTS baselines in spectral fidelity, pitch reconstruction, and human-rated intelligibility. We publicly release the dataset and code at: https://anonymous.4open.science/r/Nvshu-TTS-2EB6.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09295v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hongkun Yang, Xinhui Yi, Xiyan Zhao, Yibo Meng, Lionel Z. Wang, Lixu Wang, Yaqi Zhang, Ruiqi Chen, Xuanyue Zhao, Lanxin Zhang, Yu Zeng, Weijia Chu, Yiming Ma, Chenyu Liu, Jianghao Lin, Xin Xu</dc:creator>
    </item>
    <item>
      <title>Justification and structure- and asymptotic-preserving discretizations of a hyperbolized Cahn-Hilliard equation</title>
      <link>https://arxiv.org/abs/2606.09299</link>
      <description>arXiv:2606.09299v1 Announce Type: new 
Abstract: We study a hyperbolic approximation ("hyperbolization") of the Cahn-Hilliard (CH) equation, originally proposed by Dhaouadi, Dumbser, and Gavrilyuk (2025, DOI: 10.1098/rspa.2024.0606) and study its convergence towards the CH model in a relaxation limit both via formal asymptotic expansions and, for a slightly modified approximation, via the relative energy framework. Moreover, we develop energy-stable semidiscretizations of the CH equation and of this hyperbolization using upwind summation-by-parts operators in space. Subsequently, we combine them with (additive) implicit-explicit (IMEX) Runge-Kutta methods based on a convex-concave splitting. We show that the resulting method is asymptotic preserving, i.e., it converges in the limit of the relaxation parameter to a stable discretization of the original CH equation. The choice of the necessary parameters is guided by the a priori error estimate based on the relative energy framework.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09299v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jan Giesselmann, Fabio Leotta, Hendrik Ranocha, Jochen Schuetz</dc:creator>
    </item>
    <item>
      <title>PRISM: Topology-Aware Cross-Modal Imputation for Modality-Deficient Federated Graph Learning</title>
      <link>https://arxiv.org/abs/2606.09301</link>
      <description>arXiv:2606.09301v1 Announce Type: new 
Abstract: Multimodal federated graph learning (MM-FGL) aims to collaboratively learn from decentralized graphs with text and images. However, real-world clients may not share a common modality basis: a visual-search client may contain image-interaction graphs but no seller descriptions, while a catalog client may provide text but no product images. We refer to this practical setting as client-level modality deficiency. Unlike random instance-wise missingness, a deficient client lacks the local semantic basis needed to reconstruct the absent modality. More importantly, in graph learning, incomplete representations initialize message passing, so imputation errors can be filtered, mixed, and amplified by the receiving topology. To address this gap, we propose PRISM (Proactive Retrieval and Imputation via Structural Meta-prompting), a topology-aware federated cross-modal imputation framework. Rather than reconstructing the missing modality solely from local observations, PRISM recovers missing-modality semantics from the federation and introduces them into local graph propagation under topology-aware control. Experiments on six multimodal graph datasets across graph-centric and modality-centric tasks show that PRISM consistently improves modality-deficient clients, outperforming state-of-the-art baselines by 4.48% on average.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09301v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Zekai Chen, Miao Zhang, Jiayang Xing, Xunkai Li, Xun Wu, Rong-Hua Li, Guoren Wang</dc:creator>
    </item>
    <item>
      <title>Reason Twice: Segmentation via Candidate Discovery and Comparative Reasoning</title>
      <link>https://arxiv.org/abs/2606.09303</link>
      <description>arXiv:2606.09303v1 Announce Type: new 
Abstract: The rapid development of pretrained foundation models has enabled more general image segmentation. Multimodal large language models (MLLMs) have been widely explored for image segmentation with complex queries that require high-level reasoning. Despite promising progress, existing methods are often constrained by limited training data and the gap between MLLMs and mask generation modules. To better transfer MLLMs' perception and reasoning ability to complex reasoning-based segmentation tasks, we propose a two-stage framework Rea2Seg for mask generation and selection. Specifically, the framework first identifies potential regions as candidate masks based on the attention maps of a segmentation MLLM. It then employs an MLLM to reason over the question and candidate masks and assign scores to each mask. The final segmentation result is obtained by reranking the candidates and selecting the highest-scoring mask, reformulating image segmentation as candidate discovery followed by discriminative mask selection.
  We also notice that a large portion of questions in existing benchmarks focus on commonsense reasoning, and these questions usually do not fully require joint visual observation and reasoning. To address this issue, we introduce a new benchmark called ReasonSeg-SGDR that comprehensively evaluates a model's perception, grounding, and reasoning abilities across multiple dimensions, including discriminative recognition, spatial reasoning, geometric reasoning, and multi-step reasoning, with fine-grained mask generation.
  In addition, we collect training data to enhance MLLMs' ability to jointly understand multimodal queries and candidate masks, and to assign scores through reasoning. Experimental results on the proposed benchmark and ReasonSeg demonstrate the effectiveness of the unified mask generation and selection framework.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09303v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xinyan Gao, Haoran Hao, Xiangyu Yue</dc:creator>
    </item>
    <item>
      <title>SG-OPD: Sign-Gated On-Policy Distillation via Sign-Consistency Gating and Phased Teacher Sampling</title>
      <link>https://arxiv.org/abs/2606.09304</link>
      <description>arXiv:2606.09304v1 Announce Type: new 
Abstract: On-policy distillation (OPD) trains a student on its own trajectories with dense per-token supervision from a stronger teacher, and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its effectiveness implicitly relies on two assumptions that frequently break in practice: trajectory-level alignment between the student and the teacher, and uniform token-level reliability of the teacher's preferences. We therefore propose Sign-Gated On-Policy Distillation (SG-OPD), which uses a binary verifier as a trust signal for the teacher at two complementary granularities: phased teacher sampling mixes in verifier-endorsed teacher rollouts at cold-start, and a sign-consistency gate extrapolates the distillation update on tokens where the teacher agrees with the verifier-correct direction and interpolates it where it disagrees. Experiments on competition-level mathematical reasoning benchmarks show that SG-OPD consistently outperforms standard OPD, with average gains of 1.98 and 7.50 at the per-sample and per-question levels, respectively.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09304v1</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haoran Xu, Hongyu Wang, Yifei Gao, Jiaze Li, Xiaofeng Zhang, Xiaosong Yuan</dc:creator>
    </item>
    <item>
      <title>FF-JEPA: Long-Horizon Planning in World Models with Latent Planners</title>
      <link>https://arxiv.org/abs/2606.09311</link>
      <description>arXiv:2606.09311v1 Announce Type: new 
Abstract: Joint Embedding Predictive Architectures (JEPAs) have shown promising world modeling capabilities, enabling planning in latent space by optimizing action trajectories using methods like the Cross-Entropy Method (CEM). These methods are, however, too computationally expensive and ineffective for long-horizon planning. Furthermore, these methods typically require an explicit image of the goal state, which is not always possible in real-world tasks. In this work, we tackle these limitations by proposing Forward-Forward-JEPA (FF-JEPA), a hierarchical approach leveraging two forward dynamics models. Alongside a standard action-conditioned forward model, we introduce an action-free latent planner that predicts the next subgoal given the current state. This approach removes the need for goal images and enables long-horizon planning by decomposing complex trajectories into a sequence of tractable, short-term optimization problems. Preliminary results on PushT demonstrate that FF-JEPA successfully overcomes flat world models' long-horizon collapse, highlighting this approach as a promising direction for goal-free planning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09311v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sergi Masip, Jonathan Swinnen, Yutong Hu, Renaud Detry, Tinne Tuytelaars</dc:creator>
    </item>
    <item>
      <title>Toward Compiler World Models: Learning Latent Dynamics for Efficient Tensor Program Search</title>
      <link>https://arxiv.org/abs/2606.09312</link>
      <description>arXiv:2606.09312v1 Announce Type: new 
Abstract: Tensor program optimization is essential for modern machine learning systems, but its search space is enormous. Existing auto-schedulers reduce measurement cost with learned cost models, yet they usually evaluate each candidate as a static code snapshot, ignoring the schedule trajectory that produced it. This makes them insensitive to action dependencies and vulnerable to superficial code variations. We propose a world-model-inspired evaluator that models schedule evaluation as action-conditioned latent dynamics over program states. Starting from the initial program, it rolls out scheduling actions in a continuous latent space with a lightweight transition model, avoiding expensive AST mutation and repeated code encoding. The final dynamic representation is combined with action and hardware features to rank candidates. Implemented in TVM AutoScheduler, our method improves representative-subgraph latency over Ansor by 1.37× on GPU and 1.54× on CPU under the same 64-trial budget. It also matches Ansor-10K within 2.2% geometric mean using 10× fewer measurements, and accelerates full-model inference over PyTorch/PyTorch-opt(cuDNN) by 4.61×/3.67× geometric mean.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09312v1</guid>
      <category>cs.LG</category>
      <category>cs.PL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haolin Pan, Lianghong Huang, Xvlin Zhou, Mingjie Xing, Yanjun Wu</dc:creator>
    </item>
    <item>
      <title>Machine-Learning Emulation of Satellite Greenhouse Gas Retrievals: Stability over Time</title>
      <link>https://arxiv.org/abs/2606.09313</link>
      <description>arXiv:2606.09313v1 Announce Type: new 
Abstract: Retrieval algorithms are used to estimate atmospheric concentrations of greenhouse gases (GHGs), such as carbon dioxide (CO2) and methane (CH4), by solving inverse problems from high-spectral-resolution satellite radiance measurements. However, these algorithms are computationally expensive, which makes real-time estimation at scale difficult. Machine-learning models have therefore been proposed as fast emulators of retrieval algorithms. Most existing studies, however, evaluate them only on test data from the same period as the training data.
  We study the stability over time of such emulators using data from the Greenhouse Gases Observing SATellite (GOSAT). We show that prediction accuracy generally deteriorates when the test period moves away from the training period. We also show that including time as an input feature substantially improves XCH4 prediction for Lasso and neural-network models. Among the methods considered, a simple Lasso model performs as well as or better than more complex methods such as neural networks, and yields more stable predictions over time. We further validate the results using the Total Carbon Column Observing Network (TCCON), a ground-based observation network. On the TCCON-matched dataset, the time-augmented Lasso achieves errors against TCCON that are comparable to the disagreement between GOSAT and TCCON for both XCO2 and XCH4.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09313v1</guid>
      <category>cs.LG</category>
      <category>stat.AP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Nugzar Gognadze, Motonobu Kanagawa, Yu Someya, Hisashi Yashiro</dc:creator>
    </item>
    <item>
      <title>KPGrasp: Scalable Keypoint Flow Matching for Dexterous Grasp Generation</title>
      <link>https://arxiv.org/abs/2606.09314</link>
      <description>arXiv:2606.09314v1 Announce Type: new 
Abstract: Generating high-quality dexterous grasps remains challenging for learning-based methods, which often depend on carefully tuned contact losses or costly contact-based test-time refinement. We present KPGrasp, a flow-matching framework that learns dexterous grasp priors from large-scale data rather than relying on contact losses or contact-based test-time refinement. KPGrasp couples an all-Euclidean 3D hand-keypoint parameterization with a simple yet scalable Transformer flow model. The parameterization avoids the drawbacks of the conventional mixed SE(3) pose and joint-angle output space, expresses grasps in the same frame as the object point cloud, and thus enables native spatial reasoning; the Transformer flow model is trained with only the standard flow-matching loss and scales effectively with data, model capacity, and batch size. Experiments demonstrate state-of-the-art performance on two simulation benchmarks. On the Dexonomy benchmark, it reaches a 76.3% grasp success rate, improving over the strongest directly comparable baseline by 47.4% while reducing penetration depth to 2.4 mm. The same model also achieves the best average performance on the DexGrasp Anything benchmark without fine-tuning. For batched inference, KPGrasp requires only 0.032 s per grasp. Finally, real-world experiments on 20 diverse objects demonstrate that the pipeline can be deployed in a real-world setup.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09314v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuansen Huang, Jiayi Chen, Haoran Liu, Yubin Ke, Bing Han, Jiangran Lyu, Mi Yan, Li Yi, He Wang</dc:creator>
    </item>
    <item>
      <title>Brain-Prompt Injection: A Route-Safety Audit for BCI-LLM Agents</title>
      <link>https://arxiv.org/abs/2606.09315</link>
      <description>arXiv:2606.09315v1 Announce Type: new 
Abstract: BCI-to-agent pipelines turn decoded neural activity into an authorization channel for tool-use agents, exposing a new attack surface we call brain-prompt injection: signal-side perturbations, context-only injections, and adaptive dual-decoder attacks can all change the routed action while EEG-side or text-side monitors remain blind. Route safety in this stack depends on what the audit log can observe, not on decoder accuracy or agreement alone. We define a Route-Safety Audit Contract: a minimal log schema, denominator hierarchy, and endpoint specification, and prove an audit-schema separation theorem together with a C3 attacked-dependence decomposition; clean agreement and marginal robustness do not identify the joint term that controls C3 routing. As a calibration layer on top of the contract, we apply split-conformal calibration to a non-oracle EEG confirmation channel and report the resulting false-accept frontier under an explicit threat-archetype matrix. We instantiate the contract on EEGMMI native left/right command-control over 5,400 events, harmless tool stubs, and seed/case denominators. Provenance blocks C2 routes (0.000); agreement-plus-provenance routes C3 flips (1.000); confirmation-plus-provenance routes them (0.000). The conformal frontier reaches FAR 0.000 at clean utility 0.150 for α = 0.005 and FAR 0.119 at clean utility 0.452 for α = 0.10 under acquisition isolation; an attacker-controllable confirmation channel breaks the bound to ≈ 1. Subject-cluster bootstrap confirms these intervals on 60 subjects; cross-architecture (TinyEEGNet, EEGNetV4) and capacity-sweep results show within-regime saturation. Mediation and confirmation reduce risk; they are not intent certificates.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09315v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jianwei Tai</dc:creator>
    </item>
    <item>
      <title>Anything2Skill: Compiling External Knowledge into Reusable Skills for Agents</title>
      <link>https://arxiv.org/abs/2606.09316</link>
      <description>arXiv:2606.09316v1 Announce Type: new 
Abstract: Retrieval-augmented generation (RAG) enables agents to access external knowledge at inference time, but it primarily retrieves fragmented declarative evidence, leaving agents to repeatedly infer task procedures from passages, manuals, examples, logs, or trajectories. This raises a fundamental question: can skills extracted from external knowledge bases be installed into an agent, enabling it to rapidly approximate domain expertise? In this paper, we propose Anything2Skill, a taxonomy-guided framework that compiles heterogeneous external knowledge into reusable, retrievable, and executable skills for agents. Given a corpus of knowledge records, Anything2Skill first decomposes each record into evidence windows and performs plan-and-expand skill extraction under a skill-tree prior. The extracted candidates are then converted into structured skill contracts that specify invocation conditions, contraindications, action moves, workflow steps, constraints, output specifications, supporting evidence, and confidence scores. To construct a deployable procedural memory, Anything2Skill manages the extracted skills in a persistent SkillBank through taxonomy-aware compilation, registry-level reconciliation, lifecycle tracking, versioned updates, and visible skill-tree projection. At inference time, agents retrieve both task-specific passages from the original knowledge base and relevant procedural skills from the SkillBank, allowing RAG to provide declarative evidence while compiled skills provide reusable procedural guidance. Experiments on qsv and GitHub-CLI show that Anything2Skill combined with RAG achieves 98.85% and 94.10% success rates, respectively, substantially outperforming RAG-only agents. These results suggest that compiling latent procedural knowledge into explicit skills is an effective way to extend retrieval-augmented agents from knowledge access toward capability reuse.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09316v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Qianjun Pan, Yutao Yang, Junsong Li, Jie Zhou, Kai Chen, Xin Li, Qin Chen, Liang He</dc:creator>
    </item>
    <item>
      <title>Engineering Scalable Distributed List Ranking</title>
      <link>https://arxiv.org/abs/2606.09318</link>
      <description>arXiv:2606.09318v1 Announce Type: new 
Abstract: The list ranking problem is one of the classical problems of parallel computing, with nontrivial algorithms and many applications as a subroutine for solving other problems. While it was intensively studied in the early days of parallel computing, little has happened in the last 20 years. In particular, there is little work on scaling list ranking to large machines and input sizes. We reconsider list ranking starting from the ground-breaking results of Sibeyn a quarter century ago. We employ algorithm and performance engineering to improve his sparse ruling-set algorithm, making it capable of scaling to many processors, and provide a more detailed analysis of the impact of the algorithm's parameters, further guiding our practical implementation.
  We perform an extensive experimental study across a variety of input instances with different structural properties. We demonstrate that indirect communication, exploiting input locality, and message coalescing allow scaling to billions of elements on up to 24,576 cores.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09318v1</guid>
      <category>cs.DC</category>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Peter Sanders, Matthias Schimek, Tim Niklas Uhl, Thomas Weidmann</dc:creator>
    </item>
    <item>
      <title>TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders</title>
      <link>https://arxiv.org/abs/2606.09323</link>
      <description>arXiv:2606.09323v1 Announce Type: new 
Abstract: Tabular encoders are usually evaluated inside task-specific end-to-end pipelines, so models from different training paradigms are difficult to compare directly even when they operate on similar tabular signals. We introduce TRL-Bench, a multi-granular tabular representation learning (TRL) benchmark that standardizes cross-paradigm representation-level evaluation: each encoder exports row-, column-, or table embeddings through its supported wrapper, and shared lightweight heads probe them across three suites: TRL-CTbench (column/table), TRL-Rbench (row), and TRL-DLTE (compositional Data-Lake Table Enrichment spanning all three granularities). To support this standardized setting, we release curated benchmark assets and task reformulations, including 50 OpenML tables with 123 verified targets, 16 row-pair linkage rewrites, and a 47,772-table DLTE lake derived from 1,379 parent tables. Across 20 models and 16 tasks, TRL-Bench shows that once downstream conditions are standardized, encoder quality is capability-specific rather than captured by a single leaderboard. In TRL-CTbench, generic text encoders often lead on tasks with strong surface-text signal, while tabular specialists win where their pretraining objective aligns with the task. In TRL-Rbench, within-table prediction and cross-table linkage favor different training regimes, with atomic linkage performance correlating strongly with the row-matching stage of DLTE pipelines. In TRL-DLTE, the strongest pipelines combine capability-matched specialists rather than reuse a single encoder, and top end-to-end quality depends on non-additive compositional fit rather than per-stage marginal rank alone. TRL-Bench provides a common protocol for measuring reusable signal in exported tabular representations under shared downstream conditions. Code and data: https://github.com/LOGO-CUHKSZ/TRL-Bench</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09323v1</guid>
      <category>cs.AI</category>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wei Pang, Xiangru Jian, Hehan Li, Zhixuan Yu, Alex Xue, Jinyang Li, Zhengyuan Dong, Xinjian Zhao, Hao Xu, Chao Zhang, Reynold Cheng, M. Tamer Özsu, Tianshu Yu</dc:creator>
    </item>
    <item>
      <title>A Universal Dense Football Event Representation Based on TabTransformer</title>
      <link>https://arxiv.org/abs/2606.09327</link>
      <description>arXiv:2606.09327v1 Announce Type: new 
Abstract: Football event data constitute a rich spatiotemporal source for quantitative analysis of player actions in team sports. These datasets contain heterogeneous features, combining continuous location coordinates with categorical variables such as action type, action outcome, and body part. Such data have been applied in sports analytics for match outcome forecasting, player evaluation, and tactical pattern recognition. However, existing approaches predominantly encode categorical features using one-hot or ordinal embedding representations, overlooking the intrinsic semantics of action descriptors. The Transformer is a deep neural network architecture based on self-attention that captures dependencies between input features at arbitrary positions. We propose and implement a Transformer-based model to learn latent dependencies among categorical event features and produce dense representations of football events. By encoding categorical features as learned embedding vectors, sport-specific action semantics are captured during pretraining, enabling the representations to support downstream tasks such as action value estimation and play style recognition. Empirical evaluation shows that the embedding representations yield superior probability calibration over task-specific baselines on the downstream prediction tasks, as measured by Brier score.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09327v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Weiran Yang, Daniel Memmert, Maximilian Klemp-Weins</dc:creator>
    </item>
    <item>
      <title>Conan-embedding-v3: Fusing Modality-Specific Models for Omni-Modal Embedding</title>
      <link>https://arxiv.org/abs/2606.09331</link>
      <description>arXiv:2606.09331v1 Announce Type: new 
Abstract: Omni-modal retrieval promises a single embedding space for text, image, video, document, and audio inputs, but building such a unified retriever is difficult since these modalities differ in data distribution, architecture, and optimization dynamics. In this work, we present Conan-embedding-v3, a decouple-fuse-recover framework for omni-modal retrieval. Conan-embedding-v3 first trains modality specialists independently and fuses their task vectors into a single dense backbone, a strategy we call Decoupled Specialist Fusion. We show that this fusion composes visual, video, and document retrieval capabilities, but also exposes a failure mode for projector-based modalities: when audio is attached through an external encoder and projector, fusing the backbone leaves the projector calibrated to the audio-specialist backbone, causing a large audio retrieval regression despite copying all audio-specific modules unchanged. We call this failure Projector Drift. To repair it, Conan-embedding-v3 applies Projector Recovery (i.e., full-parameter fine-tuning of the projector while keeping the backbone frozen) followed by balanced multi-modal rehearsal. The resulting model supports these retrieval pathways in one backbone, achieving a score of 74.9 on MMEB while obtaining 55.61 on the 30-task MAEB audio suite.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09331v1</guid>
      <category>cs.MM</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Shiyu Li, Zhiyuan Hu, Yifan Wang, Peiming Li, Zheng Wei, Yang Tang</dc:creator>
    </item>
    <item>
      <title>How Far Can Prompting Go for Minimal-Edit Ukrainian Grammatical Error Correction?</title>
      <link>https://arxiv.org/abs/2606.09334</link>
      <description>arXiv:2606.09334v1 Announce Type: new 
Abstract: Fine-tuned Large Language Models (LLMs) dominate in Ukrainian grammatical error correction (GEC), while API-accessed LLMs remain nearly untested on minimal-edit benchmarks. We evaluate 11 commercial LLMs from four providers and one open-source Ukrainian model on the UNLP 2023 GEC-only benchmark, comparing zero-shot, few-shot, minimal-edits, and LLM-assisted prompt optimization strategies. Our best configuration (Gemini 3.1-Pro) reaches F0.5=69.22, closing over 90% of the gap to fine-tuned SOTA (F0.5=73.14). For zero-shot prompts, only Claude models benefit from Ukrainian instructions. However, the best overall results for all models use Ukrainian minimal-edits prompts, whose language-specific rules require Ukrainian to express precisely. LLM-assisted prompt optimization on top of minimal-edits + few-shot achieves the highest score. Detailed minimal-edits instructions yield the largest gains for punctuation and case errors but cause the model to abandon several low-frequency categories. Delving into error analysis, we identify five recurring overcorrection patterns tied to Ukrainian-specific linguistic phenomena. Code, prompts, and outputs are publicly available.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09334v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kateryna Karpo, Artem Chernodub</dc:creator>
    </item>
    <item>
      <title>TORL-VLA: Tactile Guided Online Reinforcement Learning for Contact-Rich Manipulation</title>
      <link>https://arxiv.org/abs/2606.09337</link>
      <description>arXiv:2606.09337v1 Announce Type: new 
Abstract: Vision-Language-Action (VLA) models have become a powerful framework for robotic manipulation, and recent studies have introduced tactile or force feedback into VLAs to address contact-rich tasks. However, these models are typically deployed as offline policies. When contact conditions shift from the training distribution, the policy cannot perform online adaptation, leading to problems such as inappropriate contact forces and inefficient retries. Therefore, we propose TORL-VLA, a tactile-guided online reinforcement learning framework that couples tactile feedback with policy refinement for contact-rich manipulation. Our method introduces a tactile-derived wrench-aware VLA to predict reference actions and future wrench sequences, while a lightweight online RL module is used to refine the reference actions. To stabilize learning from mixed exploratory policy-generated and human-intervention data, we introduce an intervention-censored critic that prevents post-intervention success from being wrongly credited to policy-generated actions preceding intervention. Real-robot experiments on long-horizon contact-rich tasks, including latch manipulation, coffee-cup placement, and egg handling, show that TORL-VLA improves success rates at both subtask and full-task levels, as well as time-bounded execution efficiency over strong baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09337v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Huaihang Zheng, Yi Yang, Kai Ma, Shenglin Xu, Tian Xie, Guozheng Li, Xiangyu Wang, Yiren Ma, Si Liu, Yinian Mao, Baoxu Liu</dc:creator>
    </item>
    <item>
      <title>Multi-Hop Knowledge Composition is Bound by Pretraining Exposure</title>
      <link>https://arxiv.org/abs/2606.09338</link>
      <description>arXiv:2606.09338v1 Announce Type: new 
Abstract: Large Language Models fail at implicit multi-hop reasoning: a model answers "When was $X$ born?" and "Who is $Y$'s closest friend?" correctly but fails on "When was $Y$'s closest friend born?" in a single forward pass, even when both facts are perfectly memorized and individually retrievable. We study this failure in a controlled natural language setting with a strict separation between individuals exposed to compositional contexts during pretraining and those that never appear in any such context. We confirm that compositional failure persists even at 97% 1-hop accuracy, establishing the gap as a pretraining failure rather than a knowledge absence. We propose and test nine data-centric augmentation formats and find that compositional pretraining transfers to unseen questions for exposed individuals, but never to individuals absent from compositional pretraining, suggesting that exposure to compositional contexts during pretraining is a necessary condition for implicit multi-hop reasoning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09338v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yannis Karmim, Luis Marti, Djamé Seddah, Valentin Barrière</dc:creator>
    </item>
    <item>
      <title>Thresholded Local Hyper-Flow Diffusion</title>
      <link>https://arxiv.org/abs/2606.09340</link>
      <description>arXiv:2606.09340v1 Announce Type: new 
Abstract: Local Hyper-Flow Diffusion (HFD) gives an edge-size-independent Cheeger-type guarantee for seeded clustering in general submodular hypergraphs, but existing HFD solvers do not keep intermediate computation local at every iteration. We introduce Thresholded Local HFD (TL-HFD), a first-order method that maintains an active region around the seeds, performs projected subgradient updates on that region and its immediate boundary, and expands via thresholded (top-k) boundary activation. We prove that the local update is exact: the degree-preconditioned projected subgradient step restricted to the active region and its boundary coincides with the unrestricted global update. We establish finite-time dual suboptimality for both exact and thresholded updates, treating the latter as inexact projected subgradient steps with explicit skipped-boundary error. We further derive an additive activated-volume bound controlled by realized local subgradient norms and the minimum boundary-push among newly activated vertices, and translate approximate dual optimality with localized support into a robust sweep-cut guarantee for early-stopped iterates. For general submodular cut-costs, each iteration is local in the scanned region and oracle-sensitive in the hyperedge primitive. Empirically, TL-HFD often matches or improves over HFD while activating less volume, with the largest gains on noisy instances where diffusion tends to absorb non-target vertices.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09340v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Meher Chaitanya, Sebastian Dalleiger, Luana Ruiz</dc:creator>
    </item>
    <item>
      <title>Leveraging Structural Constraints for Diffusion-based Neural TSP Solvers</title>
      <link>https://arxiv.org/abs/2606.09343</link>
      <description>arXiv:2606.09343v1 Announce Type: new 
Abstract: Neural combinatorial optimization has recently achieved strong results on the Euclidean Traveling Salesman Problem (TSP) using generative models such as diffusion and consistency models. State-of-the-art approaches like FT2T combine fast consistency-based prediction with gradient-based inference-time refinement. However, gradient search often incurs significant computational overhead and may not align with the discrete structure of feasible solutions. We introduce Projected Consistency Inference (PCI), a plug-and-play, retraining-free alternative that replaces gradient refinement with structure-aware projections: PCI decodes valid Hamiltonian tours from the consistency model output and applies a lightweight local search (e.g., 2-opt). PCI achieves an average optimality gap (OG) of 0.17% on TSP with 500 cities, and 0.31% on TSP with 1000 cities, outperforming FT2T's best settings (OG 0.22% and 0.36%, respectively) while reducing inference time by up to 30 to 40%. PCI also exhibits lower variance and memory usage, and can surpass classical heuristics such as LKH3 in rapid solution generation. Our results demonstrate that structure-aware inference-time operations provide a practical and principled path for neural TSP solvers, complementing training-time objectives.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09343v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:journal_reference>The 20th Learning and Intelligent OptimizatioN Conference (LION), Jun 2026, Milan, Italy</arxiv:journal_reference>
      <dc:creator>Mickaël Basson (CRIStAL, Scool), Philippe Preux (CRIStAL, Scool)</dc:creator>
    </item>
    <item>
      <title>IB-HFN: Information Bottleneck-Driven SAR-Optical Fusion Network for High-Fidelity Cloud Removal</title>
      <link>https://arxiv.org/abs/2606.09347</link>
      <description>arXiv:2606.09347v1 Announce Type: new 
Abstract: Synthetic aperture radar (SAR)-assisted optical cloud removal aims to recover surface information obscured by clouds in optical remote sensing images by exploiting complementary SAR observations. Existing multimodal fusion methods typically rely on direct spatial concatenation and pixel-wise supervision, which can propagate SAR speckle noise into optical reconstruction and lead to over-smoothed results. To address these limitations, we propose an Information Bottleneck-driven High-Fidelity Network (IB-HFN) for SAR-assisted optical cloud removal. IB-HFN employs a dual-stream backbone to preserve modality-specific representations before deep semantic fusion, thereby mitigating premature cross-modal contamination. At the fusion stage, we introduce a Spatial Information Bottleneck Fusion module that compresses SAR features through a channel-wise variational information bottleneck to suppress unstructured speckle noise. In parallel, a local-global gating mechanism predicts clear-sky regions and routes reliable optical details through a Dirac-initialized skip connection, decoupling noise suppression from texture preservation. We further develop a joint optimization strategy that integrates feature-level bottleneck regularization with image-level constraints on reconstruction accuracy, structural consistency, spectral fidelity, and contrastive sharpness. A dynamic weighting schedule balances these objectives to stabilize training and reduce hazy artifacts. Experiments on the SEN12MS-CR dataset under challenging spatio-temporal splits demonstrate that IB-HFN achieves superior structural preservation and spectral fidelity over existing methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09347v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Haojun Guo, Fan Feng, Ziquan Wang</dc:creator>
    </item>
    <item>
      <title>PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment</title>
      <link>https://arxiv.org/abs/2606.09348</link>
      <description>arXiv:2606.09348v1 Announce Type: new 
Abstract: Long-horizon agentic tasks pose a fundamental credit assignment challenge for outcome-based reinforcement learning: trajectory-level rewards verify final correctness but provide limited guidance on which intermediate reasoning steps or tool interactions contribute to the outcome. The difficulty is especially pronounced in multi-turn search agents, where successful trajectories may contain misleading actions and failed trajectories may contain valuable evidence-gathering steps. We propose PBSD (Privileged Bayesian Self-Distillation), a Bayes-calibrated self-distillation method for fine-grained credit assignment under sparse final rewards. PBSD measures trajectory quality through the posterior-to-prior probability ratio of the verified answer and applies Bayes' rule to convert this hard-to-estimate answer-side ratio into a tractable likelihood ratio between a standard student model and a privileged answer-conditioned teacher model. Autoregressive decomposition of this Bayesian evidence score yields turn-level signals that identify whether each intermediate turn supports or undermines the verified outcome. Consequently, PBSD provides a principled and elegant reweighting scheme that transforms sparse outcome supervision into Bayes-calibrated turn-level credit signals, while remaining fully compatible with standard policy optimization. Experiments demonstrate that PBSD consistently enhances performance across both in-domain and out-of-domain settings, and effectively transfers knowledge from short-context training to long-context inference, suggesting that its fine-grained credit assignment mechanism facilitates more effective policy learning and yields improved generalization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09348v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yang Tian, Rui Wang, Xumeng Wen, Junjie Li, Shizhao Sun, Lei Song, Jiang Bian, Bo Zhao</dc:creator>
    </item>
    <item>
      <title>Taming Perception Jitter: Uncertainty-Aware LiDAR Object Detection for Reliable Motion Classification</title>
      <link>https://arxiv.org/abs/2606.09350</link>
      <description>arXiv:2606.09350v1 Announce Type: new 
Abstract: Reliable motion classification is critical for autonomous driving, as false dynamic predictions of static objects can cascade into unnecessary planner interventions. Unstable bounding box predictions can lead to spurious velocity estimates in tracking and falsely predicted trajectories. We present a deployment-friendly mitigation strategy that augments a 3D object detector with aleatoric uncertainty estimates and applies a two-sample z-test over short observation windows to separate true motion from jitter. Integrated into Autoware with minimal changes, the approach reuses existing data association for minimal compute overhead. Empirical results show parity with velocity thresholding on nuScenes, but substantially fewer false dynamic predictions and unnecessary stops in real-world test drives, explained by the presence of an intermediate jitter band in the recorded data that speed-only rules misclassify. This demonstrates that uncertainty-aware detection and lightweight statistical testing can deliver practical performance gains for autonomous driving in noisier real-world settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09350v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Cornelius Schröder, Žygimantas Marcinkus, Markus Lienkamp</dc:creator>
    </item>
    <item>
      <title>In-Context Learning for the Imputation of Public Opinion Data with Large Language Models</title>
      <link>https://arxiv.org/abs/2606.09351</link>
      <description>arXiv:2606.09351v1 Announce Type: new 
Abstract: Large language models have been widely evaluated as simulators of individual survey responses. In practice, however, fully unobserved responses are rare; the dominant problem is partial non-response. Imputation aims to restore the overall structure of a survey dataset by filling in these missing values. It has its own well-defined evaluation criteria and differs fundamentally from prediction. We propose to impute missing survey data through in-context learning (ICL). We systematically evaluate ICL design choices across different missingness mechanisms (MCAR, MAR, MNAR) on 150 opinion variables spanning 15 waves of the American Trends Panel. Compared to well-established statistical methods for data imputation like MICE PMM, our ICL approach consistently reduces absolute error across all missingness mechanisms, with the largest gains under non-random missingness (MNAR). Notably, the best-performing specification (gpt-oss-120b with 100 in-context examples) achieves near-nominal aggregate coverage (approaching the 95% level) with confidence intervals two to five times narrower than MICE PMM. We publish a Python package with an sklearn-like API to enable easy deployment of our method using local and proprietary LLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09351v1</guid>
      <category>cs.CL</category>
      <category>stat.ME</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tobias Holtdirk, Georg Ahnert, Joseph W Sakshaug, Anna-Carolina Haensch</dc:creator>
    </item>
    <item>
      <title>Beyond Humans: Multispecies Animal Face Recognition Using Transfer Learning</title>
      <link>https://arxiv.org/abs/2606.09353</link>
      <description>arXiv:2606.09353v1 Announce Type: new 
Abstract: Individual animal recognition can be useful in the search for lost or stolen pets, the tracking of individuals of endangered species, and the recognition of animals in crowded farms. Present recognition techniques mostly use physical devices, e.g., microchips, which are often impractical and difficult to apply. These could be replaced by remote recognition via the animal's face; if accurate enough, it provides several advantages: it is non-invasive, can work at a distance, and is difficult to counterfeit, as, for instance, in the case of substituting sick animals for healthy ones in the food industry. The few existing datasets with sufficient per-subject images annotated with a single animal identity are not large enough to train current deep learning architectures. We instead investigate the possibility of transfer learning, exploiting pre-trained network models as backbones. Our experiments compared FaceNet, which is specifically trained on large databases of human faces, with the Vision Transformer (ViT) pre-trained on ImageNet, i.e., on object categories. We used three face datasets of very different animals: dogs, primates (lemurs, golden monkeys, and chimpanzees), and cattle. We report the results and, for each dataset, compare them with the state of the art (SOTA) ad hoc-trained deep networks. The capture conditions differ among the three datasets. Image quality (resolution, motion blur, diverse poses, etc.) decreases from dogs to cattle to primates. The best performance was achieved with dogs, where ViT reached a mean verification accuracy of 96.85% and a Rank-1 Identification Rate of 84.34%. The results for endangered primates are still encouraging, but performance varies across animal classes and tasks (verification or identification), and does not always outperform SOTA. For cattle, the ViT results outperform SOTA, while FaceNet is still competitive.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09353v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Maria De Marsico, Anil K. Jain, Annalaura Miglino</dc:creator>
    </item>
    <item>
      <title>MosaicIMU: Composing Carrier Experts for Generalizable Neural Inertial Odometry</title>
      <link>https://arxiv.org/abs/2606.09355</link>
      <description>arXiv:2606.09355v1 Announce Type: new 
Abstract: Robust inertial odometry is essential for various carriers when external sensing is unreliable. Learning-based methods reduce integration drift by capturing local motion priors, but these methods often remain tied to a particular carrier, limiting generalization across heterogeneous platforms. We present MosaicIMU, a carrier-conditioned Mixture-of-Experts (MoE) pretraining-and-adaptation framework for generalizable neural inertial odometry. MosaicIMU uses a prototype-based router to compose carrier-specific expert features, decodes local velocity and uncertainty constraints, and integrates them with a history-aware EKF. For unseen-domain adaptation, it freezes the pretrained base model and learns a new lightweight expert residual branch. For edge deployment, it further reuses the router to select informative online samples for efficient incremental updates. Experiments show that MosaicIMU consistently outperforms learning-based baselines, reducing average ATE and RTE-10s by 40% and 34%, respectively. These results highlight that MosaicIMU provides a scalable pretraining-to-deployment paradigm for generalizable and adaptive neural inertial odometry.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09355v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Junye Zou, Huiyi Yan, Xinning Xu, Xiaolei Li, Pengkun Zhou, Jinhui Zhang, Ziyang Meng</dc:creator>
    </item>
    <item>
      <title>Coupling Complementary Simulations for Combined Performance and Energy Optimization</title>
      <link>https://arxiv.org/abs/2606.09356</link>
      <description>arXiv:2606.09356v1 Announce Type: new 
Abstract: Polymer simulations are among the most computationally demanding workloads in soft-matter research, often requiring days of execution and high energy consumption to achieve physically meaningful results. In this work, we address these challenges through the coupling and optimization of two complementary simulation frameworks: the Uneyama-Doi Model (UDM) and the SOft coarse-grained Monte Carlo Acceleration (SOMA). UDM efficiently propagates concentration fields at the continuum level, while SOMA resolves chain-scale thermal fluctuations via particle-based Monte Carlo dynamics. Each model was individually optimized for GPU execution using kernel fusion, memory coalescing, and asynchronous random-number generation, yielding up to 70% (UDM) and 80% (SOMA) performance improvement. The coupling is performed through our proposed coordinator library that orchestrates data exchange and synchronizes time-stepping across multiple GPUs. Further management of coupling workload distribution enabled a 13x overall speedup and 24.5x reduction in total energy usage compared to the SOMA baseline, i.e., 96% energy saving. The proposed hybrid approach maintains the same scientific fidelity while drastically reducing the computational and energy footprint, showcasing the potential of energy-aware, cross-application co-design for sustainable high-performance simulations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09356v1</guid>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Adel Dabah, Gregor Häfner, Sonja Happ, Simon Pickartz, Marcus Müller, Andreas Herten</dc:creator>
    </item>
    <item>
      <title>ExDet: Open-Domain Open-Vocabulary Detection with Cross-modal Extrapolation and Rectification</title>
      <link>https://arxiv.org/abs/2606.09360</link>
      <description>arXiv:2606.09360v1 Announce Type: new 
Abstract: Open-domain open-vocabulary detection (ODOVD) requires detectors to generalize to both novel categories and unseen domains, making it more challenging than open-vocabulary detection. Existing methods typically train open-vocabulary detectors together with domain generalization modules from scratch, leading to high training cost. We propose ExDet, a lightweight category-domain collaborative generalization framework for ODOVD that enhances the cross-category and cross-domain generalization of existing detectors. ExDet consists of Text-Guided Extrapolation (TGE), a lightweight Detector-Compatible Rectification (DCR) module, and ExRPN. Specifically, TGE exploits the DeltaSpace property of vision-language models (VLMs) to infer category- and domain-aware proxy visual prototypes from text. DCR is learned from the TGE-generated prototypes in a detector training-free and real-data-free manner, and is inserted after the classification head at inference to rectify representations toward a detector-compatible source-domain visual distribution, thereby enhancing classification for targets from novel categories and unseen domains. ExRPN recalibrates proposal scores by combining semantic similarity with RPN confidence, improving recall for novel and domain-shifted objects while providing better support for subsequent classification and DCR. ExDet achieves SOTA performance on OD-LVIS, OV-LVIS, Objects365, and MSOSB.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09360v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yupeng Zhang, Yuzhong Feng, Ruize Han, Zhiwei Chen, Wei Feng, Liang Wan</dc:creator>
    </item>
    <item>
      <title>Bespoke-Card: Why Tune When You Can Generate? Synthesizing Workload-Specific Cardinality Estimators</title>
      <link>https://arxiv.org/abs/2606.09361</link>
      <description>arXiv:2606.09361v1 Announce Type: new 
Abstract: Cardinality estimators are built to support arbitrary schemas and workloads, forcing them to rely on generic statistics even when the schema and workload are known in advance, leaving optimizers prone to large errors and poor plans. We present Bespoke-Card, an agent-driven system that synthesizes workload-specific cardinality estimators as executable code: a planning agent designs the estimator strategies, a coding agent implements them, and a validator scores the estimates against true cardinalities and PostgreSQL estimates, forming a robust and deterministic harness. Going beyond naive prompting, Bespoke-Card uses structured q-error feedback, regression analysis, concrete outlier subplans, a curriculum isolating join-only, filter-only, and full-subplan errors, and archival selection of the best implementation. Injecting its estimates into the optimizer cuts total PostgreSQL runtime on JOB by 33% and reduces median q-error over all JOB subplans by 41%, while synthesizing a strong estimator in under one hour for less than $10. Bespoke-Card opens a new avenue for cardinality estimation alongside classical generic estimators and learned estimator architectures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09361v1</guid>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Johannes Wehrstein, Anton Winter, Timo Eckmann, Carsten Binnig</dc:creator>
    </item>
    <item>
      <title>Zero-Shot Semantic Re-Identification for Autonomous Driving: A VLM Baseline Study</title>
      <link>https://arxiv.org/abs/2606.09362</link>
      <description>arXiv:2606.09362v1 Announce Type: new 
Abstract: Re-Identification (ReID) in autonomous driving is typically formulated as a visual matching problem, where observations of vehicles, pedestrians, and cyclists are associated across time, frames, or camera views using learned appearance embeddings, often complemented by motion, geometric, or multimodal cues. However, purely visual representations may be sensitive to viewpoint, occlusion, illumination, and sensor-domain variations, limiting their interpretability and robustness in complex driving scenes. We propose a baseline study of a zero-shot pipeline using Vision-Language Models (VLMs) to generate textual descriptions of detected traffic participants and evaluate whether these descriptions can support identity matching across observations. Instead of relying only on low-level visual similarity, the proposed formulation represents each object through structured semantic attributes, including category, color, shape, pose, visible parts, spatial context, and distinctive visual cues. This study provides an initial benchmark for language-based re-identification in autonomous-driving scenarios, discussing and evaluating the strengths and limitations of current VLMs for this task. Results demonstrate that zero-shot semantic descriptions can support effective object re-identification, achieving retrieval performance comparable to a supervised CNN baseline while offering greater interpretability through explicit identity cues. However, the experiments also reveal important challenges, including attribute inconsistency across viewpoints and limited fine-grained discrimination between visually similar instances.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09362v1</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Eduardo Borges, Manuel Abreu, Luís Garrote, Urbano J. Nunes</dc:creator>
    </item>
    <item>
      <title>Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory</title>
      <link>https://arxiv.org/abs/2606.09365</link>
      <description>arXiv:2606.09365v1 Announce Type: new 
Abstract: Medical agent systems are increasingly expected to support interactive clinical decision making rather than only static question answering. In such settings, effective agents must reuse prior experience across evolving cases, yet existing memory mechanisms often retain raw historical traces that are redundant, noisy, and difficult to govern. More importantly, they rarely distinguish which memories are truly useful for future reasoning. This limits their ability to accumulate compact and reliable experience for long-horizon clinical reasoning. To close this gap, we propose SkeMex, a post-deployment self-evolution framework that improves medical agents through a skill-based memory without updating model weights. SkeMex distills informative interaction trajectories into structured skills that encode reusable procedural knowledge, and organizes them into a multi-branch repository spanning general, task-specific, and action-level experience. To determine which memories should be reused and retained, SkeMex estimates context-dependent utility from environment feedback and uses it to guide value-aware retrieval and repository governance. A closed-loop "Read-Write-Assess-Govern" lifecycle further supports continual evolution by writing new skills, updating utilities, promoting useful memories, and removing harmful entries. Experiments across diverse clinical tasks show that SkeMex consistently outperforms representative memory-based agents in both offline and online settings. It also generalizes across model backbones and supports transferable skill memory. All data and code will be released publicly.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09365v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haoran Sun, Wenjie Li, Yujie Zhang, Zekai Lin, Fanrui Zhang, Kaitao Chen, Xingqi He, Yichen Li, Mianxin Liu, Lei Liu, Yankai Jiang</dc:creator>
    </item>
    <item>
      <title>Is Text All You Need? Text as a Universal Information Bottleneck for Speech LLMs</title>
      <link>https://arxiv.org/abs/2606.09366</link>
      <description>arXiv:2606.09366v1 Announce Type: new 
Abstract: Large language models (LLMs) provide a powerful reasoning backbone for speech understanding, but integrating continuous acoustic signals into a frozen LLM remains challenging. Existing speech-to-LLM interfaces typically operate at two extremes: either enforcing near-discrete token alignment, which benefits transcription but loses paralinguistic information, or learning unconstrained continuous representations, which can drift away from the LLM's input space and degrade autoregressive decoding. In this work, we propose Convex Gate (C-Gate), a speech-to-LLM bridge that constrains all speech representations to lie within the LLM's input embedding manifold with an architectural convex-hull constraint. Concretely, each frame is represented as a convex combination of token embeddings, ensuring compatibility with the pretrained LLM while preserving continuous expressivity. Across automatic speech recognition (ASR) and emotion recognition, C-Gate achieves strong joint performance, improving LibriSpeech WER by up to 48.7% relative while matching or exceeding single-task emotion accuracy. Beyond performance, our analysis reveals a key insight: information is not carried by discrete token identities, but by time-resolved trajectories in the embedding space. Causal interventions confirm that both the trajectory structure and alignment to the pretrained embedding manifold are critical for performance. These results suggest that geometry, rather than token discreteness, is the fundamental design factor in speech-to-LLM interfaces, and provide a controlled regime for studying multimodal integration in frozen LLMs. We release the checkpoint, per-sample outputs, mechanism dumps, and intervention suite for replication.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09366v1</guid>
      <category>cs.CL</category>
      <category>eess.AS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ming-Hao Hsu, Yuxuan Hu, Shujie Liu, Jinyu Li, Yan Lu, Zhizheng Wu</dc:creator>
    </item>
    <item>
      <title>RT-SDGOD: Real-Time Single-Domain Generalized Object Detection</title>
      <link>https://arxiv.org/abs/2606.09367</link>
      <description>arXiv:2606.09367v1 Announce Type: new 
Abstract: In real-world deployment under strict real-time constraints, weather and imaging variations induce significant distribution shifts, severely degrading detectors. Single-Domain Generalized Object Detection aims to mitigate this issue, yet existing methods rarely investigate, at the level of problem formulation, the generalization capability of real-time detectors under such constrained inference budgets. To this end, we introduce Real-Time Single-Domain Generalized Object Detection (RT-SDGOD), which focuses on how real-time detectors can achieve cross-domain generalization under zero extra inference overhead by relying solely on training-time representation learning. We observe that, under domain shift, DETR-based real-time detectors mainly degrade through increased missed detections, rooted in limited and unstable object-level discriminative evidence. Based on this, we propose RT-SDGDet, a multi-evidence collaborative modeling framework for RT-SDGOD. The core idea is to enable multiple queries of the same object to collaboratively cover more sufficient discriminative evidence while maintaining the stability of such evidence modeling across views. Specifically, we use one-to-many (O2M) supervision to construct stable object-specific query groups, and further design Discriminative Evidence Diversity Learning (DEDL) and Dual-view Evidence Consistency Learning (DvECL) to expand object-level evidence coverage and improve evidence stability under appearance perturbations, respectively. Since all components are introduced only during training, our method incurs no extra inference overhead. Extensive experiments show that the proposed method achieves better generalization performance than existing approaches across multiple unseen target domains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09367v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yupeng Zhang, Fangzhuo Gao, Ruize Han, Wei Feng, Liang Wan</dc:creator>
    </item>
    <item>
      <title>PhysScene: A Scene Graph Dataset for Scientific Visual Reasoning in Physics Experiments</title>
      <link>https://arxiv.org/abs/2606.09368</link>
      <description>arXiv:2606.09368v1 Announce Type: new 
Abstract: Scene Graphs (SGs) provide structured representations of visual scenes by modeling objects and their pairwise relationships. Despite recent progress, existing datasets primarily focus on generic natural contexts, leaving domain-specific and function-oriented scenes largely underexplored. This limitation restricts the evaluation of relational reasoning in scientific experimental scenes, thereby hindering the development of intelligent monitoring, analysis, and related applications in such scenes. To address this gap, we introduce PhysScene, the first SG dataset tailored to physics experiments. PhysScene encompasses specialized instruments, structured experimental setups, and functional relations intrinsic to experimental environments, enabling reasoning that extends beyond spatial co-occurrence to logical dependencies. Rather than pursuing large data scale, PhysScene focuses on strong semantic constraints and high relation density in experimental scenes, posing new challenges for existing scene parsing algorithms while offering opportunities for further improvements. Extensive analyses and experiments show that PhysScene complements existing benchmarks and establishes a valuable testbed for advancing scientific visual reasoning. The dataset is publicly available at https://github.com/ZMH-SDUST/PhysScene.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09368v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Minghao Zou, Qingtian Zeng, Shangkun Liu, Yanda Meng, Guanghui Yue, Baoquan Zhao, Abdulmotaleb El Saddik, Wei Zhou</dc:creator>
    </item>
    <item>
      <title>Residual Pseudospectra Reveal a Physics-Informed Koopman Backbone for Tropical Pacific Variability and ENSO Prediction</title>
      <link>https://arxiv.org/abs/2606.09369</link>
      <description>arXiv:2606.09369v1 Announce Type: new 
Abstract: Tropical Pacific sea-surface-temperature (SST) variability spans interacting timescales, with ENSO as its dominant interannual expression. Yet the dynamical structure organizing this variability and underpinning extended-range predictability remains difficult to extract from high-dimensional observations. Koopman operator learning offers spectral coordinates for nonlinear dynamics, but finite geophysical records often produce dense, sampling-sensitive spectra whose physical content is ambiguous. We show that this apparent redundancy reflects coherent operator-level structure. Combining kernel Extended Dynamic Mode Decomposition with residual minimization and pseudospectral analysis, we use the Koopman eigenvalue relation as a physics-informed consistency test to organize learned spectra. Applied to ERA5 and HadISST tropical Pacific SST anomalies, the residual landscape identifies 19 robust residual-minimum frequencies with coherent spatial modes that persist across products and sampling realizations. Together, these modes define a compact Koopman backbone spanning low-frequency modulation through quasi-biennial components, including ENSO-band variability. The surrounding spectral cloud is structured by integer powers and nonlinear combinations of this backbone, forming a residual-ordered Koopman hierarchy. The backbone reconstructs substantial Nino3.4 variance and enables skillful out-of-sample forecasts, with greatest gains at 8-18-month leads. By embedding dynamical consistency into physics-informed operator learning, the framework turns opaque spectra into robust, interpretable and predictive representations of tropical Pacific variability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09369v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <category>physics.ao-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Paula Lorenzo-Sanchez, Matthew J. Colbrook, Antonio Navarra</dc:creator>
    </item>
    <item>
      <title>Capability-Aligned Hierarchical Learning for Tool-Augmented LLMs</title>
      <link>https://arxiv.org/abs/2606.09371</link>
      <description>arXiv:2606.09371v1 Announce Type: new 
Abstract: Tool learning enables LLMs to invoke external tools to accomplish tasks. Prior studies have demonstrated the effectiveness of a hierarchical structure: a high-level policy handles global planning and decomposes tasks into manageable sub-tasks, and a low-level policy focuses on invoking tools to solve these sub-tasks. However, these works typically optimize the high-level and low-level policies separately, leading to planner-executor misalignment and limiting LLM performance on tool-use tasks. In this paper, we propose a method called Capability-Aligned Hierarchical Learning (CAHL), which leverages RLVR to jointly optimize both policies, enabling better alignment between the high-level planner and the low-level executor. Experiments on constrained tool-use benchmarks (API-Bank and BFCL) and an open-ended environment (Bamboogle) demonstrate the effectiveness of CAHL.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09371v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haotong Yang, Ting Long, Yi Chang</dc:creator>
    </item>
    <item>
      <title>Precision Is Not Faithfulness: Coverage-Aware Evaluation of Grounded Generation with a Complete Oracle</title>
      <link>https://arxiv.org/abs/2606.09376</link>
      <description>arXiv:2606.09376v1 Announce Type: new 
Abstract: Reference-free faithfulness metrics verify each atomic claim a model makes against ground truth, and are increasingly used to evaluate grounded generation. We show they share a blind spot: they measure only precision -- are the stated claims supported? -- and therefore reward abstention, since a model can score near-perfect faithfulness by saying almost nothing. We make this measurable using Formula 1 telemetry, a domain where strategic ground truth is derived deterministically and, crucially, completely: for each decision we know the full set of facts that mattered. This completeness -- absent in open-domain faithfulness benchmarks -- lets us measure recall (coverage of the relevant facts) exactly, alongside precision. On a multilingual (EN/ES/PT) benchmark of 7,253 decision instances spanning 150 races, the most precise frontier model covers under half of the relevant facts and ranks last by F1, so requiring coverage reorders the systems; the same effect reappears in a second complete-oracle domain (NOAA weather forecasts). A prompt ablation shows the low coverage is not an under-prompting artifact: explicitly asking models to be thorough does not close the gap. We pair faithfulness with coverage into a single score, validate the metric (controlled perturbation; agreement across a model-free regex extractor and a cross-family LLM extractor, system-level Spearman 1.0), and give a verifier-guided generation method that improves precision and recall without references. We release the benchmark, structured annotations, metric, baselines, and an interactive demo.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09376v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Juan S. Santillana</dc:creator>
    </item>
    <item>
      <title>Scaling Neural Network Verification with Tensor Parallelism and Fully Sharded Data Parallelism</title>
      <link>https://arxiv.org/abs/2606.09377</link>
      <description>arXiv:2606.09377v1 Announce Type: new 
Abstract: Formal neural network verification -- proving that a network satisfies safety properties for \emph{all} inputs in a specified domain -- is bounded in practice by GPU memory: standard implementations of bound-propagation algorithms (IBP, CROWN, $\alpha$-CROWN) require weight and relaxation-coefficient matrices to reside entirely on one accelerator. We adapt two parallelism techniques originally developed for large-scale model training to the \texttt{auto\_LiRPA}\,/\,$\alpha,\beta$-CROWN verification framework. \textbf{Tensor Parallelism (TP)} shards both weight and $A$-matrices across GPUs, achieving ${\approx}2\times$ peak-memory reduction at $P{=}2$; soundness is confirmed on VNN-COMP 2022 MNIST-FC benchmarks, though bound tightness degrades with the number of sharded zones due to forced IBP substitution for intermediate bounds inside sharded zones. \textbf{Fully Sharded Data Parallelism (FSDP)} shards only weight matrices with a per-layer \texttt{AllGather}, producing bounds that are \emph{bitwise identical} to the single-GPU baseline: baseline memory drops by 80--90\%, peak memory by 34--39\% on wide MLPs. FSDP integrates cleanly with complete verification ($\beta$-CROWN + Branch-and-Bound) and with convolutional layers (\texttt{BoundConv}); a complete \emph{unsat} result is obtained for CIFAR-100 ResNet-large (VNN-COMP 2024) under FSDP. Across all experiments the memory bottleneck in $\alpha$-CROWN+BaB mode proves to be per-neuron alpha tensors, not weight matrices, pointing to the key direction for future work.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09377v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sergei Vorobyov, Eugene Ilyushin</dc:creator>
    </item>
    <item>
      <title>Echo-DM: Ultrasound Marker Removal via Conditional Latent Diffusion and Region-Aware Fusion</title>
      <link>https://arxiv.org/abs/2606.09378</link>
      <description>arXiv:2606.09378v1 Announce Type: new 
Abstract: Clinical ultrasound images often contain artificial markers, such as measurement calipers and text, to assist diagnostic interpretation and comparison. However, these markers can introduce shortcut bias in downstream automated analysis, encouraging deep learning models to rely on marker-related cues rather than clinically meaningful anatomy. Existing marker removal methods are either mask-dependent and vulnerable to error propagation, or mask-free deterministic restorers that may over-smooth ultrasound texture and perturb unaffected background regions. To address these challenges, we present Echo-DM, a framework for ultrasound marker removal via conditional latent diffusion and region-aware fusion. Echo-DM follows a common encoder-diffusion-decoder pipeline, where a DiT-based conditional latent diffusion network performs global restoration and a region-aware fusion module enforces preservation-aware image-space refinement under end-to-end mask-free inference. Building on this fixed core design, we further instantiate Echo-DM-V and Echo-DM-R with VAE-based and RAE-based latent modules, respectively, which demonstrates that the Echo-DM architecture is compatible with diverse latent-module instantiations. Extensive experiments on Echo-PAIR, a large-scale paired clinical ultrasound dataset, demonstrate superior marker removal and strong anatomical fidelity compared with representative two-stage baselines, while providing favorable quality--efficiency trade-offs across deployment settings. Data, code and models will be released at https://github.com/MiliLab/Echo-DM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09378v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhiwei Wang, Tao Huang, Wentao Jiang, Muyi Li, Jianxin Liu, Jian Chen, Jie Zou, Yong Luo, Bo Du, Jing Zhang</dc:creator>
    </item>
    <item>
      <title>Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short</title>
      <link>https://arxiv.org/abs/2606.09380</link>
      <description>arXiv:2606.09380v1 Announce Type: new 
Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a leading paradigm for improving the reasoning ability of large language models through outcome-based supervision. However, verifiable rewards frequently become uninformative at the group level: when all sampled traces of a given prompt receive identical rewards, group-relative advantage estimation provides no gradient signal, even though the traces may differ substantially in reasoning quality. We propose Reasoning Arena, an adaptive training framework that routes such non-diverse reward groups to a judge system instead of discarding them. Beyond examining the final answer, Reasoning Arena constructs trace tournaments, where reasoning traces are compared head-to-head to expose finer-grained preferences within the group, converting reasoning quality into rich relative reward signals. To make reward estimation efficient, rather than exhaustively comparing every pair, each new trace is evaluated against a small, dynamically updated pool of previously generated traces as anchors to efficiently establish a relative ranking. We then fit a Bradley-Terry model on the incomplete comparison graph, enabling scalable RL integration without quadratic pairwise comparisons. Empirical results demonstrate that Reasoning Arena consistently outperforms the RLVR baseline by 7.6% on average in competition mathematics and coding benchmarks. By converting otherwise wasted zero-advantage samples into useful gradient updates, our method accelerates training by 27% to 41%, saving nearly 50% of generation compute, and substantially improves overall reasoning performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09380v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Han Zhou, Adam X. Yang, Laurence Aitchison, Anna Korhonen, Albert Q. Jiang</dc:creator>
    </item>
    <item>
      <title>ReGIL: Retrieval-Guided Imitation Learning from a Single Demonstration</title>
      <link>https://arxiv.org/abs/2606.09381</link>
      <description>arXiv:2606.09381v1 Announce Type: new 
Abstract: Learning robot manipulation policies with deep neural networks from a single demonstration remains highly challenging, as even small deviations from the demonstrated trajectory can quickly compound into failure, while collecting substantial online interaction data is costly. We propose ReGIL, a retrieval-guided imitation learning framework that treats a single demonstration as an external memory. ReGIL repeatedly queries this static memory throughout training to simultaneously guide exploration, generate the regularization buffer, and construct rewards. Specifically, it computes rewards through local temporal alignment between the current trajectory and the retrieved segment, providing step-wise and informative feedback for policy improvement. We evaluate ReGIL on robotic manipulation tasks from the LIBERO and Meta-World benchmarks under the single demonstration setting. ReGIL outperforms prior baselines in both success rate and training efficiency. In real-robot experiments, using only one demonstration and less than one hour of online training, ReGIL achieves over 75% success rate across three manipulation tasks with randomness in both initial robot pose and target position. These results demonstrate that leveraging the single demonstration as reusable memory can provide more than static supervision for efficient robot learning. More details can be found on our website: https://regil2026.github.io/</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09381v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuying Zhang, Francesco Verdoja, Wenyan Yang, Ville Kyrki</dc:creator>
    </item>
    <item>
      <title>Consecutive Support Matching Induced Parameter Tuning Accelerates Momentum Iterative Hard Thresholding</title>
      <link>https://arxiv.org/abs/2606.09382</link>
      <description>arXiv:2606.09382v1 Announce Type: new 
Abstract: Momentum-based acceleration of iterative hard thresholding (IHT) can dramatically speed up sparse signal recovery from linear measurements, but its effectiveness hinges on careful parameter tuning -- a task complicated by the frequent support changes inherent to hard thresholding. We propose CosMIHT (Consecutive Support Matching Induced Momentum IHT), which resolves this difficulty through a simple adaptive rule: start with conservative parameters and, whenever two consecutive iterates share the same support, estimate the extreme eigenvalues of the support-restricted Gram matrix via a lightweight power method and switch to the corresponding optimal heavy-ball parameters. This mechanism allows CosMIHT to automatically interpolate between cautious MIHT-like behavior during support discovery and near-optimal accelerated convergence after support identification.
  Under standard restricted isometry assumptions, we develop a two-phase convergence theory. In the \emph{wandering} phase, we establish a linear contraction of the recovery error up to a noise floor and derive an explicit upper bound on the number of iterations required to identify the correct support. In the \emph{lock-in} phase, we establish that, with randomly initialized power-method-based eigenvalue estimates whose accuracy depends on the number of power iterations, the algorithm enjoys, with high probability, a near-optimal accelerated convergence rate akin to that of the heavy-ball method. We corroborate the theoretical findings with extensive numerical experiments on both noiseless and noisy measurements, demonstrating that CosMIHT achieves faster convergence than state-of-the-art iterative sparse recovery techniques without compromising recovery performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09382v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Samrat Mukhopadhyay, Debasmita Mukherjee</dc:creator>
    </item>
    <item>
      <title>An Opticalmechanics Framework for Dynamic Estimation of Multibody Systems</title>
      <link>https://arxiv.org/abs/2606.09383</link>
      <description>arXiv:2606.09383v1 Announce Type: new 
Abstract: Conventional dynamics analysis of the human body is often constrained by the need for contact force and torque sensors and controlled laboratory environments. To address this issue, this study proposes an opticalmechanics kinematic-dynamic integrated estimation framework for multibody systems. Specifically, a constrained multibody model is established to describe the system dynamics, while image-measured kinematic quantities are used as non-contact inputs for dynamic estimation. The unknown joint torque is then identified through a genetic-algorithm-based optimization by minimizing the discrepancy between model-predicted and image-measured kinematic quantities. Experimental validation on an air-bearing platform showed that the wrist joint torque estimated from image data achieved a mean absolute error of 0.46 Nm compared with sensor measurements. In the forward prediction test, the model-predicted angular velocity achieved a mean absolute error of 0.006 rad/s relative to the image-measured results. This study demonstrates the potential of combining image measurement and mechanical modeling for non-contact dynamic estimation in scenarios where direct force and torque measurement is difficult.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09383v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Banglei Guan, Xuanyu Bai, Qingquan Chen, Zibin Liu, Dongcai Tan, Zhenbao Yu, Yang Shang, Qifeng Yu</dc:creator>
    </item>
    <item>
      <title>Distilling Safe LLM Systems via Soft Prompts for On Device Settings</title>
      <link>https://arxiv.org/abs/2606.09388</link>
      <description>arXiv:2606.09388v1 Announce Type: new 
Abstract: Deploying safe large language models (LLMs) on resource-constrained edge devices presents a critical challenge: while dual-model systems combining LLMs with guard models provide effective safety guarantees, their substantial memory and computational demands make them prohibitively expensive for on-device deployment. This paper presents a comprehensive study of parameter-efficient safety alignment methods for resource-constrained settings. Through systematic evaluation across multiple LLM architectures, training objectives, and parameter-efficient fine-tuning approaches, we identify that soft prompts combined with distillation-based training consistently outperform alternative methods. We introduce distillation frameworks based on total variation and KL divergence that effectively transfer safety behaviors from guard models into learned soft prompts. Our evaluations on various benchmarks demonstrate that this combination achieves superior safety-usefulness trade-offs compared to LoRA adapters, steering vectors, and direct optimization methods, while requiring minimal additional memory and compute at inference time. These findings establish soft prompt distillation as the preferred approach for safety alignment in on-device LLM deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09388v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>42nd Conference on Uncertainty in Artificial Intelligence 2026</arxiv:journal_reference>
      <dc:creator>Motasem Alfarra, Cristina Pinneri, Dana Kianfar, Mohammed Almousa, Christos Louizos</dc:creator>
    </item>
    <item>
      <title>LexRubric: A Rubric-Guided Diagnostic Benchmark for Open-Ended Legal Tasks</title>
      <link>https://arxiv.org/abs/2606.09389</link>
      <description>arXiv:2606.09389v1 Announce Type: new 
Abstract: As large language models (LLMs) are increasingly applied to real-world legal tasks, evaluating the reliability of their open-ended legal responses has become essential. These tasks require context-sensitive answers and allow little room for error, motivating fine-grained and diagnostic evaluation that can identify specific sources of response quality failures. We introduce LexRubric, a rubric-based benchmark for evaluating open-ended Chinese legal tasks. LexRubric contains 649 instances from legal consultation and judicial examination, which reflect both everyday legal needs and professional legal reasoning and cover 14 legal scenarios. It further includes 12,337 expert-written atomic scoring criteria organized under a unified six-dimensional framework, enabling accurate evaluation and diagnostic analysis across tasks and evaluation dimensions. To validate the reliability of the evaluation, we test multiple judge models and compare model-based judgments with human judgments. We further evaluate 18 recent general and legal-domain LLMs on LexRubric. Results show that different models exhibit distinct capability profiles, and that open-ended legal questions remain challenging for current LLMs. Data is available at: https://github.com/foggpoy/LexRubric.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09389v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yifan Chen, Haitao Li, Yiran Hu, Kaisong Song, Jun Lin, Yueyue Wu, Qingyao Ai, Min Zhang, Yiqun Liu</dc:creator>
    </item>
    <item>
      <title>Real-time body pose non-verbal communication with a consistency-based reliability measure</title>
      <link>https://arxiv.org/abs/2606.09390</link>
      <description>arXiv:2606.09390v1 Announce Type: new 
Abstract: Body movement communicates intent at distances and in conditions where neither the face nor speech can be captured. We study the recognition of communicative intent from 2D body pose alone. We argue that body motion is a reliable signal, especially in scenarios that require real-time, low-cost, on-device person-to-robot communication in long-distance environments, such as rescue missions. However, existing resources do not isolate this signal. Affective corpora combine body, face, voice and text, while skeleton action-recognition benchmarks label the action performed rather than the message conveyed. We release a dataset of real frames of full-body pose covering ten communicative intents, and we compare it against other real (IPC) and synthetic (MotionLCM, VEO3.1, Kimodo) datasets that span a range of difficulty. We target systems that can run on a robot's limited onboard hardware. We benchmark multiple models, from skeleton graph classifiers to joint motion-forecasting networks, and report performance metrics together with frame rate on an embedded GPU (NVIDIA Orin Nano), since speed matters as much as accuracy in our scenario. Finally, we show that a model's own autoregressive self-consistency works as an unsupervised reliability signal. We give a short proof that bounds the probability that a self-consistent prediction is correct, show that this probability grows with the number of consistent steps, and identify the conditions under which a confident prediction can still be false, benchmarked against industry-standard metrics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09390v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alina Marcu, Dragos Costea, Cristina Lazar, Marius Leordeanu</dc:creator>
    </item>
    <item>
      <title>From Coarse to Fine: Managing Temporal Granularity in Spatio-Temporal Data for Fine-Grained Traffic Prediction</title>
      <link>https://arxiv.org/abs/2606.09392</link>
      <description>arXiv:2606.09392v1 Announce Type: new 
Abstract: Efficient acquisition, storage, and utilization of traffic data are critical challenges in spatio-temporal data management. Most traffic data systems collect and store observations at fixed, coarse-grained temporal intervals to reduce storage and computation costs. However, such coarse-grained data severely limits downstream applications that require predictions at a finer temporal granularity. Collecting and maintaining fine-grained traffic data across all locations and time periods would impose a substantial burden on database storage and preprocessing pipelines. To address this temporal granularity mismatch, we formulate a novel problem: predicting fine-grained future traffic using coarse-grained sampled data. We propose the Spatial-Temporal Refinement Predictor (STRP), a granularity-aware framework for spatio-temporal data systems. STRP integrates two components: Tree Convolution for efficient and interpretable spatial dependency modeling, and Inverse Dilated Convolution for progressive temporal extrapolation. STRP supports two practical prediction settings: window-based and duration-based, to handle different forms of granularity mismatch. Experiments on six benchmark datasets show that STRP significantly outperforms state-of-the-art baselines in both accuracy and efficiency. Our work offers a practical and interpretable approach to managing granularity mismatches in spatio-temporal traffic data systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09392v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shuhao Li, Weidong Yang, Yue Cui, Zizhuo Xu, Lipeng Ma, Fan Zhang, Xiaofang Zhou</dc:creator>
    </item>
    <item>
      <title>CapRL++: Unified Reinforcement Learning with Verifiable Rewards for Dense Image and Video Captioning</title>
      <link>https://arxiv.org/abs/2606.09393</link>
      <description>arXiv:2606.09393v1 Announce Type: new 
Abstract: Image and video captioning are fundamental tasks that bridge the visual and linguistic domains, playing a critical role in pre-training Large Vision-Language Models (LVLMs). Current state-of-the-art captioning models are typically trained with Supervised Fine-Tuning (SFT), a paradigm that relies on expensive, non-scalable annotations and often causes models to memorize specific ground-truth answers, limiting their generality and ability to generate diverse, creative descriptions. To overcome these limitations, we propose applying Reinforcement Learning with Verifiable Rewards (RLVR) to the open-ended task of multimodal captioning. We introduce Captioning Reinforcement Learning++ (CapRL++), a novel reference-free training framework that redefines caption quality through its utility: a high-quality caption should enable a non-visual language model to accurately answer questions about the corresponding visual content. CapRL++ employs a decoupled two-stage pipeline where an LVLM generates a caption, and the objective reward is derived from the accuracy of a separate, vision-free LLM answering Multiple-Choice Questions based solely on that caption. Evaluations on more than 20 image and video benchmarks show that CapRL++ improves dense caption quality and strengthens caption-based pretraining across tasks such as spatial and temporal understanding. Pretraining on scalable image and video caption datasets annotated by CapRL++ yields substantial downstream gains. Furthermore, within the Prism Framework for caption quality evaluation, compact models trained with CapRL++ achieve dense captioning performance comparable to substantially larger models such as Qwen2.5-VL-72B and Qwen3-VL-235B-A22B. These results validate that CapRL++ effectively trains models to produce generalizable, high-fidelity descriptions, establishing a robust foundation beyond the limitations of traditional SFT.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09393v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Penghui Yang, Long Xing, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Yibin Wang, Yujie Zhou, Jiazi Bu, Jianze Liang, Qidong Huang, Jiaqi Wang, Feng Wu, Dahua Lin</dc:creator>
    </item>
    <item>
      <title>Empirical Study for Structured Output Control in LLMs for Software Engineering</title>
      <link>https://arxiv.org/abs/2606.09395</link>
      <description>arXiv:2606.09395v1 Announce Type: new 
Abstract: LLM-generated outputs in software engineering rarely exist in isolation. They must plug into toolchains, APIs, and data pipelines that impose strict, often organization-specific structural contracts. A semantically correct output that violates the expected format is, from the consuming system's perspective, indistinguishable from a wrong answer, making structural fidelity an operational prerequisite for deploying LLMs in practice. Yet current models routinely produce syntactically invalid or structurally non-compliant outputs. Unlike encoders, autoregressive decoders generate text token-by-token with a local rather than global focus, amplifying structural fragility whenever the target format deviates from familiar training distributions.
  We present a systematic evaluation of structural reliability across four representative SE tasks, categorizing failures into syntax, structural, and semantic errors. We benchmark three mitigation strategies targeting the decoder: grammar-constrained decoding, regex-based validation, and a strict template-driven control (Template Token Match Generation, TTMG), to isolate the sources of these failures. TTMG nearly eliminates syntax errors, yet substantial structural and semantic errors persist, demonstrating that the core bottleneck lies beyond syntactic formatting. A detailed case study further illustrates how residual errors cascade in downstream workflows. Our findings show that current structure-enforcing tools are necessary but insufficient, and highlight the need for approaches that jointly ensure structural fidelity and semantic correctness in LLM-driven workflows.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09395v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Yewei Song, Prateek Rajput, Tiezhu Sun, Saad Ezzini, Tegawendé F. Bissyandé, Jacques Klein</dc:creator>
    </item>
    <item>
      <title>PriFT: Prior-Support Guided Supervised Fine-Tuning</title>
      <link>https://arxiv.org/abs/2606.09396</link>
      <description>arXiv:2606.09396v1 Announce Type: new 
Abstract: Supervised fine-tuning (SFT) is an efficient approach for downstream task adaptation and often serves as the initialization stage for reinforcement learning (RL), but it can show weaker generalization than RL. A key limitation is its off-policy objective: SFT fits fixed demonstrations token by token, including targets poorly aligned with the model's pretrained distribution, which can lead to overfitting. A recent line of work addresses this issue by assigning larger training weights to tokens better aligned with the current model's predictive distribution, with the intuition that fitting these tokens is less distortive to the model's pretrained knowledge and representations. However, computing the token weights from the model that is currently being fine-tuned entangles token weights with the optimization trajectory, inducing self-reinforcing dynamics as the distribution rapidly departs from the pretrained model. To address this, we propose PriFT (Prior-support guided Fine-Tuning), which derives token weights from a frozen pretrained reference to obtain a stable reweighting signal unaffected by fine-tuning. This signal estimates prior support: the extent to which each target token is supported by the pretrained distribution. Across multiple existing token-reweighting rules, switching the source of the reweighting signal from the online model to the pretrained model consistently improves performance. We introduce two instantiations: PriFT-prob uses pretrained token probability, while PriFT-mass selects tokens by cumulative probability mass under the pretrained distribution. Extensive experiments on mathematical reasoning, code generation, and medical question answering show that PriFT achieves state-of-the-art results among SFT baselines and provides a better initialization for subsequent RL training.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09396v1</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ke Wang, Shuangqi Li, Mathieu Salzmann, Pascal Frossard</dc:creator>
    </item>
    <item>
      <title>RunAgent SuperBrowser: A Theory of Autonomous Web Navigation Grounded in Human Browsing Behaviour</title>
      <link>https://arxiv.org/abs/2606.09399</link>
      <description>arXiv:2606.09399v1 Announce Type: new 
Abstract: We present SUPERBROWSER, an autonomous web-navigation agent designed against a single guiding hypothesis: a web agent should browse the way a person browses. A human reading a page does not retain every pixel they have seen; they look at a few candidate targets, decide on one, and remember only what is needed to keep the goal alive. We operationalize this perception-cognition-action triad as three coupled mechanisms. First, a vision-first bounding-box pipeline labels candidate interactive regions on every screenshot and feeds them, asynchronously prefetched, to the language model so that the "eye" precedes the "hand". Second, a three-role brain -- an Orchestrator that classifies and routes, a Planner that evaluates progress every few steps, and a Worker that emits per-step actions -- separates strategic from operational reasoning. Third, a structured Ledger stores only what a person would: the goal, the last three actions, a small set of facts and dead-ends, and a handful of checkpoints; a six-phase eviction loop systematically discards stale screenshots, state blobs, and reasoning traces from the live context. Action execution is a three-tier click cascade (Chrome DevTools Protocol to Puppeteer to scripted) with humanized Bezier motion, plus a chevron-aware bounding-box snapper that resolves the "small arrow beside a large label" ambiguity. On the Mind2Web Hard benchmark (66 tasks), SUPERBROWSER attains 89.47% success, placing third overall and ahead of every published open/research browser-agent baseline by a large margin. We argue that the gain comes not from any single trick but from the consistent application of a cognitive contract throughout the system.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09399v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Radeen Mostafa, Sawradip Saha</dc:creator>
    </item>
    <item>
      <title>vesselFM-CT: Segmenting All Blood Vessels in CT Images for System-Level Cardiovascular Analysis</title>
      <link>https://arxiv.org/abs/2606.09400</link>
      <description>arXiv:2606.09400v1 Announce Type: new 
Abstract: The vascular network in the human body is characterized by blood vessels exhibiting drastic structural variations in radius, length, topological properties, and branching patterns. This heterogeneity, together with location-specific anatomical background variations, poses a significant challenge for robust, large-scale analysis of the entire cardiovascular system. As a result, most research has focused on narrow, isolated segments of the vascular network. While such targeted studies provide valuable insights, they inherently limit the ability to assess the systemic health and functional integrity of the vascular network as a whole. In this work, we aim to bridge this gap to advance both clinical diagnostics and our fundamental understanding of vascular physiology. We propose the task of segmenting all vessels in CT images, ranging from the largest components of the cardiovascular system to even minuscule mesenteric vessels. To this end, we introduce vesselFM-CT, the first model capable of robustly segmenting all blood vessels in 3D CT images. VesselFM-CT is trained via an iterative, multi-step process and optimizes our proposed TubeLoss loss function, effectively addressing the inherent heterogeneity of the cardiovascular system. We demonstrate that vesselFM-CT outperforms all baselines and enables automated, precise extraction of the cardiovascular system from CT images, thereby unlocking a wide range of clinical and technical perspectives, including automated disease classification and synthetic CT image generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09400v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bastian Wittmann, Chinmay Prabhakar, Suprosanna Shit, Bjoern Menze</dc:creator>
    </item>
    <item>
      <title>Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models</title>
      <link>https://arxiv.org/abs/2606.09401</link>
      <description>arXiv:2606.09401v1 Announce Type: new 
Abstract: Recent work has applied differential privacy (DP) to adapt large language models (LLMs) for sensitive applications, offering theoretical guarantees. However, its practical effectiveness remains unclear, partly due to LLM pretraining, where overlaps and interdependencies with adaptation data can undermine privacy despite DP efforts. To analyze this issue in practice, we investigate privacy risks under DP adaptations in LLMs using state-of-the-art attacks such as robust membership inference and canary data extraction. We benchmark these risks by systematically varying the adaptation data distribution, from exact overlaps with pretraining data, through in-distribution (IID) cases, to entirely out-of-distribution (OOD) examples. Additionally, we evaluate how different adaptation methods and different privacy regimes impact the vulnerability. Our results show that distribution shifts strongly influence privacy vulnerability: the closer the adaptation data is to the pretraining distribution, the higher the practical privacy risk at the same theoretical guarantee, even without direct data overlap. We find that parameter-efficient fine-tuning methods, such as LoRA, achieve the highest empirical privacy protection for OOD data. Our benchmark identifies key factors for achieving practical privacy in DP LLM adaptation, providing actionable insights for deploying customized models in sensitive settings. Looking forward, we propose a structured framework for holistic privacy assessment beyond adaptation privacy, to identify and evaluate risks across the full pretrain-adapt pipeline of LLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09401v1</guid>
      <category>cs.LG</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bartłomiej Marek, Lorenzo Rossi, Vincent Hanke, Xun Wang, Michael Backes, Franziska Boenisch, Adam Dziedzic</dc:creator>
    </item>
    <item>
      <title>Fully Oblivious Differential Privacy for Frequency Estimation in the Augmented Shuffle Model with Trusted Processors</title>
      <link>https://arxiv.org/abs/2606.09402</link>
      <description>arXiv:2606.09402v1 Announce Type: new 
Abstract: In the shuffle model of DP (Differential Privacy), a shuffler randomly permutes users' data to achieve high accuracy and privacy. Recent studies show that most existing shuffle protocols are vulnerable to collusion attacks by the data collector and users. They address this issue by introducing the augmented shuffle model that incorporates random sampling and dummy data addition into the shuffler. However, it remains open how to ensure the shuffler follows the protocol and does not collude with the data collector in this model.
  We address this trust issue by thoroughly exploring the augmented shuffle model with TEEs (Trusted Execution Environments). We first introduce a new privacy notion, FODP (Fully Oblivious DP), which strengthens DP to prevent various TEE side-channel attacks based on external/internal memory access patterns and control flows. We propose a general framework for FODP algorithms based on memory-size obfuscation and three concrete algorithms within it. We also improve the efficiency of our algorithms by using the count-min sketch and optimizing the number of hashes. We evaluate our algorithms on Intel SGX and demonstrate their effectiveness through comparisons with nine baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09402v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Takao Murakami, Yuichi Sei, Reo Eriguchi</dc:creator>
    </item>
    <item>
      <title>Introducing multiplex semantic networks as multifaceted representations of creative associative knowledge across multilingual samples</title>
      <link>https://arxiv.org/abs/2606.09403</link>
      <description>arXiv:2606.09403v1 Announce Type: new 
Abstract: Creativity is a complex cognitive ability that relies on knowledge organisation and retrieval from semantic memory. Yet most research uses a single task to measure it, capturing only a fraction of this complexity. This study investigates multiplex networks - layered semantic networks obtained from six cognitive tasks - as a more comprehensive approach to modelling the associative knowledge underlying creativity. We collected data from N=518 individuals from four countries (Austria, USA, Singapore, Italy). From their responses to verbal fluency, sentence-chain, free association, and narrative writing tasks, we constructed semantic networks and assembled them in a multiplex structure. AI persona-based responses provided a comparison baseline. Structural reducibility analyses showed that different task layers captured distinct, non-redundant information about semantic organisation, supporting the use of multiple tasks over any single one. The networks from high- and low-creative groups remained structurally distinct, while AI-generated networks showed near-identical structures regardless of creativity group. Finally, we used 12 features (network measures, emotional scores, and spreading activation simulations) in a machine learning model using ridge regression to predict individual creativity scores. The combination of structurally similar layers, as identified in the previous stage, improved a proof-of-concept prediction accuracy by 50%. Structural measures showed the highest feature importance, with spreading activation dynamics providing additional predictive power. Together, these findings indicate that multiplex semantic networks capture a richer, cross-cultural picture of associative knowledge underlying creativity. We also release our diverse dataset and code to foster diverse computational approaches within the creativity community.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09403v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Edith Haim, Kurt Haim, Roger E. Beaty, Cynthia S. Q. Siew, Massimo Stella</dc:creator>
    </item>
    <item>
      <title>Advanced simulation framework for AC/MTDC power systems</title>
      <link>https://arxiv.org/abs/2606.09406</link>
      <description>arXiv:2606.09406v1 Announce Type: new 
Abstract: Alternating current (AC)/multi-terminal direct current (MTDC) hybrid power systems (HPSs) play a crucial role in enabling long-distance power transmission and flexible interconnections between AC grids. However, HPSs face numerous challenges, with stability and harmonic issues being particularly prominent. Traditional electromagnetic transient (EMT) tools have struggled to accommodate small-signal stability problems and the potential issues of the optimal interactions among converters. To address this gap, HARMONY ("HARMONic stabilitY assessment of PE-penetrated power systems") has been developed as a comprehensive mathematical framework, based on the C++ programming language, for the advanced simulation and analysis of interconnected AC/MTDC HPSs. The primary goals of HARMONY are to provide faster and trusted stability analyses and to address the analytical difficulties associated with converter control dynamics, converter-driven stability, and interoperability in HPSs. This framework is intended to be open source, broadening collaboration among researchers and contributing to the community of power systems engineers. In this paper, we demonstrate two core functionalities of HARMONY: optimal power flow (OPF) and harmonic stability analysis (HAS). The underlying analysis models and computational methodologies for both functionalities are presented in detail to help future readers and users gain a clear understanding of the mathematical fundamentals of HARMONY. Furthermore, we introduce the integrated framework of OPF and HAS designed in HARMONY, along with representative printed analysis results, to demonstrate the appealing capabilities of HARMONY.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09406v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Aleksandra Lekić, Azadeh Kermansaravi, Haixiao Li, Yasel Quintero Lares, Saif Alsarayreh, Robert Dimitrovski</dc:creator>
    </item>
    <item>
      <title>Delayed Functional Observers for Output-Delayed Linear Systems</title>
      <link>https://arxiv.org/abs/2606.09407</link>
      <description>arXiv:2606.09407v1 Announce Type: new 
Abstract: This paper introduces a novel class of delayed functional observers specifically designed to reconstruct delayed control laws under severe output measurement lags, directly complementing recent literature \cite{trinhnn26, trinhnam26}. By systematically mitigating simultaneous, unequal delays across both the actuator and sensor channels, the proposed architecture resolves dual-channel latency without requiring full-state estimation or computationally intensive real-time distributed integration. Ultimately, this work provides a powerful, low-order framework that bridges the gap between idealized control theory and the practical constraints of modern networked engineering systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09407v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hieu Trinh</dc:creator>
    </item>
    <item>
      <title>Can Data Work be Reparative?</title>
      <link>https://arxiv.org/abs/2606.09408</link>
      <description>arXiv:2606.09408v1 Announce Type: new 
Abstract: We present an ethnographic study of an alternative approach to data work, developed by a civic-tech initiative that builds datasets for training and benchmarking online safety systems. They aim to respond to online safety concerns from a feminist perspective, by building safety datasets collaboratively with those most impacted by online harms. In this paper, we examine how this approach aims to reorient data work as a site for repair and redress, and trace the struggles they encounter in the process. Specifically, we draw attention to the challenges and tensions involved in advancing just reward for data work and collective governance of AI datasets. Examining these challenges through an STS-informed lens of reparative justice and repair, we argue that the work of repairing data work (and AI) lies, fundamentally, in resetting the ties of accountability. At a time of heightened emphasis on efforts like safety evaluations and red teaming to make AI more responsible, we highlight the need to confront foundational questions about how the humans involved in these efforts relate to the datasets and systems they help produce. A reparative lens demands that we interrupt prevailing norms of data work and place at their centre, not AI or datasets, but those most harmed by the neglect, oversight and exclusion animated in the current modes of dataset production. This, we argue, offers a bold vision for responsibility and contributes towards a critical agenda for building alternative futures of data and AI practice.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09408v1</guid>
      <category>cs.CY</category>
      <category>cs.AI</category>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Srravya Chandhiramowuli, Ding Wang, Alex Taylor</dc:creator>
    </item>
    <item>
      <title>Correct Looks Better: Pairwise Comparisons Reveal Accuracy Rankings</title>
      <link>https://arxiv.org/abs/2606.09409</link>
      <description>arXiv:2606.09409v1 Announce Type: new 
Abstract: Pairwise comparisons combined with aggregation methods like Elo have become central to evaluating generative models, yet concerns remain that they reward superficial stylistic cues or display judge biases. In a more positive turn, we show that model rankings from pairwise comparisons strongly agree with ground-truth-based accuracy rankings when such ground truth is available for comparison. By converting five well-known benchmarks into free-form generative evaluations, we find that Elo rankings achieve a Spearman correlation above 0.9 with accuracy rankings and substantially outperform direct evaluation when the judge is weak. Furthermore, style and judge bias have only minor effects on model rankings, despite most judgments occurring on pairs where both candidate answers are correct (or incorrect). On such pairs, we find that repetition after the final answer (echo) is a causal driver of judge preference.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09409v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Mina Remeli, Moritz Hardt</dc:creator>
    </item>
    <item>
      <title>Capacity, Not Format: Rethinking Structured Reasoning Failures</title>
      <link>https://arxiv.org/abs/2606.09410</link>
      <description>arXiv:2606.09410v1 Announce Type: new 
Abstract: Prior work treats structured output as a reasoning tax, but this framing is incomplete: the cost of formatting depends strongly on a model's spare capacity. Using information-matched prose controls and a four-level schema complexity gradient, we separate format-specific effects from prompt-length confounds across 4 models and 5 benchmarks with 0% parse failures on successfully generated responses.
  We find that structured formats are capacity-dependent. Models with sufficient headroom absorb JSON constraints without degradation (Sonnet: $88.7\pm4.0$% JSON vs. $89.3\pm1.7$% CoT on MATH-Hard). In contrast, formats severely degrade models operating near their limits through two distinct mechanisms. First, under standard token budgets, Haiku drops 36.2pp ($p &lt; 0.0001$) largely due to truncation. Second, even with extended budgets eliminating truncation, GPT-4o-mini drops 28.0pp ($p &lt; 0.001$), revealing pure capacity competition independent of token exhaustion.
  This format penalty scales with schema complexity (McNemar $p &lt; 0.0001$) and cannot be explained by prompt length alone. Furthermore, these results qualify claims of frontier model immunity: on AIME competition math, Opus 4.7 drops from 96.2% to 91.0% under JSON ($-5.3$pp; the displayed percentages are independently rounded, exact difference is $7/133 = 5.26$pp $\approx 5.3$pp). A delayed-structure ablation -- reasoning freely before formatting -- recovers most of the lost accuracy (3-run mean: 80--87%), supporting the capacity competition mechanism. The practical implication is not to avoid structured output, but to match it to capacity: when a model is near its limits, think first, format later.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09410v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hengxin Fan</dc:creator>
    </item>
    <item>
      <title>Now You (Still) See Me: Detecting Evasive Steganographic Payloads in LLMs</title>
      <link>https://arxiv.org/abs/2606.09411</link>
      <description>arXiv:2606.09411v1 Announce Type: new 
Abstract: Large language models can be fine-tuned to encode prompt-borne secrets into fluent, seemingly benign outputs. This creates a steganographic exfiltration risk that is difficult to detect with output-level steganalysis. Recent work proposes mechanistic detection using linear probes that recover the secret from internal activations. We show that this defense can be systematically evaded, but that detectability can be recovered through a targeted data-level intervention. First, we extend the detection setup to include a non-linear MLP probe. We then adversarially fine-tune steganographic trojans across five base models: Qwen3-8B, Llama-3.1-8B, Ministral-8B, Qwen3-14B, and Phi-4-14B. The resulting models retain $58$--$79\%$ exact-match secret recovery while evading both ridge and held-out MLP probes, with $1$--$8\%$ average capability degradation across six benchmarks. We then give an information-theoretic characterization of this evasion. Successful evasion preserves recoverability while reducing low-order extractability of the secret from the content-aligned representation, forcing the payload into synergistic interaction with residual degrees of freedom. This motivates a recontextualization dataset that restricts these residual degrees of freedom. On this distribution, both ridge and MLP detectability are restored across all five evasive trojans. Overall, our findings show that activation-based steganography detection is vulnerable to adaptive evasion, but also that theory-guided evaluation distributions can expose otherwise hidden payloads.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09411v1</guid>
      <category>cs.CR</category>
      <category>cs.IT</category>
      <category>cs.LG</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Charles Westphal, Timothy Douglas, Keivan Navaie, Tiago Pimentel, Fernando E. Rosas</dc:creator>
    </item>
    <item>
      <title>Towards Post-Quantum Secure Pharmacovigilance with ML-KEM and ML-DSA</title>
      <link>https://arxiv.org/abs/2606.09412</link>
      <description>arXiv:2606.09412v1 Announce Type: new 
Abstract: Pharmacovigilance systems handle sensitive healthcare and drug-safety data, including adverse event reports and clinical observations. As quantum computing advances, classical public-key cryptographic systems such as RSA and elliptic-curve cryptography may become vulnerable, creating long-term risks for healthcare data that must remain confidential for many years. This paper presents an educational prototype of a post-quantum secure pharmacovigilance data pipeline. The system uses ML-KEM-768 for post-quantum key establishment, HKDF-SHA-256 for deriving an AES key, AES-256-GCM for efficient file encryption, and ML-DSA-65 for digital signatures and tamper detection. The pipeline supports multiple file formats, including TXT, CSV, JSON, and PDF, by treating files as raw bytes and preserving metadata for reconstruction at the receiver. The prototype includes separate hospital, gateway, pharma receiver, attacker, benchmarking, and dashboard components. We evaluate the system using synthetic pharmacovigilance datasets of different sizes and formats. Our results show that ML-KEM adds a small constant overhead, while AES encryption and ML-DSA signing dominate runtime as file size increases. This work is not a production-ready healthcare system, but rather an educational systems-level exploration of how post-quantum cryptographic primitives can be integrated into healthcare-style data pipelines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09412v1</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Saee Desai, Tom Shimoni, Eddie Cameron, David Akamine, Aniketh Chunduri</dc:creator>
    </item>
    <item>
      <title>AI Assurance in UK Defence: Challenges in Operationalising JSP 936</title>
      <link>https://arxiv.org/abs/2606.09414</link>
      <description>arXiv:2606.09414v1 Announce Type: new 
Abstract: This report examines practical challenges in operationalising JSP 936 Part 1 for AI assurance in UK Defence. Using a structured interpretive review of the directive's requirements, the analysis identifies eight thematic challenge areas: adequacy of evidence and argument, management of human interaction with AI, definition of the operational environment, integration of AI within systems of systems, assessment and maintenance of AI performance, analysis of safety and security, measurement of ethicality, and mitigation of the inherent complexities of AI. The report argues that JSP 936 provides a useful governance basis, but that implementation depends on unresolved technical, organisational, and assurance questions. These challenges stem from the socio-technical nature of AI-enabled systems, uncertainty in real-world deployment contexts, limitations in current assurance methodologies, and tensions between performance, safety, human oversight, security, and ethical acceptability. The report identifies areas where further methods, guidance, and organisational capability are needed for the ambitious, safe, and responsible adoption of AI across Defence. This is consistent with MOD's own framing of JSP 936 as requiring iterative implementation and supporting guidance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09414v1</guid>
      <category>cs.HC</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Callum Cockburn, Sam Farrow</dc:creator>
    </item>
    <item>
      <title>Harness Engineering for Physical AI: Robot Middleware Is the Harness Layer</title>
      <link>https://arxiv.org/abs/2606.09416</link>
      <description>arXiv:2606.09416v1 Announce Type: new 
Abstract: Robot middleware faces a new role in the era of Physical AI. Learned policies, planners, and vision-language-action (VLA) models now enter deployed robots as causal participants on the control path, but the layer that integrates them with timing, scheduling, and network has not been named. Recent language-agent work names this layer the harness, the external system that mediates tools, manages state, bounds resources, and records execution. The robotics community has not yet adopted this framing, and we propose that robot middleware is that harness. A Physical AI harness differs from a software harness in where it intervenes. A software harness mediates at tool-call boundaries. A Physical AI harness must mediate at control, computing, and communication simultaneously, because a learned policy's output crosses all three: its commands shift the trajectory, its inference time shifts the schedule, and its payload shifts the bandwidth. Robot middleware is the lowest robot-stack layer with mediating abstractions over all three, so it is best positioned to compose their enforcement. It already provides most of what a harness needs but lacks the enforcement for an AI model. We name this missing enforcement as three functions: Projection gates each output at emission, Isolation bounds the model's execution and transmission slot, and Transfer falls back to a verified baseline when checks fail. Each appears today as hand-built application code in deployed robot systems, built on surfaces robot middleware already provides. Robot middleware should host them not as the best single-axis enforcer but as the layer that composes all three. We sketch this as a ROS 2 Harness Profile, a deployment artifact that carries an AI model's declared output region, inference budget, and operating regime while the middleware enforces them across ROS 2, DDS, and Zenoh.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09416v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sanghoon Lee, Jiyeong Chae, Kyung-Joon Park</dc:creator>
    </item>
    <item>
      <title>What Should a Skill Remember? Quality-Cost Trade-offs in Cost-Aware Skill Rewriting for Language Model Agents</title>
      <link>https://arxiv.org/abs/2606.09421</link>
      <description>arXiv:2606.09421v1 Announce Type: new 
Abstract: Large language model agents increasingly rely on skills: reusable procedural documents encoding workflows, tool use, implementation patterns, validation checks, and domain rules. Skill rewriting is often treated as prompt compression, but shorter skills can make agents more expensive by removing sparse operational anchors that prevent exploration, debugging, and recovery. We study skill rewriting through this economic lens. Our controlled framework profiles skill structure, rewrites skills using information-preservation strategies, and evaluates the rewrites under fixed task instructions, environments, and verifiers. Experiments on SkillsBench reveal distinct quality-cost trade-offs across strategies: API/code anchoring, workflow guarding, and rule/formula anchoring benefit different task families, with no universally dominant template. In the main held-out evaluation, the learned policy reduces total cost by 7.0% and downstream agent-token cost by 6.0%; in frozen cross-model transfer, the corresponding reductions average 14.7% and 13.7%, while verifier quality is preserved. These results position skill design as cost-aware operational knowledge engineering rather than prompt compression. Resources: SkillEE (https://github.com/1Reminding/Skill_EE).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09421v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qinghua Xing, Yinda Chen, Yaping Jin, Zhenhe Wu, Bohan Lin, Hang Zhou, Xinghao Chen, Hanting Chen, Zhiwei Xiong</dc:creator>
    </item>
    <item>
      <title>Toward Signing Activity Projection in Sign Language Interaction</title>
      <link>https://arxiv.org/abs/2606.09424</link>
      <description>arXiv:2606.09424v1 Announce Type: new 
Abstract: Social robots must interact robustly not only with users assumed by speech-centered systems but also with diverse users whose communication relies on different modalities, e.g., sign language. One important capability gap is predictive turn-taking with signing users. Although Voice Activity Projection (VAP) has been successfully used to model future voice activity in spoken interaction, it remains unclear whether the framework transfers to sign language interaction. This paper presents an initial transfer study of adapting a VAP architecture to dyadic sign language interaction. Using interaction recordings from the Public DGS Corpus, we derive binary signing activity streams from lexical sign annotations and formulate proxy tasks for turn-taking prediction. The model uses pose-derived hand, eye-region, and mouth-region features extracted for each signer. The results show that SHIFT/HOLD prediction is promising, especially with hand cues, while SHIFT-prediction remains difficult. These findings provide initial evidence for both the promise and the current limitations of transferring predictive turn-taking models from spoken interaction to sign language interaction. Predictive modeling of sign language interaction still requires sign-language-specific event definitions that go beyond speech-derived categories.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09424v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Takao Obi, Wang Yusong, Koji Inoue, Kotaro Funakoshi</dc:creator>
    </item>
    <item>
      <title>WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces</title>
      <link>https://arxiv.org/abs/2606.09426</link>
      <description>arXiv:2606.09426v1 Announce Type: new 
Abstract: Computer-use agents (CUAs) increasingly operate in runtimes that combine visual desktop control, command-line execution, code editing, browsers, and external tools. Existing benchmarks, however, often evaluate these interfaces as separable capabilities, leaving long-horizon cross-interface orchestration under-tested. Thus, we introduce WeaveBench, a long-horizon hybrid-interface benchmark with 114 tasks across 8 real-world work domains, grounded in real user requests and publicly verifiable artifacts. Each task requires agents to combine GUI observations/actions with CLI/code operations within a single trajectory. We evaluate these tasks on a real Ubuntu desktop inside deployed CLI-agent runtimes, augmented with a minimal desktop-control plugin. We also propose a companion trajectory-aware judge that inspects deliverables, files, screenshots, logs, and action traces, while detecting shortcut behaviors such as fabricated visual evidence or hard-coded metrics. Across frontier model-runtime pairings, the best PassRate reaches only 41.2%, showing the benchmark remains far from saturated. The trajectory-aware judge further reveals that outcome-only grading substantially overestimates agent performance. Overall, WeaveBench exposes a critical gap in CUA evaluation and provides an effective testbed to measure whether agents can orchestrate GUI, CLI, and code operations across long-horizon real-world tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09426v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wanli Li, Bowen Zhou, Yunyao Yu, Zhou Xu, Yifan Yang, Dongsheng Li, Caihua Shan</dc:creator>
    </item>
    <item>
      <title>Guide Me Out: A Framework to Benchmark VLM Operators Communication in Crisis Scenarios</title>
      <link>https://arxiv.org/abs/2606.09428</link>
      <description>arXiv:2606.09428v1 Announce Type: new 
Abstract: Effective crisis response requires spatially grounded communication that bridges linguistic guidance of civilians with the physical environment, accounting for structural bottlenecks, evolving threats, and agent-specific contexts. Yet, current NLP research in crisis communication remains mainly limited to static, text-only classification settings, overlooking the critical communicative role of AI operators in dynamic, embodied scenarios. We address this gap with a novel benchmarking framework for evaluating Vision-Language Models (VLMs) tasked with guiding civilian agents through simulated evacuations. We test two communication strategies (narrowcast vs. broadcast), two environment representations (visual vs. graph-based), and two threat behaviors (static vs. moving) across nine maps of varying structural complexity. Our results show that Narrowcast consistently reduces civilian Fail rates compared to Broadcast across all difficulty levels. Guidance quality depends heavily on how the VLM operator represents the world: the visual modality drives performance, while adding an adjacency graph is model-dependent and often harmful. Moving threats raise Fail rates across all conditions as communication must continuously adapt over time. Together, these findings show that deploying VLMs as AI operators in evacuation scenarios remains a non-trivial challenge, where the choice of communication strategy and input representation can directly determine the success or failure of the intervention.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09428v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Giacomo Gonella, Stefano Menini, Marco Guerini</dc:creator>
    </item>
    <item>
      <title>LargeMonitor: Monitoring Online Task-Free Continual Learning via Large Pretrained Models</title>
      <link>https://arxiv.org/abs/2606.09430</link>
      <description>arXiv:2606.09430v1 Announce Type: new 
Abstract: Online task-free continual learning (TFCL) requires intelligent agents to sequentially accumulate knowledge from an unbounded, non-stationary data stream under strict single-pass constraints and without any explicit task identifiers. Existing online TFCL paradigms primarily rely on parameter-efficient prompt tuning or dynamic structure expansion driven by training-coupled optimization dynamics, such as empirical loss fluctuations or evolving latent distances. As a result, these training-coupled solvers remain agnostic to the structural origins of distribution drift, mechanically enforcing a fixed strategy across fundamentally distinct streaming variations. To address this gap, we propose LargeMonitor, a framework that leverages large pretrained foundation models to autonomously orchestrate task-free continuous adaptation. Specifically, LargeMonitor introduces a decoupled detection module utilizing the frozen, stable representation space of large vision models (LVMs) to achieve robust, zero-shot drift detection without training-dependent interference or brittle threshold tuning. Upon a confirmed drift, the framework activates a context-aware diagnostic module driven by large multimodal models (LMMs) to interpret the precise semantic etiologies of the stream variation (e.g., novel class emergence vs. environmental domain shift). This dual-stage capability empowers the continuous learner to dynamically deploy adaptive and shift-specific optimization strategies. Extensive experiments across multiple TFCL settings and benchmarks demonstrate that LargeMonitor achieves precise, robust detection and diagnosis of complex data streams while consistently improving the performance of existing online TFCL algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09430v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mingqi Yuan, Xiaoquan Sun, Shihao Luo, Jiayu Chen</dc:creator>
    </item>
    <item>
      <title>Graph Mamba Operator: A Latent Simulator for Interacting Particle Systems</title>
      <link>https://arxiv.org/abs/2606.09432</link>
      <description>arXiv:2606.09432v1 Announce Type: new 
Abstract: Modeling interacting dynamical systems requires capturing spatial interactions alongside long-range temporal dependencies. Graph neural networks (GNNs) provide a natural representation but typically rely on autoregressive rollouts and treat spatial and temporal dynamics separately, leading to error accumulation over long horizons. Existing approaches also focus on local interactions and short temporal contexts, limiting their ability to capture multi-hop dependencies and global structure. We introduce the Graph Mamba Operator (GraMO), a latent-space simulator that integrates state-space models with graph-based interaction learning. In contrast to prior work that sequences nodes or applies spatial and temporal updates in separate stages, GraMO couples graph-based interactions and temporal state updates within a single recurrence. The update is linear in the latent state, with input-dependent coefficients that adapt across regimes. We evaluate GraMO on N-body systems, motion capture, and robotics datasets, achieving the lowest error across benchmarks and the largest gains in long-horizon prediction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09432v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Karn Tiwari, Niladri Dutta, N M Anoop Krishnan, Prathosh A P</dc:creator>
    </item>
    <item>
      <title>Bayesian Selective Latent Inference for Wastewater-First Influenza Monitoring</title>
      <link>https://arxiv.org/abs/2606.09433</link>
      <description>arXiv:2606.09433v1 Announce Type: new 
Abstract: Wastewater influenza surveillance can reveal community circulation before clinical reporting, but wastewater alone is not a fully identifiable proxy for human burden. Existing wastewater models assume a fixed evidence set, while generic evidence-acquisition methods treat official surveillance streams as interchangeable costly features. We cast wastewater-first influenza monitoring as a selective decision problem: starting from mandatory wastewater evidence, the system must decide whether wastewater is sufficient, which delayed official stream to query next, and when abstention is the only scientifically defensible action under source ambiguity. We propose Bayesian Selective Latent Inference (BSLI), a principled Bayesian method that maintains a posterior over latent burden and identifiability, certifies answerability through explicit scientific gates, and optimizes query-stop decisions with an exact cost-calibrated Bellman policy. We prove the key variational, answerability, Bellman-optimality, and one-dimensional cost-calibration properties. On a fixed public-data benchmark with 5,933 forecasting episodes and 3,102 source-ambiguity episodes, BSLI improves the matched-budget cost-performance frontier while preserving conservative abstention under source ambiguity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09433v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yixuan Zhang (Section of Health Data Science and AI, Department of Public Health, University of Copenhagen, Copenhagen, Denmark), Yang Song (Section of Health Data Science and AI, Department of Public Health, University of Copenhagen, Copenhagen, Denmark), Hao Wang (Rutgers University, New Brunswick, NJ, USA), Samir Bhatt (Section of Health Data Science and AI, Department of Public Health, University of Copenhagen, Copenhagen, Denmark, MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Faculty of Medicine, Imperial College London, London, United Kingdom), Hengguan Huang (Section of Health Data Science and AI, Department of Public Health, University of Copenhagen, Copenhagen, Denmark, MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Faculty of Medicine, Imperial College London, London, United Kingdom)</dc:creator>
    </item>
    <item>
      <title>Operator learning for solving Fokker-Planck equations with various initial conditions</title>
      <link>https://arxiv.org/abs/2606.09434</link>
      <description>arXiv:2606.09434v1 Announce Type: new 
Abstract: The Fokker-Planck equation (FPE) plays a pivotal role in describing the time evolution of probability density functions (PDFs) for systems governed by stochastic dynamics. In this work, we propose a conditional normalizing flow-based physics-informed neural network (PINN) framework for efficiently approximating the solution operator of the FPE for a whole range of initial conditions. Leveraging the Chapman-Kolmogorov equation for Markovian stochastic processes, the problem is reformulated into approximating a transition PDF starting at initial time from a Dirac mass centered at an arbitrary point. The PDF of an associated linearized stochastic differential equation (SDE) is employed as the base distribution for the normalizing flow, providing a good approximation of the target PDF, especially for small times, and thereby avoiding the singularity of the map associated with the Dirac delta initial distribution. Furthermore, a time-weighted loss function is introduced to mitigate numerical instabilities arising at small times, achieving a balance between causality and training difficulty as time progresses. A variety of numerical experiments are presented to illustrate the effectiveness and robustness of the proposed method.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09434v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Li Zeng, Xiaoliang Wan, Yaobin Wang, Fabio Nobile, Tao Zhou</dc:creator>
    </item>
    <item>
      <title>MUDIDI: A Two-Stage Framework for Multilingual Dictionary Digitization with Language Models</title>
      <link>https://arxiv.org/abs/2606.09435</link>
      <description>arXiv:2606.09435v1 Announce Type: new 
Abstract: Multilingual dictionaries are among the most valuable documentary resources for low-resource and endangered languages, yet many remain available only as scans. For many decades, their digitization and conversion into a machine-readable format was nearly impossible due to language-specific scripts and complex multi-column layouts full of entries with abbreviations and cross-references. Recent vision-language models offer a promising solution, but it is unclear how well they preserve characters and markup and process lexicographic structure. We introduce MUDIDI, a two-stage framework for multilingual dictionary digitization. Stage One evaluates the quality of character recognition and markup preservation; Stage Two focuses on dictionary entry segmentation with subsequent mapping into a machine-readable lexicographic schema, SIL's Multi-Dictionary Formatter. We also release a dataset that consists of human-annotated lexicographic entries collected from 30 public-domain dictionaries featuring diverse writing systems, language families, and formats. We benchmark OCR systems, general-purpose Large Language Models (LLMs), and Vision Language Models (VLMs) on the dataset, demonstrating superior performance of LLMs across most writing systems and languages in both stages, and provide practical guidelines on improving the results for more challenging scenarios. Finally, we show that supplying additional information, such as the dictionary's introduction, to the LLMs can improve the quality of the digitized dictionary. GitHub: https://github.com/DavidSamuell/MUDIDI-Pipeline-for-Digitization-of-Multilingual-Dictionary/</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09435v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>David Setiawan, Temuulen Khishigsuren, Milind Agarwal, Pagnarith Pit, Aso Mahmudi, Ekaterina Vylomova</dc:creator>
    </item>
    <item>
      <title>Leveraging Optimal Information-Power Flow for Transmission Switching in AC/MTDC Grids</title>
      <link>https://arxiv.org/abs/2606.09436</link>
      <description>arXiv:2606.09436v1 Announce Type: new 
Abstract: The emerging AC/multi-terminal DC grids are regarded as a promising solution for accommodating the increasing integration of renewable energy sources. This work proposes an optimization framework to address transmission switching (TS) problems arising in practical operational scenarios, such as maintenance scheduling, contingency management, and fault restoration. Unlike most existing studies, the proposed framework considers the role of communication networks in TS operations and develops an optimal information-power flow (OIPF) model. The OIPF model captures the impact of information flows on circuit breaker actions while incorporating communication-related costs, thereby better reflecting practical operational decision-making processes. To ensure computational tractability, the resulting optimization problem is formulated as a mixed-integer second-order cone programming (MISOCP) model through convex relaxations, polygonal approximations, and Big-M reformulations. Numerical case studies illustrate the applicability of the proposed OIPF model and indicate its potential in supporting transmission switching decisions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09436v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haixiao Li, Aleksandra Lekić</dc:creator>
    </item>
    <item>
      <title>Tracking the Effective Surface Area of Non-Convex Satellites</title>
      <link>https://arxiv.org/abs/2606.09439</link>
      <description>arXiv:2606.09439v1 Announce Type: new 
Abstract: This paper presents a novel framework to track the effective surface area of non-convex satellites, enabling the use of aerodynamic drag in low Earth orbit for orbital control. The proposed framework enables the satellite to track the effective surface area while simultaneously performing other maneuvers. We introduce this framework through a backstepping control algorithm and exemplify its advantages with an extension that simultaneously maximizes solar panel exposure. The equilibria of the closed-loop systems are shown to be asymptotically stable, and simulation results confirm the effectiveness of the proposed framework.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09439v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Lauritz Rismark Fosso, Raymond Kristiansen, Jan Tommy Gravdahl, Sveinung Johan Ohrem, Alessio Bocci</dc:creator>
    </item>
    <item>
      <title>SIFT: Selective-Index For Fast Compute of RAG Prefill by Exploiting Attention Invariance</title>
      <link>https://arxiv.org/abs/2606.09441</link>
      <description>arXiv:2606.09441v1 Announce Type: new 
Abstract: Retrieval-Augmented Generation (RAG) injects LLM queries with relevant documents to improve response quality. This injection increases prompt length and time to first token (TTFT). Unlike standard queries, RAG queries have a unique property of context reuse where the same documents recur across user queries. Thus, fully recomputing documents for every RAG query performs redundant computation and increases TTFT. Prior works precompute KV tensors of RAG documents offline and coarsely recompute some tokens during online prefill. However, such KV reuse is often slower than full recomputation on modern GPUs due to high-latency disk transfers. Further, such coarse-grained recomputation degrades accuracy.
  To address these limitations, this paper proposes SIFT: Selective-Index For Fast Compute of RAG Prefill by Exploiting Attention Invariance. SIFT processes documents offline and extracts fine-grained locations of high attention scores for each document. Next, we identify the following attention invariance insights that enable us to exploit the extracted locations during runtime: (1) Local-Attention Invariance: The locations of high attention scores within a document remain invariant to surrounding documents. This helps us predict the locations of high scores where the document attends to itself. (2) Cross-Attention Consistency: Keys with high intra-document attention also attract cross-attention from subsequent documents. This helps us predict the locations of high scores where the document attends to future documents. Critically, SIFT stores no KV data and only stores locations of high scores in the form of two compact bit vectors. SIFT's storage is up to 24,000x smaller than KV tensors, obviating costly disk transfers. During prefill, SIFT computes the attention only for the marked locations and improves TTFT by 1.71x while holding accuracy within 1% of full recompute.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09441v1</guid>
      <category>cs.AI</category>
      <category>cs.AR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rya Sanovar, Srikant Bharadwaj, Hritvik Taneja, Moinuddin Qureshi</dc:creator>
    </item>
    <item>
      <title>Leveraging Morphology for Historical Script Metrological Analysis</title>
      <link>https://arxiv.org/abs/2606.09446</link>
      <description>arXiv:2606.09446v1 Announce Type: new 
Abstract: Advances in handwritten text recognition have enabled large-scale transcription of historical documents, but still provide limited access to interpretable visual measurements for paleography, the study of historical scripts. In this paper, our main insight is that morphological script analysis, in particular the capacity to learn character prototypes from line-level transcriptions, enables the definition of scalable, meaningful, and stable paleographic measurements. More precisely, we leverage a transformer-based detection architecture together with a prototype-based line reconstruction module to learn prototypical characters and their occurrence, deformation, and positioning.
  Our contributions are twofold. First, we introduce a deep architecture and learning methodology that enables efficient character modeling with only line-level transcription supervision, significantly improving over the Learnable Typewriter baseline and enabling accurate character bounding box prediction, unlocking its potential for paleographic measurements. Second, we introduce and demonstrate the paleographical relevance of automatic measurements enabled by our architecture for characters, bi-grams, and spaces between graphical units. For this demonstration, we extend the annotations of the codex Paris, BnF, fr. 2813, commissioned in the late fourteenth century by Charles V and copied by four hands, to 160 pages. We visualize our measurements over these pages, showing how they enable us not only to differentiate graphical profiles, but also to discover and analyze subtle variations. This case study outlines the scalability of our approach and its frugality in terms of required training data, since a single column of text is sufficient to compute our measurements on each of the 160 pages.
  Data and code are publicly available at: https://malamatenia.github.io/morphology4metrology-analysis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09446v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Malamatenia Vlachou Efstathiou, Raphaël Baena, Dominique Stutzmann, Mathieu Aubry</dc:creator>
    </item>
    <item>
      <title>AliyunConsoleAgent: Training Web Agents in Real-World Cloud Environments via Distillation and Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2606.09447</link>
      <description>arXiv:2606.09447v1 Announce Type: new 
Abstract: We present AliyunConsoleAgent, a web agent framework for automated documentation verification in real-world cloud consoles. Major cloud platforms encompass hundreds of products with rapid feature iteration, causing console UIs to frequently diverge from their corresponding documentation. Verifying that documented procedures accurately reflect the current console and can be executed end-to-end demands an estimated 4 million recurring inspections annually, yet manual coverage remains below 1%. While agent systems built on frontier proprietary models achieve high success rates, their prohibitive cost and data privacy constraints preclude large-scale deployment. We propose a two-stage training paradigm: supervised fine-tuning (SFT) on distilled frontier-model trajectories, followed by reinforcement learning using Group Relative Policy Optimization (GRPO) and a dual-channel outcome reward model in real cloud environments. To support large-scale RL training, we construct a high-determinism rollout system featuring Terraform-based resource pre-provisioning and LLM-driven on-demand provisioning, which effectively isolates environment noise from the training signal. We further introduce a rule-based reward evaluation protocol grounded in backend audit logs, providing objective, reward-hacking-resistant outcome judgment. Our model evolves from mechanical instruction following to autonomous decision-making with cloud console and product-specific understanding. Experiments on a challenging 278-task benchmark where the best frontier model achieves only 65.34% demonstrate that AliyunConsoleAgent-32B achieves a 63.52% mean success rate -- a 20.24 percentage-point improvement over the base model, narrowing the gap to the best frontier proprietary model to 1.82 pp (bootstrap 95% CI [-1.27, 7.39]) -- at 92% lower inference cost.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09447v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bojie Rong, Zheyu Shen, Qiaoping Wang, Pengfei Kang, Yang Xu, Yawen Wei, Hanyu Wu, Zhi Zhao, Leihao Pei, Linquan Jiang</dc:creator>
    </item>
    <item>
      <title>Explicit and asymptotically good constructions of Algebraic Geometry codes in the sum-rank metric</title>
      <link>https://arxiv.org/abs/2606.09448</link>
      <description>arXiv:2606.09448v1 Announce Type: new 
Abstract: Algebraic Geometry (AG) codes (i.e. linear codes from algebraic function fields) in the Hamming metric were proposed by Goppa in 1980 and have been intensively studied ever since. Linearized Algebraic Geometry codes, the analogue of AG codes in the sum-rank metric, were instead introduced more recently [9], using quotients of the ring of Ore polynomials with coefficients in an algebraic function field. In this paper, we further investigate the results in [9], providing explicit, optimal and asymptotic constructions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09448v1</guid>
      <category>cs.IT</category>
      <category>math.AG</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Peter Beelen, Elena Berardini, Anina Gruica, Maria Montanucci</dc:creator>
    </item>
    <item>
      <title>Reasoning without Gold Standards: A Proxy-Judge Theory of Autoformalization</title>
      <link>https://arxiv.org/abs/2606.09449</link>
      <description>arXiv:2606.09449v1 Announce Type: new 
Abstract: Complex reasoning tasks increasingly require systems to produce outputs whose correctness cannot be judged by exact match against a single reference. Autoformalization (AF) is a representative example; it asks a model to translate informal mathematical or logical reasoning into a formally checkable object, yet expert-validated formalizations do not scale beyond toy cases and a single informal argument can admit many valid formal renderings. Progress therefore depends on whether partial, structured proxies can substitute for exact references.
  We introduce a reference-free proxy-judge framework for AF that replaces gold-standard matching with a vector of per-axis property checks. The framework organizes the proxy along three structural scopes that cover global properties of the elicited object, per-module properties internal to its sub-components, and cross-domain properties that re-align it to the informal source, and aggregates each axis into a verdict vector. The vector drives a reflective refinement loop in which a violated coordinate routes the controller to a matching repair target, so each iteration changes only what is judged wrong.
  Under bounded judge noise, the expected intrinsic gap contracts geometrically to a noise-dependent plateau. Across seven formalization backbones on miniF2F, ProofNet, e-SNLI, and ProntoQA, refinement consistently lifts Pass Rate over the single-shot ICL baseline, and the per-axis proxy outperforms a matched scalar proxy on benchmarks where the baseline has room to improve. Structured proxy judgments therefore provide both a practical refinement signal and a theoretical handle on convergence when exact references are unavailable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09449v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lei Xu, Xin Quan, André Freitas</dc:creator>
    </item>
    <item>
      <title>TheoremBench: Evaluating LLMs on Theorem Proving in Formal Mathematics</title>
      <link>https://arxiv.org/abs/2606.09450</link>
      <description>arXiv:2606.09450v1 Announce Type: new 
Abstract: LLMs have recently achieved strong results on formal proving benchmarks. However, existing evaluations remain heavily concentrated on competition-style problems and often fail to capture how models behave on longer, more dependency-rich mathematical developments. We introduce TheoremBench, a Lean4 benchmark designed to evaluate theorem provers beyond contest settings. The benchmark is built from nearly one hundred classical theorems and is released in two complementary forms: a plain main version containing one target theorem per instance, and a premised version that expands each theorem into a structured family of related proving tasks consisting of the main theorem together with automatically extracted supporting subtheorems. This design enables evaluation of not only whether the final theorem was proved from scratch, but also of partial progress through the internal proof structure of a theorem. Our experiments show that explicit premises substantially improve performance for Lean4-capable prover models. To provide a comprehensive evaluation, we introduce theorem-level coverage and token-efficiency metrics that expose qualitative differences in proof behavior. The results show that current provers remain strongly biased toward easy subtheorems and often solve theorems through long and inefficient tactic traces rather than compact proof plans. TheoremBench therefore provides a more fine-grained view of formal reasoning ability and highlights the importance of structural benchmark design for evaluating Lean4 theorem provers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09450v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>QuocViet Pham, Elvir Karimov, Andrey Galichin, Ivan Oseledets</dc:creator>
    </item>
    <item>
      <title>Dense Force Estimation with an Event-based Optical Tactile Sensor</title>
      <link>https://arxiv.org/abs/2606.09451</link>
      <description>arXiv:2606.09451v1 Announce Type: new 
Abstract: Humans rely on spatially dense, geometry- and force-aware tactile feedback at high temporal resolution for dexterous manipulation. While vision-based tactile sensors enable dense force estimation, they are limited by camera frame rates, motion blur, and data bandwidth. Event-based optical tactile sensors offer an attractive alternative with microsecond temporal resolution and low motion blur, but existing methods are restricted to predicting only net forces. We introduce the first framework for dense 3D force field reconstruction using event-based optical tactile sensors. Our approach estimates 3D surface displacements from event data and maps them to forces via the inverse Finite Element Method (iFEM). Shear displacements are recovered through the proposed event-based marker tracking algorithm, while normal displacements are predicted by a convolutional neural network trained on a collected dataset of synchronized force-displacement-event data. Experiments demonstrate accurate reconstruction of physically grounded forces, achieving a mean absolute error of (0.14 N, 0.10 N, 0.93 N) over force ranges up to (4 N, 4 N, 20 N), while operating at an average of 100 Hz. This work constitutes a first step toward enabling dense force feedback for high-frequency control in robotic grasping and dexterous manipulation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09451v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Agis Politis, René Zurbrügg, Valentina Cavinato</dc:creator>
    </item>
    <item>
      <title>GD-MIL: Grade-Disentangled Multiple Instance Learning for Multimodal Biochemical Recurrence Prediction in Prostate Cancer</title>
      <link>https://arxiv.org/abs/2606.09453</link>
      <description>arXiv:2606.09453v1 Announce Type: new 
Abstract: Biochemical recurrence (BCR) after radical prostatectomy is a critical endpoint in prostate cancer, yet risk stratification relies almost entirely on variables dominated by Gleason grade. Whether H&amp;E whole slide images (WSIs) carry prognostic signal beyond grade, and whether multiple instance learning (MIL) can recover it, remains unsettled. A key obstacle is that many pipelines select model checkpoints on the evaluation fold, artificially inflating concordance. We construct a rigorous benchmark on TCGA-PRAD (487 patients, 101 BCR events) using strict out-of-fold scoring over five-fold cross-validation repeated across five seeds. The choice of MIL aggregator (ABMIL, CLAM, TransMIL, PatchGCN) has little effect (C-index 0.61-0.64 with UNI2-h), while the feature extractor is the dominant factor (ResNet50 0.566 versus pathology foundation models up to 0.639). A clinical Cox model on grade, stage, and age reaches 0.687; no imaging-only model significantly outperforms it (p &gt; 0.10). We introduce Grade-Disentangled MIL (GD-MIL), a gated-attention MIL encoder trained with a gradient-reversal grade adversary that encourages the slide representation to be invariant to Gleason grade before late fusion with clinical variables. GD-MIL achieves C-index 0.704, significantly outperforming both the clinical baseline (delta-c = +0.029, p = 0.0005) and the best imaging-only model (delta-c = +0.062, p = 0.039), suggesting H&amp;E morphology contains prognostic information complementary to grade. A median risk split yields log-rank p &lt; 0.0001 separation in BCR-free survival (~20% vs ~70% at five years).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09453v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dasari Naga Raju</dc:creator>
    </item>
    <item>
      <title>Tuning Dispatch Thresholds for Fixed Last-Mile Routes: A Simulation-Based Pareto Analysis of a Production Policy</title>
      <link>https://arxiv.org/abs/2606.09455</link>
      <description>arXiv:2606.09455v1 Announce Type: new 
Abstract: Many parcel networks dispatch vehicles on fixed routes using a simple load-accumulation rule: a truck leaves the depot for a fixed route as soon as the volume (or item count) waiting for that route crosses a threshold. The threshold is usually parameterised as an affine function of route length, $\tau_r=\beta+\gamma\,d_r$, and the pair $(\beta,\gamma)$ is chosen once and frozen into production. This paper studies how good that frozen choice actually is, treating the question as a data-intensive, data-driven decision-making problem over a full month of real operational flow. Using a discrete-event simulator that replays the recorded arrival stream and reconstructs every trip, we sweep the $(\beta,\gamma)$ design space, evaluate the two competing objectives -- company operating cost and average parcel lead time -- and recover the Pareto frontier of efficient policies for two deployed variants (volume-triggered and item-count-triggered). The two policies turn out to be in strikingly different states of tune. The volume-threshold configuration lies on its own Pareto frontier: the simulator finds no $(\beta,\gamma)$ pair that strictly dominates it, so the deployed policy is already Pareto-efficient -- an unusual positive audit result. The item-count configuration is the opposite: it is dominated by a concrete simulated configuration that is both faster and cheaper, and the available cost saving at equal lead time is about 5.0%. We trace the item-count policy's inefficiency to a base that is too large and a length coefficient that is too small for the deployed truck capacity, and show that a steeper threshold -- lower base, higher slope -- is preferable. Because the remedy is a two-scalar reconfiguration, the analysis converts directly into an actionable, zero-capital recurring saving.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09455v1</guid>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Alexander Ponomarenko, Ilya Antonov</dc:creator>
    </item>
    <item>
      <title>Breaking the Tokenizer Barrier: On-Policy Distillation across Model Families</title>
      <link>https://arxiv.org/abs/2606.09456</link>
      <description>arXiv:2606.09456v1 Announce Type: new 
Abstract: On-Policy Distillation (OPD) has become a core technique in the post-training of Large Language Models (LLMs) for transferring knowledge from domain experts to student models. However, existing OPD methods require teacher and student models to share the same tokenizer, restricting OPD to models within the same series. Current mainstream practice typically employs Supervised Fine-Tuning (SFT) on teacher-generated responses for cross-tokenizer distillation, which fails to capture the rich knowledge embedded in the teacher's probability distribution. In this work, we enable the standard on-policy distillation method to operate across model families, ensuring that high-fidelity token-level signals can propagate across different tokenizers with a precise token-mapping algorithm. Extensive experiments show that cross-tokenizer OPD is significantly more compute-efficient than baselines on various benchmarks. Our results unlock a broader range of teacher-student pairs for OPD, opening up new avenues for adapting and enhancing interactions between LLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09456v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yifan Niu, Han Xiao, Dongyi Liu, Zelong Wang, Dihong Gong, Yasheng Wang, Jia Li</dc:creator>
    </item>
    <item>
      <title>$\omega$-EVA: Envision, Verify, and Act with Latent Interactive World Models</title>
      <link>https://arxiv.org/abs/2606.09457</link>
      <description>arXiv:2606.09457v1 Announce Type: new 
Abstract: Embodied policies typically map current observations directly to actions, leaving candidate-action consequences implicit. World models provide predictive supervision, representations, or external simulation, but rarely let a policy inspect the imagined consequence of its own proposal before acting. We introduce $\omega$-EVA, a latent interactive world model that realizes an Envision--Verify--Act loop for embodied action generation. Its three-stage framework learns action-conditioned latent dynamics, trains a language-conditioned flow policy on dynamics-aware visual representations, and feeds the policy's proposal back through the world model. A tri-branch refiner jointly reasons over the current state, proposal-conditioned future, and proposed action to produce the final action chunk. Because consequence reasoning remains in latent feature space, $\omega$-EVA avoids generating future videos at inference. Evaluations across diverse single-arm, bimanual, long-horizon, and perturbed simulation settings show that the complete interaction pipeline consistently improves the proposal policy, while latent diagnostics indicate meaningful action-conditioned future structure. With approximately 1.2B parameters and no additional robot-data pretraining, $\omega$-EVA demonstrates a compact and competitive performance--scale--data trade-off, making the world model an active action-feedback module rather than a passive predictor.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09457v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Zhenguo Sun, Yu Sun, Hande Huang, Alois Knoll</dc:creator>
    </item>
    <item>
      <title>AbstRAG: Learning to Abstract for Retrieval Problems</title>
      <link>https://arxiv.org/abs/2606.09459</link>
      <description>arXiv:2606.09459v1 Announce Type: new 
Abstract: Retrieval-augmented generation often fails when the query, the document evidence, and the user's intent are expressed at different levels of abstraction. A query may ask about a class, a relation, or an event, while the document only states specific instances, indirect framings, or scoped formulations. We define this mismatch as an abstraction gap: the minimal set of typed assumptions required to align query intent with the available evidence. To close this gap, we introduce AbstRAG, which treats abstraction as an explicit retrieval object. AbstRAG decomposes the query--evidence gap into expression, conceptual, intent--evidence, and event-type components, and scores relevance by combining match quality, a query-independent utility prior, and the cost of the required bridges. Its central mechanism is reflective refinement: a critic diagnoses retrieval failures, localizes the failed abstraction operator, proposes a minimal stage-specific patch, and accepts the patch only under sufficiency and compression controls. Across three within-document retrieval benchmarks against seven baselines, AbstRAG outperforms on nDCG@10 in 18 of 21 paired-bootstrap contrasts and improves generation accuracy by 1.9%, 5.2%, and 4.0% across the three benchmarks; ablations confirm that reflective refinement drives most of the retrieval gain and the compression control alone reduces over-expansion false positives from 73.7% to 0% on a stress slice.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09459v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lei Xu, Xin Quan, Daniel Pedronette, André Freitas</dc:creator>
    </item>
    <item>
      <title>A 65-nm Privacy-Preserving Neuromorphic Encoder With 7.13-nJ Efficiency, 2.38-Mb/mm^2 Item-Memory Density, and Federated Learning Support</title>
      <link>https://arxiv.org/abs/2606.09460</link>
      <description>arXiv:2606.09460v1 Announce Type: new 
Abstract: The increasing demand for privacy-preserving personal data analytics in smart assistants, wearable health monitors, and context-aware systems calls for hardware that is both energy-efficient and secure. This work presents a 65-nm privacy-preserving neuromorphic encoder that leverages transistor-level process variation as physically unclonable entropy for hyperdimensional computing. The proposed 2T-2T entropy cell enables compact, device-specific, and write-free item memory, allowing privacy-preserving bio-signal encoding without storing random basis vectors in conventional memory. The fabricated prototype achieves 7.13 nJ per encoding, 2.38 Mb/mm^2 item-memory density, 76.44 nJ per prediction, and 357.32 nJ per training update. It also supports in-situ decision-making, continual learning, and federated learning for multi-user deployment and cold-start personalization. Evaluations across bio-signal datasets demonstrate 93.2% accuracy on EMG and 96.1% accuracy on UCI-HAR, while reducing hypervector dimensionality by 14.3x compared with binary hyperdimensional computing. These results demonstrate an energy-efficient and privacy-preserving neuromorphic hardware platform for secure edge biomedical intelligence.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09460v1</guid>
      <category>cs.AR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Boyang Cheng, Jianbo Liu, Steven Davis, Zephan M. Enciso, Likai Pei, Xueji Zhao, Muya Chang, Ningyuan Cao</dc:creator>
    </item>
    <item>
      <title>H2HMem: A Multimodal Memory Benchmark for Agents in Human-Human Interactions</title>
      <link>https://arxiv.org/abs/2606.09461</link>
      <description>arXiv:2606.09461v1 Announce Type: new 
Abstract: Large language model agents are increasingly deployed in human-human interaction settings, such as meeting assistants and clinical documentation systems, where they must observe conversations and retain information for downstream queries. Unlike traditional human-assistant settings, these environments are inherently multimodal, involve complex discourse phenomena such as anaphora and deixis, and contain asynchronous or conflicting information from multiple participants. However, existing memory benchmarks largely focus on single-user, text-only interactions, failing to capture these challenges. To address this gap, we introduce H2HMem, a Human-to-Human Multimodal Memory Benchmark for evaluating memory capabilities in complex human-human interactions. H2HMem includes both dyadic and multi-party conversations with multimodal information streams, and evaluates agents along three dimensions: memory recall, reasoning, and application. Experiments with advanced agents reveal substantial limitations in constructing, retaining, and utilizing memories across modalities, participants, and sessions, highlighting substantial room for improvement in next-generation LLM agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09461v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shiping Zhu, Yibo Yang, Zhengyang Wang, Tiancheng Shen, Dandan Guo, Ming-Hsuan Yang</dc:creator>
    </item>
    <item>
      <title>The Changing Global Division of Labor in Software: Emergence and Diffusion of New Programming Skills across IT Hubs</title>
      <link>https://arxiv.org/abs/2606.09463</link>
      <description>arXiv:2606.09463v1 Announce Type: new 
Abstract: With the rise of new industries, new jobs often emerge. Evolutionary Economic Geography and in particular Industry Life Cycle perspectives predict that these activities first emerge in a limited number of cities and then diffuse to other locations as job descriptions become more standardized. Here, we focus on a particularly important new industry: software development, an activity that is economically important, quickly changing, and has a pronounced spatial concentration in a small number of global IT hubs. We use an online database of over 60 million questions and answers about problems in software development that yields a longitudinal dataset of 237 software skills. By geo-locating 3 million posting users at regular intervals, we link these skills to cities worldwide. We find that, in spite of its digital nature, the software industry exhibits spatial regularities similar to those previously observed in more traditional sectors. First, cities diversify into skills that are related to their existing ones. Second, new skills first emerge in cities with large and diversified software sectors, and later diffuse -- mostly unhindered by geographical distance -- to smaller cities specialized in closely related skills. We find suggestive but limited support for a windows of locational opportunity account: although even brand-new skills still emerge first in cities with strong prior specialization in related skills, concentrations of related activities have less impact on the emergence of new skills than on the diffusion of existing ones.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09463v1</guid>
      <category>cs.SI</category>
      <category>econ.GN</category>
      <category>q-fin.EC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Johannes Wachs, Xiangnan Feng, Simone Daniotti, Frank Neffke</dc:creator>
    </item>
    <item>
      <title>DECSELFMASK: Leveraging Unlabeled Text via Self-Relevance-Guided Masking for Decoder-Only Classification</title>
      <link>https://arxiv.org/abs/2606.09466</link>
      <description>arXiv:2606.09466v1 Announce Type: new 
Abstract: Classification tasks require annotated data, which can often be expensive, time-consuming, or even infeasible to collect. This is the case in the medical domain, where large datasets often have few annotated examples. To address this, we propose DecSelfMask (Decoder Self-learning by Masking), an approach to enhance decoder-only performance on classification tasks. We build on common self-learning approaches, which leverage a model to create training examples from unlabeled data, and propose a novel relevance-guided masking strategy. We use relevance attribution methods to determine what portions of unannotated texts are relevant for a task. We then create self-supervised training examples by masking out those portions, training the model to reconstruct them via next-token prediction. We hypothesize that those examples convey knowledge about the structure and semantics of unannotated data that can be useful for downstream performance. We test our approach on 136 tasks from a collection of 1.9M clinical notes from an Italian hospital. We quantify DecSelfMask's impact on downstream tasks on 5 models of different scales and families, including a probing analysis. Experiments show consistent gains, outperforming standard supervised fine-tuning approaches (+19.9 points in Macro F1), synthetic label generation (+12.5), and continual pretraining (+6.3), as well as common baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09466v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Pietro Ferrazzi, Matteo Merler, Giovanni Bonetta, Alberto Lavelli, Bernardo Magnini</dc:creator>
    </item>
    <item>
      <title>A Finetuned SpeechLLM for Joint Multi-Granular L2 Assessment and Natural-Language Rationales</title>
      <link>https://arxiv.org/abs/2606.09470</link>
      <description>arXiv:2606.09470v1 Announce Type: new 
Abstract: Automated L2 speech assessment can assign proficiency labels, but often lacks interpretability. We propose a rubric-guided SpeechLLM for multi-aspect, multi-granular assessment, trained with a hybrid objective combining supervised fine-tuning and Bounded Direct Preference Optimization. The model jointly predicts ordinal labels at the sentence level (accuracy, fluency, prosody), word/phoneme-level accuracy, and generates a natural-language rationale in the same response. On SpeechOcean762, our approach matches or outperforms single-granularity models while remaining competitive with prior approaches. We analyze rationale reliability along two axes: self-consistency with model predictions and alignment with ground-truth labels, using sentiment consistency (plausibility) and mention-based agreement (faithfulness). Rationales are plausible at the sentence level, but faithfulness degrades at the word/phoneme level: references are sparse and weakly aligned with token-level labels.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09470v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Aditya Kamlesh Parikh, Cristian Tejedor-Garcia, Catia Cucchiarini, Helmer Strik</dc:creator>
    </item>
    <item>
      <title>Escaping the KL Agreement Trap in On-Policy Distillation</title>
      <link>https://arxiv.org/abs/2606.09471</link>
      <description>arXiv:2606.09471v1 Announce Type: new 
Abstract: On-policy distillation (OPD) provides dense token-level supervision by asking a teacher to score student-generated rollouts. However, when the student drifts into an unrecoverable prefix, the teacher may locally agree with the degraded state, producing low reverse KL but little corrective training signal. We identify this persistent regime as a low-KL agreement trap. Further analyses show that tokens during and after such traps produce less useful supervision signals. We propose KAT (KL Agreement Trap Termination), an online OPD termination rule that detects persistent low-KL agreement with a dynamic training-adaptive threshold. By filtering weak supervision from degenerate agreement, KAT improves avg@k accuracy by 2.66% and pass@k by 3.43% across four mathematical benchmarks, while reducing average rollout length by 59.73%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09471v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haoran Xin, Anhao Zhao, Ying Sun, Jin Li, Xiaoyu Shen, Hui Xiong</dc:creator>
    </item>
    <item>
      <title>Training-Free Generalized Few-Shot Segmentation through Open-Vocabulary Semantic Arbitration</title>
      <link>https://arxiv.org/abs/2606.09474</link>
      <description>arXiv:2606.09474v1 Announce Type: new 
Abstract: Generalized Few-Shot Semantic Segmentation (GFSS) has traditionally been approached as a representation-learning problem, requiring task-specific adaptation to incorporate novel classes from limited support examples. Recent foundation models, however, already exhibit strong open-vocabulary recognition and segmentation capabilities, raising a different question: can GFSS be solved through inference-time coordination of frozen semantic priors rather than parameter adaptation? We answer this question with Open-V, a training-free GFSS framework that combines Segment Anything (SAM3) Promptable Concept Segmentation (PCS) with a K-shot CLIP support centroid through calibrated per-pixel semantic arbitration. Open-V introduces no trainable components and supports arbitrary semantic categories at inference time. Beyond segmentation performance, our study contributes three broader findings. First, we show that support information can be incorporated through inference-time semantic grounding, and that its contribution increases as foundation-model text priors weaken on label-disjoint vocabularies. Second, we identify a reproducibility confound in foundation-model segmentation, demonstrating that preprocessing and evaluation-space mismatches can silently distort reported performance. Finally, we validate Open-V across PASCAL-5i, COCO-20i, and ADE-OW, showing that training-free coordination of foundation-model priors generalizes across both conventional GFSS and open-vocabulary evaluation settings. On PASCAL-5i (1-shot), Open-V attains base/novel/harmonic mIoU of 78.4/77.5/77.9 without GFSS-specific training, surpassing the strongest trained baseline by +17.7 HM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09474v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Silas Kwabla Gah, Ebenezer Owusu</dc:creator>
    </item>
    <item>
      <title>Emergent alignment and the projectability of ethical personas</title>
      <link>https://arxiv.org/abs/2606.09475</link>
      <description>arXiv:2606.09475v1 Announce Type: new 
Abstract: Work on 'emergent misalignment' shows that finetuning LLMs on narrow tasks can induce broadly misaligned behavior. This supports the 'persona selection' (PSM) hypothesis: during pre-training, LLMs learn to simulate different characters and perspectives, which can be elicited and refined during post-training. This paper investigates the converse phenomenon, 'emergent alignment', and uses it to support and refine the PSM and motivate a novel desideratum for alignment. We finetune a helpful-only model on broad and narrow safety tasks. To create SFT samples, we follow the 'Constitutional AI' (CAI) approach and use four constitutions which encode reasonable alignment strategies: deontology, consequentialism, virtue ethics, and aligning AIs as subordinate to human authority. For each of those models, we show that finetuning on two narrow safety sub-categories reliably induces emergent alignment over a representative set of general safety categories, and on safety subcategories that we directly filtered out of the datasets used for narrow alignment. To test the 'PSM' using a more fine-grained evaluation, we used a multidimensional 'ethical persona' diagnostic. For each constitutionally finetuned (broad/narrow) model, we evaluate how well its behavior matches its expected signature profile. Our results show that our CAI models acquire their expected "ethical persona" -- e.g., the model narrowly fine-tuned on SFT samples created using the consequentialist constitution agrees significantly more with utilitarian than deontological beliefs. Yet our coarse and fine-grained evaluations show that there are significant differences across our (broad/narrow) finetuned CAI models in how well they project. We conclude that alignment strategies should be evaluated not just on their (in-distribution) general safety performance, but also specifically on their degree of projectability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09475v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Guillermo Del Pinal, Youngchan Lee, Cameron McNamara, Alejandro Perez Carballo</dc:creator>
    </item>
    <item>
      <title>Goal Sets, Not Goal States: Queryable Robot Goals through Goal-Set Hindsight Relabeling</title>
      <link>https://arxiv.org/abs/2606.09476</link>
      <description>arXiv:2606.09476v1 Announce Type: new 
Abstract: Hindsight relabeling usually turns achieved future states into exact goals, which can overconstrain offline robot learning when task success depends only on a subset of the state. We propose Goal-Set Hindsight Relabeling (GS-HER), a predicate-level generalization of HER in which achieved states certify query-defined goal sets rather than singleton goal states. A binary query specifies which variables define success, making the goal predicate an inference-time input while leaving the underlying offline GCRL algorithm unchanged. Across OGBench tasks and five offline goal-conditioned learners, GS-HER improves performance when full-state goals are bottlenecked by nuisance dimensions and turns hindsight relabeling into a reusable goal interface: one checkpoint can answer multiple robot goal predicates without retraining.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09476v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Carlos Vélez García, Miguel Cazorla, Jorge Pomares</dc:creator>
    </item>
    <item>
      <title>Efficient Minimal Solvers for Visual-Inertial Relative Pose Estimation in Multi-Camera Systems</title>
      <link>https://arxiv.org/abs/2606.09477</link>
      <description>arXiv:2606.09477v1 Announce Type: new 
Abstract: Estimating the relative poses of multi-camera systems is a fundamental problem in computer vision, with critical applications in autonomous vehicles, mobile devices, and unmanned aerial vehicles (UAVs). However, existing solutions often suffer from high computational complexity or rely on an excessive number of point correspondences, limiting their real-world applicability. To address these limitations, we propose two efficient minimal solvers for estimating the relative poses of multi-camera systems using a novel parameterization. The first solver leverages the vertical direction prior provided by Inertial Measurement Units (IMUs), while the second utilizes the rotation axis direction prior from IMUs. Our methods require only four point correspondences and reduce the problem of multi-camera relative pose estimation to solving a univariate 6th-degree polynomial, a significant improvement over existing approaches, which typically involve 8th-degree polynomials. This reduction in computational complexity and correspondence requirements makes our solvers particularly effective when integrated into RANSAC frameworks, demonstrating strong potential for visual odometry applications. Through rigorous evaluations on synthetic data and the KITTI benchmark, our methods achieved superior computational efficiency and competitive accuracy compared to state-of-the-art algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09477v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tao Li, Zhenbao Yu, Banglei Guan, Jianli Han, Weimin Lv</dc:creator>
    </item>
    <item>
      <title>Optical Music Recognition for Real-World Manuscripts with Synthetic Data</title>
      <link>https://arxiv.org/abs/2606.09479</link>
      <description>arXiv:2606.09479v1 Announce Type: new 
Abstract: Optical Music Recognition (OMR) has seen major progress in model design, with end-to-end methods now capable of recognising notation at all levels of complexity. However, the impact of this progress has been limited by the visual domains of available training datasets, which are largely born-digital. Existing large collections of sheet music in libraries and other heritage institutions contain predominantly manuscripts, whose visual domains are highly diverse and different, so existing OMR systems fail when applied in the real world. These institutions are often resource-constrained, so large in-domain datasets cannot be expected. We provide a first baseline on real-world manuscripts with complex piano notation in the resource-constrained scenario. Using fine-grained music notation graph (MuNG) annotations and the Smashcima synthesis tool, we then show that while some direct transcriptions of in-domain data remain essential, domain adaptation using synthetic musical manuscript images brings significant improvement. Furthermore, the symbols used do not need to be in-domain, so the expensive fine-grained annotation can be avoided. We thus bring OMR closer to one of its stated goals: preserving and promoting musical cultural heritage.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09479v1</guid>
      <category>cs.CV</category>
      <category>cs.DL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiří Mayer, Martina Dvořáková, Vojtěch Dvořák, Markéta Herzánová Vlková, Filip Bím, Pavel Pecina, Samuel Šomorjai, Petr Žabička, Jan Hajič jr</dc:creator>
    </item>
    <item>
      <title>Loss-Guided Adaptive Scale Refinement for Molecular Force Prediction</title>
      <link>https://arxiv.org/abs/2606.09480</link>
      <description>arXiv:2606.09480v1 Announce Type: new 
Abstract: Molecular systems involve interactions across multiple spatial scales, from local coordination and short-range perturbations to long-range electrostatic and solvent-mediated effects. However, most molecular representation learning methods rely on manually predefined scales, and the task-optimal modeling scale may not coincide with these fixed levels. This study introduces a loss-guided adaptive scale refinement framework for molecular force prediction, treating predefined scales as initial anchors and discovering task-effective resolutions through interpolation, routing, differentiable scale updates, and scale pool refinement.
  Using a NaCl aqueous ionic system as a minimal testbed, this study constructs short-scale and long-range force prediction branches and analyzes their complementarity. Oracle hard routing reduces the overall force MAE from 399.65 to 382.67, while continuous oracle interpolation further reduces it to 380.96. In close-contact regimes with nearest-ion distance below 0.6 nm, the close-contact MAE decreases from 327.22 to 260.51. A minimal scale pool update experiment shows that starting from endpoint anchors {0,1}, loss-guided updates automatically generate intermediate scales and recover most of the continuous oracle performance. The final updated scale pool {0,0.125,0.25,0.375,0.5,0.75,1} achieves an overall MAE of 381.23.
  These results support adaptive scale refinement as a promising direction for molecular representation learning, especially when fixed-scale modeling is insufficient.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09480v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Limin Yu</dc:creator>
    </item>
    <item>
      <title>Memory Beyond Recall: A Dual-Process Cognitive Memory System for Self-Evolving LLM Agents</title>
      <link>https://arxiv.org/abs/2606.09483</link>
      <description>arXiv:2606.09483v1 Announce Type: new 
Abstract: Long-term memory for an LLM agent is more than retrieving the right passage at the right time. Current memory systems collapse belief revision, causal coupling, and cross-domain abstraction into a single retrieval surface tuned for surface recall, and consequently struggle on implicit personalisation that requires reasoning over how a user has evolved. We propose DCPM, which reorganises agent memory along a cognitive capability hierarchy ascending from raw inputs and atomic facts, through diachronic belief trajectories and identity, to domain schemas, latent intentions and cross-domain patterns. The hierarchy is driven by two processes inheriting the architectural split of dual-process theory: a synchronous daytime writer (System1) that records belief revisions as doubly linked supersedes chains, and an asynchronous nighttime engine (System2) that induces schemas and intentions and sweeps for cross-domain collisions abstracted into higher-level core schemas. On LongMemEval, PersonaMem and PersonaMem-v2, enabling System2 contributes most where the benchmark rewards implicit cross-session inference (up to +5.20 on PersonaMem-v2) and least on span recall, matching the architectural prediction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09483v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tianxiang Fei, Mingyang Song, Mao Zheng, Xiang Yu</dc:creator>
    </item>
    <item>
      <title>Detecting Differences Is Not Understanding Structure: Large Language Models Fail at Graph Isomorphism</title>
      <link>https://arxiv.org/abs/2606.09484</link>
      <description>arXiv:2606.09484v1 Announce Type: new 
Abstract: Large language models (LLMs) have shown impressive performance on diverse reasoning tasks, yet their capacity for structural reasoning in graphs remains unclear. We investigate whether LLMs can genuinely understand graph isomorphism, a fundamental problem in graph theory. While LLMs achieve near-perfect accuracy on isomorphism detection, we show this performance is illusory. When identical graphs are presented with permuted node labels, LLMs fail to identify their isomorphism. This finding suggests that LLMs exploit patterns rather than reasoning about abstract graph structure. Since permutation invariance is a fundamental requirement for valid structural reasoning, these results indicate that success on graph reasoning benchmarks should not be interpreted as evidence of genuine topological understanding.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09484v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kumar Thushalika, Sukumar Kishanthan, Asela Hevapathige</dc:creator>
    </item>
    <item>
      <title>LangRetrieval: Language-Guided Self-Evolving Satellite-to-Radar Retrieval via CSI-Driven Reward</title>
      <link>https://arxiv.org/abs/2606.09486</link>
      <description>arXiv:2606.09486v1 Announce Type: new 
Abstract: Satellite-to-radar (S2R) retrieval estimates ground radar precipitation from geostationary satellite observations, providing a critical solution for precipitation monitoring in radar-sparse regions. However, S2R retrieval is intrinsically ill-posed: similar cloud-top radiances can correspond to distinct precipitation regimes, storm organizations, and surface intensities, making it difficult to uniquely determine the underlying meteorological state from local spectral cues alone. Meteorological semantics offer complementary scene-level information that can help resolve this ambiguity. Yet existing static semantic conditioning is often insufficient, as externally predefined semantics cannot adapt to dynamic convective scenes or align with retrieval objectives. To this end, we propose LangRetrieval, a language-guided conditional flow matching (CFM) framework that establishes a closed-loop optimization mechanism between meteorological semantics and retrieval accuracy. Specifically, LangRetrieval consists of two core components: (i) Semantic Warm-up: structured meteorological attributes are injected into the CFM backbone through cross-attention conditioning, enabling continuous semantic guidance throughout the generation trajectory; and (ii) Self-Evolving Semantic Optimization: a lightweight attribute policy is first initialized from vision-language model annotations and subsequently refined via Group Relative Policy Optimization (GRPO) using multi-threshold Critical Success Index (CSI) rewards, enabling semantic generation to evolve directly toward improved retrieval accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09486v1</guid>
      <category>cs.MM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chunlei Shi, Junming Hou, Yi-Lin Wei, Jiong Wang, Yecheng Zhang, Yichao Dong, Wenqi Ren, Dan Niu</dc:creator>
    </item>
    <item>
      <title>LLM-Orchestrated Conformance Checking in Stroke Care Without Computer-Interpretable Guidelines</title>
      <link>https://arxiv.org/abs/2606.09489</link>
      <description>arXiv:2606.09489v1 Announce Type: new 
Abstract: Objective: Conformance checking in healthcare seeks to assess whether patient care pathways adhere to clinical guidelines. However, its practical application often depends on the availability of formal, machine-interpretable representations of guidelines, such as Computer-Interpretable Guidelines (CIGs), which are seldom available in real-world clinical settings.
  Methods: This work introduces a modular framework based on the orchestration of Large Language Models (LLMs) to support medical conformance checking directly from unstructured clinical and guideline texts, without requiring predefined CIGs. The proposed architecture integrates multiple LLMs and supporting components to extract patient traces from clinical discharge letters, identify normative rules from textual clinical guidelines, translate these rules into executable scripts, and compute a Trace Conformance Indicator to quantify compliance within the event log.
  Results: The framework was implemented and evaluated in the stroke care domain at the neurological ward of Alessandria Hospital. Hundreds of patient traces were automatically extracted from hospital data and assessed against 50 rules derived from the reference guideline. The analysis showed that more than 86% of the available traces were conformant.
  Conclusion: The results demonstrate the feasibility of using orchestrated LLMs for practical healthcare conformance analysis. At the same time, the study provides evidence of a high level of adherence to stroke care guidelines at Alessandria Hospital.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09489v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Giorgio Leonardi, Stefania Montani, Manuel Striani, Alessandro Canessa, Delfina Ferrandi</dc:creator>
    </item>
    <item>
      <title>ContextShift: A Controlled Benchmark for Context Dependence in Object Detection</title>
      <link>https://arxiv.org/abs/2606.09495</link>
      <description>arXiv:2606.09495v1 Announce Type: new 
Abstract: Modern object detectors achieve strong performance on standard benchmarks, yet their robustness to contextual variation remains insufficiently understood. Prior evaluations largely rely on aggregate metrics such as AP on uncontrolled distribution shifts, which can obscure how performance degrades under context change. We introduce ContextShift, a controlled benchmark that systematically manipulates object--context relationships while preserving object appearance. Built on COCO 2017, it isolates context as an independent variable through geometric transformations and synthetic and natural background substitutions, including a continuous compatibility axis based on normalized pointwise mutual information (NPMI). Across diverse detector architectures, we observe a consistent degradation pattern: false negatives increase by up to 227% and prediction volume decreases by up to 44%, while false positives remain stable or decline. This suppression behavior is not captured by aggregate metrics such as AP, which can mask substantial recall loss and changes in prediction dynamics. Further analysis suggests that degradation is driven less by reduced confidence than by a reduced formation of valid detection candidates. Moreover, performance along the statistical compatibility axis is non-monotonic, peaking at intermediate NPMI and degrading toward both extremes, indicating that statistical co-occurrence does not correlate linearly with effective visual context. Finally, we show that context-aware augmentation improves robustness: every augmented variant outperforms the dataset-only baseline on both original and manipulated test images, partially recovering performance lost to prediction-suppression failures by exposing models to object--context decoupling during training.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09495v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dan Zlotnikov, Alex Lazarovich, Ohad Ben-Shahar</dc:creator>
    </item>
    <item>
      <title>Self-Harness: Harnesses That Improve Themselves</title>
      <link>https://arxiv.org/abs/2606.09498</link>
      <description>arXiv:2606.09498v1 Announce Type: new 
Abstract: The performance of LLM-based agents is jointly shaped by their base models and the harnesses that mediate their interaction with the environment. Because different models exhibit distinct behaviors, effective harness design is inherently model-specific. Yet agent harnesses are still largely engineered by human experts, a paradigm that scales poorly as modern LLMs become increasingly diverse and rapidly evolving. In this paper, we introduce Self-Harness, a new paradigm in which an LLM-based agent improves its own operating harness, without relying on human engineers or stronger external agents. We operationalize Self-Harness as an iterative loop with three stages: Weakness Mining, which identifies model-specific failure patterns from execution traces; Harness Proposal, which generates diverse yet minimal harness modifications tied to these failures; and Proposal Validation, which accepts candidate edits only after regression testing. We instantiate Self-Harness on Terminal-Bench-2.0 using a minimal initial harness and three base models from diverse families: MiniMax M2.5, Qwen3.5-35B-A3B, and GLM-5. Across all three models, Self-Harness consistently improves performance, with held-out pass rates increasing from 40.5% to 61.9%, 23.8% to 38.1%, and 42.9% to 57.1%, respectively. Qualitative analyses further show that Self-Harness does not simply add generic instructions, but effectively turns model-specific weaknesses into concrete, executable harness changes. These results suggest a path toward LLM-based agents that are not merely shaped by their harnesses, but can also participate in reshaping them.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09498v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hangfan Zhang, Shao Zhang, Kangcong Li, Chen Zhang, Yang Chen, Yiqun Zhang, Lei Bai, Shuyue Hu</dc:creator>
    </item>
    <item>
      <title>Targeting World Models to Compromise Robot Learning Pipelines</title>
      <link>https://arxiv.org/abs/2606.09499</link>
      <description>arXiv:2606.09499v1 Announce Type: new 
Abstract: World models have recently grown rapidly in both popularity and capability as more data-efficient tools for generating robot training data or simulating real-world environments, with many works proposing their integration into the robot learning pipeline. Although world models are highly practical, we demonstrate in this work that they introduce a uniquely stealthy and effective data poisoning entry point into the robot learning supply chain that can result in the deployment of unsafe or otherwise compromised robotic policies despite training on seemingly safe ground-truth training data. In contrast to traditional data poisoning techniques, which directly implant dangerous trajectories into sold or uploaded datasets, our novel attack methods inject malicious prompts or compromising transition dynamics into visibly safe teleoperated datasets which are only activated once fed through a world model as input. This can result in the generation of synthetic, dangerous robot training trajectories and subsequently unsafe or compromised robot policies. We demonstrate the effectiveness of our attacks against both state-of-the-art action-conditioned and text-conditioned world models, showing a full end-to-end backdoor on a downstream DRL policy and a proof-of-concept for the VLA setting. Overall, these findings call for research into more secure world models and a reevaluation of their position within the robot learning supply chain.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09499v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ethan Rathbun, Ahmed Agha, Saaduddin Mahmud, Christopher Amato, Alina Oprea, Eugene Bagdasarian</dc:creator>
    </item>
    <item>
      <title>Deterministic Integrity Gates for LLM-Assisted Clinical Manuscript Preparation: An Auditable Biomedical Informatics Architecture</title>
      <link>https://arxiv.org/abs/2606.09500</link>
      <description>arXiv:2606.09500v1 Announce Type: new 
Abstract: Objective. Large language models (LLMs) increasingly draft clinical research manuscripts, but their fluency can hide fabricated citations, numbers that drift from source tables, and unmet reporting-guideline items. Existing tools generate text without verifying it, and self-critique inherits the blind spots that produce confident fabrication. We describe an architecture that pairs generation with verification. Methods. The design rests on three principles: decompose the workflow into self-contained skills, gate every stage transition with halt-on-failure, and resolve each integrity question with the cheapest sufficient mechanism -- a deterministic, re-executable check where one suffices, and a prose-level probe only where interpretation is unavoidable. This determinism-where-possible split, organized as an integrity-gate taxonomy, is the core contribution. It is realized as MedSci Skills, an open-source toolkit of 43 skills coordinated by one orchestrator, whose deterministic tier comprises 21 standard-library detectors. We evaluate it on three reproducible public-dataset pipelines (STARD, PRISMA, STROBE) and a seeded-defect ablation. Results. Across the three pipelines every content-hash manifest verified clean and the gates surfaced real defects. On 27 identical injected defects the deterministic gates detected all 27 with no false positives on the matched clean fixtures, whereas a generic single-prompt LLM reviewer detected 11, its misses concentrated in generated-code, bibliography-internal, and style defects the prose does not expose. Conclusion. Determinism-where-possible verification yields an auditable, re-executable trail that exposes the evidence a human needs to check an LLM-assisted manuscript -- feasibility and reproducibility evidence, not a claim of human-competitive quality, which a separate blinded study addresses. MedSci Skills is MIT-licensed and archived (v3.8.0).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09500v1</guid>
      <category>cs.AI</category>
      <category>cs.DL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yoojin Nam, Jinhoon Jeong, Namkug Kim</dc:creator>
    </item>
    <item>
      <title>Second-order bulk-surface splitting for the wave equation with kinetic boundary conditions</title>
      <link>https://arxiv.org/abs/2606.09502</link>
      <description>arXiv:2606.09502v1 Announce Type: new 
Abstract: This paper is devoted to the numerical analysis of a second-order bulk-surface splitting scheme for the semi-linear wave equation with kinetic boundary conditions. The construction is based on the interpretation of the equations as a coupled system and the implementation of different difference formulae for the discrete states depending on their exact position in the system equations. This results in a 4-step scheme which decouples bulk and surface dynamics. We prove energy stability and second-order convergence under a weak CFL condition and also validate these results numerically.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09502v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>R. Altmann, R. Morandin</dc:creator>
    </item>
    <item>
      <title>Guaranteed Fast Implementation of the Split Covariance Intersection Filter: Nested Newton Method Thanks to the Fourth-Order Convexity of w-Optimization</title>
      <link>https://arxiv.org/abs/2606.09505</link>
      <description>arXiv:2606.09505v1 Announce Type: new 
Abstract: The split covariance intersection filter (Split CIF) is a useful tool for general data fusion and has the potential to be applied in a variety of engineering tasks. The w-optimization problem involved in the Split CIF affects both the performance and the implementation efficiency of the Split CIF. It is known that the w-optimization problem enjoys the desirable property of convexity (more precisely, second-order convexity in this paper's context). This paper proves that the w-optimization problem further enjoys a more desirable property, namely fourth-order convexity, thanks to which a guaranteed fast implementation of the Split CIF can be realized. The new implementation, termed the nested Newton method, is also presented in this paper.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09505v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hao Li</dc:creator>
    </item>
    <item>
      <title>Prisma-World: Camera-Controllable Multi-Agent Video World Model</title>
      <link>https://arxiv.org/abs/2606.09507</link>
      <description>arXiv:2606.09507v1 Announce Type: new 
Abstract: Video world models have made rapid progress in generating controllable visual experiences, but most of them still simulate the world from a single observer. Extending such models to multiple agents raises a central challenge: if each agent's future state is generated independently, overlapping views may instantiate different versions of the same scene, leading to inconsistent objects, layouts, and appearances across agents. Conventional camera conditioning controls individual trajectories, but it does not explicitly couple the generation of views that should agree under shared scene geometry. We introduce Prisma-World, a camera-controllable multi-agent world model that formulates multi-agent generation as a joint geometry-aware denoising process for cross-view consistency. Prisma-World processes all agent videos within one full-attention sequence, uses a multi-agent RoPE design to distinguish agent identities while preserving synchronized temporal coordinates, and injects relative camera geometry into attention to bias overlapping viewpoints toward shared scene evidence. To further strengthen multi-view consistency and enhance global spatial perception, we augment our framework with an overlap-decaying curriculum training paradigm alongside minimap-conditioned structural guidance. To facilitate the training and evaluation of multi-agent models, we introduce PrismaDataset, a large-scale UE5 dataset with panoramic acquisition across diverse scenes, composable multi-agent view groups with flexible agent counts and complex camera trajectories, and precise camera/action annotations for consistency training and evaluation. Experiments show that a single Prisma-World model can generate high-fidelity multi-agent videos with flexible agent numbers, camera controllability, improved cross-view consistency, and spatial grounding under minimap guidance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09507v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Huiqiang Sun, Zhan Peng, Size Wu, Kun Wang, Kang Liao, Dianyi Wang, Xingyu Zeng, Sheng Jin, Yangguang Li, Zhiguo Cao, Ziwei Liu, Wei Li</dc:creator>
    </item>
    <item>
      <title>From Rigid to Dynamic: Entropy-Guided Adaptive Inference for Long-Context LLMs</title>
      <link>https://arxiv.org/abs/2606.09508</link>
      <description>arXiv:2606.09508v1 Announce Type: new 
Abstract: Existing sparse attention and KV cache compression methods for long-context LLM inference typically apply fixed sparsity patterns or uniform budgets across all attention heads, overlooking the substantial variation in attention behavior among heads and contexts. We observe two distinct entropy patterns among attention heads: Rigid Heads, whose entropy stays near zero across input segments, and Dynamic Heads, whose entropy fluctuates significantly. Crucially, the distribution of these types is context-dependent and cannot be predetermined offline. We therefore propose EntropyInfer, a training-free framework that uses attention entropy to adaptively allocate compute at the granularity of individual heads and segments during prefilling. For decoding, we introduce a latent KV cache compression scheme that leverages generated output tokens, rather than prefill tokens alone, to identify and retain the most critical cache entries. Extensive experiments on Llama, Qwen and openPangu model series show that EntropyInfer consistently outperforms baselines including SnapKV, AdaKV, and CritiPrefill, achieving up to 2.39$\times$ end-to-end speedup beyond 100k tokens with minimal quality degradation compared to full attention. The code is released in https://github.com/SHA-4096/EntropyInfer.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09508v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhanchao Xu, Haoyang Li, Qingfa Xiao, Fei Teng, Chen Jason Zhang, Lei Chen, Qing Li</dc:creator>
    </item>
    <item>
      <title>Local Search on Vertex Coloring for Bipartite Graphs</title>
      <link>https://arxiv.org/abs/2606.09509</link>
      <description>arXiv:2606.09509v1 Announce Type: new 
Abstract: Local search is a well-known heuristic method used in optimization. In this thesis, we explore its capabilities on the vertex coloring problem, an $NP$-hard problem with relevance in both theoretical analysis and practical application. To recognize limitations in the applicability of local search to the vertex coloring problem, we analyze local search landscapes on differently-structured bipartite graphs. We identify structures that ensure only global optima can exist as well as ones that enable the existence of non-global local optima, showing that on general bipartite graphs, it is possible for local search to return arbitrarily bad results. Further, we analyze the capabilities of local search on graphs where a local optimum can be found. To do so, we introduce a gray-box local search mutation operator that removes less frequent colors with higher probability and prove that it finds an optimal coloring on complete bipartite graphs in an expected run time of $\Theta(n \log n)$. This is a drastic improvement over the exponential run time of the black-box Random Local Search, showing that gray-box mutation operators can improve the run time of local search.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09509v1</guid>
      <category>cs.NE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Johanna Gasse</dc:creator>
    </item>
    <item>
      <title>Securing Self-supervised Data Curation for Foundation Models Robustness</title>
      <link>https://arxiv.org/abs/2606.09511</link>
      <description>arXiv:2606.09511v1 Announce Type: new 
Abstract: Self-supervised data curation provides a pathway to scaling and improving the generalization capabilities of machine learning models. By leveraging self-supervised learning (SSL) for data curation, the demand for massive training datasets required by foundation models can be effectively met. SSL greatly alleviates the costs associated with annotation and manual dataset curation while minimizing the need for human oversight. However, the integrity of SSL-curated datasets must be rigorously checked, as reliance on anonymous and unvetted external sources can substantially increase the risk of data poisoning. In this paper, we propose a Poisoned Data Detector (PDD), an active defense mechanism designed to ensure the integrity of SSL-curated datasets prior to foundation model training. PDDs are designed using a combination of the pretrained ImageBind model and traditional classifiers, including Random Forest (RF), k-Nearest Neighbors (KNN), Naive Bayes (NB), and Support Vector Machines (SVM). We rigorously evaluated PDDs using 176,200 images from three diverse datasets and three different adversarial attacks encompassing both in-distribution and out-of-distribution scenarios. Notably, SVM-PDD achieves superior performance for both in-distribution (Set3-Set5) and out-of-distribution (TrueFace and 140K RealFace) datasets. Our design demonstrates strong scalability and enables the rapid integration of new adversarial attack detectors through an ensemble approach.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09511v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sandeep Gupta, Roberto Passerone</dc:creator>
    </item>
    <item>
      <title>BUDDY: BUdget-Driven DYnamic Depth Routing for Adaptive Large Language Model Inference</title>
      <link>https://arxiv.org/abs/2606.09514</link>
      <description>arXiv:2606.09514v1 Announce Type: new 
Abstract: Large language models (LLMs) incur high inference cost due to their depth and parameter scale. Depth pruning can reduce latency by skipping redundant Transformer blocks, but existing methods (i) provide limited control under user-specific compute budgets and (ii) typically fix the routing path, failing to adapt as the context grows during decoding. We propose Buddy, a budget-driven dynamic depth routing framework. Buddy uses a lightweight Decision Module to score intermediate layers conditioned on the input and deterministically executes the top-k layers to satisfy a given budget. To support decode-time adaptation, Buddy reuses the first-layer KV cache as a low-overhead global context source and pools it together with the newest token representation before each routing decision. When no explicit budget is provided, an optional Budget Predictor estimates an input-dependent compute level to balance quality and efficiency. Experiments on Llama-family and Qwen models show that Buddy is competitive with strong static pruning baselines and often improves the accuracy-compute trade-off, while uniquely supporting strict budget control, decode-time rerouting, and multiple budgets within a single trained model.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09514v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuhua Zhou, Shaoqi Yu, Shichao Weng, Changhai Zhou, Mingze Yin, Fei Yang, Aimin Pan</dc:creator>
    </item>
    <item>
      <title>SwiftVR: Real-Time One-Step Generative Video Restoration</title>
      <link>https://arxiv.org/abs/2606.09516</link>
      <description>arXiv:2606.09516v1 Announce Type: new 
Abstract: Real-time video restoration (VR) for live streams requires high-resolution outputs under strict per-frame latency constraints. Existing one-step diffusion-based VR models remain difficult to deploy on consumer-grade GPUs due to two main bottlenecks: quadratic spatial attention at high resolutions and the latency-memory overhead of large video autoencoders. We present SwiftVR, a streaming one-step generative VR framework that reduces both bottlenecks under a causal chunk-wise protocol. For attention, mask-free shifted-window self-attention gathers each spatial window into a dense tensor via deterministic indexing, keeping all attention calls on the dense scaled dot-product attention path without masks, cyclic shifts, padding, or hardware-specific sparse kernels. Because SwiftVR uses only standard dense SDPA calls, the trained model transfers to consumer GPUs without retraining or custom kernels. For autoencoding, a lightweight Restoration-aware Autoencoder enables fast chunk-wise decoding while preserving reconstruction quality. On a single H100, SwiftVR sustains 31 FPS at 2560x1440 and 14 FPS at 3840x2160, whereas all compared diffusion-based VR baselines exceed the memory limit at 4K. On a consumer RTX 5090, SwiftVR reaches 26 FPS at 1920x1080. To our knowledge, SwiftVR is the first generative VR model to achieve real-time 1080p streaming on a consumer-grade GPU, while attaining strong no-reference perceptual quality with lower inference cost. The project page is available at https://h-oliday.github.io/SwiftVR.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09516v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiaqi Yan, Xiangyu Chen, Xinlin Zhong, Haibin Huang, Chi Zhang, Jie Liu, Jiantao Zhou, Xuelong Li</dc:creator>
    </item>
    <item>
      <title>Investigating Calibration Challenges in Probabilistic Electricity Price Forecasting</title>
      <link>https://arxiv.org/abs/2606.09517</link>
      <description>arXiv:2606.09517v1 Announce Type: new 
Abstract: As renewable energy integration increases market volatility, probabilistic electricity price forecasting has become essential for effective risk management. However, current proper scoring rules often prioritize forecast sharpness at the expense of calibration, leading to overconfident and statistically unreliable uncertainty estimates. This work highlights the critical gap between theoretical scoring and practical calibration, demonstrating that models can become mere proxies for deterministic forecasts when reliability is neglected. We conclude that future research must shift toward calibration-aware objectives and architectures to ensure the distributional integrity of energy market forecasts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09517v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3765611.3815392</arxiv:DOI>
      <dc:creator>Jan Niklas Lettner, Hadeer El Ashhab, Benjamin Schäfer</dc:creator>
    </item>
    <item>
      <title>Constructions of Quantum $(r,\delta)$-LRCs from cyclic codes</title>
      <link>https://arxiv.org/abs/2606.09522</link>
      <description>arXiv:2606.09522v1 Announce Type: new 
Abstract: Classical $(r,\delta)$ locally recoverable codes (LRCs) play a central role in distributed data storage systems as they enable an efficient recovery from erasures by accessing a small number of surviving symbols. Motivated by their prospective use in future quantum data storage and by recent theoretical progress on quantum locally recoverable codes (qLRCs), we investigate the construction of qLRCs from classical cyclic $(r,\delta)$-LRCs. Our approach identifies cyclic LRCs whose defining sets satisfy a dual-containing condition, allowing them to serve as valid CSS ingredients. We present three explicit families of $(r,\delta)$-qLRCs, two of which are optimal with respect to the quantum Singleton-like bound, whenever the codes are pure, thereby providing optimal examples. Additionally, the codes presented in Constructions 2 and 3 have no bound on their lengths with respect to the field size required to obtain these codes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09522v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Rajendra Prasad Rajpurohit, Maheshanand Bhaintwal</dc:creator>
    </item>
    <item>
      <title>Emergence of Context Characteristics Sensitivity in Large Language Models</title>
      <link>https://arxiv.org/abs/2606.09525</link>
      <description>arXiv:2606.09525v1 Announce Type: new 
Abstract: During instruction fine-tuning (IFT), large language models (LLMs) learn to follow instructions by using the provided context to answer a query. While prior work has studied how context characteristics correlate with context usage by the LLM, this analysis has been limited to inference time, leaving open how these relationships are acquired in the first place. Here, we measure how models' sensitivity to such characteristics shifts across successive IFT stages: supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning with verifiable rewards (RLVR). Experiments across four models and three datasets show that SFT makes models more likely to use contexts that are easy to understand, such as those with high length, context-query similarity, and fluency. Post-SFT dynamics may either reinforce or resolve these preferences depending on the training dataset. Our findings reveal that context usage is actively reshaped at each IFT stage, and that designing a balanced IFT dataset is important for ensuring robust context utilization in instruction-tuned models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09525v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nadya Yuki Wangsajaya, Haeun Yu, Isabelle Augenstein</dc:creator>
    </item>
    <item>
      <title>When Types Intersect and Effects Get Handled</title>
      <link>https://arxiv.org/abs/2606.09526</link>
      <description>arXiv:2606.09526v1 Announce Type: new 
Abstract: We introduce a novel intersection type system for a $\lambda$-calculus with algebraic effects and handlers. The system, inherently behavioral in nature, enjoys the classical properties of intersection type systems, in particular subject reduction and expansion. It thus characterizes the set of terms whose evaluation process terminates and, at the same time, allows reducing the reachability problem to type inference. This new system, the first with these features for a calculus with handlers, induces a system of simple types which, although not guaranteeing termination, is type sound and admits a decidable HOMC problem, unlike similar type systems like Dal Lago and Ghyselen's HEPCF.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09526v1</guid>
      <category>cs.LO</category>
      <category>cs.PL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ugo Dal Lago, Taro Sekiyama, Stefano Catozi</dc:creator>
    </item>
    <item>
      <title>Relocate and Emulate: Re-Hosting Android's Application Layer</title>
      <link>https://arxiv.org/abs/2606.09528</link>
      <description>arXiv:2606.09528v1 Announce Type: new 
Abstract: Dynamic analysis of Android's application layer typically relies on physical devices, limiting scalability and reproducibility. To compensate, we introduce a systematic re-hosting method that relocates the Android framework and pre-installed software from real device firmware into a fully emulated environment. Our approach integrates vendor-specific components into the Android Open Source Project (AOSP) build system using tailored extraction and injection strategies, producing vendor-flavoured emulator images that preserve system integrity and runtime compatibility. This enables dynamic execution of real-world framework and application-layer components, including proprietary binaries and pre-installed apps, across multiple SDK versions. We evaluate our method on 184 firmware samples from SDK 31-33. It achieves high build and boot success rates, with residual failures primarily occurring during core-service initialization due to baseline strategy limitations, missing dependencies, device-protection checks, or emulator constraints. However, the modular design allows injection strategies to be extended for specific firmware, supporting broader compatibility and future research on automated, adaptive re-hosting. Though we identified potential for optimization through engineering vendor-specific solutions, our research demonstrates the feasibility of vendor-flavoured emulators for scalable, reproducible dynamic analysis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09528v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1109/SANER67736.2026.00053</arxiv:DOI>
      <dc:creator>Thomas Sutter, Timo Kehrer, Bernhard Tellenbach, Marc Rennhard</dc:creator>
    </item>
    <item>
      <title>Hybrid Metaheuristic Combining the Dragonfly Algorithm and Tabu Search for the Traveling Salesman Problem</title>
      <link>https://arxiv.org/abs/2606.09529</link>
      <description>arXiv:2606.09529v1 Announce Type: new 
Abstract: The Traveling Salesman Problem (TSP) is a classical NP-hard combinatorial optimization problem that aims to find the shortest Hamiltonian cycle visiting each city exactly once and returning to the starting point. This paper proposes a hybrid metaheuristic for the TSP by combining the Dragonfly Algorithm (DA), a swarm-intelligence-based global search method, with Tabu Search (TS), a memory-based local search technique. The proposed method follows a High-Level Relay Hybridization (HRH) scheme, in which DA is first used to explore the solution space and generate a promising initial tour, while TS subsequently refines this solution through neighbourhood-based improvement and tabu memory. The hybrid approach is evaluated on standard TSPLIB benchmark instances, including burma14, att48, and ch150, and compared with standalone DA, standalone TS, and several classical metaheuristics such as Genetic Algorithm, Ant Colony Optimization, Particle Swarm Optimization, and Random Search. A systematic grid-search procedure is also conducted to study the influence of the main hyperparameters on solution quality and execution time. The experimental results indicate that the proposed hybrid can improve tour quality compared with the standalone DA and TS on the tested instances, highlighting the benefit of combining global exploration with local exploitation. However, the results also suggest that performance remains sensitive to parameter settings and problem size, motivating further validation on larger benchmarks and stronger TSP-specific baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09529v1</guid>
      <category>cs.NE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ammar Bouketta</dc:creator>
    </item>
    <item>
      <title>Reduced integration with scaled boundary parametrization for virtual elements at finite strains</title>
      <link>https://arxiv.org/abs/2606.09530</link>
      <description>arXiv:2606.09530v1 Announce Type: new 
Abstract: This contribution presents an alternative stabilization technique for the virtual element method (VEM) based on reduced integration combined with a scaled boundary parametrization. To this end, a Taylor series expansion of the constitutive quantities with respect to the sectional center is carried out, enabling analytical integration of the weak form and reducing the need for integration points to only one per section. The accuracy of the proposed formulation is shown by several numerical examples, including a non-linear patch test. Different loading, e.g. compression under large deformations, and material conditions, such as hyperelastic anisotropy and elasto-plasticity, are considered. The biquadratic serendipity finite element formulation (Q2) and the low-order finite element formulation with hourglass stabilization (Q1STc+) are used for comparison. While the patch test was not fulfilled using higher order shape functions, the formulation led to good results and was able to capture the structure's response accurately. Furthermore, the formulation performed better when the physical element resembled the pre-assigned parent elements. The example of the asymmetrically notched specimen under elasto-plastic material behavior showed that the proposed formulation is able to capture inelasticities.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09530v1</guid>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Njomza Pacolli, Bjorn Sauren, Jannick Kehls, Sven Klinkel, Stefanie Reese, Hagen Holthusen</dc:creator>
    </item>
    <item>
      <title>Interpretable Crisis Behavior Analysis Using Mobility and Social Media Data</title>
      <link>https://arxiv.org/abs/2606.09532</link>
      <description>arXiv:2606.09532v1 Announce Type: new 
Abstract: Crises alter both how people move and how they communicate. During emergencies such as wildfires and pandemics, changes in mobility patterns and online emotional discourse evolve jointly, yet they are typically studied in isolation. This paper presents a unified and interpretable pipeline that integrates mobility and social media data to identify cross-domain behavioral patterns in crisis settings. The framework is evaluated through two case studies: a short-horizon analysis of the January 2025 Los Angeles wildfires (prototype case) and a longitudinal analysis of UAE COVID-19 behavior from March 2020 to December 2021 (primary case, 671 days). The pipeline aligns heterogeneous daily signals, transforms them into binary behavioral states, applies Formal Concept Analysis (FCA) to extract co-occurrence structure, mines association rules, and validates rule stability through chronological holdout testing. A structured policy-translation layer renders robust rules as operational briefs specifying triggers, lead times, and action playbooks. Results reveal clear cross-domain behavioral structure in both crises. In the wildfire case, traffic stress, fear/anger sentiment, and governance discourse are tightly coupled within a 33-day window, with key rules reaching 100% confidence and lift scores up to 2.5. In the COVID case, repeated mobility adaptation and sentiment volatility yield 8 stable same-day rules (88% holdout pass rate) and 40 clean predictive rules with 2-7 day lead horizons. The work demonstrates that interpretable multimodal fusion can produce both scientifically credible and policy-actionable crisis intelligence.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09532v1</guid>
      <category>cs.CY</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Muhammad Hamza Arshad Majeed, Sidahmed Benabderrahmane, Talal Rahwan</dc:creator>
    </item>
    <item>
      <title>Just-in-time Restoration with Distributed Fiber Sensing in Metropolitan Optical Networks</title>
      <link>https://arxiv.org/abs/2606.09533</link>
      <description>arXiv:2606.09533v1 Announce Type: new 
Abstract: Distributed Fiber Sensing (DFS) leverages optical backscattering signals to predict failure events and enable just-in-time restoration in metropolitan optical networks, i.e., without optical amplifiers. In this paper, we study the effectiveness of proactive restoration based on DFS information in all-optical networks, while considering different sensing devices' capabilities. We evaluate whether restoration can be provisioned just-in-time before a failure happens, and its impact on key performance metrics, including the number of affected and suspended optical circuits, bandwidth blocking rate, and service downtime. Simulation results demonstrate that just-in-time restoration enabled by DFS with a prediction time capability of 15~ms can reduce circuit disruptions by more than 90\% compared to restoration without sensing and ensure optical service continuity in optical networks comparable to resource-intensive protection schemes, at a fraction of the spectral resources.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09533v1</guid>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sleman Mouammar, Italo B. Brasileiro, Andre C. Drummond</dc:creator>
    </item>
    <item>
      <title>A Continuification Approach to CAV Control in Mixed Traffic via Variable Speed Limits</title>
      <link>https://arxiv.org/abs/2606.09534</link>
      <description>arXiv:2606.09534v1 Announce Type: new 
Abstract: This paper presents a method for controlling traffic via the use of connected and automated vehicles (CAVs) acting as moving bottlenecks. Current methods for moving bottleneck control use a coupled PDE-ODE model, based on the Lighthill-Whitham-Richards (LWR) model, to represent the influence of the CAV. Control of the CAV is normally achieved by designing the control on the ODE which models the speed of the moving bottleneck. The proposed method in this paper instead looks to reduce the computational burden of controlling multiple CAVs by designing the moving bottleneck controller first on the PDE. The original control designed on the PDE is a linear quadratic regulator (LQR) that determines the optimal variable speed limit (VSL) for the entire length of freeway in order to regulate density to a desired setpoint. Then, a continuification approach is utilized to determine the input speed for each CAV. Results show that multiple CAVs can be controlled via this method, with minimal computational burden, and that as the number of CAVs increases the solution approaches the global optimal solution determined by the LQR.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09534v1</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Brian Block, Cecilia Pasquale, Silvia Siri, Simona Sacone, Stephanie Stockar</dc:creator>
    </item>
    <item>
      <title>Overcoming Decoder Inconsistencies in Whisper for Dravidian and Low-Resource Languages</title>
      <link>https://arxiv.org/abs/2606.09535</link>
      <description>arXiv:2606.09535v1 Announce Type: new 
Abstract: Multilingual ASR models such as Whisper perform well on high-resource languages but exhibit substantially higher Word Error Rates (WER) for Dravidian languages compared to Indo-Aryan ones. Through linguistic and dataset analysis, we show that Dravidian languages have longer words, higher vocabulary diversity, and lower repetition, resulting in sparse token distributions and frequent character-level substitution errors. Baseline fine-tuning further reveals decoder imbalance between self-attention (linguistic context) and cross-attention (acoustic cues). Although synthetic token-repetition experiments indicate potential gains, they are impractical. Motivated by these observations, we introduce two decoder-level enhancements: Weighted-Attention, which adaptively balances attention sources, and Self-Conditioning, which reinjects intermediate predictions to improve token consistency. Experiments demonstrate consistent WER reductions for low-resource and agglutinative languages.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09535v1</guid>
      <category>cs.CL</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chowdam Venkata Kumar, Kumud Tripathi, Pankaj Wasnik</dc:creator>
    </item>
    <item>
      <title>Adversarial Attack and Disturbance Detection by Hadamard-Coded Output Representations for Object Detection and Semantic Segmentation</title>
      <link>https://arxiv.org/abs/2606.09536</link>
      <description>arXiv:2606.09536v1 Announce Type: new 
Abstract: Conventional one-hot encodings often yield poorly calibrated models, being overconfident under attack, and letting entropy-based detection algorithms fail. Previous image classification works have demonstrated that Hadamard-coded output representations can improve adversarial robustness. However, attempts to integrate Hadamard codes into semantic segmentation fall far behind state-of-the-art models in mean intersection-over-union performance. Regarding object detection, such output encodings have not yet been investigated at all. Further, no prior art addressed intrinsic codeword inconsistencies or actually exploited intrinsic codeword redundancy. Accordingly, we first derive a novel decoding procedure for Hadamard codewords towards optimal class-wise probabilities, solving the underlying optimization problem by using the projection onto the probability simplex. Second, our optimization delivers a measure of prediction inconsistency. Third, we are the first to show how to exploit these inconsistencies for adversarial attack and disturbance detection. Fourth, we introduce HadamardNet, a framework employing Hadamard codes as output representations for semantic segmentation and object detection models and tasks. We conduct a comprehensive evaluation both on disturbances and adversarial attacks, achieving state-of-the-art perturbation detection performance for both tasks in only a single detection pass, while delivering equivalent or close-by reference performance on clean data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09536v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lucas Görnhardt, Timo Bartels, Niklas Schwarz, Tim Fingscheidt</dc:creator>
    </item>
    <item>
      <title>STEPS: Semantic-Contract-Guided Scheduling for LLM-Assisted Natural-Language-Driven Edge AI Services</title>
      <link>https://arxiv.org/abs/2606.09537</link>
      <description>arXiv:2606.09537v1 Announce Type: new 
Abstract: Networked AI services are increasingly delivered through edge infrastructures to support latency-sensitive applications. Edge scheduling is critical for deciding where and how AI services are executed under limited communication and computing resources. Existing frameworks usually assume that requirements are given as numerical constraints, such as latency bounds, energy budgets, or cost limits. In practice, users often express expectations through ambiguous natural language, creating a gap between user intent and resource constrained scheduling. To bridge this gap, we propose semantic-contract-guided edge potential scheduling (STEPS), a natural language driven scheduling framework for LLM assisted edge AI services. STEPS introduces semantic contracts as executable interfaces between user-side semantics and edge-side decision making. An LLM assisted semantic parser extracts service levels and confidence scores, which are converted into service preferences, fulfillment bounds, and semantic uncertainty. Based on these contracts, STEPS formulates edge scheduling as a contract-guided potential game that jointly determines execution-node selection, computing-resource provisioning, and bandwidth allocation. It also builds feedback signals from semantic request drift, fulfillment drift, fulfillment pressure, and admission pressure to adjust semantic admission, contract conservativeness, and edge coordination. We characterize the exact potential game structure, establish pure strategy Nash equilibrium existence, and prove convergence and stability properties. Experiments show that STEPS improves semantic contract fulfillment, reduces contract guided service loss, and maintains robust adaptation under ambiguous requests and non-stationary edge environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09537v1</guid>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Houyi Qi, Minghui Liwang, Xianbin Wang, Seyyedali Hosseinalipour</dc:creator>
    </item>
    <item>
      <title>Efficient Traffic Prediction at Scale: A Systematic Study of STGCN Architectural Depth</title>
      <link>https://arxiv.org/abs/2606.09539</link>
      <description>arXiv:2606.09539v1 Announce Type: new 
Abstract: Spatio-temporal graph neural networks (STGNNs) have become the dominant approach for traffic prediction, yet their computational requirements pose challenges for practical deployment in intelligent transportation systems (ITS). While recent work has proposed efficient alternatives to STGNNs, a fundamental question remains unexplored: are these architectures themselves over-parameterised? We examine this question using the Spatio-Temporal Graph Convolutional Network (STGCN), one of the most widely adopted models in this domain. Through systematic experiments across four diverse traffic datasets, we compare 1-block, 2-block (standard), and 3-block STGCN variants. Our findings reveal that the single-block architecture achieves optimal performance for short-term prediction (10 mins) on three of four datasets, while incurring only marginal degradation ($\leq$1.8% relative error) at longer horizons. Crucially, the 2-block variant incurs 61% higher CPU inference latency and 37% lower throughput relative to 1-block -- substantial overhead for resource-constrained ITS deployment. The 3-block architecture offers no favourable tradeoff, more than doubling computational cost for $&lt;$0.5% relative improvement. These results suggest that the default 2-block STGCN may be over-parameterised for many applications, with implications for both practitioners deploying traffic prediction systems and researchers benchmarking efficiency-focused methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09539v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Soban Nasir Lone, Mohamed Abouelela, Taeyoung Yu, Jiwon Kim, Constantinos Antoniou</dc:creator>
    </item>
    <item>
      <title>A VideoMAE-v2 Approach to Zero-Shot Traffic Accident Anticipation</title>
      <link>https://arxiv.org/abs/2606.09542</link>
      <description>arXiv:2606.09542v1 Announce Type: new 
Abstract: Traffic accident anticipation -- predicting the likelihood of an imminent collision at every frame of a dashcam video -- is safety-critical yet difficult to scale, because collecting in-domain annotated accident footage for every deployment scenario is prohibitively expensive. We study this task under a zero-shot setting where no target-domain training data is available: the model must learn exclusively from a publicly available binary-labelled driving-accident dataset and generalise to unseen dashcam footage. We propose a framework that bridges the gap between the frame-level temporal risk estimation task and coarsely labelled binary accident datasets by coupling a VideoMAE-v2 backbone with a per-frame prediction head under a sliding-window protocol. Our method achieves 2nd place in the 2026 CVPR@AUTOPILOT Zero-Shot Accident Anticipation competition. Code is available at https://github.com/TimeSouth/zero-shot-taa-solution.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09542v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Siyuan Li, Xiaoyang Bi, Mengshi Qi</dc:creator>
    </item>
    <item>
      <title>From Genes to Tokens: a GWAS-inspired Approach for Interpretable Stylometric Analysis</title>
      <link>https://arxiv.org/abs/2606.09543</link>
      <description>arXiv:2606.09543v1 Announce Type: new 
Abstract: This short paper introduces a stylometric interpretation method inspired by genome-wide association studies (GWAS). Each "gene" token's association with "phenotype" authorship is tested using logistic regression with multiple-comparison correction. Applied to English, German, and Russian corpora, the method detects statistically significant lexical markers distinctive of individual authors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09543v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dmitry Pronin, Evgeny Kazartsev</dc:creator>
    </item>
    <item>
      <title>Streaming Interventions: Can Video Large Language Models Correct Mistakes as They Occur?</title>
      <link>https://arxiv.org/abs/2606.09547</link>
      <description>arXiv:2606.09547v1 Announce Type: new 
Abstract: Learning everyday skills, like cooking a dish, relies increasingly on instructional media such as online videos. This opens the door to the use of video (and multimodal) large language models (LLMs) as task guidance assistants. A crucial capability for the real-world success of a prospective task guidance assistant is its ability to intervene proactively as soon as a mistake is apparent in order to guide the user. To evaluate this crucial capability, we introduce Ego-MC-Bench (Mistake Corrections), a benchmark for evaluating reactive, step-by-step task guidance in realistic cooking scenarios. Extensive experiments show that Ego-MC-Bench is highly challenging for state-of-the-art video LLMs. We argue that a key reason is the limited availability of training data for fine-tuning models on this task. Although there exists a wide range of cooking video datasets, existing datasets lack examples of mistakes along with appropriately timed interventions. To help address this data limitation, we also introduce Ego-CoMist, a counterfactual synthetic dataset created by transforming non-interactive cooking videos into supervised training examples showing proactive interventions. We show that fine-tuning on Ego-CoMist yields performance gains especially for smaller and more efficient video LLMs that are well suited for delivering assistance on edge devices.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09547v1</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Apratim Bhattacharyya, Shweta Mahajan, Sanjay Haresh, Rajeev Yasarla, Reza Pourreza, Litian Liu, Risheek Garrepalli, Roland Memisevic</dc:creator>
    </item>
    <item>
      <title>Model Poisoning Against Federated Model Adaptation with Chain of Bit-Flips</title>
      <link>https://arxiv.org/abs/2606.09548</link>
      <description>arXiv:2606.09548v1 Announce Type: new 
Abstract: Federated Learning (FL) allows a set of clients to collectively train a global model without sharing local training data. Giving the responsibility of the training to decentralized actors may lead to poisoning attacks: clients controlled by a malicious third party potentially poison the training dataset to install a backdoor in neural networks. In FL, these backdoor attacks rely solely on an algorithmic approach; however, recent advances in hardware-fault threats (e.g., Rowhammer) have widened the overall attack surface. In the context of federated model adaptation, we introduce a novel category of backdoor attack against FL systems that relies on model poisoning based on hardware-fault attacks. More precisely, we propose a task-agnostic backdoor attack that is implanted during FL training by inducing hardware faults (bit-flips) in the parameters of a single local model. The backdoor is crafted during a previous offline phase from the pretrained model initially used by the FL system. Our results show that a backdoor can be successfully applied to different types of models and datasets. Typically, up to 10 faults per malicious client occurrence and 19 total occurrences on a ResNet-18 are enough to reach a 94% attack success rate. Finally, we discuss the practicality of the attack and its robustness against potential defenses, while putting into perspective the practical constraints of Rowhammer, which is the preferred attack vector for this type of threat.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09548v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Bastien Vuillod, Kevin Hector, Pierre-Alain Moellic, Jean-Max Dutertre, Olivier Potin</dc:creator>
    </item>
    <item>
      <title>SecureClaw: Clawing Back Control of LLM Agents</title>
      <link>https://arxiv.org/abs/2606.09549</link>
      <description>arXiv:2606.09549v1 Announce Type: new 
Abstract: Tool-using large language model (LLM) agents face two distinct security failures: unauthorized external actions and exposure of sensitive plaintext inside the runtime before any final output check can intervene. Existing defenses usually protect one boundary, either the planner/runtime or the action sink, and therefore do not by themselves secure both surfaces. We present SecureClaw, a dual-boundary architecture that places authorization at the effect sink and plaintext confinement at the read boundary. Sensitive reads pass through a trusted gateway that replaces raw values with opaque handles and, in the evaluated deployment, bounded summaries as an explicit declassification interface. Writes that change external state follow a PREVIEW$\rightarrow$COMMIT protocol in which only a trusted executor may commit the exact canonical request authorized by policy. The runtime can still plan over summaries and symbolic references, but cannot directly dereference secrets or perform side effects. Across AgentDojo, AgentLeak, and Agent Security Bench (ASB), SecureClaw is the only defense we evaluate in a common harness that simultaneously retains usable task utility and achieves 0\% attack success rate (ASR) on ASB, 0.64\% ASR on AgentDojo, and 3.23\% overall leak on AgentLeak's attacked parity lane, which measures final-output and internal-relay leakage.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09549v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuhan Ma, Stefan Schmid</dc:creator>
    </item>
    <item>
      <title>InquiTree: Evaluating AI Agents in the Scientific Inquiry Loop with Paper-Derived Research Trees</title>
      <link>https://arxiv.org/abs/2606.09550</link>
      <description>arXiv:2606.09550v1 Announce Type: new 
Abstract: While LLM-based agents are increasingly used in scientific workflows, it remains unclear whether they are truly qualified for the dynamic and uncertain process of discovery. Existing static evaluations often conflate genuine reasoning with rote memorization. We introduce InquiTree, a diagnostic environment that formalizes scientific inquiry as interactive Research Trees: directed acyclic graphs capturing the logical dependencies among hypothesis formulation, study design, result interpretation, and belief updating. Evaluating agents on a 30-paper test pool and releasing the open-access InquiTree-18 (IT-18) subset, we identify two key limitations. First, agents exhibit an "Erosion of Marginal Capabilities": during long-horizon interactions, they develop "cognitive tunneling," where critical judgment and anomaly detection degrade relative to their intrinsic baselines. Second, performance drops on papers published after model training cutoffs, revealing a boundary between interpolation and extrapolation and suggesting that apparent competence is partly driven by parametric memory. These findings indicate that scaling context alone is insufficient for reliable AI scientists; stronger architectures or human oversight may be required to preserve critical evaluation and generalization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09550v1</guid>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shaoyang Cui</dc:creator>
    </item>
    <item>
      <title>FuseFSS: Efficient Secure LLM Inference with Function Secret Sharing</title>
      <link>https://arxiv.org/abs/2606.09551</link>
      <description>arXiv:2606.09551v1 Announce Type: new 
Abstract: Two-server secure inference allows a client to query a hosted large language model (LLM) without revealing prompts or embeddings. Recent GPU systems based on function secret sharing (FSS) make linear layers efficient, but fixed-point nonlinearities and helper operations remain a bottleneck because each operator is typically implemented as a bespoke protocol with its own comparisons, wrap-around corrections, and preprocessing material. We present FuseFSS, a compiler that replaces per-operator protocol design with a single compilation pipeline. For each scalar fixed-point operator, a compact specification lists its interval partition, low-degree arithmetic pieces, and required predicate bits. The compiler emits two batched FSS evaluations on the public masked value: one packed comparison that returns all predicate bits, and one vector interval lookup that returns the active coefficients and constants. Compared to the current state-of-the-art FSS-based GPU secure inference, FuseFSS preserves accuracy while achieving a $1.24\times$--$1.50\times$ end-to-end speedup and reducing online communication by $9\%$--$16\%$ on BERT and GPT-style models; preprocessing is also lighter, with $14\%$--$23\%$ lower key-generation time and $20\%$--$24\%$ smaller keys.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09551v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuhan Ma, Yong Li, Stefan Schmid</dc:creator>
    </item>
    <item>
      <title>OpenBibleTTS: Large-Scale Speech Resources and TTS Models for Low-Resource Languages</title>
      <link>https://arxiv.org/abs/2606.09553</link>
      <description>arXiv:2606.09553v1 Announce Type: new 
Abstract: Recent advances in neural text-to-speech (TTS) and multilingual speech generation have substantially improved synthetic speech quality, yet these gains remain unevenly distributed across the world's languages. Existing models are still dominated by a small set of high-resource languages, while many studies of low-resource TTS are simulated on artificially downsampled high-resource corpora that do not reflect the orthographic variation and limited phonetic coverage encountered in genuinely underrepresented settings. As such, we introduce OpenBibleTTS, which is a large-scale benchmark for low-resource speech synthesis spanning 37 underrepresented languages. Moreover, a systematic comparison of various TTS architectures and large-scale speech generation models is conducted across in-domain Biblical text and out-of-domain material. Results show that no single system dominates across languages and metrics: Gemini-TTS achieves the highest listener ratings on most evaluated languages, but monolingual EveryVoice models trained on OpenBibleTTS remain strongest for intelligibility and are preferred in several African languages, while open from-scratch systems degrade sharply on out-of-domain text, revealing a persistent gap between broad multilingual coverage and reliable synthesis quality in underserved linguistic communities. We complement automatic evaluation with subjective human judgments, and open-source all processed datasets, alignments, and trained models to support future low-resource TTS research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09553v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>David Guzmán, Luel Hagos Beyene, Jesujoba Oluwadara Alabi, Yejin Jeon, Dietrich Klakow, David Ifeoluwa Adelani</dc:creator>
    </item>
    <item>
      <title>AI Scientists Are Only as Good as Their Evidence: A Stratified Ablation of Proprietary Data and Reasoning Skills in Drug-Asset Valuation</title>
      <link>https://arxiv.org/abs/2606.09556</link>
      <description>arXiv:2606.09556v1 Announce Type: new 
Abstract: AI Scientist agents are often evaluated as if capability were mainly a function of model quality, prompting, or reasoning scaffolds. We test a different hypothesis in drug-asset valuation: for knowledge-intensive scientific decisions, the limiting factor is often the evidence substrate the agent can access. We run a controlled three-arm ablation on a production valuation agent: A is a plain web-only LLM analyst, B adds public structured tools plus a 14-dimension valuation playbook, verifier, objectivity policy and red-team, and C adds the proprietary Noah AI corpus of curated pipeline, trial and deal intelligence. Across a 13-asset stratified benchmark, B improves calibration and audit discipline: tier-in-range accuracy rises from 0.80 to 0.89 and objectivity from 3.16 to 3.30. But B does not remove the factual ceiling. Under capability-superset accounting, A and B recover only 0.25 and 0.38 of the curated gold competitive record, while C recovers 0.96; on the curated long-tail subset, C reaches 0.93 vs. 0.26/0.30. Raw blind-panel decision quality is similar for A and B (7.01 vs. 6.96), so we introduce completeness-aware decision utility: informed decision-quality = decision-quality x gold-coverage. On this metric, C reaches 7.43 vs. 1.76/2.57 for A/B. Even a perfect non-proprietary-data report would be capped at 3.83 by B's coverage. The result is not that reasoning scaffolds are unimportant; they improve calibration and discipline. Rather, proprietary evidence sets the upper bound of what the AI Scientist can know and therefore decide.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09556v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yinan Wang</dc:creator>
    </item>
    <item>
      <title>Safe-RULE: Safe Reinforcement UnLEarning</title>
      <link>https://arxiv.org/abs/2606.09559</link>
      <description>arXiv:2606.09559v1 Announce Type: new 
Abstract: Offline safe reinforcement learning (Safe RL) enables policy learning without online interactions, making it suitable for safety-critical systems such as robotics systems. However, its reliance on static datasets exposes offline Safe RL to data poisoning attacks, where adversaries inject malicious samples that compromise safety and induce unsafe policy behavior. In this work, we propose a new learning paradigm, named safe reinforcement unlearning (Safe-RULE), used as a defense framework to remove the influence of poisoned data without retraining from scratch or requiring access to the original training environment. We further extend reinforcement unlearning to offline Safe RL by explicitly accounting for both task performance and safety constraints during the unlearning process. Experiments across benchmark Safe RL tasks demonstrate that our approach effectively enhances safety performance against data poisoning attacks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09559v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Shixiong Jiang, Taozheng Zhu, Fanxin Kong</dc:creator>
    </item>
    <item>
      <title>PRISM: Recovering Instruction Sets from Language Model Activations</title>
      <link>https://arxiv.org/abs/2606.09563</link>
      <description>arXiv:2606.09563v1 Announce Type: new 
Abstract: As LLMs are deployed as agents, reliable monitoring requires knowing not only what they output, but which instructions are steering their behavior. This is difficult when models infer unintended subgoals, follow contextual cues, or are influenced by prompt injections and hidden objectives. While activation-to-language methods suggest that hidden states can reveal natural-language information, existing approaches are not designed to recover the full set of simultaneous instructions, constraints, prohibitions, and subgoals active in agentic settings. We formalize this problem as instruction set retrieval and introduce PRISM, an activation-conditioned interpreter that decodes hidden states from a frozen target model into a faithful bullet list of active instructions. Unlike prior activation-to-language methods, PRISM is trained to recover instruction sets directly, using judge-guided GRPO to reward covered instructions and penalize unsupported ones. Across benign, constrained, prompt-injection, and hidden-objective settings, PRISM outperforms activation-to-language baselines, especially on security-relevant objectives.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09563v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Gilad Gressel, Rahul Pankajakshan, Julia Diament, Efim Hudis, Krishnashree Achuthan, Yisroel Mirsky</dc:creator>
    </item>
    <item>
      <title>STON'R Converges to First-Order Nash Equilibria of Multiplayer Games</title>
      <link>https://arxiv.org/abs/2606.09565</link>
      <description>arXiv:2606.09565v1 Announce Type: new 
Abstract: Nonconcave games present a unique challenge, as neither pure Nash equilibria nor local Nash equilibria (LNE) are guaranteed to exist, even in zero-sum settings. Additionally, computing approximate LNE in smooth multiplayer games over bounded regions is PPAD-hard. These challenges, coupled with the inherent complexity, have driven recent research toward broader equilibrium concepts, such as min-max critical points, and first-order Nash equilibria (FONE), which correspond to solutions of specific non-monotone variational inequalities. This paper addresses general-sum multiplayer games with compact convex strategy sets and smooth, nonconcave utility functions. Daskalakis et al. introduced the STON'R algorithm for solving variational inequality problems and established convergence under smoothness assumptions. They further showed that the algorithm's limit points correspond to equilibria in specific classes of games, namely local minimax equilibria in two-player zero-sum games and Nash equilibria in smooth concave games. In this work, we extend the convergence result to multiplayer general-sum games and show that the variational inequality solutions targeted by STON'R correspond to first-order Nash equilibria (FONE), a general game-theoretic solution concept that unifies these previously studied cases. We demonstrate the effectiveness and robustness of the algorithm on various examples from recent literature.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09565v1</guid>
      <category>cs.GT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Marika Kosohorská, Tomáš Kroupa, Tomáš Votroubek</dc:creator>
    </item>
    <item>
      <title>Self-Explainability in Self-Adaptive and Self-Organising Systems: Status and Research Directions</title>
      <link>https://arxiv.org/abs/2606.09568</link>
      <description>arXiv:2606.09568v1 Announce Type: new 
Abstract: The growing complexity of self-adaptive and self-organising systems, fuelled by advances in Artificial Intelligence (AI), has made them increasingly difficult to understand and trust. While Explainable AI aims to provide insight into AI decision-making, a more advanced goal is for systems to explain themselves - an ability referred to as Self-Explainability (SX). This article presents a systematic literature review on SX, analysing existing approaches, including their domains, targets, and evaluation methods. The review develops a unified definition and taxonomy of SX and introduces Levels of Self-Explainability, providing a framework for positioning current and future research. Our results show that most SX approaches remain conceptual, with few practical implementations. Moreover, there is currently no formal or de facto standard for evaluating SX, highlighting a major research gap. This work thus establishes a foundation and roadmap for advancing Self-Explainability in complex systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09568v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tom Beyer, Svea Wisy, Sven Tomforde</dc:creator>
    </item>
    <item>
      <title>Efficient Minimal Solvers for Relative Pose Estimation in Autonomous Driving Applications</title>
      <link>https://arxiv.org/abs/2606.09569</link>
      <description>arXiv:2606.09569v1 Announce Type: new 
Abstract: With the advancement of visual sensing systems, computer vision is playing an increasingly important role in autonomous driving and robot navigation. Relative pose estimation in multi-camera systems is essential for accurate vehicle localization and environment perception, demanding high real-time performance and robustness. Existing methods, however, often involve high computational costs and rely heavily on abundant feature matches, limiting their applicability in time-sensitive driving scenarios. To address these limitations, this paper introduces a unified framework for efficient relative pose estimation, built upon a novel translation parameterization and first-order rotation approximation. Within this framework, we propose three efficient minimal solvers specifically designed for autonomous vehicles. The first solver integrates the vertical direction prior from Inertial Measurement Units (IMUs), the second utilizes the rotation axis direction prior during steering maneuvers, and the third is designed for planar motion - a realistic assumption for ground vehicles operating on structured roads. By reducing both the minimal number of point correspondences and the algebraic complexity, our methods enable faster hypothesis generation within RANSAC-based pipelines, improving suitability for real-time systems. Extensive experiments on synthetic datasets and the KITTI autonomous driving benchmark demonstrate that the proposed solvers achieve a favorable balance between speed and accuracy compared to existing state-of-the-art algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09569v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tao Li, Liang Liu, Jianli Han, Weimin Lv</dc:creator>
    </item>
    <item>
      <title>UXBench: Benchmarking User Experience in AI Assistants</title>
      <link>https://arxiv.org/abs/2606.09570</link>
      <description>arXiv:2606.09570v1 Announce Type: new 
Abstract: As AI assistants serve millions of users daily, evaluating user experience (UX) beyond general model capability has become increasingly important. We present UXBench, the first user-centric benchmark grounded in real user feedback signals for evaluating preference alignment and dialogue generation. The benchmark consists of three interconnected tasks, UX Judge, UX Eval, and UX Recovery, with 7,400 test instances extracted from over 70K interaction logs of a mainstream Chinese AI assistant. The dataset closely reflects real user distributions, covering 8 scenarios, 83 domains, and diverse failure patterns that pose severe challenges. Extensive experiments on 26 frontier language models provide novel insights into how well models perceive user experience and how improvements in model capability contribute to better dialogue engagement. Through comprehensive analysis of model behavior and performance gaps, we show that user feedback prediction is a learnable capability, where a reward model trained from in-the-wild feedback signals can achieve well-calibrated accuracy. We further document the systematic biases of LLM-as-a-judge evaluation protocols and compare typical response strategies that directly affect user experience. UXBench establishes a new evaluation landscape and calls for greater attention to tailored UX optimization, contributing to a user-centric scaling law that shapes the success of AI assistants.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09570v1</guid>
      <category>cs.CL</category>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mengze Hong, Xia Zeng, Zeyang Lei, Sheng Wang, Chen Jason Zhang, Di Jiang, Taiming Fu, Jinfeng Huang, Mengqiao Liu, Qinghe Chang, Haosheng Zou, Qiongyi Zhou, Sijun He, Chen Xiaoshuai, Simon Deng, Haojing Huang, Zijian Li, Lucas Mu Li, Fubao Zhang, Mona Zhou, Wei Ma, Chenxuan Ma, Yuanmeng Zhang, Jian Song, Minlong Peng, Di Liang, Davey Chen</dc:creator>
    </item>
    <item>
      <title>CT-VAM: A Cerebello-Thalamic-Inspired Vision-Action Model for Efficient Visuomotor Control</title>
      <link>https://arxiv.org/abs/2606.09572</link>
      <description>arXiv:2606.09572v1 Announce Type: new 
Abstract: Vision-language-action models have shown strong promise for robot manipulation, yet raw language is primarily needed to specify task intent rather than to be repeatedly processed during high-frequency low-level execution. Motivated by this separation, we propose a cerebello-thalamic-inspired vision-action model (CT-VAM) for efficient task-conditioned visuomotor control. CT-VAM acts as a compact local execution policy that predicts action chunks from dual-view visual observations, proprioception, and a lightweight task condition, potentially enabling a practical cloud-edge paradigm in which high-level semantic reasoning can be handled by large models while fast closed-loop control runs on local hardware. To fuse heterogeneous inputs effectively, CT-VAM introduces TARS (Thalamic Action Routing Stream), a stream-separated conditional attention decoder that independently routes action, visual and task streams, preventing dense sensory tokens from overwhelming compact task-relevant conditions. With only 68M parameters, CT-VAM achieves LIBERO success rates competitive with substantially larger VLA models, while reducing inference latency. Together with flow-consistent inpainting for asynchronous chunk execution, CT-VAM supports high-frequency control and demonstrates robust real-world deployment on resource-constrained robotic platforms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09572v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiacheng Li, Yize Guo, Jiabin Guo, Qingchen Liu, Jiahu Qin</dc:creator>
    </item>
    <item>
      <title>Code Is More Than Text: Uncertainty Estimation for Code Generation</title>
      <link>https://arxiv.org/abs/2606.09577</link>
      <description>arXiv:2606.09577v1 Announce Type: new 
Abstract: Large language models (LLMs) are increasingly deployed as code generators, where silently wrong programs pose real safety and reliability risks. Reliable uncertainty estimation (UE) is essential for selective prediction, human-in-the-loop review, and downstream agentic decisions. Yet most existing code UE methods are inherited from natural language (NL) generation and ignore properties that make code distinct. We argue that code differs from NL in three ways: a single wrong token can break an entire program (token fragility); algorithmic intent and concrete implementation can disagree independently (intent-code gap); and programs can be executed (executability). We instantiate these properties as three orthogonal uncertainty axes: lexical (Top-K token entropy), algorithmic (pseudo-code consistency), and functional (behavioral consistency). Across five code LLMs, our three-axis ensemble improves average AUROC from 0.696 for the strongest NL-derived baseline to 0.776 (+8.1 points). Notably, on Qwen3-14B, our single-pass Top-K token entropy matches the strongest multi-pass baseline while being over 3x cheaper; across models, it remains a competitive low-cost signal. These results suggest that code UE deserves code-specific design rather than direct NL ports.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09577v1</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuling Shi, Caiqi Zhang, Yuexian Li, Haopeng Wang, Yeheng Chen, Nigel Collier, Xiaodong Gu</dc:creator>
    </item>
    <item>
      <title>TABVERSE: Benchmarking Cross-Format Table Understanding in LLMs and VLMs</title>
      <link>https://arxiv.org/abs/2606.09578</link>
      <description>arXiv:2606.09578v1 Announce Type: new 
Abstract: Large Language Models (LLMs) and Vision-Language Models (VLMs) are increasingly evaluated on table reasoning tasks, but the role of table representation remains under-explored. In practice, the same table content may appear in different structural formats, such as HTML, Markdown, and LaTeX, or as rendered images. However, existing evaluations often let content, format, layout, and modality vary together, making it difficult to isolate representation effects. We introduce TABVERSE, a controlled multimodal table benchmark that aligns the same table content across multiple structural formats and rendered images, with question category and difficulty tags. This design enables systematic evaluation of representation effects while holding table content fixed. We evaluate LLMs and VLMs across three tasks: Question Answering (QA), Structural Understanding Capability (SUC), and Structure Reconstruction (SR). Our results show that representation choice substantially affects table understanding. Models generally perform better with structured text than with rendered images, but the size of this gap depends on the task, model, and format. HTML is often the most robust text format, while row-sensitive structural tasks and syntactically usable LaTeX reconstruction remain challenging. These findings show that table representation is a key factor in reliable table evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09578v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Momina Ahsan, Sarfraz Ahmad, Ming Shan Hee, Roy Ka-Wei Lee, Preslav Nakov</dc:creator>
    </item>
    <item>
      <title>AeroMesa: Efficient Data Management System for Multi-Dimensional Spatio-Temporal Trajectories</title>
      <link>https://arxiv.org/abs/2606.09581</link>
      <description>arXiv:2606.09581v1 Announce Type: new 
Abstract: The rapid growth of trajectory data -- especially the dense 4D traces generated by unmanned aerial vehicles (UAVs) -- is placing mounting pressure on spatio-temporal data management systems. Existing HBase-based trajectory indexes suffer from three limitations: coarse-grained temporal pruning, locality-unfriendly XZ2 spatial encodings with workload-blind ordering, and severe row-key interval fragmentation when altitude is jointly encoded with the horizontal dimensions. We present AeroMesa, a unified system that natively supports $(x,y)$, $(x,y,t)$, $(x,y,z)$, and $(x,y,z,t)$ queries within a single storage framework. AeroMesa integrates three complementary designs: a temporal index (TI$^{+}$) that refines pruning to second-level granularity, a Hilbert-BFS spatial index with a Workload-Aware Jaccard ordering, and a decoupled 4D architecture that separates horizontal indexing from altitude-aware secondary indexing to eliminate isotropic-encoding fragmentation. We implement AeroMesa on Apache HBase and Redis and evaluate it on a real-world dataset (T-Drive) and a high-fidelity 90,000-trajectory UAV simulation dataset. AeroMesa consistently outperforms all baselines: TI$^{+}$ cuts temporal-query candidates by up to 51% over MCTM, the Hilbert-BFS/WAJ index lowers 2D latency by up to 17.9% over the state-of-the-art TMan, and the decoupled 4D design reduces latency by up to 30$\times$ while cutting merged scan ranges by up to three orders of magnitude over XZ3/TXZ3 joint-encoding approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09581v1</guid>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yue Zhang (Shanghai Jiao Tong University, Shanghai, China), Zizhong Ding (Shanghai Jiao Tong University, Shanghai, China), Lin Sun (Shanghai Jiao Tong University, Shanghai, China), Haopeng Chen (Shanghai Jiao Tong University, Shanghai, China), Yan Jiao (ShangHai Shapere Information Technology Co. Ltd., Shanghai, China), Yongming Xu (ShangHai Shapere Information Technology Co. Ltd., Shanghai, China)</dc:creator>
    </item>
    <item>
      <title>On Choosing the $\mu$ Parameter in Gaussian Differential Privacy</title>
      <link>https://arxiv.org/abs/2606.09582</link>
      <description>arXiv:2606.09582v1 Announce Type: new 
Abstract: Recent work argues for using Gaussian differential privacy (GDP) to report the privacy guarantees in privacy-preserving machine learning. We provide principled mappings from pure-DP $\varepsilon$ to GDP $\mu$ by matching the worst-case success of a strong-adversary membership inference attack in terms of three metrics: multiplicative advantage at fixed FPR, precision at fixed recall, and the standard privacy profile. We tabulate $\mu$ values across a useful range of parameters and recommend $\mu \approx \varepsilon/5$ as a conservative general-purpose conversion.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09582v1</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bogdan Kulynych, Antti Honkela</dc:creator>
    </item>
    <item>
      <title>Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text</title>
      <link>https://arxiv.org/abs/2606.09585</link>
      <description>arXiv:2606.09585v1 Announce Type: new 
Abstract: Chain-of-Thought (CoT) improves the performance of Large Language Models (LLMs) and has been extended to Multimodal Large Language Models (MLLMs). More recent work further moves from text-based multimodal reasoning toward interleaved-modal reasoning, where intermediate steps can incorporate both textual rationales and visual evidence. In this work, we propose a bolder and more ambitious idea: could images alone serve as the reasoning medium for both language and multimodal tasks? To explore this, we propose optical reasoning, which treats images as a standalone reasoning medium. We instantiate this concept with two variants: typographic-based optical reasoning, which optimizes visual layouts for compact rationale rendering, and graphical-based optical reasoning, which composes text and graphical elements into structured visual rationales. Across mathematical, scientific, and interleaved-modal reasoning benchmarks, optical reasoning can match or even exceed traditional text reasoning while reducing reasoning tokens by an average of 28.57% on language tasks and 16% on multimodal tasks, achieving 1.96 times the token efficiency of text reasoning. These results show that images can effectively and efficiently encode rationales while providing a unified visual canvas for reasoning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09585v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yutong Bian, Dongjie Cheng, Heming Xia, Yongqi Li, Wenjie Li</dc:creator>
    </item>
    <item>
      <title>Pressure-robust and quasioptimal Discontinuous Galerkin discretisations of the $p$-Stokes problem</title>
      <link>https://arxiv.org/abs/2606.09586</link>
      <description>arXiv:2606.09586v1 Announce Type: new 
Abstract: In the present paper, we propose Local Discontinuous Galerkin (LDG) approximations for a nonlinear system of $p$-Stokes type, having $(p,\delta)$-structure. On the basis of the primal formulation, we prove well-posedness and stability (a priori estimates) of the methods under truly minimal regularity assumptions. We show that the first method possesses a pressure-robust and quasi-optimal error estimate, and discuss its consequences. Moreover, we propose a second method, for which we show a pressure-robust error estimate and prove convergence and convergence rates, which are optimal for linear ansatz functions for all $p\in (1,\infty)$ and $\delta\geq 0$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09586v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>P. A. Gazca-Orozco, M. Růžička</dc:creator>
    </item>
    <item>
      <title>Seeing the Hivemind: A Consensus-Aware Interaction Technique for Mitigating AI Homogenization</title>
      <link>https://arxiv.org/abs/2606.09587</link>
      <description>arXiv:2606.09587v1 Announce Type: new 
Abstract: People are increasingly using AI for creative tasks such as writing. While adoption continues to grow, this form of use risks undermining individual creativity locally and reducing the heterogeneity of creative output at scale. In response, we introduce the Semantic Repulsion Technique (SRT) and evaluate it both computationally and through a study with 16 participants who regularly use AI for creative tasks. Our computational assessment reveals that SRT increases semantic diversity by 85--167% while reducing consensus phrases by 43--95% across task modes. In the user study, SRT outputs received higher usefulness ($p = .019$, $W = .208$) and coherence ratings ($p = .006$, $W = .260$); 68.8% of participants were willing to use SRT-Strong for multiple tasks versus 18.8% for baselines. Originality and coherence ratings were positively correlated across all systems ($\rho = +.40$ to $+.67$), suggesting that divergence need not compromise readability. Taken together, these preliminary findings can inform the design of AI systems that aim to support everyday creativity without contributing to homogenization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09587v1</guid>
      <category>cs.HC</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Muhammad Haris Khan, Joel Wester</dc:creator>
    </item>
    <item>
      <title>Probabilistically Checking Quantum Proofs, with Interaction</title>
      <link>https://arxiv.org/abs/2606.09588</link>
      <description>arXiv:2606.09588v1 Announce Type: new 
Abstract: The model of interactive oracle proofs (IOP) generalizes the notion of probabilistically checkable proof (PCP), in which a static proof is verified probabilistically by querying a small number of bits, to the interactive setting: a polynomial-time verifier interacts with an unbounded prover, but is restricted to only reading a small number of bits, in total, from the messages sent by the prover. IOPs provide a relaxed setting in which to study local probabilistic verification. They have proved instrumental in devising efficient methods for verification through subsequent compilation into non-interactive or succinct protocols.
  We study a quantum analogue of interactive oracle proofs (qIOP) in which the verifier and communication are both allowed to be quantum; yet the verifier is restricted to perform measurements only on a small number of qubits received from the prover. Our main result is a qIOP for any language in QMA, in which the total communication is polynomial but the verifier only reads a polylogarithmic number of qubits in total. The protocol has completeness parameter exponentially close to $1$ and soundness bounded away from $1$ by a constant. In the absence of a quantum PCP theorem, this provides the first information-theoretically sound local and robust characterization of QMA, albeit interactive.
  Our protocol combines the use of a quantum locally testable code (LTC) with classical techniques, notably probabilistically checkable proofs of proximity (PCPP). We avoid the necessity for complex multi-qubit tests employed in other settings by leveraging the local indistinguishability property of the quantum LTC.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09588v1</guid>
      <category>cs.CC</category>
      <category>quant-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Baocheng Sun, Thomas Vidick</dc:creator>
    </item>
    <item>
      <title>I Was Scrolling and Then I Saw a Pregnant Strawberry</title>
      <link>https://arxiv.org/abs/2606.09589</link>
      <description>arXiv:2606.09589v1 Announce Type: new 
Abstract: AI minidramas (also known as fruit dramas) are short, algorithmically distributed generative AI video series featuring anthropomorphized characters that have recently emerged as a widespread phenomenon on social media platforms. This paper argues that despite their seemingly innocuous aesthetic, these videos reproduce deeply gendered narrative structures in which female characters are systematically associated with moral transgression, sexual betrayal, and reproductive capacity, and that several plots also encode the logic of racialization, i.e., the process by which visible bodily difference is morally loaded. Drawing on feminist film theory, critical race theory, and platform studies, it further argues that the generative AI aesthetic of these videos, characterized by softness, roundness, and visual cuteness, functions as a mechanism of aesthetic laundering, neutralizing the ideological weight of these narratives and enabling their circulation despite content moderation systems. This paper approaches these questions through personal observation and close reading, reflecting on the specific affordances of generative AI that make this phenomenon both possible and culturally consequential for the field of computational creativity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09589v1</guid>
      <category>cs.CY</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Piera Riccio</dc:creator>
    </item>
    <item>
      <title>Clinically Grounded Privacy Evaluation of Medical LMs</title>
      <link>https://arxiv.org/abs/2606.09590</link>
      <description>arXiv:2606.09590v1 Announce Type: new 
Abstract: Medical language models (LMs) can memorize and reproduce protected health information, but privacy evaluations often focus on recovery of training text rather than disclosure under realistic threat models. We introduce a clinically grounded framework that evaluates leakage along a graded axis of adversarial access, ranging from publicly inferable demographics to leaked note fragments. At each tier, we measure verbatim memorization of patient-specific text and semantic leakage of sensitive diagnoses. Applying the framework to an LM pretrained on 378k clinical notes, we find that routine encounter metadata (i.e. name, date of birth, provider, practice, visit date) elicits high rates of verbatim memorization across a patient's timeline and sensitive-diagnosis recovery (AUROC 0.91 for abortion, 0.81 for HIV). At the same time, exact-match memorization can overstate disclosure: 36% of memorized tokens reflect templated documentation. Our work highlights the risks of training on longitudinal clinical data, providing a practical framework for contextual privacy evaluation of medical LMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09590v1</guid>
      <category>cs.CL</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sasha Ronaghi, Sana Tonekaboni, Lena Stempfle, Vivian Utti, Jordan Li Cahoon, Nathaniel Hendrix, Ayin Vala, Marzyeh Ghassemi, Emily Alsentzer</dc:creator>
    </item>
    <item>
      <title>Parent-Hash DAG: A Cost Analysis of Constant-Time Append for On-Chain Registries</title>
      <link>https://arxiv.org/abs/2606.09593</link>
      <description>arXiv:2606.09593v1 Announce Type: new 
Abstract: Provenance trees are append-only directed acyclic graphs of artifact registrations anchored on a public blockchain, recently introduced as the data substrate of operator-gated provenance infrastructure. Their defining data-structural pattern is a parent-hash directed acyclic graph (PHDAG), in which each append performs a constant number of storage writes to previously-untouched slots. This pattern has not previously been isolated as a standalone primitive, formally bounded with explicit constants, or benchmarked against the standard alternative, the incremental Merkle tree (IMT). We formalize PHDAG append as O(1) in gas cost, independent of registry size and tree depth, and develop a stochastic cost model for IMT in which per-insert cost is a random variable over the leaf index, deriving closed-form expressions for its mean and variance. We validate both analyses empirically on Base Sepolia across tree depths 1 to 25. PHDAG is observed to be depth-invariant at 76,276 gas (standard deviation about 6 gas), while IMT cost grows linearly with depth. The crossover below which IMT is cheaper falls far beneath the depths of every production registry surveyed. We further establish trustless registry reconstruction from public event logs in linear time with no off-chain dependency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09593v1</guid>
      <category>cs.DC</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ian C. Moore, Fernando Paredes Garcia</dc:creator>
    </item>
    <item>
      <title>Popcorn: A Configurable Benchmark for Visual Evidence in Multimodal Movie Recommendation</title>
      <link>https://arxiv.org/abs/2606.09595</link>
      <description>arXiv:2606.09595v1 Announce Type: new 
Abstract: Movies are long-form audiovisual works, yet recommender benchmarks often rely on trailers, thumbnails, or metadata. These sources differ in semantics and scalability: full movies preserve consumption-level evidence, trailers concentrate promotional highlights, and thumbnails provide sparse but catalog-scale visual signals. We present Popcorn, a configurable benchmark for visual evidence in multimodal movie recommendation, combining title-aligned full-movie/trailer embeddings with MovieLens-linked thumbnail features encoded by modern visual and vision-language models. Popcorn standardizes modality assembly, fusion, splitting, evaluation, and LLM-augmented metadata through a single configuration contract. Experiments show that thumbnail VLMs provide strong, scalable item-side evidence, while controlled trailer/full-movie comparisons show that visual evidence sources are not interchangeable: the choice of source and fusion strategy affects ranking accuracy, coverage, diversity, and calibration. The framework is available at https://github.com/RecSys-lab/Popcorn.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09595v1</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ali Tourani, Fatemeh Nazary, Yashar Deldjoo, Tommaso Di Noia</dc:creator>
    </item>
    <item>
      <title>Awareness of Technological Isomorphism: Integrating AI into Elementary Mathematics Teaching on Data and Prediction, A Case Study of the Compound Line Graph</title>
      <link>https://arxiv.org/abs/2606.09598</link>
      <description>arXiv:2606.09598v1 Announce Type: new 
Abstract: The deep integration of Artificial Intelligence (AI) into elementary mathematics education necessitates a conceptual tool capable of explaining students' cognitive transition from disciplinary knowledge to AI understanding. This study proposes a novel core concept, "Awareness of Technological Isomorphism," defined as a student's metacognitive realization that their own mathematical cognitive operations (e.g., observing trends, inducing patterns, and making predictions) share an underlying logical structure with AI technical operations (e.g., pattern recognition and predictive modeling). This awareness, in turn, facilitates cognitive transfer from disciplinary mathematics to AI comprehension. Underpinned by transfer learning and metacognitive theories, this study clarifies the distinct essence of this concept from traditional "computational thinking." We demonstrate the explanatory power of this framework in two ways: elucidating the mechanism of students' cognitive leap from mathematics to AI, and guiding instructors to identify "isomorphic interfaces" within disciplinary curricula. On this basis, a three-stage pedagogical pathway--spanning "Perception, Comprehension, and Creation"--is constructed alongside a corresponding evaluation rubric. This framework is empirically validated through a case study based on the "Compound Line Graph" lesson from a fifth-grade mathematics textbook in China, offering a highly replicable operational framework for the deep convergence of disciplinary instruction and AI literacy education.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09598v1</guid>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Li Li, Yu Cao</dc:creator>
    </item>
    <item>
      <title>Formal Foundations and Proof-Carrying Certificates for q-ary Covering Codes in Lean 4</title>
      <link>https://arxiv.org/abs/2606.09600</link>
      <description>arXiv:2606.09600v1 Announce Type: new 
Abstract: Covering codes in finite Hamming spaces ask for small sets of words whose Hamming balls cover the whole space. This paper presents a Lean 4 formalization of the elementary theory of q-ary covering codes, centered on certificate predicates for upper bounds, lower bounds, and exact covering numbers $K_q(n,r)$. The formalization proves the q-ary Hamming-ball volume formula, the sphere-covering lower bound, elementary exact cases, product and relation rules, and selected small exact certificates. It also demonstrates an end-to-end workflow for checking explicit upper bounds transcribed from van Laarhoven et al. (1989). The accompanying database is proof-carrying: stored bounds have traces that replay to Lean proofs of the corresponding upper- or lower-bound predicates. The contribution is not new record bounds or a reproduction of known tables, but a reusable, auditable foundation for machine-checked covering-code certificates.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09600v1</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Andreas Florath</dc:creator>
    </item>
    <item>
      <title>Assessing Sample Quality in Conditional Generation under Compositional Shift</title>
      <link>https://arxiv.org/abs/2606.09601</link>
      <description>arXiv:2606.09601v1 Announce Type: new 
Abstract: Conditional generators provide a natural tool for controllable generation, including settings where the desired condition is a new composition of observed attributes or experimental factors. In many applications, especially in scientific domains, such models are attractive to explore conditions for which real samples are rare, expensive, or not yet observed. However, this creates a circularity for evaluation: standard conditional quality metrics require a reference target distribution, but in the extrapolative regime that distribution is unavailable by definition. We address this problem with a post-hoc, per-sample trust score for assessing conditional samples using only the training distribution. The score combines two estimable quantities: global realism, measuring compatibility with the real data manifold, and attribute-wise faithfulness, measuring whether a sample is closer to the requested attributes than to plausible alternatives. We show that the score can recover meaningful comparisons across extrapolated generations, under a mild coverage condition on the observed attributes. These comparisons enable effective filtering, ranking, and abstention of generations and can be used directly on off-the-shelf pretrained models. In biological imaging, selected samples preserve real morphological structure better and improve downstream predictive performance, while similar gains are observed on controlled vision benchmarks. Finally, we show how the score can be applied during generation, enabling abstention before full decoding. Code is available at https://github.com/berkerdemirel/faithful-cond-gen.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09601v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Berker Demirel, Valentino Maiorca, Marco Fumero, Theofanis Karaletsos, Francesco Locatello</dc:creator>
    </item>
    <item>
      <title>Automated IEP Generation from Traditional Chinese Parent-Teacher Interviews via Corpus-Grounded Feature Diffusion</title>
      <link>https://arxiv.org/abs/2606.09603</link>
      <description>arXiv:2606.09603v1 Announce Type: new 
Abstract: Writing Individualized Education Programs (IEPs) is a high-labor, knowledge-intensive document burden; English-language research has demonstrated that generative AI can significantly reduce drafting time, yet automated IEP generation in Traditional Chinese remains virtually unexplored due to domain data scarcity, strict privacy regulations, and the absence of local evaluation benchmarks. We propose a low-resource fine-tuning pipeline centered on Corpus-Grounded Feature Diffusion (CGFD): (1) 25 dual-expert high-score seed transcripts are selected via a tau threshold with flag-aware score caps; (2) a FeatureProfile (sentence length, structure, quantification templates) is extracted from seeds and injected into LLM prompts alongside Verbalized-Sampling-style diversity control to drive diffusion; (3) 15 expert gold seeds are used as diffusion anchors, targeting 585 samples; 567 valid diffusion samples are obtained, yielding a 582-sample training set used to fine-tune Breeze-7B with QLoRA; (4) schema-constrained inference via Grammar-Constrained Decoding (GCD) enforces a hierarchical SMART Goal Ladder schema at inference time. Ablation results on a 55-sample schema stress set reveal an unexpected finding: GCD is counterproductive under Traditional Chinese token budgets -- the no-GCD path achieves 100% schema pass rate at 34% lower median latency, outperforming GCD on both reliability and speed. On the n=10 formal hold-out, the no-GCD inference path achieves BERTScore F1 = 0.779, exceeding GPT-5.4 (0.726), DeepSeek-V3.2 (0.703), Gemini-3-Flash-Preview (0.703), and Llama-4-Maverick (0.700) zero-shot baselines while maintaining fully local, air-gapped inference. This system addresses a gap in Traditional Chinese special-education NLP and offers a scalable, privacy-preserving local inference solution under an industrial engineering paradigm.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09603v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kuanlin Chen, Cheng-En Ou</dc:creator>
    </item>
    <item>
      <title>Next-Token Prediction Learns Generalisable Representations of Sleep Physiology</title>
      <link>https://arxiv.org/abs/2606.09605</link>
      <description>arXiv:2606.09605v1 Announce Type: new 
Abstract: Foundation models offer a promising route to compress multi-modal physiological signals into compact representations of human health, with broad applications across sleep medicine, cardiology, neurology and other healthcare domains. Existing models have typically been trained with masked-reconstruction or contrastive objectives. However, masked reconstruction may be poorly suited to the stochastic nature of these signals, while contrastive approaches rely on positive-pair definitions despite the semantic invariances of physiological signals being poorly understood. In this work, we show that next-token prediction is a simple and scalable alternative. We develop Hypnos, a multi-modal sleep foundation model trained using eight different sensing modalities (e.g. EEG, ECG, respiratory signals) drawn from over 20,000 overnight polysomnography recordings. We tokenize each modality into streams of discrete tokens using residual vector quantization, then train a large auto-regressive RQ-Transformer to jointly predict the next token across all modalities in parallel. After training, Hypnos can be applied to continuous streams of sensor data from any subset of supported modalities, generating embeddings for downstream tasks. Across a range of benchmarks, Hypnos significantly outperforms existing foundation models. In sleep stage classification, we match the performance of strong supervised baselines on held-out test sets whilst using \(100\times\) less labelled data. Hypnos even generalises to daytime physiology, surpassing a dedicated ECG foundation model at detecting atrial fibrillation. Our results demonstrate that next-token prediction is a strong self-supervised objective for representation learning from multi-modal physiological signals.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09605v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Jonathan F. Carter, Lionel Tarassenko</dc:creator>
    </item>
    <item>
      <title>Path-Traced Inverse Rendering with Global Illumination in 3D Gaussian Fields</title>
      <link>https://arxiv.org/abs/2606.09606</link>
      <description>arXiv:2606.09606v1 Announce Type: new 
Abstract: Ray tracing enables 3D Gaussian fields to serve as a representation for physically based light transport. Faithful inverse rendering requires forward rendering and backward optimization to be defined within a consistent light-transport pipeline. Existing inverse rendering methods estimate G-buffers via splatting and optimize materials in screen space, tying the recovered properties to a rasterization-based pipeline. This pipeline mismatch, together with simplified rendering equations that neglect indirect illumination, often leads to inconsistent shading, visible artifacts, and inaccurate material-lighting estimation under path-traced rendering. Therefore, we propose a splatting-free path-traced inverse rendering framework for 3D Gaussian fields, where forward light transport and backward gradient propagation are defined within a unified ray-tracing pipeline. Our key idea is to define a path-space equivalent interaction model for overlapping Gaussian primitives, under which Monte-Carlo-based path tracing is unbiased for the induced light-transport integral, while pathwise gradients are replayed over the same ray-traced interactions rather than splatting-derived screen-space buffers. The framework optimizes materials and a compact Spherical-Gaussian environment under the full rendering equation with ray-traced visibility and multi-bounce light transport. Extensive experiments demonstrate competitive material inversion and improved path-traced rendering quality, producing more plausible shadows, reflections, and relighting results under global illumination.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09606v1</guid>
      <category>cs.GR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Junke Zhu, Hao Zhang, Yutian Zhu, Ang Li, Chenxiao Hu, Meng Gai, Fei Zhu, Zhangjin Huang, Sheng Li</dc:creator>
    </item>
    <item>
      <title>Closure-Validated Circuit Discovery in Attention Heads: Co-activation Proposes, Ablation Disposes</title>
      <link>https://arxiv.org/abs/2606.09607</link>
      <description>arXiv:2606.09607v1 Announce Type: new 
Abstract: Interpretability increasingly treats groups of components, not individual units, as the basic object, and proposes to find them by clustering co-activation statistics. We ask whether such a cheap signal actually identifies an attention-head circuit. Adapting a sparse-autoencoder clustering recipe to attention heads -- but validating by causal ablation rather than reconstruction -- we cluster heads and then run a closure test: ablate the discovered community and compare per-example damage to matched-random controls. Across two dense 1B-scale models (Pythia 1B, OLMo 1B) and two input distributions, the communities pass closure. In a Mixture-of-Experts model (OLMoE-1B-7B), route-conditional clustering recovers a statistically real signal that nonetheless does not survive closure -- ablation improves loss, the wrong direction. Extending closure across training, attention-target selectivity and participation ratio decouple from function in both directions. We conclude that a cheap signal is a circuit proposal, not a confirmed circuit; closure is what separates them.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09607v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yongzhong Xu</dc:creator>
    </item>
    <item>
      <title>TUDSR: Twice Upsampling-Diffusion for Higher Super-Resolution</title>
      <link>https://arxiv.org/abs/2606.09608</link>
      <description>arXiv:2606.09608v1 Announce Type: new 
Abstract: Diffusion-based generative models have achieved remarkable success in real-world image super-resolution (SR). With tiled diffusion techniques, these models can produce high-resolution images that exceed their native-supported resolution. However, the quality of such high-resolution (e.g., $2048^2$) outputs often remains extremely poor, primarily due to two factors: the image upsampling ratio (e.g., $\times8$) exceeding the model's native-supported upsampling ratio (e.g., $\times4$), and the model's native-supported resolution. In practice, training a native high-resolution model requires larger architectures, which incur significant computational overhead and GPU memory costs, making it impractical on resource-limited hardware. Thus, we present TUDSR, a Twice Upsampling-Diffusion framework for higher SR. The TUDSR framework mainly consists of two stages: the first involves training at $R$-resolution, and the second introduces a looped chunk-based training strategy at $NR$-resolution. Each stage adapts a one-step GAN architecture comprising a generator and a discriminator. Based on SD2.1-base, we develop TUDSR-S, which achieves state-of-the-art performance across multiple benchmarks. Extensive experiments further demonstrate that TUDSR-S generates high-quality images at the resolutions of $1024^2$ and even $2048^2$, significantly outperforming existing approaches. Code is available at https://github.com/wuer5/TUDSR.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09608v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhiqiang Wu, Yitong Dong, Xian Wei</dc:creator>
    </item>
    <item>
      <title>Shape Formation for the Cooperative Transportation of Arbitrary Objects Using Multi-Agent Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2606.09610</link>
      <description>arXiv:2606.09610v1 Announce Type: new 
Abstract: Cooperative object transportation is essential in numerous domains, ranging from industrial to domestic services. A popular transportation strategy is to carry objects on top of multi-robot systems. The corresponding task is typically solved by decomposing it into three interconnected subproblems: formation control, cooperative navigation, and collision avoidance. A particular challenge posed by real-world objects is their potentially arbitrary shape and non-uniform mass distribution, necessitating robot formations that securely support the object. In this work, we address the challenge of pattern formation control for transporting such real-world objects by proposing a novel multi-agent reinforcement learning approach. Our approach enables a multi-robot system to autonomously position itself underneath an object to support its weight while avoiding obstacles during the formation process. Our evaluations with diverse environments and varying numbers of robots show that our approach leads to policies that reliably produce balanced formations and generalize to cluttered scenes and objects with complex geometry and non-uniform mass distribution.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09610v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mohamed Sayed, Wolfram Burgard, Tanja Katharina Kaiser</dc:creator>
    </item>
    <item>
      <title>AGENTSERVESIM: A Hardware-aware Simulator for Multi-Turn LLM Agent Serving</title>
      <link>https://arxiv.org/abs/2606.09613</link>
      <description>arXiv:2606.09613v1 Announce Type: new 
Abstract: Multi-turn LLM agents interleave model calls with external tool invocations, shifting serving from stateless request processing to stateful program execution. Serving these workloads requires scheduling, KV-cache management, and routing policies that use program-level context, including turn dependencies, tool-induced gaps, and reusable KV state. Evaluating such policies directly on real systems is costly, since each design point may require dedicated accelerator time across arrival rates, model scales, serving-instance counts, and memory hierarchies. Simulation offers a scalable alternative, but existing LLM serving simulators target stateless request-level workloads and therefore omit the core dynamics of agent serving: multi-turn program execution, cross-turn cache locality, and KV-cache residency during tool gaps. We present AGENTSERVESIM, a hardware-aware simulator for multi-turn LLM agent serving. AGENTSERVESIM evaluates serving policies at program granularity through composable modules: a Program Orchestrator preserves program identity and turn order, a Tool Simulator materializes tool-induced gaps, a Session-Aware Router maintains program-to-instance affinity for cache-aware dispatch, and a KV Residency Model tracks policy-defined KV placement across HBM, host DRAM/CXL, and eviction. Across real serving deployments and hardware configurations, AGENTSERVESIM reproduces real-system behavior within 6% error across key performance metrics while running entirely on commodity CPUs. These results show that AGENTSERVESIM enables controlled, repeatable exploration of agent-serving policies without requiring exhaustive deployment on costly accelerators.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09613v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rakibul Hasan Rajib, Mengxin Zheng, Qian Lou</dc:creator>
    </item>
    <item>
      <title>DexPIE: Stable Dexterous Policy Improvement from Real-World Experience</title>
      <link>https://arxiv.org/abs/2606.09615</link>
      <description>arXiv:2606.09615v1 Announce Type: new 
Abstract: Dexterous manipulation presents substantial challenges for imitation learning due to its high-dimensional action space and complex contact-rich dynamics. Policies trained purely from demonstrations often suffer from compounding errors during deployment and require large amounts of expert data to achieve reliable performance. To move beyond the limitations of demonstration data, in this work, we propose DexPIE, a post-training framework for dexterous policy improvement from experience collected through real-world deployment. First, DexPIE enables effective exploration coverage through a dexterous-hand-adapted intervention system and multi-stage DAgger-style data collection across initial and intermediate task stages, providing reliable supervision for accurate policy evaluation. To reduce temporal noise between post-training rollouts and demonstration data, we introduce asynchronous inference in the relative action space, which better aligns rollout data with demonstrated behavior and allows the critic to learn a value function induced by a more consistent underlying policy. Finally, DexPIE improves the policy through conditioning on a continuous optimality indicator, allowing the policy to leverage the quality of data in a more fine-grained manner. Across three challenging real-world dexterous manipulation tasks, DexPIE achieves a 37% improvement in success rate over the demonstration-based reference policy, outperforming all baseline methods and demonstrating stronger robustness. The source code and dataset will be made publicly available.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09615v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ruizhe Liao, Wenrui Chen, Liangji Zeng, Haoran Lin, Fan Yang, Kailun Yang, Yaonan Wang</dc:creator>
    </item>
    <item>
      <title>Strict-Priority Packet Delay in Switches with Transmit-Ring Buffering</title>
      <link>https://arxiv.org/abs/2606.09619</link>
      <description>arXiv:2606.09619v1 Announce Type: new 
Abstract: Strict Priority (SP) scheduling is widely used at switch egress to provide low-latency service to high-priority (HP) traffic. Existing deterministic and stochastic latency models typically account for scheduler behavior and packet transmission, but omit a common switch implementation detail: the transmit ring (TXR) between the scheduler and the physical port. Because the switch must prepare the next packet before the current transmission completes, packets already placed in the TXR can further delay HP packets. This changes both the worst-case delay and the per-hop delay distribution of HP packets. This paper identifies this modeling gap, extends standard SP latency models to include the TXR, and validates the revised model through measurements on multiple switches. It also provides a measurement method for estimating the TXR size, a parameter that is often not reported in switch datasheets. The resulting model provides a closer representation of switch behavior for systems that use SP scheduling and require either delay bounds or delay distributions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09619v1</guid>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yash Deshpande, Quirin Vogel, Wolfgang Kellerer</dc:creator>
    </item>
    <item>
      <title>Motion planning for hundreds of floating robots</title>
      <link>https://arxiv.org/abs/2606.09620</link>
      <description>arXiv:2606.09620v1 Announce Type: new 
Abstract: Planning collision-free motion for large robot fleets is difficult because collision avoidance induces strong inter-agent coupling that grows rapidly with team size. We consider omnidirectional floating robots on water, where choreographies are specified by sparse keyframes and an interactive tool must generate trajectories within seconds, even when transitions span minutes and thousands of time steps. We propose a scalable pipeline that builds a collision graph from an initialization, decomposes the coupled problem into interaction clusters, and solves clusters independently (and in parallel) with robustness mechanisms for common decomposition pathologies. We validate the approach in simulations up to 500 robots. The synthesized trajectories have also been deployed in two real-world demonstrations, on Lake Zürich with a fleet of 24 Way of Water crafts and at the Time Space Existence 2025 Venice Biennale.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09620v1</guid>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jan Kamm, Antonio Terpin, Raffaello D'Andrea, Aswin Ramachandran</dc:creator>
    </item>
    <item>
      <title>Constrained user-item allocation for e-commerce marketing campaigns</title>
      <link>https://arxiv.org/abs/2606.09623</link>
      <description>arXiv:2606.09623v1 Announce Type: new 
Abstract: When running marketing campaigns, retailers must decide which products to promote and which users to target. These decisions are inherently coupled: effective campaigns match users and items with strong mutual affinity into non-overlapping groups of predefined sizes. However, existing approaches assume predefined campaign structure or decouple item selection from user assignment, and cannot discover campaign groupings directly from joint interaction patterns. We therefore formalize this campaign problem as auto-targeting: jointly selecting users and items to construct multiple disjoint campaigns. To solve this combinatorial problem, we propose three complementary strategies: (i) constrained spectral biclustering to find dense regions in the user-item affinity matrix, (ii) greedy local search with pairwise swaps for combinatorial refinement, and (iii) a multi-armed bandit framework to escape local optima through exploration. We evaluate these methods on a synthetic dataset, the Amazon Reviews benchmarks, and large-scale proprietary commercial data, and compare the results to simulated annealing as a baseline. The results show that biclustering consistently achieves the highest campaign quality, lift, and fairness scores. While biclustering runs efficiently on smaller datasets, its runtime increases substantially on very large ones, where bandit-based methods instead offer a scalable alternative.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09623v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Maja Lindström, Natalija Glisovic, Jan von Pichowski, Tommy Löfstedt, Martin Rosvall</dc:creator>
    </item>
    <item>
      <title>ReCoVLA: VLM-Guided Reward Compilation for Failure Recovery in Vision-Language-Action Policies</title>
      <link>https://arxiv.org/abs/2606.09630</link>
      <description>arXiv:2606.09630v1 Announce Type: new 
Abstract: Vision-language-action (VLA) policies provide strong priors for language-conditioned manipulation, but remain brittle in off-nominal states requiring targeted recovery. We propose ReCoVLA -- a failure-conditioned residual recovery framework that keeps a pretrained VLA policy frozen, uses an external vision-language model (VLM) to infer the failure mode and recovery stage, and compiles a structured reward from task-relevant components. Rather than using the VLM to generate actions or rewards directly, ReCoVLA uses it as a semantic reward selector: it predicts a recovery descriptor and reward mask for in-simulation residual-policy training, followed by zero-shot sim-to-real deployment of the trained recovery policies. This decouples high-level failure understanding from low-level corrective control to support different VLAs. Experiments across short-horizon, long-horizon, and contact-rich manipulation tasks show that ReCoVLA outperforms the tested baselines on average. In simulation, our reward compiler improves average success from 36.7% for the fine-tuned $\pi_{0.5}$ baseline to 66.7%. In physical zero-shot sim-to-real experiments, ReCoVLA achieves the best average performance, with 61.7% success.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09630v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haodi Hu, Chung-Ta Huang, Jing Liu, Ye Wang, Kei Suzuki, Matthew Brand, Toshiaki Koike-Akino</dc:creator>
    </item>
    <item>
      <title>Efficiently Restructuring Sovereign Debt via Arctic Auctions with Convex Costs</title>
      <link>https://arxiv.org/abs/2606.09631</link>
      <description>arXiv:2606.09631v1 Announce Type: new 
Abstract: We study the problem of computing competitive equilibria in the Arctic product-mix auction, originally developed for the Icelandic government for exchanging blocked financial accounts, and more recently proposed by IMF staff for sovereign debt restructuring. From the buyers' perspective, the Arctic auction is equivalent to the quasi-linear Fisher market. However, unlike the standard Fisher model, the seller can express rich supply preferences through explicit supply-side costs and constraints. Despite extensive algorithmic literature on Fisher markets, the seller side has not received much attention, and no polynomial-time algorithm was previously known for computing competitive equilibrium when sellers face nontrivial costs.
  We examine the natural and expressive regime of separable, stepwise-increasing marginal costs that underlie the above-stated applications. Using polyhedral theory techniques, we first show that rational inputs lead to rational-valued competitive equilibria. Motivated by this result, we develop the first polynomial-time algorithm for this setting based on a non-trivial extension of classic primal-dual balanced-flow techniques for linear Fisher markets. Our work provides a robust computational foundation for auctions with sophisticated preferences, paving the way for flexible and institutionally feasible market designs in global finance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09631v1</guid>
      <category>cs.GT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jugal Garg, Edwin Lock, Vijay V. Vazirani</dc:creator>
    </item>
    <item>
      <title>Civil Court Simulation with Large Language Models</title>
      <link>https://arxiv.org/abs/2606.09632</link>
      <description>arXiv:2606.09632v1 Announce Type: new 
Abstract: Court simulation bridges legal education and judicial practice, yet human-based simulations are costly and difficult to scale. Large language models (LLMs) offer a scalable alternative, but existing court-simulation research mainly focuses on criminal cases. Civil litigation is more common in practice and harder to simulate because its claims, liability, and remedies are more flexible. We present a multi-agent court simulation framework for Chinese civil cases. The framework organizes role-based interaction through a five-stage civil trial procedure and integrates a memory module and statute retrieval to support long-process adjudication. Experiments show that the framework produces reliable civil judgments, with clear strengths in liability allocation and multi-item adjudication. Further experiments show that memory quality substantially affects downstream simulation quality. Through a five-layer factor framework, we analyze how legal grounding, information conditions, judicial capability and role orientation, organizational pressure, and social context affect the framework's reliability and behavior. These results support the effectiveness of the proposed framework for civil court simulation. The dataset and code are available at: https://github.com/foggpoy/Civil-Court.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09632v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yifan Chen, Haitao Li, Kaiyuan Zhang, Yueyue Wu, Qingyao Ai, Yiqun Liu</dc:creator>
    </item>
    <item>
      <title>ATN3D: Density-Aware LiDAR-Radar Early 3D Object Detection Under Extreme Sparsity</title>
      <link>https://arxiv.org/abs/2606.09634</link>
      <description>arXiv:2606.09634v1 Announce Type: new 
Abstract: 3D object detection is the backbone of perception for automated vehicles (AV) and broader intelligent transportation systems applications. Long-range detection is challenging because sensing evidence is sparse; yet this "long-range" scenario is routine in traffic. Although &gt;30m is often labeled long-range in computer vision, on roadways it affords only approx. 1-2s for perception and decision-making. Under such extreme sparsity, two core challenges arise. First, early multimodal fusion tends to discard sparsity information and inject noise from empty or falsely occupied cells, degrading long-range recall. Second, context-agnostic uniform channel supervision favors dense and near-range samples, leaving far and small objects under-optimized, delaying the earliest detection of distant objects. We propose "Ask The Neighbor" (ATN3D), a LiDAR-Radar framework tailored for sparse-range conditions. ATN3D introduces (i) Density-aware early fusion with cross-modal gating that conditions fusion on per-voxel density/sparsity and Radar evidence, (ii) Occupancy-gated neighborhood aggregation with circular kernels to aggregate only from credible cells, (iii) Evidence-conditioned channel self-attention to adapt channel weights with weather/range, and (iv) a Range-aware loss that re-balances classification and localization by distance, aligning training with distance-stratified evaluation. On the VoD benchmark across clear and foggy conditions, ATN3D surpasses strong baselines: +3.55% mAP in clear weather and +8.41% mAP under simulated heavy fog; for &gt;30m objects, gains are +3.33% (clear) and +2.09% (heavy fog). These results indicate earlier and more reliable long-range detections under sparse sensing in on-road traffic.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09634v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Debojyoti Biswas, Xianbiao Hu</dc:creator>
    </item>
    <item>
      <title>Gradient-Guided Reward Optimization for Inference-time Alignment</title>
      <link>https://arxiv.org/abs/2606.09635</link>
      <description>arXiv:2606.09635v1 Announce Type: new 
Abstract: Ensuring the reliability of Large Language Models (LLMs) under distribution drift requires inference-time adaptation. While inference-time alignment methods such as Best-of-$N$ and rejection sampling are widely used, they frame the task as a sampling-intensive, reward-guided search, leading to two key limitations: their performance is bounded by the base model's generation quality, and their reliance on imperfect reward models makes them vulnerable to reward hacking. To address these challenges, we introduce Gradient-Guided Reward Optimization (GGRO), a lightweight inference-time method that performs targeted, minimal intervention during decoding via gradient guidance. Specifically, GGRO monitors token-level entropy to identify high-uncertainty regions indicative of drift or misalignment. Upon detection, it responds by injecting nudging tokens, generated using gradient signals from an off-the-shelf reward model, to steer the generation trajectory rather than merely re-ranking samples. Experiments show that GGRO consistently improves inference-time alignment across safety, helpfulness, and reasoning benchmarks. It also increases coverage of high-quality responses and robustness to reward hacking, with minimal computational overhead. Code is available at https://github.com/lhk2004/GGRO.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09635v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hankun Lin, Ruqi Zhang</dc:creator>
    </item>
    <item>
      <title>Agentic Persona Generation with Critique-Refinement: An Industrial Evaluation</title>
      <link>https://arxiv.org/abs/2606.09637</link>
      <description>arXiv:2606.09637v1 Announce Type: new 
Abstract: Personas are widely used in software engineering to support requirements elicitation, design, and validation, but their manual creation is costly, time-consuming, and hard to scale. Recent LLM-based approaches automate persona generation from textual data; however, they typically rely on single-shot generation and subjective evaluations, limiting practical reliability. We present PerGent, an industry-grade method for persona generation built around an iterative critique-refinement loop. Specifically, PerGent uses a generator and a critic LLM agent, coordinated by an orchestrator, to iteratively refine personas using external resources such as interviews, surveys, and job postings through a critique-refinement loop with a user-defined maximum number of rounds. We deploy and evaluate PerGent in an industrial setting at Kinaxis, comparing it with three baselines, including one-shot methods. In an expert in-situ evaluation, PerGent achieved the highest expert approval rate (96.9%), exceeding all baselines. We further compare PerGent-generated personas with best-practice personas manually created by domain experts prior to the adoption of LLMs. Compared to baselines, PerGent reproduces a larger proportion of expert content while also contributing substantial new content beyond the pre-LLM personas. We conclude with lessons learned from deploying and evaluating PerGent at Kinaxis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09637v1</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Mohammad Hossein Amini, David Dewar, Shiva Nejati, Mehrdad Sabetzadeh</dc:creator>
    </item>
    <item>
      <title>Data-driven discovery of governing differential equations across physical systems</title>
      <link>https://arxiv.org/abs/2606.09638</link>
      <description>arXiv:2606.09638v1 Announce Type: new 
Abstract: Differential equations play a critical role in scientific discovery because they provide a mathematical framework to describe the behaviour of physical phenomena. As a promising alternative to traditional first principles, data-driven differential equation discovery has attracted increasing attention for its ability to infer governing laws directly from experimental or simulated data, especially when the underlying physics is unclear. However, the field has expanded rapidly along diverse methodological directions, particularly with the emergence of AI-based approaches, and still lacks a clear organizing perspective. In this Review, we propose a problem-oriented perspective on data-driven differential equation discovery. We first introduce a two-dimensional phase diagram of equation discoverability, where discovery problems are organized according to structural complexity and coefficient complexity. This phase diagram shows how the field has moved from the discovery of sparse equations with simple coefficients toward more complex governing laws with richer structures and more flexible parameterizations. It also clarifies why different methodological families succeed or fail in different problem settings. We then present the representation-evaluation-optimization (REO) framework as a fundamental abstraction of the discovery process. By identifying the core problems of equation discovery that persist across algorithmic variations, REO shifts the discussion from individual algorithms to the fundamental principles that determine discoverability. We connect these perspectives to applications across physics and adjacent sciences, and argue that the next challenge is not merely recovering equations, but using them to revise existing theories, distil mechanisms and form new scientific concepts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09638v1</guid>
      <category>cs.LG</category>
      <category>cs.SC</category>
      <category>math-ph</category>
      <category>math.MP</category>
      <category>physics.comp-ph</category>
      <category>stat.AP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Siyu Lou, Hao Xu, Wenguan Wang, Lu Lu, Hao Sun, Yang Liu, Linfeng Zhang, Dongxiao Zhang, Yuntian Chen</dc:creator>
    </item>
    <item>
      <title>CineDance: Towards Next-Generation Multi-Shot Long-Form Cinematic Audio-Video Generation</title>
      <link>https://arxiv.org/abs/2606.09639</link>
      <description>arXiv:2606.09639v1 Announce Type: new 
Abstract: The fidelity and structural diversity of training datasets fundamentally determine the capabilities of video generation models. While commercial systems show remarkable ability to generate cinematic narratives, the progress of open-source models remains limited by the scarcity of high-quality training data. To bridge this gap, we introduce CineDance-1M, a large-scale, open research Text-to-Audio-Video (T2AV) dataset designed specifically for multi-shot, long-form joint audio-video generation. Averaging 92.8 seconds and 24.2 continuous shots per video, it provides configurable, structured annotations for both audio and video modalities. This exceptional quality is achieved through a rigorous three-stage curation pipeline: i) diverse sourcing and comprehensive cleansing, ii) film-theory-inspired narrative parsing, and iii) hierarchical dual-modal captioning. For a comprehensive assessment, we propose CineBench, featuring a diverse prompt suite and a six-dimensional, human-aligned metric system tailored for complex narrative audio-video evaluation. Furthermore, we adapt LTX-2.3 into CineDance, which demonstrates exceptional single-modality quality alongside precise audio-video alignment and robust subject and environment consistency, effectively validating our curation strategy and the high quality of CineDance-1M. We anticipate that this work will serve as a solid foundation for accelerating future research in multi-shot, long-form joint audio-video generation. Our project page is available at https://aliothchen.github.io/projects/CineDance/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09639v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuheng Chen, Teng Hu, Yuji Wang, Qingdong He, Zhucun Xue, Qianyu Zhou, Xiangtai Li, Lizhuang Ma, Jiangning Zhang, Dacheng Tao</dc:creator>
    </item>
    <item>
      <title>Physics-Aware Sparse Learning and Selective Online Adaptation for Euler-Lagrange Robot Dynamics</title>
      <link>https://arxiv.org/abs/2606.09640</link>
      <description>arXiv:2606.09640v1 Announce Type: new 
Abstract: Accurate dynamics models are essential for model-based robotic control, yet nominal Euler--Lagrange models often become inaccurate in the presence of payload variation, unmodeled coupling, friction, aerodynamic effects, and changing operating conditions. Most learning-based correction methods improve prediction accuracy by introducing a single additive residual, but do not preserve the internal mechanical structure of Euler--Lagrange systems. This leads to models that do not preserve symmetry, positive-definiteness, or the coupling between inertia and velocity-dependent terms, which can result in physically inconsistent predictions and reduced reliability when embedded in model-based controllers. We propose a structure-preserving residual learning framework that decomposes model mismatch into an inertia correction, the corresponding induced Coriolis term, and a generalized-force residual. The mechanical component is learned under physical constraints, while the disturbance-sensitive component is represented through a sparse history-dependent latent interaction model and adapted online using Bayesian linear regression. This separation preserves key mechanical structure while restricting adaptation to the part of the dynamics most affected by changing conditions. Experiments across multiple robotic platforms, including mobile, aerial, and manipulator systems, show that the proposed method improves dynamics prediction and trajectory tracking under coupled and time-varying dynamics. These results highlight the value of combining structured residual modeling, compact latent interaction selection, and selective online adaptation for real-world model-based control.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09640v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rishabh Dev Yadav, Samaksh Ujjawal, Sihao Sun, Spandan Roy, Wei Pan</dc:creator>
    </item>
    <item>
      <title>MAVIS: Multi-Agent Video Retrieval via Structured Video Understanding</title>
      <link>https://arxiv.org/abs/2606.09641</link>
      <description>arXiv:2606.09641v1 Announce Type: new 
Abstract: The dominant paradigm in video retrieval relies on embedding-based full-corpus scanning, which suffers from inherent computational inefficiency and the semantic asymmetry between information-dense videos and sparse textual queries. To bridge this gap, we introduce MAVIS, a novel multi-agent framework that rethinks retrieval as cooperative reasoning rather than brute-force search. MAVIS first bridges the granularity mismatch by parsing raw videos into a Structured Semantic Library, enabling explicit attribute-level indexing. During retrieval, a planner decomposes complex user intents into atomic sub-tasks, dispatching specialized agents to independently nominate candidates. Crucially, MAVIS employs a Logic-aware Debate mechanism with a strict veto protocol, where agents collaboratively prune logical mismatches to identify a compact set of "controversial" candidates for fine-grained verification. This agentic workflow effectively bypasses the inefficiency of full-library traversal. Extensive experiments on MSR-VTT, MSVD, and ActivityNet demonstrate that MAVIS achieves competitive performance without task-specific fine-tuning, offering a scalable and interpretable alternative to traditional dual-encoder approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09641v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jie Zhang, Qilang Ye, Hao Zhou, Haochen Liang, Fei Luo</dc:creator>
    </item>
    <item>
      <title>FMplex: Model Virtualization for Serving Extensible Foundation Models</title>
      <link>https://arxiv.org/abs/2606.09643</link>
      <description>arXiv:2606.09643v1 Announce Type: new 
Abstract: Foundation models (FMs) are increasingly used as backbones for downstream tasks across language, vision, time-series, and multimodal applications. Yet existing model-serving systems deploy each customized task as an independent model instance, thereby replicating heavyweight backbones, wasting accelerator memory, and losing opportunities to amortize batching and loading costs. This paper presents FMplex, a serving system that treats FM backbones as a virtualization substrate for deployment sharing. FMplex presents each task with a virtual foundation model (vFM), a logically private FM instance backed by a shared physical FM. This abstraction lets independently customized tasks share a backbone while preserving task-specific extensions, independent lifecycles, and task-level isolation. In addition, we propose a batch-aware fair-queueing scheduler that combines weighted task-level sharing with inter- and intra-task batching across colocated tasks. We implement an FMplex-based serving stack spanning task construction, sharing-aware deployment, and runtime execution. Across 7 FM backbones (16 variants) and 92 downstream tasks, FMplex reduces latency by up to 80% over spatial partitioning and 33.3% over best-effort co-location, while hosting up to 6x more tasks at cluster scale.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09643v1</guid>
      <category>cs.DC</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.OS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hetvi Shastri, Pragya Sharma, Walid A. Hanafy, David Irwin, Mani Srivastava, Prashant Shenoy</dc:creator>
    </item>
    <item>
      <title>Where Does the Answer Come From? Benchmarking View-Level Visual Evidence Identification in Multi-View MLLMs for Autonomous Driving</title>
      <link>https://arxiv.org/abs/2606.09644</link>
      <description>arXiv:2606.09644v1 Announce Type: new 
Abstract: Multimodal large language models (MLLMs) achieve strong results on visual reasoning benchmarks, but answer accuracy alone does not indicate whether a model relied on the correct visual evidence. This gap is particularly important in multi-view driving scenes used for autonomous driving, where a model can produce a plausible answer while grounding it in the wrong camera view. We introduce a multi-view visual question answering benchmark for evaluating evidence-source identification: given six synchronized NuScenes views and a question, the model must identify the supporting camera view and answer the question. The benchmark contains 122 conflict-centric question-answer pairs from 73 scenes, spanning causality, counterfactual reasoning, and intent prediction. View labels are proposed by an automatic conflict-mining pipeline and manually verified by annotators. We evaluate three settings: camera-view selection, oracle QA given the golden view, and joint prediction in which the model selects a view and answers in one pass. Answers are evaluated in both multiple-choice and free-form formats, using exact match for structured predictions and an LLM judge for free-form responses. By explicitly separating visual-source identification from answer correctness, the benchmark exposes grounding failures that answer-only evaluation misses.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09644v1</guid>
      <category>cs.CL</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yimu Wang, Yee Man Choi, Barry Zhang, Mozhgan Nasr Azadani, Sean Sedwards, Krzysztof Czarnecki</dc:creator>
    </item>
    <item>
      <title>Modeling Components and Connections in Cyber-Physical Systems</title>
      <link>https://arxiv.org/abs/2606.09645</link>
      <description>arXiv:2606.09645v1 Announce Type: new 
Abstract: Text-based configuration files for cyber-physical systems show the hierarchy of component modules well but often hide the details of connections and interfaces between modules. A model-based visual approach to these configuration files can better capture this information. The XML structure of Robot Operating System (ROS) launch files can be improved using a modeling approach. This paper presents ROSLaunchVisual, a model-integrated environment built on WebGME for designing, visualizing, and managing ROS launch files. The tool raises the level of abstraction by allowing developers to create and modify launch files using a graphical interface that represents nodes, publishers, subscribers, and arguments as interconnected components. The tool provides a dynamic system analysis that can then be used in the static development and analysis of new and existing launch files. ROSLaunchVisual incorporates features such as metamodel-driven validation, automatic import/export of launch files, and visual communication mapping. Plugins further enhance functionality by updating libraries, checking for semantic errors, and managing remaps. By making launch file creation more intuitive and less error-prone, ROSLaunchVisual improves development efficiency and system understanding, especially in collaborative or large-scale robotics projects.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09645v1</guid>
      <category>cs.RO</category>
      <category>cs.PL</category>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kate Sanborn, Tanuj Kenchannavar, Vakul Nath, Jonathan Sprinkle</dc:creator>
    </item>
    <item>
      <title>Do Video Foundation Models Understand Intuitive Physics? A Layerwise Probing Analysis</title>
      <link>https://arxiv.org/abs/2606.09646</link>
      <description>arXiv:2606.09646v1 Announce Type: new 
Abstract: We study whether pretrained video foundation models encode intuitive-physics information in their frozen representations, and how this information varies across model families, layers, and probe types. Using frozen-feature probing on IntPhys2 and Minimal Video Pairs (MVP), we compare predictive joint-embedding models (V-JEPA), masked reconstruction models (VideoMAE), and a diffusion-based video generator (LTX-Video). V-JEPA achieves the strongest overall results across benchmarks, especially with probes that model temporal dynamics, while VideoMAE remains competitive and LTX-Video recovers weaker but non-trivial signal. Layerwise analyses show that physics-relevant information is weakest in early layers and becomes most accessible at intermediate-to-late depth, and temporal controls show that disrupting frame order substantially reduces performance, especially on MVP. Together, these results suggest that intuitive-physics knowledge emerges reliably in pretrained video representations, but its accessibility depends strongly on pretraining paradigm, representational depth, and readout mechanism.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09646v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Samuele Punzo, Niccolò Caselli, Ippokratis Pantelidis, Francesco Massafra, Salvatore Lo Sardo, Mohammadreza Salehi</dc:creator>
    </item>
    <item>
      <title>ArtiFact: A Large-Scale Multi-Modal Cultural Heritage Dataset</title>
      <link>https://arxiv.org/abs/2606.09648</link>
      <description>arXiv:2606.09648v1 Announce Type: new 
Abstract: Multi-modal data management has emerged as a central research topic in the database community, spanning data integration, semantic query processing, and data quality assessment. Despite this growing interest, the community lacks large-scale, real-world datasets combining tables, text, and images. We present ArtiFact, a multi-modal cultural heritage dataset of 651045 museum records collected from the Metropolitan Museum of Art, the Art Institute of Chicago, and the Rijksmuseum. We demonstrate the utility of ArtiFact through two downstream tasks. For cross-modal error detection, we introduce a curated taxonomy of seven error categories injected into 130209 records and show that reliably detecting subtle domain-specific errors such as material anachronisms and temporal shifts remains an open challenge. For semantic query processing, we show that current systems struggle with queries involving cultural proximity, ambiguous object types, and historically contingent terminology. Our results position ArtiFact as a challenging benchmark for multi-modal data management research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09648v1</guid>
      <category>cs.DB</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Luciano Duarte, Olga Ovcharenko, Sebastian Schelter</dc:creator>
    </item>
    <item>
      <title>A Unifying Framework for Concept-Based Representational Similarity</title>
      <link>https://arxiv.org/abs/2606.09653</link>
      <description>arXiv:2606.09653v1 Announce Type: new 
Abstract: Learned representations across models and modalities often exhibit striking structural similarities, suggesting shared underlying concept decompositions. However, concept alignment remains poorly defined: existing approaches optimize different objectives under the same terminology, obscuring what is actually aligned.
  We propose a unifying framework that decomposes alignment along two axes: what is aligned (representations vs. concepts) and at what level (instance-wise vs. distributional). This induces four corresponding properties -- instance-wise and distributional variants of translation and concept consistency -- and reveals precisely which of these guarantees existing methods provide. We further introduce \InterVenchA, an intervention-based benchmark that separately measures extraction quality, translation quality, and concept consistency. Through theory and experiments, we show that commonly assumed equivalences between alignment objectives fail in practice: optimizing one property does not reliably recover the others, and purely unsupervised objectives fail to recover meaningful instance-level alignment. We then propose the Coupled Sparse Autoencoder (CoSAE), which jointly enforces complementary alignment objectives. Strong alignment emerges only in this regime. Surprisingly, as little as 0.1% paired data is sufficient to recover instance-level alignment when anchoring distributional objectives.
  Overall, our results show that concept alignment is fundamentally multi-objective: it must be defined, measured, and optimized as such.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09653v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Grégoire Dhimoïla, Victor Boutin, Agustin Martin Picard, Thomas Fel, Thomas Serre</dc:creator>
    </item>
    <item>
      <title>Beyond Accuracy: Community Perspectives on Machine Translation</title>
      <link>https://arxiv.org/abs/2606.09655</link>
      <description>arXiv:2606.09655v1 Announce Type: new 
Abstract: Despite remarkable progress in machine translation (MT), non-AI communities have raised growing concerns about MT systems, suggesting a noticeable gap between technical advancement and the needs of real-world users. For instance, while NLP researchers focus on benchmark performance, end users care about ethical concerns, trust, reliability, costs, and more. We argue that listening to various user communities is essential so that research efforts are directed towards the problems that the communities care about. To this end, we present the first large-scale analysis of what four stakeholder communities (AI developers, professional translators, language learners, and language service providers) post about MT technology on social media. To do so, we construct a dataset of 79,286 posts and comments from Reddit, Facebook, Bluesky, and Mastodon from 2019 to 2025, and analyse where, how, and why these communities disagree. Overall, we find that communities often disagree, and even show strong conflicts due to polarised sentiments on topics such as translation quality, efficiency, and reliability. This is because these communities approach these topics differently: the AI community frames them as technical and computational problems, while non-AI (user) communities care more about quality nuances, time savings, user trust, and broader social issues.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09655v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Yujun Wang, Ehud Reiter, Shimei Pan, Steffen Eger, Wei Zhao</dc:creator>
    </item>
    <item>
      <title>Muon Learns More Robust and Transferable Features than Adam</title>
      <link>https://arxiv.org/abs/2606.09658</link>
      <description>arXiv:2606.09658v1 Announce Type: new 
Abstract: Muon has recently emerged as a state-of-the-art optimizer for pretraining Large Language Models (LLMs) and vision classifiers. Despite its efficiency advantage over Adam and SGD, the feature-learning advantage of Muon remains unclear. This paper investigates Muon's feature-learning advantage through the lens of robustness and transferability. First, by evaluating pretrained models on corrupted images and texts, we show that features learned by Muon are consistently more robust than those learned by Adam and SGD across different architectures, including transformers and Convolutional Neural Networks (CNNs). Using trained layer-wise probes, we further show that this robustness advantage is reflected in larger logit margins across layers. Second, by training linear classifiers or fine-tuning full models from pretrained parameters on downstream tasks, we demonstrate that Muon-learned features transfer more effectively than those learned by Adam and SGD. This transferability advantage is further supported by the diversity of hidden states across layers, as measured by effective rank. Finally, in a representative classification problem with multi-component features, we prove that Muon attains larger margins and higher effective rank than Adam and SGD, providing theoretical support for our empirical findings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09658v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tianyu Ruan, Fengzhuo Zhang, Shuche Wang, Shihua Zhang</dc:creator>
    </item>
    <item>
      <title>End-to-End Context Compression at Scale</title>
      <link>https://arxiv.org/abs/2606.09659</link>
      <description>arXiv:2606.09659v1 Announce Type: new 
Abstract: Long-context language model inference is bottlenecked by memory, as the KV cache grows with context length. Recent techniques to compress the KV cache fall short: they either degrade model quality substantially or require considerable time and compute to compress a single long prompt. Furthermore, many methods require the input to fit within the target model's context window, and are generally incompatible with modern production inference engines. Encoder-decoder compressors, which map a long token sequence to a shorter sequence of latent embeddings consumed by a decoder, are an appealing alternative in principle. However, existing approaches are not competitive with KV cache compression on the accuracy-efficiency frontier. In this work, we revisit encoder-decoder compression and close this gap. We first perform an architecture search, pre-training many variants from scratch to determine how best to design and train encoder-decoder compressors. Guided by our findings, we continually pre-train a family of 0.6B-encoder, 4B-decoder models on over 350B tokens each, at compression ratios of 1:4, 1:8, and 1:16. We introduce Latent Context Language Models (LCLMs), a family of compressors that improve the Pareto frontier across general-task performance, compression speed, and peak memory usage. We demonstrate that LCLMs serve as efficient backbones for long-horizon agents, letting the agent skim through a compressed long context and adaptively expand relevant segments on demand.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09659v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ang Li, Sean McLeish, Haozhe Chen, Nimit Kalra, Zaiqian Chen, Artem Gazizov, Venkata Anoop Suhas Kumar Morisetty, Bhavya Kailkhura, Harshitha Menon, Zhuang Liu, Brian R. Bartoldson, Tom Goldstein, Sanae Lotfi, Micah Goldblum, Pavel Izmailov</dc:creator>
    </item>
    <item>
      <title>A Differentiable Simulation of the Eye for Patient-Specific Strabismus Surgery Planning</title>
      <link>https://arxiv.org/abs/2606.09661</link>
      <description>arXiv:2606.09661v1 Announce Type: new 
Abstract: Purpose: Up to 4% of adults will develop strabismus in their lifetime. The most common surgical intervention involves adjusting the length of one or more extraocular muscles to correct the angular deviation. This correction depends on surgical expertise and statistical reference tables, which often fail to yield optimal results for patients with atypical eye morphology. Our work proposes a physics-based modeling approach to personalized surgical planning, accounting for patient-specific eye anatomy. Methods: We built a physics-based simulator of the eye and its muscles, incorporating patient-specific geometry and Hill-type muscle biomechanics. We solve an optimization problem to find the surgical dosage that minimizes angular deviation. The model is implemented as a fully differentiable simulation, enabling efficient optimization. We validated the framework by comparing its predictions with standard surgical tables for emmetropic eyes before applying it to anatomically atypical virtual patients. Results: Our model's predictions for emmetropic eyes were first validated, demonstrating a strong fit with standard surgical tables. More importantly, for high-myopia models, the framework computed a clinically significant increase in the required surgical dosage compared to standard eyes. This computed recession difference is highly relevant as surgical plans are adjusted in 0.5 mm increments. Conclusion: Our results show that our model provides a calibrated surgical plan that, unlike standard tables, also accounts for pathologies involving atypical eye shapes. This patient-specific model represents a step toward personalized surgical planning, with the potential to improve dosage accuracy and surgical outcomes for atypical cases.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09661v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <arxiv:DOI>10.1007/s11548-026-03675-3</arxiv:DOI>
      <arxiv:journal_reference>Int J Comput Assist Radiol Surg. 2026 Apr 30</arxiv:journal_reference>
      <dc:creator>Even Harsigny, Pablo Alvarez, Michel Duprez, Stéphane Cotin</dc:creator>
    </item>
    <item>
      <title>When Built-in Thinking Helps and Hurts: Constraint-Level Error Shifts in Instruction Following</title>
      <link>https://arxiv.org/abs/2606.09662</link>
      <description>arXiv:2606.09662v1 Announce Type: new 
Abstract: Large reasoning models (LRMs) often improve math and coding performance, but their effect on instruction following is unclear. We study IFEval with Qwen3 models (1.7B-32B), using same-weights Thinking ON/OFF controls; four Hunyuan models provide directional cross-family support. Aggregate pass-rate changes are small (-0.55 to -3.52 pp), yet 10-20% of prompts switch between pass and fail across modes, suggesting that thinking changes the pattern of errors--some prompts improve while others worsen--rather than uniformly degrading performance. Under a post-hoc Qwen3-derived grouping, constraint types separate into Planning (global counting, structure, coordination), which improves at the class level under thinking, and Precision (exact local form), which consistently worsens; the class-level Planning/Precision sign pattern holds directionally for all four Hunyuan models despite Hunyuan's opposite aggregate direction. Thinking also changes final-answer length; matched-length analyses substantially reduce the Precision drop, but a residual penalty remains. Analyzing thinking traces with a cross-encoder relevance metric reveals three patterns: Neutral shows a positive relevance-compliance link (r approximately 0.15); Planning shows near-zero predictive correlation (r approximately 0.02) despite measurable trace engagement, consistent with an execution gap between CE-measured trace relevance and final-answer compliance; Precision shows a small negative correlation (r approximately -0.05), with failing instances having higher mean relevance than passing ones. Activation patching across four model sizes (1.7B-14B) shows that Precision flip instances are more often restored than Planning flip instances (32-58% vs. 14-40% mean layer-restoration), with the largest gap at 14B (about 30 pp).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09662v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sai Adith Senthil Kumar</dc:creator>
    </item>
    <item>
      <title>From 0-to-1 to 1-to-N: Reproducible Engineering Evidence for MetaAI Recursive Self-Design</title>
      <link>https://arxiv.org/abs/2606.09663</link>
      <description>arXiv:2606.09663v1 Announce Type: new 
Abstract: Recursive self-design refers to AI-assisted modification of the mechanisms by which an AI system is built, evaluated, and improved. This paper treats MetaAI not as a mature paradigm, but as a working term for a human-seeded, AI-expanded development pattern in which the design space itself becomes a target of modification. We propose an operational evidence framework with four criteria: inspectable target system, meta-level modifier, feedback-directed selection, and recursive continuation. We then map public systems, including Darwin Goedel Machine (DGM), STOP, Goedel Agent, and ShinkaEvolve, against these criteria. DGM provides the most direct currently reported evidence: its published results show improvement from 20% to 50% on SWE-bench Verified and from 14.2% to 30.7% on full Polyglot after 80 iterations, with ablations suggesting that both open-ended exploration and self-improvement contribute. Finally, we provide MetaAI-Mini, a reproducible HumanEval-based protocol and codebase. Because no completed model run is included in this build, MetaAI-Mini is reported as a protocol rather than as an experimental result.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09663v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dun Li, Jiatao Li, Hongzhi Li</dc:creator>
    </item>
    <item>
      <title>In-Context Learning for Latent Space Bayesian Optimization</title>
      <link>https://arxiv.org/abs/2606.09664</link>
      <description>arXiv:2606.09664v1 Announce Type: new 
Abstract: Bayesian optimization (BO) is a central tool for sample-efficient design, and latent-space Bayesian optimization (LSBO) extends it to structured objects such as molecules and proteins. In parallel, tabular foundation models such as TabPFN and TabICL now achieve state-of-the-art regression performance and are increasingly used as BO surrogates. Because their Bayesian behavior is induced by large synthetic pretraining collections, the composition of this pretraining distribution is crucial. LSBO creates a distinctive mismatch: the induced map from latent code to objective value differs markedly from the regression tasks used to train current in-context models. We address this mismatch by complementing the pretraining stage of tabular foundation model surrogates with synthetic optimization tasks defined on the latent space of a molecular VAE. The continued-pretraining objective features a regularizer that anchors the model to the original checkpoint, preserving its broad regression prior while avoiding overspecialization to the adaptation tasks. On held-out molecular optimization benchmarks, the resulting model achieves strong performance, supporting the relevance of LSBO-specific adaptation for in-context surrogates.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09664v1</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tuan A. Vu, Harri Lähdesmäki, Julien Martinelli</dc:creator>
    </item>
    <item>
      <title>Frequency-based Constrained Sampling for Interval Patterns</title>
      <link>https://arxiv.org/abs/2606.09666</link>
      <description>arXiv:2606.09666v1 Announce Type: new 
Abstract: Output space pattern sampling is a powerful alternative to exhaustive pattern mining for exploring large pattern spaces, as it enables users to focus on representative patterns drawn according to a chosen interestingness measure. In this paper, we address the problem of sampling interval patterns under user-defined syntactic constraints. We introduce CFips, a sampling approach that incorporates constraints directly into the sampling procedure. The approach relies on a multi-step sampling framework and supports several syntactic constraints by decomposing them into elementary predicates on interval bounds while preserving exact sampling guarantees. We formally prove that CFips samples interval patterns proportionally to their frequency within the constrained pattern space. The experimental results show that integrating constraints into the sampling procedure makes it possible to complete mining tasks that would otherwise not finish within a given timeout.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09666v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Djawad Bekkoucha, Abdelkader Ouali, Bruno Crémilleux</dc:creator>
    </item>
    <item>
      <title>Algorithm for Contextual Queueing Bandits with Rate-Optimal Queue Length Regret</title>
      <link>https://arxiv.org/abs/2606.09668</link>
      <description>arXiv:2606.09668v1 Announce Type: new 
Abstract: Contextual queueing bandits provide a framework for learning to schedule heterogeneous jobs under unknown context-dependent service rates. Under stochastic contexts, existing algorithms achieve $\widetilde{\mathcal{O}}(T^{-1/4})$ queue length regret, defined as the expected difference between the learner's and oracle's queue lengths at horizon $T$. In this paper, we improve this rate to $\widetilde{\mathcal{O}}(T^{-1/2})$. The key observation is that random exploration is needed only up to a carefully chosen cutoff round, rather than throughout the entire horizon. We propose CQB-$\eta$-2, a three-phase algorithm: (i) pure random exploration to construct an initial estimator, (ii) $\eta$-random exploration combined with a UCB rule to continue learning while maintaining negative drift, and (iii) pure UCB after the exploration cutoff. Our proof decomposes the queue length regret at the cutoff round. Before the cutoff, negative drift suppresses queue length differences caused by suboptimal choices. After the cutoff, the first two phases provide sufficient random exploration samples, ensuring that UCB decisions incur small departure-rate gaps. Combining these two bounds yields queue length regret of order $\widetilde{\mathcal{O}}(T^{-1/2})$. We further prove a minimax lower bound of order $\Omega(T^{-1/2})$. The proof constructs two hard instances that are statistically indistinguishable up to the final service decision, and uses a queue-specific coupling argument to convert the resulting testing error into queue length regret. Together, our upper and lower bounds characterize the minimax dependence on the horizon $T$ up to logarithmic factors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09668v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Seoungbin Bae, Dabeen Lee</dc:creator>
    </item>
    <item>
      <title>SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks</title>
      <link>https://arxiv.org/abs/2606.09669</link>
      <description>arXiv:2606.09669v1 Announce Type: new 
Abstract: Spatial reasoning is a foundational capability for multimodal large language models (MLLMs) to perceive and operate within the physical world. However, existing benchmarks predominantly rely on passive evaluation (e.g., static VQA) or simulator-specific pipelines, failing to assess general interactive spatial understanding. We introduce SpatialWorld, a unified benchmark designed specifically for evaluating the interactive spatial understanding of multimodal agents in complex real-world tasks. Integrating eight heterogeneous simulation backends under a shared, simulator-agnostic protocol, SpatialWorld features 760 human-annotated tasks across diverse domains (e.g., household routines, travel, social collaboration). Agents must solve tasks under vision-only partial observability, actively gathering egocentric visual evidence and expressing decisions via a unified, text-based action interface native to MLLMs. For reliable evaluation, each task includes a human-validated initial state, a reference trajectory, and a terminal-state verifier. Evaluating 15 advanced agents reveals that robust spatial task solving remains challenging: the strongest model, GPT-5, achieves an average task success rate (TSR) of only 17.4%, while the leading open-source model, Qwen-3.5, reaches 14.1%. Further analysis exposes a clear mismatch between task success and execution efficiency, alongside substantial domain-specific performance variations. These bottlenecks in active exploration and long-horizon planning position SpatialWorld as a rigorous testbed for future spatial agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09669v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hongcheng Gao, Hailong Qu, Jingyi Tang, Jiahao Wang, Zihao Huang, Hengkang Qiao, Shihong Huang, Junming Yang, Yi Li, Hongyixuan Yuan, Wenjie Li, Bohan Zeng, Wenbo Li, Bo Wang, Jianhui Liu, Olive Huang, Haoyang Huang, Wentao Zhang, Guoqing Huang, Nan Duan, Yinpeng Dong</dc:creator>
    </item>
    <item>
      <title>Visual Prompting Meets Feature Reconstruction-Based Anomaly Detection with Dual-Teacher Supervision</title>
      <link>https://arxiv.org/abs/2606.09670</link>
      <description>arXiv:2606.09670v1 Announce Type: new 
Abstract: Recent anomaly detection methods achieve perfect detection and segmentation scores on well-established datasets, such as MVTec. However, many of these methods face challenges when foundational assumptions - such as consistent object scale, viewpoint, background, illumination, and centered placement - are violated. Such variations render anomaly detection methods unusable in many real-world scenarios. To address these limitations, we introduce three key contributions: (1) a visual prompting pipeline that isolates objects using foreground-background masking; (2) a mechanism for unfreezing the teacher in student-teacher models to improve domain adaptability; and (3) a data augmentation strategy leveraging diffusion-generated synthetic images to enhance anomaly detection performance. We achieve a 3.5 percentage point improvement over the previous state-of-the-art on the challenging AeBAD dataset by using the Masked Multiscale Reconstruction (MMR) model as our backbone.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09670v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mateo Diaz-Bone, Daniel Caraballo, Florian Scheidegger, Thomas Frick, Mattia Rigotti, Andrea Bartezzaghi, Roy Assaf, Niccolo Avogaro, Yagmur G. Cinar, Brown Ebouky, Filip M. Janicki, Piotr S. Kluska, Cezary Skura, Cristiano Malossi</dc:creator>
    </item>
    <item>
      <title>Transition-Based Digital Twin Modelling for Alzheimer's Disease under Sparse Longitudinal Data</title>
      <link>https://arxiv.org/abs/2606.09671</link>
      <description>arXiv:2606.09671v1 Announce Type: new 
Abstract: Alzheimer's disease (AD) progression is highly heterogeneous and is typically observed through sparse and irregular longitudinal data, posing challenges for prediction and personalised monitoring. Existing machine learning approaches have improved AD prediction using multimodal data, yet often focus on static classification or cohort-level risk estimation, providing limited support for subject-specific modelling and uncertainty-aware reasoning. To address these limitations, we present a personalised digital twin framework for AD prediction and scenario-based analysis using multimodal longitudinal data. The proposed approach integrates complementary modelling strategies to capture clinical transitions and temporal dependencies across visits. Using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), including cognitive assessments, clinical variables, and MRI-derived phenotypes, the framework predicts cognitive status and diagnostic categories while quantifying predictive uncertainty and enabling patient-specific what-if trajectory analysis. Evaluation on leak-free subject-level splits demonstrates strong performance in score forecasting and diagnosis classification. In this sparse and irregular ADNI setting, transition-based modelling of adjacent visits achieved higher predictive accuracy than the sequence-based branch, suggesting that local transition modelling may be the more data-efficient and robust predictive strategy, while sequence models remain valuable for uncertainty-aware trajectory forecasting. These findings highlight the importance of aligning temporal modelling strategies with clinical data structure and suggest that transition-based digital twin formulations may provide a practical and interpretable approach for personalised disease forecasting in neurodegenerative disorders.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09671v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yinyu Huang, Yilin Zhang, Sofia Michopoulou, Christopher Kipps, Rahman Attar</dc:creator>
    </item>
    <item>
      <title>Correlation Is Not Enough: Embedding Human Metadata for Individual Causal Discovery</title>
      <link>https://arxiv.org/abs/2606.09672</link>
      <description>arXiv:2606.09672v1 Announce Type: new 
Abstract: Ask a pretrained biomedical language model whether "cortisol 28 ug/dL" and "stock-market volatility" are related, and it returns a cosine similarity of 0.83 on a scale where 1.0 means identical. The two share no mechanism. This is not a corner case: every off-the-shelf biomedical encoder we tested (BioBERT, PubMedBERT, BioM-ELECTRA) scores unrelated cross-domain pairs between 0.76 and 0.92 when the answer should be near zero. Accuracy on cross-domain discrimination is 0%.
  Retrieval systems survive this, because a language model downstream filters the noise. A Large Behavioural Model (LBM), a foundation model whose subject is a person rather than a sentence, does not: it reasons over a graph of a user's life and treats embedding proximity as evidence that two events are causally linked. False proximity writes a false causal edge, and everything downstream inherits the error. Here, embedding geometry is not a tuning knob; it is correctness.
  We report the fix. A contrastive pass over 72,034 pairs raises PubMedBERT BIOSSES correlation from 0.633 to 0.828 and within-vs-across-domain separation from 1.05x to 1.63x. A second pass, BODHI, mines hard negatives from edges absent in a biomedical knowledge graph and lifts separation to 2.30x and the discrimination gap to +0.392, at a 4.5% BIOSSES cost. On an Intel Xeon 6737P with AMX, OpenVINO cuts single-query latency from 1367 ms to 10 ms (133x) and reaches 555 sentences/sec. One finding contradicts standard advice: FP16 beats INT8 on this silicon at every serving batch size, and we explain why. The same model on a no-AMX Ice Lake instance runs 13-27x slower. We release the benchmark suite, training corpora, the BODHI generator, and the OpenVINO scripts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09672v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <category>cs.PF</category>
      <category>q-bio.QM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Suraj Biswas, Saurabh Gupta, Pritam Mukherjee</dc:creator>
    </item>
    <item>
      <title>(Auto)formalization is supposed to be easy: Trellis process semantics for spelling out rigorous proofs</title>
      <link>https://arxiv.org/abs/2606.09674</link>
      <description>arXiv:2606.09674v1 Announce Type: new 
Abstract: We present Trellis: an autoformalization system that leverages LLM agents in a deterministically constrained workflow to enforce incremental progress in Lean autoformalization tasks through iterative refinement of natural language proofs. Our approach is motivated by the common mathematician's notion of what it means to have a rigorous proof in the first place: namely, that it would be routine to elaborate any part of the proof in further detail. The result is a system which aims to achieve reliable autoformalization on a modest budget and with generalist agents, with specialization to autoformalization coming not from any task-specific agent training but instead from a meaning-of-rigor inspired workflow enforced by process semantics. We link to an end-to-end Lean formalization of a recent Ramsey theory breakthrough produced by the process.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09674v1</guid>
      <category>cs.AI</category>
      <category>cs.LO</category>
      <category>math.CO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wesley Pegden</dc:creator>
    </item>
    <item>
      <title>Boundary-Layer-Induced Failure of Standard Physics-Informed Neural Networks: A Legendre Wavelet Collocation Benchmark for Singularly Perturbed Transport Problems</title>
      <link>https://arxiv.org/abs/2606.09676</link>
      <description>arXiv:2606.09676v1 Announce Type: new 
Abstract: Boundary layers provide a demanding test for numerical solvers because the solution may remain almost constant over most of the domain while changing rapidly in a narrow region near the boundary. This paper studies a singularly perturbed one-dimensional transport boundary-value problem with increasing Peclet number $(\mathrm{Pe})$. A local Legendre wavelet collocation method (LWM) is compared with a standard soft-boundary physics-informed neural network (PINN) for this benchmark. The wavelet approximation uses locally supported Legendre polynomial basis functions and converts the problem into a square algebraic collocation system with residual, boundary, and interface-continuity equations. Numerical experiments are performed for $\mathrm{Pe}=1,10,100,$ and $1000$. The LWM captures all four cases, with the largest error remaining below $5\times 10^{-3}$. The standard soft-boundary PINN performs well for the mild cases but fails to resolve the sharp boundary layer for the larger Peclet numbers. The results show that local wavelet collocation is more reliable than the standard soft-boundary PINN for this benchmark, while dense near-boundary evaluation helps reveal errors that may be missed on coarse grids.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09676v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Suvendu Nayak, Arun Kumar Gupta</dc:creator>
    </item>
    <item>
      <title>SoccerNet 2026 Player-Centric Ball-Action Spotting: Retraining and Post-Processing Extensions to the FOOTPASS Baselines</title>
      <link>https://arxiv.org/abs/2606.09679</link>
      <description>arXiv:2606.09679v1 Announce Type: new 
Abstract: We describe our system for the SoccerNet 2026 Player-Centric Ball-Action Spotting Challenge, which requires predicting who performs which action and when, across eight classes in broadcast soccer. Building on the three FOOTPASS baselines [1] (TAAD, TAAD+GNN, and TAAD+DST), we contribute four extensions: (1) gradient checkpointing to enable full-backbone fine-tuning on a single GPU; (2) fusion of GNN logits into the DST encoder, combining graph-based tactical context with per-player visual features; (3) square-root frequency class weighting to address the 213:1 pass-to-tackle imbalance in the training data; and (4) a post-processing pipeline comprising per-class logit gating, temporal frame refinement, jersey re-assignment, and a two-model ensemble. Our system achieves 0.548 Macro F1 on the test set and 0.446 on the challenge set (server evaluation).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09679v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Parthsarthi Rawat</dc:creator>
    </item>
    <item>
      <title>GenEyePose: Patient-Free, Knowledge-Based Saccadic Eye Movement Modeling for Digital Neurophysiologic Biomarker Development</title>
      <link>https://arxiv.org/abs/2606.09681</link>
      <description>arXiv:2606.09681v1 Announce Type: new 
Abstract: Eye movements, including saccades, are widely regarded as highly sensitive and objective biomarkers of neurophysiologic states. Detecting saccadic signatures in neurologic diseases offers a rapid, portable alternative to brain imaging, avoiding access and cost barriers. Currently, there are no robust AI-enabled video-oculographic solutions (e.g., digital biomarkers) for screening, triaging, or localizing brain abnormalities due to privacy issues and scarce datasets. In this work, we propose the first fully synthetic, patient-free, multimodal eye movement generation pipeline for generalizable saccade analysis. Using this synthetic dataset, we trained a deep learning classifier to distinguish between normal and abnormal (hypometria and hypermetria) saccadic accuracies and evaluated its performance on real-world clinical data. The model achieved an AUROC of 0.76 and a sensitivity of 0.71, showing that the synthetic data has strong potential to generalize for clinical applications, including as a screening tool in at-home and emergency room settings or a tool for precise neuroanatomic localization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09681v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tianyu Lin, Jooyoung Ryu, Puvada Sreevarsha, Rahul Srinivasaragavan, Riya Satavlekar, Susan Kim, Nidhi Soley, Yujie Yan, Ishan Vatsaraj, Carl Harris, Aimon Rahman, Vishal Patel, Joseph Greenstein, Casey Taylor, Kemar E. Green</dc:creator>
    </item>
    <item>
      <title>AutoMegaKernel: A Statically-Checked Agent Harness for Self-Retargeting Megakernel Synthesis</title>
      <link>https://arxiv.org/abs/2606.09682</link>
      <description>arXiv:2606.09682v1 Announce Type: new 
Abstract: AutoMegaKernel (AMK) compiles a HuggingFace Llama-family model into a single persistent cooperative CUDA kernel that runs the whole forward pass in one launch, with no per-model hand-written CUDA. The contribution is the system, not raw speed.
  A frozen schedule-IR validator statically certifies deadlock-freedom and race-freedom via static graph checks (not a mechanized proof), so an unsafe agent-proposed schedule is rejected before launch: across 7,160 adversarial schedules (6,091 unsafe) it had zero false-accepts and accepted all 360 real lowerings. The same source retargets sm_80/sm_90/sm_120 from one codebase, auto-generates correct megakernels for 10 of 10 supported models, and on a real SmolLM2-135M checkpoint reproduces HuggingFace greedy decode token-for-token (perplexity match 2.5e-7). An unattended, agent-drivable autoresearch loop self-improves the megakernel over its own baseline (1.25-1.72x).
  A search-found int8 (W8A16) megakernel beats CUDA-graphed cuBLAS bf16 at batch-1 decode across NVIDIA's datacenter inference fleet: L4 up to 1.33x, the current-gen L40S 1.25-1.27x, A10G up to 1.08x at scale, and the consumer RTX 5090 1.19-1.23x. The ordering is not a clean function of bandwidth (the 864 GB/s L40S beats the 600 GB/s A10G); the divide is inference-class vs training-class. AMK trails cuBLAS on the high-bandwidth training-class A100/H100, where the harness localizes the cross-SM-sync bottleneck; we report the gap plainly. This is a precision-asymmetric (W8A16 vs bf16) comparison at decode position 0; the largest real checkpoint is TinyLlama-1.1B. Code and the harness: https://github.com/RightNow-AI/AutoMegaKernel</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09682v1</guid>
      <category>cs.LG</category>
      <category>cs.DC</category>
      <category>cs.PF</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jaber Jaber, Osama Jaber</dc:creator>
    </item>
    <item>
      <title>An 84-Format Numeric Catalog with Bit-Exact Conformance Vectors: A Vendor-Neutral Reference for FP8, BF16, MXFP4, and Microscaling Formats</title>
      <link>https://arxiv.org/abs/2606.09686</link>
      <description>arXiv:2606.09686v1 Announce Type: new 
Abstract: Numeric format proliferation in machine learning hardware -- FP8 (E4M3 and E5M2), BF16, MXFP4, microscaling block formats, and dozens of research variants -- has outpaced the availability of vendor-neutral, bit-exact reference material. Engineers porting models across accelerators encounter silent divergences that are difficult to diagnose without a shared ruler.
  This paper describes a catalog of 84 numeric formats spanning 13 families, a suite of six bit-exact conformance packs covering GF16, MXFP4 element, BF16, FP8 E4M3, FP8 E5M2, and E8M0 block scale, and an IEEE P3109 v3.2.0 cross-walk that maps each pack to its corresponding standards-track configured format. Each pack is a self-contained JSON document with a SHA-256 fingerprint, a shared row schema, and an anchor vector that encodes 3.0 -- the identity phi^2 + 1/phi^2 = 3 -- as a cross-pack sanity check. Packs are cross-validated against ml_dtypes 0.5.4 (Google/JAX); any divergence is documented explicitly and interpreted as a spec-permitted interpretation gap rather than hidden. The work is framed as registry filling: it does not propose new formats, make model-accuracy claims, or assert superiority over any vendor's implementation. All artifacts are publicly available at https://github.com/gHashTag/t27 under an open license.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09686v1</guid>
      <category>cs.AR</category>
      <category>cs.AI</category>
      <category>cs.MS</category>
      <category>cs.NA</category>
      <category>cs.PF</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dmitrii Vasilev</dc:creator>
    </item>
    <item>
      <title>A space-time sparse-grid method for the wave equation</title>
      <link>https://arxiv.org/abs/2606.09688</link>
      <description>arXiv:2606.09688v1 Announce Type: new 
Abstract: We develop a fast space-time numerical scheme for approximating solutions to the linear wave equation. The approach is based on the sparse-grid combination technique applied to a coercive space-time discretization. Designed for tensor-product space-time discretizations, the method enables efficient parallelization of the resulting solver. We provide a rigorous theoretical analysis establishing convergence rates and computational complexity estimates. Numerical experiments validate the theoretical estimates and demonstrate the efficiency of the proposed method.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09688v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Matteo Ferrari, Andrea Moiola, Chiara Perinati, Ilaria Perugia</dc:creator>
    </item>
    <item>
      <title>Low-Rank Acceleration of the Operator Fourier Transform</title>
      <link>https://arxiv.org/abs/2606.09689</link>
      <description>arXiv:2606.09689v1 Announce Type: new 
Abstract: We develop a numerical algorithm for the efficient solution or approximation of solutions to the Helmholtz equation on a structured grid in two dimensions. We make use of the Operator Fourier Transform (OFT) and a low-rank cross approximation scheme (Cross-DEIM) to decompose the problem into an integral over a pseudo-time of solutions to the Schrödinger equation. The OFT is a framework for solving operator equations like fractional Laplacian equations or the Helmholtz equation, when the latter is written as a product of two paraxial operators. The main computational cost in the OFT is the solution to the Schrödinger equation, especially when the dimension or mesh resolution is high. In this work, we alleviate this cost by utilizing a low-rank method. Such methods aim to beat the curse of dimensionality when low-rank structures are present in the solution. We show that the combination of these two approaches can yield large cost reductions for certain classes of problems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09689v1</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jack Kelley</dc:creator>
    </item>
    <item>
      <title>Observability for Delegated Execution in Agentic AI Systems</title>
      <link>https://arxiv.org/abs/2606.09692</link>
      <description>arXiv:2606.09692v1 Announce Type: new 
Abstract: Delegation-scoped execution is not identifiable from standard observables: audit logs and execution traces can be identical under multiple incompatible delegation assignments. This gap is especially acute in LLM-based agentic systems, where agents dynamically select tools, vary execution sequences across runs for the same instruction, and spawn cooperating sub-agents. These dynamics fragment and interleave traces, making delegation-scoped reconstruction from causal structure alone structurally underdetermined. Although individual actions are authorized and logged, existing audit, tracing, and security schemas lack the semantics to reconstruct what actions occurred under a given delegation across heterogeneous systems. We focus on delegation-scoped attribution and access/share footprint reconstruction, not intent inference or reasoning reconstruction. We present an agent-aware observability substrate consisting of a lightweight gateway and a common information model that binds delegation context at execution time. This enables reliable cross-tool delegation-scoped reconstruction and direct forensic queries without heuristic time-window correlation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09692v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Abhinav Mishra, Kumar Sharad</dc:creator>
    </item>
    <item>
      <title>PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models</title>
      <link>https://arxiv.org/abs/2606.09697</link>
      <description>arXiv:2606.09697v1 Announce Type: new 
Abstract: Large language models (LLMs) routinely face requests that should be refused, creating a trade-off between helpfulness and harm prevention. However, refusals themselves can be helpful. In high-risk interactions involving crisis, coercion, or escalating intent, blunt non-compliance may prevent direct harm while still failing to support the needs of the person behind the request. We present PsychoSafe, a psychologically-informed refusal framework that reframes refusal as structured supportive communication grounded in evidence-based intervention strategies. To develop PsychoSafe, we construct a corpus of 8019 prompt-response pairs spanning five psychologically salient risk domains and apply prompting and parameter-efficient fine-tuning to Qwen 3.5 27B. On a balanced validation set of 500 prompts, evaluated with an LLM judge and validated through human ratings, PsychoSafe prompting improves overall refusal quality by 28.1% over a generic baseline, with particularly strong gains in external resource referral (+46.8%) and psychological grounding (+34.8%), while preserving downstream performance on non-refusal tasks. Fine-tuning achieves near-perfect refusal and resource-referral rates but reduces response relevance. Additional evaluations on SORRY-Bench and XSTest show strong in-domain robustness but limited out-of-domain generalization, suggesting that future work should diversify fine-tuning data to help models apply interventions selectively rather than schematically.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09697v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Gianluca Barmina, Federico Torrielli, Sven Harms, Jacob Nielsen, Felix Mächtle, Stine Lyngsø Beltoft, Peter Schneider-Kamp, Thomas Eisenbarth, Lukas Galke Poech, Anne Lauscher</dc:creator>
    </item>
    <item>
      <title>Optimal Feedback Communication with Information Maximization and Distortion Minimization</title>
      <link>https://arxiv.org/abs/2606.09698</link>
      <description>arXiv:2606.09698v1 Announce Type: new 
Abstract: We study the problem of optimally sending a real-valued source through multiple uses of a channel with feedback. First, we state a set of conditions that are sufficient for an encoder to achieve maximal mutual information between the source and all the channel outputs. This set of conditions is also necessary when the channel is input-identifiable, a condition widely satisfied by common channel models. More notably, we further study the information maximization-distortion minimization problem, where the mutual information between the source and all channel outputs still needs to be maximized, while at each step, the MMSE of estimating the source from the channel outputs so far also needs to be minimized. We derive a solution to this problem for discrete channels with certain symmetries, e.g. $k$-ary symmetric or $k$-ary erasure channels. We show that for such channels the famous posterior matching scheme, while not necessary for information maximization alone, is sufficient and essentially necessary for achieving both information maximization and distortion minimization. This work also provides a new perspective on regularizing distortion-minimizing feedback communication through information maximization, which enables us to find the optimal solution that otherwise would be intractable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09698v1</guid>
      <category>cs.IT</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <category>math.IT</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Aolin Xu</dc:creator>
    </item>
    <item>
      <title>Cranio-Diff: Diffusion-based Cross-domain Craniofacial Reconstruction with 2D X-ray Skull Guidance and Structural Identity Constraints</title>
      <link>https://arxiv.org/abs/2606.09699</link>
      <description>arXiv:2606.09699v1 Announce Type: new 
Abstract: State-of-the-art generative models such as CycleGAN, Pix2Pix, and diffusion models have demonstrated remarkable performance in the face generation task. However, they fail to effectively capture cross-modality semantic information in craniofacial reconstruction when translating from the skull (X-ray) to the face (optical) domain, due to a mismatch in the alignment of structural identity across modalities. To address this issue, we propose Cranio-Diff, a diffusion-based framework for cross-domain craniofacial reconstruction from 2D X-ray skull images. The proposed approach integrates skull-conditioned structural guidance through ControlNet with biometric text conditioning to generate a face that is more semantically and structurally aligned with the given skull. The proposed Cranio-Diff method is evaluated on a skull-face dataset obtained from X-ray scans of 120 subjects in lateral and frontal views. To enable controlled evaluation, each face image is synthesised across three age groups (25, 45, 65) and three BMI variations of -10%, baseline and +10%, yielding 4320 paired samples. To the best of our knowledge, this is the only X-ray-face dataset of this magnitude. Extensive experiments show that the proposed method outperforms recent existing approaches in both generated image quality and the retrieval task. We evaluate the quality of the generated images using FID, IS, SSIM, LPIPS, PSNR and ArcFace score. Additionally, retrieval performance is evaluated using recall@k, mAP@k and MRR@k. The experimental results demonstrate that the proposed method can serve as an alternative tool to aid forensic investigations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09699v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ravi Shankar Prasad, Naresh Gurjar, Shashank Baghel, Chirag, Dinesh Singh</dc:creator>
    </item>
    <item>
      <title>What the Eyes See, the LLMs Miss: Exploiting Human Perception for Adversarial Text Attacks</title>
      <link>https://arxiv.org/abs/2606.09700</link>
      <description>arXiv:2606.09700v1 Announce Type: new 
Abstract: Large language model (LLM)-powered content moderation systems have become a critical defense against harmful online content. However, these systems primarily operate on tokenized text and largely ignore the visual cues that humans naturally rely on when interpreting content. We show that this discrepancy creates a fundamental perceptual mismatch: content that is readily recognized as harmful by humans can become effectively invisible to automated moderation systems. To study this vulnerability, we introduce a class of Human-Perceptible Adversarial Attacks (HPAA), in which harmful expressions are embedded into otherwise benign text through visually salient typographic manipulations. Our key insight is that typographic features, including spacing, visual emphasis, and spatial arrangement, can be strategically combined to preserve human recognition of harmful content while substantially reducing machine detectability. Operating in black-box settings with only a small query budget, our attack automatically generates evasive content without requiring model access or gradient information. We evaluate the attack across multiple datasets and ten deployed moderation systems, including commercial APIs and state-of-the-art open-source guardrails. Results reveal a striking gap between human and machine perception: with only three detector queries, generated attacks achieve over 86% human recognition while maintaining detection rates below 1% across the evaluated systems. We further conduct ablation studies to identify the typographic factors driving successful evasion, analyze why current moderation architectures fail to capture these signals, and discuss practical defenses. Our findings expose a fundamental blind spot in today's LLM-based moderation ecosystem and highlight the need for moderation systems that reason about content in a manner more consistent with human perceptual understanding.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09700v1</guid>
      <category>cs.CR</category>
      <category>cs.HC</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qin Yang, Lu Malloy, Joshua Lee, Xiaohan Chang, Meisam Mohammady, Doowon Kim, Yuan Hong</dc:creator>
    </item>
    <item>
      <title>Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO</title>
      <link>https://arxiv.org/abs/2606.09701</link>
      <description>arXiv:2606.09701v1 Announce Type: new 
Abstract: AI red teaming must continually adapt to evolving attackers and defenders. Reinforcement learning offers a promising approach to discovering novel attacks, and co-training methods can produce more robust defenders in tandem. Recent works have demonstrated the efficacy of attacker-defender co-training by applying PPO and DPO, but report that GRPO is unstable in this setting. We introduce AdvGRPO, a co-training framework that makes GRPO viable for joint attacker-defender optimization using dense multi-channel rewards and decoupled advantage normalization. Training progresses through a curriculum from single-turn to closed-loop multi-turn attacks before bootstrapping co-training, where attacker and defender models are updated in alternation. We show that our method can produce highly effective and transferable attacks and that co-trained defenders outperform baselines on safety benchmarks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09701v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Blake Bullwinkel, Eugenia Kim, Amanda Minnich, Mark Russinovich</dc:creator>
    </item>
    <item>
      <title>When Do Local Score Models Extrapolate Across Size? A Diagnostic Theory and Benchmark</title>
      <link>https://arxiv.org/abs/2606.09705</link>
      <description>arXiv:2606.09705v1 Announce Type: new 
Abstract: Scientific generative modeling often requires size transfer, where models trained on small systems are evaluated on larger ones. While translation-invariant architectures enable this evaluation, we show that architectural locality alone does not guarantee stable size extrapolation. Instead, stable extrapolation is governed by the quasi-locality of the Gaussian-smoothed score. Through Tweedie's formula, far-away perturbations can influence local score components via posterior covariance, meaning a local model succeeds only if its receptive field covers the smoothed score's response range. We formalize this mechanism, proving a size-uniform comparison theorem for local marginals under reverse diffusion. We also introduce Finite-Depth Local Flow (FDLF), a white-box diagnostic benchmark with exact scores, densities, and controllable response ranges. Empirically, we validate the interplay between spatial mixing, smoothed-score quasi-locality, and model receptive fields. Under spatial mixing, the smoothed score remains quasi-local relative to the receptive field, enabling stable extrapolation. Conversely, when spatial mixing weakens, the score's locality rapidly degrades, causing size transfer to fail.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09705v1</guid>
      <category>cs.LG</category>
      <category>cond-mat.stat-mech</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wenjie Xi</dc:creator>
    </item>
    <item>
      <title>BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling</title>
      <link>https://arxiv.org/abs/2606.09707</link>
      <description>arXiv:2606.09707v1 Announce Type: new 
Abstract: As deep learning models scale, managing, inspecting, and modifying large checkpoints has become increasingly challenging. Researchers often need to alter model weights for layer restructuring, precision casting, low-rank factorization, and architectural debugging, yet these workflows often rely on fragile ad-hoc Python scripts. Here, we introduce BrainSurgery, a tool for robust and reproducible "tensor surgery" on neural network checkpoints, and provide a system demonstration covering four examples and three case studies from model upcycling to LoRA extraction. By abstracting storage formats and memory management, BrainSurgery executes complex transformations through declarative YAML plans. It supports structural modifications, mathematical transformations, and tensor reshaping through expressive regex and structural targeting, while built-in assertions validate tensor shapes, data types, and values to prevent silent errors. We envision that BrainSurgery will provide a strong foundation for future research through its reproducible and validated operations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09707v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Gianluca Barmina, Annemette Broch Pirchert, Andrea Blasi Núñez, Lukas Galke Poech, Peter Schneider-Kamp</dc:creator>
    </item>
    <item>
      <title>IS-CoT: Breaking the Long-form Generation Collapse via Interleaved Structural Thinking</title>
      <link>https://arxiv.org/abs/2606.09709</link>
      <description>arXiv:2606.09709v1 Announce Type: new 
Abstract: Generating coherent and controllable long-form content remains a persistent challenge for Large Language Models (LLMs). While reasoning-enhanced models have demonstrated success in logic-intensive domains, our evaluation reveals that they suffer from a severe length collapse in open-ended writing, where performance degrades sharply as target lengths exceed 2,000 words. We attribute this failure to the limitation of static hierarchical planning, which struggles to provide dynamic guidance over extended contexts. To bridge this gap, we introduce the Interleaved Structural Chain-of-Thought (IS-CoT) framework. Unlike external agentic workflows, IS-CoT embeds a dynamic Plan-Write-Reflect cycle into the generation process, enabling continuous strategy adaptation and global alignment without additional assistance. Based on this framework, we construct a high-quality dataset of interleaved reasoning traces via a multi-teacher pipeline and train IS-Writer-8B. Experiments demonstrate that IS-Writer-8B achieves state-of-the-art performance on challenging long-form benchmarks (e.g., +3.08 vs. DeepSeek-V3.2 on LongBench-Write), exhibiting robust length compliance and coherence competitive with significantly larger proprietary models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09709v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Zechen Sun, Yuyang Sun, Zecheng Tang, Juntao Li, Wenpeng Hu, Wenliang Chen, Zhunchen Luo, Guotong Geng, Min Zhang</dc:creator>
    </item>
    <item>
      <title>Proxy Reward Internalization and Mechanistic Exploitation: A Learned Precursor to Reward Hacking and Its Generalization</title>
      <link>https://arxiv.org/abs/2606.09711</link>
      <description>arXiv:2606.09711v1 Announce Type: new 
Abstract: Reward hacking is usually studied after it becomes visible, once a model earns high proxy reward while failing the intended task. We instead study what proxy RL teaches before that failure appears. We introduce Proxy Reward Internalization and Mechanistic Exploitation (PRIME), a learned capability to assess task correctness, predict proxy acceptance, and reason about exploitable proxy--gold gaps. In coding RL environments with exploitable pytest rewards, we measure PRIME through chain-of-thought monitoring, direct probes, and activation-level concept vectors. We find that PRIME emerges in a staged sequence before sustained reward hacking, and that its current direct-probe score forecasts later hack onset and severity even when the visible hack rate is still low. PRIME also adapts when the evaluator changes, retargeting to whichever proxy--gold gap remains rewarded and persisting when gold reward suppresses overt hacking, and ablating its activation directions reduces hacking. Across checkpoints, in-domain PRIME tracks out-of-domain misalignment. Together these results suggest that exploitable proxy RL amplifies a proxy-internalization capability upstream of visible hacking, making PRIME a candidate early-warning signal for broader alignment risk.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09711v1</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mohammad Beigi, Ming Jin, Lifu Huang</dc:creator>
    </item>
    <item>
      <title>What Makes Synthetic Speech Sound Sarcastic? A Prosody-Controlled Perception Study</title>
      <link>https://arxiv.org/abs/2606.09717</link>
      <description>arXiv:2606.09717v1 Announce Type: new 
Abstract: Prosody plays a central role in sarcasm perception, yet previous studies have relied on naturally produced speech that lacks fine-grained control over individual acoustic dimensions. As prosodic cues co-vary in natural data, isolating their independent contributions remains challenging. We introduce a controlled framework using neural text-to-speech (TTS) with prompt-based prosodic conditioning to manipulate speech rate, pitch variation, and loudness. An orthogonal stimulus set was constructed to enable causal testing of prosodic cue effects. Human listeners rated sarcasm and naturalness, and their judgments were compared with predictions from a foundation model capable of processing audio input. Results show that loudness primarily drives human sarcasm perception, whereas the model assigns greater weight to speech rate, leading to distinct cue-weighting patterns. This study shows how controllable neural TTS enables investigation of prosodic cue weighting in speech perception.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09717v1</guid>
      <category>cs.SD</category>
      <category>eess.AS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhu Li, Shekhar Nayak, Matt Coler</dc:creator>
    </item>
    <item>
      <title>Evaluating the Representation Space of Diffusion Models via Self-Supervised Principles</title>
      <link>https://arxiv.org/abs/2606.09718</link>
      <description>arXiv:2606.09718v1 Announce Type: new 
Abstract: Diffusion models have demonstrated remarkable generative capabilities and have also emerged as powerful self-supervised representation learners, yet the connection between these two abilities remains less explored. Drawing inspiration from self-supervised learning (SSL), we introduce a framework for jointly evaluating the representation and generation capabilities of diffusion models. Specifically, we decompose features into invariant and residual components and derive the Invariant Contamination Ratio (ICR), a Fisher-based metric that quantifies how residual variation contaminates invariant signal in feature space. We use this framework to analyze both discriminative and generative behavior of diffusion models. On the representation side, we find that invariance peaks at intermediate noise levels, which also yield the best downstream classification performance. On the generative side, we study how training transitions from genuine generalization to memorization in data-limited regimes, and show that ICR serves as a sensitive training-time indicator of early learning: increasing residual energy along Fisher directions marks the onset of memorization, detectable from training features alone without external evaluators or held-out test sets. Overall, our results show that diffusion models can be monitored from a self-supervised perspective through the geometry of their learned representations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09718v1</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiao Li, Yixuan Jia, Zekai Zhang, Xiang Li, Lianghe Shi, Jinxin Zhou, Zhihui Zhu, Liyue Shen, Qing Qu</dc:creator>
    </item>
    <item>
      <title>Safe Polytope-in-Polytope Motion Planning and Control with Control Barrier Functions</title>
      <link>https://arxiv.org/abs/2606.09719</link>
      <description>arXiv:2606.09719v1 Announce Type: new 
Abstract: Autonomous mobile robots operating in tight environments require motion planning frameworks that account for the physical footprint of the robot. Simplifying the geometry to a point or a circle is conservative and discards information needed to successfully and safely traverse narrow passages. This work proposes a safe local motion planning and control method that guarantees that a polytopic robot footprint stays inside a continuously updated convex free-space region. The containment condition is formulated as a set of discrete-time control barrier function constraints within a model predictive controller. The number of safety constraints depends on the complexity of the local free-space geometry and the robot shape, instead of the number of obstacles. The proposed free-space formulation does not need any obstacle detection or segmentation. A comparative analysis against a polytope-based obstacle avoidance formulation confirms favorable scaling up to a reduction of 91$\times$ in computation time as the number of obstacles increases. The approach is validated in simulation with an autonomous surface vehicle and on hardware with a non-holonomic mobile robot, using both occupancy grids and LiDAR sensing. The experiments demonstrate safe real-time motion planning and control at 10~Hz on an onboard embedded computer, including reactive avoidance of dynamic obstacles.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09719v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Alejandro Gonzalez-Garcia, Dries Dirckx, Jan Swevers, Wilm Decré</dc:creator>
    </item>
    <item>
      <title>Beyond Probabilistic Similarity: Structural, Temporal, and Causal Limitations of Retrieval-Augmented Generation in the Legal Domain</title>
      <link>https://arxiv.org/abs/2606.09724</link>
      <description>arXiv:2606.09724v1 Announce Type: new 
Abstract: Retrieval-Augmented Generation (RAG) has become a standard architectural response to unreliability in legal AI, yet high-profile failures, including fabricated citations submitted to courts and anachronistic legal content presented as current, continue to appear across jurisdictions. We argue that these failures are not residual confabulations to be eliminated by scaling language models, but symptoms of an architectural mismatch between probabilistic retrieval and the hierarchical, temporal, and institutional structure of legal knowledge. We develop the argument in three moves. First, we articulate the ontological commitment of legal knowledge as a triad of properties derivable from classical legal theory: hierarchical and mereological structure, diachronic dynamism under operational closure, and causal traceability of institutional provenance grounded in the duty of justification. Second, we identify three corresponding pathologies of retrieval (mereological blindness, diachronic blindness, and causal opacity), each developed with an operational definition, a failure mechanism, a canonical example, and detection criteria for diagnostic use. Third, we review the state of the art through this lens, showing that existing approaches address these requirements unevenly and do not yet compose into a paradigm that treats them as co-constitutive. From this analysis we derive four architectural commitments that characterize the deterministic-by-design direction for legal retrieval: ontological primacy, event reification, bitemporal correctness, and deterministic interaction protocols. The framework concerns quaestio juris (which norms apply and in what state) rather than the downstream tasks that act on identified norms, and addresses legislative and constitutional retrieval primarily, with interpretive time as an explicit extension.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09724v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hudson de Martim</dc:creator>
    </item>
    <item>
      <title>Disentanglement with Holographic Reduced Representations</title>
      <link>https://arxiv.org/abs/2606.09725</link>
      <description>arXiv:2606.09725v1 Announce Type: new 
Abstract: Disentanglement, the separation of factors of variation in data using neural networks, remains a long-standing challenge in machine learning. Prior work has addressed this problem with variational autoencoders and generative adversarial networks that incorporate ideas from variational inference and information-theoretic constraints. In contrast to methods that rely on continuous representations, we propose a design that treats disentangled representations as symbolic structures, motivated by the compositional relationships among the concepts that make up samples from a distribution. However, learning discrete symbolic structures with neural networks while maintaining differentiability is difficult and often requires complex architectures. To address this, we introduce an unsupervised learning algorithm that uses holographic reduced representations (HRR) for neural disentanglement. We show that the HRR unbinding operation provides an inductive bias for separating factors and yields competitive results against baselines, as measured by latent traversals and disentanglement metrics. We complement these empirical findings with an information-theoretic analysis of the HRR unbinding channel. We prove that unbinding induces approximately independent symbol-value pairs and derive a per-slot capacity bound that quantifies how many distinct symbolic concepts can be reliably encoded, giving a quantitative account of the inductive bias toward disentanglement. The resulting representations differ from standard autoencoder-based models, in that their latent units are vectors that are summed together, rather than scalar dimensions of a low-dimensional latent vector. We show that this HRR representation is more robust to noise than other disentangled representations and maintains reconstruction quality across a range of SNRs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09725v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jhonny J. Velasquez Olivera, Christo K. Thomas, Walid Saad</dc:creator>
    </item>
    <item>
      <title>Bayesian Probing on Graphs</title>
      <link>https://arxiv.org/abs/2606.09729</link>
      <description>arXiv:2606.09729v1 Announce Type: new 
Abstract: We introduce a stochastic probing problem with correlated items. In our model, which we call Bayesian Probing, the correlations are modeled by an underlying graph $G$. Each vertex is independently active with a known probability. Each item corresponds to an edge in the graph. Probing an edge has some cost, gives some reward if both endpoints are active, and also reveals the state of its endpoints. Hence a probe induces a Bayesian update on the remaining edges. The goal is to adaptively probe items/edges subject to a knapsack constraint to maximize the expected total reward obtained from the probed edges.
  Bayesian Probing generalizes stochastic knapsack and stochastic probing by allowing correlations between items. Moreover, it gives a tractable model for the Bayesian Active Search problem, a popular problem considered in the machine learning community. In Bayesian Active Search, the goal is to find items in a particular class by adaptively probing at most, say $k$, items. Given a prior distribution over items, we want to compute a Bayesian policy to maximize the number of such items found. For this general problem with arbitrary priors, there are strong lower bounds on efficiently computing good policies.
  In this paper, we design efficient approximation algorithms for Bayesian Probing. These results give the first efficient approximation algorithms for Bayesian Active Search, for a class of practically-relevant prior distributions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09729v1</guid>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Anupam Gupta, Benjamin Moseley, Rudy Zhou</dc:creator>
    </item>
    <item>
      <title>SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research</title>
      <link>https://arxiv.org/abs/2606.09730</link>
      <description>arXiv:2606.09730v1 Announce Type: new 
Abstract: Large language models are increasingly expected to handle complex, long-horizon real-world tasks whose context demands can grow without bound, yet model context windows remain inherently finite. Recent work explores a paradigm where a main agent decomposes tasks and dispatches subtasks to subagents, which execute and return only summarized results, conserving the main agent's context budget. However, performing this well requires delegation intelligence: the ability to decompose complex tasks, determine when and what to delegate, and integrate returned results into the ongoing workflow. Training data for this capability is scarce in naturally occurring text, and to our knowledge, how to synthesize such data and train models to acquire this capability remains largely unexplored in the open-source community. To bridge this gap, we present a preliminary exploration targeting deep research, a representative long-horizon agent task. Specifically, we design a harness that guides the model toward high-quality task decomposition and delegation, while constraining subagents to return results properly to support the main agent's workflow. The harness-guided trajectories naturally encode correct delegation decisions, which we use as supervised fine-tuning data to internalize delegation intelligence into model weights. Our resulting model, SearchSwarm-30B-A3B, achieves 68.1 on BrowseComp and 73.3 on BrowseComp-ZH, the best results among all models of comparable scale. We will release our harness, model weights, and training data to facilitate future research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09730v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pu Ning, Quan Chen, Kun Tao, Xinyu Tang, Tianshu Wang, Qianggang Cao, Xinyu Kong, Zujie Wen, Zhiqiang Zhang, Jun Zhou</dc:creator>
    </item>
    <item>
      <title>Tight Sample Complexity of Transformers</title>
      <link>https://arxiv.org/abs/2606.09731</link>
      <description>arXiv:2606.09731v1 Announce Type: new 
Abstract: We tightly characterize the VC dimension of depth-$L$ Transformers with a total of $W$ parameters, mapping an input sequence of length $T$ to a single output, establishing an upper bound of $O(L W \log (T W))$ and a nearly matching lower bound of $\Omega(L W \log (T W / L))$. We further tightly characterize the sample complexity of chain-of-thought learning using such a Transformer, showing teacher forcing (i.e. selecting a predictor consistent with the entire chain-of-thought on training data) learns with sample complexity $O\left(L W \log \left(\left(T+T^{\prime}\right) W\right)\right)$ and that any learning rule that uses chain-of-thought data requires at least $\Omega\left(L W \log \left(\left(T+T^{\prime}\right) W / L\right)\right)$ examples, where $T$ is the input length and $T^{\prime}$ is the number of autoregressive steps.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09731v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chenxiao Yang, Nathan Srebro, Zhiyuan Li</dc:creator>
    </item>
    <item>
      <title>The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model</title>
      <link>https://arxiv.org/abs/2606.09735</link>
      <description>arXiv:2606.09735v1 Announce Type: new 
Abstract: The ambition behind alignment training is to make large language models safe and useful. The primary mechanism, reinforcement learning from human feedback (RLHF), shapes the behavior of deployed language models by aligning them with ``human values.'' Yet the process is opaque. What values are being encoded; whose values are they; and how does RLHF encode them? A growing body of evidence suggests that RLHF produces only functional compliance rather than deep alignment. We offer a mechanistic case study of this phenomenon for partisan political orientation with a comparison of the internal representations of Llama 3.1 8B before and after RLHF. We show that RLHF does not remove the structured partisan direction in the base model. Instead, it compresses the variance of the partisan signal to generate consistently balanced and non-partisan output. Sparse autoencoder decomposition reveals that policy-encoding features, which activate sporadically in the base model, are completely inactive in the Instruct model. Feature-level steering experiments confirm the causal disconnect. RLHF thus encodes a norm of political neutrality, not by erasing the model's knowledge of partisanship, but by severing the causal pathway from partisan geometry to output generation. Importantly, this neutrality is functional, not structural so that the underlying geometry that enables partisan steering remains intact. The mechanisms that bypass RLHF's guardrails, such as inferring and amplifying a user's partisan identity, reactivate partisan generation. If RLHF operates by disconnecting rather than removing value-laden structure, then the same pattern may hold for other value domains, and the aligned model's behavior may be more fragile than its outputs suggest.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09735v1</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wendy K. Tam</dc:creator>
    </item>
    <item>
      <title>HDSL: A Hierarchical Domain-Specific Language for Structured 3D Indoor Scene Generation and Localized Editing with LLM Agents</title>
      <link>https://arxiv.org/abs/2606.09738</link>
      <description>arXiv:2606.09738v1 Announce Type: new 
Abstract: Text-driven indoor scene generation and editing require an intermediate representation that language models can both produce and revise. Existing LLM-based systems often rely on scene graphs or global constraint lists, which are compact but underspecify local geometry and make instruction-based edits difficult to localize. We frame this problem as structured program generation and local program repair, and propose Hierarchical Descriptive Scene Language (HDSL), an XML/CSS-style domain-specific language for structured 3D indoor scenes. HDSL represents rooms, regions, objects, and support surfaces as a tree with local coordinates, making complex scenes easier to plan recursively and easier to retrieve for editing. Our pipeline uses LLM agents to generate HDSL subtrees with bounded verification, grounds non-virtual nodes through multimodal asset retrieval, and applies force-directed layout optimization to repair boundary and collision errors. For editing, Hierarchical Retrieval-Augmented Generation retrieves the relevant subtree, asks the LLM to rewrite only that local context, and merges the result back through a deterministic three-way merge. In our reproduced benchmark, HDSL improves average object coverage, text-scene alignment, and generation time over full text-to-scene baselines while remaining competitive with recent layout-only reproductions on geometry metrics; for editing, HRAG reduces token use by $5.22\times$ and runtime by $6.19\times$, produces valid DSL for all eight paired edits, and better preserves unrelated scene objects.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09738v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Letian Li, Chao Shen, Shuzhao Xie, Chenghao Gu, ZhengXiao He, Yu Meng, Xin Yang, Wenyuan Jiang, Zhi Wang</dc:creator>
    </item>
    <item>
      <title>ProbeAct: Probe-Guided Training-Free Failure Recovery in Vision-Language-Action Models</title>
      <link>https://arxiv.org/abs/2606.09740</link>
      <description>arXiv:2606.09740v1 Announce Type: new 
Abstract: Vision-Language-Action (VLA) models demonstrate strong performance on language-conditioned robotic manipulation within their training distribution, yet their generalization capabilities remain fundamentally limited. They lack the robustness required to handle perturbations, frequently failing when confronted with lighting changes, altered camera viewpoints, or small initial-state variations. We propose PROBEACT, a training-free runtime intervention framework that detects and recovers from grasping and placement failures in pre-trained VLA policies without modifying their weights or requiring additional demonstrations. PROBEACT combines three components: (i) a lightweight multi-target hidden-state probe that predicts the 3D positions of task-relevant objects from intermediate VLA features, with Hungarian-matched identity tracking for multi-object scenes; (ii) an object-agnostic kinematic state machine that detects grasp, transport, and placement failures using only gripper-internal signals and end-effector kinematics; and (iii) a hierarchical Control Barrier Function (CBF) filter that encodes repeated-failure locations as soft safe-set constraints, minimally correcting VLA actions while preserving baseline behavior. As a plug-and-play, training-free intervention loop, PROBEACT is orthogonal to existing training pipelines. Evaluated on the LIBERO-plus benchmark, our framework acts as a universal safety net, improving the success rate of the OpenVLA-OFT model from 69.6% to 74.1%, while demonstrating broad applicability to both base and fine-tuned VLA policies.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09740v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Fan Zhang, Seongbin Park, Baharan Mirzasoleiman, Shahriar Talebi, Nader Sehatbakhsh</dc:creator>
    </item>
    <item>
      <title>bbsolver: A Unified Error-Bounded Spatiotemporal Optimization Solver for Key Timing and Topology-Consistent Vector Paths</title>
      <link>https://arxiv.org/abs/2606.09741</link>
      <description>arXiv:2606.09741v1 Announce Type: new 
Abstract: Dense sampling records what an animation system actually evaluated, but it produces a poor final representation: every sampled frame can become a key, edit handles become noisy, and animated vector paths remain hard to adjust. Existing reducers usually treat the two axes separately: animation-curve reducers reduce key timing, while curve and path simplifiers reduce geometry. When applied independently to animated paths, these methods can break point identity across frames, change vertex structure over time, or provide no single error budget that covers both timing and shape. bbsolver frames the task as tolerance-bounded spatiotemporal reduction. A host application, such as After Effects or Blender, samples temporal and spatial animation into a documented JSON bundle; the standalone solver chooses sparse keys, interpolation metadata, and path representation; and the output is accepted only if replayed samples remain within the requested worst-case error. The same solver core can be used by any application that can export samples and write back returned keys or paths. In After Effects validation, solved keys written back into AE and re-sampled from AE playback reduce a DUIK humanoid walk cycle from 12,684 samples to 540 keys at epsilon=1, a 23.5x reduction, and an ant rig from 11,956 samples to 653 keys, an 18.3x reduction, with maximum errors below 1 px and 1 degree. A Blender-sampled FBX mocap retarget reaches 214 keys from 13,455 samples at epsilon=3; baselines tuned to matched measured accuracy require 4.5x to 27.5x more scalar key entries. For vector paths, bbsolver supports reduction when vertex identity/order is constant over time and diagnostics for variable-vertex-count streams, including a 6.7x After Effects-compatible procedural-path compression and exact transition-timing recovery in a diagnostic case.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09741v1</guid>
      <category>cs.GR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ilya Gusinski (IVG Design)</dc:creator>
    </item>
    <item>
      <title>Learning Dynamics Reveal a Hierarchy of Weight-Induced Layerwise Gram Metrics</title>
      <link>https://arxiv.org/abs/2606.09744</link>
      <description>arXiv:2606.09744v1 Announce Type: new 
Abstract: We study feed-forward ReLU networks with fixed readout and quadratic loss. The aim is to rewrite gradient descent not primarily as a dynamics in weight space, but as a collective dynamics closed in terms of fields defined on the training-set space. For a single hidden layer, the weight variables can be eliminated from the activation dynamics, yielding a closed equation for the residuals governed by a collective kernel that factorizes into an input-geometric matrix and a dynamical co-activation matrix. For deeper networks, the residual dynamics retains a clean layer-wise kernel structure. However, from depth three onward, closure requires a hierarchy of weight-induced Gram operators that mediate information transport across layers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09744v1</guid>
      <category>cs.LG</category>
      <category>cond-mat.dis-nn</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Claudio Nordio</dc:creator>
    </item>
    <item>
      <title>Hybrid Robustness Verification for Spatio-Temporal Neural Networks</title>
      <link>https://arxiv.org/abs/2606.09746</link>
      <description>arXiv:2606.09746v1 Announce Type: new 
Abstract: With AI increasingly deployed in safety-critical systems, providing formal robustness guarantees for the underlying models is essential. Existing verification methods either rely on overly conservative approximations or incur prohibitive computational costs. For example, the use of lp-norm perturbations in video settings encodes the belief that the adversary can inject noise in every video frame. In practice, adversarial perturbations exhibit structured spatial and temporal correlations, constrained to lower-dimensional, semantically meaningful subspaces. In this work, we study robustness verification of 3D CNNs processing video and volumetric inputs, targeting applications in action recognition (UCF-101), autonomous driving (Udacity), and medical imaging (MedMNIST) exploiting realistic assumptions on adversarial strength by modelling them as spatio-temporal constraints - where the attacker can modify either a subset of frames or patches within a set of consecutive frames. We demonstrate that modelling realistic constraints enables tighter approximations. We introduce Spatio-Temporal Bound Propagation (STBP), a verification framework that computes an exact closed-form characterization of the first convolutional layer and propagates certified bounds through subsequent layers using scalable approximations. Computing the exact closed form provides the tightest bounds for the first convolutional layer. Thus, we utilise approximation methods in the remainder of the network. To spur further progress in this field, we propose ST-Bench, a verification benchmark for autonomous driving and activity recognition, to systematically evaluate verifiable robustness. Compared to existing verification-based approaches, STBP provides stronger robustness guarantees with significantly improved scalability, achieving 1.7x higher certified robust accuracy under identical perturbation budgets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09746v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Sherwin Varghese, Matthew Wicker, Alessio Lomuscio</dc:creator>
    </item>
    <item>
      <title>Multi-Turn Evaluation of Deep Research Agents Under Process-Level Feedback</title>
      <link>https://arxiv.org/abs/2606.09748</link>
      <description>arXiv:2606.09748v1 Announce Type: new 
Abstract: Existing benchmarks for deep research agents (DRAs) assess only single-shot outputs, ignoring a key question: can DRAs improve their reports when guided by feedback? To investigate this, we conduct a multi-turn evaluation of DRAs under two feedback settings: self-reflection, in which the agent revises its report without any external diagnostic signal, and process-level feedback, in which the agent receives guidance targeting gaps in its research strategy. To enable process-level feedback, we design Research Gap Inference (RGI), a method that analyzes patterns of satisfied and unsatisfied rubric criteria to infer research-process gaps. Our analysis reveals three key findings: (i) under self-reflection, agents incorporate and regress on rubric criteria at nearly equal rates, yielding negligible net improvement; (ii) a single round of process-level feedback yields substantial gains, raising the normalized score by approximately $8$-$15$ points and yielding a roughly $35$-$40\%$ incorporation rate; (iii) these gains do not compound over subsequent turns, as agents regress on up to $24\%$ of previously satisfied criteria when rewriting the full report to address remaining gaps. Even with targeted guidance, reliable multi-turn improvement remains out of reach for the DRA architectures we evaluate. Our code and results are publicly available at https://github.com/sabharwalrishabh/Multi-Turn-Evaluation-of-DRAs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09748v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rishabh Sabharwal, Hongru Wang, Amos Storkey, Jeff Z. Pan</dc:creator>
    </item>
    <item>
      <title>Your Model Already Knows: Attention-Guided Safety Filter for Vision-Language-Action Models</title>
      <link>https://arxiv.org/abs/2606.09749</link>
      <description>arXiv:2606.09749v1 Announce Type: new 
Abstract: Vision-Language-Action (VLA) models have demonstrated impressive end-to-end performance across a variety of robotic manipulation tasks. However, these policies offer no guarantees against collisions with task-irrelevant objects in the scene. Existing safety filters sidestep this problem by querying a vision-language model (VLM) to identify obstacles and their locations. This, however, is too slow to run in the control loop and can only be invoked at episode initialization, leaving the filter unable to track moving obstacles. We discover that a small number of attention heads within a VLA model reliably localize the object the policy intends to approach. These heads can be exploited within a training-free safety framework that obtains the active target from the attention heads at every step, treats the remainder of the scene as obstacles, and feeds these into a Control Barrier Function (CBF) filter. Together with a lightweight real-time object tracker, this allows for collision avoidance for non-static obstacles. We evaluate our framework on SafeLIBERO, which we extend with moving obstacles. On the original static benchmark, our method performs comparably to an oracle that uses privileged simulator state to identify the target, emulating a VLM-based identification step run once at episode initialization. On the dynamic variant, where the oracle's init-time target assignment becomes stale, our method substantially outperforms it by 43%, on average. Our findings suggest that the perceptual signals needed for real-time safety filtering are already present within VLA policies and can be exploited without additional training or heavy auxiliary models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09749v1</guid>
      <category>cs.RO</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Seongbin Park, Fan Zhang, Baharan Mirzasoleiman, Shahriar Talebi, Nader Sehatbakhsh</dc:creator>
    </item>
    <item>
      <title>Collaborative Human-Agent Protocol (CHAP)</title>
      <link>https://arxiv.org/abs/2606.09751</link>
      <description>arXiv:2606.09751v1 Announce Type: new 
Abstract: Foundation models are moving from response generation into operational roles. They plan across steps, call tools, request human input, coordinate with other agents, and increasingly carry responsibility for work that affects customers, claims, code, contracts, and clinical decisions. Production deployments are no longer one human supervising one model. They are multi-human, multi-agent collaborations that cross teams, time zones, and trust boundaries. The technical surface for this collaboration remains weakly specified. When an agent drafts a response and a human edits it before it ships, the moment of human judgement is the most valuable signal in the system. In current practice it is recorded, if at all, in application code, chat threads, ticket comments, and tribal memory. Two protocol standards address adjacent concerns: MCP standardises agent access to tools and data, and A2A standardises agent-to-agent interoperability. Neither defines the shared workspace in which humans and agents perform accountable work together. This paper presents CHAP, the Collaborative Human-Agent Protocol. Under CHAP, the override that used to vanish into a chat thread becomes a structured event carrying a diff, a rationale, and a content hash. The handoff between shifts becomes a portable envelope rather than a pinned message. The human approval of an agent's draft becomes a non-repudiable signed decision that can be replayed years later. The protocol achieves this through a small Core (workspaces, participants, tasks, artefacts, and an append-only evidence log) together with composable profiles that add review, modes, routing, deliberation, handoff, identity, signatures, and transparency-backed audit as deployments require them. Specification, reference implementation, conformance suite, and worked examples are available at: https://github.com/BrightbeamAI/chap</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09751v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Arsalan Shahid, Gordon Suttie, Philip Black</dc:creator>
    </item>
    <item>
      <title>Human-Centred Risk Mitigation for AI-Mediated Information Manipulation: A SOCMINT Framework Based on Information Manipulation Sets</title>
      <link>https://arxiv.org/abs/2606.09754</link>
      <description>arXiv:2606.09754v1 Announce Type: new 
Abstract: AI-mediated information manipulation increasingly takes the form of social cyber attacks that target trust, attention, credibility, reputation, and decision-making rather than only technical infrastructures or isolated false contents. Existing defensive approaches often oscillate between incident-level analysis, which fragments campaigns into weak signals, and attribution-first analysis, which may delay mitigation until responsibility is established. This paper proposes a SOCMINT framework based on Information Manipulation Sets (IMS) as an intermediate operational unit between individual incidents and strategic attribution. Building on the VIGINUM/EEAS use of IMS in counter-FIMI analysis, the framework treats manipulation as a coherent process involving narratives, accounts, infrastructures, temporal patterns, cross-platform migration, synthetic amplification, and cognitive targeting. The proposed pipeline moves from signal detection and diagnostic triage to IMS hypothesis construction, confidence/severity assessment, mitigation selection, and iterative update. A compact scenario illustrates how IMS-based analysis captures what content-level and attribution-first approaches miss. The paper also proposes a tabletop evaluation protocol to assess decision quality, confidence calibration, and mitigation proportionality. The main implication is that human-centred risk mitigation requires not only better detection, but also structured reasoning under uncertainty, auditable decision-making, and safeguards against over-securitising legitimate dissent.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09754v1</guid>
      <category>cs.CY</category>
      <category>cs.CR</category>
      <category>cs.SI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Antonio Scala</dc:creator>
    </item>
    <item>
      <title>Perturbative Contrastive Physical Learning</title>
      <link>https://arxiv.org/abs/2606.09756</link>
      <description>arXiv:2606.09756v1 Announce Type: new 
Abstract: Responses to perturbations are key to understanding physical systems. The ability to contrast such responses by comparing how a system reacts under slightly different conditions provides a mechanism for learning. Here, we introduce Perturbative Contrastive Physical Learning (PCPL), a general framework in which learning emerges from measurable contrasts between physical states produced by controlled changes to inputs, boundary conditions, parameters, or interpreter functions. PCPL unifies and extends prior approaches: Equilibrium Propagation is rooted in contrasts between free and nudged equilibria in energy-based systems, while Frequency Propagation corresponds to contrasts extracted from sinusoidally driven, frequency-demodulated responses. We show that contrast-driven updates can reflect either local sensitivities or global inverse-problem structure, yet do not require centralized gradient computation. Instead, effective learning geometry emerges implicitly from the system's own physical response, allowing learning behavior to arise without an external processor or explicit backpropagation. We demonstrate PCPL in two platforms: (i) spring networks that update bond stiffness using measured displacements and forces, and (ii) continuous-variable photonic circuits trained via x quadrature measurements and finite-difference estimates of the Jacobian. Both platforms successfully learn classification tasks. We further show that a continuous-variable photonic circuit can be trained to implement analog multiplication, illustrating a step toward more autonomous physical learning systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09756v1</guid>
      <category>cs.LG</category>
      <category>cond-mat.dis-nn</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Kyungeun Kim, Amanuel Anteneh, Israel Klich, Olivier Pfister, J. M. Schwarz</dc:creator>
    </item>
    <item>
      <title>Difference-Aware Retrieval Policies for Imitation Learning</title>
      <link>https://arxiv.org/abs/2606.09758</link>
      <description>arXiv:2606.09758v1 Announce Type: new 
Abstract: Parametric imitation learning via behavior cloning can suffer from poor generalization to out-of-distribution states due to compounding errors during deployment. We show that reusing the training data during inference via a semi-parametric retrieval-based imitation learning approach can alleviate this challenge. We present Difference-Aware Retrieval Policies for Imitation Learning (DARP), a semi-parametric retrieval-based imitation learning approach that addresses this limitation by reparameterizing the imitation learning problem in terms of local neighborhood structure rather than direct state-to-action mappings. Instead of learning a global policy, DARP trains a model to predict actions based on $k$-nearest neighbors from expert demonstrations, their corresponding actions, and the relative distance vectors between neighbor states and query states. DARP requires no additional assumptions beyond those made for standard behavior cloning -- it does not require additional data collection, online expert feedback, or task-specific knowledge. We demonstrate consistent performance improvements of 15-46% over standard behavior cloning across diverse domains, including continuous control and robotic manipulation, and across different representations, including high-dimensional visual features. Code and demos are available at https://weirdlabuw.github.io/darp-site/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09758v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Quinn Pfeifer, Ethan Pronovost, Paarth Shah, Khimya Khetarpal, Siddhartha Srinivasa, Abhishek Gupta</dc:creator>
    </item>
    <item>
      <title>Preserving Plasticity in Continual Learning via Dynamical Isometry</title>
      <link>https://arxiv.org/abs/2606.09762</link>
      <description>arXiv:2606.09762v1 Announce Type: new 
Abstract: Continual training of deep neural networks under non-stationarity often leads to a progressive loss of plasticity, eventually limiting further learning. We relate plasticity to the empirical Neural Tangent Kernel, and identify dynamical isometry (the condition that layer-wise Jacobian singular values remain close to one) as a key mechanism for preserving plasticity in continual learning. We revisit a class of networks that are almost-everywhere isometric while remaining universal Lipschitz function approximators, demonstrating that near-dynamical isometry is compatible with expressive nonlinear representations. For general architectures, we propose an efficient isometry-promoting regularization scheme and identify a novel mechanism by which it can reactivate dormant ReLU units. Building on this, we introduce AdamO, an Adam-style adaptive optimizer that decouples isometry regularization from gradient updates, analogous to AdamW. We further reinterpret prior plasticity-preserving approaches through the lens of dynamical isometry, showing that they target only a partial measure of isometry. Across supervised and reinforcement-learning continual-learning benchmarks designed to induce plasticity loss, our methods consistently match or outperform existing approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09762v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>Forty-Third International Conference on Machine Learning (ICML 2026)</arxiv:journal_reference>
      <dc:creator>Andries Rosseau, Robert Müller, Ann Nowé</dc:creator>
    </item>
    <item>
      <title>iOSWorld: A Benchmark for Personally Intelligent Phone Agents</title>
      <link>https://arxiv.org/abs/2606.09764</link>
      <description>arXiv:2606.09764v1 Announce Type: new 
Abstract: A useful phone agent needs to be personally intelligent. It should reason over a user's identity, history, and preferences as they exist on the device, not just follow isolated instructions in an impersonal sandbox. Existing mobile agent benchmarks lack this kind of personalization. We introduce iOSWorld, the first interactive native iOS simulator benchmark built around a persistent user identity spanning 26 newly built iOS apps. These apps contain connected data such as transactions, messages, travel records, social relationships, and financial activity. iOSWorld includes 133 tasks across three increasingly difficult categories. Single-app tasks (27) test one app, multi-app tasks (60) span 2 to 8 apps, and memory and personalization tasks (46) require agents to infer patterns from personal data. We evaluate frontier and open-source computer-use models in both vision-only and privileged vision+XML settings. The best configuration reaches 52% overall but only 37% on multi-app tasks. Privileged vision+XML access improves frontier models by up to 26 percentage points, while smaller models do not benefit from added accessibility-tree input. We release iOSWorld as an open-source benchmark with all apps, seeded data, tasks, rubrics, and evaluation code.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09764v1</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lawrence Keunho Jang, Mareks Woodside, Geronimo Carom, Andrew Keunwoo Jang, Jing Yu Koh, Ruslan Salakhutdinov</dc:creator>
    </item>
    <item>
      <title>Data Synthesis and Parameter-Efficient Fine-Tuning for Low-Resource NMT: A Case Study on Q'eqchi' Mayan</title>
      <link>https://arxiv.org/abs/2606.09767</link>
      <description>arXiv:2606.09767v1 Announce Type: new 
Abstract: Neural machine translation for digitally low-resource Indigenous languages is often hindered by extreme data scarcity, prompting reliance on extractive web-scraping. To ensure data sovereignty, this study introduces a data synthesis methodology to bootstrap NMT models without scraping target-language parallel text. Focusing on Q'eqchi' Mayan, we transformed community-sourced dictionaries into a massive synthetic corpus, utilizing Parameter-Efficient Fine-Tuning (PEFT) via LoRA adapters on an mT5-base model.
  In-domain evaluation demonstrates high structural acquisition (BLEU 42.02), proving that synthetic constraints effectively teach complex agglutinative morphology and VOS word order. However, evaluation against an organic glossary reveals a structural-semantic gap (BLEU 0.59), where the model maintains grammatical integrity but lacks the lexical grounding of natural language. The model exhibits overfitting to the constrained structural variance of the synthetic templates; despite high semantic entropy in the pipeline, it struggles with the syntactic fluidity of natural language, forcing organic inputs into rigid learned patterns. Furthermore, an ablation study utilizing a Multi-Task Learning architecture resulted in negative transfer, suggesting that auxiliary tasks competed for limited parameter capacity within the LoRA adapters, causing over-optimization for synthetic markers at the expense of organic flexibility. Ultimately, we establish that synthetic bootstrapping is a highly effective structural primer, but requires authentic data for semantic refinement via Curriculum Learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09767v1</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Alexander Chulzhanov, Soeren Eberhardt, Arjun Mukherjee</dc:creator>
    </item>
    <item>
      <title>SemDINO: A DINOv3-Driven Network for Cross-Temporal Semantic Alignment in Change Detection</title>
      <link>https://arxiv.org/abs/2606.09772</link>
      <description>arXiv:2606.09772v1 Announce Type: new 
Abstract: Semantic change detection (SCD) aims to simultaneously locate land-cover changes and identify semantic categories before and after the transition. However, existing methods suffer from insufficient cross-temporal alignment, weak multi-scale representation, and poor robustness to pseudo-changes caused by illumination, season, and registration noise. To address these issues, we propose a novel end-to-end semantic change detection network named SemDINO, which integrates a dual-branch encoder, multi-scale temporal interaction, semantic purification, change enhancement, and decoupled multi-task prediction into a unified framework. Specifically, we construct a dual-branch encoder that combines a CNN backbone and frozen DINOv3 features via gated pyramid fusion, enabling rich multi-scale semantic representation. Then, a multi-scale temporal bidirectional transformer interaction (M-TBTT) module is proposed to achieve global cross-temporal feature alignment and information interaction. To further enhance genuine changes and suppress pseudo-variations, we introduce semantic purification (SCP), bidirectional change enhancement (BiChangeEnhance), and multi-scale change enhancement (MCE) modules that work together. Finally, a multi-branch CD prediction head is designed to jointly output a binary change mask, bi-temporal semantic maps, and an edge constraint. Extensive experiments on public remote sensing CD datasets demonstrate that SemDINO achieves superior performance and generalization ability against state-of-the-art methods, especially in complex scenarios with interference factors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09772v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xinyu Tong, Meihua Zhou, Jinxiao Sun, Yingjie Tang, Lei Wang</dc:creator>
    </item>
    <item>
      <title>SIGA: Self-Evolving Coding-Agent Adapters for Scientific Simulation</title>
      <link>https://arxiv.org/abs/2606.09774</link>
      <description>arXiv:2606.09774v1 Announce Type: new 
Abstract: Advanced scientific simulators expose specialized input languages that turn simulation goals into executable configurations, but learning them can cost domain scientists hours to days. We study simulator setup as a problem of agent-tool interface grounding: what minimal simulator-specific adaptations are needed for an off-the-shelf coding agent to operate real scientific software? Our intuition is that coding agents already know how to navigate files, edit code, run commands, and repair outputs, but they lack the simulator's executable contract: its vocabulary, structural constraints, validation rules, and termination conditions. We introduce SIGA, a Simulator-Interface Grounding Adapter that supplies this contract through retrieval, procedural memory, in-trajectory validation, and validation-enforced termination. We primarily evaluate SIGA on GEOS, an open-source multiphysics simulator used in subsurface science. SIGA produces a complete GEOS deck in about five minutes with TreeSim above 0.90, matching an extended-budget human expert who took about three hours, a roughly 36x wall-clock speedup. On a harder held-out set, grounding raises TreeSim from 0.720 to 0.789, a roughly 10% relative gain over the bare agent, and can reduce the across-seed standard deviation by 16x. Self-evolution further improves SIGA by rewriting adapter contents from prior trajectories, yielding the highest held-out GEOS mean and matching or outperforming the strongest hand-designed configuration. Transfers to OpenFOAM and LAMMPS show that the dominant mechanism shifts by interface: validation matters most when structural completeness is the bottleneck, while memory and retrieval matter most when domain correctness is the bottleneck. These results suggest that lightweight, self-improvable grounding layers can turn general coding agents into practical operators of scientific software.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09774v1</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Matthew Ho, Brian Liu, Jixuan Chen, Audrey Wang, Lianhui Qin</dc:creator>
    </item>
    <item>
      <title>AetheRock: An Arm-Worn Robot Teaching System for Force-Guided Vision-Tactile Learning</title>
      <link>https://arxiv.org/abs/2606.09777</link>
      <description>arXiv:2606.09777v1 Announce Type: new 
Abstract: Force and tactile sensing are indispensable in contact-rich manipulation. However, force-aware robot learning faces critical challenges due to the incompatible assembly of tactile and force sensors in handheld or wearable devices. To address these limitations, we first introduce AetheRock for gripper-force, vision, and tactile data collection, which is an arm-worn device featuring a modular and easily manufactured visuo-tactile sensor, GelSlim-MiniFab, at the fingertip, a resistive pressure sensor at the human finger contact region, a customized PCB module, and a wearable kit for comfortable and robust collection. Building on this, we propose ForceVT, a representation learning framework that uses force and vision to guide fidelity-agnostic tactile learning, enabling robust inference in any tactile situation. Real-world experiments show that AetheRock achieves qualified data efficiency and that ForceVT effectively alleviates inefficiencies when visuo-tactile sensors exhibit manufacturing and utilization inconsistencies. Overall, our work mitigates the limitations of gripper-force vision-tactile robot learning through innovative hardware design and algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09777v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hong Li, Yue Xu, Yihan Tang, Yankang Dong, Chenyuan Liu, Chenyang Yu, Xuyang Li, Siyuan Huang, Yujun Shen, Nan Xue, Yong-Lu Li</dc:creator>
    </item>
    <item>
      <title>Quality-Diversity Search in Sound Generation: Investigating Innovation Engines for Audio Exploration</title>
      <link>https://arxiv.org/abs/2606.09780</link>
      <description>arXiv:2606.09780v1 Announce Type: new 
Abstract: This study addresses the challenges composers and sound designers face in creating and refining tools to achieve their musical goals. Using evolutionary processes to promote diversity and foster serendipitous discoveries, we automate the search through uncharted sonic spaces for sound discovery, arguing that diversity-promoting algorithms can bridge the gap between the theoretical realisation and practical accessibility of sounds. We describe a system for generative sound synthesis combining Quality Diversity (QD) algorithms with a supervised discriminative model, inspired by the Innovation Engine algorithm, and explore different configurations and the interplay between the chosen synthesis approach and the discriminative model. We examine the interaction between Compositional Pattern Producing Networks (CPPNs) and Digital Signal Processing (DSP) graphs, introducing a novel approach that uses multiple specialised CPPNs for different frequency ranges; this yields simpler networks while maintaining performance comparable to single-CPPN setups. We also investigate evolutionary stepping stones by analysing goal switches between musical and non-musical contexts, revealing how lineages traverse unlikely paths to current elites. Expanding the behaviour space of a previous study to include various sound durations, we uncover specialisation within temporal niches. Results indicate that CPPN and DSP graphs coupled with a Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) and a deep learning classifier can generate a substantial variety of synthetic sounds, diverse and innovative across temporal and contextual dimensions. We present the generated sound objects through an online explorer and as rendered sound files, and, in the context of music composition, an experimental application that showcases their creative potential across various durations and contexts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09780v1</guid>
      <category>cs.SD</category>
      <category>cs.NE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Björn Þór Jónsson, Çağrı Erdem, Stefano Fasciani, Kyrre Glette</dc:creator>
    </item>
    <item>
      <title>Cohort-based Semantic Labeling: AI-Enabled Recovery of Visualization Semantics from Deployed SVGs</title>
      <link>https://arxiv.org/abs/2606.09782</link>
      <description>arXiv:2606.09782v1 Announce Type: new 
Abstract: Many web-based visualizations are deployed as Scalable Vector Graphics (SVG), a format that faithfully preserves visual appearance but typically omits the higher-level semantic structure needed for machine interpretation. Once rendered and published, information about a visualization's components, roles, and encodings is no longer explicitly available, limiting downstream operations such as querying, accessibility augmentation, explanation, personalization, and transformation. To address this gap, we introduce CSL, an AI-enabled, multi-stage pipeline for automatically recovering visualization semantics from deployed SVGs through two complementary mechanisms: (1) cohort-based decomposition, which organizes heterogeneous SVG primitives into structurally coherent subsets that reduce the semantic assignment space, and (2) hybrid semantic grounding, which combines model-based inference with deterministic structural validation and propagation to make labeling both context-sensitive and structurally anchored. CSL produces Semantic SVG (SSVG), a representation in which SVG elements are annotated with graphical mark type, visualization role, and data role. We implemented CSL as an end-to-end prototype and evaluated it on 102 SVG visualizations, achieving global macro-averaged accuracies of 0.822 for mark type, 0.853 for visualization role, and 0.860 for data-role recovery. An ablation against a non-cohort whole-chart baseline showed that cohorting significantly improves accuracy (paired t-test: t &gt; 20, p &lt; 0.001; Cohen's d &gt; 2.0), and repeated labeling of a randomly selected SVG over 100 runs yielded mean agreement above 91.9% across all three attributes. These results provide strong evidence that CSL can transform deployed SVGs into machine-usable semantic representations, enabling more accessible, adaptive, and user-steerable visualization systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09782v1</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jeongah Lee, Hima Varshini Surisetty, Durga Nirmaleswaran, Jahnavi Sharma, Srikiran Kavuri, Narges Mahyar, Ali Sarvghad</dc:creator>
    </item>
    <item>
      <title>Zero Touch Predictive Orchestration: Automating Time-Series Models for the Cloud-Edge Continuum</title>
      <link>https://arxiv.org/abs/2606.09787</link>
      <description>arXiv:2606.09787v1 Announce Type: new 
Abstract: The Cloud-Edge Continuum (CEC) enables latency-critical applications by distributing resources to the far edge, but its extreme volatility makes proactive Zero Touch Management via time-series forecasting essential. However, orchestrators face a severe "cold start" problem: newly discovered nodes lack the historical data required to train localized predictive models, while generalized models fail to capture unique hardware and microservice behaviors. To solve this, we propose a fully automated time-series prediction architecture driven by a novel data-mixing methodology. At the infrastructure level, we introduce a lightweight, technology-agnostic Resource Exposer (RE) that dynamically discovers nodes and continuously collects customizable telemetry (e.g., compute, network, energy). To overcome the sparsity of these initial local samples, our framework automatically merges them with TimeTrack, our publicly available, high-resolution dataset collected at 45-second intervals. This synergizes TimeTrack's foundational, high-frequency temporal patterns with the precise calibration of the local node data. Processed through a Neural Architecture Search (NAS) engine, the system automatically generates highly accurate baseline models. Experimental results demonstrate that merging the target data with TimeTrack effectively mitigates the cold start challenge. This integration significantly improves forecasting accuracy measured in Mean Squared Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) and accelerates convergence compared to training on the sparse local samples alone, training solely on generic datasets, or mixing the target data with standard alternative datasets, establishing a robust foundation for continuous MLOps deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09787v1</guid>
      <category>cs.LG</category>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Abd Elghani Meliani, Arora Sagar, Adlen Ksentini, Raymond Knopp</dc:creator>
    </item>
    <item>
      <title>POTATR: A Lightweight Image-to-Graph Model for Page-Level Table Extraction</title>
      <link>https://arxiv.org/abs/2606.09788</link>
      <description>arXiv:2606.09788v1 Announce Type: new 
Abstract: Large-scale document processing requires contextually aware table extraction (TE) that is both accurate and efficient. Yet current approaches require billions of parameters, hundreds of autoregressive steps, or costly API inference. Motivated by this, we introduce the Page-Object Table Transformer (POTATR), a lightweight 29M parameter image-to-graph model that extends the Table Transformer (TATR) for contextualized page-level TE. POTATR outperforms all models tested on the PubTables-v2 Single Pages benchmark -- including frontier MLLMs -- achieving $\textrm{GriTS}_\textrm{Con}$ of 0.964 while running over 130$\times$ faster at roughly 300$\times$ lower cost. Further, POTATR's output is spatially grounded: every recognized element has a bounding box, enabling visual verification and geometric text assignment. As a result, POTATR performs unified page-level TE while composing with other models, enabling extension to scanned documents via external OCR and to full-document TE via techniques like cross-page merging. Code and models will be released.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09788v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Brandon Smock, Libin Liang, Max Sokolov, Amrit Ramesh, Valerie Faucon-Morin, Tayyibah Khanam, Maury Courtland</dc:creator>
    </item>
    <item>
      <title>Principled Uncertainty in Clinical AI: End-to-End Bayesian Modelling and Algorithmic Equity Auditing Across Multimodal Patient Data</title>
      <link>https://arxiv.org/abs/2606.09789</link>
      <description>arXiv:2606.09789v1 Announce Type: new 
Abstract: Clinical artificial intelligence (AI) systems routinely produce predictions without principled quantification of uncertainty, limiting their trustworthiness in high-stakes medical environments. This paper presents an integrated research programme addressing two interconnected problems: (1) the development of a fully end-to-end Bayesian uncertainty modelling framework for multimodal clinical data, and (2) the application of calibrated uncertainty estimates as a formal measure of algorithmic equity across patient subgroups. We construct a probabilistic deep learning architecture comprising modality-specific variational encoders, a precision-weighted late fusion mechanism, and a decomposed uncertainty output head that separates aleatoric from epistemic uncertainty. The system is trained with a composite Bayesian loss incorporating binary cross-entropy, Kullback-Leibler divergence regularisation, and an uncertainty calibration penalty. We evaluate model calibration using Expected Calibration Error (ECE = 0.096) and conduct a subgroup equity audit across facility type, socioeconomic status, age group, and biological sex on a dataset of 1,000 simulated patients. Results demonstrate that epistemic uncertainty systematically identifies underserved populations: primary/rural facility patients show a 15.3% uncertainty equity gap (p &lt; 0.001, effect size = 0.698), low socioeconomic status patients exhibit a 6.8% gap (p &lt; 0.001), and elderly patients show a 3.9% gap (p &lt; 0.001), whilst no significant sex-based disparity is detected. These findings establish that calibrated uncertainty is not merely a technical property of probabilistic models but constitutes an actionable equity signal with direct clinical relevance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09789v1</guid>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Oladimeji Anthonio, Dimeji Abdulsobur Olawuyi, Oloruntoba Ajayi, Temiloluwa Aderemi, Joseph Odamo</dc:creator>
    </item>
    <item>
      <title>End-to-End Optimization of Incoherent Imaging for Classification Under Detector-Limited Readout</title>
      <link>https://arxiv.org/abs/2606.09792</link>
      <description>arXiv:2606.09792v1 Announce Type: new 
Abstract: End-to-end co-optimization of optical front-ends (e.g. metasurfaces) and neural network back-ends has been widely applied to imaging tasks, yet a formalism characterizing when and why such systems outperform conventional lens-based imaging is largely lacking. This paper focuses on object classification, a central imaging task, and asks when end-to-end optimization of a phase mask for incoherent imaging improves performance over a conventional focusing lens. We find that these gains arise primarily under constrained detector readout and are limited under full detector readout. In the latter setting, we prove that no incoherent phase mask exceeds the ideal-channel mutual information between detector measurements and class labels; a conventional focusing lens approaches this ceiling, and joint optimization yields no empirical gain. When detector readout is constrained -- by coarse spatial sampling or a limited number of measurements -- optimized optics can substantially improve classification by increasing class separability in the detector measurements. These gains are largest under low detector noise and shrink as noise grows, because the optics shape the signal before it reaches the detector but cannot remove noise added afterward. The advantage also depends on the spectral structure of the task: co-design helps most when class-discriminative content is concentrated at lower spatial frequencies than within-class variation. We develop a theoretical framework formalizing these distinctions and test its predictions on synthetic data and standard benchmarks (MNIST, FashionMNIST, SVHN).</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09792v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Archer Wang, Joshua Chen, Sachin Vaidya, Marin Soljačić</dc:creator>
    </item>
    <item>
      <title>Beyond Spherical Harmonics: Rethinking Appearance Models for Radiance Reconstruction</title>
      <link>https://arxiv.org/abs/2606.09794</link>
      <description>arXiv:2606.09794v1 Announce Type: new 
Abstract: View-dependent appearance modeling remains a challenging problem in novel-view synthesis and reconstruction. Accurately representing complex angular effects often requires substantial memory and computational resources. For new learning-based methods, a common approach is to rely on spherical harmonics (SH). However, capturing high-frequency phenomena such as specular reflections demands high-order expansions, which increase memory usage and computational cost. Consequently, most methods employ low-order SH, which limits the ability to model complex view-dependent effects, resulting in overly smooth or diffuse representations. To address these limitations, we systematically evaluate a wide range of spherical functions in the context of scene reconstruction. Some of them are introduced to graphics and computer vision for the first time in this paper. Based on the insights from the experiment, we develop a novel spherical formulation, the Normalized Anisotropic Spherical Gabor function, which enables efficient modeling and learning of high-frequency appearance effects while maintaining a compact representation. Compared to existing approaches, our function achieves higher-quality reconstruction of view-dependent phenomena such as glints, while being up to five times more memory-efficient and more efficient to evaluate. We validate its performance in radiance-field reconstruction tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09794v1</guid>
      <category>cs.CV</category>
      <category>cs.GR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ewa Miazga, Jorge Condor, Piotr Didyk</dc:creator>
    </item>
    <item>
      <title>SynManDex: Synthesizing Human-like Dexterous Grasps from Synthetic Human Pre-Grasps</title>
      <link>https://arxiv.org/abs/2606.09798</link>
      <description>arXiv:2606.09798v1 Announce Type: new 
Abstract: Human hand-object interactions encode functional intent, but direct transfer to robotic hands often fails under morphology, contact, and reachability constraints. We present SynManDex, a synthetic pipeline that uses generated human pre-grasps as affordance-aware proposals and resolves the final contacts with robot-native optimization. SynManDex samples object-conditioned digital human pre-grasps, retargets them to dexterous robotic hand poses, optimizes force-closure contacts on the target embodiment, and admits trajectories that pass checks from each step. The resulting keyframes support both grasp-and-lift demonstrations and various prehensile manipulation tasks such as tea pouring, photo taking, and flute playing, designed via VLM agents. As a result, SynManDex combines high grasp quality (86.4% grasp stability) with 4.67/5 human-likeness (93.4%). It achieves an 80.7% success rate in simulation and 25/30 (83.3%) real-robot successes when applied to a 36-DOF bimanual dexterous robotic platform.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09798v1</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yanming Shao, Zanxin Chen, Wenwei Lin, Mingjie Zhou, Tianxing Chen, Xiaokang Yang, Yichen Chi, Yao Mu</dc:creator>
    </item>
    <item>
      <title>FASE: Fast Adaptive Semantic Entropy for Code Quality</title>
      <link>https://arxiv.org/abs/2606.09800</link>
      <description>arXiv:2606.09800v1 Announce Type: new 
Abstract: Multi-agent code generation offers a promising paradigm for autonomous software development by simulating the human software engineering lifecycle. However, system reliability remains hindered by LLM hallucinations and error propagation across interacting agents. While semantic entropy provides a principled way to quantify uncertainty without ground-truth answers, current methods often rely on costly LLM-driven equivalence checks. In this work, we introduce Fast Adaptive Semantic Entropy (FASE), a novel metric that approximates functional correctness based on the minimum spanning tree of structural and semantic dissimilarity graphs. Evaluations on HumanEval and BigCodeBench demonstrate that FASE outperforms state-of-the-art semantic entropy by LLM entailment, achieving a 25% average improvement in Spearman correlation and a 19% increase in ROCAUC score against Pass@1 from ground-truth test cases when using the Qwen3-Embedding-8B model. Furthermore, by eliminating costly LLM-driven equivalence evaluation, FASE incurs negligible computational overhead, requiring only approximately 0.3% of the runtime cost of traditional semantic entropy approaches. These results position FASE as a practical, cost-effective solution for optimizing uncertainty quantification in real-world multi-agent workflows.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09800v1</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Shizhe Lin, Ladan Tahvildari</dc:creator>
    </item>
    <item>
      <title>Bandits for Efficient Experimentation: Adapting to Control Group, Preferences, and Context Drifts</title>
      <link>https://arxiv.org/abs/2606.09802</link>
      <description>arXiv:2606.09802v1 Announce Type: new 
Abstract: We consider a variant of linear contextual stochastic multi-armed bandits in which the learner must provide recommendations to a group of users, each with a personalized preference vector, in the presence of context distributions that drift over time. Under practitioner-friendly assumptions, we reduce this setting to a linear bandit with stationary mean but heteroskedastic and non-stationary noise. We further study the case in which the learner must ensure that the mean reward of each decision exceeds that of a baseline strategy $\boldsymbol{\pi}_0$ at each decision step. We introduce Dri-MED, an algorithm inspired by the linear version of the MED strategy and carefully adapted to handle the non-stationary heteroskedastic noise. We show that the instance-dependent regret scales as $\tilde{\mathcal{O}}\left(\frac{\kappa}{\tilde{\Delta}} d^2 \log(T)\right)$, where $\tilde{\Delta}$ is the constraint-aware sub-optimality gap subject to policy $\boldsymbol{\pi}_0$, with a variance-aware multiplicative term $\kappa$ that we carefully handle using heteroskedastic regression. We further show that Dri-MED enjoys $\tilde{\mathcal{O}}(d)$ expected constraint violations. Our numerical results suggest that Dri-MED significantly outperforms conservative baselines that ignore the drift and preference structure.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09802v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Udvas Das, Waris Radji, Debabrota Basu, Odalric-Ambrym Maillard</dc:creator>
    </item>
    <item>
      <title>Echo-Memory: A Controlled Study of Memory in Action World Models</title>
      <link>https://arxiv.org/abs/2606.09803</link>
      <description>arXiv:2606.09803v1 Announce Type: new 
Abstract: We present Echo-Memory, a controlled study of memory mechanisms in action-conditioned world models. These models generate multi-segment videos from a first frame, text prompt, and camera-action sequence, but their central failure is often memory rather than local image synthesis: after the camera leaves and returns, the scene or salient object may silently change. Existing memory designs are hard to compare because gains are entangled with backbone, training, retrieval, and evaluation differences. Echo-Memory fixes the action-to-video interface and varies only how history is stored and read by the generator. Under a shared video diffusion backbone, optimizer, camera-action representation, sampler, and evaluation pipeline, we compare raw context, compression-based memory, spatial summaries with different read-out paths, and state-space recurrence. This matched matrix separates four otherwise conflated axes: capacity, compression, read-out, and recurrence. We also evaluate memory through a three-branch protocol: replay quality, in-domain loop revisit, and open-domain return probes. The branches routinely disagree, showing that replay fidelity is not a sufficient proxy for remembering a world. Three findings follow. Raw context is a strong capacity baseline and improves open-domain return far more than it improves replay metrics. Compactness is not a free substitute for capacity: aggressive spatial and hybrid-compression memories lose the salient evidence needed for return. Finally, block-wise state-space recurrence is the strongest open-domain return mechanism in our matrix, showing that the structure of implicit memory matters as much as the decision to use it. These results provide a compact protocol for studying memory in action world models beyond isolated replay metrics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09803v1</guid>
      <category>cs.CV</category>
      <category>cs.GR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wayne King, Zeyue Xue, Yuxuan Bian, Jie Huang, Haoran Li, Yaowei Li, Yaofeng Su, Yuming Li, Haoyu Wang, Shiyi Zhang, Songchun Zhang, Yuwei Niu, Sihan Xu, Junhao Zhuang, Haoyang Huang, Nan Duan</dc:creator>
    </item>
    <item>
      <title>Topological Neural Operators</title>
      <link>https://arxiv.org/abs/2606.09806</link>
      <description>arXiv:2606.09806v1 Announce Type: new 
Abstract: We introduce Topological Neural Operators (TNOs), a principled framework for operator learning on cell complexes that lifts neural operators (NOs) from functions on points and/or edges to topological domains. TNOs represent data as features defined on cells of varying dimension and model their interactions through Discrete Exterior Calculus, enabling explicit cross-dimensional coupling via gradient-, curl-, and divergence-type operators. The key design principle is to decouple where information flows, as governed by fixed topological operators, from how it is transformed (which is learned), yielding models that respect the geometric support of physical quantities and expose conservation and compatibility structure. We further propose Hierarchical TNOs (HTNOs), which incorporate learned coarse complexes to propagate long-range and topology-dependent information. Our framework subsumes existing NOs as a special case, providing a unified perspective on operator learning across discretizations. Across a range of PDE benchmarks, including irregular-geometry flow problems, TNOs and HTNOs improve accuracy; controlled studies further isolate the benefits of native higher-rank and topological structure. Project page: https://circle-group.github.io/research/TNO</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09806v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lennart Bastian, Samuel Leventhal, Mustafa Hajij, Tolga Birdal</dc:creator>
    </item>
    <item>
      <title>Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting</title>
      <link>https://arxiv.org/abs/2606.09809</link>
      <description>arXiv:2606.09809v1 Announce Type: new 
Abstract: AI evaluation results are produced at scale but reported inconsistently across leaderboards, model cards, benchmark papers, and company blogs. The cost is interpretive: readers cannot reliably compare results across sources, identify what a report omits, or trace an aggregate claim to its underlying evidence. Recent efforts address isolated components but leave three gaps: they cover only narrow slices of the evaluation lifecycle and do not compose into a single interpretable record; they specify static representations that do not differentiate the questions different stakeholders bring to the same evidence; and they remain proposals on paper, lacking the extraction infrastructure required for adoption at scale. We present Evaluation Cards, an operational reporting layer that composes benchmark metadata, evaluation run data, and model metadata into a unified record. We (1) derive a reporting schema from a structured review of 52 papers and 10 stakeholder interviews, (2) implement four interpretive signals (reproducibility, documentation completeness, provenance and risk, and score comparability), rendered through reader modes calibrated to research and non-research audiences, and (3) deploy a monitoring tool that applies Evaluation Cards across 5,816 models, 635 benchmarks, and 101,843 results, surfacing systematic gaps in current reporting practice.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09809v1</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Avijit Ghosh, Anka Reuel, Jenny Chim, Wm. Matthew Kennedy, Srishti Yadav, Jennifer Mickel, Yanan Long, Andrew Tran, Anastassia Kornilova, Damian Stachura, Kevin Klyman, Felix Friedrich, Jeba Sania, Max Lamparth, Jan Batzner, Anoop Mishra, Eliya Habba, Yixiong Hao, Nathan Heath, Shalaleh Rismani, Usman Gohar, Andrea Loehr, David Manheim, Ruchira Dhar, Sree Harsha Nelaturu, Aarush Sinha, Leshem Choshen, Drishti Sharma, Ishan Khire, Amit Saha, Subramanyam Sahoo, Michael Hardy, Michael Alexander Riegler, Kabir Manghnani, Michelle Lin, Yanan Jiang, Yilin Huang, Asaf Yehudai, Jessica Ji, Aris Hofmann, Mubashara Akhtar, Nuno Moniz, Yacine Jernite, Stella Biderman, Zeerak Talat, Sanmi Koyejo, Mykel Kochenderfer, Irene Solaiman</dc:creator>
    </item>
    <item>
      <title>AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing</title>
      <link>https://arxiv.org/abs/2606.09811</link>
      <description>arXiv:2606.09811v1 Announce Type: new 
Abstract: World-action models have emerged as a promising paradigm for robot manipulation, jointly modeling visual scene dynamics and actions to inject physical priors into policy learning. However, existing world-action models couple world prediction and action execution at the same temporal resolution, forcing the world branch to model near-term frame variations that are redundant and weakly informative. We posit that strictly binding world prediction and action execution to the same temporal rhythm may underutilize the potential of the video branch for embodied control. Therefore, we propose AHA-WAM, an Asynchronous Horizon-Adaptive World-Action Model built on a dual Diffusion Transformer (DiT) architecture that reorganizes world-action modeling around this temporal asymmetry. AHA-WAM instantiates the video DiT as a low-frequency world planner that maintains rolling key-value memory over past observations and exposes reusable layerwise latent context encoding long-horizon scene evolution, while a high-frequency action DiT executes short action chunks in closed loop by querying this context through layerwise joint attention. To support asynchronous execution, we introduce horizon-adaptive offset training and Observation-Guided Video-Context Routing (OVCR), which together let the action expert exploit long-horizon world context while remaining responsive to real-time execution state without rerunning the video DiT. Experiments on RoboTwin and real-world manipulation tasks show that AHA-WAM achieves state-of-the-art performance without any robot-data pretraining, attaining 92.80% average success on RoboTwin and 78.3% success across 4 real-world tasks, while reaching 24.17 Hz closed-loop control with a 4.59x speedup over Fast-WAM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09811v1</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jisong Cai, Long Ling, Shiwei Chu, Zhongshan Liu, Jiayue Kang, Zhixuan Liang, Wenjie Xu, Yinan Mao, Weinan Zhang, Xiaokang Yang, Ru Ying, Ran Zheng, Yao Mu</dc:creator>
    </item>
    <item>
      <title>iMaC: Translating Actions into Motion and Contact Images for Embodied World Models</title>
      <link>https://arxiv.org/abs/2606.09813</link>
      <description>arXiv:2606.09813v1 Announce Type: new 
Abstract: Embodied world models have emerged as a pivotal paradigm for visual robotic decision-making and interactive environment simulation. However, conventional embodied frameworks rely on low-dimensional structured action vectors (e.g., joint angles and end-effector poses), which suffer from limited expressive capacity, poor generalization across diverse embodiments, and unnatural dynamic modeling for complex physical interactions. To address these limitations, this paper proposes iMac (Image as Action Control), a novel unified control paradigm that treats raw visual images as native action representations for embodied world models. Departing from traditional explicit kinematic action encoding, iMac formulates continuous visual manipulation as image-based action tokens, which inherently encapsulate spatial motion intentions, interactive geometric constraints and subtle physical dynamics. We construct a dual-branch embodied architecture consisting of an image-action encoder and a dynamic world predictor: the encoder compresses target-driven visual images into compact action embeddings, while the predictor learns environment transition rules conditioned on image actions to achieve high-fidelity future state prediction and closed-loop embodied control. Extensive experiments are conducted on public embodied manipulation benchmarks and real-world robotic scenarios. The results demonstrate that iMac outperforms vector-based action control baselines in prediction accuracy, task success rate and cross-scene generalization ability. Moreover, our image-action design eliminates the reliance on manually defined action spaces, realizing flexible and universal control for heterogeneous embodied agents. This work provides an innovative visual-action perspective for embodied world models, offering a simple yet effective paradigm for scalable robotic perception and manipulation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09813v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhenyu Wu, Xiuwei Xu, Yukun Zhou, Yifan Li, Qiuping Deng, Xiaofeng Wang, Zheng Zhu, Bingyao Yu, Ziwei Wang, Jiwen Lu, Haibin Yan</dc:creator>
    </item>
    <item>
      <title>PTL-Diffusion: Manifold-Aware Diffusion with Periodic Terminal Laws</title>
      <link>https://arxiv.org/abs/2606.09816</link>
      <description>arXiv:2606.09816v1 Announce Type: new 
Abstract: Standard diffusion models typically use a single time-homogeneous Gaussian terminal distribution as the reference law for generation. While this choice is analytically convenient and empirically powerful, it provides little explicit structure for data concentrated near low-dimensional manifolds, where different regions of the data distribution may correspond to distinct local geometric or semantic factors. As a result, the reverse model must recover manifold-level structure almost entirely from an unstructured terminal reference distribution.
  We propose PTL-Diffusion, a proof-of-concept diffusion framework whose forward noising process converges to a nonconstant periodic family of Gaussian terminal laws rather than to a single invariant law. Unlike a phase-conditioned DDPM, where phase information only enters the denoising network while the forward process remains unchanged, PTL-Diffusion embeds phase structure directly into the forward noising dynamics.
  The proposed construction remains close to standard denoising diffusion models: for a periodically forced Ornstein--Uhlenbeck-type forward process, we derive closed-form forward marginals, the limiting periodic Gaussian terminal family, and explicit Gaussian reverse posteriors, enabling standard noise-prediction training. We also introduce an invariant-average regularization term coupling the phase-conditioned reverse dynamics through the averaged periodic reference law. Experiments on torus and cylinder point-cloud benchmarks and the Olivetti face dataset show that PTL-Diffusion improves manifold-level distributional matching over matched DDPM baselines, reducing phase-conditioned errors, feature-space covariance errors, and nearest-neighbour manifold distances. These results suggest structured terminal reference laws as a promising direction, while motivating more expressive phase constructions and larger-scale evaluations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09816v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>math.PR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Danqi Zhuang, Jisui Huang, Xiaoyue Xi, Andrew Kiggins, Xiaojie Wang, Ke Chen, Yue Wu</dc:creator>
    </item>
    <item>
      <title>Rethinking the Divergence Regularization in LLM RL</title>
      <link>https://arxiv.org/abs/2606.09821</link>
      <description>arXiv:2606.09821v1 Announce Type: new 
Abstract: Reinforcement learning (RL) has become a key component of post-training large language models (LLMs). In practice, LLM RL is often off-policy because of training-inference mismatch and policy staleness, making trust-region control essential for stable optimization. Mainstream methods such as PPO and GRPO approximate this control with a ratio-clipping mechanism, but the importance ratio can be a poor proxy for distributional shift in long-tailed vocabularies. Recent work such as DPPO addresses this mismatch by replacing ratio-based clipping with a divergence-based mask, yielding a trust region defined by the sampled token's absolute probability shift. However, DPPO still relies on a hard mask: once a token crosses the trust-region boundary in a harmful direction, its gradient is discarded rather than corrected. To address this, we propose Divergence Regularized Policy Optimization (DRPO), which replaces the hard mask with a smooth advantage-weighted quadratic regularizer on policy shift. DRPO preserves the same trust-region geometry as DPPO while inducing bounded, continuous gradient weights that attenuate diverging updates and provide corrective signals beyond the boundary. Experiments across model scales, architectures, and precision settings show that DRPO improves the stability and efficiency of LLM RL training.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09821v1</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiarui Yao, Xiangxin Zhou, Penghui Qi, Wee Sun Lee, Liefeng Bo, Tianyu Pang</dc:creator>
    </item>
    <item>
      <title>Causally Evaluating the Learnability of Formal Language Tasks</title>
      <link>https://arxiv.org/abs/2606.09822</link>
      <description>arXiv:2606.09822v1 Announce Type: new 
Abstract: Language models, as multi-task learners, acquire a wide range of abilities during training. A fundamental question is how much task-specific data is needed to learn a given task. Answering this for natural language is difficult: tasks are hard to delineate and can confound one another. To rigorously investigate the relationship between data frequency and learnability, we turn to a controlled setting using formal languages induced from probabilistic finite automata. These serve as a methodological testbed to demonstrate that standard correlational evaluation practices are inherently flawed. To enable causal analysis, we introduce the binning semiring, an algebraic object that lets us control how often a targeted property occurs in a sampled corpus. We formulate the experimental pipeline as a causal graphical model and derive decomposed Kullback-Leibler divergence metrics to measure the learnability of specific sub-tasks. Our experiments show that evaluating learnability without causal intervention leads to incorrect conclusions due to confounders in correlational analysis, and serve as a warning about correlational pitfalls in natural-language settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09822v1</guid>
      <category>cs.CL</category>
      <category>cs.FL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vésteinn Snæbjarnarson, Anej Svete, Josef Valvoda, Reda Boumasmoud, Brian DuSell, Ryan Cotterell</dc:creator>
    </item>
    <item>
      <title>TSseek: Regular Expression-Based Similarity Search for Distributed Time Series Datasets</title>
      <link>https://arxiv.org/abs/2606.09824</link>
      <description>arXiv:2606.09824v1 Announce Type: new 
Abstract: Similarity search is a fundamental operation in time series analysis. Most existing techniques, however, require users to supply a precise sequence of values (typically an entire time series object) as the query input. This rigid requirement limits real-world applications, where users instead want to express patterns, trends, or value ranges. Flexible, pattern-based search has been explored in text retrieval and complex event processing, but remains underexplored for large-scale distributed time series.
  To close this gap, we propose TSseek, a regular-expression-powered search framework for distributed time series datasets. TSseek's query language enables users to compose patterns encompassing trends, value ranges, and wildcard segments. We show that conventional approximation techniques (e.g., PAA and SAX) and their index structures are ill-suited for such queries because they cannot operate on regular-expression query constructs.
  In TSseek, we map the time series objects and the query constructs into the same space by approximating time series objects as sequences of line segments that retain both trend (slope direction) and value range, and translating query constructs into bounding rectangles. To support efficient processing, we build TSseek-X, a distributed spatial index over the time series segments. TSseek supports two fundamental query types, namely whole-matching queries (over entire series) and subsequence-matching queries (over arbitrary windows within a series).
  Across benchmark and real-world datasets, full-scan, model-based, and SAX-based baselines all sacrifice either accuracy or speed, whereas TSseek returns exact answers efficiently. Also, for subsequence workloads, it achieves significant speedups over state-of-the-art subsequence matching engines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09824v1</guid>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaoshuai Li, Khalid Alnuaim, Mohamed Y. Eltabakh, Elke A. Rundensteiner</dc:creator>
    </item>
    <item>
      <title>An Agency-Transferring Model-Free Policy Enhancement Technique</title>
      <link>https://arxiv.org/abs/2606.09825</link>
      <description>arXiv:2606.09825v1 Announce Type: new 
Abstract: Training reinforcement learning (RL) policies from scratch is costly: it requires careful reward and environment design, extensive tuning, and substantial computation. Yet many control problems already have a functional but suboptimal policy available as a baseline. This paper proposes a method for embedding such a baseline into the RL training process, simultaneously improving training efficiency relative to from-scratch methods and producing a learning policy that outperforms the baseline. At each step, the method arbitrates between the baseline policy and a trainable learning policy, initially relying strongly on the baseline policy and then progressively transferring agency to the learning policy. By the end of training, the learning policy is a standalone neural network that operates without baseline policy support. The paper formalizes what it means for the baseline policy to be functional: under this policy, the agent reaches a goal set and remains there with high probability. The proposed arbitration mechanism is designed to exploit this property during training, yielding high goal-reaching rates right from the beginning of training. A theoretical analysis provides a formal interpretation of this behavior under stated assumptions and extends it to the final baseline-free regime, where explicit lower bounds are derived for the goal-reaching probability of the standalone learning policy. Empirical results on continuous-control benchmarks show that the proposed method achieves returns that match or exceed those of competitive approaches, while maintaining the highest goal-reaching rates throughout training among the compared methods -- including in the final stage, where the learning policy operates without any baseline support.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09825v1</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Anton Bolychev, Georgiy Malaniya, Sinan Ibrahim, Pavel Osinenko</dc:creator>
    </item>
    <item>
      <title>OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics</title>
      <link>https://arxiv.org/abs/2606.09826</link>
      <description>arXiv:2606.09826v1 Announce Type: new 
Abstract: Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair, focus on single-agent Solo play, and lack unified protocols for evaluating heterogeneous agent classes (commercial VLMs, open-weight VLMs, and specialized game policies) on the same footing. We address these gaps with OmniGameArena, a real-time benchmark of twelve newly built Unreal Engine 5 games spanning Solo (7), PvP (3), and Coop (2) with unified action interfaces, and the Improvement Dynamics Curve (IDC), an agentic-reflection harness in which a tool-using reflector LLM autonomously refines a bounded skill prompt across multiple rounds. Beyond cold-start leaderboard scores, IDC exposes two additional observables for each (agent, game) pair: how the score evolves across reflection rounds, and how the learned skill behaves on held-out task variants. We report these observables for twelve VLM agents on the cold-start leaderboard and four top agents under IDC.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09826v1</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mingxian Lin, Shengju Qian, Yuqi Liu, Yi-Hua Huang, Yiyu Wang, Wei Huang, Yitang Li, Fan Zhang, Zeyu Hu, Lingting Zhu, Xin Wang, Xiaojuan Qi</dc:creator>
    </item>
    <item>
      <title>MemoryVLA++: Temporal Modeling via Memory and Imagination in Vision-Language-Action Models</title>
      <link>https://arxiv.org/abs/2606.09827</link>
      <description>arXiv:2606.09827v1 Announce Type: new 
Abstract: Temporal modeling is essential for robotic manipulation, as effective control requires both memory of past interactions and imagination of future states. However, most VLA models rely primarily on the current observation and therefore struggle with long-horizon, temporally dependent tasks. Cognitive science suggests that humans rely on working memory to buffer short-lived context, the hippocampal system to preserve episodic memory of past experience, and internal models to imagine possible future state evolution. Inspired by these mechanisms, we propose MemoryVLA++, a full temporal modeling framework that equips VLA models with memory and imagination for robotic manipulation. A pretrained VLM encodes the current observation into perceptual and cognitive tokens, forming working memory. These tokens query a Perceptual-Cognitive Memory Bank to retrieve relevant historical context. This bank stores low-level details and high-level semantics from past interactions, and is updated through redundancy-aware consolidation. A world model imagines future states in a denoising latent space, and the imagined latents are integrated under memory guidance to form full temporal-aware tokens. The resulting tokens condition a diffusion action expert to predict temporally consistent action sequences. We conduct extensive experiments on 5 simulation benchmarks and 3 categories of real-robot tasks across 3 robots, covering general manipulation, long-horizon temporal tasks, robustness, and generalization. Our method achieves strong performance across Libero, SimplerEnv, Mikasa-Robo, Calvin, Libero-Plus, and diverse real-robot tasks, validating the effectiveness of full temporal modeling with memory and imagination. For example, on real robots, it achieves +9%, +26%, +28% gains on general, memory-dependent, and imagination-dependent tasks. Project Page: https://shihao1895.github.io/MemoryVLA-PP-Web</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09827v1</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hao Shi, Weiye Li, Bin Xie, Yulin Wang, Renping Zhou, Tiancai Wang, Xiangyu Zhang, Ping Luo, Gao Huang</dc:creator>
    </item>
    <item>
      <title>Latent Spatial Memory for Video World Models</title>
      <link>https://arxiv.org/abs/2606.09828</link>
      <description>arXiv:2606.09828v1 Announce Type: new 
Abstract: Video world models that maintain 3D spatial consistency across generated frames typically rely on explicit point cloud memory constructed in RGB space. This design is both computationally expensive, requiring repeated rendering and VAE encoding, and inherently lossy, as the round trip through pixel space discards rich features of the learned latent representation. In this paper, we introduce latent spatial memory for video world models, a persistent 3D cache that stores scene information directly in the diffusion latent space, avoiding pixel-space reconstruction. Building on this, we propose Mirage, a latent-space spatial memory framework that constructs the memory by lifting latent tokens into 3D via depth-guided back-projection and queries it by synthesizing novel views through direct latent-space warping. This unified formulation eliminates both the information loss of pixel-space reconstruction and the computational burden of repeated encoding and rendering. Experiments show that latent spatial memory achieves up to 10.57$\times$ faster end-to-end video generation and a 55$\times$ reduction in memory footprint relative to explicit 3D baselines. Leveraging the geometric prior of the diffusion model, Mirage attains state-of-the-art performance on WorldScore and strong reconstruction quality on RealEstate10K.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09828v1</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>new</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Weijie Wang, Haoyu Zhao, Yifan Yang, Feng Chen, Zeyu Zhang, Yefei He, Zicheng Duan, Donny Y. Chen, Yuqing Yang, Bohan Zhuang</dc:creator>
    </item>
    <item>
      <title>Unveiling the Potential of iMarkers: Invisible Fiducial Markers for Advanced Robotics</title>
      <link>https://arxiv.org/abs/2501.15505</link>
      <description>arXiv:2501.15505v5 Announce Type: cross 
Abstract: Fiducial markers are widely used in robotics for navigation, object recognition, and scene understanding. While offering significant advantages for robots and Augmented Reality (AR) applications, they often disrupt the visual aesthetics of environments, as they are visible to humans, making them unsuitable for many everyday use cases. To address this gap, this paper presents iMarkers, innovative, unobtrusive fiducial markers detectable exclusively by robots and AR devices equipped with adequate sensors and detection algorithms. These markers offer high flexibility in production, allowing customization of their visibility range and encoding algorithms to suit various demands. The paper also introduces the hardware designs and open-sourced software algorithms developed for detecting iMarkers, highlighting their adaptability and robustness in the detection and recognition stages. Numerous evaluations have demonstrated the effectiveness of iMarkers relative to conventional (printed) and blended fiducial markers and have confirmed their applicability across diverse robotics scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2501.15505v5</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ali Tourani, Deniz Isinsu Avsar, Hriday Bavle, Jose Luis Sanchez-Lopez, Jan Lagerwall, Holger Voos</dc:creator>
    </item>
    <item>
      <title>XAInomaly: Explainable and Interpretable Deep Contractive Autoencoder for O-RAN Traffic Anomaly Detection</title>
      <link>https://arxiv.org/abs/2502.09194</link>
      <description>arXiv:2502.09194v1 Announce Type: cross 
Abstract: Generative Artificial Intelligence (AI) techniques have become an integral part of advancing next-generation wireless communication systems by enabling sophisticated data modeling and feature extraction for enhanced network performance. In the realm of open radio access networks (O-RAN), characterized by their disaggregated architecture and heterogeneous components from multiple vendors, the deployment of generative models offers significant advantages for network management tasks such as traffic analysis, traffic forecasting, and anomaly detection. However, the complex and dynamic nature of O-RAN introduces challenges that necessitate not only accurate detection mechanisms but also reduced complexity, scalability, and, most importantly, interpretability to facilitate effective network management. In this study, we introduce the XAInomaly framework, an explainable and interpretable Semi-supervised (SS) Deep Contractive Autoencoder (DeepCAE) design for anomaly detection in O-RAN. Our approach leverages the generative modeling capabilities of our SS-DeepCAE model to learn compressed, robust representations of normal network behavior, which capture essential features, enabling the identification of deviations indicative of anomalies. To address the black-box nature of deep learning models, we propose a reactive Explainable AI (XAI) technique called fastshap-C.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.09194v1</guid>
      <category>cs.IT</category>
      <category>cs.AI</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Osman Tugay Basaran, Falko Dressler</dc:creator>
    </item>
    <item>
      <title>Physics-Embedded Neural Networks for sEMG-based Continuous Motion Estimation</title>
      <link>https://arxiv.org/abs/2506.22459</link>
      <description>arXiv:2506.22459v1 Announce Type: cross 
Abstract: Accurately decoding human motion intentions from surface electromyography (sEMG) is essential for myoelectric control and has wide applications in rehabilitation robotics and assistive technologies. However, existing sEMG-based motion estimation methods often rely on subject-specific musculoskeletal (MSK) models that are difficult to calibrate, or purely data-driven models that lack physiological consistency. This paper introduces a novel Physics-Embedded Neural Network (PENN) that combines interpretable MSK forward-dynamics with data-driven residual learning, thereby preserving physiological consistency while achieving accurate motion estimation. The PENN employs a recursive temporal structure to propagate historical estimates and a lightweight convolutional neural network for residual correction, leading to robust and temporally coherent estimations. A two-phase training strategy is designed for PENN. Experimental evaluations on six healthy subjects show that PENN outperforms state-of-the-art baseline methods in both root mean square error (RMSE) and $R^2$ metrics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.22459v1</guid>
      <category>eess.SP</category>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wending Heng, Chaoyuan Liang, Yihui Zhao, Zhiqiang Zhang, Glen Cooper, Zhenhong Li</dc:creator>
    </item>
    <item>
      <title>XAI-on-RAN: Explainable, AI-native, and GPU-Accelerated RAN Towards 6G</title>
      <link>https://arxiv.org/abs/2511.17514</link>
      <description>arXiv:2511.17514v1 Announce Type: cross 
Abstract: Artificial intelligence (AI)-native radio access networks (RANs) will serve vertical industries with stringent requirements: smart grids, autonomous vehicles, remote healthcare, industrial automation, etc. To meet these requirements, modern 5G/6G designs increasingly leverage AI for network optimization, but the opacity of AI decisions poses risks in mission-critical domains. These use cases are often delivered via non-public networks (NPNs) or dedicated network slices, where reliability and safety are vital. In this paper, we motivate the need for transparent and trustworthy AI in high-stakes communications (e.g., healthcare, industrial automation, and robotics) by drawing on the 3rd Generation Partnership Project (3GPP)'s vision for non-public networks. We design a mathematical framework to model the trade-offs between transparency (explanation fidelity and fairness), latency, and graphics processing unit (GPU) utilization in deploying explainable AI (XAI) models. Empirical evaluations demonstrate that our proposed hybrid XAI model, xAI-Native, consistently surpasses conventional baseline models in performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.17514v1</guid>
      <category>cs.NI</category>
      <category>cs.AI</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Osman Tugay Basaran, Falko Dressler</dc:creator>
    </item>
    <item>
      <title>BRAIN: Bayesian Reasoning via Active Inference for Agentic and Embodied Intelligence in Mobile Networks</title>
      <link>https://arxiv.org/abs/2602.14033</link>
      <description>arXiv:2602.14033v1 Announce Type: cross 
Abstract: Future sixth-generation (6G) mobile networks will demand artificial intelligence (AI) agents that are not only autonomous and efficient, but also capable of real-time adaptation in dynamic environments and transparent in their decision-making. However, prevailing agentic AI approaches in networking exhibit significant shortcomings in this regard. Conventional deep reinforcement learning (DRL)-based agents lack explainability and often suffer from brittle adaptation, including catastrophic forgetting of past knowledge under non-stationary conditions. In this paper, we propose an alternative solution to these challenges: the Bayesian reasoning via Active Inference (BRAIN) agent. BRAIN harnesses a deep generative model of the network environment and minimizes variational free energy to unify perception and action in a single closed-loop paradigm. We implement BRAIN as an O-RAN eXtended application (xApp) on a GPU-accelerated testbed and demonstrate its advantages over standard DRL baselines. In our experiments, BRAIN exhibits (i) robust causal reasoning for dynamic radio resource allocation, maintaining slice-specific quality of service (QoS) targets (throughput, latency, reliability) under varying traffic loads, (ii) superior adaptability with up to 28.3% higher robustness to sudden traffic shifts versus benchmarks (achieved without any retraining), and (iii) real-time interpretability of its decisions through human-interpretable belief state diagnostics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.14033v1</guid>
      <category>cs.IT</category>
      <category>cs.AI</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Osman Tugay Basaran, Martin Maier, Falko Dressler</dc:creator>
    </item>
    <item>
      <title>A Proof in Coq that Core Logic is not Paraconsistent</title>
      <link>https://arxiv.org/abs/2606.05953</link>
      <description>arXiv:2606.05953v1 Announce Type: cross 
Abstract: First, this paper proves that Tennant's two claims (i.e. that his own logical system is paraconsistent, and that it overlaps minimal logic) are both false. Second, this proof is certified with Coq.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.05953v1</guid>
      <category>math.LO</category>
      <category>cs.LO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Joseph Vidal-Rosset</dc:creator>
    </item>
    <item>
      <title>When the Scaffold Stays On: AI, Practice Style, and Screening in Elite Skill Formation</title>
      <link>https://arxiv.org/abs/2606.06253</link>
      <description>arXiv:2606.06253v1 Announce Type: cross 
Abstract: Generative AI raises short-term productivity by completing tasks that learners would otherwise practice on their own. Whether this substitution erodes frontier skill, the skill behind top-tail non-AI-aided performance, is an open question of rising stakes. The sharper question is whether selection mechanisms can screen apart two coexisting types: substitute-users, who use AI in place of deliberate practice, and complement-users, who use it to accelerate skill development. In elite programming, the International Collegiate Programming Contest (ICPC) and the International Olympiad in Informatics (IOI) prohibit AI under proctoring and admit entrants through qualification rounds, whereas online Codeforces (CF) contests are unproctored and open to all. From CF histories we build an AI-prompt signature (more first-attempt acceptances, fewer attempts and retries) consistent with AI-assisted practice. Three patterns triangulate institutional screening. First, CF practice shifted toward this signature across cohorts over two AI rollouts. Second, in open CF contests a stronger signature predicts smaller rating gains for users with no ICPC/IOI affiliation, but not for those who qualified for the AI-prohibited contests. Third, inside the AI-prohibited ICPC environment, a shift toward AI-style practice predicts higher non-AI-aided scores for AI-era entrants. The same practice input carries opposite signs depending on whether the environment screens for it. The contrast points to two levers: how AI is integrated into training, since within the screened pool AI-style practice coincides with stronger non-AI-aided performance; and the design of AI-prohibited evaluation gates as a type-separating institution. Both extend beyond programming to credentialing systems (medical and legal boards, professional certification) that certify skill in a workforce increasingly shaped by AI.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06253v1</guid>
      <category>econ.EM</category>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Song Yao</dc:creator>
    </item>
    <item>
      <title>Blockchain Infrastructure for Intelligent Cyber-Physical-Social Systems: Post-Quantum Security, Interoperability, and Trustworthy Data Economies in the Era of Embodied AI</title>
      <link>https://arxiv.org/abs/2606.06895</link>
      <description>arXiv:2606.06895v1 Announce Type: cross 
Abstract: The deployment of embodied artificial intelligence via world-model-based robotics presents a transformative opportunity for blockchain infrastructure, establishing urgent demand for trustworthy data provenance, cross-organizational governance, and incentive-compatible sharing across decentralized ecosystems. Simultaneously, quantum computing advances recognized by the 2025 Nobel Prize in Physics and the Turing Award threaten the cryptographic primitives securing these data economies, creating an interdependent imperative: long-lived verification for embodied AI depends on crypto-agile architectures capable of withstanding quantum adversaries. This tutorial examines blockchain as the coordination layer bridging this dual transition, from financial substrate to foundational Cyber-Physical-Social Systems infrastructure that simultaneously secures against quantum cryptanalysis and enables scalable, trustworthy data economies. The session opens with an immersive AWS Braket demonstration engaging participants with superconducting, trapped-ion, and neutral-atom hardware to assess cryptographic threat timelines and witness ECDSA-to-post-quantum signature transitions. Five integrated modules progress from embodied AI and world-model requirements through quantum hardware reality and evidence-based security migration, to scalable cross-shard architectures via BrokerChain protocols, trustworthy data economies implementing Croissant metadata standards and robotic learning provenance, and industry ecosystem integration for multi-modal cloud deployment. By bridging quantum hardware realities with embodied AI data requirements, this tutorial charts blockchain as unified infrastructure for next-generation decentralized intelligent environments, providing open-source frameworks and roadmaps for architecting quantum-resistant, interoperable, and data-trustworthy systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06895v1</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <category>cs.ET</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Song Guo, Huawei Huang, Dongping Liu, Aoyu Zhang, Luyao Zhang</dc:creator>
    </item>
    <item>
      <title>The Montparnasse Algorithm for RNA Design</title>
      <link>https://arxiv.org/abs/2606.07562</link>
      <description>arXiv:2606.07562v1 Announce Type: cross 
Abstract: RNA design consists of discovering a nucleotide sequence that optimizes predefined criteria, such as secondary structure. It is useful for synthetic biology, medicine, and nanotechnology. We propose Montparnasse, a Monte Carlo search framework based on Generalized Nested Rollout Policy Adaptation, augmented with a problem-specific prior, slow and long adaptation at level 1, and a lexicographic multicriteria evaluation. Montparnasse solves all 100 puzzles of the Eterna100 V1 benchmark consistently faster than DesiRNA, the previous state of the art, across all time limits, reaching full coverage more than three times faster overall. On messenger RNA secondary structure optimization for hemoglobin alpha, it identifies sequences with more paired bases than the MFE-optimal solution of LinearDesign.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07562v1</guid>
      <category>q-bio.BM</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tristan Cazenave</dc:creator>
    </item>
    <item>
      <title>Considerations for an Integrated Detector Design at FCC-ee: A Human-AI Exploration</title>
      <link>https://arxiv.org/abs/2606.07564</link>
      <description>arXiv:2606.07564v1 Announce Type: cross 
Abstract: This report explores detector design considerations for the Future Circular Collider in its electron-positron mode (FCC-ee) through an extended dialogue between a physicist and an AI assistant. Starting from initial "prejudice" detector concepts proposed by the AI assistant without explicit physicist input, each subsystem is examined in detail, with the AI's assumptions challenged and revised through the exchange. The discussion covers the full detector from beam pipe to luminosity monitor, with particular attention to the interplay between subsystem choices and the practical considerations - calibration, stability, and operational simplicity - that are essential for a fifteen-year precision physics program. The narrative documents how the integrated detector design evolved substantially from the starting point to revised "prejudice" detector concepts of the AI assistant. The focus of this report is on the process to illustrate both the potential and the limitations of human-AI collaboration in experimental physics design, and the physics capabilities of any of the "prejudice" detector concepts remain to be explored.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07564v1</guid>
      <category>physics.ins-det</category>
      <category>cs.AI</category>
      <category>hep-ex</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Charles Young</dc:creator>
    </item>
    <item>
      <title>SurfDesign: Effective Protein Design on Molecular Surfaces</title>
      <link>https://arxiv.org/abs/2606.07567</link>
      <description>arXiv:2606.07567v1 Announce Type: cross 
Abstract: Protein function is largely determined by molecular surface geometry and physicochemical complementarity, yet most protein design methods condition only on backbone structure. We introduce SurfDesign, a surface-conditioned protein design framework that models molecular surfaces as continuous geometric manifolds and integrates them with pretrained protein language models. SurfDesign employs surface-based equivariant message passing to capture surface normals, curvature, and directional geometry, together with a parameter-efficient fine-tuning strategy. Focusing on functional protein design, we show that SurfDesign consistently outperforms prior surface-conditioned and backbone-only methods on de novo binder and enzyme design benchmarks. We also report strong performance on inverse-folding benchmarks as a diagnostic of structural compatibility. Our results highlight manifold-aware surface representations as a principled foundation for functional protein and enzyme design. Code is available at https://github.com/smiles724/SurfDesign.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07567v1</guid>
      <category>q-bio.BM</category>
      <category>cs.AI</category>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>KDD 2026 AI4Science</arxiv:journal_reference>
      <dc:creator>Fang Wu, Shuting Jin, Xiangru Tang, Mark Gerstein, Xiangxiang Zeng, Yejin Choi, Jure Leskovec, Jinbo Xu</dc:creator>
    </item>
    <item>
      <title>Forecasting Japanese elections: A nonlinear machine-learning approach</title>
      <link>https://arxiv.org/abs/2606.07572</link>
      <description>arXiv:2606.07572v1 Announce Type: cross 
Abstract: Despite Japan being one of the world's largest advanced democracies, the development of election forecasting models for its national elections remains limited. This study introduces nonlinear machine-learning forecasting models, based on decision tree and ensemble learning methods, for predicting the outcomes of Japanese lower-house elections. To assess the methodological benefits of our approach, we replicated the theoretical framework and dataset of Lewis-Beck and Tien's (LBT) foundational statistical forecasting model for Japanese elections. Our models demonstrated moderately but consistently improved predictive accuracy compared to LBT's model in both in-sample and out-of-sample evaluations, suggesting that nonlinear algorithms offer an alternative approach to classical linear methods in capturing complex electoral dynamics. This study represents one of the earlier applications of nonlinear machine-learning techniques to single-country election forecasting. It offers a replicable framework that, when combined with the country-specific electoral theories of other nations, may enhance the predictive performance of forecasting models in broader national contexts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07572v1</guid>
      <category>physics.soc-ph</category>
      <category>cs.LG</category>
      <category>stat.AP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sota Kato, Xuan Luo, Budrul Ahsan, Asahi Obata, Takafumi Nakanishi</dc:creator>
    </item>
    <item>
      <title>Forward-Looking Stress Testing Under Macro Scenarios: Stable SVaR Estimation Using a Hybrid GPR-HS Framework with SACS</title>
      <link>https://arxiv.org/abs/2606.07575</link>
      <description>arXiv:2606.07575v1 Announce Type: cross 
Abstract: Regulatory stress testing frameworks, including the Comprehensive Capital Analysis and Review (CCAR) and the Internal Capital Adequacy Assessment Process (ICAAP), require robust Stressed Value-at-Risk (SVaR) estimation under forward-looking macroeconomic scenarios. Traditional parametric approaches often exhibit numerical instability under extreme shocks, reducing the reliability of capital projections.
  This paper extends the Hybrid Gaussian Process Regression Historical Simulation (GPR-HS) framework of Vadrevu (2026) to forward-looking stress scenarios, demonstrating stability across three regimes: West Asia War, Climate Risk, and AI Bubble/Regulation.
  A key contribution is the Scenario-Averaged Covariance Stabilization (SACS) framework, which constructs stress covariance as a weighted aggregation of historical crisis regimes, providing stable and interpretable dependence structures. Stressed return paths are generated over a 252-day horizon using deterministic drift and stochastic residuals, while volatility is modeled via Gaussian Process Regression with Aggressive Noise Initialization (ANI).
  The framework exhibits consistent convergence across all assets and scenarios. SVaR ranges from -2.1020% to -2.2231%, with the coherence property |SES| &gt; |SVaR| preserved. The results support GPR-HS with SACS as a stable and regulator-aligned approach for forward-looking SVaR and SES estimation in CCAR and ICAAP applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07575v1</guid>
      <category>q-fin.RM</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ujjwala Vadrevu</dc:creator>
    </item>
    <item>
      <title>FADRW: A Feature-Aware Modulated and Dynamically Reweighted Loss for Few-Shot Linguistic Steganalysis</title>
      <link>https://arxiv.org/abs/2606.07655</link>
      <description>arXiv:2606.07655v1 Announce Type: cross 
Abstract: The ubiquity of social media platforms facilitates malicious linguistic steganography, posing significant security risks. However, detection is severely hampered by two fundamental issues during model training. Firstly, extreme class imbalance (less than 1% steganographic samples) induces a strong decision bias. Secondly, the invisibility of generative steganography means its features are nearly indistinguishable from benign text; this similarity, compounded by their extreme rarity, leads to severe feature marginalization, where faint steganographic signals are completely overwhelmed. To directly address these optimization-level challenges, we propose FADRW (Feature-Aware Modulated and Dynamically Reweighted Loss), a novel loss function framework engineered for few-shot steganalysis. FADRW employs Dynamic Reweighting to progressively counteract decision bias, and a Feature-Aware Modulation module to structurally reshape the feature space, preventing feature marginalization by enhancing the separability of these subtle features. Extensive experiments on datasets from three real-world social platforms demonstrate that FADRW significantly outperforms state-of-the-art methods, particularly in the challenging few-shot steganographic sample scenario.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07655v1</guid>
      <category>eess.SP</category>
      <category>cs.CR</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1109/LSP.2026.3700567</arxiv:DOI>
      <dc:creator>Shuo Liu, Xianghong Lin, Yukun Wei, Zhongliang Yang</dc:creator>
    </item>
    <item>
      <title>SC3: The Multi-Solvent Solubility Challenge and Benchmark</title>
      <link>https://arxiv.org/abs/2606.07656</link>
      <description>arXiv:2606.07656v1 Announce Type: cross 
Abstract: Solubility prediction is a standard benchmark in computational chemistry, yet multi-solvent models that reportedly approach the experimental-noise ceiling (i.e. the aleatoric limit) are not yet reliable enough to be deployed. We argue that this gap is partly artefactual: published benchmarks differ in curation policies, evaluate on count-weighted RMSE that hides failure on tail-heavy solvent distributions, and treat the widely cited 0.6-0.8 log S inter-laboratory figure as the aleatoric ceiling even though it reflects worst-case, not expected, disagreement. We introduce SC3, a multi-solvent solubility benchmark built on BigSolDB v2.1 with three contributions: (i) a reproducible curation pipeline yielding 101,535 measurements over 1,327 solutes and 206 solvents, with a recalibrated aleatoric floor of 0.106 log S, roughly 6 times tighter than the conventional figure; (ii) nested Gold/Silver/Bronze consensus tiers with per-point standard deviation, three leakage-checked splits, and a multi-solvent metric suite (PS-RMSE, Z-RMSE); and (iii) a 31-model benchmark across six families, whose best Bronze PS-RMSE sits at 5 times the aleatoric limit, a gap that no deep alternative tested closes. We perform three follow-on analyses: data scaling, transfer from quantum-chemistry solvation energies, and feature-level attribution, which together demonstrate that calibrated per-point uncertainty is reusable infrastructure for diagnosis beyond point prediction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07656v1</guid>
      <category>physics.chem-ph</category>
      <category>cs.CE</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vansh Ramani, Har Ashish Arora, Dhairya Kuchhal, Sergei Tatarin, Lev Krasnov, Sayan Ranu, Tarak Karmakar</dc:creator>
    </item>
    <item>
      <title>Hardware-aware Low-latency Quantum Compilation with Data-driven Lightweight Error Detection for Early Fault-Tolerant Systems</title>
      <link>https://arxiv.org/abs/2606.07666</link>
      <description>arXiv:2606.07666v1 Announce Type: cross 
Abstract: Noisy intermediate-scale quantum (NISQ) processors are entering an early fault-tolerance regime where full quantum error correction carries prohibitive resource costs, yet lightweight error detection can meaningfully improve algorithmic success rates. Existing compilation and error-detection toolchains treat these concerns in isolation, with no principled way to balance detection overhead against success probability under latency constraints. We present an integrated hardware-aware compilation and data-driven quantum error-detection (QED) framework that jointly optimises qubit mapping, SWAP insertion, and syndrome-schedule placement via a noise-weighted cost function and a learned multi-objective scheduler. Simulation experiments on an HPC cluster using GPU-accelerated density-matrix simulation (NVIDIA cuQuantum SDK) across VQE, phase-estimation, and Grover benchmarks, three noise profiles, and circuit sizes of 6-20 qubits (depths 10-160), show that joint co-design raises algorithmic success probability by up to 68 percent (95 percent CI: 60 percent to 76 percent) over SABRE on an 8-qubit VQE instance with post-selection.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07666v1</guid>
      <category>quant-ph</category>
      <category>cs.AR</category>
      <category>cs.DC</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sumit Chongder (Indian Institute of Technology Jodhpur)</dc:creator>
    </item>
    <item>
      <title>The Need for Neural ISP in the Small-Pixel Era: How Shrinking Pixels Push Optics to the Limit and Neural Restoration Pushes Back</title>
      <link>https://arxiv.org/abs/2606.07675</link>
      <description>arXiv:2606.07675v1 Announce Type: cross 
Abstract: Smartphone telephoto cameras are approaching a "telephoto physics wall": as pixel pitches shrink toward sub-0.5 micron, the optics remain limited by geometric aberrations, leading to diminishing returns on resolution. Traditional Image Signal Processors (ISPs) cannot eliminate these aberrations, because they operate through local, stage-wise processing with no explicit model of the underlying point spread function (PSF). We demonstrate how a learning-based Neural ISP for image restoration, trained on the underlying degradations, inverts what stage-wise pipelines cannot, turning small-pixel designs into a net advantage.
  We investigate this through a controlled simulation of a representative telephoto module, evaluating five configurations (0.35--0.75 micron pixel pitch). The aperture is scaled proportionally to keep per-pixel SNR and diffraction spot size fixed, thereby isolating geometric aberration and spatial sampling. While the traditional ISP improves only modestly with smaller pixels, the Neural ISP scales substantially: at 0.35 micron it reaches 745 cycles/mm MTF50 (vertical), a 2.5--3x resolution improvement over the traditional ISP, and LPIPS improves significantly from 0.244 to 0.151 while traditional results stay comparatively flat. In a low-SNR extension (15 dB per-frame bursts at 0.35 micron), a multi-frame Neural ISP recovers performance close to the bright-light single-frame baseline, whereas a multi-frame traditional ISP shows no meaningful improvement -- indicating that traditional pipelines at small pixels are bottlenecked by uncorrected PSF blur rather than by noise. These results point to a design philosophy in which Neural ISPs enable high-resolution telephoto modules by correcting residual optical aberrations rather than requiring increasingly complex optics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07675v1</guid>
      <category>eess.IV</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jingxi Li, Neerja Aggarwal, Laurent Gudemann, Shivansh Rao, Vishal Vinod, Tom E. Bishop, Ziv Attar</dc:creator>
    </item>
    <item>
      <title>Single-Cell Cross-Modal Transfer by Adversarial Fine-Tuning of Foundation Models</title>
      <link>https://arxiv.org/abs/2606.07676</link>
      <description>arXiv:2606.07676v1 Announce Type: cross 
Abstract: Spatial transcriptomics (ST) is a powerful tool for exploring biological properties dependent on structure, proximity, and interaction in tissue. The methods underpinning ST are developing rapidly but are limited in their ability to profile many thousands of genes at a subcellular scale. It is known that the whole-transcriptome readouts of cells in single-cell RNA sequencing (scRNA-seq), although the cells are dissociated from tissue, retain information about their former in situ neighbourhoods, motivating computational methods to recover it. While paired ST and scRNA-seq datasets are scarce, each modality in its own right is abundantly available. We therefore propose to perform cross-modal translation between unpaired ST and scRNA-seq data. In this work we show that a single-cell foundation model can perform this translation via adversarial fine-tuning. We demonstrate that our method performs favourably against methods built for multi-omics translation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07676v1</guid>
      <category>q-bio.GN</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Joseph Boyd, Matthew Lyon, Martino Mansoldo, Christian Hurry, Finnian Firth</dc:creator>
    </item>
    <item>
      <title>Disentangling Latent Risk Pathways via Bayesian Hypergraph Inference</title>
      <link>https://arxiv.org/abs/2606.07677</link>
      <description>arXiv:2606.07677v1 Announce Type: cross 
Abstract: Electronic health records (EHR) pose large-scale multi-disease modeling problems in which many outcomes are rare and strongly influenced by shared risk factors. While modern approaches achieve strong predictive performance, they often treat diseases independently or rely on black-box architectures, offering limited insight into how risk factors organize disease risk and little principled uncertainty quantification. We introduce a Bayesian hypergraph inference framework that reframes multi-disease modeling around latent, risk-factor-modulated disease pathways. Risk factors act on hyperedges, latent disease subsets with shared risk patterns, allowing diseases to participate in multiple distinct pathways and enabling interpretable, higher-order structure beyond pairwise associations. A repulsion prior encourages parsimonious and identifiable structure, while posterior inference provides calibrated uncertainty over both disease groupings and risk-factor influence. To enable scalable inference on large EHR datasets, we develop a structured variational inference algorithm that preserves logical dependencies among hyperedge existence, disease membership, and pathway-level effects. Experiments on simulated data and UK Biobank demonstrate stable and interpretable disease pathway structure, well-calibrated uncertainty, improved estimation for rare diseases, and competitive predictive performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07677v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>stat.AP</category>
      <category>stat.ME</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shengxian Ding, Haonan Gao, Pangpang Liu, Xinyuan Tian, Yize Zhao</dc:creator>
    </item>
    <item>
      <title>A Counting Process View of Relational Event Models: Practical Asymptotics</title>
      <link>https://arxiv.org/abs/2606.07680</link>
      <description>arXiv:2606.07680v1 Announce Type: cross 
Abstract: Relational Event Models (REMs) provide a rigorous framework for analyzing dyadic interactions observed in continuous time, capturing history-dependent dynamics such as triadic closure and reciprocity. Framing REMs through the lens of counting processes embeds the model in a rich theoretical foundation, facilitating its mathematical analysis. While Maximum Likelihood Estimation (MLE) is standard practice for estimating these models, the underlying statistical guarantees rely on specific asymptotic regimes, namely, whether the network size (n), the observational period (T), or both approach infinity. We review the theoretical foundations of such counting-process-based models, formalizing the core assumptions required to achieve asymptotic normality across these different limits. With a specific focus on Cox-type multiplicative models, we detail the circumstances under which these assumptions hold. Supported by simulation studies, we illustrate how structural modeling choices, including temporal windowing and logarithmic transformations, affect empirical coverage and estimator convergence. We thereby derive several guiding principles for specifying such models in realistic contexts, bridging theory and practice.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07680v1</guid>
      <category>stat.ME</category>
      <category>cs.SI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Cornelius Fritz, Alexander Fuchs-Kreiss</dc:creator>
    </item>
    <item>
      <title>Transfer learning for causal forest</title>
      <link>https://arxiv.org/abs/2606.07693</link>
      <description>arXiv:2606.07693v1 Announce Type: cross 
Abstract: Transfer learning addresses the challenge of transferring knowledge from one domain to another. Traditional transfer learning focuses on adapting models trained on a source domain (with many observations) to improve performance on a target domain (with few observations). In this work we consider the case of a model shift and focus on transfer learning applied to a causal forest, namely HTERF. This causal forest aims to estimate the Conditional Average Treatment Effect (CATE). The approach considered is the offset method presented by Wang (2016), adapted to a causal context. This method relies on intermediate models to estimate the offset between source and target distributions. Our main result is a bound on the CATE error of HTERF on the target domain that depends on the error of the intermediate models. Experiments show that this approach performs well in different settings, both on simulations and on a real-world dataset.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07693v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>math.PR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bérénice-Alexia Jocteur (ICJ, PSPM), Véronique Maume-Deschamps (ICJ, PSPM), Pierre Ribereau (PSPM, ICJ)</dc:creator>
    </item>
    <item>
      <title>TianJi-Environ: An Autonomous AI Scientist for Atmospheric Environmental Research</title>
      <link>https://arxiv.org/abs/2606.07697</link>
      <description>arXiv:2606.07697v1 Announce Type: cross 
Abstract: As atmospheric environmental prediction continues to improve, interpretable validation of pollution mechanisms and feedback processes has become a main challenge in atmospheric chemistry. Yet mechanism validation based on complex numerical models still relies heavily on expert knowledge: mechanistic hypotheses must be operationalized into executable experiments, and model outputs must be organized into traceable evidence. We present TianJi-Environ, an auditable AI Scientist for atmospheric-chemistry mechanism validation. TianJi-Environ establishes the first WRF-Chem-based multi-agent framework that autonomously drives complex atmospheric-chemistry simulations, converting mechanistic hypotheses into executable configurations, testing experiments, and evidence criteria. Using ozone response and particulate-matter feedback as two representative examples, we demonstrate TianJi-Environ's capability for mechanism validation. In a summertime ozone case over the North China Plain, the system detects directionally consistent aerosol-radiation-interaction signals in shortwave radiation and boundary-layer height, but judges the evidence for ozone response to NOx control to be incomplete. In a wintertime PM2.5 case over the Guanzhong Basin, it localizes the unsupported link to insufficient propagation from black-carbon perturbation to particulate response and missing diagnostics of vertical absorptive heating. These results show that TianJi-Environ makes expert-driven mechanism validation explicit, structured, and auditable, offering a reproducible paradigm for multi-agent systems coupled with complex atmospheric-chemistry models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07697v1</guid>
      <category>physics.ao-ph</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haoluo Zhao, Hongchun Zhang, Nan Li, Jing-Jia Luo, Kaikai Zhang, Mengyang Yu, Nan Chen, Tao Song, Fan Meng</dc:creator>
    </item>
    <item>
      <title>MatMind: A Structure-Activity Knowledge-Driven Generative Foundation Model for Materials Science</title>
      <link>https://arxiv.org/abs/2606.07712</link>
      <description>arXiv:2606.07712v1 Announce Type: cross 
Abstract: Progress in AI-driven crystal materials science has so far been carried by narrow architectures purpose-built for individual tasks -- graph neural networks for property prediction, diffusion and flow-matching models for crystal generation -- each excelling within its niche yet unable to act as a shared backbone across the full spectrum of materials problems. Generative large language models offer a fundamentally different paradigm, in which structural representation, quantitative prediction, and structure-activity reasoning can be unified within one model, but the materials community has yet to see this paradigm realized at a level competitive with established narrow specialists. Here we present MatMind, a generative foundation model purpose-built for crystal materials science under this paradigm, developed through the coordinated activation of structure-activity knowledge and physics-informed feedback within a progressive training framework -- combining structure-activity knowledge injection, a dual-head architecture that jointly trains language reasoning and numerical regression in a shared representation space, and multi-objective physics-informed reinforcement learning over stability, novelty, and structural diversity. Across three task families, MatMind attains the lowest mean absolute error on energy above hull, bulk modulus, and band gap -- surpassing graph neural network predictors purpose-built for these tasks -- reaches an S.U.N. rate of 65.3% on unconditional crystal generation, and achieves a comparable multiplicative improvement on magnetization-density-conditioned generation, where only 21 positive samples exist within over 600000 training entries. By matching or surpassing narrow specialists on their own ground while operating within a single unified model, MatMind shows that the LLM-based paradigm can serve as a viable backbone for crystal materials science going forward.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07712v1</guid>
      <category>cond-mat.mtrl-sci</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhan'ao Yao, Boxuan Zhang, Jingyuan Shu, Xiaoyu Wu, Rongyan Wang, Linjing Li, Dajun Zeng, Yudong Yao, Tingwei Chen, Youwei Wang, Xiaolin Zhao, Jiahui Shi, Jianjun Liu</dc:creator>
    </item>
    <item>
      <title>Multi-planar 2D-U-Net Segmentation of 3D-CT Abdominal Organs augmented by Spatial Occurrence Maps</title>
      <link>https://arxiv.org/abs/2606.07717</link>
      <description>arXiv:2606.07717v1 Announce Type: cross 
Abstract: This work proposes a lightweight 2D-U-Net-based framework for segmenting five abdominal organs in large field-of-view 3D CT scans. The method combines coarse-to-fine segmentation, predictions from multiple anatomical planes, and additional fuzzy 3D spatial maps that provide anatomical location cues to improve segmentation accuracy. We combine multi-planar 2D-U-Net models augmented by a spatial occurrence map. The approach involves two main stages. First, the abdominal volume of interest is detected by traversing the whole scan axially with a 2D-U-Net and determining the minimum and maximum x, y, and z extents of the five abdominal organs of interest. Second, we use spatial occurrence maps to enhance our multi-planar 2D-U-Net architecture inside the bounds from the first stage. The method is evaluated on 80 CT scans from various public sources. The results show Dice improvements of up to about 4% compared to the same model trained without spatial occurrence maps.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07717v1</guid>
      <category>eess.IV</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Daria Kern, Negar Chabi, Souraj Adhikary, Andre Mastmeyer</dc:creator>
    </item>
    <item>
      <title>GNSS-FM: A Self-Supervised Foundation Model for Daily GNSS Displacement Time Series</title>
      <link>https://arxiv.org/abs/2606.07725</link>
      <description>arXiv:2606.07725v1 Announce Type: cross 
Abstract: Displacement time series from Global Navigation Satellite Systems (GNSS) are essential for a wide range of applications, including monitoring tectonic crustal deformations and investigating the different stages of the earthquake cycle. Machine learning methods have proven promising for GNSS applications; however, most remain fully supervised. This creates a bottleneck as labeled data are scarce, even though large amounts of unlabeled GNSS data are freely available. We present GNSS-FM, a self-supervised foundation model for daily GNSS time series. The model uses a dual-stream input combining displacement and velocity-like increments, and is pretrained using a masked latent prediction objective with vector-quantized targets adapted from wav2vec 2.0, with several modifications for geodetic data. The model is pretrained on data from over 17,000 globally distributed GNSS stations, and an analysis of the learned codebook suggests that the representations capture the main signal types in GNSS displacement data, including seismic offsets, tectonic drift, and seasonal patterns. The foundation model is then fine-tuned on two downstream tasks, namely 90-day displacement forecasting and seismic step localization, where it outperforms strong task-specific baselines in both cases. These results show that self-supervised pretraining is a promising approach for GNSS time series analysis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07725v1</guid>
      <category>physics.geo-ph</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nick Teutschmann (Institute of Geodesy and Photogrammetry, ETH Zurich, Switzerland), Laura Crocetti (Institute of Geodesy and Photogrammetry, ETH Zurich, Switzerland), Fanny Lehmann (ETH AI Center, Switzerland), Leonardo Trentini (Institute of Geodesy and Photogrammetry, ETH Zurich, Switzerland), Benedikt Soja (Institute of Geodesy and Photogrammetry, ETH Zurich, Switzerland)</dc:creator>
    </item>
    <item>
      <title>Benchmarking Quantum Algorithmic Resilience for CVaR Portfolio Optimization: The Expressibility-Coherence Trade-off</title>
      <link>https://arxiv.org/abs/2606.07727</link>
      <description>arXiv:2606.07727v1 Announce Type: cross 
Abstract: Quantum combinatorial optimization offers theoretical advantages for complex financial modeling, but physical implementation on Noisy Intermediate Scale Quantum (NISQ) devices is severely constrained by hardware topology. This study presents a hardware benchmarking analysis between a Hardware Efficient Variational Quantum Neural Network (HE-VQNN) and the Warm Start Quantum Approximate Optimization Algorithm (WS-QAOA) for a hybrid Mean Variance and Conditional Value at Risk (CVaR) portfolio objective. By implementing a novel classical quantum hybrid proxy matrix to bypass the CVaR auxiliary qubit bottleneck, we map up to 16 assets from the NIFTY 50 index onto an IBM heavy hex processor. We systematically quantify algorithmic resilience to the "SWAP tax" incurred during routing. Empirical results reveal a critical operational trade-off: WS-QAOA provides exact theoretical mapping but suffers catastrophic hardware decoherence due to exponential nonlocal gate overhead. Conversely, HE-VQNN preserves hardware coherence but lacks the mathematical expressibility to capture dense tail risk asset correlations. This study shows that dense financial optimization on current architectures forces a nonviable choice between algorithmic inexpressibility and hardware decoherence. This points to a deeper limitation on what can be done with NISQ computers that lack all-to-all connectivity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07727v1</guid>
      <category>quant-ph</category>
      <category>cs.CL</category>
      <category>math.OC</category>
      <category>q-fin.PM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Prashik N. Somkuwar, K. Srinivasan, G. Raghavan</dc:creator>
    </item>
    <item>
      <title>Decomposing tournaments into comparability graphs</title>
      <link>https://arxiv.org/abs/2606.07748</link>
      <description>arXiv:2606.07748v1 Announce Type: cross 
Abstract: In this note, we introduce the \emph{partial order decomposition number} of a digraph $D$, denoted $pod(D)$, defined as the minimum integer $k$ such that $A(D)=A(P_1)\cup\cdots\cup A(P_k)$, where $P_1,\ldots,P_k$ are partial orders on $V(D)$. We prove that $\dic(D)\le \diomega(D)^{pod(D)}$ for every digraph $D$. In particular, every class of digraphs with bounded $pod$ is polynomially $\dic$-bounded. We apply this to tournaments, showing that if $\mathcal C$ is a class of tournaments with bounded dichromatic number, then the closure of $\mathcal C$ under substitution is polynomially $\dic$-bounded, thereby making progress on a question of Aubian, Charbit, Lopes, and the first author.
  As further applications of $pod$, we prove that poset tournaments of bounded dimension are $\dic$-bounded, derive polynomial lower bounds on the directed clique number of an explicit family of tournaments, thereby answering a conjecture of Gutowski and Rams, and show that tournaments with bounded $pod$ have bounded domination number.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07748v1</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pierre Aboulker, Logan Crew, Julien Duron, Xinyue Fan, Hugo Jacob, Rémy Kimbrough, Hidde Koerts, Benjamin Moore, Sophie Spirkl, Stéphan Thomassé</dc:creator>
    </item>
    <item>
      <title>Beyond Point Estimates: Benchmarking Uncertainty Quantification Methods on the AION-1 Astronomical Foundation Model</title>
      <link>https://arxiv.org/abs/2606.07771</link>
      <description>arXiv:2606.07771v1 Announce Type: cross 
Abstract: Foundation models for astronomical surveys offer powerful learned representations that can be transferred to downstream regression tasks such as galaxy property estimation. However, point predictions alone are insufficient for scientific inference; reliable uncertainty quantification (UQ) is essential. We compare seven UQ methods on galaxy property regression using frozen AION-1 foundation-model embeddings, predicting redshift, stellar mass, stellar-population age, gas-phase metallicity, and specific star-formation rate, from Legacy Survey photometry/imaging and DESI spectra, with PROVABGS-derived labels. Distribution-free conformal methods achieve marginal coverage within $\sim$1\,pp of the nominal 90\% across all properties, while non-conformal baselines (Deep Ensembles, MC~Dropout) fail to calibrate reliably. Among conformal approaches, Conformalized Quantile Regression (CQR) delivers the best coverage in the bin with the poorest model predictions. More importantly, only the Locally Valid and Discriminative (LVD) framework -- particularly when operating on AION-1 embeddings -- also provides finite-sample \emph{local validity}, producing intervals that adapt to each galaxy's local prediction difficulty rather than relying on marginal guarantees alone. These results establish conformal prediction, and LVD in particular, as the preferred UQ framework for uncertainty-aware inference on foundation-model embeddings in astrophysics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07771v1</guid>
      <category>astro-ph.IM</category>
      <category>astro-ph.GA</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>Contribution to Conference on Physics and AI at Stanford University (PAI 2026)</arxiv:journal_reference>
      <dc:creator>Karla Tame-Narvaez, Aleksandra Ćiprijanović, Shubhendu Trivedi</dc:creator>
    </item>
    <item>
      <title>Non-Archimedean Polydisc Spaces and Applications to Optimisation</title>
      <link>https://arxiv.org/abs/2606.07782</link>
      <description>arXiv:2606.07782v1 Announce Type: cross 
Abstract: We propose a new framework for optimisation over non-Archimedean spaces inspired by Berkovich geometry. Specifically, we introduce polydisc spaces, which consist of products of closed balls over a non-Archimedean field. These spaces retain the rigid hierarchical structure of the non-Archimedean field whilst acquiring many desirable geometric features absent from it. We show that metric trees embed naturally into these spaces, demonstrating their capacity to represent hierarchical data. We study their metric geometry, establishing properties such as geodesic uniqueness, confirming their compatibility with classical optimisation techniques. We further propose a class of real-valued functions given by linear combinations of absolute values of polynomials. These functions admit a piecewise polynomial description along geodesics and satisfy a universal approximation property. We formulate a theory of optimisation on polydisc spaces: we prove existence of minimisers and explore algorithms for finding them. We provide an accompanying open-source Julia library implementing the core objects and optimisation procedures introduced.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07782v1</guid>
      <category>math.OC</category>
      <category>cs.LG</category>
      <category>math.MG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Paul Lezeau, Yiannis Fam, Anthea Monod, Yue Ren</dc:creator>
    </item>
    <item>
      <title>Blow-ups of order types of positive density</title>
      <link>https://arxiv.org/abs/2606.07806</link>
      <description>arXiv:2606.07806v1 Announce Type: cross 
Abstract: Order types are an equivalence relation between point configurations that capture their combinatorial and convexity properties. Let $P$ be a $\kappa$-colored sequence of $n \ge d+1$ points in general position in $\mathbb{R}^d$. Let $\rho$ be a $\kappa$-colored order type on $k \le d+1$ points that has positive density on $P$; that is, for some constant $\delta &gt;0$, there are $\delta \cdot \binom{n}{k}$ $k$-point subsequences of $P$ that have the same order type as $\rho$ and the same color pattern. In this paper we show that there exists a constant $c &gt;0$ (depending only on $d, \delta$, $k$ and $\kappa$) and disjoint subsets $X_1,\dots,X_k$ of $P$, each with at least $c \cdot n$ points, such that for every choice of $k$ points $x_i \in X_i$, $(x_1,\dots,x_k)$ has the same order type and color pattern as $\rho$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07806v1</guid>
      <category>math.CO</category>
      <category>cs.CG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ruy Fabila-Monroy, Benedikt Hahn, Jesús Leaños</dc:creator>
    </item>
    <item>
      <title>Agentic multi-fidelity learning of quasiparticle and excitonic properties</title>
      <link>https://arxiv.org/abs/2606.07836</link>
      <description>arXiv:2606.07836v1 Announce Type: cross 
Abstract: Many-body GW-Bethe-Salpeter equation calculations are essential for accurate simulations of electronic structure and optical properties in modern low-dimensional nanomaterials. However, these methods are computationally demanding and can exhibit localized numerical instabilities or convergence failures that are difficult to detect within high-throughput workflows. We introduce an agent-guided multi-fidelity framework for correcting GW-Bethe-Salpeter excited-state landscapes in strained MoS2-WS2 bilayers. Across stacking registries, strain branches and reciprocal-space samplings, the workflow identifies spike-like excursions, near-zero-gap collapse and cross-fidelity inconsistencies associated with fragile long-wavelength dielectric screening. A structural agent evaluates calculations by assigning confidence weights and selectively using a small number of high-accuracy reference points. Machine learning models then transfer information across related systems and apply Gaussian process corrections to recover improved quasiparticle gaps and exciton binding energies, with calibrated uncertainty estimates. The approach corrects numerically induced artifacts without erasing physical strain dependence and substantially improves agreement with higher-fidelity references relative to a no-agent baseline. These results show that reliable surrogate learning for excited-state materials requires explicit diagnosis of numerical fragility, not direct interpolation of raw first-principles data points. The proposed framework is readily transferable to other optoelectronic nanomaterials characterized by strong quantum confinement, such as quantum dots, nanoribbons, layered two-dimensional semiconductors, and hybrid perovskite nanostructures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07836v1</guid>
      <category>cond-mat.mtrl-sci</category>
      <category>cond-mat.stat-mech</category>
      <category>cs.AI</category>
      <category>physics.comp-ph</category>
      <category>quant-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Arnab Neogi, Aaron Forde, Christopher A. Lane, Sergei Tretiak, Jian-Xin Zhu</dc:creator>
    </item>
    <item>
      <title>Large-scale empirical tuning and comparison of default optimizers for variational inference</title>
      <link>https://arxiv.org/abs/2606.07841</link>
      <description>arXiv:2606.07841v1 Announce Type: cross 
Abstract: Black-box variational inference (BBVI) is a methodology for posterior approximation that relies on stochastic optimization. In practice, the stochastic optimizers underpinning BBVI generally require extensive problem-specific tuning, which undermines its promise as a truly "black box" inference algorithm. However, over the past decade, many new adaptive stochastic optimization algorithms have been developed that reduce or remove entirely the need for tuning. In this work, we investigate this new collection of adaptive methods in the context of BBVI, with the goal of establishing the current state of the art in tuning-free optimization-based inference. In particular, we present a large-scale empirical evaluation of 56 stochastic gradient-based optimization algorithms applied to 1092 Bayesian inference optimization problems, involving over 550,000 individual optimization runs and 15 core-years of compute. The optimization algorithms we evaluate are chosen to represent a wide spectrum of recent approaches and the benchmark problems are chosen to span a range of difficulty, with posterior target dimension 1-10^4, condition number 1-10^8, and a range of variational families. Our results show that no single method dominates, but running a selection of 5 algorithms suffices to reliably get close to the best-possible observed performance. We thus provide a strong baseline for applications where expert tuning is not possible and for comparison when developing new stochastic optimization algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07841v1</guid>
      <category>stat.CO</category>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Trevor Campbell, Jonathan H. Huggins, Kyurae Kim, Charles C. Margossian</dc:creator>
    </item>
    <item>
      <title>Counting Hamiltonian Paths in 3-Regular Planar Graphs</title>
      <link>https://arxiv.org/abs/2606.07844</link>
      <description>arXiv:2606.07844v1 Announce Type: cross 
Abstract: We introduce two infinite families of 3-regular planar graphs. Both families are conceptual adversaries to the Pohl-Warnsdorf algorithm for finding Hamiltonians. We provide a closed form calculation of the number of Hamiltonians.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07844v1</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Ira Pohl, Larry Stockmeyer</dc:creator>
    </item>
    <item>
      <title>Affine Filtering Measurements and Their Applications to Quantum Decoding</title>
      <link>https://arxiv.org/abs/2606.07852</link>
      <description>arXiv:2606.07852v1 Announce Type: cross 
Abstract: Unambiguous state discrimination (USD) measurements are attractive because outcomes are either marked as conclusive (i.e., error free) or inconclusive (i.e., erased). We study affine filtering measurements, a structured variant of USD for decoding classical linear codes over pure-state classical-quantum channels, where a conclusive outcome identifies an affine subspace containing the transmitted codeword and an inconclusive outcome is treated as an erasure. For a group-covariant indexing of pure-state codewords, we show that the optimal design of affine filtering measurements is a semidefinite program that can be reduced to a linear program via character-based diagonalization. We use the resulting measurement to build a quantum decoding framework for local codes, and we demonstrate (via simulations on regular LDPC codes from Gallager ensembles using single parity check local constraints) that affine filtering based decoding can outperform symbol-wise USD and symbol-wise pretty good measurement based decoding methods on i.i.d. pure-state channels. In an independent and concurrent work, Buzet and Chailloux study similar fine-grained USD measurements for symmetric families of states. Their focus is on the code-agnostic setting whereas our focus is on code-aware constructions and decoding.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07852v1</guid>
      <category>quant-ph</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Avijit Mandal, Noah Shutty, Henry D. Pfister, Stephen P. Jordan</dc:creator>
    </item>
    <item>
      <title>Optimal Wiener-Filter Solutions for Denoising of Graph Signals on Directed Graphs</title>
      <link>https://arxiv.org/abs/2606.07876</link>
      <description>arXiv:2606.07876v1 Announce Type: cross 
Abstract: Graph signal processing has opened new avenues to the canonical denoising problem in interesting settings. Specifically, here we propose a Wiener-filter solution for graph signals on directed graphs. Under various stationarity assumptions combining uncorrelated and correlated noise conditions, we show optimal solutions, including a successful proof-of-concept for a temperature graph.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07876v1</guid>
      <category>eess.SP</category>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Chun Hei Michael Chan, Alexandre Cionca, Dimitri Van De Ville</dc:creator>
    </item>
    <item>
      <title>On the sharp linear convergence rate of the circumcentered--reflection method on subspaces</title>
      <link>https://arxiv.org/abs/2606.07888</link>
      <description>arXiv:2606.07888v1 Announce Type: cross 
Abstract: For two subspaces $U,V\subseteq\RR^n$, the circumcentered--reflection method (CRM) of Behling, Bello-Cruz, and Santos~\cite{BBS2018} computes the projection onto $U\cap V$ using only the reflections across $U$ and $V$, with known linear-convergence rate $c_F$, the cosine of the Friedrichs angle. We prove that, when CRM is initialized in $V$, it contracts at the strictly smaller rate $\rho_V=(\sin^2\theta_p-\sin^2\theta_F)/(\sin^2\theta_p+\sin^2\theta_F)$, where $\theta_F\in(0,\pi/2]$ is the Friedrichs angle and $\theta_p\in[\theta_F,\pi/2]$ the largest principal angle between $U$ and $V$. The bound is sharp, attained on an explicit ray in $V$, and optimal among parameter-free single-step iterations. The constant itself is not new: Bauschke, Bello-Cruz, Nghia, Phan, and Wang~\cite{BBNPW2016} identified it as the optimal rate of the relaxed alternating-projection family and of their adaptive linesearch map $B_T$. Our contribution is that the parameter-free geometric circumcenter attains it as well, via Kantorovich's inequality applied to a single self-adjoint operator on $V$. Restricted to $V$, CRM coincides pointwise with the linesearch maps $A_T$ and $B_T$ from the Gubin--Polyak--Raik framework~\cite{GPR1967}. We further prove $\rho_V&lt;c_F^2$ whenever $\theta_F&lt;\pi/2$, with one-step convergence exactly when $\theta_F=\theta_p$. Over-reflecting either or both of $R_U$, $R_V$ inside the circumcenter does not help. Going faster than $\rho_V$ universally requires memory: Chebyshev semi-iteration applied to $P_VP_U$ attains a strictly smaller rate, beating $\rho_V$ by a factor at most $2$, attained in the limit $\theta_F\to\theta_p$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07888v1</guid>
      <category>math.OC</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yunier Bello-Cruz</dc:creator>
    </item>
    <item>
      <title>Beyond the Thin-Layer Limit: Differentiable Volumetric Training for Visible-Range Diffractive Neural Networks</title>
      <link>https://arxiv.org/abs/2606.07896</link>
      <description>arXiv:2606.07896v1 Announce Type: cross 
Abstract: Diffractive deep neural networks (D2NNs) promise miniaturized, power-efficient, light-speed optical front-ends for machine vision, yet the most mature demonstrations remain in the terahertz regime, built from readily fabricated millimeter-scale neurons. Translating D2NNs to the visible range, where nearly all vision pipelines operate, was long blamed on the difficulty of fabricating nanoscale neurons; but even after recent advances removed that barrier, visible-range D2NNs matching their terahertz counterparts remain out of reach. We identify the true obstacle as the thin-layer approximation underlying nearly all D2NN training, which treats each diffractive layer as an infinitely thin mask. It fails not because of the short wavelength, as is commonly assumed, but because the low-refractive-index materials (n approximately 1.3-1.5) used at visible wavelengths require relief structures thick enough that intra-layer diffraction and phase accumulation become significant. To overcome this, we introduce a differentiable beam-propagation ($\partial$BPM) layer that models each element as a finite-thickness volume and propagates light through it during training, keeping the fabrication-compatible height map end-to-end trainable without full-wave simulation in the loop. Across MNIST, Fashion-MNIST, and CIFAR-100 classification and imaging, $\partial$BPM training substantially reduces the design-to-device mismatch, and full-wave FDTD validation raises classification accuracy from 50% to 90% without re-optimization. The $\partial$BPM layer thus offers a scalable, physics-aware bridge between efficient optical neural-network optimization and fabrication-consistent diffractive design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07896v1</guid>
      <category>physics.optics</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dineth Jayakody, Dushan N. Wadduwage</dc:creator>
    </item>
    <item>
      <title>Identifiability and Estimation for Unlabeled Finite Mixtures under Marginal Independence</title>
      <link>https://arxiv.org/abs/2606.07914</link>
      <description>arXiv:2606.07914v1 Announce Type: cross 
Abstract: We study component recovery and mixing-matrix estimation from unlabeled finite mixtures whose observable distributions share the same latent components but have unknown mixing weights. The main identifying signal is marginal independence: each component is assumed to be independent on at least one coordinate pair, but no labels, clean component samples, or mixing weights are observed. We first prove a structural result for product components: under linear independence of the univariate marginals, any independent affine combination of the components must coincide with a single component. We then extend this principle to observable mixtures and show that, under full-rank and no-cancellation conditions, marginally independent affine combinations recover the corresponding latent components. When every component is independent on some coordinate pair, all components are identifiable, and the mixing matrix is recoverable under the stated completion conditions. Finally, we propose a Product-Marginal Maximum Mean Discrepancy (PM-MMD) estimator over affine combinations of the observable mixtures and prove uniform convergence and stability under approximate marginal independence. This framework also separates the empirical roles of the assumptions: irreducibility is, in general, not directly testable from the unlabeled mixtures alone, whereas marginal independence yields a candidate-level diagnostic through held-out PM-MMD. Controlled and flow-cytometry experiments show when marginal independence provides a useful recovery signal. In the reported multi-component comparisons, condition-aware representative selection stabilizes PM-MMD and improves recovery relative to clustering, factorization, and pairwise mixture-proportion baselines using the same unlabeled mixtures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07914v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Takafumi Kanamori, Yushi Hirose, Shohei Yamamoto</dc:creator>
    </item>
    <item>
      <title>Barycentric Projections of Optimal Transport Plans on Riemannian Manifolds</title>
      <link>https://arxiv.org/abs/2606.07926</link>
      <description>arXiv:2606.07926v1 Announce Type: cross 
Abstract: Optimal transport couplings are probabilistic objects, while many learning pipelines require deterministic maps. In Euclidean space, barycentric projection converts a coupling into a map by taking conditional expectations, but on a Riemannian manifold curvature and cut loci make this operation nontrivial. We develop a framework for barycentric projections of transport couplings on Riemannian manifolds. The intrinsic projection maps each source point to the conditional Fr\'echet mean of its destination law and is shown to be the best deterministic representative under squared geodesic loss. The corresponding minimum value is an integrated conditional Fr\'echet variance, which vanishes exactly for map-induced couplings and therefore defines a conditional-variance Monge defect. We also study a tangential log-exp projection, prove its Euclidean exactness, its compatibility with Brenier-McCann maps in the Monge case, and its interpretation as the first unit Riemannian gradient update for the intrinsic objective. For discrete couplings, both constructions decompose row-wise into weighted Fr\'echet mean and log-exp problems. Experiments on spherical data, synthetic SPD data, and real EEG covariance matrices support the proposed division of roles: the intrinsic projection is the variational representative, while the tangential projection is a useful local displacement surrogate.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07926v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Kisung You</dc:creator>
    </item>
    <item>
      <title>Pointwise Complexity for Gaussian Fields: Upper Envelopes, Algorithmic Lower Bounds, and Separation</title>
      <link>https://arxiv.org/abs/2606.07931</link>
      <description>arXiv:2606.07931v1 Announce Type: cross 
Abstract: We prove a variance-aware pointwise majorizing-measure theorem for centered Gaussian processes. Classical generic chaining characterizes the scalar quantity $\mathbb E\sup_{x\in T}X_x$; the theorem here gives a simultaneous high-probability envelope for the entire field. For an ambient prior $\mu$, the envelope at $x$ is governed by a pointwise Fernique-Talagrand functional \[\Phi_\mu(x):=\int_0^{4\sigma(x)}\sqrt{\log\frac{1}{\mu(B_d(x,\varepsilon))}}\,d\varepsilon,\] together with the corresponding Gaussian tail term. The theorem provides a reusable field-level refinement of classical generic chaining and a Gaussian-process counterpart of pointwise empirical-process bounds for deep neural networks.
  We also record a Bayesian algorithmic lower envelope from the interactive Fano/data-processing principle. For a known prior $\pi$, an observation channel, and a concrete estimator $\widehat t(Y)$, the lower bound is expressed through the exact ghost small-ball mass $\mathbb E_{Y\sim Q}\pi(B_d(\widehat t(Y),\Delta))$, rather than a worst-case covering number. In Gaussian location experiments, comparison decoders convert Bayes location error into lower bounds on decision-aligned Gaussian ranges. We then construct an elementary weighted-basis example separating the usual Fano relaxation for a fixed prior, the Bayesian algorithmic lower envelope, the pointwise Gaussian envelope on the selected subatlas, and the full-class minimax risk/global Gaussian scale. Together, these results show that algorithmic lower bounds provide local-geometric certificates of pointwise complexity for fixed estimators in overparameterized ambient classes, precisely in regimes where classical minimax theory becomes either too coarse or oracle-dependent.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07931v1</guid>
      <category>math.PR</category>
      <category>cond-mat.stat-mech</category>
      <category>cs.IT</category>
      <category>cs.LG</category>
      <category>math.IT</category>
      <category>math.ST</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yunbei Xu</dc:creator>
    </item>
    <item>
      <title>Feasibility to detect rapid change and disappearance of seagrass: Lessons from nearly 80 years of vegetation change in the Ako, Seto Inland Sea, Japan</title>
      <link>https://arxiv.org/abs/2606.07949</link>
      <description>arXiv:2606.07949v1 Announce Type: cross 
Abstract: This study analyses the Ako tidal flat in the Seto Inland Sea, Japan, where nearly all Zostera marina disappeared within a single year in 2025. Using aerial photographs from the 1940s onward, high-resolution satellite imagery, GRUS images (2.5-5 m), and monthly Sentinel-2 composites (10 m), we reconstructed approximately 80 years of seagrass distribution. YOLO-based segmentation using deep learning achieved high accuracy (overall accuracy &gt;= 0.9) across these datasets; although species could not be discriminated, the models captured the major temporal dynamics in vegetation area. The long-term mean seagrass area was 6.8 ha, but values fluctuated widely, from 3.5 ha in 1974 to 41.3 ha in 1989, apart from 0.2 ha in 2025. Sentinel-2 composites from 2019 to 2026 revealed clear seasonality, with vegetation increasing in early summer and declining from autumn. In 2025, however, the area decreased sharply after summer and remained anomalously low throughout the winter of 2025-2026. Our results indicate that the 2025 event was not a normal fluctuation but a rapid ecosystem shift involving the loss of the dominant canopy-forming species, most plausibly driven by regionally elevated summer water temperatures. The findings also have implications for seagrass Essential Ocean Variables (EOVs) and the State of Nature (SoN) metrics used in TNFD-aligned nature-related disclosures. Unlike forests, seagrass meadows require finer temporal resolution because both pronounced seasonality and abrupt collapse strongly influence area-based indicators. Therefore, in addition to previously noted issues such as species-level classification accuracy, we recommend that (1) baselines be defined over the longest available record and justified ecologically, (2) seasonal standardization be applied before inter-annual comparisons, and (3) years with extreme area anomalies be flagged rather than used as reference points.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07949v1</guid>
      <category>q-bio.PE</category>
      <category>cs.CV</category>
      <category>eess.IV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Takehisa Yamakita, Yoji Igarashi, Akira Eto, Ken Ishida, Masaaki Iiyama</dc:creator>
    </item>
    <item>
      <title>Lagrange multipliers in Maximum likelihood estimations and Least squares problems with Constraints</title>
      <link>https://arxiv.org/abs/2606.07984</link>
      <description>arXiv:2606.07984v1 Announce Type: cross 
Abstract: This study investigates a statistical property of Lagrange multipliers in constrained Maximum Likelihood Estimation (MLE) and Least Squares (LS) problems from the perspective of numerical optimization. Building on large-sample theory, we show that the associated Lagrange multipliers converge to zero as the sample size increases, provided the distribution is correctly specified in MLE or the residuals are normally distributed in LS. Although this asymptotic behavior has long been recognized in statistics, it has received little explicit attention in numerical optimization and has rarely been exploited in algorithmic design. Importantly, the insight extends beyond classical low-dimensional settings: even in modern high-dimensional applications, such as deep learning, where the number of parameters may exceed the sample size, the same reasoning applies provided the generalization performance is good.
  This observation has two main implications. First, many constrained optimization algorithms, including the Augmented Lagrangian Method, Sequential Quadratic Programming, and Interior Point methods, require initial values for the multipliers, and choosing zero is statistically justified. Numerical experiments for constrained regressions and dynamic discrete choice model estimations support this implication by showing that initializing multipliers at zero usually leads to stable and efficient performance. Second, penalty-based approaches that convert constrained problems into unconstrained ones can perform well when the true multipliers are small. This helps explain why penalty-based methods often perform well in practice.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07984v1</guid>
      <category>econ.EM</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <category>stat.CO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Takeshi Fukasawa</dc:creator>
    </item>
    <item>
      <title>Repair Before Veto, When Repair Is Hidden: Quantum-Accessible Features for Repair-Augmented Constraint Learning</title>
      <link>https://arxiv.org/abs/2606.08020</link>
      <description>arXiv:2606.08020v1 Announce Type: cross 
Abstract: Hard-constraint decision systems usually veto infeasible candidates. This is too rigid when the system can act: if a known affordable repair would make an infeasible candidate feasible and valuable, rejection is a false veto rather than a ranking error. We introduce Q-RACL (Quantum Repair-Augmented Constraint Learning), a repair-before-veto framework that first defines RACL decision semantics and then identifies the single inference link where quantum feature access can be load-bearing. RACL accepts a candidate when a sequential repair plan restores feasibility and preference; otherwise it returns structured rejection credit. The hard link is repair-feasibility inference: which repair class restores feasibility from an observed candidate and context. We construct a discrete-logarithm-hidden RACL family where the repair class is a shifted interval rule in the latent exponent a = log_g(x), while the learner observes only x = g^a mod p. Under standard DLP-based learning separation, this coordinate is inaccessible to efficient raw-input classical policies but accessible to a quantum agent through Shor/Fourier structure. Across six primes and ten seeds, bounded raw-input classical policies and a wrong raw-Fourier encoding remain near chance, whereas the Q-DLP policy keeps false-veto rate below 1.1%, wins all paired seeds, and yields QNI_cond = 0.9777 to 0.9972. A classical DLog oracle matches it, isolating feature access rather than classifier capacity. Thus quantum AI is not added as a generic model upgrade; for this DLP-hidden repair family, it supplies the missing feature that closes the repair-before-veto loop.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08020v1</guid>
      <category>quant-ph</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yifan Wang</dc:creator>
    </item>
    <item>
      <title>Variational Proximal Policy Optimization</title>
      <link>https://arxiv.org/abs/2606.08032</link>
      <description>arXiv:2606.08032v1 Announce Type: cross 
Abstract: Reinforcement Learning from Human Feedback via Proximal Policy Optimization often suffers from policy mode collapse, brittle exploration loops, and distribution drift. This paper introduces Variational Proximal Policy Optimization (\(\textsc{VP}_2\textsc{O}\)), a particle-based variational inference framework that maps policy optimization to Stein Variational Gradient Descent within a Mixture-of-Experts architecture. By leveraging functional kernels over localized expert prototypes alongside an expert orthogonalization loss, \(\textsc{VP}_2\textsc{O}\) introduces a geometry-based proximal-control mechanism that can reduce reliance on fixed clipping or KL schedules. Our results on a 33B/4B sparse Mixture-of-Experts model show several improvements across complex reasoning benchmarks, establishing a \(+\mathbf{179}\) ELO gain on Codeforces and a \(\mathbf{32\%}\) reduction in token count on AIME mathematical reasoning tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08032v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ousmane Amadou Dia</dc:creator>
    </item>
    <item>
      <title>Numerical solution of the nonlinear Dirac equation by a splitting variational quantum algorithm</title>
      <link>https://arxiv.org/abs/2606.08053</link>
      <description>arXiv:2606.08053v1 Announce Type: cross 
Abstract: In this work, we propose an operator-splitting variational quantum algorithm, termed Dirac-sVQA, for simulating the nonlinear Dirac equation (NLDE). The main difficulty arises from the state-dependent nonlinear interaction: its time-discrete update depends explicitly on the intermediate spinor state and, in general, cannot be implemented as a fixed state-independent unitary circuit. To address this difficulty, we decompose the NLDE evolution into a structured linear Dirac substep and a nonlinear variational correction. The linear substep is implemented by a spinor-Fourier Dirac propagator on a joint position-spin register, preserving the spin-momentum coupling and mass-induced spin evolution of the Dirac operator. The nonlinear correction is reformulated as a measurement-based variational update through a small set of overlap, self-channel, and cross-channel observables. We provide the corresponding quantum circuits and derive measurement-aware resource and complexity estimates. Numerical experiments in several nonlinear regimes show that Dirac-sVQA accurately captures both the total density and the componentwise spinor dynamics, agrees well with classical Fourier pseudospectral splitting solutions, and exhibits stable error behavior over time. These results provide numerical evidence for the feasibility of operator-splitting variational quantum simulation for nonlinear relativistic wave equations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08053v1</guid>
      <category>quant-ph</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qian Zuo, Ying He, Xiaofei Zhao</dc:creator>
    </item>
    <item>
      <title>New Fractional Ambiguity Function Integrated with CNN-Based Machine Learning for Signal Classification</title>
      <link>https://arxiv.org/abs/2606.08110</link>
      <description>arXiv:2606.08110v1 Announce Type: cross 
Abstract: A new fractional ambiguity function (NFrAF) derived from the fractional Fourier transform is introduced as a generalization of the classical ambiguity function. The fundamental analytical properties of the NFrAF, including symmetry, marginality, and Moyal type identities, are rigorously established. After verifying its ability to detect and localize monocomponent and multicomponent linear frequency modulated (LFM) signals, the NFrAF is integrated into a convolutional neural network based machine learning framework for signal classification. Owing to its superior time frequency resolution and localization, the NFrAF provides a more informative input representation than conventional methods such as the spectrogram and classical ambiguity function. Experimental results on simulated datasets demonstrate consistent improvements in classification accuracy, highlighting the effectiveness of the proposed representation for data driven signal analysis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08110v1</guid>
      <category>math.FA</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Aamir H. Dar, Prakhar Kumar Sonkar, Neeraj Kumar Sharma</dc:creator>
    </item>
    <item>
      <title>Palindrome complexity versus factor complexity</title>
      <link>https://arxiv.org/abs/2606.08127</link>
      <description>arXiv:2606.08127v1 Announce Type: cross 
Abstract: Let ${\bf x} = (a_i)_{i \geq 0}$ be an infinite word over a finite alphabet $\Sigma$. Let $\rho (n)$ be the factor complexity function for $\bf x$ and ${\rm Pal}(n)$ be the palindrome complexity function for $\bf x$. We give a new relationship between these two quantities; namely, if $\bf x$ is not ultimately periodic, then $$ \lim_{n \rightarrow \infty} {{ {\rm Pal} (n) \log ({\rm Pal} (n) + 1)} \over {\rho (n)}} = 0. $$ Furthermore, we prove that the numerator in this result is essentially optimal.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08127v1</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <category>cs.FL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jeffrey Shallit</dc:creator>
    </item>
    <item>
      <title>Biological Reasoning-Informed Regression for Interpretable Regulatory DNA Activity Prediction</title>
      <link>https://arxiv.org/abs/2606.08147</link>
      <description>arXiv:2606.08147v1 Announce Type: cross 
Abstract: DNA cis-regulatory elements (CREs) such as enhancers control gene expression levels. Accurately predicting regulatory activity from DNA sequences is valuable but challenging, as it requires understanding complex biological regulatory processes. Existing methods typically regress activity scores from sequences in a black-box manner, limiting both interpretability and regression performance. Meanwhile, large language models (LLMs) benefit from explicit reasoning processes, yet directly applying LLMs to raw DNA sequences performs poorly. In this paper, we bridge this gap by introducing R3LM, a framework that teaches LLMs reasoning-informed regression on regulatory DNA through structured biological knowledge. Specifically, we design a biologically grounded data format that structures DNA's regulatory information for improved LLM understanding, and construct CRE-ReasonBench, the first dataset that associates DNA sequences and activity scores with mechanistic reasoning traces. Through two-stage training that first teaches LLMs to reason over structured biological information and then performs regression, R3LM achieves state-of-the-art performance on enhancer prediction across three cell types, outperforming both LLMs with raw sequence input and specialized DNA models while providing interpretable mechanistic explanations. We expect R3LM to serve as an interpretable reward model that can effectively assist biologists in CRE design. Code is available at https://github.com/DuanYi516/R3LM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08147v1</guid>
      <category>q-bio.GN</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yi Duan, Zhao Yang, Jiwei Zhu, Ying Ba, Chuan Cao, Bing Su</dc:creator>
    </item>
    <item>
      <title>Inverse design of bespoke interatomic potentials via active learning by information-matching</title>
      <link>https://arxiv.org/abs/2606.08148</link>
      <description>arXiv:2606.08148v1 Announce Type: cross 
Abstract: Interatomic potentials (IPs) enable large-scale atomistic simulations beyond the reach of first-principles methods, but their predictive reliability depends critically on the selection of training data, quantified uncertainty, and model expressiveness. Active learning (AL) provides a principled framework for constructing efficient and accurate IPs, yet most strategies reduce parameter uncertainty without explicitly accounting for the specific material properties being predicted. The information-matching (IM) approach addresses this limitation by requiring that the selected training data provide at least as much parameter space information as needed to achieve prescribed uncertainty targets for selected quantities of interest (QoIs). Here, we apply IM to develop bespoke IPs specifically tailored for predicting plastic strength in metals. Due to the high computational cost of simulating plastic strength, we employ an indirect IM strategy that targets inexpensive intermediate QoIs that correlate with strength. The IM method enables precise parameter constraints with minimal training data, yielding precise predictions for both the intermediate QoIs and plastic strength. Yet, model error remains a key limitation, and a post hoc uncertainty inflation correction provides a viable means to mitigate this limitation. These findings illustrate both the promise and limits of uncertainty-aware AL for predicting complex material properties.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08148v1</guid>
      <category>cond-mat.mtrl-sci</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yonatan Kurniawan (Department of Physics and Astronomy, Brigham Young University, Provo, UT, USA), Logan D. Williams (Lawrence Livermore National Laboratory, Livermore, CA, USA), Amit Samanta (Lawrence Livermore National Laboratory, Livermore, CA, USA), Ilia Nikiforov (Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, MN, USA), Daniel Schwalbe-Koda (Department of Materials Science and Engineering, University of California, Los Angeles, CA, USA), Mark K. Transtrum (Cross Stream Consulting, Springville, UT, USA), Ellad B. Tadmor (Department of Aerospace Engineering and Mechanics, University of Minnesota, Minneapolis, MN, USA), Vincenzo Lordi (Lawrence Livermore National Laboratory, Livermore, CA, USA), Vasily V. Bulatov (Lawrence Livermore National Laboratory, Livermore, CA, USA)</dc:creator>
    </item>
    <item>
      <title>Latent Structural Categorical Matrix Completion with Application to Quasispecies Analysis</title>
      <link>https://arxiv.org/abs/2606.08188</link>
      <description>arXiv:2606.08188v1 Announce Type: cross 
Abstract: Matrix completion has been extensively studied for real-valued data, but existing methods are often limited in handling categorical variables. We propose LCMC, a double-loop optimization framework for categorical matrix completion via latent factorization based on a binary tensor representation. In this setting, each categorical entry is encoded as a one-hot vector along a third tensor mode, thereby preserving its discrete, non-ordinal nature. The outer loop adaptively estimates the latent dimension by iteratively updating it with feedback from the inner loop, while the inner loop reconstructs the categorical matrix through tensor factorization, supported by a corresponding theoretical analysis. To further improve scalability and robustness, we introduce enhancements including a split-merge-refine strategy and an adaptive data reduction technique. Experiments on synthetic and real-world datasets in viral quasispecies reconstruction demonstrate that LCMC achieves superior accuracy and efficiency compared to existing methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08188v1</guid>
      <category>math.OC</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qian Zhang, Meixia Lin</dc:creator>
    </item>
    <item>
      <title>Beyond Additivity: Causal Discovery in Location-Scale Noise Models with Hidden Variables</title>
      <link>https://arxiv.org/abs/2606.08196</link>
      <description>arXiv:2606.08196v1 Announce Type: cross 
Abstract: We study causal discovery from observational data when some variables are hidden and the data-generating process follows a location-scale noise model (LSNM). Existing methods that handle hidden confounders typically assume additive noise, but in practice, causes often modulate not just the mean but also the variance of their effects. We prove that acyclic directed mixed graphs (ADMGs) satisfying a bow-free condition are identifiable under LSNM with hidden variables, establishing the first identifiability result for causally insufficient models beyond noise additivity. We further provide sufficient conditions for identifying causal direction even when the bow-free assumption is violated. Our two-stage algorithm, LSNM-UV, is sound and complete, and experiments demonstrate improved performance over additive baselines on heteroscedastic data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08196v1</guid>
      <category>stat.ML</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>stat.ME</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Mariyam Khan, Shohei Shimizu, Thong Pham</dc:creator>
    </item>
    <item>
      <title>Vector Space of Cycles</title>
      <link>https://arxiv.org/abs/2606.08202</link>
      <description>arXiv:2606.08202v1 Announce Type: cross 
Abstract: Most statistical and machine learning methods for directed interactions focus on pairwise effects among variables. Even existing cyclic models represent feedback primarily through node-level dependencies, making large-scale recurrent organization difficult to estimate and compare. This limitation is particularly acute in biological and neural systems, where interactions are highly recurrent and involve many overlapping cycles. We introduce a variational framework for statistical inference on cyclic interactions. Directed interactions are represented as edge flows on a simplicial complex and evolved under an energy-minimizing dynamical system. The resulting dynamics separate transient interaction components from persistent harmonic flows, yielding a low-dimensional cycle space that captures stable recurrent organization. Rather than enumerating individual cycles, the proposed framework represents cyclic interactions as elements of a Hilbert space, enabling projection, averaging, comparison, and population-level statistical inference. We establish theoretical properties of the harmonic projection, including characterization of the cycle space, variance reduction, and population inference. Simulations demonstrate substantially improved recovery of cyclic structure in dense recurrent systems compared with existing directed-interaction methods. Applied to resting-state fMRI from 400 human subjects, the framework reveals reproducible large-scale cyclic organization that is not detectable through edgewise averaging. These results provide a scalable statistical framework for studying recurrent interactions in high-dimensional dynamical systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08202v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>physics.data-an</category>
      <category>q-bio.NC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Moo K. Chung, Anass B. El-Yaagoubi, Hernando Ombao</dc:creator>
    </item>
    <item>
      <title>Paediatric-HGNN: A Hybrid Heterogeneous Graph Neural Network for Detecting Disfluency in Children's Speech via Multiscale Acoustic Fusion</title>
      <link>https://arxiv.org/abs/2606.08210</link>
      <description>arXiv:2606.08210v1 Announce Type: cross 
Abstract: Automated stuttering detection (ASD) systems struggle with paediatric speech due to high acoustic variability in developing voices and the subtle distinction between pathological stuttering and typical developmental disfluencies. We introduce Paediatric-HGNN, a framework using a Context-aware Part-whole Interaction Network (CaPIN) tailored for paediatric data. Instead of conventional 1D signal modelling, our approach builds a heterogeneous graph capturing hierarchical relationships between lexical units (word nodes) and fine-grained acoustic segments (frame nodes). Trained on curated paediatric corpora (UCLASS and FluencyBank), Paediatric-HGNN achieves 82.4% weighted accuracy and a Typical Disfluency F1-score of 0.386. Modelling hierarchical lexical-acoustic interactions captures developmental "searching" behaviour, offering a more robust and interpretable tool for early clinical intervention.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08210v1</guid>
      <category>eess.AS</category>
      <category>cs.CL</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rashini Liyanarachchi, Rachael Mackay, Alison Short, Aditya Joshi, Erik Meijering</dc:creator>
    </item>
    <item>
      <title>Entanglement in the Quantum Volunteer's Dilemma</title>
      <link>https://arxiv.org/abs/2606.08227</link>
      <description>arXiv:2606.08227v1 Announce Type: cross 
Abstract: A well-known model in game theory, the Volunteer's Dilemma describes a group of $n$ players who decide whether to volunteer for a collective benefit at a personal cost, or to abstain and risk forfeiting the benefit altogether. A quantum version of this dilemma, developed within the Eisert-Wilkens-Lewenstein framework, allows each player to manipulate one qubit of a shared entangled state, leading to symmetric Nash equilibria with higher expected payoffs than in the classical game. Existing analyses, however, assume maximal entanglement. Within the same framework, we introduce a generalized Quantum Volunteer's Dilemma with a tunable entanglement parameter $\gamma$ and study the extent to which equilibrium behavior depends on the level of entanglement. We derive explicit conditions relating $\gamma$, the number of players, and the players' strategies under which symmetric Nash equilibria exist, focusing on two canonical strategy profiles: one for $2\leq n\leq 9$, and one for even $n$. We find that maximal entanglement is not required to sustain symmetric equilibria. Instead, equilibrium behavior persists above a threshold value, which we compute analytically in both cases. We also demonstrate that the threshold value directly depends on system size. This characterization is directly relevant for implementations on resource-constrained quantum devices, where entanglement is inherently limited.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08227v1</guid>
      <category>quant-ph</category>
      <category>cs.GT</category>
      <category>econ.TH</category>
      <category>math-ph</category>
      <category>math.MP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Noah Dane Hebdon, Dax Enshan Koh</dc:creator>
    </item>
    <item>
      <title>Post-Rejection Follow-up Sampling: A Methodology for Counterfactual Outcome Measurement in Algorithmic DEX Trading</title>
      <link>https://arxiv.org/abs/2606.08228</link>
      <description>arXiv:2606.08228v1 Announce Type: cross 
Abstract: Algorithmic trading systems on decentralised exchanges (DEXs) reject most candidate tokens they evaluate. The counterfactual outcome of rejected candidates (what would have happened had the system entered) is rarely measured. This paper introduces Post-Rejection Follow-up Sampling (PRFS). A separate tracking subsystem samples each rejected token's price and liquidity at a configurable cadence, over a horizon of up to twenty-four hours. PRFS produces the data needed to evaluate filter precision against actual market outcomes of rejected candidates, not against synthetic backtest reconstructions. The methodology, data architecture, and deposit format are described in Section III. The companion dataset contains 67,000 forward-outcome observation rows across 2,997 rejection events spanning 457 unique mints, collected over a continuous eight-day window (2026-04-10 to 2026-04-19, UTC). Approximately 55 percent of rejection events receive at least one forward observation; coverage at the mint level is complete. The principal binding constraint on downstream classification is per-event horizon density, not event-level coverage. PRFS is dataset-independent. It generalises to any algorithmic decision system in which rejections substantially outnumber executions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08228v1</guid>
      <category>q-fin.TR</category>
      <category>cs.LG</category>
      <category>q-fin.CP</category>
      <category>q-fin.ST</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.5281/zenodo.20043516</arxiv:DOI>
      <dc:creator>Arati Uday Kamat</dc:creator>
    </item>
    <item>
      <title>AeroSpectra Sentinel: An Auditable LLM Prompt-Chaining Decision-Support Workflow for Acute Asthma Risk Assessment from Respiratory Sounds and Clinical Signals</title>
      <link>https://arxiv.org/abs/2606.08247</link>
      <description>arXiv:2606.08247v1 Announce Type: cross 
Abstract: Acute asthma risk assessment requires rapid interpretation of respiratory sounds, oxygenation, airflow limitation, speech ability, work of breathing, mental status, and response to reliever therapy. Conventional audio-only classifiers can detect wheeze-like patterns but often lack transparent clinical reasoning and safe escalation logic. This paper presents AeroSpectra Sentinel, a client-side research prototype and decision-support workflow that combines short-time Fourier transform (STFT) respiratory sound analysis, lightweight machine-learning screening, clinical feature fusion, and a five-stage large language model (LLM) prompt-chaining process. The workflow separates signal acquisition, preprocessing, acoustic feature extraction, ML screening, clinical guardrails, and FHIR-ready reporting. We evaluated the audio screening component on a public respiratory sound dataset containing 1,211 WAV recordings from five labels. Using a stratified subset of 584 recordings, a random forest achieved 91.10% binary accuracy and 78.69% F1-score for asthma-vs-non-asthma screening, while a feature-based multilayer perceptron achieved 89.73% accuracy and 78.26% F1-score. A compact log-spectrogram CNN achieved 73.29% accuracy and 55.17% F1-score. Multiclass classification achieved 77.40% accuracy and 77.23% macro-F1. To evaluate the LLM workflow, we conducted a scenario-based audit on 40 simulated clinical vignettes comparing one-shot prompting, prompt chaining, prompt chaining with guardrails, and prompt chaining with guardrails plus FHIR schema validation. The guardrail-plus-schema variant achieved the strongest simulated safety and documentation consistency. AeroSpectra Sentinel is intended as a research prototype, not as a diagnostic medical device or clinically validated risk-assessment product.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08247v1</guid>
      <category>eess.AS</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>eess.SP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Aueaphum Aueawatthanaphisut</dc:creator>
    </item>
    <item>
      <title>QnRL: Quantum-Native Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2606.08276</link>
      <description>arXiv:2606.08276v1 Announce Type: cross 
Abstract: Quantum reinforcement learning (QRL) is a promising approach to learn effective decision strategies across several applications with stochastic environments. Instead of directly modeling the random variables that govern these environments, existing QRL architectures indirectly approximate environment behavior by estimating expected outcomes, which limits their expressive power and adaptive potential. Overcoming such challenges requires a novel QRL approach that exploits the distributional nature of quantum computers to directly model environment random variables as quantum state distributions. Hence, in this paper, a novel framework dubbed quantum-native reinforcement learning (QnRL) is proposed. QnRL is a distributional RL framework that learns conditional distributions naturally in Hilbert space via superimposed and entangled quantum states. Thus, QnRL can directly model the behavior of stochastic learning environments via the natural properties of quantum systems. QnRL accomplishes this via a novel, proposed quantum amplitude kickback (QuAK) algorithm that enables comparing the $n$-th power of the $m$-th moment of multiple superimposed distributions. It is theoretically proven that a conditional action policy distribution is distilled from the moments of a quantum generative model entirely within Hilbert space via QuAK, and optimized via QnRL. This complex distribution composition is also shown to provide extra dimensions for expressing environment correlations that are unknown to purely classical and classically-sampled quantum distributional models. Experimental results across diverse environments show that QnRL achieves up to $82.9\%$ higher evaluation scores, with up to $94.3\%$ fewer parameters on average, more accurately estimates the expected return for unseen observations, and better adapts to varying stochastic conditions compared to the baseline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08276v1</guid>
      <category>quant-ph</category>
      <category>cs.ET</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Alexander DeRieux, Walid Saad</dc:creator>
    </item>
    <item>
      <title>Strategic Type Spaces</title>
      <link>https://arxiv.org/abs/2606.08297</link>
      <description>arXiv:2606.08297v1 Announce Type: cross 
Abstract: We provide a strategic foundation for information: in any given game with incomplete information we define strategic quotients as information representations that are sufficient for players to compute best-responses to other players. We prove 1/ existence and essential uniqueness of a minimal strategic quotient called the Strategic Type Space (STS) in which a type is given by an interim correlated rationalizability hierarchy and represents a set of beliefs over other players' types and nature that rationalize this hierarchy and 2/ that the minimal STS has a recursive structure that is captured by a finite automaton.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08297v1</guid>
      <category>econ.TH</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Olivier Gossner, Rafael Veiel</dc:creator>
    </item>
    <item>
      <title>MEC-Cox: Machine-Learning-Assisted Generalized Entropy Calibration for ATT Marginal Hazard-Ratio Estimation</title>
      <link>https://arxiv.org/abs/2606.08305</link>
      <description>arXiv:2606.08305v1 Announce Type: cross 
Abstract: Externally controlled survival trials are increasingly used when concurrent randomized controls are infeasible, particularly in oncology and rare-disease settings with time-to-event endpoints. We target an average-treatment-effect-on-the-treated (ATT)-type marginal hazard-ratio estimand, comparing treatment with counterfactual control in the treated trial population, and estimate it using inverse-probability-weighted (IPW) Cox regression. Valid inference is challenging because IPW Cox regression depends on the weights through both event contributions and risk-set averages, making flexible machine-learning nuisance estimation difficult to incorporate directly. Building on machine-learning-assisted generalized entropy calibration (MEC) by Lee and Kim (2026), we propose MEC-Cox for ATT-weighted IPW Cox regression. The method begins with normalized source-propensity-score odds weights for external controls and then applies Bregman calibration to balance cross-fitted prognostic summaries between external controls and treated trial patients. The calibration basis may include control-survival predictions, Cox linear predictors, penalized-survival-model predictions, or other prognostic-score summaries. MEC-updated weights therefore play a dual role as source-transport and prognostic-score balancing weights. We establish consistency, characterize a calibration-induced efficiency gain, and develop a stacked sandwich variance estimator. Simulations show that MEC-Cox can reduce bias, increase efficiency, and improve coverage through flexible machine-learning-assisted adjustment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08305v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Se Yoon Lee, Yonghyun Kwon, Jae Kwang Kim</dc:creator>
    </item>
    <item>
      <title>MetaboliSim: a Python implementation of the Mader model for dynamic and steady-state simulation of muscular energy metabolism</title>
      <link>https://arxiv.org/abs/2606.08366</link>
      <description>arXiv:2606.08366v1 Announce Type: cross 
Abstract: The Mader model is the most widely used mathematical framework for muscular energy metabolism in German-language sport science, underpinning lactate diagnostics, maximal lactate steady state (MLSS) estimation and training prescription. Despite decades of use, neither its dynamic ODE formulation nor its steady-state equations have been available as open code, leaving results based on the model impossible to reproduce independently. We close this gap with MetaboliSim, an open-source Python implementation of both formulations: a dynamic model that integrates the five-variable ODE system (phosphate potential, $\dot{V}\mathrm{O}_2$, muscle and blood lactate, and glycogen) with a fourth-order Runge-Kutta scheme, and a steady-state model that computes MLSS power and the lactate-power relationship in one- and two-compartment variants. We verified implementation correctness against published reference values and assessed physiological plausibility across constant-load, step-test, sprint and running protocols. The implementation reproduces the published reference output within stated tolerances and remains numerically stable throughout (halving the time step changes blood lactate by less than 0.01 mmol/L), with both formulations yielding congruent MLSS estimates. Key physiological behaviour ($\dot{V}\mathrm{O}_2$ on-kinetics, lactate accumulation, PCr dynamics and the sub/supra-MLSS separation) emerges directly from the model equations without protocol-specific tuning, and a sensitivity analysis shows MLSS power varying approximately linearly with $\dot{V}\mathrm{O}_{2\max}$ and nonlinearly with $\dot{V}\mathrm{La}_{\max}$. As the first openly available implementation of the complete Mader model (AGPL-3.0), MetaboliSim lets independent groups reproduce, verify and build on published model-based results. Source code: https://codeberg.org/3phos/metabolisim; Platform: https://metabolisim.org</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08366v1</guid>
      <category>q-bio.QM</category>
      <category>cs.MS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Katharina Dunst, Vincent Scharf, Clemens Hesse, Alexander Asteroth</dc:creator>
    </item>
    <item>
      <title>Programmable Silicon Retina on Pixel Processor Array</title>
      <link>https://arxiv.org/abs/2606.08370</link>
      <description>arXiv:2606.08370v1 Announce Type: cross 
Abstract: Standard dynamic vision sensors approximate retinal processing by detecting temporal contrast changes, offering high speed and high dynamic range. In this work, we explore whether incorporating additional biologically inspired processing stages - specifically spatial filtering and gain control - can offer advantages for certain downstream tasks such as saliency prediction. We present the first implementation of a multi-stage Silicon Retina model on the SCAMP-5 Pixel Processor Array, along with a GPU-based simulation framework. We evaluate the performance of our model on Video Intensity Reconstruction and Video Saliency Prediction. While the bio-inspired model is less effective at reconstructing absolute intensity frames, it achieves a 13\% reduction in saliency prediction loss compared with the standard DVS event representation, while reducing the event rate by approximately 47\%. These experiments use a lightweight $\approx 100$k-parameter FireNet-style network, adapted from event-based reconstruction to saliency prediction. The results suggest that the silicon retina's "information distillation" mechanism can achieve a more efficient representation for downstream neural networks, particularly in bandwidth-constrained edge applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08370v1</guid>
      <category>eess.IV</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Maciej Lewandowski, Prince Philip, Alexandre Marcireau, Chetan Singh Thakur, André van Schaik, Piotr Dudek</dc:creator>
    </item>
    <item>
      <title>A Switching Beamformer for Highly Non-Stationary Environments</title>
      <link>https://arxiv.org/abs/2606.08385</link>
      <description>arXiv:2606.08385v1 Announce Type: cross 
Abstract: Adaptive beamforming is a cornerstone of array signal processing, yet its performance often collapses in the face of complex, rapidly changing interference. When interferers appear or move unpredictably, conventional estimators encounter a fundamental memory trade-off: short windows enable rapid tracking but suffer from high estimation variance, while long windows provide stable rejection but fail to adapt to shifts. This challenge is resolved by introducing the Universal Switching Beamformer (USB), which integrates competitive sequential prediction into the beamforming architecture. By employing a linear transition diagram, the USB implicitly maintains an exponentially large family of candidate covariance histories and dynamically re-weights them based on their cumulative output power. This mechanism allows the beamformer to automatically vary its effective memory length without explicit change detection or heuristic parameter tuning. A theoretical upper bound is proven on the regret relative to an omniscient oracle that selects the best piecewise-stationary covariance model in hindsight. Extensive simulations and experiments on the SwellEx-96 dataset demonstrate that the USB achieves the agility of short-window estimators and the precision of long-term integration, providing a principled solution for tracking highly non-stationary scenes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08385v1</guid>
      <category>eess.SP</category>
      <category>cs.IT</category>
      <category>cs.SD</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <category>math.IT</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Manan Mittal, Ryan M. Corey, John R. Buck, Andrew C. Singer</dc:creator>
    </item>
    <item>
      <title>X-Palm: Paired Multispectral-to-Smartphone Dataset for Cross-Domain Palmprint Authentication</title>
      <link>https://arxiv.org/abs/2606.08437</link>
      <description>arXiv:2606.08437v1 Announce Type: cross 
Abstract: The palmprint modality offers a privacy-preserving biometric solution, yet its deployment is hindered by the domain gap between controlled enrollment and unconstrained authentication. Existing datasets are largely restricted to controlled setups and fail to capture the compound variability of real-world environments. In this paper, we introduce X-Palm, a cross-domain dataset comprising 6,006 palm images from 103 individuals (206 hands). To the best of our knowledge, X-Palm is the first palmprint dataset providing paired-identity acquisition specifically designed to bridge the gap between reliably controlled multispectral enrollment and unconstrained mobile authentication while encompassing a broad spectrum of in-the-wild variability. Unlike existing datasets that focus on one or a few variations, X-Palm addresses the massive modality and environmental shifts encountered in practical deployments by capturing paired data for identities across two distinct domains: (1) a controlled multispectral palmprint setting using our custom-developed scanner, and (2) an unconstrained, participant-driven smartphone palmprint setting that incorporates simultaneous variations in hardware, hand pose, illumination, background, camera-to-hand distance, perspective, and palm surface conditions (e.g., moisture and occlusions). Our extensive benchmarks of 12 SOTA models reveal that while existing methods achieve high performance on controlled data, they experience severe performance collapse on X-Palm. Conversely, models trained on X-Palm demonstrate consistent robustness across domains, positioning X-Palm as a valuable resource for training models towards real-world, cross-domain generalization. Data access instructions and the related benchmarking code are publicly available at: https://github.com/X-Palm/X-Palm-2026</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08437v1</guid>
      <category>eess.IV</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jamal Seyedmohammadi, Pai Chet Ng, Angelo Genovese, Zhixiang Chi, Jeannie Lee, Konstantinos N. Plataniotis</dc:creator>
    </item>
    <item>
      <title>Improving Bayesian Optimization via Training-Aware Conditional Diffusion Models</title>
      <link>https://arxiv.org/abs/2606.08438</link>
      <description>arXiv:2606.08438v1 Announce Type: cross 
Abstract: Bayesian optimization (BO) is a widely used approach for black-box optimization that uses a Gaussian process (GP) as a surrogate and guides sequential evaluations via an acquisition function, with the ultimate goal of locating the global optimum $\mathbf{x}^{\star}$. To align with this goal, information-based acquisition functions such as Predictive Entropy Search (PES) model $\mathbf{x}^{\star}$ as a random variable and reduce the entropy of its distribution, but approximating this distribution via traditional GP posterior sampling is computationally expensive. To address this limitation, we leverage Conditional Diffusion Models (CDMs) to efficiently approximate the distribution of $\mathbf{x}^{\star}$ and develop BO-inherent training strategies for CDMs. Motivated by the structural properties of the CDM-learned distribution, we further develop an acquisition strategy termed Diffusion-based Mode Seeking (DMS) to guide the sequential evaluation. We establish a sub-optimality guarantee for the CDM-learned distribution and demonstrate through extensive experiments that DMS outperforms standard BO baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08438v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yilin Zheng, Haowei Wang, Szu Hui Ng, Enlu Zhou</dc:creator>
    </item>
    <item>
      <title>Quantum Kravchuk Transform using $\mathfrak{su}(2)$ fast-forwarding</title>
      <link>https://arxiv.org/abs/2606.08443</link>
      <description>arXiv:2606.08443v1 Announce Type: cross 
Abstract: We present a quantum algorithm for the Kravchuk transform that scales logarithmically in both the dimension and the inverse of the error parameter. The quantum Kravchuk transform maps computational basis states to states with amplitudes proportional to Kravchuk functions. We achieve this by combining two key techniques: the structural relationship between the Kravchuk transform and the Lie algebra $\mathfrak{su}(2)$, and a recent fast-forwarding simulation method for $\mathfrak{su}(2)$ operators in the oscillator representation. More precisely, we first establish the map from the Kravchuk transform in the computational basis to $\mathfrak{su}(2)$ in the Fock basis. Then, building on this connection, we apply the fast-forwarding method to achieve an efficient quantum Kravchuk transform.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08443v1</guid>
      <category>quant-ph</category>
      <category>cs.CC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chaowen Guan, Akshit Katiyar</dc:creator>
    </item>
    <item>
      <title>LOTTERY: Learning from Reference-Only Samples in Two-Sample Testing under Size Asymmetry</title>
      <link>https://arxiv.org/abs/2606.08460</link>
      <description>arXiv:2606.08460v1 Announce Type: cross 
Abstract: Data-adaptive two-sample testing assesses if two samples come from the same distribution, using a discrepancy learned from the data (e.g., via kernel-based feature representations). Such methods typically rely on data splitting to decouple learning from testing and control type I error. However, this paradigm is ill-suited to few-shot settings with severe sample-size imbalance: abundant reference samples are available, while only a handful of query samples arrive. In this paper, we show how this imbalance can be leveraged constructively. Using abundant reference data, we learn reference-dependent representations that summarize salient structure of the reference distribution and provide informative signals for detecting departures. We incorporate a collection of representation families that capture both global and local structure, and adaptively weight them using only reference samples via an uncertainty-guided principle. Theoretically, we establish permutation-based type I error control and show consistency of the aggregated test: as the sample sizes grow, the test power converges to one whenever the representation set contains at least one consistent representation. Empirically, our aggregation achieves strong performance across a range of benchmarks while retaining type I error control.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08460v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>ICML 2026</arxiv:journal_reference>
      <dc:creator>Xunye Tian, Zhijian Zhou, Liuhua Peng, Feng Liu</dc:creator>
    </item>
    <item>
      <title>Querying Counterfactuals on Tissue Graphs with Supervised Disentanglement</title>
      <link>https://arxiv.org/abs/2606.08493</link>
      <description>arXiv:2606.08493v1 Announce Type: cross 
Abstract: \textit{Tissue graph counterfactuals} ask how a cell's expression would change under altered spatial neighbor contexts. Such queries are central to predicting cell behavior in tissues, but lack a unified definition, with existing methods targeting specific intervention types or treating cells as i.i.d. In this work, we first formalize \textit{tissue graph counterfactuals} as a class of spatial interventions that either rewire connections between cells (\textit{edge perturbation}) or modify the expression of their neighbors (\textit{node perturbation}). We then introduce \textit{Cellina} (https://cellina.readthedocs.io), a framework that uses supervised disentanglement to decompose a cell's intrinsic state from its spatial context, using the latter as a conditioning input for counterfactual predictions. Across benchmarks spanning over 2.5 million spatially-resolved cells in colorectal cancer and mouse brain, \textit{Cellina} outperforms spatially-informed and non-spatial competitors in tissue perturbations, disentanglement, and scalability. Additionally, we show that \textit{Cellina} reveals biologically distinct cancer subdomains in an unsupervised manner and enables targeted neighbor perturbation simulations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08493v1</guid>
      <category>q-bio.GN</category>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Abdul Moeed, Stefan Schrod, Martin Rohbeck, Marc Jan Bonder, Pavlo Lutsik, Oliver Stegle, Daniel Dimitrov</dc:creator>
    </item>
    <item>
      <title>Fast and Robust On-Device Speaker Diarization: Relative Minimum Cluster Size for Stride-Accelerated Pipelines</title>
      <link>https://arxiv.org/abs/2606.08505</link>
      <description>arXiv:2606.08505v1 Announce Type: cross 
Abstract: Speech applications such as meeting transcription and voice agents would benefit from on-device speaker diarization, but practical adoption is limited by inference cost. We study how far a Pyannote 3.1-based pipeline can be accelerated on consumer hardware (an RTX 5070 Ti GPU and an Apple M4 laptop) while preserving diarization error rate (DER). A simple recipe (a coarser segmentation stride and per-chunk embedding) yields multi-fold speedups and is DER-neutral on AMI, but degrades sharply on in-the-wild data: on VoxConverse, DER rises from 0.075 to 0.113. We trace the failure to speaker under-counting in the clustering stage, caused by a fixed minimum cluster size interacting with the reduced number of embeddings per speaker. We propose a relative minimum cluster size, mcs = round(f * n) with f = 0.01, which adapts to the embedding budget per recording. A single value of f recovers VoxConverse DER to 0.079 (about 89% of the lost accuracy) while keeping AMI flat, and the accelerated pipeline reaches up to 12.2x speedup on AMI (MPS) over our CAM++ baseline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08505v1</guid>
      <category>eess.AS</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Fumiaki Yamaguchi</dc:creator>
    </item>
    <item>
      <title>Almost balanced ordered biclique covering of graphs</title>
      <link>https://arxiv.org/abs/2606.08506</link>
      <description>arXiv:2606.08506v1 Announce Type: cross 
Abstract: Let $f(n,k)$ be the minimum size of a collection of bicliques such that (i) every edge of the complete graph $K_n$ is covered by at least one and at most $k$ bicliques in the collection, and (ii) for each edge $\{u,v\}$, the number of bicliques in which $u$ appears in the first class and $v$ in the second class differs by at most one from the number of bicliques in which $u$ appears in the second class and $v$ in the first class.
  For $k=1$, $f(n,k)$ reduces to the biclique partition number of $K_n$, and the Graham--Pollak theorem gives $f(n,1)=n-1$. For $k=2$, $f(n,k)$ is the ordered biclique partition number of $K_n$, for which it is known that $c_1 n^{1/2} \le f(n,2) \le c_2 n^{1/2+o(1)}$ for some positive constants $c_1$ and $c_2$. In this note, we establish almost tight bounds for $f(n,k)$ for general $k$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08506v1</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Anand Babu, Ervin Ranjan, Maddipati Deshith Sai, Jatla Naga Sidhartha, Anagh Indu Suresh, Sreedhara Vishwas</dc:creator>
    </item>
    <item>
      <title>Fixed-Parameter Tractability of $t$-Uniform Hypergraphicality</title>
      <link>https://arxiv.org/abs/2606.08523</link>
      <description>arXiv:2606.08523v1 Announce Type: cross 
Abstract: We study the $t$-uniform hypergraphicality problem under a compressed representation of the degree sequence. Instead of listing all vertex degrees explicitly, the input consists of pairs $$ (\delta_1,n_1),\dots,(\delta_k,n_k), $$ meaning that exactly $n_i$ vertices have degree $\delta_i$. Thus the parameter $k$ denotes the number of distinct degrees.
  Although deciding $t$-hypergraphicality is NP-complete for every fixed $t&gt;2$, we prove that the problem is fixed-parameter tractable parameterized by $(k,t)$. Our result shows that tractability extends substantially beyond previously known bounded-range regimes: even degree sequences with large overall degree spread can be handled efficiently when the number of distinct degrees is bounded.
  Our approach decomposes hyperedges according to their types with respect to the degree classes, yielding a bounded-dimension spectrum representation. Using balancing hinge-flips, we show that every feasible spectrum can be transformed into a realization of the prescribed degree sequence. This leads to an integer programming feasibility formulation with $$ \binom{t+k-1}{k-1} $$ variables. Applying Lenstra's theorem yields an FPT algorithm running in time $$ f(k,t)\cdot \mathrm{poly}(L), $$ where $L$ denotes the encoding length of the compressed input.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08523v1</guid>
      <category>math.CO</category>
      <category>cs.CC</category>
      <category>cs.DM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Riley Brown, Istvan Miklos</dc:creator>
    </item>
    <item>
      <title>A Taxonomy of Real-World Asset Tokenization for Blockchain-Based Financial Infrastructure</title>
      <link>https://arxiv.org/abs/2606.08534</link>
      <description>arXiv:2606.08534v1 Announce Type: cross 
Abstract: Real-world asset (RWA) tokenization has emerged as a prominent application of blockchain technology, enabling off-chain financial and non-financial assets to be represented through blockchain-based instruments. However, deployed RWA systems remain difficult to compare because legal claims, custody arrangements, token mechanics, verification processes, and on-chain integrations are often described separately. This paper develops a systems-level taxonomy of RWA tokenization to classify how off-chain assets are legally, economically, and technically represented on-chain. Following an iterative taxonomy-development method, we organize twenty-three dimensions into five components: governance, asset structure, token properties, distributed ledger technology, and economy. We apply the taxonomy to twenty major RWA systems selected by market capitalization and compare their design choices across asset classes and implementation models. The classification shows that current RWA tokenization is predominantly implemented through hybrid architectures: blockchain tokens support representation, transfer control, redemption workflows, pricing, and composability, while core legal guarantees remain anchored in off-chain legal wrappers, custodial arrangements, compliance processes, and verification mechanisms. The analysis also reveals recurring documentation gaps concerning voting rights, dispute forums, burn mechanics, supply constraints, and reserve verification. Overall, the taxonomy provides a structured basis for comparing RWA systems, identifying design patterns and limitations, and supporting future research on blockchain-based financial infrastructure.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08534v1</guid>
      <category>econ.GN</category>
      <category>cs.CY</category>
      <category>q-fin.EC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Giorgio Vella, Luca Pennella, Mark C. Ballandies</dc:creator>
    </item>
    <item>
      <title>Block coordinate descent for joint delay-energy optimization in multi-hop D2D networks</title>
      <link>https://arxiv.org/abs/2606.08544</link>
      <description>arXiv:2606.08544v1 Announce Type: cross 
Abstract: In multi-hop device-to-device (D2D) networks, the optimization of network-level metrics is particularly difficult due to the tight coupling between network-layer routing and physical-layer resource allocation. Departing from traditional average-performance metrics, this paper addresses the joint optimization of routing paths, transmission power, and bandwidth allocation. We formulate a generalized cost function to minimize the maximum transmission time (i.e., the bottleneck delay) alongside the total energy consumption. To tackle the resulting highly non-convex formulation, we propose a novel block coordinate descent (BCD) framework. At the network layer, we develop two adaptive routing algorithms: a matrix-free Frank-Wolfe (MF-FW) algorithm for fast execution in dense topologies, and a low-rank primal-dual interior-point method (LR-PDIPM) that bypasses dense matrix inversions via the Sherman-Morrison formula for high-precision solutions. At the physical layer, we design a parallel dual ascent algorithm leveraging a time-domain perspective transformation to solve the resource allocation subproblem to global optimality. The proposed BCD framework is proven to converge to an $\epsilon$-neighborhood of a stationary point. Through comprehensive experiments, the proposed BCD framework establishes its superiority in achieving the optimal delay-energy trade-off. Specifically, the LR-PDIPM variant achieves a maximum 9.14-fold reduction in total energy consumption and up to an order of magnitude improvement in energy efficiency, while maintaining a bounded maximum delay gap (up to 3.78-fold) relative to the best baseline. Meanwhile, the warm-start MF-FW variant identifies near-optimal solutions in mere seconds, serving as a highly practical engineering approach.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08544v1</guid>
      <category>math.OC</category>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kai-Xiang Hu, Jacek Gondzio, Caixia Kou</dc:creator>
    </item>
    <item>
      <title>G-MaP-SE: Guided Speech Enhancement via GMM-Based Prior Matching</title>
      <link>https://arxiv.org/abs/2606.08580</link>
      <description>arXiv:2606.08580v1 Announce Type: cross 
Abstract: Using speaker embeddings as conditioning can strengthen speech enhancement, but most methods either require clean enrollment audio or rely on embeddings extracted from noisy speech, which are fragile under noise and domain shift. We propose G-MaP-SE, a guided enhancement framework that builds a clean-speech embedding prior with a Gaussian Mixture Model (GMM) and refines a noisy conditioning embedding by matching it to this prior. The matched prior embedding is then injected into a time-frequency enhancement backbone via a lightweight gated fusion module. Experiments on VoiceBank+DEMAND and DNS Challenge 2020 datasets show that the proposed prior matching consistently outperforms noisy conditioning and substantially narrows the gap to an oracle clean-conditioning upper bound, while requiring no enrollment audio at inference time. The code, audio samples, and checkpoint are available.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08580v1</guid>
      <category>eess.AS</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yike Zhu, Ziqian Wang, Zikai Liu, Xingchen Li, Zhuangqi Chen, Xianjun Xia, Chuanzeng Huang, Lei Xie</dc:creator>
    </item>
    <item>
      <title>Improving the sharpness in neural network-based parametric post-processing of ensemble forecasts</title>
      <link>https://arxiv.org/abs/2606.08587</link>
      <description>arXiv:2606.08587v1 Announce Type: cross 
Abstract: Statistical post-processing has proven to be an effective tool for improving ensemble forecasts of different weather variables. Case studies show that post-processing can remedy the typically underdispersive and potentially biased behaviour of the ensemble while optimizing a proper scoring rule expressing the forecast skill. The price of these positive effects is generally a deterioration in sharpness: the width of the central prediction intervals and the uncertainty of the predictions increase, especially for shorter lead times. This work aims to reduce the extent of the latter phenomenon for neural network-based parametric post-processing methods by extending the network's loss function with a penalty term. We demonstrate the effect of the proposed technique for 2m temperature ensemble forecasts of the European Centre for Medium-Range Weather Forecasts downloaded from the EUPPBench benchmark dataset and verified against synoptic observations. Here, the predictive distribution is Gaussian, and we use the continuous ranked probability score (CRPS) as the loss function. The case studies confirm a substantial relative decrease ($8.2\%-12.5\%$) in the width of the nominal central prediction interval compared to the width of the predictive distribution computed without the penalty term, while there is no deterioration in the mean CRPS of probabilistic forecasts or in the RMSE of the predictive mean.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08587v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ágnes Baran, Máté Mihalina</dc:creator>
    </item>
    <item>
      <title>Coil-Integrated Alignment Sensor for Real-Time Feedback of Coil-Scalp Contact Point and Angle During Transcranial Magnetic Stimulation (TMS)</title>
      <link>https://arxiv.org/abs/2606.08618</link>
      <description>arXiv:2606.08618v1 Announce Type: cross 
Abstract: Whereas coil positioning in transcranial magnetic stimulation (TMS) to reach a specific cortical target with modern focal stimulation coils has been intensively studied, the alignment and contact of a coil with the head is often ignored. Focal figure-of-eight coils have a point on the surface where they generate the largest induced electric field. This point should touch the head first, and the coil should be approximately tangential to the head at this point. Previous research has demonstrated the large impact of the coil not touching the head at the right point and that many operators struggle to establish or maintain the correct coil-scalp alignment. This paper presents a support technology that can monitor the exact position of the contact point as well as the pressure to provide feedback to users. As the system uses exclusively components from consumer electronics, the sensor is low-cost. Through proper design, we achieved sufficient robustness so that the sensor neither resets during TMS pulses nor shows any detectable degradation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08618v1</guid>
      <category>physics.med-ph</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>B. Seyed, M. Koehler, S. M. Goetz</dc:creator>
    </item>
    <item>
      <title>Parameter Tuning with Generalization Guarantees for GPU-Accelerated Linear Programming</title>
      <link>https://arxiv.org/abs/2606.08638</link>
      <description>arXiv:2606.08638v1 Announce Type: cross 
Abstract: Recent research has developed practical, parallelizable first-order methods for large-scale linear programming, but performance is highly dependent on hyperparameter selection. We derive generalization guarantees for hyperparameter tuning within (cu)PDLP, a state-of-the-art first-order LP solver designed for modern hardware. First, we pin down the behavior of PDHG, the primal-dual hybrid gradient algorithm that underlies PDLP, as a function of its step size and primal weight, leading to linear sample complexity guarantees for learning those parameters. We then conduct a structural analysis of PDLP, which augments PDHG with several specialized techniques such as preconditioning, adaptive step sizes, averaging, adaptive restarts, and smoothed primal weight updates. Our analysis captures the behavior of the solution trajectory as a function of the hyperparameters and leverages recent advances in data-driven algorithm design to obtain polynomial sample complexity guarantees for learning those hyperparameters. Finally, we conduct proof-of-concept experiments that demonstrate the need for data-driven PDLP parameter tuning. Our results showcase the versatility of the data-driven algorithm design toolkit for principled hyperparameter tuning within solver-grade implementations of complex modern optimization algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08638v1</guid>
      <category>math.OC</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Siddharth Prasad, Dravyansh Sharma</dc:creator>
    </item>
    <item>
      <title>Nonlocal Teams and Information Structures</title>
      <link>https://arxiv.org/abs/2606.08645</link>
      <description>arXiv:2606.08645v1 Announce Type: cross 
Abstract: We look at Bell inequalities through the lens of information structures in stochastic teams. We consider the usual CHSH game and a dynamic variant of it to study how various classes of strategies (classical, projective, and quantum) behave under team-theoretic solution concepts. We find that projective strategies (where each player performs projective measurements) enjoy important properties in the usual CHSH game, but these properties do not carry over to its dynamic version. These results shed light on the delicate interplay of information structure in quantum strategies and the fragility of some well-known ideas under changes of information structure.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08645v1</guid>
      <category>quant-ph</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Drishti Baruah, Sachin Teli, Ankur A. Kulkarni</dc:creator>
    </item>
    <item>
      <title>Reconstructing Synthetic SDO/AIA 193 A EUV Images from He I 10830 A Observations with Diffusion Model Translator</title>
      <link>https://arxiv.org/abs/2606.08652</link>
      <description>arXiv:2606.08652v1 Announce Type: cross 
Abstract: Routine full-disk EUV imaging has been available only since the era of modern missions such as SOHO and SDO. To extend EUV coronal context into earlier periods, we leverage the multi-decade availability of full-disk He I observations, whose absorption is modulated by coronal irradiance and magnetic topology and is widely used as a proxy for open-field regions. We present a diffusion-based conditional image translation framework, Coronal Hole-aware Diffusion Model Translator (CH-aware DMT), to reconstruct synthetic SDO/AIA 193 \AA{} EUV images from He I inputs. The model is trained on temporally co-aligned SOLIS He I and AIA 193 \AA{} pairs spanning 2011--2015 using a month-based split, where January--October are used for training, November for validation, and December for testing. On the held-out test set, the reconstructions preserve dominant full-disk EUV morphology (CC=0.92) and recover CH-related low-intensity structure (CC=0.84). We further assess historical applicability by (1) comparing reconstructed AIA 193 \AA{} morphology with SOHO/EIT 195 \AA{} over 2005--2015; (2) comparing reconstructed AIA 193 \AA{} images generated from KPVT He I inputs against Yohkoh/SXT soft X-ray observations; and (3) evaluating long-term reconstructed disk-integrated emission statistics against observational EUV series and independent solar activity proxies (sunspot number and F10.7 radio flux over 1974--2015). These results indicate that CH-aware DMT conditioned on He I can provide a physically plausible synthetic AIA 193 \AA{} coronal proxy for historical studies, supporting multi-decade analyses of large-scale coronal evolution before direct EUV imaging was available.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08652v1</guid>
      <category>astro-ph.SR</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Marco Marena, Qin Li, Haimin Wang, Haodi Jiang, Prajwal Shah, Bo Shen</dc:creator>
    </item>
    <item>
      <title>Uncertainty Principles for the Number Theoretic Transform</title>
      <link>https://arxiv.org/abs/2606.08662</link>
      <description>arXiv:2606.08662v1 Announce Type: cross 
Abstract: Motivated by polynomial identity testing with exponentials (Li and Wu, ITCS'26), we study uncertainty principles for the number-theoretic transform (NTT). We show that the NTT satisfies strong sparsity tradeoffs: For every fixed prime $q$ and for all but finitely many primes $p \equiv 1 \pmod q$ every nonzero $f\in \mathbb F_p^{\mathbb Z_q}$ and its number-theoretic transform $\hat f$ satisfy \[ |\mathrm{Supp}(f)| + |\mathrm{Supp}(\hat f)| \ge q+1. \] Thus, a $k$-sparse function has transform support at least $q-k+1$. As our main technical contribution, we prove a probabilistic version of the above uncertainty principle, averaged over primes $p$, in the regime $p=q^{O(1)}$.
  As an application, we obtain a black-box identity test for $k$-sparse exponential polynomials of degree at most $d$ with vanishing soundness error, for $q$ moderately larger than $k$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08662v1</guid>
      <category>math.NT</category>
      <category>cs.CR</category>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Giulio Malavolta, Alon Rosen</dc:creator>
    </item>
    <item>
      <title>Rank Intervals for Leaderboards: A Hierarchical Framework for Model Evaluation</title>
      <link>https://arxiv.org/abs/2606.08679</link>
      <description>arXiv:2606.08679v1 Announce Type: cross 
Abstract: Pretrained models are often evaluated on multi-task leaderboards to measure their applicability in diverse contexts. However, current methods for aggregating performance across tasks into leaderboard-level rankings do not address the uncertainty and variability at the task level. While recent works have proposed interval-based model rankings, the principled aggregation of uncertainty from individual tasks to leaderboard-level rankings remains unaddressed, and variation in models' performance across tasks is frequently obscured. In this work, we introduce a hierarchical framework that constructs model rank intervals with statistical guarantees at both levels: task-level rank confidence intervals from pairwise comparisons, and leaderboard-level rank prediction intervals using a conformal approach. This enables reliable quantification of model rank for each observed task and for new potential tasks. Experiments on simulated data and the TabArena and PromptEval (MMLU) benchmarks show that our method yields statistically valid and informative intervals, enabling reliable, uncertainty-aware model ranking on leaderboards.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08679v1</guid>
      <category>stat.ML</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <category>stat.ME</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bitya Neuhof, Yuval Benjamini</dc:creator>
    </item>
    <item>
      <title>Discovering and decoding latent mean-field structure with variational autoencoders</title>
      <link>https://arxiv.org/abs/2606.08694</link>
      <description>arXiv:2606.08694v1 Announce Type: cross 
Abstract: Generative models are increasingly used to capture correlations in many-body systems, but the representations they learn remain largely opaque to physical interpretation. Here, we establish an intuitive criterion that quantifies the capacity of a variational autoencoder (VAE) to faithfully reconstruct the joint probability distribution of a many-body system. In a nutshell, a bound on the VAE capacity is obtained by comparing the rate of the latent channel to the bipartite mutual information of the data. Using this bound, we show that the conditionally independent decoder of any successful VAE is structurally identical to a finite-size mean-field factorization. Hence, a successful reconstruction is direct evidence for a latent mean-field theory, and the microscopic parameters of that theory can be read off the trained decoder. We validate these conclusions on a hierarchy of solvable models with scalar (Curie-Weiss), vector (Hopfield) and tensor (Maier-Saupe) order parameters, recovering the full Hopfield pattern matrix from equilibrium samples alone. We find that, when applied to Salamander retinal recordings, a two-latent VAE reproduces the population statistics with only two effective collective variables, allowing us to recover the `stored patterns' of the neural population and write a generalized Hopfield model which correctly models the experimental data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08694v1</guid>
      <category>cond-mat.soft</category>
      <category>cond-mat.stat-mech</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Marco Biroli, Max Welling, Vincenzo Vitelli</dc:creator>
    </item>
    <item>
      <title>EL-Shellability of the poset of ranked cactuses</title>
      <link>https://arxiv.org/abs/2606.08724</link>
      <description>arXiv:2606.08724v1 Announce Type: cross 
Abstract: Recently the poset of ranked cactuses $(\mathfrak{P}(X),\preceq)$ was introduced. For a finite set $X$, this poset consists of a set $\mathfrak{P}(X)$ of certain collections of ordered pairs of subsets of $X$ together with an ordering $\preceq$ that is similar to the refinement ordering of partitions of a finite set. In addition, the maximal chains in this poset correspond to binary ranked cactuses, a fact which can be used to construct the so-called space of equidistant cactuses. In this paper, we show that the poset of ranked cactuses is EL-shellable. As a consequence we also show that the proper part of the link of the origin of the space of equidistant cactuses has the homotopy type of a wedge of spheres.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08724v1</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Vincent Moulton, Andreas Spillner, Antonia Stavemann</dc:creator>
    </item>
    <item>
      <title>Numerical Analysis on Backward Stochastic Differential Equations by Finite Transposition Method</title>
      <link>https://arxiv.org/abs/2606.08731</link>
      <description>arXiv:2606.08731v1 Announce Type: cross 
Abstract: In this paper, we propose a finite transposition method to solve backward stochastic differential equations (BSDEs, for short). Based on the transposition solution theory for BSDEs, our method offers a promising way of efficiently computing solutions, and can be regarded as the analogue for BSDEs of the classical finite element method for partial differential equations. Our method has the advantage that its conditional expectations are easily computable.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08731v1</guid>
      <category>math.PR</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>Penghui Wang, Yanqing Wang, Xu Zhang</dc:creator>
    </item>
    <item>
      <title>Algebra of Bivariate-Bicycle Surface Codes</title>
      <link>https://arxiv.org/abs/2606.08771</link>
      <description>arXiv:2606.08771v1 Announce Type: cross 
Abstract: We relate the properties of bivariate-bicycle-surface (BBS) codes, constructed from a pair of bivariate polynomials over a finite field, to the number and location of their common roots in the extension field. The number of roots $(x,y)$ with finite, non-zero coordinates -- counted with algebraic multiplicity -- determines the dimension of the codes. This dimension is invariant under monomial automorphisms of the Laurent polynomial ring. Conversely, roots with zero or infinite $x$- or $y$-coordinates indicate that specialized generators are required near the corresponding boundary (e.g., the left or right boundary for a root where $x$ is zero or infinite, respectively). These roots can appear or disappear under monomial transformations, which reveals the structure of tilted boundaries. Based on these results, we formulate a prescription for constructing BBS codes that works for regions with rectangular, diagonal, and arbitrarily tilted boundaries. A key advantage of this approach is that no corner corrections are needed, provided the polynomials satisfy orientation-specific edge conditions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08771v1</guid>
      <category>quant-ph</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Renyu Wang, Leonid P. Pryadko</dc:creator>
    </item>
    <item>
      <title>OptMuon: Closed-Loop Orthogonalized Momentum Methods for Stochastic Optimization with Zero-Noise Optimality</title>
      <link>https://arxiv.org/abs/2606.08783</link>
      <description>arXiv:2606.08783v1 Announce Type: cross 
Abstract: Orthogonalized momentum updates, as used in Muon-style optimizers, have recently shown strong empirical stability in large-scale deep learning. However, existing orthogonalized methods are typically paired with constant or open-loop magnitude rules, and therefore do not explicitly calibrate their update magnitudes from the observed optimization trajectory. Motivated by the closed-loop perspective behind Lipschitz-free and noise-adaptive methods, we propose OptMuon, a family of adaptive momentum orthogonalization methods for stochastic nonconvex optimization. OptMuon combines Muon-style polar-factor directions with a trajectory-dependent AdaGrad-Norm-type coefficient schedule, so that the update magnitude is determined by the observed gradient and momentum history rather than by a prescribed Lipschitz-dependent rule. The schedule does not use the smoothness constant, the variance level, or the bounded-gradient constant in parameter selection, and its running-maximum correction prevents isolated gradient spikes from causing excessive coefficient collapse. Under lower-boundedness, unbiased stochastic gradients with bounded variance, smoothness, and an almost-sure bounded stochastic-gradient condition, we prove two complementary guarantees. OptMuon-A achieves the noise-adaptive rate \(\tilde{\mathcal O}(T^{-1/2}+\sigma^{1/2}T^{-1/4})\) under average smoothness, while OptMuon-I achieves \(\tilde{\mathcal O}(T^{-1/2}+\sigma^{1/3}T^{-1/3})\) under individual smoothness. In the zero-noise regime, both bounds automatically reduce to a nearly optimal deterministic first-order rate \(\tilde{\mathcal O}(T^{-1/2})\) without manual hyperparameter retuning. These results show that closed-loop scalar adaptation can be combined with Muon-style momentum orthogonalization while retaining noise adaptivity and zero-noise optimality up to logarithmic factors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08783v1</guid>
      <category>math.OC</category>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ganzhao Yuan</dc:creator>
    </item>
    <item>
      <title>Evaluating AI Investment Strategies</title>
      <link>https://arxiv.org/abs/2606.08791</link>
      <description>arXiv:2606.08791v1 Announce Type: cross 
Abstract: We study the problem of auditing a black-box algorithmic decision-maker from observable inputs and outputs alone. Our main result is an exact decomposition: under precisely characterized conditions, the cumulative \emph{regret} of a dynamic policy equals the sum of per-period covariances between the cost vector and the policy's decision. This extends the single-period identity of Aldridge~(2026) to the full multi-period setting of stochastic dynamic programming.
  We prove the identity holds exactly under i.i.d. costs and mean-unbiased Markov policies, derive closed-form bias corrections for non-stationary and time-varying cases, and establish the discounted-horizon analog. A Bellman recursion for the covariance regret functional connects the result to standard reinforcement learning algorithms; for rolling-window policies, the estimation-error bias is $O(d/w)$.
  The decomposition has direct implications for algorithmic auditing in strategic environments: in platform mechanism design, it provides a welfare-based audit metric without access to the agent's private type; in repeated games, covariance reduction is a sufficient condition for policy improvement; in procurement and ad auctions, the bias correction quantifies welfare loss from strategic misreporting. The associated trajectory estimator is consistent, asymptotically normal with HAC variance, and computable in $O(T \cdot nd)$ time. This makes the proposed approach a tractable, model-free audit tool for platform mechanisms, algorithmic portfolio strategies, and any sequential decision system subject to external performance review.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08791v1</guid>
      <category>econ.EM</category>
      <category>cs.AI</category>
      <category>q-fin.PM</category>
      <category>q-fin.RM</category>
      <category>q-fin.ST</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Irene Aldridge</dc:creator>
    </item>
    <item>
      <title>Generalization in Nonlinear Least Squares via Learned Feature Geometry</title>
      <link>https://arxiv.org/abs/2606.08799</link>
      <description>arXiv:2606.08799v1 Announce Type: cross 
Abstract: We study the generalization of ridge-regularized nonlinear least-squares models via on-average algorithmic stability, deriving error bounds for local minimizers in terms of a data-dependent effective dimension that reflects the geometry of the gradient model at the trained parameters, through the empirical Jacobian Gram matrix and a residual--curvature term. In the linear case, where the curvature term vanishes, this recovers the classical effective dimension of the Jacobian kernel covariance, but evaluated at the trained model rather than at initialization as is typical in neural tangent kernel analyses. We further bound this effective dimension via covering complexity of the gradient features, leading to guarantees that depend on learned geometry rather than parameter count. In particular, for manifold-supported data and piecewise Lipschitz Jacobians, the bounds scale with intrinsic dimension, while for one-hidden-layer ReLU networks, the mechanism can be made explicit through counts of activation-stable regions. Experiments on synthetic manifolds, clustered distributions, and benchmark datasets illustrate trained-Jacobian compression, the tightness of the residual-curvature linearization, and agreement between the stability bound and observed generalization gaps. A key feature of our bounds is the simplicity of their derivation, which follows from first principles using the Brascamp--Lieb inequality under strongly log-concave noise.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08799v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ayub Kharel, Ilja Kuzborski, Patrick Rebeschini, Yasin Abbasi-Yadkori</dc:creator>
    </item>
    <item>
      <title>Optimal Control and Dissipativity of Linear Hermitian Matrix-Valued Dynamical Systems</title>
      <link>https://arxiv.org/abs/2606.08856</link>
      <description>arXiv:2606.08856v1 Announce Type: cross 
Abstract: We develop a unified framework for linear-cost optimal control, finite-time optimal steering, dissipativity analysis, and zero-sum differential games for linear impulsive systems whose state is a Hermitian matrix evolving in $\mathbb{H}^{n+m}_{\succeq0}$, a class that encompasses continuous- and discrete-time linear systems and switched systems as degenerate cases, and includes the second-order moment dynamics of linear (stochastic) hybrid systems. The entire theory rests on three tools: a single \emph{key identity} relating cost, trajectory, and a dual variable, an Extended Schur complement lemma, and a Schur inner-product decomposition, applied identically to the flow integral and to each jump. These yield structurally uniform sufficient and necessary conditions, dual linear matrix inequality (LMI) characterizations, and explicit optimal policies for every problem class, on both finite and infinite horizons under time-varying assumptions (without time invariance or periodicity), together with causal dwell-time policies for the problems that admit them.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08856v1</guid>
      <category>math.OC</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Corentin Briat</dc:creator>
    </item>
    <item>
      <title>SCOPE: A Syndrome-Driven Control Plane for QEC-Enabled Quantum Networks</title>
      <link>https://arxiv.org/abs/2606.08873</link>
      <description>arXiv:2606.08873v1 Announce Type: cross 
Abstract: As quantum networks evolve from experimental testbeds to fault-tolerant systems, the primary performance metric shifts from physical link fidelity to end-to-end logical error rate. However, current control planes remain ill-equipped for this transition: routing decisions are typically decoupled from Quantum Error Correction (QEC) strategies, relying on topology or scalar fidelity metrics that fail to predict how specific physical noise structures interact with logical codes. Optimizing this coupled route-and-code performance requires precise, real-time visibility into network error biases, yet traditional active tomography is operationally prohibitive due to throughput collapse and service interruption.
  We present SCOPE (Syndrome-based COntrol PlanE), a network-layer architecture that enables joint routing and coding optimization using purely passive telemetry. Instead of injecting probes, SCOPE harvests error syndromes -- the parity-check outcomes naturally generated by QEC decoders during user service. By aggregating these signals, SCOPE's inference engine reconstructs the network's time-varying error map, capturing complex, context-dependent noise correlations. This visibility drives a decision engine that proactively pushes optimal route-and-code configurations to source nodes. NetSquid and IBM-calibrated simulations show that SCOPE reduces estimation error by more than 60% relative to a standard EM baseline. In large-scale networks, this precision reduces logical error rates by 30-35% (up to 65%) against topology-aware baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08873v1</guid>
      <category>quant-ph</category>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiaojie Fan, Zian Wang, Ashutosh Tiwari, Himanshu Gupta</dc:creator>
    </item>
    <item>
      <title>Few-shot Class-variable Incremental Audio Classification via Prototype Adaptation and Pseudo Class-variable Training</title>
      <link>https://arxiv.org/abs/2606.08898</link>
      <description>arXiv:2606.08898v1 Announce Type: cross 
Abstract: In the task of few-shot class-incremental audio classification, the number of classes is assumed to always increase, without considering the possibility of a decrease. However, in practice the number of classes generally either increases or decreases. In this paper, we investigate the problem of Few-shot Class-variable Incremental Audio Classification (FCIAC), in which the number of classes increases or decreases. We propose an FCIAC method using prototype adaptation and pseudo class-variable training. The model in our method consists of an encoder and a classifier. The classifier is initialized by a class-variable prototype adaptation network, whose structure changes dynamically as the classes change. In addition, we design a pseudo class-variable training strategy to enhance the model's adaptability to changing classes. Experiments on three public datasets show that our method exceeds previous methods in average accuracy. The code is at: https://github.com/cgq2971-afk/FCIAC.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08898v1</guid>
      <category>eess.AS</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Yanxiong Li, Guoqing Chen, Qianqian Li, Sen Huang</dc:creator>
    </item>
    <item>
      <title>Estimate Collapsibility of Causal Effects in Completed Partial DAGs via Strong d-Convex Hulls</title>
      <link>https://arxiv.org/abs/2606.08941</link>
      <description>arXiv:2606.08941v1 Announce Type: cross 
Abstract: This paper proposes a collapsible method for estimating causal effects that maintains the estimator's consistency before and after marginalization over some variables in completed partially directed acyclic graphs (CPDAGs). We first introduce the estimate collapsibility for CPDAGs and characterize the minimal collapsible sets as strong d-convex hulls. An efficient algorithm is devised to obtain such sets in DAGs and is generalized to CPDAGs. Then, we combine the graph reduction procedure with the IDA framework. Finally, experiments and empirical analysis show the effectiveness of the collapsibility for causal estimations in CPDAGs. Code is available at https://github.com/Jamyang-D/strongly-convex.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08941v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuxin Deng, Yi Sun, Zhiming Li, Huaxiong Liu</dc:creator>
    </item>
    <item>
      <title>A systematic investigation of molecular encoding methods for drug property predictions across neural network and Transformer encoder-based model</title>
      <link>https://arxiv.org/abs/2606.08973</link>
      <description>arXiv:2606.08973v1 Announce Type: cross 
Abstract: Fundamental investigations into how different molecular encoding methods affect molecular property prediction remain relatively limited. In this study, we extensively examined the optimal molecular encoding methods for molecular property prediction using two prevalent structure designs: a classical neural network model (MLP) and a Transformer encoder-based model (MLP+TL). For molecular encoding methods, we investigated several types of fingerprints, including traditional topological fingerprints, substructure-based fingerprints, and string-based representations. These two models were trained on seven well-known molecular datasets to evaluate different input molecular encoding methods based on evaluation metrics. On several biologically relevant classification tasks, including toxicity, mutagenicity, and side-effect prediction, our models consistently achieved average AUC values above 0.9. Rather than relying on external post-hoc explanation methods such as the local interpretable model-agnostic explanation (LIME) or the Deep SHapley Additive exPlanations (SHAP), we leveraged the model's intrinsic attention weights as an internal interpretability signal for identifying potentially important features. The MLP+TL model using MACCS and PubChem as input can capture chemically interpretable groups that determine the major blood-brain barrier (BBB) permeability and mutagenicity in Salmonella typhimurium. In particular, a comparison between Morphine and Heroin highlighted the role of hydroxyl-related substructures in BBB permeability prediction, which was consistently reflected in the attention weights. Overall, our findings provide practical guidance for selecting effective molecular encoding methods and contribute to the development of interpretable molecular informatics approaches for drug discovery.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08973v1</guid>
      <category>q-bio.QM</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sheng-Ya Chen, Shan-Ju Yeh</dc:creator>
    </item>
    <item>
      <title>Dynamics in a Low-Rank Separable Field Cellular Automaton</title>
      <link>https://arxiv.org/abs/2606.08983</link>
      <description>arXiv:2606.08983v1 Announce Type: cross 
Abstract: Complex collective dynamics in cellular automata are usually associated with local-neighborhood combinatorics, yet it remains unclear whether long-lived dynamical organization requires such explicit local interaction structure. Here, we introduce a Separable-Field Cellular Automaton (SFCA), a normalized-field cellular automaton in which local neighbor counting is replaced by a rank-one-like row-column field. Each cell is updated according to a normalized field, with survival and birth governed by two threshold intervals. Systematic scans over interval widths and positions revealed four outcome classes: extinction, fixed points, cycles, and long transients. The outcome phase diagram was organized by the relative geometry of the survival and birth intervals: fixed points dominated when the birth interval was contained in the survival interval, whereas long transients concentrated near the boundary between partial overlap and no overlap. A fine scan along this transition showed that the long-transient region forms a narrow but persistent ridge separating two qualitatively distinct cycle-dominated regimes. One side produced dense, high-change-rate cycles approximating global period-2 alternation, whereas the other produced sparse, low-change-rate, stripe-like cycles. Damage-spreading analysis further supported a basin-competition interpretation, in which the long-transient ridge reflects delayed selection between two cyclic attractor families rather than random nonconvergence, while finite-size analysis showed that the long-transient ridge remains robust across tested grid sizes. These results show that structured long-transient dynamics can arise under compressed separable field coupling, suggesting that nontrivial collective organization does not necessarily require full local-neighborhood combinatorics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08983v1</guid>
      <category>nlin.CG</category>
      <category>cs.FL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaorui Shi, Mengsha Huang</dc:creator>
    </item>
    <item>
      <title>Not All Warm Starts Help: Benchmarking Primal-Dual Initializations for ACOPF Algorithms</title>
      <link>https://arxiv.org/abs/2606.08984</link>
      <description>arXiv:2606.08984v1 Announce Type: cross 
Abstract: Warm starts are widely used to accelerate AC optimal power flow (ACOPF) solves, but the impact of different initialization strategies has received limited systematic study, particularly for the primal-dual interior-point methods that dominate large-scale ACOPF algorithms. This paper benchmarks initialization strategies for ACOPF solved with the interior-point solver IPOPT on 19 PGLib-OPF instances (5 to 30,000 buses), testing all 15 non-empty subsets of the primal blocks $\{P_g, Q_g, V_m, V_a\}$ under oracle conditions and three DC-seeded combinations in a practical setting. The experiments show that most partial primal-plus-dual restarts increase solve time or reduce convergence reliability. Among the oracle primal-plus-dual (O-PD) configurations, only the complete restart reliably converges on every baseline-convergent case, reaching a $47.6\%$ median solve-time speedup. Twelve of the 14 partial O-PD combinations have negative median speedups, and several fail repeatedly on larger networks. Decomposing the dual into constraint and bound multipliers shows that \emph{coverage}, not the presence of duals per se, governs robustness: the full bound-multiplier vector reaches 90.7\% convergence and a $+26.8$\% median speedup, whereas block-matched coverage (oracle multipliers on some bounds, defaults on the rest) drops to 70.4\% and $-31.1$\%. Practical DC seeding sometimes helps the AC solve, but the benefit is no longer statistically significant once the DCOPF presolve cost is included in the end-to-end comparison ($p = 0.4171$). For learned warm-start methods, the results support the following target ordering: predict the full primal vector first; if only partial coverage is possible, prioritize voltage variables; and avoid partial or inconsistent dual predictions unless the primal estimate is nearly complete.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.08984v1</guid>
      <category>math.OC</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Babak Taheri, Daniel K. Molzahn</dc:creator>
    </item>
    <item>
      <title>Multi-Armed Bandits with Arriving Arms: Sequential Screening, Dynamic Regret, and Sublinear Guarantees</title>
      <link>https://arxiv.org/abs/2606.09002</link>
      <description>arXiv:2606.09002v1 Announce Type: cross 
Abstract: We study a stochastic multi-armed bandit problem in which the set of available arms expands over time. This setting arises in sequential experimentation when new actions or treatments become available during an ongoing study, making regret against a single best arm in hindsight inappropriate. We instead evaluate performance relative to the best arm currently available, leading to a dynamic-regret criterion for arriving-arm environments. To address the resulting challenges of arrival information discrepancy (AID) and a drifting benchmark (DB), we propose UCB for Arriving Arms (UCB-AA), an elimination-based procedure with a preliminary screening step for newly arrived arms before full competition with incumbent arms. We show that UCB-AA attains regret bounds that depend explicitly on the arrival process, achieves sublinear dynamic regret under regularity conditions on gap evolution, and admits an online extension for unknown horizons. Simulation results show that UCB-AA reduces wasted pulls and maintains a smaller active arm set while preserving competitive regret performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09002v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>math.ST</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Deqi Zheng, Xiaoyang Xu, Yuhong Yang</dc:creator>
    </item>
    <item>
      <title>BareWave: Waveform-Native Flow-Matching Text-to-Speech</title>
      <link>https://arxiv.org/abs/2606.09048</link>
      <description>arXiv:2606.09048v1 Announce Type: cross 
Abstract: Removing intermediate representations and separately trained decoding stages has become an important direction in generative modeling. In text-to-speech, however, high-quality systems are still commonly built through an intermediate acoustic representation before waveform synthesis. In this work, we present BareWave, a fully waveform-native framework for direct text-to-wave generation in flow-matching TTS. We consider this setting to raise three training challenges: raw-waveform modeling lacks a strong pretrained representational scaffold, different stages of training benefit from different noise schedules, and data-space perceptual objectives do not automatically share the temporal structure of the velocity-space flow objective. As a result, direct waveform training is hard to optimize efficiently, hard to push toward a strong final operating point with a fixed recipe, and hard to integrate effective perceptual refinement. Guided by this view, we develop a direct text-to-wave training framework that combines training-time representation alignment, staged noise scheduling, and velocity-aware perceptual alignment (VAPA), while preserving a single waveform-native inference path without pretrained components at test time. Experiments on zero-shot voice cloning show that strong intelligibility, speaker similarity, and naturalness can be achieved under a fully waveform-native inference path, supporting waveform-native flow-matching TTS as a practical direction. Project page with audio demos is available at https://barewave.github.io/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09048v1</guid>
      <category>eess.AS</category>
      <category>cs.AI</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wei Fan, Chao-Hong Tan, Qian Chen, Wen Wang, Xiangang Li, Kejiang Chen, Weiming Zhang, Nenghai Yu</dc:creator>
    </item>
    <item>
      <title>Data augmented bootstrap: Unifying confidence interval construction by approximate invariance</title>
      <link>https://arxiv.org/abs/2606.09049</link>
      <description>arXiv:2606.09049v1 Announce Type: cross 
Abstract: We propose the data augmented bootstrap (DAB), a framework for constructing confidence intervals from approximately invariant transformations of the data. As special cases, DAB recovers popular methods that rely on exact group symmetries, such as conformal prediction, wild bootstrap for Maximum Mean Discrepancy U-statistics and the recently proposed SymmPI. Meanwhile, DAB also recovers the classical bootstrap method, which exploits the dataset's approximate invariance under uniform sampling of data indices as the dataset size grows. For all DAB methods, we establish theoretical coverage results that interpolate between finite-sample and asymptotic guarantees according to the strength of the invariance, and without assuming a group structure. The approximate invariance is measured in the Kolmogorov distance and, for statistics that satisfy Gaussian universality, reduces to conditional mean and variance matching. This allows us to incorporate data augmentation (DA), a widely used machine learning heuristic based on approximate invariances, into known statistical methods. We empirically test the performance of incorporating DA into bootstrap, wild bootstrap and conformal prediction for simulated settings as well as for image, language and scientific data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09049v1</guid>
      <category>stat.ME</category>
      <category>cs.LG</category>
      <category>math.ST</category>
      <category>stat.ML</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kevin Han Huang</dc:creator>
    </item>
    <item>
      <title>MeanVC 2: Robust Low-Latency Streaming Zero-Shot Voice Conversion</title>
      <link>https://arxiv.org/abs/2606.09050</link>
      <description>arXiv:2606.09050v1 Announce Type: cross 
Abstract: Streaming zero-shot voice conversion (VC) has become increasingly popular due to its potential for real-time applications. The recently proposed MeanVC achieves lightweight streaming zero-shot VC, but it has several limitations: its chunk-wise autoregressive denoising doubles the effective training sequence length, conversion quality degrades under small-chunk settings, and its timbre encoder directly relies on reference mel-spectrograms, making it sensitive to reference audio quality. To address these limitations we propose MeanVC 2. We introduce future-receptive chunking (FRC), which explicitly schedules past and future receptive fields across diffusion transformer decoder layers and removes clean-chunk teacher forcing. By incorporating bounded future context, FRC enables stable conversion with a 40 ms chunk size. We further introduce a universal timbre token encoder, which constructs a timbre representation from a global speaker embedding and retrieves fine-grained timbre cues via cross-attention, improving robustness to low-quality references and enhancing zero-shot speaker similarity. Experimental results show that MeanVC 2 significantly outperforms MeanVC, while reducing latency from 211 ms to 110 ms. Audio samples are publicly available. The source code will be publicly released.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09050v1</guid>
      <category>eess.AS</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Guobin Ma, Yuxuan Xia, Yuepeng Jiang, Dake Guo, Hanke Xie, Jingbin Hu, Yanbo Wang, Lei Xie, Pengcheng Zhu</dc:creator>
    </item>
    <item>
      <title>FlashTTS: Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation</title>
      <link>https://arxiv.org/abs/2606.09141</link>
      <description>arXiv:2606.09141v1 Announce Type: cross 
Abstract: Recent progress in speech dialogue systems requires Text-to-Speech (TTS) models to be faster and more responsive. Modern speech dialogue systems impose two primary requirements on TTS models: low latency and support for streaming inputs and outputs. However, most existing single-codebook LLM-based TTS methods rely on multi-stage pipelines that lack native streaming capabilities. These systems typically suffer from high end-to-end latency due to slow autoregressive prediction and multi-step flow matching. To address these limitations, we propose FlashTTS, an open-source and low-latency streaming TTS framework. FlashTTS introduces a lagged multi-track architecture that natively processes streaming text and speech inputs, thereby eliminating the need for sentence-level buffering. To accelerate acoustic generation, we integrate parallel Multi-Token Prediction (MTP) with an X-pred mean flow matching decoder. This configuration achieves high-fidelity token-to-mel generation in exactly two function evaluations (2-NFE). By jointly optimizing input processing and decoding efficiency, FlashTTS offers a practical foundation for real-time speech dialogue systems. Experiments show that FlashTTS substantially reduces First-Packet Latency to 325ms compared to robust streaming baselines, all while preserving strong zero-shot voice cloning and cross-lingual intelligibility. Speech samples are available. The model code and checkpoints will be released as open source.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09141v1</guid>
      <category>eess.AS</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hanke Xie, Xiaming Ren, Dake Guo, Ruonan You, Wenhao Li, Jingbin Hu, Guobin Ma, Huakang Chen, Kejie Xu, Rui Huang, Weiguo Tan, Xianrong Wang, Lei Xi</dc:creator>
    </item>
    <item>
      <title>The Size of the Intersection of $q$-ary Hamming Balls</title>
      <link>https://arxiv.org/abs/2606.09158</link>
      <description>arXiv:2606.09158v1 Announce Type: cross 
Abstract: The interest in studying the size of the intersection of multiple $q$-ary Hamming balls has grown due to the recent advances in DNA-based data storage systems. We present an exact formula for the cardinality of the intersection of $s$ Hamming balls of varying radii over a $q$-ary alphabet. It is known that the distances between the center points of the Hamming balls are not enough, in general, to determine the size of the intersection. Based on our formula, we are able to find more refined structural properties of the center points for determining the exact size of the intersection. Moreover, we also analyze the size of the intersection for sufficiently large $n$. When $s=3$, we give the necessary and sufficient conditions (for all $q\ge 2$, $q\neq 6$ and sufficiently large $n$) to obtain the maximum size of the intersection when the center points of the Hamming balls have a given minimum distance and demonstrate how to compute it using our general formula.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09158v1</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Ville Junnila, Tero Laihonen, Tuomo Lehtilä, Pavan Padavu Devaraj</dc:creator>
    </item>
    <item>
      <title>Multilevel Stochastic Gradient Descent for Risk-Averse PDE-Constrained Optimization</title>
      <link>https://arxiv.org/abs/2606.09291</link>
      <description>arXiv:2606.09291v1 Announce Type: cross 
Abstract: We present recent advances in applying and analyzing multilevel stochastic gradient descent algorithms for risk-averse, three-dimensional PDE-constrained optimization problems. The algorithm uses adaptive multilevel Monte Carlo gradient estimates and provides parallel scalability as well as improved convergence rates and computational complexity compared to standard batched stochastic gradient descent methods. We study the method in computationally demanding settings using three-dimensional elliptic diffusion problems and large risk-aversion parameters.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09291v1</guid>
      <category>math.OC</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Niklas Baumgarten, Philipp A. Guth, David Schneiderhan, Tommaso Vanzan</dc:creator>
    </item>
    <item>
      <title>Declines in research funding and science ecosystem fragility</title>
      <link>https://arxiv.org/abs/2606.09387</link>
      <description>arXiv:2606.09387v1 Announce Type: cross 
Abstract: Scientific knowledge advances through within-country and cross-border scientific activities and collaborations, influenced by funding and the strength of the research enterprise. Sudden declines in research funding, for example from Federal sources in the United States (US) in 2024-25, adversely affect scientific collaboration. How rapid declines in funding affect the science enterprise, and the magnitude of their impact, need to be analysed.
  Past studies have modelled the global scientific system as complex collaborative networks of entities and studied its topology and dynamics. However, these studies have not undertaken compensation analysis for real-world shocks that have produced rapid declines in scientific research funding.
  In this study we examine the effect of sharp declines in US Federal funding on the cancer science research enterprise globally. We model the cancer science ecosystem as a 5-layer multiplex network of collaborative linkages between 233 countries and territories in grants and clinical trial co-investigations, paper co-authorships, co-inventions and patent co-ownerships. We quantify information flow in the multiplex system through network efficiency.
  Proposing a framework for compensation analysis, we show that sharp declines in US Federal funding for research degrade global information exchange in science, imposing outsized compensatory burdens on country groups such as the European Union (EU) and BRICS (Brazil, Russia, India, China and South Africa). However, we also show that if other countries provide more support for international collaborations, there is an opportunity to remodel the cancer science system to be more resilient to future shocks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09387v1</guid>
      <category>physics.soc-ph</category>
      <category>cs.SI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anbang Du, Beining Zhang, Rifat Atun, Michael G Head, Markus Brede</dc:creator>
    </item>
    <item>
      <title>SAILS: Surrogate-based Analysis of Interactions via Local Effect Smooths</title>
      <link>https://arxiv.org/abs/2606.09404</link>
      <description>arXiv:2606.09404v1 Announce Type: cross 
Abstract: Feature interactions drive much of the predictive power of machine learning models, yet existing explanation methods only detect and quantify interactions without revealing their functional form, or visualize only restricted interaction types. We propose Surrogate-based Analysis of Interactions via Local effect Smooths (SAILS), a model-agnostic framework that analyzes pairwise interactions through interpretable generalized additive model (GAM) surrogates fitted to the local effects of a black-box model. For each interval of a feature of interest, the surrogate smooth terms isolate the interaction components at the derivative level, enabling (i) interaction detection through a heuristic derived from significance tests on smooth terms, (ii) interaction form categorization into linear, product-separable, and non-product-separable types, and (iii) tailored, interpretable visualizations for each interaction type. We empirically validate the framework through controlled simulations and a real-world task, demonstrating its effectiveness for pairwise interactions, with limitations under strong feature correlations and higher-order interactions. SAILS fills a notable gap in the XAI toolbox, going beyond detection of interactions alone to characterizing their functional form.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09404v1</guid>
      <category>stat.ML</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Timo Heiß, Julia Herbinger, Bernd Bischl, Giuseppe Casalicchio</dc:creator>
    </item>
    <item>
      <title>Context-Aware Deep Learning for Defect Classification in Atomic-Resolution STEM</title>
      <link>https://arxiv.org/abs/2606.09419</link>
      <description>arXiv:2606.09419v1 Announce Type: cross 
Abstract: Artificial intelligence is rapidly advancing materials characterization, yet most applications in electron microscopy rely solely on image contrast, overlooking the chemical and experimental context that shapes image formation. This limitation makes defect classification inherently ambiguous, as similar contrasts can arise from different materials or imaging conditions. Here we develop a context-aware learning framework that integrates image-derived contrast with metadata describing composition, beam energy, and detector geometry. Using a systematically constructed dataset of ~55 million simulated patches spanning 576 cases across 96 doped monolayer transition-metal dichalcogenides, we show that conditioning on contextual variables transforms defect classification from an ill-posed image-only task into a well-posed, physically grounded problem. The framework achieves over 98% accuracy on simulations and near-human agreement on experimental data, with a 94% reduction in posterior entropy. By emphasizing contextual grounding over architectural complexity, this approach links experimental image contrast to the underlying chemical and imaging conditions, supporting physically grounded defect assignments and a general pathway toward multimodal AI models for autonomous materials characterization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09419v1</guid>
      <category>cond-mat.mtrl-sci</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiadong Dan, Cheng Zhang, Leyi Loh, Ivan Verzhbitskiy, Yuan Chen, Goki Eda, Michel Bosman, N. Duane Loh</dc:creator>
    </item>
    <item>
      <title>The macroscopic Kaehler metric of Geometric Thermodynamics versus the microscopic one on the Event Manifold: Exact Partition Functions on CV manifolds. Extended Souriau temperatures and spontaneous magnetizations</title>
      <link>https://arxiv.org/abs/2606.09438</link>
      <description>arXiv:2606.09438v1 Announce Type: cross 
Abstract: In this paper we clarify the relation between Geometric Thermodynamics and Information Geometry based on the Fisher matrix. On the macroscopic odd-dimensional contact manifold of thermodynamic variables, we introduce for the first time a metric whose pull-back on the isoentropic symplectic submanifolds transverse to the Reeb field is Kählerian. The pull-back of this metric on equilibrium states, which are Lagrangian submanifolds, is the Fisher Hessian. Then we consider the Souriau-like Thermodynamics that uses Calabi-Vesentini (CV) manifolds as Kählerian microscopic event manifolds and the Killing moment maps as observable functions. A systematic use of the theory of compact abelian structures and the setup of Special Kähler Geometry in which CV manifolds are encoded allows us to perform the explicit integration defining the partition function for any entry in the CV Tits-Satake universality class. The additional actions completing the abelian structure are nonlinear Casimir functions of the Killing moment maps and suggest a generalization of Souriau thermodynamics that partially breaks the isometry group symmetry by means of the nonvanishing mean values of the Casimir functions, in a manner similar to spontaneous magnetization in ferromagnetism. Our new exact Gibbs distributions provide the analogue for Cartan Neural Networks of the Gaussian probability distributions in flat space used in conventional Machine Learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09438v1</guid>
      <category>hep-th</category>
      <category>cs.IT</category>
      <category>math-ph</category>
      <category>math.IT</category>
      <category>math.MP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pietro Fré, Alexander S. Sorin, Mario Trigiante</dc:creator>
    </item>
    <item>
      <title>Metric-Free Riemannian Optimization</title>
      <link>https://arxiv.org/abs/2606.09465</link>
      <description>arXiv:2606.09465v1 Announce Type: cross 
Abstract: Riemannian optimization provides a powerful framework for constrained optimization by incorporating problem-specific structure directly into the geometry of the search space. In many applications, however, the explicit evaluation or application of the Riemannian metric can be computationally expensive or numerically unstable, limiting the practical efficiency of otherwise well-founded algorithms. Motivated by such settings, this work investigates to what extent classical Riemannian optimization algorithms can be reformulated without explicitly applying the metric. We show that many first-order components of Riemannian optimization only rely on the differential of the objective function and access to the Riemannian gradient, but not on explicit metric application. Based on this observation, we develop metric-free formulations and generalize optimization approaches to Finsler and Banach manifolds. Numerical experiments demonstrate that the proposed metric-free strategies retain the effectiveness of their metric-dependent counterparts while significantly reducing computational overhead. These results highlight that a substantial portion of Riemannian optimization can be carried out independently of explicit metric application, broadening its applicability to problems with expensive or implicitly defined metrics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09465v1</guid>
      <category>math.OC</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jonas Püschel</dc:creator>
    </item>
    <item>
      <title>Report the Floor: A Training-Free Conformal Interval Is a Mandatory Baseline for Probabilistic Time-Series Forecasting</title>
      <link>https://arxiv.org/abs/2606.09473</link>
      <description>arXiv:2606.09473v1 Announce Type: cross 
Abstract: Probabilistic forecasters are increasingly learned, yet the baselines they are compared against are often weak or omitted. We show that the simplest possible conformal interval - a last-value point forecast wrapped in a finite-sample split-conformal residual quantile, with no parameters and no training - is a far stronger baseline than its near-total absence from recent learned-forecasting and conformal-time-series comparisons would suggest. In one-step-ahead online forecasting across 2,217 real series from nine public sources (Monash, LOTSA, the LTSF traffic/electricity/weather suites, METR-LA, BOOM, nips/probts), this ConformalNaive interval decisively beats the naive value-quantile baselines, the entire NPTS family (NPTS 73%, SeasonalNPTS 64% of series), and the published Conformal Seasonal Pools (CSP) method (71% of series, bootstrap 95% CI [69,73], paired Wilcoxon p approx 7.6e-135); it is on par with the simpler learned conformal predictors (RCI, quantile regression; median relative Winkler within 2%) and is beaten only by the adaptive-online and ensemble methods (SPCI, ACI, AgACI), which track distribution shift and lead by 9-33% relative Winkler. It is also better calibrated than a trained neural forecaster: on the six datasets that introduced DeepNPTS, the trivial floors cover the truth 84-85% of the time at a nominal 95%, versus DeepNPTS's 66%. At multi-step seasonal horizons the picture inverts: the random-walk floor is the weakest method and the seasonal pool (CSP) wins - a boundary we map. Finally we give ConformalNaive+, a one-line, training-free, horizon-adaptive selector that attains the better of two complementary floors at every horizon with restored coverage. We argue the matching conformal naive floor must be a mandatory baseline whenever a learned probabilistic forecaster claims gains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09473v1</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.5281/zenodo.20594484</arxiv:DOI>
      <dc:creator>Valery Manokhin</dc:creator>
    </item>
    <item>
      <title>Closing the Prior-Posterior Loop: Self-Reflective Molecular Design with Analysis-Driven LLM Iteration</title>
      <link>https://arxiv.org/abs/2606.09520</link>
      <description>arXiv:2606.09520v1 Announce Type: cross 
Abstract: Can a general-purpose large language model design molecules with the precision of a seasoned chemist? Current LLM-based frameworks answer this question with scalar feedback loops (generate, score, reject) that amount to informed trial-and-error. Here we show that replacing a single number with the full physicochemical rationale from first-principles calculations transforms the LLM from a stochastic sampler into a causal reasoner. Our system couples retrieval-augmented generation with a self-reflection module that feeds orbital energies, atomic charges, and electron densities, rather than compressed scores, back into the design loop. On HOMO-LUMO gap targets from 1.0 to 5.0 eV, this structure-property-relationship (SPR) reflection achieves a deviation as low as 0.0003 eV and a 100% success rate on moderate tasks, decisively outperforming scalar-feedback and non-reflective baselines. The framework generalizes seamlessly to dipole-moment design and proves robust across five distinct LLM backbones. These results establish a new paradigm: when the model understands not only that a molecule fails, but why, iterative molecular design becomes genuinely mechanistic.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09520v1</guid>
      <category>physics.chem-ph</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Junyi Gong, Zijie Qiu, Ben Zhong Tang</dc:creator>
    </item>
    <item>
      <title>Automating the Expert Eye: A System-Agnostic Deep Learning Framework for Rare Event Discovery in Imbalanced Force Spectroscopy</title>
      <link>https://arxiv.org/abs/2606.09541</link>
      <description>arXiv:2606.09541v1 Announce Type: cross 
Abstract: Single-Molecule Force Spectroscopy (SMFS) provides unprecedented insights into biomolecular mechanics, yet the high-throughput generation of force-extension trajectories creates a severe data curation bottleneck. Identifying rare molecular unbinding events within thousands of noise-dominated curves traditionally relies on tedious, non-scalable manual auditing. Here, we present a system-agnostic, interpretable deep learning framework tailored to overcome extreme class imbalance in automated SMFS triage. Utilizing 1D-to-2D rasterized geometric matrices, we deployed a modified ResNet18 architecture governed by an asymmetric Focal Loss objective function. We evaluated this framework on the complex mechanical unfolding pathways of the R. champanellensis cellulosome. Under hyper-imbalanced test conditions where the target interaction constituted only 1.34% of the dataset (13 true events out of 970 traces), the model achieved an overall accuracy of 0.9196 and a remarkable True Positive Rate (Recall) of 0.9231. By implementing an empirically calibrated dual-threshold triage system, the pipeline automatically discarded 880 unambiguous background noise traces, reducing the manual curation workload by over 90% while safely preserving high-value rare data. Finally, Gradient-weighted Class Activation Mapping (Grad-CAM) visually validated that the network's decisions are firmly anchored in the relevant geometric features of the force curves, specifically localizing on the structural unbinding regions, effectively mitigating 'black-box' skepticism. Built for free cloud-based execution, this open-source tool democratizes scalable, highly precise molecular discovery across the biophysics community.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09541v1</guid>
      <category>physics.app-ph</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jorge Rodriguez-Ramos</dc:creator>
    </item>
    <item>
      <title>Integrating gene regulatory priors into Transformer attention with scTransformer for interpretable scRNA-seq analysis</title>
      <link>https://arxiv.org/abs/2606.09558</link>
      <description>arXiv:2606.09558v1 Announce Type: cross 
Abstract: Motivation: Transformer-based models are increasingly applied to large-scale single-cell transcriptomics, showing strong performance through self-supervised learning on millions of cells. However, most existing approaches treat genes as independent features, and largely ignore prior biological knowledge, which limits interpretability and robustness. In this paper, we explore whether explicitly incorporating gene regulatory information can improve both model performance and biological insight. Results: We present scTransformer, the first Transformer-based approach that builds a priori knowledge of biological mechanisms into the model's attention patterns. By constraining information flow according to known regulatory structures, the model learns representations that are more biologically meaningful. We evaluate scTransformer on a disease-relevant single-nucleus RNA-seq dataset using supervised cell-type classification. Compared to standard Transformers, our approach improves classification accuracy, enhances separation of cell types in embedding space, and produces attention patterns consistent with known regulatory programs. Overall, our results demonstrate that embedding biological structure into Transformer models can enhance interpretability without sacrificing performance, offering a principled step toward biologically grounded foundation models for single-cell omics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09558v1</guid>
      <category>q-bio.GN</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mikele Milia, Louis Fabrice Tshimanga, Henning Mueller, Manfredo Atzori, Barbara Di Camillo</dc:creator>
    </item>
    <item>
      <title>Constraint residuals, graph posteriors, and determinant-corrected full-space targets in Bayesian inverse problems</title>
      <link>https://arxiv.org/abs/2606.09594</link>
      <description>arXiv:2606.09594v1 Announce Type: cross 
Abstract: Bayesian inverse problems constrained by state equations are often sampled in a full parameter-state space by penalising the residual, rather than in a reduced space where the state is eliminated. We show that these formulations are not automatically equivalent as posterior measures. For finite-dimensional discretisations of equality-constrained inverse problems, assume the state equation \(c(\theta,u)=0\) has a unique solution \(u=G(\theta)\) and nonsingular state Jacobian \(D_u c\). The reduced posterior, its graph lift, and the zero-noise residual posterior are then distinct. A local change of variables shows that an uncorrected Gaussian residual penalty converges, after marginalisation over \(u\), to the reduced density multiplied by \(\lvert\det D_u c(\theta,G(\theta))\rvert^{-1}\). Thus algebraically equivalent residuals can define the same feasible set but different limiting posteriors. We derive determinant corrections for unweighted, weighted, and rescaled residual penalties that have the graph-lifted reduced posterior as their hard-constraint limit. The result separates feasibility from posterior calibration: driving the residual to zero is not sufficient for exact sampling of the graph-lifted reduced posterior unless the sampling or correction step targets the corresponding corrected density.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09594v1</guid>
      <category>math.ST</category>
      <category>cond-mat.stat-mech</category>
      <category>cs.NA</category>
      <category>math-ph</category>
      <category>math.MP</category>
      <category>math.NA</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jonathon Cottom, Emilia Olsson</dc:creator>
    </item>
    <item>
      <title>Powering the Future of AI: Navigating the Trade-offs for Europe's Energy Transition and Net-Zero Goals</title>
      <link>https://arxiv.org/abs/2606.09617</link>
      <description>arXiv:2606.09617v1 Announce Type: cross 
Abstract: The rapid expansion of AI globally has led to the proliferation of energy-intensive hyperscale data centres (DCs), making them a structurally challenging component in power system planning and operation. Using a spatially explicit optimisation model of Europe across 21 AI growth scenarios, we systematically quantify additional demand, capacity requirements, emissions, and operational impacts of DCs. Results indicate that AI could drive 73-723 TWh of extra demand by 2050, risking cumulative emissions overshoots of 67-181 MtCO2 between 2030 and 2050. Our analysis indicates that after 2030, the geography of AI infrastructure will be shaped more by firm power and system flexibility than by the mere abundance of clean energy. In moderate scenarios, AI requires an additional 200 hours of firm generation, which increases LCOE by 35 EUR/MWh in key hubs. We show that even under the pessimistic scenarios, existing infrastructure would require 70 GW additional capacity, while under managed growth pathways, this expansion could reach 226 GW. We further find that DC workload dynamics strongly shape energy dispatch, system flexibility, and emissions, while improved efficiency significantly reduces capacity needs and system peaks. While our findings suggest that net-zero targets for 2050 may be achieved, critical emission risks may appear in the intermediate years, and the EU may compromise its carbon-neutral goals unless policies adapt to this accelerating digital transformation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09617v1</guid>
      <category>math.OC</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mohammad Hemmati, Gbemi Oluleye, Vassilis M. Charitopoulos</dc:creator>
    </item>
    <item>
      <title>Cross-Modal Masking for Robust Silent Speech Synthesis Using sEMG and Lipreading</title>
      <link>https://arxiv.org/abs/2606.09667</link>
      <description>arXiv:2606.09667v1 Announce Type: cross 
Abstract: Speech restoration through silent speech interfaces (SSIs) has emerged as a promising assistive technology for individuals with impaired or absent laryngeal voice production. Among non-invasive SSI modalities, surface electromyography (sEMG) and video-based lipreading provide complementary articulatory information, yet their integration for continuous speech synthesis remains underexplored. Moreover, existing multimodal approaches rarely address robustness to modality degradation or temporary sensor failure, limiting their applicability in realistic scenarios. In this work, we propose a masked multimodal speech synthesis framework that jointly leverages sEMG and lipreading signals through modality masking during training. Under multispeaker settings, the proposed approach reduces word error rate by up to 14 absolute percentage points compared to the strongest unimodal baseline. Experimental results show not only that masking strategies are critical for these performance gains and for robustness under low-bitrate conditions, but also that they generalize better than degradation-specific data augmentations when a modality is absent. Phone-level analyses further reveal complementary contributions across modalities, with particularly strong benefits for vowels and for specific consonant groups. Overall, these findings demonstrate the effectiveness and robustness of masked multimodal integration for silent speech synthesis, although adaptation to laryngectomized speakers remains an open research challenge.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09667v1</guid>
      <category>eess.AS</category>
      <category>cs.CL</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Eder del Blanco, David Gimeno-Gómez, Eva Navas, Carlos-D. Martínez-Hinarejos, Inma Hernáez</dc:creator>
    </item>
    <item>
      <title>MeCo: One-Step MeanFlow-based Corrector for Multi-Channel Speech Separation</title>
      <link>https://arxiv.org/abs/2606.09677</link>
      <description>arXiv:2606.09677v1 Announce Type: cross 
Abstract: While discriminative models for multi-channel speech separation excel in reference-based metrics, they often exhibit suboptimal human listening quality. To address this, we propose a novel MeanFlow-based one-step generative corrector (MeCo). MeCo learns a conditional average velocity field to map discriminative estimates directly onto the clean speech manifold in a single step. To maximize one-step generation performance, we introduce Data-Space Optimization (DSO). DSO integrates an $\mathbf{x}_r$-loss, which penalizes prediction errors on longer displacement intervals to serve as a generative objective for human listening quality, with an Endpoint SI-SDR loss that directly optimizes terminal signal fidelity. Experiments demonstrate that MeCo achieves state-of-the-art (SOTA) performance with minimal computational overhead, simultaneously achieving superior signal fidelity and human listening quality in both in-domain and out-of-domain scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09677v1</guid>
      <category>eess.AS</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dohwan Kim, Jung-Woo Choi</dc:creator>
    </item>
    <item>
      <title>A Bell-State Extension of Loop-Back Quantum Key Distribution</title>
      <link>https://arxiv.org/abs/2606.09723</link>
      <description>arXiv:2606.09723v1 Announce Type: cross 
Abstract: Bidirectional quantum key distribution (QKD) protocols face persistent challenges related to classical disclosure, confinement of the signal space to predictable subspaces, and limited detectability under substitution or entanglement-swapping attacks. In this work, we present a Bell-state extension of the Loop-Back QKD architecture that improves efficiency and detectability while preserving its defining feature of a simplified, measurement-free remote terminal. The protocol employs entangled Bell states together with deterministic local Pauli encoding at the remote node. A central element is that Alice privately prepares and knows the initial Bell state, which serves as a hidden reference enabling her to interpret the Bell-state transition induced by Bob, while preventing an adversary from reconstructing the encoding without access to this reference. By exploiting both intra- and inter-family Bell transitions, the scheme expands the effective signal space beyond the subspace restrictions of earlier two-way protocols. Alice performs a Bell-state measurement to deterministically infer Bob's operation without any basis sifting. Although the traveling subsystem remains locally maximally mixed, concealing the initial Bell family amplifies disturbance under separable substitution strategies, yielding an intrinsic detection probability of approximately 3/4 per round. From an efficiency perspective, the protocol lifts the intrinsic post-selection limitation of single-qubit Loop-Back schemes: the effective throughput is bounded only by the Bell-state measurement success probability, reaching up to 50% in linear-optical implementations. These features make the proposed scheme particularly suitable for mobile or edge-based QKD scenarios, where passive remote nodes must operate under high loss and limited interaction times.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09723v1</guid>
      <category>quant-ph</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Luis Adrián Lizama-Pérez</dc:creator>
    </item>
    <item>
      <title>Quantum Cut Sparsifiers</title>
      <link>https://arxiv.org/abs/2606.09728</link>
      <description>arXiv:2606.09728v1 Announce Type: cross 
Abstract: In this paper, we continue a line of research initiated by Basu, Brakensiek, and Putterman [2026] studying the sparsifiability of Hamiltonians. We focus particularly on the sparsifiability of the widely-studied Quantum Cut (QC) Hamiltonians. Our main result is that in an $n$-qubit system, any $n$-qubit QC Hamiltonian can be sparsified to $\widetilde{O}(n /\varepsilon^2)$ many terms while preserving the energy of every state up to a factor of $1 \pm \varepsilon$. Our result can be interpreted as giving an importance sampling scheme for the edges of an arbitrary graph $G$ such that the \emph{Kikuchi} graph at level $\ell$ of the sampled graph is a spectral approximation to the Kikuchi graph of $G$. Importantly, the \emph{same} sampling scheme works simultaneously for all $\ell$.
  The natural approach of leverage score sampling, analyzed via matrix concentration inequalities, yields a polynomially worse bound in our setting because the underlying matrices have dimension $\sim 2^n$. Instead, our approach relies on decomposing the action of these matrices into invariant subspaces. Then, by using an operator-valued inequality of Alon and Kozma [Ann. Henri Poincar\'e, 2020], itself building on an \emph{octopus inequality} of Caputo, Liggett, and Richthammer [J. AMS, 2010], we extend our sparsification technique to all expander graphs. We then invoke expander decomposition to extend our sparsifier to all graphs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09728v1</guid>
      <category>quant-ph</category>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Arpon Basu, Joshua Brakensiek, Pravesh K. Kothari, Aaron Putterman</dc:creator>
    </item>
    <item>
      <title>Adaptive directional gradients for parameterised quantum circuits</title>
      <link>https://arxiv.org/abs/2606.09734</link>
      <description>arXiv:2606.09734v1 Announce Type: cross 
Abstract: Training parameterised quantum circuits (PQCs) on quantum hardware is bottlenecked by the measurement cost of gradient estimation, which under the parameter-shift rule scales linearly in the number of trainable parameters and dominates the total shot budget of training at scale. In this work, we propose a framework of forward gradient estimators for PQCs, based on the forward mode of automatic differentiation, that yields an unbiased estimator of the gradient by averaging a freely tunable number of random directional derivatives and recovers SPSA, random coordinate descent, and the parameter-shift rule as limiting cases, with no ancilla qubits or controlled-gate overhead. We prove that stochastic quantum forward gradient descent converges under standard assumptions, with an explicit second-moment expansion that interpolates between the single-direction extreme of SPSA and the full-gradient extreme of parameter-shift. Within this framework we derive QUIVER (Quantum Iterative V-adaptive Estimator Rule), an adaptive optimiser for parameterised circuits whose update rule follows from a closed-form minimum measurement-cost allocation. We show numerically that forward gradients train Hamming-weight-preserving orthogonal quantum neural networks with up to 60 qubits and 1770 parameters on the ECG5000 and MNIST datasets orders of magnitude more efficiently than the parameter-shift rule. We also demonstrate that our proposed QUIVER optimiser can outperform iCANS and gCANS measurement-frugal optimisers on optimisation problems using the quantum approximate optimisation algorithm and quantum simulation with the variational quantum eigensolver.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09734v1</guid>
      <category>quant-ph</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Brian Coyle, Snehal Raj, Virag Umathe, El Amine Cherrat, Elham Kashefi</dc:creator>
    </item>
    <item>
      <title>Almost-perfect packings and Tuza's conjecture in the random geometric graph</title>
      <link>https://arxiv.org/abs/2606.09736</link>
      <description>arXiv:2606.09736v1 Announce Type: cross 
Abstract: The triangle packing number $\nu(G)$ of a graph $G$ is the maximum size of a set of edge-disjoint triangles in $G$. Tuza conjectured that in any graph $G$ there exists a set of at most $2\nu(G)$ edges intersecting every triangle in $G$. We show that Tuza's conjecture holds in the random geometric graph for a large range of densities. We also study the problem of covering almost all edges of the random geometric graph with edge-disjoint copies of some fixed graph $F$. In particular, we show the existence of almost-perfect packings for an infinite family of $F$, and state some negative results as well.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09736v1</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Patrick Bennett, Ryan Cushman, Andrzej Dudek, Xavier Pérez-Giménez</dc:creator>
    </item>
    <item>
      <title>Discovering Functionally Selective Brain Regions with a Deep Topographic Multimodal Model</title>
      <link>https://arxiv.org/abs/2606.09770</link>
      <description>arXiv:2606.09770v1 Announce Type: cross 
Abstract: Nearby neurons in cortex share similar response profiles, producing systematic spatial organization across sensory and cognitive systems. Recent topographic models reproduce aspects of this structure but remain unimodal and spatially constrain each layer separately, yielding fragmented maps that capture neither the contiguity of cortical processing streams nor their integration across modalities. We introduce Topo-Omni, a topographic multimodal model in which visual, auditory, and language/cognitive processing share a single contiguous in-silico sheet. Built by fine-tuning a pretrained foundation model with a spatial smoothness objective, this architecture develops clusters across modalities that are consistent with human neuroimaging, from sensory to cognitive systems. Driving or suppressing a cluster selectively biases or impairs perception, paralleling human intervention studies. Finally, we use our model to screen for novel clusters in-silico and discover new natural landscape and animal networks which we validate in human data. A single spatial principle thus organizes representations across modalities and processing stages, yielding testable hypotheses about cortical organization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09770v1</guid>
      <category>q-bio.NC</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Badr AlKhamissi, Johannes Mehrer, Lara Marinov, Ahmed Abdelaal, Abdulkadir Gokce, Martin Schrimpf</dc:creator>
    </item>
    <item>
      <title>Who Earns the Safety? Intervention-Aware Quantum Predictive Control with Safety Attribution</title>
      <link>https://arxiv.org/abs/2606.09778</link>
      <description>arXiv:2606.09778v1 Announce Type: cross 
Abstract: Hard safety filters are increasingly placed downstream of learned controllers to guarantee constraint satisfaction at run time. Yet a filtered controller that never violates a constraint may still have learned nothing about safety: the filter can silently repair an incompetent upstream policy, so that post-filter success measures the filter, not the policy. We argue that safe policy learning should ask who earns the safety - the policy or its protective layers - and we make this question measurable. We introduce Intervention-Aware Variational Quantum Differentiable Predictive Control (IA-VQC-DPC), which (i) trains a compact variational quantum circuit (VQC) policy under a primal-dual intervention budget that penalizes reliance on a differentiable Control-Barrier-Function (CBF) projection, and (ii) is evaluated with a safety-attribution protocol that decomposes the executed-trajectory correction into a CBF term and a deployment runtime-guard term, and stress-tests the policy with guard-off evaluation. On closed-loop, high-fidelity BOPTEST building-control emulators (5 seeds, 60 episodes per method), intervention-aware training significantly lowers the quantum policy's raw pre-filter violation and total safety-layer reliance (both p &lt; 10^-4) with no significant energy regression; at an equal approximately 400-parameter budget the quantum policy is significantly safer and more comfortable than a matched classical policy. Guard-off evaluation confirms the improvement is policy-level and exposes a valuable negative result: a learned differentiable energy head is only safe when paired with a distribution-aware runtime guard. The attribution protocol is general beyond quantum policies and buildings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09778v1</guid>
      <category>quant-ph</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yifan Wang</dc:creator>
    </item>
    <item>
      <title>Biclique decompositions from Welzl orders</title>
      <link>https://arxiv.org/abs/2606.09785</link>
      <description>arXiv:2606.09785v1 Announce Type: cross 
Abstract: A biclique decomposition of a graph is a partition of its edges into complete bipartite subgraphs. We consider graphs whose vertices can be ordered such that the neighborhood of every vertex is the union of a sublinear number of intervals. We observe that these graphs admit compact representations in the form of biclique decompositions of small size. Here, the size of a decomposition is measured as the sum of the number of vertices of its bicliques. Combining this result with the existence of suitable vertex orderings for graphs of low neighborhood complexity, as proven by Welzl in 1988, we recover and extend several known results up to logarithmic factors. These results include upper bounds on the Zarankiewicz problem, matrix multiplication, quantum circuit complexity, and shortest path algorithms in "well-structured" instances.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09785v1</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jean Cardinal, Rose McCarty, Yelena Yuditsky</dc:creator>
    </item>
    <item>
      <title>Finite-n Estimate of Dedekind Numbers by Layer-Ratio Monte Carlo</title>
      <link>https://arxiv.org/abs/2606.09795</link>
      <description>arXiv:2606.09795v1 Announce Type: cross 
Abstract: Dedekind's problem counts monotone Boolean functions, equivalently downsets of a Boolean lattice. We recast this enumeration as a finite layer-ratio reconstruction problem for the Whitney numbers of the ranked ideal lattice. An exact adjacent-layer double count expresses each layer ratio through local averages of the number of addable elements and the number of removable elements. Reversible fixed-layer Markov chains estimate these averages and hence estimate the Dedekind number M(n). Backtests at M(8) and M(9) calibrate seed-level variability under the fixed protocol and measure the observed Monte Carlo budget scaling. The resulting estimate probes the Whitney-number sequence of the ideal lattice. Although these rows have previously been described empirically as unimodal, the high-precision n=9 estimate has a shallow two-shoulder feature around the central rank, contrary to that empirical description; n=11 and n=13 center-window estimates show a larger-contrast analogous pattern. The protocol estimate for M(10) is \(\widehat M(10)=(8.9360\pm0.0010)\times 10^{78}\), where the displayed uncertainty is the budget-based forecast scale from the cross-n scaling law under the production budget.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09795v1</guid>
      <category>math.CO</category>
      <category>cs.IT</category>
      <category>cs.NA</category>
      <category>hep-th</category>
      <category>math-ph</category>
      <category>math.IT</category>
      <category>math.MP</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tian-Shun Chen, Hao Feng, Haozhe Wang, Kilar Zhang</dc:creator>
    </item>
    <item>
      <title>On the generalized Turán number of complete bipartite graphs</title>
      <link>https://arxiv.org/abs/2606.09801</link>
      <description>arXiv:2606.09801v1 Announce Type: cross 
Abstract: For graphs $F$ and $H$, the generalized Tur\'an number $\mathrm{ex}(n,F,H)$ denotes the maximum number of copies of $F$ in an $H$-free graph on $n$ vertices. We prove that if $s\in \{2,3\}$, $s&lt; a\leq b$ and $t$ is sufficiently large, then $\mathrm{ex}(n,K_{a,b},K_{s,t})=\Theta(n^s)$. The $s=2$, $a=b=3$ case of this result answers a question of Spiro.
  Proving another conjecture of Spiro, we show that for every graph $F$ with at least one edge, there exist infinitely many real numbers $r$ such that $\mathrm{ex}(n,F,H)=\Theta(n^r)$ holds for some graph $H$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09801v1</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Oliver Janzer, Sean Longbrake, Liana Yepremyan</dc:creator>
    </item>
    <item>
      <title>Weighted universal approximation of differentiable maps on infinite-dimensional manifolds</title>
      <link>https://arxiv.org/abs/2606.09820</link>
      <description>arXiv:2606.09820v1 Announce Type: cross 
Abstract: We generalize the universal approximation theorem for functional input neural networks (FNN) to differentiable maps by including the approximation of the derivatives. An FNN maps the input from a possibly infinite-dimensional weighted manifold to the real-valued hidden layer, on which a non-linear scalar activation function is applied, and then returns the output into a Banach space via some linear readouts. By proving a weighted Nachbin theorem, we establish a universal approximation theorem (UAT) for differentiable maps, which goes beyond the usual formulation on compact sets and also includes the approximation of the derivatives. This leads us to approximation results for non-anticipative functionals including the horizontal and vertical derivatives. As a further application, we show that linear functions of the signature are able to approximate path space functionals including their directional derivatives.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.09820v1</guid>
      <category>math.FA</category>
      <category>cs.LG</category>
      <category>math.PR</category>
      <category>q-fin.MF</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Philipp Schmocker, Josef Teichmann</dc:creator>
    </item>
    <item>
      <title>New Combinations of Polynomial Root-Finding Iterations</title>
      <link>https://arxiv.org/abs/1705.00729</link>
      <description>arXiv:1705.00729v4 Announce Type: replace 
Abstract: Some near-optimal polynomial root-finders of 2024-25, based on subdivision iterations, approximate all complex roots of a polynomial or all roots lying in a fixed Region of Interest in the complex plane. We combine these iterations with Newton's and/or Schroeder's to yield significant empirical acceleration versus each approach standing alone. Like the cited recent algorithms, our root-finders can be applied not only to a polynomial represented in monomial basis by its coefficients but also to a black box polynomial represented by an oracle (black box subroutine) for its evaluation. Some by-products of our study such as an extension of the Gauss-Lucas theorem and a fast black box estimator for root radius can be of independent interest.</description>
      <guid isPermaLink="false">oai:arXiv.org:1705.00729v4</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Victor Y. Pan</dc:creator>
    </item>
    <item>
      <title>Measuring a hate speech spectrum with faceted Rasch item response theory and perspective-aware, explainable-by-design deep learning</title>
      <link>https://arxiv.org/abs/2009.10277</link>
      <description>arXiv:2009.10277v2 Announce Type: replace 
Abstract: We propose a system for measuring hate speech on a continuous, interval-valued spectrum ranging from genocidal to supportive speech by combining supervised deep learning with faceted Rasch item response theory (IRT). We decompose the theoretical construct of hate speech into constituent concepts operationalized as 10 ordinal labels. Those labels are reconstituted via IRT probabilistic latent modeling into an interval outcome measure while simultaneously estimating and adjusting for each annotator's labeling perspective. Our scaling procedure integrates naturally with a multitask deep learning architecture for automated prediction, allowing design-based explainability of the continuous score through those components. We apply this method to a new, open source dataset of 50,070 social media comments sourced from YouTube, Twitter, and Reddit, annotated and labeled by 11,143 United States-based Amazon Mechanical Turk workers. Our RoBERTa-based model shows improved accuracy compared to alternative approaches. This system offers a new paradigm for supervised NLP that encourages continuous rather than binary constructs, and design-based incorporation of annotator perspective and model explainability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2009.10277v2</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <category>cs.SI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chris J. Kennedy, Geoff Bacon, Alexander Sahn, Claudia von Vacano</dc:creator>
    </item>
    <item>
      <title>An Interval Branch-and-Bound-Based Inverse Kinematics Algorithm Towards Global Optimal Redundancy Resolution</title>
      <link>https://arxiv.org/abs/2104.12183</link>
      <description>arXiv:2104.12183v2 Announce Type: replace 
Abstract: The general inverse kinematics (IK) problem of a manipulator, namely that of acquiring the self-motion manifold (SMM) of all admissible joint angles for a desired end-effector pose, plays a vital role in robotics modeling, planning and control. To efficiently solve the generalized IK, this paper proposes an interval branch-and-bound-based approach, which is augmented with a search heuristic enabled by a fast numerical IK solver. In comparison to independent solutions generated by sampling-based methods, our approach generates patches of neighboring solutions to provide richer information about the inherent geometry of the SMM for optimal planning and other applications. It can also be utilized in an anytime fashion to obtain solutions with sub-optimal resolution for applications within a limited period. The performance of our approach is verified by numerical experiments on both non-redundant and redundant manipulators.</description>
      <guid isPermaLink="false">oai:arXiv.org:2104.12183v2</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Yajue Yang, Zeqing Zhang, Yuanqing Wu, Jia Pan</dc:creator>
    </item>
    <item>
      <title>Generalized binary utility functions and fair allocations</title>
      <link>https://arxiv.org/abs/2109.08461</link>
      <description>arXiv:2109.08461v2 Announce Type: replace 
Abstract: The problem of finding envy-free allocations of indivisible goods cannot always be solved; therefore, it is common to study relaxations such as envy-free up to one good (EF1). Another property of interest, concerning the efficiency of an allocation, is Pareto optimality (PO). Under additive utility functions, it is possible to find allocations that are EF1 and PO using Nash social welfare. However, finding an allocation that maximizes the Nash social welfare is a computationally hard problem. In this work we propose a polynomial time algorithm which maximizes the utilitarian social welfare and at the same time produces an allocation which is EF1 and PO in a special case of additive utility functions called buyer utility functions. Moreover, a slight modification of our algorithm produces an allocation which is envy-free up to any positively valued good (EFX).</description>
      <guid isPermaLink="false">oai:arXiv.org:2109.08461v2</guid>
      <category>cs.GT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1016/j.mathsocsci.2022.10.003</arxiv:DOI>
      <arxiv:journal_reference>Mathematical Social Sciences, Volume 121, January 2023, Pages 50-60</arxiv:journal_reference>
      <dc:creator>Franklin Camacho, Rigoberto Fonseca-Delgado, Ramón Pino Pérez, Guido Tapia</dc:creator>
    </item>
    <item>
      <title>SFILES 2.0: An extended text-based flowsheet representation</title>
      <link>https://arxiv.org/abs/2208.00778</link>
      <description>arXiv:2208.00778v2 Announce Type: replace 
Abstract: SFILES is a text-based notation for chemical process flowsheets. It was originally proposed by d'Anterroches (Process flow sheet generation &amp; design through a group contribution approach), who was inspired by the text-based SMILES notation for molecules. The text-based format has several advantages compared to flowsheet images regarding storage format, computational accessibility, and, eventually, data analysis and processing. However, the original SFILES version cannot describe essential flowsheet configurations unambiguously, such as the distinction between top and bottom products. Neither is it capable of describing the control structure required for the safe and reliable operation of chemical processes. Also, there is no publicly available software for decoding or encoding chemical process topologies to SFILES. We propose SFILES 2.0 with a complete description of the extended notation and naming conventions. Additionally, we provide open-source software for the automated conversion between flowsheet graphs and SFILES 2.0 strings. This way, we hope to encourage researchers and engineers to publish their flowsheet topologies as SFILES 2.0 strings. The ultimate goal is to set the standards for creating a FAIR database of chemical process flowsheets, which would be of great value for future data analysis and processing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2208.00778v2</guid>
      <category>cs.DB</category>
      <category>cs.LG</category>
      <category>q-bio.QM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1007/s11081-023-09798-9</arxiv:DOI>
      <arxiv:journal_reference>Optimization and Engineering, Volume 24, pages 2911-2933, (2023)</arxiv:journal_reference>
      <dc:creator>Gabriel Vogel, Edwin Hirtreiter, Lukas Schulze Balhorn, Artur M. Schweidtmann</dc:creator>
    </item>
    <item>
      <title>Learning from flowsheets: A generative transformer model for autocompletion of flowsheets</title>
      <link>https://arxiv.org/abs/2208.00859</link>
      <description>arXiv:2208.00859v2 Announce Type: replace 
Abstract: We propose a novel method enabling autocompletion of chemical flowsheets. This idea is inspired by the autocompletion of text. We represent flowsheets as strings using the text-based SFILES 2.0 notation and learn the grammatical structure of the SFILES 2.0 language and common patterns in flowsheets using a transformer-based language model. We pre-train our model on synthetically generated flowsheet topologies to learn the flowsheet language grammar. Then, we fine-tune our model in a transfer learning step on real flowsheet topologies. Finally, we use the trained model for causal language modeling to autocomplete flowsheets. Eventually, the proposed method can provide chemical engineers with recommendations during interactive flowsheet synthesis. The results demonstrate a high potential of this approach for future AI-assisted process synthesis but also reveal the limitations at the present state and the next steps that need to be taken to deploy this technique in realistic flowsheet synthesis scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2208.00859v2</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1016/j.compchemeng.2023.108162</arxiv:DOI>
      <arxiv:journal_reference>Computers and Chemical Engineering Volume 171, March 2023, 108162</arxiv:journal_reference>
      <dc:creator>Gabriel Vogel, Lukas Schulze Balhorn, Artur M. Schweidtmann</dc:creator>
    </item>
    <item>
      <title>Toward automatic generation of control structures for process flow diagrams with large language models</title>
      <link>https://arxiv.org/abs/2211.05583</link>
      <description>arXiv:2211.05583v2 Announce Type: replace 
Abstract: Developing Piping and Instrumentation Diagrams (P&amp;IDs) is a crucial step during process development. We propose a data-driven method for the prediction of control structures. Our methodology is inspired by end-to-end transformer-based human language translation models. We cast the control structure prediction as a translation task where Process Flow Diagrams (PFDs) without control structures are translated to PFDs with control structures. We represent the topology of PFDs as strings using the SFILES 2.0 notation. We pretrain our model using generated PFDs to learn the grammatical structure. Thereafter, the model is fine-tuned leveraging transfer learning on real PFDs. The model achieved a top-5 accuracy of 74.8% on 10,000 generated PFDs and 89.2% on 100,000 generated PFDs. These promising results show great potential for AI-assisted process engineering. The tests on a dataset of 312 real PFDs indicate the need for a larger PFD dataset for industry applications and hybrid artificial intelligence solutions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2211.05583v2</guid>
      <category>cs.CL</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1002/aic.18259</arxiv:DOI>
      <arxiv:journal_reference>AIChE Journal, Volume 70, Issue 1, January 2024, e18259</arxiv:journal_reference>
      <dc:creator>Edwin Hirtreiter, Lukas Schulze Balhorn, Artur M. Schweidtmann</dc:creator>
    </item>
    <item>
      <title>TAMUNA: Doubly Accelerated Distributed Optimization under Partial Participation</title>
      <link>https://arxiv.org/abs/2302.09832</link>
      <description>arXiv:2302.09832v4 Announce Type: replace 
Abstract: In distributed optimization and federated learning, slow and costly communication between parallel devices and the central server constitutes the primary bottleneck. To alleviate this burden, two strategies have emerged: 1) local training (LT), which reduces communication frequency by performing multiple local computations between rounds, and 2) compression (CC), which consists of transmitting lower-dimensional, compact representations. Recent theoretical advances have successfully combined LT and CC to achieve doubly-accelerated communication rates, with respect to both condition number and model dimension. However, these methods have a major drawback: they require full client participation and break down when idle clients miss communication triggers. We introduce TAMUNA, the first algorithm to successfully intertwine LT, CC, and partial participation. By decoupling primal model updates from dual control variates, TAMUNA overcomes the architectural deadlock of prior methods. In the strongly convex setting, TAMUNA converges linearly to the exact solution, establishing a new state of the art by exhibiting doubly-accelerated convergence, while supporting arbitrary levels of client participation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2302.09832v4</guid>
      <category>cs.LG</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Laurent Condat, Ivan Agarský, Grigory Malinovsky, Peter Richtárik</dc:creator>
    </item>
    <item>
      <title>Reversible Numeric Composite Key (RNCK)</title>
      <link>https://arxiv.org/abs/2306.04353</link>
      <description>arXiv:2306.04353v2 Announce Type: replace 
Abstract: In database design, composite keys uniquely identify records and prevent duplication. However, wide multi-column keys can increase index size, comparison work, and join costs. Surrogate keys can mitigate some of these costs, but they also require additional constraints and governance to preserve business-level uniqueness.
  This paper presents a Reversible Numeric Composite Key (RNCK): a single non-negative integer that encodes multiple normalized attributes and can be decoded back to the original tuple under a fixed schema. RNCK is designed to combine the semantic fidelity of composite keys with the operational convenience of numeric keys.
  RNCK can improve storage footprint and key-comparison efficiency when attribute domains are bounded and stable. We formalize correctness and ordering properties, and specify operational semantics for partial-overflow mode. The approach has been used in production systems and is applicable to relational databases, static datasets, and key-value caching systems within the stated constraints.</description>
      <guid isPermaLink="false">oai:arXiv.org:2306.04353v2</guid>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Nicola Asuni</dc:creator>
    </item>
    <item>
      <title>Chatlaw: A Multi-Agent Legal Assistant based on a Role-Aligned Mixture-of-Experts Architecture</title>
      <link>https://arxiv.org/abs/2306.16092</link>
      <description>arXiv:2306.16092v3 Announce Type: replace 
Abstract: Artificial Intelligence (AI) holds great potential in legal services, yet Large Language Models (LLMs) face two major challenges: limited knowledge of the Chinese legal system and vulnerability to hallucinations. To address these issues, we present Chatlaw, a multi-agent legal assistant. Chatlaw's framework is designed to emulate the Standard Operating Procedures (SOP) of real law firms, where different roles (e.g., assistant, researcher, senior lawyer) collaborate on a case. To computationally mirror this collaborative structure, we developed a novel Role-Aligned Mixture-of-Experts (RA-MoE) architecture. In this system, the internal "experts" are specifically trained to align with the distinct tasks of each agent role (e.g., inquiry, analysis, drafting). These specialized agents (Legal Assistant, Researcher, etc.) then form the collaborative framework. When they interact with users, retrieve legal knowledge, analyze case details, or generate reliable consultations, the RA-MoE architecture intelligently routes their computations to the corresponding dedicated expert, ensuring each step is handled by the most qualified parameters. In evaluations, Chatlaw surpasses general-purpose AI models, including GPT-4, achieving a 7.73% improvement in accuracy on the LawBench benchmark and an 11-point higher score on the Unified Qualification Exam for Legal Professionals. Real-case studies and expert assessments further confirm its robustness. Chatlaw enhances the accessibility and reliability of legal services, advancing the provision of legal support to the public.</description>
      <guid isPermaLink="false">oai:arXiv.org:2306.16092v3</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:DOI>10.1016/j.fmre.2026.03.026</arxiv:DOI>
      <dc:creator>Jiaxi Cui, Munan Ning, Zongjian Li, Bohua Chen, Yang Yan, Hao Li, Bin Ling, Yonghong Tian, Li Yuan</dc:creator>
    </item>
    <item>
      <title>Deep reinforcement learning for process design: Review and perspective</title>
      <link>https://arxiv.org/abs/2308.07822</link>
      <description>arXiv:2308.07822v2 Announce Type: replace 
Abstract: The transformation towards renewable energy and feedstock supply in the chemical industry requires new conceptual process design approaches. Recent breakthroughs in artificial intelligence offer opportunities to accelerate this transition. Specifically, deep reinforcement learning, a subclass of machine learning, has shown the potential to solve complex decision-making problems and aid sustainable process design. We survey state-of-the-art research in reinforcement learning for process design through three major elements: (i) information representation, (ii) agent architecture, and (iii) environment and reward. Moreover, we discuss perspectives on underlying challenges and promising future work to unfold the full potential of reinforcement learning for process design in chemical engineering.</description>
      <guid isPermaLink="false">oai:arXiv.org:2308.07822v2</guid>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qinghe Gao, Artur M. Schweidtmann</dc:creator>
    </item>
    <item>
      <title>Empirical assessment of ChatGPT's answering capabilities in natural science and engineering</title>
      <link>https://arxiv.org/abs/2309.10048</link>
      <description>arXiv:2309.10048v2 Announce Type: replace 
Abstract: ChatGPT is a powerful language model from OpenAI that is arguably able to comprehend and generate text. ChatGPT is expected to greatly impact society, research, and education. An essential step to understand ChatGPT's expected impact is to study its domain-specific answering capabilities. Here, we perform a systematic empirical assessment of its abilities to answer questions across the natural science and engineering domains. We collected 594 questions on natural science and engineering topics from 198 faculty members across five faculties at Delft University of Technology. After collecting the answers from ChatGPT, the participants assessed the quality of the answers using a systematic scheme. Our results show that the answers from ChatGPT are, on average, perceived as "mostly correct". Two major trends are that the rating of the ChatGPT answers significantly decreases (i) as the educational level of the question increases and (ii) as we evaluate skills beyond scientific knowledge, e.g., critical attitude.</description>
      <guid isPermaLink="false">oai:arXiv.org:2309.10048v2</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1038/s41598-024-54936-7</arxiv:DOI>
      <arxiv:journal_reference>Scientific Reports, Volume 14, 2024, Article number: 4998</arxiv:journal_reference>
      <dc:creator>Lukas Schulze Balhorn, Jana M. Weber, Stefan Buijsman, Julian R. Hildebrandt, Martina Ziefle, Artur M. Schweidtmann</dc:creator>
    </item>
    <item>
      <title>Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook</title>
      <link>https://arxiv.org/abs/2310.10196</link>
      <description>arXiv:2310.10196v3 Announce Type: replace 
Abstract: Temporal data, including time series and spatio-temporal data, are pervasive in real-world applications. Generated in massive volumes by physical and virtual sensors, they record dynamic system behaviors and enable a wide range of downstream tasks. Effectively analyzing such data is crucial to unlocking their rich information content. Recent advances in large language models and other foundation models have accelerated their use in time series and spatio-temporal data mining. These approaches not only improve pattern recognition and reasoning across diverse domains but also support progress toward artificial general intelligence that can understand and process temporal data. In this survey, we present a comprehensive, up-to-date review of large models tailored or adapted for time series and spatio-temporal data along four dimensions: data types, model categories, model scopes, and application areas/tasks. We organize existing work into two main groups: large models for time series analysis (LM4TS) and for spatio-temporal data mining (LM4STD), and further distinguish general-purpose from domain-specific models. We also curate related resources, including datasets, model implementations, and tools, organized by major application areas. Overall, this survey consolidates recent advances and highlights foundations, applications, resources, and open research opportunities in large model-centric temporal data analysis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2310.10196v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ming Jin, Yaxuan Kong, Yuxuan Liang, Chaoli Zhang, Siqiao Xue, Xue Wang, James Zhang, Yi Wang, Haifeng Chen, Xiaoli Li, Vincent S. Tseng, Yu Zheng, Lei Chen, Hui Xiong, Shirui Pan, Qingsong Wen</dc:creator>
    </item>
    <item>
      <title>A higher order numerical method for singularly perturbed elliptic problems with characteristic boundary layers</title>
      <link>https://arxiv.org/abs/2311.00554</link>
      <description>arXiv:2311.00554v2 Announce Type: replace 
Abstract: A Petrov-Galerkin finite element method is constructed for a singularly perturbed elliptic problem in two space dimensions. The solution contains a regular boundary layer and two characteristic boundary layers. Exponential splines are used as test functions in one coordinate direction and are combined with bilinear trial functions defined on a Shishkin mesh. The resulting numerical method is shown to be a stable parameter-uniform numerical method that achieves a higher order of convergence compared to upwinding on the same mesh.</description>
      <guid isPermaLink="false">oai:arXiv.org:2311.00554v2</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Alan F. Hegarty, Eugene O'Riordan</dc:creator>
    </item>
    <item>
      <title>Notes on data-driven output-feedback control of linear MIMO systems</title>
      <link>https://arxiv.org/abs/2311.17484</link>
      <description>arXiv:2311.17484v3 Announce Type: replace 
Abstract: Recent works have approached the data-driven design of dynamic output-feedback controllers for discrete-time LTI systems by constructing non-minimal state vectors composed of past inputs and outputs. Depending on the system's complexity (order $n$, lag $\ell$ and number of outputs $p$), it was observed in several works that such an approach presents significant limitations. In particular, many works restrict the class of LTI systems to those satisfying the relation $p\ell=n$. In this note, we show how to address the general MIMO case (for which $p\ell\geq n$ in general) by constructing an alternative non-minimal state vector from data. Unlike the existing literature, our method guarantees the satisfaction of certain rank conditions when the system is persistently excited, thereby facilitating the direct data-driven dynamic output-feedback control of MIMO systems by applying methods that were originally developed for the input-state data setting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2311.17484v3</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:DOI>10.1109/TAC.2025.3553073</arxiv:DOI>
      <arxiv:journal_reference>IEEE Transactions on Automatic Control ( Volume: 70, Issue: 9, September 2025)</arxiv:journal_reference>
      <dc:creator>Mohammad Alsalti, Victor G. Lopez, Matthias A. Müller</dc:creator>
    </item>
    <item>
      <title>Toward autocorrection of chemical process flowsheets using large language models</title>
      <link>https://arxiv.org/abs/2312.02873</link>
      <description>arXiv:2312.02873v2 Announce Type: replace 
Abstract: The process engineering domain widely uses Process Flow Diagrams (PFDs) and Piping and Instrumentation Diagrams (P&amp;IDs) to represent process flows and equipment configurations. However, the P&amp;IDs and PFDs, hereafter called flowsheets, can contain errors causing safety hazards, inefficient operation, and unnecessary expenses. Correcting and verifying flowsheets is a tedious, manual process. We propose a novel generative AI methodology for automatically identifying errors in flowsheets and suggesting corrections to the user, i.e., autocorrecting flowsheets. Inspired by the breakthrough of Large Language Models (LLMs) for grammatical autocorrection of human language, we investigate LLMs for the autocorrection of flowsheets. The input to the model is a potentially erroneous flowsheet, and the output is a set of suggestions for a corrected flowsheet. We train our autocorrection model on a synthetic dataset in a supervised manner. The model achieves a top-1 accuracy of 80% and a top-5 accuracy of 84% on an independent test dataset of synthetically generated flowsheets. The results suggest that the model can learn to autocorrect the synthetic flowsheets. We envision that flowsheet autocorrection will become a useful tool for chemical engineers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2312.02873v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1016/B978-0-443-28824-1.50519-6</arxiv:DOI>
      <arxiv:journal_reference>Computer Aided Chemical Engineering, Volume 53, 2024, Pages 3109-3114</arxiv:journal_reference>
      <dc:creator>Lukas Schulze Balhorn, Marc Caballero, Artur M. Schweidtmann</dc:creator>
    </item>
    <item>
      <title>Greedy Grammar Induction with Indirect Negative Evidence</title>
      <link>https://arxiv.org/abs/2312.15321</link>
      <description>arXiv:2312.15321v3 Announce Type: replace 
Abstract: This paper proposes a non-lexicalized grammar-induction procedure that separates two tests: recognition of the observed finite presentation, and rejection of short preterminal strings generated by a hypothesis but unsupported by the evidence.
  The central object is the rule-coverage bound \(\ell^*(G)\): the maximum, over rules in \(G\), of the length of the shortest preterminal string whose derivation uses that rule. This bound induces the comparison universe \(\Sigma_{\mathrm{pre}}^{\le \ell^*(G)}\), where unsupported generated strings serve as indirect evidence against overgenerating hypotheses.
  We give a greedy search algorithm over rule sets and prove a conditional weak-recovery theorem: under explicit reachability conditions and sufficient saturation of the presentation, the exact learner reaches a grammar weakly equivalent to the unknown target. The complexity analysis is slice-wise: for each fixed incrementality radius \(k\), the search explores polynomially many rule-set extensions in the finite rule universe. Across 31 benchmark runs spanning Dyck-\(k\) languages \((1\le k\le4)\), palindromes, \(a^n b^n\), English-like recursive fragments, and an inherently ambiguous union language, grammar-level analysis establishes weak equivalence between every returned grammar and its target.</description>
      <guid isPermaLink="false">oai:arXiv.org:2312.15321v3</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Joseph Potashnik</dc:creator>
    </item>
    <item>
      <title>EnchantDance: Unveiling the Potential of Music-Driven Dance Movement</title>
      <link>https://arxiv.org/abs/2312.15946</link>
      <description>arXiv:2312.15946v3 Announce Type: replace 
Abstract: The task of music-driven dance generation involves creating coherent dance movements that correspond to the given music. While existing methods can produce physically plausible dances, they often struggle to generalize to out-of-set data. The challenge arises from three aspects: 1) the high diversity of dance movements and significant differences in the distribution of music modalities, which make it difficult to generate music-aligned dance movements; 2) the lack of a large-scale music-dance dataset, which hinders the generation of generalized dance movements from music; and 3) the protracted nature of dance movements, which makes it challenging to maintain a consistent dance style. In this work, we introduce the EnchantDance framework, a state-of-the-art method for dance generation. Due to the redundancy of the original dance sequence along the time axis, EnchantDance first constructs a strong dance latent space and then trains a dance diffusion model on the dance latent space. To address the data gap, we construct a large-scale music-dance dataset, ChoreoSpectrum3D Dataset, which includes four dance genres and has a total duration of 70.32 hours, making it the largest reported music-dance dataset to date. To enhance consistency between music genre and dance style, we pre-train a music genre prediction network using transfer learning and incorporate music genre as extra conditional information in the training of the dance diffusion model. Extensive experiments demonstrate that our proposed framework achieves state-of-the-art performance on dance quality, diversity, and consistency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2312.15946v3</guid>
      <category>cs.SD</category>
      <category>cs.GR</category>
      <category>eess.AS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bo Han, Teng Zhang, Zeyu Ling, Feilin Han</dc:creator>
    </item>
    <item>
      <title>Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay</title>
      <link>https://arxiv.org/abs/2401.01599</link>
      <description>arXiv:2401.01599v4 Announce Type: replace 
Abstract: The generalization error curve of a kernel regression method aims at determining the exact order of the generalization error under various source conditions, noise levels, and choices of the regularization parameter, rather than the minimax rate. In this work, under mild assumptions, we rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method (and a large class of analytic spectral algorithms) in kernel regression. Consequently, we can sharpen the near inconsistency of kernel interpolation and clarify the saturation effects of kernel regression algorithms with higher qualification, etc. Thanks to the neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training wide neural networks. A novel technical contribution, the analytic functional argument, might be of independent interest.</description>
      <guid isPermaLink="false">oai:arXiv.org:2401.01599v4</guid>
      <category>cs.LG</category>
      <category>math.ST</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Yicheng Li, Weiye Gan, Zuoqiang Shi, Qian Lin</dc:creator>
    </item>
    <item>
      <title>A New Class of Linear Codes</title>
      <link>https://arxiv.org/abs/2401.07986</link>
      <description>arXiv:2401.07986v3 Announce Type: replace 
Abstract: Let $n$ be a prime power, $r$ be a prime with $r\mid n-1$, and $\varepsilon\in (0,1/2)$. Using the theory of multiplicative character sums and superelliptic curves, we construct new codes over $\mathbb F_r$ having length $n$, relative distance $(r-1)/r+O(n^{-\varepsilon})$ and rate $n^{-1/2-\varepsilon}$. When $r=2$, our binary codes have exponential size when compared to all previously known families of linear and non-linear codes with relative distance asymptotic to $1/2$, such as Delsarte--Goethals codes. Moreover, concatenating with a Reed-Solomon code we get a family of codes of length $n$ and rate $n^{-1/(2n+2)-2\varepsilon/(n+1)}+O(n^{-1/(n+1)})$ and relative distance $1/2+O(n^{-\varepsilon})$. This shows that, for a fixed length, the rate of the concatenation suggested by Kschischang and Tasbihi (2024) of a Reed-Solomon and a Reed-Muller code can be made an order of magnitude smaller than a concatenation of a Reed-Solomon code with a large dimensional Shadow code, while still keeping the regime of relative distance $1/2$. Finally, we show that the square of a Shadow code behaves like a random code and the Shadow code itself has a decoding algorithm, which suggest that this class of codes has the potential to be interesting for cryptographic applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2401.07986v3</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <category>math.NT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Akash Bhople, Giacomo Cherubini, Giacomo Micheli, Tefjol Pllaha</dc:creator>
    </item>
    <item>
      <title>The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes</title>
      <link>https://arxiv.org/abs/2402.08922</link>
      <description>arXiv:2402.08922v3 Announce Type: replace 
Abstract: Large-scale black-box models have become ubiquitous across numerous applications. Understanding the influence of individual training data sources on predictions made by these models is crucial for improving their trustworthiness. Current influence estimation techniques involve computing gradients for every training point or repeated training on different subsets. These approaches face obvious computational challenges when scaled up to large datasets and models.
  In this paper, we introduce and explore the Mirrored Influence Hypothesis, highlighting a reciprocal nature of influence between training and test data. Specifically, it suggests that evaluating the influence of training data on test predictions can be reformulated as an equivalent, yet inverse problem: assessing how the predictions for training samples would be altered if the model were trained on specific test samples. Through both empirical and theoretical validations, we demonstrate the wide applicability of our hypothesis. Inspired by this, we introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point. This approach can capitalize on the common asymmetry in scenarios where the number of test samples under concurrent examination is much smaller than the scale of the training dataset, thus gaining a significant improvement in efficiency compared to existing approaches.
  We demonstrate the applicability of our method across a range of scenarios, including data attribution in diffusion models, data leakage detection, analysis of memorization, mislabeled data detection, and tracing behavior in language models. Our code will be made available at https://github.com/ruoxi-jia-group/Forward-INF.</description>
      <guid isPermaLink="false">oai:arXiv.org:2402.08922v3</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024</arxiv:journal_reference>
      <dc:creator>Myeongseob Ko, Feiyang Kang, Weiyan Shi, Ming Jin, Zhou Yu, Ruoxi Jia</dc:creator>
    </item>
    <item>
      <title>Investigating the Histogram Loss in Regression</title>
      <link>https://arxiv.org/abs/2402.13425</link>
      <description>arXiv:2402.13425v3 Announce Type: replace 
Abstract: It is becoming increasingly common in regression to train neural networks that model the entire distribution even if only the mean is required for prediction. This additional modeling often comes with performance gain and the reasons behind the improvement are not fully known. This paper investigates a recent approach to regression, the Histogram Loss, which involves learning the conditional distribution of the target variable by minimizing the cross-entropy between a target distribution and a flexible histogram prediction. We design theoretical and empirical analyses to determine why and when this performance gain appears, and how different components of the loss contribute to it. Our results suggest that the benefits of learning distributions in this setup come from improvements in optimization rather than modelling extra information. We then demonstrate the viability of the Histogram Loss in common deep learning applications without a need for costly hyperparameter tuning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2402.13425v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>JMLR,2026</arxiv:journal_reference>
      <dc:creator>Ehsan Imani, Kai Luedemann, Sam Scholnick-Hughes, Esraa Elelimy, Martha White</dc:creator>
    </item>
    <item>
      <title>Are Classification Robustness and Explanation Robustness Really Strongly Correlated? An Analysis Through Input Loss Landscape</title>
      <link>https://arxiv.org/abs/2403.06013</link>
      <description>arXiv:2403.06013v2 Announce Type: replace 
Abstract: This paper delves into the critical area of deep learning robustness, challenging the conventional belief that classification robustness and explanation robustness in image classification systems are inherently correlated. Through a novel evaluation approach leveraging clustering for efficient assessment of explanation robustness, we demonstrate that enhancing explanation robustness does not necessarily flatten the input loss landscape with respect to explanation loss, in contrast to classification, where a flattened loss landscape indicates better robustness. To investigate this contradiction in depth, we propose a training method designed to adjust the loss landscape with respect to explanation loss. Through the new training method, we uncover that although such adjustments can impact the robustness of explanations, they do not influence the robustness of classification. These findings not only challenge the prevailing assumption of a strong correlation between the two forms of robustness but also pave new pathways for understanding the relationship between loss landscape and explanation loss.</description>
      <guid isPermaLink="false">oai:arXiv.org:2403.06013v2</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tiejin Chen, Wenwang Huang, Linsey Pang, Dongsheng Luo, Hua Wei</dc:creator>
    </item>
    <item>
      <title>Quantifying Noise of Dynamic Vision Sensor</title>
      <link>https://arxiv.org/abs/2404.01948</link>
      <description>arXiv:2404.01948v3 Announce Type: replace 
Abstract: Dynamic vision sensors (DVS) are characterized by a large amount of background activity (BA) noise, which is mixed with the original (clean) sensor signal. The dynamic nature of the signal and the absence of ground truth in practical applications make it difficult to distinguish between noise and the clean sensor signal using standard image processing techniques. In this letter, a new technique, derived from Detrended Fluctuation Analysis (DFA), is presented to characterise BA noise. The proposed technique can be used to address two existing DVS issues: how to quantitatively characterise noise and signal without ground truth, and how to derive optimal denoising filter parameters. The solution to the latter problem is demonstrated on the popular real moving-car dataset.</description>
      <guid isPermaLink="false">oai:arXiv.org:2404.01948v3</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Evgeny V. Votyakov, Alessandro Artusi</dc:creator>
    </item>
    <item>
      <title>A Survey on Large Language Model-Based Game Agents</title>
      <link>https://arxiv.org/abs/2404.02039</link>
      <description>arXiv:2404.02039v5 Announce Type: replace 
Abstract: Game environments provide rich, controllable settings that simulate many aspects of real-world complexity. As such, game agents offer a valuable testbed for exploring capabilities relevant to Artificial General Intelligence. Recently, the emergence of Large Language Models (LLMs) provides new opportunities to endow these agents with generalizable reasoning, memory, and adaptability in complex game environments. This survey offers an up-to-date review of LLM-based game agents (LLMGAs) through a unified reference architecture. At the single-agent level, we synthesize existing studies around three core components: memory, reasoning, and perception-action interfaces, which jointly characterize how language enables agents to perceive, think, and act. At the multi-agent level, we outline how communication protocols and organizational models support coordination, role differentiation, and large-scale social behaviors. To contextualize these designs, we introduce a challenge-centered taxonomy linking six major game genres to their dominant agent requirements, from low-latency control in action games to open-ended goal formation in sandbox worlds. A curated list of related papers is available at https://github.com/git-disl/awesome-LLM-game-agent-papers</description>
      <guid isPermaLink="false">oai:arXiv.org:2404.02039v5</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sihao Hu, Tiansheng Huang, Gaowen Liu, Ramana Rao Kompella, Fatih Ilhan, Selim Furkan Tekin, Yichang Xu, Zachary Yahn, Ling Liu</dc:creator>
    </item>
    <item>
      <title>Embedded Graph Convolutional Networks for Real-Time Event Data Processing on SoC FPGAs</title>
      <link>https://arxiv.org/abs/2406.07318</link>
      <description>arXiv:2406.07318v3 Announce Type: replace 
Abstract: The utilisation of event cameras represents an important and swiftly evolving trend aimed at addressing the constraints of traditional video systems. Particularly within the automotive domain, these cameras find significant relevance for their integration into embedded real-time systems due to lower latency and power consumption. One effective approach to ensure the necessary throughput and latency for event processing is through the utilisation of graph convolutional networks (GCNs). In this study, we introduce a custom EFGCN (Event-based FPGA-accelerated Graph Convolutional Network) designed with a series of hardware-aware optimisations tailored for PointNetConv, a graph convolution designed for point cloud processing. The proposed techniques result in up to a 100-fold reduction in model size compared to Asynchronous Event-based GNN (AEGNN), one of the most recent works in the field, with a relatively small decrease in accuracy (2.9% for the N-Caltech101 classification task, 2.2% for the N-Cars classification task), thus following the TinyML trend. We implemented EFGCN on a ZCU104 SoC FPGA platform without any off-chip external memory resources, achieving a throughput of 13.3 million events per second (MEPS) and real-time partially asynchronous processing with low latency. Across multiple event-based classification benchmarks, our approach achieves competitive accuracy while providing state-of-the-art computational efficiency per event, small model size, and high scalability, customisability and resource efficiency. We publish both software and hardware source code in an open repository: https://github.com/vision-agh/gcnn-dvs-fpga.</description>
      <guid isPermaLink="false">oai:arXiv.org:2406.07318v3</guid>
      <category>cs.CV</category>
      <category>cs.AR</category>
      <category>eess.IV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:DOI>10.1016/j.sysarc.2026.103850</arxiv:DOI>
      <arxiv:journal_reference>Journal of Systems Architecture, Volume 177, August 2026, 103850</arxiv:journal_reference>
      <dc:creator>Kamil Jeziorek, Piotr Wzorek, Krzysztof Blachut, Andrea Pinna, Tomasz Kryjak</dc:creator>
    </item>
    <item>
      <title>Strategic Integration of Artificial Intelligence in the C-Suite: The Role of the Chief AI Officer</title>
      <link>https://arxiv.org/abs/2407.10247</link>
      <description>arXiv:2407.10247v3 Announce Type: replace 
Abstract: The integration of Artificial Intelligence (AI) into corporate strategy has become critical for organizations seeking to maintain competitive advantage in the digital age. Although organizations increasingly rely on AI as a strategic and organizational resource, existing C-suite roles remain only partially equipped to govern, integrate, and leverage it coherently at the enterprise level. Organizations vary in their responses. Some create a dedicated Chief AI Officer (CAIO), others extend existing mandates into hybrid roles, and still others coordinate AI through federated structures. This paper develops a role-design theory to explain this variation. I identify three properties that distinguish AI from earlier cross-cutting enterprise technologies - distributed accountability for judgment, upstream governance, and non-stationarity - and three configurations through which organizations respond: concentrated extension, distributed extension, and role creation. The CAIO Framework links these properties to the executive design problems they generate and to the functions and capabilities required of the dedicated role. Four propositions specify when a dedicated CAIO emerges, what form an organization's response takes, when the dedicated role is effective, and how configurations evolve over time. This paper contributes to research on executive leadership, organizational design, and digital governance by offering a theory-driven account of the strategic integration of AI at the executive level.</description>
      <guid isPermaLink="false">oai:arXiv.org:2407.10247v3</guid>
      <category>cs.CY</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>econ.GN</category>
      <category>q-fin.EC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Marc Schmitt</dc:creator>
    </item>
    <item>
      <title>Data Want to be Free: An Innovation Resistance Theory Model for Identifying Barriers to Government Data Sharing</title>
      <link>https://arxiv.org/abs/2407.10883</link>
      <description>arXiv:2407.10883v2 Announce Type: replace 
Abstract: Data sharing is increasingly essential for digital government and data-driven innovation, yet many public organizations remain reluctant to make their data openly available. While prior research has examined factors influencing open data adoption, little theoretical work explores why resistance persists within public agencies. This study develops an Innovation Resistance Theory (IRT) model tailored to government data sharing to identify predictors of organizational resistance. An initial model was derived from literature and refined through interviews with 21 public organizations across six European countries. The resulting IRT4DS model identifies 39 barriers spanning usage, value, risk, tradition, and image dimensions, and 23 countermeasures mapped to the most critical barriers and the actors responsible for addressing them. By extending IRT into the context of governmental data sharing, the study advances theoretical understanding of why public data often remains closed and provides actionable guidance for policymakers seeking to design enabling data ecosystems and reduce structural and cultural barriers to OGD adoption.</description>
      <guid isPermaLink="false">oai:arXiv.org:2407.10883v2</guid>
      <category>cs.CY</category>
      <category>cs.DB</category>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anastasija Nikiforova, Antoine Clarinval, Anneke Zuiderwijk, Daniel Rudmark, Petar Milic, Charalampos Alexopoulos, Katrin Rajamäe-Soosaar</dc:creator>
    </item>
    <item>
      <title>Mean Teacher based SSL Framework for Indoor Localization Using Wi-Fi RSSI Fingerprinting</title>
      <link>https://arxiv.org/abs/2407.13303</link>
      <description>arXiv:2407.13303v2 Announce Type: replace 
Abstract: Conventional large-scale indoor localization based on Wi-Fi RSSI fingerprinting faces issues of time-consuming and labor-intensive labeled data collection, limited generalization of a model trained under a supervised learning (SL) framework due to its inability to leverage unlabeled data, and model performance degradation in dynamic scenarios with environmental variations. To address those challenging issues, we propose a comprehensive semi-supervised learning (SSL) framework for a deep neural network (DNN) localization model based on the Mean Teacher, which incorporates access point selection, model pre-training/cloning, and batch-level noise injection. The proposed SSL framework can not only efficiently use hybrid labeled/unlabeled databases for static training of a model during the offline phase, but also exploit unlabeled fingerprints from users of the indoor localization system deployed in the field for continuous retraining of the model during the online phase. We base the proposed SSL framework on the Mean Teacher because it can generate more stable target labels through an exponential moving average of model weights without incurring the high computational complexity of the Pi-Model and with better scalability for online learning than Temporal Ensembling, making it an optimal choice that strikes the right balance between performance and computational complexity in large-scale indoor localization. With the UJIIndoorLoc database, the proposed SSL framework reduces the mean 3D errors of the CNNLoc and SIMO-DNN models by 7.403% and 7.748%, respectively, compared with those under the conventional SL framework; with the XJTLU dynamic database, the maximum reduction in mean 2D error reaches up to 49.227% under a dynamic training scenario, demonstrating the substantial performance improvement achieved by the proposed SSL framework.</description>
      <guid isPermaLink="false">oai:arXiv.org:2407.13303v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:DOI>10.1016/j.asoc.2026.115711</arxiv:DOI>
      <arxiv:journal_reference>Applied Soft Computing, Available online 6 June 2026, 115711</arxiv:journal_reference>
      <dc:creator>Sihao Li, Zhe Tang, Kyeong Soo Kim, Jeremy S. Smith</dc:creator>
    </item>
    <item>
      <title>Assessing the Variety of a Concept Space Using an Unbiased Estimate of Rao's Quadratic Index</title>
      <link>https://arxiv.org/abs/2408.00684</link>
      <description>arXiv:2408.00684v2 Announce Type: replace 
Abstract: Past research relates design creativity to 'divergent thinking,' i.e., how well the concept space is explored during the early phase of design. Researchers have argued that generating several concepts would increase the chances of producing better design solutions. 'Variety' is one of the parameters by which one can quantify the breadth of a concept space explored by the designers. It is useful to assess variety at the conceptual design stage because, at this stage, designers have the freedom to explore different solution principles so as to satisfy a design problem with substantially novel concepts. This article elaborates on and critically examines the existing variety metrics from the engineering design literature, discussing their limitations. A new distance-based variety metric is proposed, along with a prescriptive framework to support the assessment process. The framework measures the real-valued distance between two design concepts using any chosen representation of their underlying abstraction levels. The proposed framework is implemented in a software tool called 'VariAnT.' Furthermore, the tool's application is demonstrated through an illustrative example.</description>
      <guid isPermaLink="false">oai:arXiv.org:2408.00684v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anubhab Majumder, Ujjwal Pal, Amaresh Chakrabarti</dc:creator>
    </item>
    <item>
      <title>Proven Advantage of Multiobjective Evolutionary Algorithms for Problems with Different Degrees of Conflict</title>
      <link>https://arxiv.org/abs/2408.04207</link>
      <description>arXiv:2408.04207v3 Announce Type: replace 
Abstract: The field of multiobjective evolutionary algorithms (MOEAs) often emphasizes its popularity for optimization problems with conflicting objectives. However, it is still theoretically unknown how MOEAs perform compared with typical approaches outside this field.
  This paper conducts such a systematic theoretical comparison on problem classes with different degrees of conflict. With OneMaxMin$_k$ depicting $k\in[0..n]$ degrees of conflict, we show the difficulties of two typical non-MOEA approaches, the scalarization (weighted-sum) and the $\epsilon$-constraint approach. We prove that for any set of weights, the set of optima formed by the scalarization approach cannot cover its full Pareto front for $k&gt;2$. Although constrained problems constructed from the $\epsilon$-constraint approach ensure the full coverage, general ways (via exterior or nonparameter penalty functions) to solve these constrained problems encounter difficulties. The nonparameter penalty function way cannot guarantee the full coverage, and the exterior way covers the Pareto front with expected $O(\max\{k,1\}n\ln n)$ number of function evaluations, but only with careful settings of $\epsilon$ and $r$ ($r&gt;1/(\epsilon+1-\lceil \epsilon \rceil)$).
  In contrast, MOEAs efficiently solve OneMaxMin$_k$ without careful designs. We prove the same expected runtime of $O(\max\{k,1\}n\ln n)$ for the (G)SEMO, MOEA/D, NSGA-II, and SMS-EMOA.
  Our brief discussions on a bi-objective LeadingOnes variant with different degrees of conflict show similar findings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2408.04207v3</guid>
      <category>cs.NE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1016/j.artint.2026.104573</arxiv:DOI>
      <arxiv:journal_reference>Artificial Intelligence, 2026, 104573</arxiv:journal_reference>
      <dc:creator>Weijie Zheng</dc:creator>
    </item>
    <item>
      <title>Exposing Barriers to Flexibility Aggregation in Unbalanced Distribution Networks</title>
      <link>https://arxiv.org/abs/2408.06516</link>
      <description>arXiv:2408.06516v4 Announce Type: replace 
Abstract: The increasing integration of distributed energy resources (DER) offers new opportunities for distribution system operators (DSO) to improve network operation through flexibility services. To utilise flexible resources, various DER flexibility aggregation methods have been proposed, such as the concept of aggregated P-Q flexibility areas. Yet, many existing studies assume perfect coordination among DER and rely on single-phase power flow analysis, thus overlooking barriers to flexibility aggregation in real unbalanced systems. To quantify the impact of these barriers, this paper proposes a three-phase optimal power flow (OPF) framework for P-Q flexibility assessment, implemented as an open-source Julia tool 3FlexAnalyser.jl. The framework explicitly accounts for voltage unbalance and imperfect coordination among DER in low voltage (LV) distribution networks. Simulations on an illustrative 5-bus system and a real 221-bus LV network in the UK reveal that over 30% of the theoretical aggregated flexibility potential can be lost due to phase unbalance and lack of coordination across phases. These findings highlight the need for improved flexibility aggregation tools applicable to real unbalanced distribution networks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2408.06516v4</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Andrey Churkin, Wangwei Kong, Pierluigi Mancarella, Eduardo A. Martínez Ceseña</dc:creator>
    </item>
    <item>
      <title>Federated Large Language Models: Current Progress and Future Directions</title>
      <link>https://arxiv.org/abs/2409.15723</link>
      <description>arXiv:2409.15723v3 Announce Type: replace 
Abstract: Large Language Models have achieved impressive performance across diverse applications, yet their training typically depends on centralized data collection, raising serious privacy and governance concerns. Federated Learning offers a decentralized alternative by enabling multiple clients to collaboratively train shared models without exposing raw local data. However, integrating FL with LLMs introduces new challenges, including data heterogeneity, convergence instability, communication overhead, and computational constraints. This survey provides a comprehensive and up-to-date overview of Federated Learning for Large Language Models (FedLLM). We systematically review recent advances, with particular emphasis on federated fine-tuning and federated prompt learning, and analyze how existing methods address efficiency, personalization, and security challenges. We further summarize emerging directions such as federated pre-training and federated agents. Our goal is to offer a structured perspective on this rapidly evolving field and to highlight promising avenues for future research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2409.15723v3</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Yuhang Yao, Jianyi Zhang, Junda Wu, Chengkai Huang, Yu Xia, Tong Yu, Ruiyi Zhang, Sungchul Kim, Ryan Rossi, Ang Li, Lina Yao, Julian McAuley, Yiran Chen, Carlee Joe-Wong</dc:creator>
    </item>
    <item>
      <title>Robust and efficient data-driven predictive control</title>
      <link>https://arxiv.org/abs/2409.18867</link>
      <description>arXiv:2409.18867v2 Announce Type: replace 
Abstract: We propose a robust and efficient data-driven predictive control (eDDPC) scheme which is more sample efficient (requires less offline data) compared to existing schemes, and is also computationally efficient. This scheme employs a recently proposed data-based representation of linear time-invariant (LTI) systems as a predictor. Such a representation serves as an alternative to Hankel-based predictors obtained from, e.g., the so-called fundamental lemma, and can be derived by exploiting the kernel structure of shallow Hankel matrices of data. This allows for application of our proposed scheme using very short (and potentially irregularly measured) noisy input-output data, the amount of which is independent of the prediction horizon. To account for measurement noise, we provide a novel result that quantifies the uncertainty between the true (unknown) restricted behavior of the system and the estimated one from noisy data. Furthermore, we show that the robust eDDPC scheme is recursively feasible and that the resulting closed-loop system is practically exponentially stable. Finally, we compare the performance of this scheme to existing ones on a case study of a four tank system.</description>
      <guid isPermaLink="false">oai:arXiv.org:2409.18867v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:DOI>10.1016/j.automatica.2026.113108</arxiv:DOI>
      <arxiv:journal_reference>Automatica, September 2026, Article: 113108, Volume 191</arxiv:journal_reference>
      <dc:creator>Mohammad Alsalti, Manuel Barkey, Victor G. Lopez, Matthias A. Müller</dc:creator>
    </item>
    <item>
      <title>RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations</title>
      <link>https://arxiv.org/abs/2410.00713</link>
      <description>arXiv:2410.00713v4 Announce Type: replace 
Abstract: Anomaly detection is a core capability for robotic perception and industrial inspection, yet most existing benchmarks are collected under controlled conditions with fixed viewpoints and stable illumination, failing to reflect real deployment scenarios. We introduce RAD (Realistic Anomaly Detection), a robot-captured, multi-view dataset designed to stress pose variation, reflective materials, and viewpoint-dependent defect visibility. RAD covers 13 everyday object categories and four realistic defect types--scratched, missing, stained, and squeezed--captured from over 60 robot viewpoints per object under uncontrolled lighting. We benchmark a wide range of state-of-the-art approaches, including 2D feature-based methods, 3D reconstruction pipelines, and vision-language models (VLMs), under a pose-agnostic setting. Surprisingly, we find that mature 2D feature-embedding methods consistently outperform recent 3D and VLM-based approaches at the image level, while the performance gap narrows for pixel-level localization. Our analysis reveals that reflective surfaces, geometric symmetry, and sparse viewpoint coverage fundamentally limit current geometry-based and zero-shot methods. RAD establishes a challenging and realistic benchmark for robotic anomaly detection, highlighting critical open problems beyond controlled laboratory settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2410.00713v4</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kaichen Zhou, Xinhai Chang, Taewhan Kim, Jiadong Zhang, Yang Cao, Chufei Peng, Fangneng Zhan, Hao Zhao, Hao Dong, Kai Ming Ting, Ye Zhu</dc:creator>
    </item>
    <item>
      <title>Communication-Efficient Federated Learning under Dynamic Device Arrival and Departure: Convergence Analysis and Algorithm Design</title>
      <link>https://arxiv.org/abs/2410.05662</link>
      <description>arXiv:2410.05662v4 Announce Type: replace 
Abstract: Most federated learning (FL) approaches assume a fixed device set. However, real-world scenarios often involve devices dynamically joining or leaving the system, driven by, e.g., user mobility patterns or handovers across cell boundaries. This dynamic setting introduces unique challenges: (1) the optimization objective evolves with the active device set, unlike traditional FL's static objective; and (2) the current global model may no longer serve as an effective initialization for subsequent rounds, potentially hindering adaptation, delaying convergence, and reducing resource efficiency. To address these challenges, we first provide a convergence analysis for FL under a dynamic device set, accounting for factors such as gradient noise, local training iterations, and data heterogeneity in this practical setting. Motivated by this analysis, we propose a model initialization algorithm that enables rapid adaptation whenever devices join or leave the network. Our key idea is to compute a weighted average of previous global models, guided by gradient similarity, to prioritize models trained on data distributions that closely align with the current device set, thereby accelerating recovery from distribution shifts in fewer training rounds. This plug-and-play algorithm is designed to integrate seamlessly with existing FL methods, offering broad applicability. Experiments demonstrate that our approach achieves convergence speedups typically an order of magnitude or more compared to baselines, which we show drastically reduces energy consumption to reach a target accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2410.05662v4</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhan-Lun Chang, Dong-Jun Han, Seyyedali Hosseinalipour, Mung Chiang, Christopher G. Brinton</dc:creator>
    </item>
    <item>
      <title>MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding</title>
      <link>https://arxiv.org/abs/2410.21747</link>
      <description>arXiv:2410.21747v2 Announce Type: replace 
Abstract: Generating lifelike human motions from descriptive texts has attracted remarkable research attention in recent years, propelled by the emerging requirements of digital humans. Despite impressive advances, existing approaches are often constrained by limited control modalities, task specificity, and a sole focus on body motion representations. In this paper, we present MotionGPT-2, a unified Large Motion-Language Model (LMLM) that addresses these limitations. MotionGPT-2 accommodates multiple motion-relevant tasks and supports multimodal control conditions through pre-trained Large Language Models (LLMs). It quantizes multimodal inputs, such as text and single-frame poses, into discrete, LLM-interpretable tokens, seamlessly integrating them into the LLM's vocabulary. These tokens are then organized into unified prompts, guiding the LLM to generate motion outputs through a pretraining-then-finetuning paradigm. We also show that the proposed MotionGPT-2 is highly adaptable to the challenging 3D holistic motion generation task, enabled by the innovative motion discretization framework, Part-Aware VQVAE, which ensures fine-grained representations of body and hand movements. Extensive experiments and visualizations validate the effectiveness of our method, demonstrating the adaptability of MotionGPT-2 across motion generation, motion captioning, and generalized motion completion tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2410.21747v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Yuan Wang, Di Huang, Yaqi Zhang, Wanli Ouyang, Jile Jiao, Xuetao Feng, Dan Xu, Shixiang Tang</dc:creator>
    </item>
    <item>
      <title>Discovering Data Structures: Nearest Neighbor Search and Beyond</title>
      <link>https://arxiv.org/abs/2411.03253</link>
      <description>arXiv:2411.03253v2 Announce Type: replace 
Abstract: We propose a general framework for end-to-end learning of data structures. Our framework adapts to the underlying data distribution and provides fine-grained control over query and space complexity. Crucially, the data structure is learned from scratch, and does not require careful initialization or seeding with candidate data structures/algorithms. We first apply this framework to the problem of nearest neighbor search. In several settings, we are able to reverse-engineer the learned data structures and query algorithms. For 1D nearest neighbor search, the model discovers optimal distribution (in)dependent algorithms such as binary search and variants of interpolation search. In higher dimensions, the model learns solutions that resemble k-d trees in some regimes, while in others, they have elements of locality-sensitive hashing. The model can also learn useful representations of high-dimensional data and exploit them to design effective data structures. We also adapt our framework to the problem of estimating frequencies over a data stream, and believe it could also be a powerful discovery tool for new problems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2411.03253v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Omar Salemohamed, Laurent Charlin, Shivam Garg, Vatsal Sharan, Gregory Valiant</dc:creator>
    </item>
    <item>
      <title>ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction?</title>
      <link>https://arxiv.org/abs/2411.06469</link>
      <description>arXiv:2411.06469v2 Announce Type: replace 
Abstract: Large Language Models (LLMs) hold great promise to revolutionize current clinical systems for their superior capacities on medical text processing tasks and medical licensing exams. Meanwhile, traditional ML models such as SVM and XGBoost are still mainly adopted in clinical prediction tasks. An emerging question is: Can LLMs beat traditional ML models in clinical prediction? Thus, we build a new benchmark, ClinicalBench, to comprehensively study the clinical predictive modeling capacities of both general-purpose and medical LLMs, and compare them with traditional ML models. ClinicalBench covers three common clinical prediction tasks, two databases, 14 general-purpose LLMs, 8 medical LLMs, and 11 traditional ML models. Through extensive empirical investigation, we discover that both general-purpose and medical LLMs, even with different model scales and diverse prompting or fine-tuning strategies, still cannot beat traditional ML models in clinical prediction, shedding light on their potential deficiency in clinical reasoning and decision-making. We call for caution when practitioners adopt LLMs in clinical applications. ClinicalBench can be utilized to bridge the gap between LLMs' development for healthcare and real-world clinical practice.</description>
      <guid isPermaLink="false">oai:arXiv.org:2411.06469v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Canyu Chen, Jian Yu, Shan Chen, Che Liu, Zhongwei Wan, Shuang Zhou, Yuan Luo, Rui Zhang, Danielle Bitterman, Fei Wang, Kai Shu</dc:creator>
    </item>
    <item>
      <title>Modeling Stochastic Conditional Dynamics from Sparse Observations via Kernel-Stabilized Flow Matching</title>
      <link>https://arxiv.org/abs/2411.08314</link>
      <description>arXiv:2411.08314v5 Announce Type: replace 
Abstract: Learning to transform conditional probability densities over time is a fundamental challenge spanning probabilistic modeling and the natural sciences. This task is paramount when forecasting the evolution of stochastic nonlinear dynamical systems in biological and physical domains. While flow-based models can predict the temporal evolution of probability distributions, existing approaches often assume discrete conditioning with samples that are paired across time, limiting their scientific applicability where frequently only sparse data with unpaired continuous conditioning is available. We propose Conditional Variable Flow Matching (CVFM), a framework for learning flows transforming conditional distributions with amortization across the continuous space of conditional densities. CVFM addresses the high-variance instability of prior methods by jointly sampling flows over state and conditioning variables, utilizing a conditioning mismatch kernel alongside a conditional Wasserstein distance to reweight the conditional optimal transport objective. Collectively, these advances allow for learning dynamics from sparse unpaired measurements of state-condition across time. We evaluate CVFM on conditional mapping benchmarks and a case study modeling the temporal evolution of materials internal structure during manufacturing processes, observing improved performance and convergence characteristics over existing conditional variants. Code is available at https://github.com/agenerale/conditional-variable-flow-matching.</description>
      <guid isPermaLink="false">oai:arXiv.org:2411.08314v5</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>Transactions on Machine Learning Research, 2026</arxiv:journal_reference>
      <dc:creator>Adam P. Generale, Andreas E. Robertson, Surya R. Kalidindi</dc:creator>
    </item>
    <item>
      <title>Zero and Few Shot Load Forecasting with Large Language Models</title>
      <link>https://arxiv.org/abs/2411.11350</link>
      <description>arXiv:2411.11350v2 Announce Type: replace 
Abstract: Deep learning models have shown strong performance in load forecasting, but they generally require large amounts of data for model training before being applied to new scenarios, which limits their effectiveness in data-scarce scenarios. Inspired by the great success of pre-trained language models (LLMs) in natural language processing, this paper proposes a zero and few shot load forecasting approach using an advanced LLM framework denoted as the Chronos model. By utilizing its extensive pre-trained knowledge, the Chronos model enables accurate load forecasting in data-scarce scenarios. Simulation results across five real-world datasets demonstrate that the Chronos model significantly outperforms nine popular baseline models for both deterministic and probabilistic load forecasting with various forecast horizons (e.g., 1 to 48 hours), even though the Chronos model is neither tailored nor fine-tuned to these specific load datasets. Notably, Chronos reduces root mean squared error (RMSE), continuous ranked probability score (CRPS), and quantile score (QS) by approximately 7.34%-84.30%, 19.63%-60.06%, and 22.83%-54.49%, respectively, compared to baseline models. These results highlight the superiority and flexibility of the Chronos model, positioning it as an effective solution in data-scarce scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2411.11350v2</guid>
      <category>cs.LG</category>
      <category>eess.SP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:DOI>10.1016/j.ijepes.2026.111792</arxiv:DOI>
      <arxiv:journal_reference>International Journal of Electrical Power &amp; Energy Systems, Volume 177, April 2026</arxiv:journal_reference>
      <dc:creator>Wenlong Liao, Chengrui Zhang, Zhe Yang, Mengshuo Jia, Christian Rehtanz, Jiannong Fang, Fernando Porté-Agel</dc:creator>
    </item>
    <item>
      <title>Asymptotic tensor rank is characterized by polynomials</title>
      <link>https://arxiv.org/abs/2411.15789</link>
      <description>arXiv:2411.15789v2 Announce Type: replace 
Abstract: Asymptotic tensor rank is notoriously difficult to determine. Indeed, determining its value for the $2\times 2$ matrix multiplication tensor would determine the matrix multiplication exponent, a long-standing open problem. On the other hand, Strassen's asymptotic rank conjecture makes the bold claim that asymptotic tensor rank equals the largest dimension of the tensor and is thus as easy to compute as matrix rank. Despite tremendous interest, much is still unknown about the structural and computational properties of asymptotic rank; for instance whether it is computable.
  We prove that asymptotic tensor rank is "computable from above", that is, for any real number $r$ there is an (efficient) algorithm that determines, given a tensor $T$, if the asymptotic tensor rank of $T$ is at most $r$. The algorithm has a simple structure; it consists of evaluating a finite list of polynomials on the tensor. Indeed, we prove that the sublevel sets of asymptotic rank are Zariski-closed (just like matrix rank). While we do not exhibit these polynomials explicitly, their mere existence has strong implications on the structure of asymptotic rank.
  As one such implication, we find that the set of values that asymptotic tensor rank takes, over all tensors, is well-ordered. In other words, any non-increasing sequence of asymptotic ranks stabilizes ("discreteness from above"). In particular, for the matrix multiplication exponent (which is an asymptotic rank) there is no sequence of exponents of bilinear maps that approximates it arbitrarily closely from above without being eventually constant. In other words, any such upper bound on the matrix multiplication exponent that is close enough will "snap" to it. Previously such discreteness results were only known for finite fields or for other tensor parameters (e.g., asymptotic slice rank). We obtain them for infinite fields like the complex numbers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2411.15789v2</guid>
      <category>cs.CC</category>
      <category>math.AG</category>
      <category>quant-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Matthias Christandl, Koen Hoeberechts, Harold Nieuwboer, Péter Vrana, Jeroen Zuiddam</dc:creator>
    </item>
    <item>
      <title>BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching</title>
      <link>https://arxiv.org/abs/2411.16102</link>
      <description>arXiv:2411.16102v2 Announce Type: replace 
Abstract: Offline batch inference, which leverages the flexibility of request batching to achieve higher throughput and lower costs, is becoming more popular for latency-insensitive applications. Meanwhile, recent progress in model capability and modality makes requests more diverse in compute and memory demands, creating unique opportunities for throughput improvement by resource overlapping. However, a request schedule that maximizes resource overlapping can conflict with the schedule that maximizes prefix sharing, a widely-used performance optimization, causing sub-optimal inference throughput. We present BlendServe, a system that maximizes resource utilization of offline batch inference by combining the benefits of resource overlapping and prefix sharing using a resource-aware prefix tree. BlendServe exploits the relaxed latency requirements in offline batch inference to reorder and overlap requests with varied resource demands while ensuring high prefix sharing. We evaluate BlendServe on a variety of synthetic multi-modal workloads and show that it provides up to $1.44\times$ throughput boost compared to widely-used industry standards, vLLM and SGLang.</description>
      <guid isPermaLink="false">oai:arXiv.org:2411.16102v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yilong Zhao, Shuo Yang, Kan Zhu, Lianmin Zheng, Baris Kasikci, Yang Zhou, Jiarong Xing, Ion Stoica</dc:creator>
    </item>
    <item>
      <title>TQA-Bench: Evaluating LLMs for Multi-Table Question Answering</title>
      <link>https://arxiv.org/abs/2411.19504</link>
      <description>arXiv:2411.19504v2 Announce Type: replace 
Abstract: The advance of large language models (LLMs) has unlocked great opportunities in complex multi-modal data management tasks, particularly in question answering (QA) over complicated multi-table relational data. Despite significant progress, systematically evaluating LLMs on multi-table QA remains a critical challenge due to the inherent complexity of analyzing the modality of relational data structures and the potentially large scale of serialized tabular data. Existing benchmarks primarily focus on single-table QA, failing to capture the intricacies of connections across multiple relational tables, as required in real-world domains such as finance, healthcare, and e-commerce. We present TQA-Bench, a long-context analytical multi-table QA benchmark derived from real-world public datasets, with a flexible sampling mechanism that varies context length (8K--64K tokens) and symbolic extensions for assessing reasoning beyond retrieval and pattern matching. We systematically evaluate a set of LLMs spanning model scales from 2 billion to 671 billion parameters. Our extensive experiments reveal critical insights into the performance of LLMs in multi-table QA, highlighting both challenges and opportunities for advancing their application in complex, data-driven environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2411.19504v2</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zipeng Qiu, Chenyue Li, You Peng, Guangxin He, Binhang Yuan, Chen Wang</dc:creator>
    </item>
    <item>
      <title>Graph-to-SFILES: Control structure prediction from process topologies using generative artificial intelligence</title>
      <link>https://arxiv.org/abs/2412.00508</link>
      <description>arXiv:2412.00508v2 Announce Type: replace 
Abstract: Control structure design is an important but tedious step in P&amp;ID development. Generative artificial intelligence (AI) promises to reduce P&amp;ID development time by supporting engineers. Previous research on generative AI in chemical process design mainly represented processes by sequences. However, graphs offer a promising alternative because of their permutation invariance. We propose the Graph-to-SFILES model, a generative AI method to predict control structures from flowsheet topologies. The Graph-to-SFILES model takes the flowsheet topology as a graph input and returns a control-extended flowsheet as a sequence in the SFILES 2.0 notation. We compare four different graph encoder architectures, one of them being a graph neural network (GNN) proposed in this work. The Graph-to-SFILES model achieves a top-5 accuracy of 73.2% when trained on 10,000 flowsheet topologies. In addition, the proposed GNN performs best among the encoder architectures. Compared to a purely sequence-based approach, the Graph-to-SFILES model improves the top-5 accuracy for a relatively small training dataset of 1,000 flowsheets from 0.9% to 28.4%. However, the sequence-based approach performs better on a large-scale dataset of 100,000 flowsheets. These results highlight the potential of graph-based AI models to accelerate P&amp;ID development in small-data regimes, but their effectiveness on industry-relevant case studies still needs to be investigated.</description>
      <guid isPermaLink="false">oai:arXiv.org:2412.00508v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1016/j.compchemeng.2025.109121</arxiv:DOI>
      <arxiv:journal_reference>Computers &amp; Chemical Engineering, Volume 199, 2025, Pages 109121</arxiv:journal_reference>
      <dc:creator>Lukas Schulze Balhorn, Kevin Degens, Artur M. Schweidtmann</dc:creator>
    </item>
    <item>
      <title>Integrated Hierarchical Decision-Making in Inverse Kinematic Planning and Control</title>
      <link>https://arxiv.org/abs/2412.01324</link>
      <description>arXiv:2412.01324v5 Announce Type: replace 
Abstract: This work presents a novel and efficient nonlinear programming framework that tightly integrates hierarchical decision-making with whole-body inverse kinematic planning and control. Decision-making plays a central role in many aspects of robotics, from sparse inverse kinematic control with a minimal number of joints, to inverse kinematic planning while simultaneously selecting a discrete end-effector location from multiple candidates. Current approaches often rely on heavy computations using mixed-integer nonlinear programming, separate decision-making from inverse kinematics (sometimes approximated by reachability methods), or employ efficient but less versatile $\ell_1$-norm formulations of linear sparse programming, without addressing the underlying nonlinear problem formulations. In contrast, the proposed sparse hierarchical nonlinear programming solver is efficient, versatile, and accurate by exploiting sparse hierarchical structure and leveraging the $\ell_0$-norm, which is rarely used in robotics. The solver efficiently tackles complex nonlinear hierarchical decision-making problems previously unaddressed in the literature, such as inverse kinematic planning with simultaneous prioritized selection of end-effector locations from a large set of candidates, or inverse kinematic control with simultaneous selection of bi-manual grasp locations on a randomly rotated box.</description>
      <guid isPermaLink="false">oai:arXiv.org:2412.01324v5</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kai Pfeiffer, Quan Zhang, Yuqing Chen, Gordon Boateng, Yuquan Wang, Vincent Bonnet, Aberrahmane Kheddar</dc:creator>
    </item>
    <item>
      <title>Advancements in Machine Learning and Deep Learning for Early Detection and Management of Mental Health Disorder</title>
      <link>https://arxiv.org/abs/2412.06147</link>
      <description>arXiv:2412.06147v2 Announce Type: replace 
Abstract: For the early identification, diagnosis, and treatment of mental health illnesses, the integration of deep learning (DL) and machine learning (ML) has started to play a significant role. By evaluating complex data from imaging, genetics, and behavioral assessments, these technologies have the potential to improve clinical results significantly. However, they also present unique challenges relating to data integration and ethical issues. The development of ML and DL methods for the early diagnosis and treatment of mental health issues is reviewed in this survey. It examines a range of applications, with a particular emphasis on behavioral assessments, genetic and biomarker analysis, and medical imaging for the diagnosis of diseases like depression, bipolar disorder, and schizophrenia. Predictive modeling for illness development is further discussed in the review, focusing on the function of risk prediction models and longitudinal investigations. Important discoveries show how ML and DL might improve treatment outcomes and diagnostic accuracy while tackling methodological inconsistency, data integration, and ethical concerns. The study emphasizes the significance of building real-time monitoring systems for individualized treatment, improving data fusion techniques, and interdisciplinary collaboration. Upcoming studies should concentrate on surmounting these obstacles to maximize ML and DL's valuable and moral implementation in mental health services.</description>
      <guid isPermaLink="false">oai:arXiv.org:2412.06147v2</guid>
      <category>cs.LG</category>
      <category>cs.ET</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kamala Devi Kannan, Senthil Kumar Jagatheesaperumal, Rajesh N. V. P. S. Kandala, Mojtaba Lotfaliany, Roohallah Alizadehsanid, Mohammadreza Mohebbi</dc:creator>
    </item>
    <item>
      <title>Health-Informed Computing: Estimating and Addressing the Public Health Impact of Data Centers</title>
      <link>https://arxiv.org/abs/2412.06288</link>
      <description>arXiv:2412.06288v4 Announce Type: replace 
Abstract: The surging demand for artificial intelligence (AI) has led to a rapid expansion of energy-intensive data centers, contributing to criteria air pollutant emissions and raising public health concerns that have received comparatively limited attention in sustainability assessments. This paper introduces a principled methodology to model air pollutant emissions for data centers and estimate the public health impacts. Our findings reveal that the growing demand for AI and computing technologies is projected to push the total annual public health burden of U.S. data centers up to more than $20 billion in 2028. Although national-level impacts remain modest, data center health costs are unevenly distributed: in the most affected counties, the estimated per-household health burden can reach about seven times the national average. Next, we propose a health-informed computing framework that explicitly incorporates public health impacts into data center resource management across space and time, mitigating public health costs while supporting environmental sustainability. More broadly, we recommend extended energy reporting to include public health impact of data centers and paying attention to all impacted communities.</description>
      <guid isPermaLink="false">oai:arXiv.org:2412.06288v4</guid>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuelin Han, Zhifeng Wu, Pengfei Li, Adam Wierman, Shaolei Ren</dc:creator>
    </item>
    <item>
      <title>Changing topic bias in biomedical science maps by linking documents through alternative data sources: policy documents, patents, authors, Facebook, and Twitter</title>
      <link>https://arxiv.org/abs/2412.07550</link>
      <description>arXiv:2412.07550v4 Announce Type: replace 
Abstract: Traditional science maps visualize topics by clustering documents within a network, but they are inherently biased toward clustering certain topics over others. If these topics could be chosen, then the science maps could be tailored for different needs. In this paper, we explore the extent to which the topic bias of a science map can be changed by choosing different data sources to build the document network. We analyze this by evaluating the clustering effectiveness of several topic categories over two sources that are traditionally used for the creation of science maps (citations and text similarity) and six non-traditional data sources, which we found favor different kinds of topics: Health issues for Facebook users, biotechnology topics for patent families, government and social issues for policy documents, food topics for Twitter conversations, nursing topics for Twitter users, and geographical entities for document authors (the favoring in this latter source was particularly strong). Our results show that diverse data sources can be used to control topic bias, which opens up the possibility of creating science maps tailored for different needs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2412.07550v4</guid>
      <category>cs.DL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Juan Pablo Bascur, Rodrigo Costas, Suzan Verberne</dc:creator>
    </item>
    <item>
      <title>IDEQ -- Improving Diffusion Models for the Traveling Salesman Problem (TSP) by Leveraging the Structure of the Solution Space</title>
      <link>https://arxiv.org/abs/2412.13858</link>
      <description>arXiv:2412.13858v2 Announce Type: replace 
Abstract: We investigate diffusion models to solve the Traveling Salesman Problem. Building on the recent DIFUSCO and T2TCO approaches, we propose IDEQ. IDEQ improves the quality of the solutions by leveraging the constrained structure of the state space of the TSP. Another key component of IDEQ consists in replacing the last stages of DIFUSCO curriculum learning by considering a uniform distribution over the Hamiltonian tours whose orbits by the 2-opt operator converge to the optimal solution as the training objective. Our experiments show that IDEQ improves the state of the art for such neural-network-based techniques on synthetic instances. More importantly, our experiments show that IDEQ performs very well on the instances of the TSPlib, a reference benchmark in the TSP community: it closely matches the performance of the best heuristic, LKH3, and even obtains better solutions than LKH3 on 2 instances of the TSPlib defined on 1577 and 3795 cities. IDEQ obtains a 0.3% optimality gap on TSP instances made of 500 cities, and 0.5% on TSP instances with 1000 cities. This sets a new SOTA for neural-based methods solving the TSP. Moreover, IDEQ exhibits a lower variance and scales better with the number of cities than DIFUSCO and T2TCO.</description>
      <guid isPermaLink="false">oai:arXiv.org:2412.13858v2</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Mickael Basson, Philippe Preux</dc:creator>
    </item>
    <item>
      <title>Visual Template Inference for Data Extraction from Documents</title>
      <link>https://arxiv.org/abs/2501.06659</link>
      <description>arXiv:2501.06659v2 Announce Type: replace 
Abstract: Many templatized documents are programmatically generated from structured data following a visual template. Such documents include invoices, tax documents, financial reports, and purchase orders. Effective data extraction from these documents is crucial to support downstream analytical tasks. Current data extraction tools often struggle with complex document layouts, incur high latency and/or cost on large datasets, and require significant human effort. The key insight of our tool, TWIX, is to infer the underlying template used to create such documents, and then extract the data, rather than extracting directly from documents. To do so, TWIX first infers the underlying fields, such as columns of tabular portions or keys in co-located key-value pairs, by leveraging their consistent location patterns (e.g., two fields in the same template repeatedly co-occur within a fixed distance apart across multiple records). TWIX then assembles these fields into a template by enforcing visual constraints, such as vertically aligning table rows with their column headers for tabular regions, and horizontally aligning keys with their values for key-value pairs. TWIX then uses this inferred template to accurately and efficiently extract data from templatized documents at a low cost. On one benchmark with 34 diverse real-world datasets, TWIX outperforms state-of-the-art structured data extraction tools (Evaporate, Textract, and Azure Document Intelligence), and vision-based LLMs like GPT-4-Vision, by over 25% in precision and recall. Another benchmark with 30 large datasets demonstrates TWIX's scalability: it is 520X faster and 3,786X cheaper than the most competitive compared tool, for extracting data from large document collections with over 2000 pages.</description>
      <guid isPermaLink="false">oai:arXiv.org:2501.06659v2</guid>
      <category>cs.DB</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yiming Lin, Mawil Hasan, Rohan Kosalge, Alvin Cheung, Aditya G. Parameswaran</dc:creator>
    </item>
    <item>
      <title>CodecFake+: Codec-Based Resynthesized Data as a Proxy for Detecting CodecFake Speech</title>
      <link>https://arxiv.org/abs/2501.08238</link>
      <description>arXiv:2501.08238v3 Announce Type: replace 
Abstract: With the rapid advancement of neural audio codecs, codec-based speech generation (CoSG) systems have become highly powerful. Unfortunately, CoSG also enables the creation of highly realistic deepfake speech, making it easier to mimic an individual's voice and spread misinformation. We refer to this emerging deepfake speech generated by CoSG systems as CodecFake. Detecting such CodecFake is an urgent challenge, yet most existing systems primarily focus on detecting fake speech generated by traditional speech synthesis models. In this paper, we introduce CodecFake+, a large-scale dataset designed to advance CodecFake detection. To our knowledge, CodecFake+ is the largest dataset encompassing the most diverse range of codec architectures. The training set is generated through re-synthesis using 31 publicly available open-source codec models, while the evaluation set includes web-sourced data from 17 advanced CoSG models. We also propose a comprehensive taxonomy that categorizes codecs by their root components: vector quantizer, auxiliary objectives, and decoder types. Our proposed dataset and taxonomy enable detailed analysis at multiple levels to discern the key factors for successful CodecFake detection. At the individual codec level, we validate the effectiveness of using codec re-synthesized speech (CoRS) as training data for large-scale CodecFake detection. At the taxonomy level, we show that detection performance is strongest when the re-synthesis model incorporates disentanglement auxiliary objectives or a frequency-domain decoder. Furthermore, from the perspective of using all the CoRS training data, we show that our proposed taxonomy can be used to select better training data for improving detection performance. Overall, we envision that CodecFake+ will be a valuable resource for both general and fine-grained exploration to develop better anti-spoofing models against CodecFake.</description>
      <guid isPermaLink="false">oai:arXiv.org:2501.08238v3</guid>
      <category>cs.SD</category>
      <category>eess.AS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xuanjun Chen, Jiawei Du, Haibin Wu, Lin Zhang, I-Ming Lin, I-Hsiang Chiu, Wenze Ren, Yuan Tseng, Yu Tsao, Jyh-Shing Roger Jang, Hung-yi Lee</dc:creator>
    </item>
    <item>
      <title>Harnessing Rydberg Atomic Receivers: From Quantum Physics to Wireless Communications</title>
      <link>https://arxiv.org/abs/2501.11842</link>
      <description>arXiv:2501.11842v4 Announce Type: replace 
Abstract: The intrinsic integration of Rydberg atomic receivers into wireless communication systems is proposed, by harnessing the principles of quantum physics in wireless communications. More particularly, we conceive a pair of Rydberg atomic receivers, one of which incorporates a local oscillator (LO) and is referred to as an LO-dressed receiver, while the other operates without an LO and is termed an LO-free receiver. The appropriate wireless model is developed for each configuration, elaborating on the receiver's responses to the radio frequency (RF) signal, on the potential noise sources, and on the signal-to-noise ratio (SNR) performance. The developed wireless model conforms to the classical RF framework, facilitating compatibility with established signal processing methodologies. Next, we investigate the associated distortion effects that might occur, specifically identifying the conditions under which distortion arises and demonstrating the boundaries of linear dynamic ranges. This provides critical insights into their practical implementation in wireless systems. Finally, extensive simulation results are provided for characterizing the performance of wireless systems, harnessing this pair of Rydberg atomic receivers. Our results demonstrate that LO-dressed systems achieve a significant SNR gain of approximately 40 to 50 dB over conventional RF receivers in the standard quantum limit regime. This SNR head-room translates into reduced symbol error rates, enabling efficient and reliable transmission with higher-order constellations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2501.11842v4</guid>
      <category>cs.IT</category>
      <category>eess.SP</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Yuanbin Chen, Xufeng Guo, Chau Yuen, Yufei Zhao, Yong Liang Guan, Chong Meng Samson See, Merouane Débbah, Lajos Hanzo</dc:creator>
    </item>
    <item>
      <title>FIT-Print: Towards False-claim-resistant Model Ownership Verification via Targeted Fingerprint</title>
      <link>https://arxiv.org/abs/2501.15509</link>
      <description>arXiv:2501.15509v5 Announce Type: replace 
Abstract: Model fingerprinting has emerged as a crucial mechanism for safeguarding the intellectual property of open-source models, offering a non-intrusive approach that requires no modifications to the protected model. However, our analysis reveals that existing fingerprinting techniques are fundamentally vulnerable to false claim attacks, wherein adversaries can fraudulently assert ownership over independent third-party models. We demonstrate that this vulnerability stems from the untargeted nature of current methods, which evaluate model similarity based on arbitrary sample outputs rather than alignment with a specific, predefined reference. To mitigate this vulnerability, we introduce FIT-Print, a targeted fingerprinting paradigm that actively counters false claim attacks. Specifically, FIT-Print leverages optimization to transform the fingerprint into a verifiable, targeted signature. Building upon this foundation, we propose two black-box fingerprinting methods, the bit-wise FIT-ModelDiff and the list-wise FIT-LIME, which utilize output distances and feature attributions as robust model signatures, respectively. Extensive evaluations across benchmark models and datasets show that our framework perfectly neutralizes false claim attacks (100% defense success rate) and eliminates false alarms on independent models (0.0%), all while maintaining a 100% ownership verification rate against diverse model reuse techniques.</description>
      <guid isPermaLink="false">oai:arXiv.org:2501.15509v5</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shuo Shao, Haozhe Zhu, Yiming Li, Hongwei Yao, Tianwei Zhang, Zhan Qin</dc:creator>
    </item>
    <item>
      <title>Adaptive Prior Selection in Gaussian Process Bandits with Thompson Sampling</title>
      <link>https://arxiv.org/abs/2502.01226</link>
      <description>arXiv:2502.01226v4 Announce Type: replace 
Abstract: Gaussian process (GP) bandits provide a powerful framework for performing blackbox optimization of unknown functions. The characteristics of the unknown function depend heavily on the assumed GP prior. Most work in the literature assumes that this prior is known, but in practice this seldom holds. Instead, practitioners often rely on maximum likelihood estimation to select the hyperparameters of the prior, an approach that lacks theoretical guarantees. In this work, we study two algorithms for joint prior selection and regret minimization in GP bandits based on GP Thompson sampling (GP-TS): Prior-Elimination GP-TS (PE-GP-TS) that disqualifies priors with poor predictive performance, and HyperPrior GP-TS (HP-GP-TS) that utilizes a bi-level Thompson sampling scheme. We theoretically analyze the algorithms and establish a sublinear regret bound for HP-GP-TS. In addition, we demonstrate the effectiveness of these algorithms compared to the alternatives through extensive experiments with synthetic and real-world data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.01226v4</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jack Sandberg, Morteza Haghir Chehreghani</dc:creator>
    </item>
    <item>
      <title>Noncooperative Coordination via a Trading-based Auction</title>
      <link>https://arxiv.org/abs/2502.03616</link>
      <description>arXiv:2502.03616v4 Announce Type: replace 
Abstract: Noncooperative multi-agent systems often face coordination challenges due to conflicting preferences among agents. In particular, when agents act in their own self-interest, they may prefer different choices among multiple feasible outcomes, leading to suboptimal outcomes or even safety concerns. We propose an algorithm named trading auction for consensus (TACo), a decentralized approach that enables noncooperative agents to reach consensus without communicating directly or disclosing private valuations. TACo facilitates coordination through a structured trading-based auction, where agents iteratively select choices of interest and provably reach an agreement within an a priori bounded number of steps. A series of numerical experiments validate that the termination guarantees of TACo hold in practice, and show that TACo achieves a median performance that minimizes the total cost across all agents, while allocating resources significantly more fairly than baseline approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.03616v4</guid>
      <category>cs.GT</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jaehan Im, Filippos Fotiadis, Daniel Delahaye, Ufuk Topcu, David Fridovich-Keil</dc:creator>
    </item>
    <item>
      <title>AccioScene: Compositional 3D Scene Generation via Graph Diffusion and Interaction-driven Critics</title>
      <link>https://arxiv.org/abs/2502.06819</link>
      <description>arXiv:2502.06819v2 Announce Type: replace 
Abstract: This paper presents a framework for generating 3D indoor scenes from text prompts. Existing methods often formulate scene synthesis as an object layout prediction problem conditioned on a single input modality, such as a text description, room shape, or scene graph. This design can lead to object collisions and limited functional plausibility, reducing its practical applicability. To address these limitations, we introduce a multi-stage pipeline that better reflects practical scene creation scenarios. Given a text prompt describing partial scene content, our method first uses graph diffusion to produce a contextually coherent scene graph and then predicts a realistic object layout. In addition, we incorporate lightweight human-object interaction priors to encourage human-centric and functional arrangements, with explicit spatial constraints to reduce interpenetration. Our approach generates coherent 3D scenes with viable layouts that better support human interaction. Experiments on the 3D-FRONT dataset demonstrate that our method achieves competitive or state-of-the-art performance compared with existing approaches, while improving the physical plausibility of generated scenes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.06819v2</guid>
      <category>cs.LG</category>
      <category>cs.GR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yao Wei, Matteo Toso, Pietro Morerio, Changjae Oh, Michael Ying Yang, Alessio Del Bue</dc:creator>
    </item>
    <item>
      <title>Dealing with Annotator Disagreement in Hate Speech Classification</title>
      <link>https://arxiv.org/abs/2502.08266</link>
      <description>arXiv:2502.08266v3 Announce Type: replace 
Abstract: Hate speech detection is a crucial task, especially on social media where harmful content can spread quickly. Collecting social media content (tweets etc.) to train machine learning models is easy, but detecting and categorizing hate speech can be difficult due to its inherently subjective nature. This subjectivity leads to frequent disagreement among annotators, particularly for subtle or borderline content. Traditional approaches either discard non-consensus samples or force a "gold standard" through expert adjudication, ignoring valuable information about uncertainty and diverse human perspectives. We examine the largely overlooked problem of annotator disagreement in hate speech classification and evaluate a range of aggregation methods, including majority voting and ordinal strategies (minimum, maximum, and mean), and analyze their impact across binary, 4-class, and 6-class classification tasks. In addition, we leverage annotators' perceived hate speech strength scores to explore regression-based and hybrid modeling approaches. Among others, we show that filtering non-consensus samples yields over-optimistic results and that the perceived strength provides a complementary signal that enhances classification performance. Finally, we establish new state-of-the-art results for hate speech detection in Turkish tweets, and demonstrate that annotator disagreement, when properly modeled, is a valuable resource for building more robust and reliable systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.08266v3</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Somaiyeh Dehghan, Mehmet Umut Sen, Berrin Yanikoglu</dc:creator>
    </item>
    <item>
      <title>A Nitsche method for incompressible fluids with general dynamic boundary conditions</title>
      <link>https://arxiv.org/abs/2502.09550</link>
      <description>arXiv:2502.09550v2 Announce Type: replace 
Abstract: Both Newtonian and non-Newtonian fluids may exhibit complex slip behaviour at the boundary. We examine a broad class of slip boundary conditions that generalises the commonly used Navier slip, perfect slip, stick-slip and Tresca friction boundary conditions. In particular, set-valued, nonmonotone, noncoercive and dynamic relations may occur. For a unifying framework of such relations, we present a fully discrete numerical scheme for the time-dependent Navier-Stokes equations subject to impermeability and general slip-type boundary conditions on polyhedral domains. Based on compactness arguments, we prove convergence of subsequences, finally ensuring the existence of a weak solution. The numerical scheme uses a general inf-sup stable pair of finite element spaces for the velocity and pressure, a regularisation approach for the implicit slip boundary condition and, most importantly, a general Nitsche method to impose the impermeability and a backward Euler time stepping. One of the key tools in the convergence proof is an inhomogeneous Korn inequality that includes a normal trace term.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.09550v2</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Pablo Alexei Gazca-Orozco, Franz Gmeineder, Erika Maringová Kokavcová, Tabea Tscherpel</dc:creator>
    </item>
    <item>
      <title>Deep Tree Tensor Networks</title>
      <link>https://arxiv.org/abs/2502.09928</link>
      <description>arXiv:2502.09928v2 Announce Type: replace 
Abstract: Originating in quantum physics, tensor networks (TNs) have been widely adopted as exponential machines and parametric decomposers for recognition tasks. Typical TN models, such as Matrix Product States (MPS), have not yet achieved successful application in natural image recognition. When employed, they primarily serve to compress parameters within pre-existing networks, thereby losing their distinctive capability to capture exponential-order feature interactions. This paper introduces a novel architecture named Deep Tree Tensor Network (DTTN), which captures $2^L$-order multiplicative interactions across features through multilinear operations, while essentially unfolding into a tree-like TN topology with the parameter-sharing property. DTTN is stacked with multiple antisymmetric interaction modules (AIMs), and this design facilitates efficient implementation. Furthermore, our theoretical analysis demonstrates the equivalence between quantum-inspired TN models and polynomial/multilinear networks under specific conditions. We posit that the DTTN could catalyze more interpretable research within this field. The proposed model is evaluated across multiple benchmarks and domains, demonstrating superior performance compared to both peer methods and state-of-the-art architectures. Our code is publicly available at https://github.com/NieCha/deep_tree_tensor_network.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.09928v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chang Nie</dc:creator>
    </item>
    <item>
      <title>Audio-FLAN: An Instruction-Following Dataset for Unified Audio Understanding and Generation of Speech, Music, and Sound</title>
      <link>https://arxiv.org/abs/2502.16584</link>
      <description>arXiv:2502.16584v2 Announce Type: replace 
Abstract: Recent advancements in audio tokenization have significantly enhanced the integration of audio capabilities into large language models (LLMs). However, audio understanding and generation are often treated as distinct tasks, hindering the development of truly unified audio-language models. While instruction tuning has demonstrated remarkable success in improving generalization and zero-shot learning across text and vision, its application to audio remains largely unexplored. A major obstacle is the lack of comprehensive datasets that unify audio understanding and generation. To address this, we introduce Audio-FLAN, a large-scale instruction-tuning dataset covering 80 diverse tasks across speech, music, and sound domains, with over 100 million instances. Audio-FLAN lays the foundation for unified audio-language models that can seamlessly handle both understanding (e.g., transcription, comprehension) and generation (e.g., speech, music, sound) tasks across a wide range of audio domains in a zero-shot manner. The Audio-FLAN dataset is available on HuggingFace and GitHub.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.16584v2</guid>
      <category>cs.SD</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.MM</category>
      <category>eess.AS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Liumeng Xue, Ziya Zhou, Jiahao Pan, Zixuan Li, Shuai Fan, Yinghao Ma, Sitong Cheng, Dongchao Yang, Haohan Guo, Yujia Xiao, Xinsheng Wang, Zixuan Shen, Chuanbo Zhu, Xinshen Zhang, Tianchi Liu, Ruibin Yuan, Zeyue Tian, Haohe Liu, Xingjian Du, Emmanouil Benetos, Ge Zhang, Yike Guo, Wei Xue</dc:creator>
    </item>
    <item>
      <title>Rule-based autocorrection of Piping and Instrumentation Diagrams (P&amp;IDs) on graphs</title>
      <link>https://arxiv.org/abs/2502.18493</link>
      <description>arXiv:2502.18493v2 Announce Type: replace 
Abstract: A piping and instrumentation diagram (P&amp;ID) is a central reference document in chemical process engineering. Currently, chemical engineers manually review P&amp;IDs through visual inspection to find and rectify errors. However, engineering projects can involve hundreds to thousands of P&amp;ID pages, creating a significant revision workload. This study proposes a rule-based method to support engineers with error detection and correction in P&amp;IDs. The method is based on a graph representation of P&amp;IDs, enabling automated error detection and correction, i.e., autocorrection, through rule graphs. We use our pyDEXPI Python package to generate P&amp;ID graphs from DEXPI-standard P&amp;IDs. In this study, we developed 33 rules based on chemical engineering knowledge and heuristics, with five selected rules demonstrated as examples. A case study on an illustrative P&amp;ID validates the reliability and effectiveness of the rule-based autocorrection method in revising P&amp;IDs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.18493v2</guid>
      <category>cs.CE</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.69997/sct.150968</arxiv:DOI>
      <arxiv:journal_reference>Systems and Control Transactions, Volume 4, 2025, Pages 1656-1661</arxiv:journal_reference>
      <dc:creator>Lukas Schulze Balhorn, Niels Seijsener, Kevin Dao, Minji Kim, Dominik P. Goldstein, Ge H. M. Driessen, Artur M. Schweidtmann</dc:creator>
    </item>
    <item>
      <title>In-Context Learning of Stochastic Differential Equations with Foundation Inference Models</title>
      <link>https://arxiv.org/abs/2502.19049</link>
      <description>arXiv:2502.19049v3 Announce Type: replace 
Abstract: Stochastic differential equations (SDEs) describe dynamical systems where deterministic flows, governed by a drift function, are superimposed with random fluctuations, dictated by a diffusion function. The accurate estimation (or discovery) of these functions from data is a central problem in machine learning, with wide application across the natural and social sciences. Yet current solutions either rely heavily on prior knowledge of the dynamics or involve intricate training procedures. We introduce FIM-SDE (Foundation Inference Model for SDEs), a pretrained recognition model that delivers accurate in-context (or zero-shot) estimation of the drift and diffusion functions of low-dimensional SDEs, from noisy time series data, and allows rapid finetuning to target datasets. Leveraging concepts from amortized inference and neural operators, we (pre)train FIM-SDE in a supervised fashion to map a large set of noisy, discretely observed SDE paths onto the space of drift and diffusion functions. We demonstrate that FIM-SDE achieves robust in-context function estimation across a wide range of synthetic and real-world processes -- from canonical SDE systems (e.g., double-well dynamics or weakly perturbed Lorenz attractors) to stock price recordings and oil-price and wind-speed fluctuations -- while matching the performance of symbolic, Gaussian process and Neural SDE baselines trained on the target datasets. When finetuned to the target processes, we show that FIM-SDE consistently outperforms all these baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.19049v3</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>39th Conference on Neural Information Processing Systems (NeurIPS 2025)</arxiv:journal_reference>
      <dc:creator>Patrick Seifner, Kostadin Cvejoski, David Berghaus, Cesar Ojeda, Ramses J. Sanchez</dc:creator>
    </item>
    <item>
      <title>TACO: General Acrobatic Flight Control via Target-and-Command-Oriented Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2503.01125</link>
      <description>arXiv:2503.01125v5 Announce Type: replace 
Abstract: Although acrobatic flight control has been studied extensively, one key limitation of the existing methods is that they are usually restricted to specific maneuver tasks and cannot change flight pattern parameters online. In this work, we propose a target-and-command-oriented reinforcement learning (TACO) framework, which can handle different maneuver tasks in a unified way and allows online parameter changes. Additionally, we propose a spectral normalization method with input-output rescaling to enhance the policy's temporal and spatial smoothness, independence, and symmetry, thereby overcoming the sim-to-real gap. We validate the TACO approach through extensive simulation and real-world experiments, demonstrating its capability to achieve high-speed circular flights and continuous multi-flips.</description>
      <guid isPermaLink="false">oai:arXiv.org:2503.01125v5</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zikang Yin, Canlun Zheng, Shiliang Guo, Zhikun Wang, Shiyu Zhao</dc:creator>
    </item>
    <item>
      <title>phepy: Visual benchmarks and improvements for out-of-distribution detectors</title>
      <link>https://arxiv.org/abs/2503.05169</link>
      <description>arXiv:2503.05169v2 Announce Type: replace 
Abstract: Applying machine learning to increasingly high-dimensional problems with sparse or biased training data increases the risk that a model is used on inputs outside its training domain. For such out-of-distribution (OOD) inputs, the model can no longer make valid predictions, and its error is potentially unbounded. Since testing OOD detection methods on real-world datasets is complicated, we design a benchmark for OOD detection, which includes three novel and easily-visualisable toy examples. These simple examples provide direct and intuitive insight into whether the detector is able to detect (1) linear and (2) non-linear concepts and (3) identify thin in-distribution (ID) subspaces (needles) within high-dimensional spaces (haystacks). We use our benchmark to evaluate the performance of various methods from the literature. Since tactile examples of OOD inputs may benefit OOD detection, we also review several simple methods to synthesise OOD inputs for supervised training. We introduce two improvements, $t$-poking and OOD sample weighting, to make supervised detectors more precise at the ID-OOD boundary. This is especially important when conflicts between real ID and synthetic OOD samples blur the decision boundary. Finally, we provide recommendations for constructing and applying OOD detectors in machine learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2503.05169v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Felix Krumbiegel, Juniper Tyree, Michael Boy, Petri Clusius, Andreas Rupp</dc:creator>
    </item>
    <item>
      <title>Bokeh Diffusion: Defocus Blur Control in Text-to-Image Diffusion Models</title>
      <link>https://arxiv.org/abs/2503.08434</link>
      <description>arXiv:2503.08434v5 Announce Type: replace 
Abstract: Recent advances in large-scale text-to-image models have revolutionized creative fields by generating visually captivating outputs from textual prompts; however, while traditional photography offers precise control over camera settings to shape visual aesthetics - such as depth-of-field via aperture - current diffusion models typically rely on prompt engineering to mimic such effects. This approach often results in crude approximations and inadvertently alters the scene content. In this work, we propose Bokeh Diffusion, a scene-consistent bokeh control framework that explicitly conditions a diffusion model on a physical defocus blur parameter. To overcome the scarcity of paired real-world images captured under different camera settings, we introduce a hybrid training pipeline that aligns in-the-wild images with synthetic blur augmentations, providing diverse scenes and subjects as well as supervision to learn the separation of image content from lens blur. Central to our framework is our grounded self-attention mechanism, trained on image pairs with different bokeh levels of the same scene, which enables blur strength to be adjusted in both directions while preserving the underlying scene. Extensive experiments demonstrate that our approach enables flexible, lens-like blur control, supports downstream applications such as real image editing via inversion, and generalizes effectively across both Stable Diffusion and FLUX architectures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2503.08434v5</guid>
      <category>cs.GR</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1145/3757377.3763906</arxiv:DOI>
      <dc:creator>Armando Fortes, Tianyi Wei, Shangchen Zhou, Xingang Pan</dc:creator>
    </item>
    <item>
      <title>SDTrack: A Baseline for Event-based Tracking via Spiking Neural Networks</title>
      <link>https://arxiv.org/abs/2503.08703</link>
      <description>arXiv:2503.08703v4 Announce Type: replace 
Abstract: Event cameras provide superior temporal resolution, dynamic range, energy efficiency, and pixel bandwidth. Spiking Neural Networks (SNNs) naturally complement event data through discrete spike signals, making them ideal for event-based tracking. However, current approaches combining Artificial Neural Networks (ANNs) and SNNs suffer from suboptimal architectures that compromise energy efficiency and limit tracking performance. To address these limitations, we propose the first Transformer-based Spike-Driven Tracking (SDTrack) pipeline. It incorporates a novel event frame aggregation method called Global Trajectory Prompt (GTP) and a Transformer-based tracker. The GTP method effectively captures global trajectory information and aggregates it with event streams into event frames to enhance spatiotemporal representation. The Transformer-based tracker comprises a fully spike-driven SNN backbone and a simple tracking head. The SDTrack pipeline operates end-to-end without data augmentation or post-processing. Extensive experiments demonstrate that our SDTrack-Tiny pipeline achieves competitive accuracy with only 19.61M parameters and 8.16 mJ energy consumption, while our Base version achieves state-of-the-art accuracy across three datasets. Our work establishes a solid foundation for future neuromorphic vision research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2503.08703v4</guid>
      <category>cs.NE</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Yimeng Shan, Zhenbang Ren, Haodi Wu, Wenjie Wei, Rui-Jie Zhu, Shuai Wang, Dehao Zhang, Yichen Xiao, Jieyuan Zhang, Kexin Shi, Jingzhinan Wang, Jason K. Eshraghian, Haicheng Qu, Malu Zhang</dc:creator>
    </item>
    <item>
      <title>HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions</title>
      <link>https://arxiv.org/abs/2503.14229</link>
      <description>arXiv:2503.14229v4 Announce Type: replace 
Abstract: Vision-and-Language Navigation (VLN) has been studied mainly in either discrete or continuous spaces, with little attention to dynamic, crowded environments. We present HA-VLN 2.0, a unified benchmark introducing explicit social-awareness constraints. Our contributions are: (i) a standardized task and metrics capturing both goal accuracy and personal-space adherence; (ii) the HAPS 2.0 dataset and simulators modeling multi-human interactions, outdoor contexts, and finer language-motion alignment; (iii) benchmarks on 16,844 socially grounded instructions, revealing sharp performance drops of leading agents under human dynamics and partial observability; and (iv) real-world robot experiments validating sim-to-real transfer, with an open leaderboard enabling transparent comparison. Results show that explicit social modeling improves navigation robustness and reduces collisions, underscoring the necessity of human-centric approaches. By releasing datasets, simulators, baselines, and protocols, HA-VLN 2.0 provides a strong foundation for safe, human-aware navigation research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2503.14229v4</guid>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yifei Dong, Fengyi Wu, Qi He, Lingdong Kong, Heng Li, Minghan Li, Zebang Cheng, Yuxuan Zhou, Jingdong Sun, Qi Dai, Alexander G Hauptmann, Zhi-Qi Cheng</dc:creator>
    </item>
    <item>
      <title>LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty</title>
      <link>https://arxiv.org/abs/2503.18314</link>
      <description>arXiv:2503.18314v5 Announce Type: replace 
Abstract: We present LoTUS, a novel Machine Unlearning (MU) method that eliminates the influence of training samples from pre-trained models, avoiding retraining from scratch. LoTUS smooths the prediction probabilities of the model up to an information-theoretic bound, mitigating its over-confidence stemming from data memorization. We evaluate LoTUS on Transformer and ResNet18 models against eight baselines across five public datasets. Beyond established MU benchmarks, we evaluate unlearning on ImageNet1k, a large-scale dataset, where retraining is impractical, simulating real-world conditions. Moreover, we introduce the novel Retrain-Free Jensen-Shannon Divergence (RF-JSD) metric to enable evaluation under real-world conditions. The experimental results show that LoTUS outperforms state-of-the-art methods in terms of both efficiency and effectiveness. Code: https://github.com/cspartalis/LoTUS.</description>
      <guid isPermaLink="false">oai:arXiv.org:2503.18314v5</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Christoforos N. Spartalis, Theodoros Semertzidis, Petros Daras, Efstratios Gavves</dc:creator>
    </item>
    <item>
      <title>The MINI mixed virtual element for the Stokes equation</title>
      <link>https://arxiv.org/abs/2503.20921</link>
      <description>arXiv:2503.20921v2 Announce Type: replace 
Abstract: We present and discuss a generalization of the popular MINI mixed finite element for the 2D Stokes equation by means of conforming virtual elements on polygonal meshes. We prove optimal error estimates for both velocity and pressure. Theoretical results are confirmed by several numerical tests performed with different choices of polynomial accuracy and meshes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2503.20921v2</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Silvia Bertoluzza, Fabio Credali, Daniele Prada</dc:creator>
    </item>
    <item>
      <title>CamoSAM2: SAM2-oriented Prompt Auto-Refinement for Video Camouflaged Object Detection</title>
      <link>https://arxiv.org/abs/2504.00375</link>
      <description>arXiv:2504.00375v2 Announce Type: replace 
Abstract: The Segment Anything Model 2 (SAM2), a prompt-guided video foundation model, has performed remarkably well in video object segmentation, drawing significant attention in the community. Due to the high similarity between camouflaged objects and their surroundings, which makes them difficult to distinguish even by the human eye, the application of SAM2 for automated segmentation in real-world scenarios faces challenges in camouflage perception and reliable prompt generation. To address these issues, we propose CamoSAM2, a motion-appearance prompt inducer (MAPI) and refinement framework to automatically generate and refine prompts for SAM2, enabling high-quality automatic detection and segmentation in the VCOD task. Initially, we introduce a prompt inducer that simultaneously integrates motion and appearance cues to detect camouflaged objects, delivering more accurate initial predictions than existing methods. Subsequently, we propose a video-based adaptive multi-prompts refinement (AMPR) strategy tailored for SAM2, aimed at mitigating prompt error in initial coarse masks and further producing good prompts. Specifically, we introduce a novel three-step process to generate reliable prompts by camouflaged object determination, pivotal prompt frame selection, and multi-prompts formation. Extensive experiments conducted on two benchmark datasets demonstrate that our proposed model, CamoSAM2, significantly outperforms existing state-of-the-art methods, achieving increases of 8.0% and 10.1% in mIoU metric. Additionally, our method achieves the fastest inference speed compared to current VCOD models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2504.00375v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xin Zhang, Keren Fu, Qijun Zhao</dc:creator>
    </item>
    <item>
      <title>Chinese Grammatical Error Correction: A Survey</title>
      <link>https://arxiv.org/abs/2504.00977</link>
      <description>arXiv:2504.00977v2 Announce Type: replace 
Abstract: Chinese Grammatical Error Correction (CGEC) is a critical task in Natural Language Processing, addressing the growing demand for automated writing assistance in both second-language (L2) and native (L1) Chinese writing. While L2 learners struggle with mastering complex grammatical structures, L1 users also benefit from CGEC in academic, professional, and formal contexts where writing precision is essential. This survey provides a comprehensive review of CGEC research, covering datasets, annotation schemes, evaluation methodologies, and system advancements. We examine widely used CGEC datasets, highlighting their characteristics, limitations, and the need for improved standardization. We also analyze error annotation frameworks, discussing challenges such as word segmentation ambiguity and the classification of Chinese-specific error types. Furthermore, we review evaluation metrics, focusing on their adaptation from English GEC to Chinese, including character-level scoring and the use of multiple references. In terms of system development, we trace the evolution from rule-based and statistical approaches to neural architectures, including Transformer-based models and the integration of large pre-trained language models. By consolidating existing research and identifying key challenges, this survey provides insights into the current state of CGEC and outlines future directions, including refining annotation standards to address segmentation challenges, and leveraging multilingual approaches to enhance CGEC.</description>
      <guid isPermaLink="false">oai:arXiv.org:2504.00977v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Mengyang Qiu, Qingyu Gao, Linxuan Yang, Yang Gu, Tran Minh Nguyen, Zihao Huang, Jungyeul Park</dc:creator>
    </item>
    <item>
      <title>Hummus: A Dataset of Humorous Multimodal Metaphor Use</title>
      <link>https://arxiv.org/abs/2504.02983</link>
      <description>arXiv:2504.02983v3 Announce Type: replace 
Abstract: Metaphor and humor share a lot of common ground, and metaphor is one of the most common humorous mechanisms. This study focuses on the humorous capacity of multimodal metaphors, which has not received due attention in the community. We take inspiration from the Incongruity Theory of humor, the Conceptual Metaphor Theory, and the annotation scheme behind the VU Amsterdam Metaphor Corpus, and develop a novel annotation scheme for humorous multimodal metaphor use in image-caption pairs. We create the Hummus Dataset of Humorous Multimodal Metaphor Use, providing expert annotation on 1k image-caption pairs sampled from the New Yorker Caption Contest corpus. Using the dataset, we test state-of-the-art multimodal large language models (MLLMs) on their ability to detect and understand humorous multimodal metaphor use. Our experiments show that current MLLMs still struggle with processing humorous multimodal metaphors, particularly with regard to integrating visual and textual information. We release our dataset and code at github.com/xiaoyuisrain/humorous-multimodal-metaphor-use.</description>
      <guid isPermaLink="false">oai:arXiv.org:2504.02983v3</guid>
      <category>cs.CL</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Xiaoyu Tong, Zhi Zhang, Pia Sommerauer, Martha Lewis, Ekaterina Shutova</dc:creator>
    </item>
    <item>
      <title>Contour Field based Elliptical Shape Prior for the Segment Anything Model</title>
      <link>https://arxiv.org/abs/2504.12556</link>
      <description>arXiv:2504.12556v2 Announce Type: replace 
Abstract: The elliptical shape prior information plays a vital role in improving the accuracy of image segmentation for specific tasks in medical and natural images. Existing deep learning-based segmentation methods, including the Segment Anything Model (SAM), often struggle to produce segmentation results with elliptical shapes efficiently. This paper proposes a new approach to integrate the prior of elliptical shapes into the deep learning-based SAM image segmentation techniques using variational methods. The proposed method establishes a parameterized elliptical contour field, which constrains the segmentation results to align with predefined elliptical contours. Utilizing the dual algorithm, the model seamlessly integrates image features with elliptical priors and spatial regularization priors, thereby greatly enhancing segmentation accuracy. By decomposing SAM into four mathematical sub-problems, we integrate the variational ellipse prior to design a new SAM network structure, ensuring that the segmentation output of SAM consists of elliptical regions. Experimental results on some specific image datasets demonstrate an improvement over the original SAM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2504.12556v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xinyu Zhao, Jun Liu, Faqiang Wang, Li Cui, Yuping Duan</dc:creator>
    </item>
    <item>
      <title>Enhancing Strawberry Yield Forecasting with Backcasted IoT Sensor Data and Machine Learning</title>
      <link>https://arxiv.org/abs/2504.18451</link>
      <description>arXiv:2504.18451v2 Announce Type: replace 
Abstract: Rapid global population growth underscores the need for digitally enabled agricultural systems that support sustainable food production and data-driven resource management for farmers and stakeholders. The adoption of Internet of Things (IoT) technologies, capable of capturing real-time environmental (e.g., temperature, humidity) and operational (e.g., irrigation) parameters, is a crucial step toward enabling advanced applications such as AI-based yield forecasting. However, the effectiveness of such models is often constrained by limited data availability, particularly in dynamic farm environments where IoT observations must be accumulated over multiple growing seasons. In this study, we deployed IoT sensors in strawberry production polytunnels over two growing seasons to collect data on water usage, internal and external temperature and humidity, soil moisture, soil temperature, and photosynthetically active radiation. These observations were combined with manually recorded yield data spanning four seasons. To address gaps in IoT data for the two seasons without sensor coverage, we developed an AI-based backcasting approach that synthesizes missing sensor observations using historical weather data from a nearby station and existing polytunnel measurements. We then trained AI-based yield forecasting models using both real and synthetic datasets. In this retrospective evaluation, results show that incorporating synthetic data improved yield forecasting accuracy, with models trained on the combined dataset outperforming those using only real sensor, weather, and yield data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2504.18451v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tewodros Alemu Ayall, Andy Li, Matthew Beddows, Milan Markovic, Georgios Leontidis</dc:creator>
    </item>
    <item>
      <title>Follow Everything: A Leader-Following and Obstacle Avoidance Framework with Goal-Aware Adaptation</title>
      <link>https://arxiv.org/abs/2504.19399</link>
      <description>arXiv:2504.19399v5 Announce Type: replace 
Abstract: Robust and flexible leader-following is a critical capability for robots to integrate into human society. While existing methods struggle to generalize to leaders of arbitrary form and often fail when the leader temporarily leaves the robot's field of view, this work introduces a unified framework addressing both challenges. First, traditional detection models are replaced with a segmentation model, allowing the leader to be anything. To enhance recognition robustness, a distance frame buffer is implemented that stores leader embeddings at multiple distances, accounting for the unique characteristics of leader-following tasks. Second, a goal-aware adaptation mechanism is designed to govern robot planning states based on the leader's visibility and motion, complemented by a graph-based planner that generates candidate trajectories for each state, ensuring efficient following with obstacle avoidance. Simulations and real-world experiments with a legged robot follower and various leaders (human, ground robot, UAV, legged robot, stop sign) in both indoor and outdoor environments show competitive improvements in follow success rate, reduced visual loss duration, lower collision rate, and decreased leader-follower distance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2504.19399v5</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Qianyi Zhang, Shijian Ma, Boyi Liu, Jianhao Jiao, Dimitrios Kanoulas</dc:creator>
    </item>
    <item>
      <title>Pathfinders in the Sky: Formal Decision-Making Models for Collaborative Air Traffic Control in Convective Weather</title>
      <link>https://arxiv.org/abs/2505.01804</link>
      <description>arXiv:2505.01804v2 Announce Type: replace 
Abstract: Air traffic can be significantly disrupted by weather. Pathfinder operations involve assigning a designated aircraft to assess whether airspace that was previously impacted by weather can be safely traversed. Despite relatively routine use in air traffic control, there is little research on the underlying multi-agent decision-making problem. We seek to address this gap herein by formulating decision models to capture the operational dynamics and implications of pathfinders. Specifically, we construct a Markov chain to represent the stochastic transitions between key operational states (e.g., pathfinder selection). We then analyze its steady-state behavior to understand long-term system dynamics. We also propose models to characterize flight-specific acceptance behaviors (based on utility trade-offs) and pathfinder selection strategies (based on sequential offer allocations). We then conduct a worst-case scenario analysis that highlights risks from collective rejection and explores how selfless behavior and uncertainty affect system resilience. Empirical analysis of data from the US Federal Aviation Administration demonstrates the real-world significance of pathfinder operations and informs future model calibration.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.01804v2</guid>
      <category>cs.MA</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1109/ITSC60802.2025.11423180</arxiv:DOI>
      <arxiv:journal_reference>2025 IEEE 28th International Conference on Intelligent Transportation Systems (ITSC)</arxiv:journal_reference>
      <dc:creator>Jimin Choi, Kartikeya Anand, Husni R. Idris, Huy T. Tran, Max Z. Li</dc:creator>
    </item>
    <item>
      <title>Coop-WD: Cooperative Perception with Weighting and Denoising for Robust V2V Communication</title>
      <link>https://arxiv.org/abs/2505.03528</link>
      <description>arXiv:2505.03528v2 Announce Type: replace 
Abstract: Cooperative perception, leveraging shared information from multiple vehicles via vehicle-to-vehicle (V2V) communication, plays a vital role in autonomous driving to alleviate the limitation of single-vehicle perception. Existing works have explored the effects of V2V communication impairments on perception precision, but they lack generalization to different levels of impairments. In this work, we propose a joint weighting and denoising framework, Coop-WD, to enhance cooperative perception subject to V2V channel impairments. In this framework, the self-supervised contrastive model and the conditional diffusion probabilistic model are adopted hierarchically for vehicle-level and pixel-level feature enhancement. An efficient variant model, Coop-WD-eco, is proposed to selectively deactivate denoising to reduce processing overhead. Rician fading, non-stationarity, and time-varying distortion are considered. Simulation results demonstrate that the proposed Coop-WD outperforms conventional benchmarks in all types of channels. Qualitative analysis with visual examples further proves the superiority of our proposed method. The proposed Coop-WD-eco achieves up to 50% reduction in computational cost under severe distortion while maintaining comparable accuracy as channel conditions improve.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.03528v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chenguang Liu, Jianjun Chen, Yunfei Chen, Yubei He, Zhuangkun Wei, Hongjian Sun, Haiyan Lu, Qi Hao</dc:creator>
    </item>
    <item>
      <title>Hybrid-Field 6D Movable Antenna for Terahertz Communications: Channel Modeling and Estimation</title>
      <link>https://arxiv.org/abs/2505.04753</link>
      <description>arXiv:2505.04753v2 Announce Type: replace 
Abstract: In this work, we study a six-dimensional movable antenna (6DMA)-enhanced Terahertz (THz) network that supports a large number of users with a few antennas by controlling the three-dimensional (3D) positions and 3D rotations of antenna surfaces/subarrays at the base station (BS). However, the short wavelength of THz signals combined with a large 6DMA movement range extends the near-field region. As a result, a user can be in the far-field region relative to the antennas on one 6DMA surface, while simultaneously residing in the near-field region relative to other 6DMA surfaces. Moreover, 6DMA THz channel estimation suffers from increased computational complexity and pilot overhead due to uneven power distribution across the large number of candidate position-rotation pairs, as well as the limited number of radio frequency (RF) chains in THz bands. To address these issues, we propose an efficient hybrid-field generalized 6DMA THz channel model, which accounts for planar wave propagation within individual 6DMA surfaces and spherical waves among different 6DMA surfaces. Furthermore, we propose a low-overhead channel estimation algorithm that leverages directional sparsity to construct a complete channel map for all potential antenna position-rotation pairs.
  Numerical results show that the proposed hybrid-field channel model achieves a sum rate close to that of the ground-truth near-field channel model and confirm that the channel estimation method yields accurate results with low complexity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.04753v2</guid>
      <category>cs.IT</category>
      <category>eess.SP</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaodan Shao, Yixiao Zhang, Shisheng Hu, Zhixuan Tang, Mingcheng He, Xinyu Huang, Weihua Zhuang, Xuemin Shen</dc:creator>
    </item>
    <item>
      <title>Maris: A Formally Verifiable Privacy Policy Enforcement Paradigm for Multi-Agent Collaboration Systems</title>
      <link>https://arxiv.org/abs/2505.04799</link>
      <description>arXiv:2505.04799v5 Announce Type: replace 
Abstract: Multi-agent collaboration systems (MACS), powered by large language models (LLMs), solve complex problems efficiently by leveraging each agent's specialization and communication between agents. However, the inherent exchange of information between agents and their interaction with external environments, such as LLMs, tools, and users, inevitably introduces significant risks of sensitive data leakage, including vulnerabilities to attacks such as eavesdropping and prompt injection. Existing MACS lack fine-grained data protection controls, making it challenging to manage sensitive information securely. In this paper, we take the first step to mitigate the MACS's data leakage threat through a privacy-enhanced MACS development paradigm, Maris. Maris enables rigorous message flow control within MACS by embedding reference monitors into key multi-agent conversation components. We implemented Maris as an integral part of widely-adopted open-source multi-agent development frameworks, AutoGen and LangChain. To evaluate its effectiveness, we develop a Privacy Assessment Framework that emulates MACS under different threat scenarios. Our evaluation shows that Maris effectively mitigated sensitive data leakage threats across three different task suites while maintaining a high task success rate.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.04799v5</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jian Cui, Zichuan Li, Chi Wang, Luyi Xing, Xiaojing Liao</dc:creator>
    </item>
    <item>
      <title>Smaller and More Flexible Cuckoo Filters</title>
      <link>https://arxiv.org/abs/2505.05847</link>
      <description>arXiv:2505.05847v4 Announce Type: replace 
Abstract: Cuckoo filters are space-efficient approximate set membership data structures with a controllable false positive rate (FPR) and zero false negatives, similar to Bloom filters. In contrast to Bloom filters, Cuckoo filters store multi-bit fingerprints of keys in a hash table using variants of Cuckoo hashing, allowing each fingerprint to be stored at a small number of possible locations. Existing Cuckoo filters use fingerprints of $(k+3)$ bits per key and an additional space overhead factor of at least $1.05$ to achieve an FPR of $2^{-k}$. For $k=10$, this amounts to $1.365\, kn$ bits to store $n$ keys, which is better than $1.443\, kn$ bits for Bloom filters. The $+3$ for the fingerprint size is required to balance out the multiplied FPR caused by looking for the fingerprint at several locations. In the original Cuckoo filter, the number of hash table buckets is restricted to a power of 2, which may lead to much larger space overheads, up to $2.1\, (1+3/k)\, kn$ bits.
  We present two improvements of Cuckoo filters. First, we remove the restriction that the number of buckets must be a power of 2 by using a different placement strategy. Second, we reduce the space overhead factor of Cuckoo filters to $1.06 \, (1+2/k)$ by using overlapping windows instead of disjoint buckets to maintain the load threshold of the hash table, while reducing the number of alternative slots where any fingerprint may be found.
  A detailed evaluation demonstrates that the alternative memory layout based on overlapping windows decreases the size of Cuckoo filters not only in theory, but also in practice. A comparison with other state-of-the-art filter types, Prefix filters and Vector Quotient filters (VQFs), shows that the reduced space overhead makes windowed Cuckoo filters the smallest filters supporting online insertions, with similarly fast queries, but longer insertion times.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.05847v4</guid>
      <category>cs.DS</category>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Johanna Elena Schmitz, Jens Zentgraf, Sven Rahmann</dc:creator>
    </item>
    <item>
      <title>Robust Renal Mass Segmentation on CT: A Validation Study of an AI-Based Framework</title>
      <link>https://arxiv.org/abs/2505.07573</link>
      <description>arXiv:2505.07573v2 Announce Type: replace 
Abstract: Renal mass segmentation has important potential to enhance the clinical workflow, especially in settings requiring quantitative assessments. Kidney volume could serve as an important biomarker for renal diseases, with changes in volume correlating directly with kidney function. Currently, clinical practice often relies on subjective visual assessment for evaluating kidney size and kidney lesions, including tumors and cysts, which are typically staged based on diameter, volume, and anatomical location. To support a more objective and reproducible approach, this research aims to develop a robust, thoroughly validated renal mass segmentation algorithm, named Renal-Net. We employ publicly available training datasets and leverage the state-of-the-art medical image segmentation framework nnU-Net. Validation is conducted using both proprietary and public test datasets, with segmentation performance quantified by Dice coefficient and the 95th percentile Hausdorff distance. Furthermore, we analyze robustness across subgroups based on patient sex, age, CT contrast phases, and tumor histologic subtypes. Our findings demonstrate that our segmentation algorithm, trained exclusively on publicly available data, generalizes effectively to external test sets and outperforms existing state-of-the-art models across all tested datasets. Subgroup analyses reveal consistent high performance, indicating strong robustness and reliability. The developed algorithm and associated code are publicly accessible at https://github.com/DIAGNijmegen/oncology-kidney-abnormality-segmentation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.07573v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.59275/j.melba.2026-67g5</arxiv:DOI>
      <arxiv:journal_reference>Machine.Learning.for.Biomedical.Imaging. 2026 (2026)</arxiv:journal_reference>
      <dc:creator>Sarah de Boer, Hartmut Häntze, Kiran Vaidhya Venkadesh, Myrthe A. D. Buser, Gabriel E. Humpire Mamani, Lina Xu, Lisa C. Adams, Jawed Nawabi, Keno K. Bressem, Bram van Ginneken, Mathias Prokop, Alessa Hering</dc:creator>
    </item>
    <item>
      <title>Harmonia: End-to-End RAG Serving Optimization</title>
      <link>https://arxiv.org/abs/2505.07833</link>
      <description>arXiv:2505.07833v2 Announce Type: replace 
Abstract: Retrieval-Augmented Generation (RAG) improves the reliability of large language models by integrating external knowledge, but serving RAG pipelines efficiently is challenging because requests traverse heterogeneous components spanning LLM inference, databases, and CPU-side processing. We present Harmonia, an end-to-end RAG serving framework that addresses these bottlenecks through (i) a flexible pipeline specification interface for composing custom workflows, (ii) heterogeneity-aware deployment that provisions and configures components as a distributed inference system, and (iii) a closed-loop runtime controller that monitors load and execution progress and reduces SLO violations through request prioritization and auto-scaling. Across four RAG applications, Harmonia outperforms commercial alternatives, improving throughput by more than 2.04x while reducing SLO violations by up to 78.4 percent.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.07833v2</guid>
      <category>cs.DC</category>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <category>cs.OS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Saurabh Agarwal, Bodun Hu, Luis Pabon, Myungjin Lee, Jayanth Srinivasa, Aditya Akella</dc:creator>
    </item>
    <item>
      <title>Can Global XAI Methods Reveal Injected Behaviours in LLMs? SHAP vs Rule Extraction vs RuleSHAP</title>
      <link>https://arxiv.org/abs/2505.11189</link>
      <description>arXiv:2505.11189v3 Announce Type: replace 
Abstract: Large language models (LLMs) can amplify misinformation, undermining societal goals such as the UN SDGs. We study three documented drivers of misinformation (valence framing, information overload, and oversimplification) often shaped by default beliefs. Building on evidence that LLMs encode such defaults (e.g., "joy is positive", "math is complex") and can act as "bags of heuristics", we ask whether belief-driven heuristics behind misinformation-related behaviour can be recovered from black-box LLM behaviour as explicit rules. A key obstacle is that global rule-extraction methods in explainable AI (XAI) are built for numerical input-output data, not text. We address this by eliciting global LLM beliefs and mapping them to numerical scores via statistically validated abstractions, enabling off-the-shelf global XAI to detect belief-driven heuristics. For ground truth, we inject nonlinear behavioural triggers of increasing complexity (univariate, conjunctive, non-convex) into GPT-family and Llama models via system instructions. We find that RuleFit often misses non-univariate triggers, while global SHAP better ranks conjunctive trigger features but yields no symbolic rules. To bridge this gap, we propose RuleSHAP, a rule-extraction algorithm that couples global SHAP aggregates with rule induction to better capture non-univariate triggers, improving MRR@1 over RuleFit by +82% on average. Our results suggest a practical pathway for surfacing behavioural triggers in LLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.11189v3</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1145/3770855.3818093</arxiv:DOI>
      <dc:creator>Francesco Sovrano</dc:creator>
    </item>
    <item>
      <title>CoSeP: Complementary Separability Pruning via Class-Separability Clustering</title>
      <link>https://arxiv.org/abs/2505.13225</link>
      <description>arXiv:2505.13225v2 Announce Type: replace 
Abstract: Neural network pruning aims to compress models for efficient deployment, yet two fundamental challenges remain. First, many methods rely on per-component importance scores, selecting filters or neurons independently and ignoring redundancy: the retained set may include multiple components capturing similar discriminative patterns while missing others entirely. Second, determining per-layer pruning ratios typically requires manual, architecture-specific tuning with no principled stopping criterion. We propose CoSeP (Complementary Separability Pruning) to address both issues. Rather than scoring components in isolation, CoSeP represents each component by its class-separability profile across all class pairs, computed via Jeffries--Matusita distances. This defines a separability space in which nearby components are potentially redundant and distant components capture complementary information. CoSeP selects a compact set of representatives in this space: components are grouped via k-medoids clustering, candidate subset sizes are evaluated using the Mean Simplified Silhouette, and a knee-detection criterion automatically determines how many components to retain. Across CIFAR-10, CIFAR-100, and ImageNet-1K, on ResNet, VGG, MobileNet, and DenseNet architectures, CoSeP matches or improves accuracy while reducing FLOPs, with measured wall-clock inference-time reductions of up to 20%. For example, it achieves a +0.66% top-1 accuracy gain with 2.30x FLOPs reduction on ResNet-50/ImageNet-1K, and a 0.37% gain with 2.59x FLOPs reduction on VGG-16/CIFAR-10. These results demonstrate that modeling complementarity in class-separability space provides an effective and principled approach to pruning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.13225v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>David Levin, Gonen Singer</dc:creator>
    </item>
    <item>
      <title>LLMSynthor: Macro-Aligned Micro-Records Synthesis with Large Language Models</title>
      <link>https://arxiv.org/abs/2505.14752</link>
      <description>arXiv:2505.14752v4 Announce Type: replace 
Abstract: Macro-aligned micro-records are crucial for credible simulations in social science and urban studies. For example, epidemic models are only reliable when individual-level mobility and contacts mirror real behavior, while aggregates match real-world statistics like case counts or travel flows. However, collecting such fine-grained data at scale is impractical, leaving researchers with only macro-level data. LLMSynthor addresses this by turning a pretrained LLM into a macro-aware simulator that generates realistic micro-records consistent with target macro-statistics. It iteratively builds synthetic datasets: in each step, the LLM generates batches of records to minimize discrepancies between synthetic and target aggregates. Treating the LLM as a nonparametric copula allows the model to capture realistic joint dependencies among variables. To improve efficiency, LLM Proposal Sampling guides the LLM to propose targeted record batches, specifying variable ranges and counts, to efficiently correct discrepancies while preserving realism grounded in the model's priors. Evaluations across domains (mobility, e-commerce, population) show that LLMSynthor achieves strong realism, statistical fidelity, and practical utility, making it broadly applicable to economics, social science, and urban studies.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.14752v4</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yihong Tang, Menglin Kong, Junlin He, Tong Nie, Wei Ma, Lijun Sun</dc:creator>
    </item>
    <item>
      <title>SpectraLDS: Provable Distillation for Linear Dynamical Systems</title>
      <link>https://arxiv.org/abs/2505.17868</link>
      <description>arXiv:2505.17868v2 Announce Type: replace 
Abstract: We present the first provable method for identifying symmetric linear dynamical systems (LDS) with accuracy guarantees that are independent of the systems' state dimension or effective memory. Our approach builds upon recent work that represents symmetric LDSs as convolutions learnable via fixed spectral transformations. We show how to invert this representation, thereby recovering an LDS model from its spectral transform and yielding an end-to-end convex optimization procedure. This distillation preserves predictive accuracy while enabling constant-time and constant-space inference per token, independent of sequence length. We evaluate our method, SpectraLDS, as a component in sequence prediction architectures and demonstrate that accuracy is preserved while inference efficiency is improved on tasks such as language modeling.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.17868v2</guid>
      <category>cs.LG</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Devan Shah, Shlomo Fortgang, Sofiia Druchyna, Elad Hazan</dc:creator>
    </item>
    <item>
      <title>FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks</title>
      <link>https://arxiv.org/abs/2505.19662</link>
      <description>arXiv:2505.19662v4 Announce Type: replace 
Abstract: This paper introduces FieldWorkArena, a benchmark for agentic AI targeting real-world field work. With demand for agentic AI increasing, such agents are being built to detect and document safety hazards, procedural violations, and other critical incidents across real-world manufacturing and retail environments. Whereas most agentic AI benchmarks focus on performance in simulated or digital environments, our work addresses the fundamental challenge of evaluating agents in the real world. In this paper, we improve the evaluation function from previous methods to assess the performance of agentic AI in diverse real-world tasks. Our dataset comprises images and videos captured on site in factories, warehouses, and retail stores. Tasks were meticulously developed through interviews with site workers and managers. Evaluation results confirmed that performance evaluation considering the characteristics of Multimodal LLMs (MLLMs) such as GPT-4o is feasible. Furthermore, this study identifies both the effectiveness and limitations of the proposed new evaluation methodology. The complete dataset and evaluation program are publicly accessible on the website (https://en-documents.research.global.fujitsu.com/fieldworkarena/).</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.19662v4</guid>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jun Takahashi, Atsunori Moteki, Akiyoshi Uchida, Shoichi Masui, Fan Yang, Kanji Uchino, Yueqi Song, Yonatan Bisk, Graham Neubig, Ikuo Kusajima, Yasuto Watanabe, Hiroyuki Ishida, Koki Nakagawa, Shan Jiang</dc:creator>
    </item>
    <item>
      <title>ePC: Fast and Deep Predictive Coding in Digital Simulation</title>
      <link>https://arxiv.org/abs/2505.20137</link>
      <description>arXiv:2505.20137v5 Announce Type: replace 
Abstract: Predictive Coding (PC) offers a brain-inspired alternative to backpropagation for neural network training, described as a physical system minimizing its internal energy. However, in practice, PC is predominantly digitally simulated, requiring excessive amounts of compute while struggling to scale to deeper architectures. This paper reformulates PC to overcome this hardware-algorithm mismatch. First, we uncover how the canonical state-based formulation of PC (sPC) is, by design, deeply inefficient in digital simulation, inevitably resulting in exponential signal decay that stalls the entire minimization process. Then, to overcome this fundamental limitation, we introduce error-based PC (ePC), a novel reparameterization of PC which does not suffer from signal decay. Though no longer biologically plausible, ePC numerically computes exact PC weight gradients and runs orders of magnitude faster than sPC. Experiments across multiple architectures and datasets demonstrate that ePC matches backpropagation's performance even for deeper models where sPC struggles. Besides practical improvements, our work provides theoretical insight into PC dynamics and establishes a foundation for scaling PC-based learning to deeper architectures on digital hardware and beyond.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.20137v5</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Cédric Goemaere, Gaspard Oliviers, Rafal Bogacz, Thomas Demeester</dc:creator>
    </item>
    <item>
      <title>A Unified LLM-Adaptable Framework for Cold-Start Cognitive Diagnosis</title>
      <link>https://arxiv.org/abs/2505.21239</link>
      <description>arXiv:2505.21239v2 Announce Type: replace 
Abstract: Cognitive Diagnosis has become a critical task in AI-empowered education, supporting personalized learning by accurately assessing students' cognitive states. However, traditional cognitive diagnosis models (CDMs) often struggle in cold-start scenarios due to the lack of student-exercise interaction data. Recent NLP-based approaches leveraging pre-trained language models (PLMs) have shown promise by utilizing textual features, but they fail to fully bridge the gap between semantic understanding and cognitive profiling. To address this limitation, we propose Language Model-based Cognitive Diagnosis (LMCD), a unified, LLM-adaptable framework designed to tackle cold-start challenges by harnessing the advanced capabilities of large language models (LLMs). LMCD operates via two primary phases: (1) Knowledge Diffusion, where LLMs generate enriched content for exercises and knowledge concepts (KCs) to establish stronger semantic links; and (2) Semantic-Cognitive Fusion, which leverages LLMs to deeply integrate textual information with student cognitive states. By unifying the semantic and cognitive spaces, LMCD creates comprehensive representations that serve as a plug-and-play enhancement for various off-the-shelf CDMs. Experiments on two real-world datasets demonstrate that LMCD significantly outperforms state-of-the-art methods in both exercise-cold and domain-cold settings. The code is publicly available at https://github.com/TAL-auroraX/LMCD</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.21239v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zihan Yao, Chentao Song, Yu He, Tianyu Qi, Jian Zhang, Weiping Fu, Jun Liu</dc:creator>
    </item>
    <item>
      <title>ACTIVE-o3: Empowering MLLMs with Active Perception via Pure Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2505.21457</link>
      <description>arXiv:2505.21457v2 Announce Type: replace 
Abstract: Active vision, also known as active perception, refers to actively selecting where and how to look in order to gather task-relevant information. It is a critical component of efficient perception and decision-making in humans and advanced embodied agents. With the rise of Multimodal Large Language Models (MLLMs) as central planners in robotic systems, the lack of methods for equipping MLLMs with active perception has become a key gap. We first provide a systematic definition of MLLM-based active perception tasks and show that GPT-o3's zoom-in strategy can be viewed as a special case, though it suffers from low efficiency and inaccurate region selection. To address these issues, we propose ACTIVE-o3, a reinforcement learning framework built on GRPO that equips MLLMs with active perception capabilities. Leveraging a modular sensing-action design and a dual-form reward, ACTIVE-o3 autonomously learns efficient and stable region selection strategies without explicit region-selection supervision. We further establish a comprehensive benchmark covering both open-world tasks, including small- and dense-object grounding, and domain-specific scenarios, including remote sensing, autonomous driving, and interactive segmentation. Experimental results demonstrate that ACTIVE-o3 significantly enhances active perception capabilities compared to baselines. Moreover, we show that our framework not only preserves the model's general understanding ability but can also serve as a proxy task for leveraging perception data, further improving performance on benchmarks such as RealWorldQA and MME.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.21457v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Muzhi Zhu, Hao Zhong, Canyu Zhao, Zongze Du, Mingyu Liu, Zheng Huang, Anzhou Li, Hao Chen, Cheng Zou, Jingdong Chen, Ming Yang, Chunhua Shen</dc:creator>
    </item>
    <item>
      <title>A Robust $\widetilde{\mathcal{O}}(1/\sqrt{T})$ Rate for Unprojected TD Learning with Linear Function Approximation</title>
      <link>https://arxiv.org/abs/2506.01052</link>
      <description>arXiv:2506.01052v3 Announce Type: replace 
Abstract: We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone of reinforcement learning.
  We are interested in the so-called ``robust'' setting, where the convergence guarantee does not depend on the potential function's minimal curvature.
  While prior work has established convergence guarantees in this setting, these results typically rely on the artificial assumption that each iterate is projected onto a bounded set. Removing such a condition was left as an open problem by Bhandari et al. (COLT'18), hypothesizing the need for additional ``regularity conditions''.
  In this paper, we show that the simple unprojected TD(0) converges with a rate of $\widetilde{\mathcal{O}}\left(\frac{\|\theta^*\|^2_2}{\sqrt{T}}\right)$ in expectation, even in the presence of Markovian noise. We do not require an additional regularity condition, but only a minor polylog correction to the learning rate. Our analysis reveals a novel self-bounding property of the TD updates and exploits it to guarantee bounded iterates.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.01052v3</guid>
      <category>cs.LG</category>
      <category>math.OC</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wei-Cheng Lee, Francesco Orabona</dc:creator>
    </item>
    <item>
      <title>dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching</title>
      <link>https://arxiv.org/abs/2506.06295</link>
      <description>arXiv:2506.06295v3 Announce Type: replace 
Abstract: Autoregressive Models (ARMs) have long dominated the landscape of Large Language Models. Recently, a new paradigm has emerged in the form of diffusion-based Large Language Models (dLLMs), which generate text by iteratively denoising masked segments. This approach has shown significant advantages and potential. However, dLLMs suffer from high inference latency. Traditional ARM acceleration techniques, such as Key-Value caching, are incompatible with dLLMs due to their bidirectional attention mechanism. To address this specific challenge, our work begins with a key observation that dLLM inference involves a static prompt and a partially dynamic response, where most tokens remain stable across adjacent denoising steps. Based on this, we propose dLLM-Cache, a training-free adaptive caching framework that combines long-interval prompt caching with partial response updates guided by feature similarity. This design enables efficient reuse of intermediate computations without compromising model performance. Extensive experiments on representative dLLMs, including LLaDA 8B and Dream 7B, show that dLLM-Cache achieves up to 9.1x FLOPs reduction on LongBench-HotpotQA while maintaining competitive output quality. Notably, our method brings dLLM inference latency close to that of ARMs under many settings. The code for this work is publicly available at: https://github.com/maomaocun/dLLM-cache.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.06295v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhiyuan Liu, Yicun Yang, Yaojie Zhang, Junjie Chen, Chang Zou, Qingyan Wei, Shaobo Wang, Yichen Zhu, Linfeng Zhang</dc:creator>
    </item>
    <item>
      <title>Robust In-Context Reinforcement Learning Under Reward Poisoning Attacks</title>
      <link>https://arxiv.org/abs/2506.06891</link>
      <description>arXiv:2506.06891v3 Announce Type: replace 
Abstract: We study the corruption-robustness of in-context reinforcement learning (ICRL), focusing on the Decision-Pretrained Transformer (DPT, Lee et al., 2023). To address the challenge of reward poisoning attacks targeting the DPT, we propose a novel adversarial training framework, called Adversarially Trained DPT (AT-DPT). Our method simultaneously trains a population of attackers to minimize the true reward of the DPT by poisoning environment rewards, and a DPT model to infer optimal actions from the poisoned data. We evaluate the effectiveness of our approach against standard bandit algorithms, including robust baselines designed to handle reward contamination. Our results show that AT-DPT significantly outperforms them in bandit settings under a learned attacker, and generalizes to more complex environments such as adaptive attackers and MDPs. It shows promise in ICRL as a meta-RL approach to learning effective corruption-robust algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.06891v3</guid>
      <category>cs.LG</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Paulius Sasnauskas, Yiğit Yalın, Goran Radanović</dc:creator>
    </item>
    <item>
      <title>Modeling the Diachronic Evolution of Legal Norms: An LRMoo-Based, Component-Level, Event-Centric Approach to Legal Knowledge Graphs</title>
      <link>https://arxiv.org/abs/2506.07853</link>
      <description>arXiv:2506.07853v5 Announce Type: replace 
Abstract: Representing the temporal evolution of legal norms is a critical challenge for automated processing. While foundational frameworks exist, they lack a formal pattern for granular, component-level versioning, hindering the deterministic point-in-time reconstruction of legal texts required by reliable AI applications. This paper proposes a structured, temporal modeling pattern grounded in the LRMoo ontology. Our approach models a norm's evolution as a diachronic chain of versioned F1 Works, distinguishing between language-agnostic Temporal Versions (TV), each being a distinct Work, and their monolingual Language Versions (LV), modeled as F2 Expressions. The legislative amendment process is formalized through event-centric modeling, allowing changes to be traced precisely. Using the Brazilian Constitution as a case study, we demonstrate that our architecture enables the exact reconstruction of any part of a legal text as it existed on a specific date. This provides a verifiable semantic backbone for legal knowledge graphs, offering a deterministic foundation for trustworthy legal AI.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.07853v5</guid>
      <category>cs.AI</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hudson de Martim</dc:creator>
    </item>
    <item>
      <title>Verification of the Release-Acquire Semantics</title>
      <link>https://arxiv.org/abs/2506.08238</link>
      <description>arXiv:2506.08238v2 Announce Type: replace 
Abstract: The Release-Acquire (RA) semantics and its variants are some of the most fundamental models of concurrent semantics for architectures, programming languages, and distributed systems. Several steps have been taken in the direction of testing such semantics, where one is interested in whether a single program execution is consistent with a memory model. The more general verification problem, i.e., checking whether any allowed program run is consistent with a memory model, has still not been studied as much. The purpose of this work is to bridge this gap. We tackle the verification problem, where, given an implementation described as a register machine, we check if any of its runs violates the RA semantics or its Strong (SRA) and Weak (WRA) variants. We show that verifying WRA in this setup is in O(n^5), while verifying RA and SRA is PSPACE-complete. This answers some fundamental questions about the complexity of these problems and also provides insights into the expressive power of register machines as a model.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.08238v2</guid>
      <category>cs.PL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Parosh Abdulla, Elli Anastasiadi, Mohamed Faouzi Atig, Léo Exibard, Samuel Grahn</dc:creator>
    </item>
    <item>
      <title>Formalizing Learning from Language Feedback with Provable Guarantees</title>
      <link>https://arxiv.org/abs/2506.10341</link>
      <description>arXiv:2506.10341v2 Announce Type: replace 
Abstract: Interactively learning from observation and language feedback is an increasingly studied area driven by the emergence of large language model (LLM) agents. Despite impressive empirical demonstrations, so far a principled framing of these decision problems remains lacking. We formalize the Learning from Language Feedback (LLF) problem, assert sufficient assumptions to enable learning despite latent rewards, and introduce $\textit{transfer eluder dimension}$ as a measure to characterize the hardness of LLF. We formalize the intuition that information in the language feedback governs the learning complexity, and demonstrate cases where learning from rich language feedback can be exponentially faster than learning from reward. We develop a no-regret algorithm, called $\texttt{HELiX}$, that provably solves LLF problems through sequential interactions, with performance guarantees that scale with the transfer eluder dimension. Across several empirical domains, we show that $\texttt{HELiX}$ performs well even when repeatedly prompting LLMs does not work reliably. Our contributions mark an important step towards designing principled interactive learning algorithms using generic language feedback.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.10341v2</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wanqiao Xu, Allen Nie, Ruijie Zheng, Aditya Modi, Adith Swaminathan, Ching-An Cheng</dc:creator>
    </item>
    <item>
      <title>The Sample Complexity of Parameter-Free Stochastic Convex Optimization</title>
      <link>https://arxiv.org/abs/2506.11336</link>
      <description>arXiv:2506.11336v2 Announce Type: replace 
Abstract: We study the sample complexity of stochastic convex optimization when problem parameters such as the distance to optimality and the Lipschitz constant are unknown. We pursue two strategies. First, we develop a reliable model selection method that avoids overfitting to the validation set. This method allows us to generically tune the learning rate of stochastic optimization methods to match the optimal known-parameter sample complexity up to log log factors. Second, we develop a regularization-based method that is specialized to the case that only the distance to optimality is unknown. More specifically, it uses norm-regularized empirical risk minimization to estimate the distance to optimality to within a constant factor, allowing known-parameter stochastic optimization methods to achieve optimal sample complexity. This method provides perfect adaptability to unknown distance to optimality, demonstrating a separation between the sample and computational complexity of parameter-free stochastic convex optimization. Combining these two methods allows us to simultaneously adapt to multiple problem structures.
  Experiments performing few-shot learning on CIFAR-10 by fine-tuning CLIP models and prompt engineering Gemini to count shapes indicate that our reliable model selection method can help mitigate overfitting to small validation sets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.11336v2</guid>
      <category>cs.LG</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jared Lawrence, Ari Kalinsky, Hannah Bradfield, Yair Carmon, Oliver Hinder</dc:creator>
    </item>
    <item>
      <title>The Parametrised Complexity of Counting Small Sub-Hypergraphs</title>
      <link>https://arxiv.org/abs/2506.14081</link>
      <description>arXiv:2506.14081v2 Announce Type: replace 
Abstract: Subgraph counting is a fundamental and well-studied problem whose computational complexity is well understood. Quite surprisingly, the hypergraph version of subgraph counting has been almost ignored. In this work, we address this gap by investigating the most basic sub-hypergraph counting problem: given a (small) hypergraph $H$ and a (large) hypergraph $G$, compute the number of sub-hypergraphs of $G$ isomorphic to $H$. Formally, for a family $\mathcal{H}$ of hypergraphs, let #Sub($\mathcal{H}$) be the restriction of the problem to $H \in \mathcal{H}$; the induced variant #IndSub($\mathcal{H}$) is defined analogously. Our main contribution is a complete classification of the complexity of these problems. Assuming the Exponential Time Hypothesis, we prove that #Sub($\mathcal{H}$) is fixed-parameter tractable if and only if $\mathcal{H}$ has bounded fractional co-independent edge-cover number, a novel graph parameter we introduce. Moreover, #IndSub($\mathcal{H}$) is fixed-parameter tractable if and only if $\mathcal{H}$ has bounded fractional edge-cover number. Both results subsume pre-existing results for graphs as special cases. We also show that the fixed-parameter tractable cases of #Sub($\mathcal{H}$) and #IndSub($\mathcal{H}$) are unlikely to be in polynomial time, unless respectively #P = P and Graph Isomorphism $\in$ P. This shows a separation with the special case of graphs, where the fixed-parameter tractable cases are known to actually be in polynomial time.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.14081v2</guid>
      <category>cs.CC</category>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Marco Bressan, Julian Brinkmann, Holger Dell, Marc Roth, Philip Wellnitz</dc:creator>
    </item>
    <item>
      <title>Efficient and Stealthy Jailbreak Attacks via Adversarial Prompt Distillation from LLMs to SLMs</title>
      <link>https://arxiv.org/abs/2506.17231</link>
      <description>arXiv:2506.17231v3 Announce Type: replace 
Abstract: Current jailbreak attacks on large language models (LLMs) predominantly rely on LLMs themselves to generate adversarial prompts, creating a critical efficiency bottleneck: each attack requires substantial computational resources and API queries, limiting scalability and practical deployment. To overcome this limitation, we propose Adversarial Prompt Distillation (APD), a novel framework that transfers jailbreaking capabilities from LLMs to small language models (SLMs) for efficient, low-resource attacks. APD integrates three key components: (1) masked adversarial knowledge pre-training via LoRA fine-tuning, (2) dynamic temperature-controlled knowledge distillation to bridge architectural gaps, and (3) reinforcement learning-based template optimization for adaptive refinement. Extensive experiments across 12 models show that APD achieves state-of-the-art attack success rates (e.g., 96.4% ASR_k on GPT-4) while dramatically improving efficiency, generating prompts 3.7x faster with 11.3x fewer parameters than teacher models. Our work establishes the first practical framework for lightweight jailbreak attacks, exposes new vulnerabilities in LLM defenses, and provides a scalable testbed for advancing AI safety research. Our code is available at: https://github.com/lxgem/Efficient_and_Stealthy_Jailbreak_Attacks_via_Adversarial_Prompt.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.17231v3</guid>
      <category>cs.CL</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiang Li, Chong Zhang, Jia Wang, Fangyu Wu, Yushi Li, Xiaobo Jin</dc:creator>
    </item>
    <item>
      <title>Revisiting Power System Stabilizers with Increased Inverter-Based Generation: A Case Study</title>
      <link>https://arxiv.org/abs/2506.19357</link>
      <description>arXiv:2506.19357v3 Announce Type: replace 
Abstract: As power systems evolve with increasing production from Inverter-Based Resources (IBRs), their underlying dynamics are undergoing significant changes that can jeopardize system operation, leading to poorly damped oscillations or small-signal rotor angle instability. In this work, we investigate whether Power System Stabilizer (PSS) setting adjustments can effectively restore system stability and provide adequate damping in systems with increased IBR penetration, using the benchmark Kundur Two-Area System as a case study. Specifically, we evaluate the model-based Residues and P-Vref PSS tuning methods to examine their effectiveness under evolving grid conditions. Our findings indicate that the effectiveness of these tuning methods is not guaranteed, particularly when coordination is limited. Consequently, our case study motivates local and adaptive online PSS tuning methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.19357v3</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1109/ISGTEurope64741.2025.11305525</arxiv:DOI>
      <arxiv:journal_reference>Proc. 2025 IEEE Power &amp; Energy Society Innovative Smart Grid Technologies Conference (ISGT), 2025</arxiv:journal_reference>
      <dc:creator>Jovan Krajacic, Keith Moffat, Gustavo Valverde</dc:creator>
    </item>
    <item>
      <title>TRIM: A Self-Supervised Video Summarization Framework Maximizing Temporal Relative Information and Representativeness</title>
      <link>https://arxiv.org/abs/2506.20588</link>
      <description>arXiv:2506.20588v3 Announce Type: replace 
Abstract: The increasing ubiquity of video content and the corresponding demand for efficient access to meaningful information have elevated video summarization and video highlights to a vital research area. However, many state-of-the-art methods depend heavily either on supervised annotations or on attention-based models, which are computationally expensive and brittle in the face of distribution shifts that hinder cross-domain applicability across datasets. We introduce a pioneering self-supervised video summarization model that captures both spatial and temporal dependencies without the overhead of attention, RNNs, or transformers. Our framework integrates a novel set of Markov process-driven loss metrics and a two-stage self-supervised learning paradigm that ensures both performance and efficiency. Our approach achieves state-of-the-art performance on the SUMME and TVSUM datasets, outperforming all existing unsupervised methods. It also rivals the best supervised models, demonstrating the potential for efficient, annotation-free architectures. This paves the way for more generalizable video summarization techniques and challenges the prevailing reliance on complex architectures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.20588v3</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pritam Mishra, Coloma Ballester, Dimosthenis Karatzas</dc:creator>
    </item>
    <item>
      <title>Structural Decoupling: A Scaffold-Flow Theory of Generalization and Alignment</title>
      <link>https://arxiv.org/abs/2506.20699</link>
      <description>arXiv:2506.20699v2 Announce Type: replace 
Abstract: Learning in non-stationary and multi-context environments requires more than ordinary within-task generalization. A system must also discover which contexts exist, route inputs to the correct context, preserve old contexts, and revise the context library when the environment changes. This paper presents Structural Learning Theory (StrLT) as a framework for filling this structural gap. StrLT complements Vapnik's Statistical Learning Theory (SLT): SLT governs the \emph{funnel}, prediction or control within a fixed regime, while StrLT governs the \emph{trap}, the discovery and maintenance of structural regimes. The core StrLT object is \emph{width}, the minimum number of locally feasible contexts needed to cover a problem. We summarize three basic results: width is incomparable with VC dimension; learning exhibits a phase transition at the true width; and width can be estimated by a contractive-similarity (CS) operator that converts task-induced non-contractivity into spectral separation. Under the StrLT framework, we explain how fixed-class structural learnability leads to a \emph{structural decoupling principle}: the mechanisms that maintain the structural scaffold should not be trained by the same gradients that optimize within-context flow. This principle motivates a scaffold-flow model in which alignment and generalization separate architecturally. Finally, we argue that several safety failures, including hallucination, reward-model boundary errors, and deceptive alignment, can be interpreted as scaffold-resolution or scaffold-preservation failures rather than merely output-level prediction errors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.20699v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/publicdomain/zero/1.0/</dc:rights>
      <dc:creator>Xin Li</dc:creator>
    </item>
    <item>
      <title>Pseudo-Equilibria, or: How to Stop Worrying About Crypto and Just Analyze the Game</title>
      <link>https://arxiv.org/abs/2506.22089</link>
      <description>arXiv:2506.22089v2 Announce Type: replace 
Abstract: We consider the problem of a game theorist analyzing a game that uses cryptographic protocols. Ideally, a theorist abstracts protocols as ideal, implementation-independent primitives, letting conclusions in the "ideal world" carry over to the "real world." This is crucial, since the game theorist cannot--and should not be expected to--handle full cryptographic complexity. In today's landscape, the rise of distributed ledgers makes a shared language between cryptography and game theory increasingly necessary.
  The security of cryptographic protocols hinges on two types of assumptions: state-of-the-world (e.g., "factoring is hard") and behavioral (e.g., "honest majority"). We observe that for protocols relying on behavioral assumptions (e.g., ledgers), our goal is unattainable in full generality. For state-of-the-world assumptions, we show that standard solution concepts, e.g., ($\epsilon$-)Nash equilibria, are not robust to transfer from the ideal to the real world.
  We propose a new solution concept: the pseudo-Nash equilibrium. Informally, a profile $s=(s_1,\dots,s_n)$ is a pseudo-Nash equilibrium if, for any player $i$ and deviation $s'_i$ with higher expected utility, $i$'s utility from $s_i$ is (computationally) indistinguishable from that of $s'_i$. Pseudo-Nash is simpler and more accessible to game theorists than prior notions addressing the mismatch between (asymptotic) cryptography and game theory. We prove that Nash equilibria in games with ideal, unbreakable cryptography correspond to pseudo-Nash equilibria when ideal cryptography is instantiated with real protocols (under state-of-the-world assumptions). Our translation is conceptually simpler and more general: it avoids tuning or restricting utility functions in the ideal game to fit quirks of cryptographic implementations. Thus, pseudo-Nash lets us study game-theoretic and cryptographic aspects separately and seamlessly.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.22089v2</guid>
      <category>cs.GT</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Alexandros Psomas, Athina Terzoglou, Yu Wei, Vassilis Zikas</dc:creator>
    </item>
    <item>
      <title>How Reliable are Fairness Audits with Unreliable Data?</title>
      <link>https://arxiv.org/abs/2506.23033</link>
      <description>arXiv:2506.23033v2 Announce Type: replace 
Abstract: Fairness audits are a key component of responsible machine-learning deployment. Yet, the reliability of audit recommendations under incomplete protected-label access is still poorly understood. In this work, we focused on protected-label missingness in fairness mitigation audits. We introduced a seed-calibrated stress test to separate missingness effects from seed-to-seed movement that is already present under complete labels. Across ACS/Folktables tasks, we found that positive-availability missingness usually does not move selected mitigation methods beyond the complete-label seed floor. The no-label endpoint behaves differently, exposing ERM-equivalent candidates and deterministic tie-breaking rather than a broad missingness effect. We also found that threshold optimization can turn single-axis fairness gains into above-null intersectional harm, a sharper failure pattern that appears to remain visible under random-forest validation. Overall, our results highlight that protected-label missingness should be reported with seed-null calibration, candidate-set context, and intersectional consequences before it is treated as evidence of audit fragility.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.23033v2</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yash Vardhan Tomar</dc:creator>
    </item>
    <item>
      <title>Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones</title>
      <link>https://arxiv.org/abs/2507.00322</link>
      <description>arXiv:2507.00322v2 Announce Type: replace 
Abstract: Despite remarkable advances in coding capabilities, language models (LMs) still struggle with simple syntactic tasks such as generating balanced parentheses. In this study, we investigate the underlying mechanisms behind the persistence of these errors across LMs of varying sizes (124M-7B) to both understand and mitigate the errors. Our study reveals that LMs rely on a number of components (attention heads and FF neurons) that independently make their own predictions. While some components reliably promote correct answers across a generalized range of inputs (i.e., implementing "sound mechanisms"), others are less reliable and introduce noise by promoting incorrect tokens (i.e., implementing "faulty mechanisms"). Errors occur when the faulty mechanisms overshadow the sound ones and dominantly affect the predictions. Motivated by this insight, we introduce RASteer, a steering method to systematically identify and increase the contribution of reliable components for improving model performance. RASteer substantially improves performance on balanced parentheses tasks, boosting accuracy of some models from $0$% to around $100$% without impairing the models' general coding ability. We further demonstrate its broader applicability in arithmetic reasoning tasks, achieving performance gains of up to around $20$%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.00322v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Daking Rai, Samuel Miller, Kevin Moran, Ziyu Yao</dc:creator>
    </item>
    <item>
      <title>Convergence Bound and Critical Batch Size of Muon Optimizer</title>
      <link>https://arxiv.org/abs/2507.01598</link>
      <description>arXiv:2507.01598v5 Announce Type: replace 
Abstract: Muon, a recently proposed optimizer that leverages the inherent matrix structure of neural network parameters, has demonstrated strong empirical performance, indicating its potential as a successor to standard optimizers such as AdamW. This paper presents theoretical analysis to support its practical success. We provide convergence proofs for Muon across four practical settings, systematically examining its behavior with and without the inclusion of Nesterov momentum and weight decay. We then demonstrate that the addition of weight decay ensures almost-sure boundedness of the parameter and gradient norms -- without relying on the commonly imposed bounded-gradient assumption -- and clarify the interplay between the weight decay coefficient and the learning rate. Finally, we derive a lower bound on the critical batch size for Muon -- the batch size that minimizes the stochastic first-order oracle (SFO) complexity of training. Because the resulting formula involves problem-dependent quantities that are not directly observable (gradient variance, target precision, effective rank), it does not predict the critical batch size in absolute terms; rather, it reveals how the hyperparameters $\beta$ (momentum) and $\lambda$ (weight decay) govern the qualitative scaling of this value. Our experiments validate these hyperparameter-dependent predictions across workloads including image classification and language modeling.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.01598v5</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Naoki Sato, Hiroki Naganuma, Hideaki Iiduka</dc:creator>
    </item>
    <item>
      <title>Functional design of efficient and parallelizable combinatorial generators using convolution</title>
      <link>https://arxiv.org/abs/2507.03980</link>
      <description>arXiv:2507.03980v4 Announce Type: replace 
Abstract: The application of program transformation and algebraic methods to the development of efficient combinatorial optimization (CO) algorithms relies on an exhaustive combinatorial generator for the problem specification, followed by the fusion of thinning or filtering processes into this specification. However, the effectiveness of such fusion transformations critically depends on the structural compatibility between the objective function and the generator, which is highly problem dependent. In practice, when the majority of candidate solutions remain unfiltered or are not eliminated, as is the case for most intractable CO problems, the overall efficiency of the resulting fused program is largely determined by the intrinsic efficiency of the combinatorial generator. Consequently, if the specification itself exhibits suboptimal performance, the fused program will inherit a correspondingly inferior level of efficiency.
  We argue that a genuine design process should also account for hardware compatibility and parallelizability, particularly the ability to support efficient parallel execution on modern hardware architectures, including multi-level cache hierarchies and GPUs. However, does achieving formal correctness necessarily conflict with designing algebraically elegant algorithms that support fusion? Can we obtain both simultaneously?
  In this paper, we show that techniques from functional programming provide powerful formal tools for the systematic construction of such hardware-compatible and parallelizable combinatorial generators. This paper investigates generators for two of the most fundamental combinatorial structures, combinations and permutations, together with their natural extension to nested generators (e.g., combinations/permutations of combinations/permutations).</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.03980v4</guid>
      <category>cs.DM</category>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Xi He, Zhenjiang Hu, Max. A. Little</dc:creator>
    </item>
    <item>
      <title>Tunable Real-Time Safety Filters via Set-Based Control Barrier Functions</title>
      <link>https://arxiv.org/abs/2507.07805</link>
      <description>arXiv:2507.07805v3 Announce Type: replace 
Abstract: Safety filters for industrial constrained systems are required to combine certified constraint satisfaction, predictable online computation, and a transparent tuning interface. Existing set-based filters are based on a well-established control invariant set design that scales favorably with state and input constraints, but typically intervene only at the set boundary. Control barrier function (CBF)-based filters, by contrast, provide tunable intervention but require a scalar barrier construction. This paper proposes a set-based CBF safety filter that turns a convex control invariant set directly into a tunable barrier via its Minkowski functional. The resulting filter is formulated as a single-level quadratic program (QP) in which one class-$\mathcal{K}^e$ parameter sets the intervention aggressiveness. Explicit convex formulations are derived for polytopic, zonotopic, and MPC-based invariant sets. Under standard bounded-disturbance assumptions, the resulting safety filter guarantees constraint satisfaction and asymptotic recovery into the invariant set. For tight real-time budgets, a learning-based approximation enables online acceleration, while the formal safety guarantees remain tied to the exact formulation. The method is validated in numerical studies and on a permanent-magnet synchronous motor drive, where an explicit QP implementation evaluates within a 150-microsecond sampling window and has a worst-case execution time of 28.04 microseconds.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.07805v3</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kim P. Wabersich, Felix Berkel, Felix Gruber, Sven Reimann</dc:creator>
    </item>
    <item>
      <title>PUMA: Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning</title>
      <link>https://arxiv.org/abs/2507.08064</link>
      <description>arXiv:2507.08064v4 Announce Type: replace 
Abstract: As multimedia content expands, the demand for unified multimodal retrieval (UMR) in real-world applications increases. Recent work leverages multimodal large language models (MLLMs) to tackle this task. However, their large parameter size results in high training costs and low inference efficiency. To address this, we propose PUMA: a Layer-Pruned Language Model for Efficient Unified Multimodal Retrieval with Modality-Adaptive Learning. Our approach improves UMR from both structural and learning perspectives. (1) Structurally, we propose Layer-Pruned Self-Distillation, which prunes MLLMs by keeping only shallow layers while distilling features from dropped deep layers as teacher signals. This reduces parameters and preserves representation capability. (2) On the learning side, we introduce Modality-Adaptive Contrastive Learning Loss (MAC-Loss), which separates in-batch negatives into harder intra-modality and easier inter-modality groups based on the target modality, assigning different temperature strategies to enhance learning efficiency. Experiments show our method significantly reduces resource usage while maintaining strong performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.08064v4</guid>
      <category>cs.MM</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yibo Lyu, Rui Shao, Gongwei Chen, Yijie Zhu, Weili Guan, Liqiang Nie</dc:creator>
    </item>
    <item>
      <title>Analysis of Information Theory for Explainable AI</title>
      <link>https://arxiv.org/abs/2507.09092</link>
      <description>arXiv:2507.09092v2 Announce Type: replace 
Abstract: With the intervention of machine vision in our crucial day-to-day necessities including healthcare and automated power plants, attention has been drawn to the internal mechanisms of convolutional neural networks, and the reason why the network provides specific inferences. This paper proposes a novel post-hoc visual explanation method called MI CAM based on activation mapping. Differing from previous class activation mapping based approaches, MI CAM produces saliency visualizations by weighing each feature map through its mutual information with the input image and the final result is generated by a linear combination of weights and activation maps. It also adheres to producing causal interpretations as validated with the help of counterfactual analysis. We aim to exhibit the visual performance and unbiased justifications for the model inferencing procedure achieved by MI CAM. Our approach performs on par with all state-of-the-art methods and outperforms some in terms of qualitative and quantitative measures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.09092v2</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Ram S Iyer</dc:creator>
    </item>
    <item>
      <title>Sound and Complete Neurosymbolic Reasoning with LLM-Grounded Interpretations</title>
      <link>https://arxiv.org/abs/2507.09751</link>
      <description>arXiv:2507.09751v3 Announce Type: replace 
Abstract: Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation, but exhibit problems with logical consistency in their output. How can we harness LLMs' broad-coverage parametric knowledge in formal reasoning despite their inconsistency? We present a method for directly integrating an LLM into the interpretation function of the formal semantics for a paraconsistent logic. We evaluate the method empirically using datasets derived from the short-form factuality benchmarks GPQA and SimpleQA, showing that bilateral factuality evaluation improves macro-F1 over a unilateral baseline by roughly 6 percentage points on both benchmarks (at the cost of reduced coverage, as abstention is triggered on inconsistent or uncertain cases). We further describe a proof-of-concept tableau reasoner implementing the method, and apply it to a medication-safety knowledge base of 228 asserted and 712 inferred statements: the system detects 92 gluts corresponding to medically significant errors (e.g., opioids inferred as non-addictive, beta-blockers inferred as safe in asthma) while remaining satisfiable, demonstrating that contradictions are localized rather than causing logical explosion. Unlike prior work, our method offers a theoretical framework with a practical implementation for neurosymbolic reasoning that leverages an LLM's knowledge while preserving the underlying logic's soundness and completeness properties.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.09751v3</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bradley P. Allen, Prateek Chhikara, Thomas Macaulay Ferguson, Filip Ilievski, Paul Groth</dc:creator>
    </item>
    <item>
      <title>Learning Task Mixtures from Task Affinities: A Probabilistic Graphical Model for Supervised Fine-Tuning</title>
      <link>https://arxiv.org/abs/2507.12612</link>
      <description>arXiv:2507.12612v4 Announce Type: replace 
Abstract: Supervised fine-tuning performance for large language models depends strongly on how training budget is distributed across a heterogeneous set of tasks. In practice, mixtures are often fixed using simple heuristics (e.g., uniform or size-proportional sampling) that ignore task interactions, which can hurt transfer and waste budget on redundant sources. We introduce TaskPGM, a framework for learning continuous task mixtures via an energy-based model over tasks. Tasks form the nodes of a Markov random field: unary potentials capture per-task utility, and pairwise potentials encode inter-task relationships using behavioral divergences computed from predictive distributions of single-task fine-tuned models (e.g., Jensen--Shannon divergence and pointwise mutual information). Optimizing this objective yields mixtures that balance coverage against redundancy. We show that the resulting set function is weakly submodular under budget constraints, enabling approximation guarantees for discrete selection variants. Across multiple model families (LLaMA-7B, Qwen2-7B) and evaluation suites (BIG-Bench Hard), TaskPGM improves over standard mixing strategies and provides interpretable structure over task interactions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.12612v4</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Prateek Chanda, Saral Sureka, Parth Pratim Chatterjee, Krishnateja Killamsetty, Nikhil Shivakumar Nayak, Ganesh Ramakrishnan</dc:creator>
    </item>
    <item>
      <title>Are Two Datasets Close Enough With Statistical Significance? A Kernel Distributional Closeness Testing Approach</title>
      <link>https://arxiv.org/abs/2507.12843</link>
      <description>arXiv:2507.12843v3 Announce Type: replace 
Abstract: Are two distributions close to each other with statistical significance? Distribution closeness testing (DCT) formalizes this question by testing whether the distance between a distribution pair is at least epsilon-far. Existing DCT methods mainly measure discrepancies between distribution pairs defined on discrete spaces, for example using total variation, which limits their application to complex data such as images. To extend DCT to more types of data, a natural idea is to introduce maximum mean discrepancy (MMD), a powerful measure of distributional discrepancy between complex distributions, into DCT scenarios. However, empirical results indicate that many distribution pairs can have the same MMD value despite having different norms in the same reproducing kernel Hilbert space (RKHS). These pairs may exhibit different finite-sample distinguishability and reflect different practical closeness levels, making MMD less informative for DCT. To mitigate this issue, we design a new measure of distributional discrepancy, norm-adaptive MMD (NAMMD), which scales the MMD value using the RKHS norms of distributions. Based on the asymptotic distribution of NAMMD, we propose NAMMD-based DCT to assess the closeness level of a distribution pair. Theoretically, we prove that NAMMD-based DCT has higher test power than MMD-based DCT while maintaining bounded type-I error. This is further validated by extensive experiments on multiple types of data, including synthetic noise and real images. Our code is available at https://github.com/zhijianzhouml/NAMMD.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.12843v3</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhijian Zhou, Liuhua Peng, Xunye Tian, Mingming Gong, Feng Liu</dc:creator>
    </item>
    <item>
      <title>Disjoint Generation of Synthetic Data</title>
      <link>https://arxiv.org/abs/2507.19700</link>
      <description>arXiv:2507.19700v2 Announce Type: replace 
Abstract: We propose a new framework for generating tabular synthetic datasets via disjoint generative models. In this paradigm, a dataset is partitioned into disjoint subsets that are supplied to separate instances of generative models. The results are then combined post hoc by a joining operation that works in the absence of common variables/identifiers. The success of the framework is demonstrated through several case studies and examples on tabular data that help illuminate some of the design choices that one may make. The advantages achieved by the disjoint generation include: i) An observed increase in the empirical measurement of privacy. ii) Increased computational feasibility of certain model types. iii) Ability to generate synthetic data using a mixture of different generative models. Specifically, mixed-model synthesis bridges the gap between privacy and utility performance, providing highly competitive performance on Accuracy and Area Under the Curve for downstream tasks while significantly lowering the empirical re-identification risk.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.19700v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>Transact. mach. learn. res. (June 2026). https://openreview.net/forum?id=LSzXkAWBKI</arxiv:journal_reference>
      <dc:creator>Anton Danholt Lautrup, Muhammad Rajabinasab, Tobias Hyrup, Arthur Zimek, Peter Schneider-Kamp</dc:creator>
    </item>
    <item>
      <title>Discovering heuristics in a complex SAT solver with large language models</title>
      <link>https://arxiv.org/abs/2507.22876</link>
      <description>arXiv:2507.22876v2 Announce Type: replace 
Abstract: The Satisfiability problem (SAT) is fundamental in computational complexity theory and has a wide range of industrial applications. Optimizing modern SAT solvers in real-world settings is quite challenging due to their intricate architectures. While automatic configuration frameworks have been developed, they rely on manually constrained search spaces. Here we develop AutoModSAT, a framework that uses large language models (LLMs) to automatically optimize SAT solvers. AutoModSAT combines an LLM-compatible modular solver design, unsupervised prompt optimization to diversify generated functions, and an efficient search procedure based on a presearch strategy and a $(1+\lambda)$ evolutionary algorithm. Extensive experiments across a wide range of datasets demonstrate that AutoModSAT achieves a $40\%$ performance improvement over the baseline solver and a $30\%$ improvement over the state-of-the-art solvers. Moreover, AutoModSAT also attains a notable speedup compared to the parameter-tuned alternatives of the state-of-the-art solvers over most of the test datasets. These results demonstrate the potential of LLM-guided heuristic discovery for optimizing complex SAT solvers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.22876v2</guid>
      <category>cs.AI</category>
      <category>cs.LO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yiwen Sun, Furong Ye, Zhihan Chen, Ke Wei, Shaowei Cai</dc:creator>
    </item>
    <item>
      <title>Petri Net Modeling and Deadlock-Free Scheduling of Attachable Heterogeneous AGV Systems</title>
      <link>https://arxiv.org/abs/2508.00724</link>
      <description>arXiv:2508.00724v2 Announce Type: replace 
Abstract: The increasing demand for flexible automation has accelerated the adoption of heterogeneous automated guided vehicles (AGVs). This work investigates a new scheduling problem in a material transportation system consisting of attachable heterogeneous AGVs, including carriers and shuttles, that flexibly attach and detach for cooperative task execution. While such collaboration enhances operational efficiency, the attachment-induced synchronization renders the system highly coupled and susceptible to deadlocks. To address this, we propose a Petri net (PN)-based deadlock-free scheduling framework integrated into an adaptive large neighborhood search (ALNS) algorithm. The PN is introduced to map candidate solutions from static permutations into dynamic collaborative processes, enabling performance evaluation via state evolution and proactive deadlock prevention through structural analysis. Extensive experiments on real-world and synthetic instances demonstrate that the proposed framework significantly improves computational efficiency, with the developed ALNS outperforming the current on-site policy, exact solvers, and state-of-the-art metaheuristics. Finally, sensitivity analysis yields managerial insights for optimal fleet sizing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.00724v2</guid>
      <category>eess.SY</category>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Boyu Li, Zhengchen Li, Weimin Wu, Mengchu Zhou</dc:creator>
    </item>
    <item>
      <title>A Survey on Deep Multi-Task Learning in Connected Autonomous Vehicles</title>
      <link>https://arxiv.org/abs/2508.00917</link>
      <description>arXiv:2508.00917v2 Announce Type: replace 
Abstract: Connected autonomous vehicles (CAVs) must simultaneously perform multiple tasks, such as perception, prediction, planning, and control, to ensure safe and reliable navigation in complex environments. Moreover, through vehicle-to-everything (V2X) communication, cooperative perception and driving among CAVs can be enabled, thereby mitigating the limitations of individual vehicles, while it also introduces stringent latency, reliability, and bandwidth constraints. Traditionally, tasks are addressed using separate models, which leads to high deployment costs, increased computational overhead, and challenges in achieving real-time performance. Multi-task learning (MTL) has recently emerged as a promising solution that enables the joint learning of multiple tasks within a unified model. This offers improved efficiency and resource utilization. To the best of our knowledge, this survey is the first comprehensive review focusing on deep MTL in CAVs. We begin with an overview of CAVs and MTL to provide foundational background. Then, we review MTL approaches across key functional domains in CAVs, including perception, prediction, planning, control, as well as V2X communications and radio resource management (RRM). For the first four domains, we categorize existing works under ego vehicle-only (onboard-only) and V2X-enhanced cooperative (multi-agent) paradigms. We further discuss V2X communications and RRM as communication-centric MTL problems. Finally, we discuss the strengths and limitations of existing methods, identify key research gaps, and provide future research directions aimed at advancing MTL methodologies for CAV systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.00917v2</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1109/COMST.2026.3699223</arxiv:DOI>
      <dc:creator>Jiayuan Wang, Farhad Pourpanah, Q. M. Jonathan Wu, Ning Zhang</dc:creator>
    </item>
    <item>
      <title>Resilience metrics to guide back-up investments in the power system during extreme weather</title>
      <link>https://arxiv.org/abs/2508.05163</link>
      <description>arXiv:2508.05163v2 Announce Type: replace 
Abstract: Security of supply is a common and important concern when integrating renewables in net-zero power systems. Extreme weather affects both demand and supply leading to power system stress; in Europe this stress spreads continentally beyond the meteorological root cause. We use an approach based on shadow prices to identify periods of elevated stress called system-defining events and analyse their impact on the power system. By classifying different types of system-defining events, we identify challenges to power system operation and planning. Crucially, we find the need for sufficient resilience back-up (power) capacities whose financial viability is precarious due to weather variability and weather-induced risk. Furthermore, we disentangle short- and long-term resilience challenges (from multi-day to annual scale) with distinct metrics and stress tests to incorporate both into future energy modelling assessments. Our methodology and implementation in an open energy system model (PyPSA-Eur) can be re-applied to other systems and help researchers and policymakers in building more resilient and adequate energy systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.05163v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Aleksander Grochowicz, Hannah C. Bloomfield, Marta Victoria</dc:creator>
    </item>
    <item>
      <title>CLONE: A 3DGS-Based Closed-Loop Differentiable Optimization Framework for Single-Image Normal Estimation</title>
      <link>https://arxiv.org/abs/2508.05950</link>
      <description>arXiv:2508.05950v2 Announce Type: replace 
Abstract: We propose CLONE, a 3DGS-based Closed-Loop differentiable Optimization framework for single-image Normal Estimation. The core idea is to construct an "image-geometry-image" consistency loop that unifies and jointly constrains the limitations of both paradigms: the reliance on explicit supervision without cross-domain geometric constraints in discriminative methods, and the absence of stable differentiable optimization pathways in generative methods despite strong generative priors. Specifically, we first employ 3D Gaussian Splatting to explicitly parameterize the scene and derive continuous and differentiable surface normals via covariance eigen-decomposition, providing an analytical gradient pathway for geometric modeling. We then introduce a differentiable illumination model with a learnable light modulation kernel to establish a continuous mapping between surface normals and image radiance, enabling reprojection errors to directly supervise the underlying 3D geometry. Furthermore, to compensate for the limited local detail expressiveness of Gaussian representations, we design a one-step deterministic diffusion-inspired refinement network, which enhances local geometric details while preserving end-to-end differentiability. A cross-domain gating fusion mechanism is introduced to coordinate global geometric consistency and local detail reconstruction. Finally, all components are jointly optimized under a unified reprojection objective, forming a closed-loop and stable gradient propagation pathway. This enables effective constraint of the multi-solution space and improved geometric consistency without requiring ground-truth normal supervision.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.05950v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yanxing Liang, Yinghui Wang, Wei Li, Tao Yan, Jiaxing Shen</dc:creator>
    </item>
    <item>
      <title>Unsupervised Partner Design Enables Robust Ad-hoc Teamwork</title>
      <link>https://arxiv.org/abs/2508.06336</link>
      <description>arXiv:2508.06336v2 Announce Type: replace 
Abstract: We introduce Unsupervised Partner Design (UPD), a population-free multi-agent reinforcement learning method for robust ad-hoc teamwork. UPD generates training partners on-the-fly and selects them adaptively based on a learnability criterion, removing the need for pre-trained partner populations or manual parameter tuning. We show that this simple mechanism enables effective partner diversity and can be extended to joint partner-environment selection when a procedural level generator is available. Across Level-Based Foraging, Overcooked-AI, and the Overcooked Generalisation Challenge, UPD consistently achieves strong performance compared to both population-based and population-free baselines. In a human-AI user study, agents trained with UPD achieve higher returns and are rated as more adaptive, more human-like, and less frustrating than all evaluated baseline methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.06336v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.HC</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Constantin Ruhdorfer, Matteo Bortoletto, Victor Oei, Anna Penzkofer, Andreas Bulling</dc:creator>
    </item>
    <item>
      <title>In-Context Reinforcement Learning via Communicative World Models</title>
      <link>https://arxiv.org/abs/2508.06659</link>
      <description>arXiv:2508.06659v2 Announce Type: replace 
Abstract: Reinforcement learning (RL) agents often struggle to generalize to new tasks and contexts without updating their parameters, mainly because their learned representations and policies are overfit to the specifics of their training environments. To boost agents' in-context RL (ICRL) ability, this work formulates ICRL as a two-agent emergent communication problem and introduces CORAL (Communicative Representation for Adaptive RL), a framework that learns a transferable communicative context by functionally separating latent representation learning from control. In CORAL, an Information Agent (IA) is pre-trained as a world model on a diverse distribution of tasks. Its objective is not direct return maximization, but world modeling and distilling its understanding into concise messages. The emergent communication protocol is shaped by a novel Causal Influence Loss, which measures the effect that the message has on the next action. During deployment, the previously trained IA serves as a fixed contextualizer for a new Control Agent (CA), which learns to solve tasks by interpreting the provided communicative context. Our experiments demonstrate that this approach enables the CA to achieve significant gains in sample efficiency and successfully perform zero-shot adaptation with the help of pre-trained IA in diverse online and offline environments, validating the efficacy of learning a transferable communicative representation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.06659v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Fernando Martinez-Lopez, Tao Li, Yingdong Lu, Juntao Chen</dc:creator>
    </item>
    <item>
      <title>HiMat: DiT-based Ultra-High Resolution SVBRDF Generation</title>
      <link>https://arxiv.org/abs/2508.07011</link>
      <description>arXiv:2508.07011v5 Announce Type: replace 
Abstract: Creating ultra-high-resolution spatially varying bidirectional reflectance distribution functions (SVBRDFs) is critical for photorealistic 3D content creation, to faithfully represent fine-scale surface details required for close-up rendering. However, achieving 4K generation faces two key challenges: (1) the need to synthesize multiple reflectance maps at full resolution, which multiplies the pixel budget and imposes prohibitive memory and computational cost, and (2) the requirement to maintain strong pixel-level alignment across maps at 4K, which is particularly difficult when adapting pretrained models designed for the RGB image domain. We introduce HiMat, a diffusion-based framework tailored for efficient and diverse 4K SVBRDF generation. To address the first challenge, HiMat performs generation in a high-compression latent space via DC-AE, and employs a pretrained diffusion transformer with linear attention to improve per-map efficiency. To address the second challenge, we propose CrossStitch, a lightweight convolutional module that enforces cross-map consistency without incurring the cost of global attention. Our experiments show that HiMat achieves high-fidelity 4K SVBRDF generation with superior efficiency, structural consistency, and diversity compared to prior methods. Beyond materials, our framework also generalizes to related applications such as intrinsic decomposition.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.07011v5</guid>
      <category>cs.CV</category>
      <category>cs.GR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zixiong Wang, Jian Yang, Yiwei Hu, Milos Hasan, Beibei Wang</dc:creator>
    </item>
    <item>
      <title>Autonomous Air-Ground Vehicle Operations Optimization in Hazardous Environments: A Multi-Armed Bandit Approach</title>
      <link>https://arxiv.org/abs/2508.08217</link>
      <description>arXiv:2508.08217v2 Announce Type: replace 
Abstract: Hazardous environments such as chemical spills, radiological zones, and bio-contaminated sites pose significant threats to human safety and public infrastructure. Rapid and reliable hazard mitigation in these settings is often unsafe for humans, calling for autonomous systems that can adaptively sense and respond to evolving risks. This paper presents a decision-making framework for autonomous vehicle dispatch in hazardous environments with uncertain and evolving risk levels. The system integrates a Bayesian Upper Confidence Bound (BUCB) sensing strategy with task-specific vehicle routing problems with profits (VRPP), enabling adaptive coordination of unmanned aerial vehicles (UAVs) for hazard sensing and unmanned ground vehicles (UGVs) for cleaning. Using VRPP allows selective site visits under resource constraints by assigning each site a visit value that reflects sensing or cleaning priorities. Site-level hazard beliefs are maintained through a time-weighted Bayesian update. BUCB scores guide UAV routing to balance exploration and exploitation under uncertainty, while UGV routes are optimized to maximize expected hazard reduction under resource constraints. Simulation results demonstrate that our framework reduces the number of dispatch cycles to resolve hazards by around 30% on average compared to uninformed baseline dispatch strategies, underscoring the value of uncertainty-aware vehicle dispatch for reliable hazard mitigation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.08217v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jimin Choi, Max Z. Li</dc:creator>
    </item>
    <item>
      <title>From Hanging to Standing: Fabric-Formed Catenary Arches as Scalable Concrete Building Components</title>
      <link>https://arxiv.org/abs/2508.08572</link>
      <description>arXiv:2508.08572v2 Announce Type: replace 
Abstract: Concrete is the most widely used construction material globally. Despite its versatility, it is typically poured into stiff, rectilinear formwork that restricts formal exploration and leads to considerable material waste and higher carbon output. Fabric formwork offers an alternative in which flexible textiles shape fresh concrete into structurally efficient geometries such as thin shells and catenary arches. However, a persistent challenge remains that forms optimized in tension under gravity often crack when rotated into their final compression orientation. Previous research has focused on form-finding and fabrication workflows, with little attention to damage-free reorientation. This paper addresses this gap through two contributions: a CNC-milled repositionable frame with soft-to-rigid connection details enabling controlled tilt-up reorientation without damage, and a scalar reframing that embeds small repeating catenary units within larger building components such as walls and slabs. The research pursues three objectives: (1) to design and refine compatible textile-concrete combinations, with particular focus on non-woven geotextiles; (2) to develop a CNC-cut, repositionable frame system that redistributes stresses during reorientation; and (3) to devise robust soft-to-rigid connection details that permit safe demolding and handling. Through material testing and iterative prototyping, the study identifies concrete paste-geotextile pairings that produce high-quality surface finishes. A tilt-up method was developed where the frame rotates with the arch, minimizing tensile stress. Results demonstrate that catenary arches can be cast, released, and reoriented without cracking or damage. These findings advance fabric-formed concrete toward low-tech, materially efficient structures with reduced environmental impact.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.08572v2</guid>
      <category>cs.GR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Aysan Mokhtarimousavi, Vivian Nguyen, Farzad Saeidi Samet, Lavender Tessmer</dc:creator>
    </item>
    <item>
      <title>Breaking the Curse of Knowledge: Designing Personalized Jargon Support for Real-Time Online Meetings</title>
      <link>https://arxiv.org/abs/2508.10239</link>
      <description>arXiv:2508.10239v3 Announce Type: replace 
Abstract: Cross-disciplinary communication is often hindered by specialized language (i.e., jargon) and uneven background knowledge. Recent advances in speech-to-text and large language models make it possible to provide jargon support during online meetings, but generic support (i.e., defining the same terms for everyone) can overwhelm listeners with definitions they do not need. We present ParseJargon, a system for personalized jargon support in real-time online meetings. We begin with an initial prototype to probe the use of single-sentence user profiles for personalization. We conducted a controlled study and showed that even this minimal personalization enhanced listeners' comprehension and engagement over generic support because of more precise jargon identification. Guided by insights from participants' feedback, we refined the system with more advanced personalization techniques, including in-session user feedback and portable glossary-based profiles. We evaluated how these techniques can further improve jargon identification precision using data collected in the controlled study to simulate personalization over time. We also conducted a latency test, complemented by a lightweight deployment, to analyze the system's real-time capability and usability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.10239v3</guid>
      <category>cs.HC</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Yifan Song, Yijun Liu, Wing Yee Au, Hon Yung Wong, Brian P. Bailey, Tal August</dc:creator>
    </item>
    <item>
      <title>Discovering Expert-Level Nash Equilibrium Algorithms with Large Language Models</title>
      <link>https://arxiv.org/abs/2508.11874</link>
      <description>arXiv:2508.11874v2 Announce Type: replace 
Abstract: Designing polynomial-time algorithms for approximate Nash equilibria (ANE) with provable worst-case guarantees is a fundamental open problem in algorithmic game theory. While large language models (LLMs) can generate candidate algorithms at scale, certifying worst-case guarantees requires formal analysis over all game instances -- a task for which no automated system previously existed. Here, we present LegoNE, a framework encoding expert proof strategies into a symbolic language that automatically compiles any candidate algorithm into a finite optimization problem certifying its worst-case guarantee. Integrating LegoNE with a reasoning LLM, we rediscovered an algorithm matching the best polynomial-time guarantee for two-player games, and discovered a three-player algorithm improving the best guarantee from $0.6+\delta$ to $0.5+\delta$ -- provably beyond the reach of the extension technique, the only previously known multi-player ANE design paradigm. These results show that encoding domain-specific proof strategies into a machine-tractable language can support LLM-driven discovery of algorithms outside known human design paradigms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.11874v2</guid>
      <category>cs.GT</category>
      <category>cs.AI</category>
      <category>cs.DS</category>
      <category>cs.LO</category>
      <category>cs.PL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:DOI>10.1038/s41467-026-74003-1</arxiv:DOI>
      <dc:creator>Hanyu Li, Dongchen Li, Xiaotie Deng</dc:creator>
    </item>
    <item>
      <title>jXBW: A Compressed Index for Structure-Aware JSONL Retrieval in Structured RAG</title>
      <link>https://arxiv.org/abs/2508.12536</link>
      <description>arXiv:2508.12536v3 Announce Type: replace 
Abstract: Providing structured information to large language models (LLMs) improves multi-step reasoning and factual grounding, and recent retrieval-augmented generation (RAG) systems therefore reconstruct structure from retrieved text on every query. When the corpus is already structured -- as in JSON Lines (JSONL), a popular format for LLM prompts, chemical compounds, and geospatial records -- this per-query rebuilding can be replaced by direct structural retrieval. The core primitive is substructure search: finding all JSON objects in a collection that contain a given query pattern. Existing approaches index each document separately, so both index space and query time grow with the total collection size; XML-based engines add conversion overhead and semantic mismatches. We propose jXBW, a compressed index for fast substructure search over JSONL, combining three innovations: (i) a merged tree representation that consolidates repeated structures across objects, (ii) a succinct tree index based on the eXtended Burrows--Wheeler Transform (XBW), and (iii) a newly developed three-phase substructure search algorithm that runs on this index. Together they achieve query-dependent complexity: the cost is determined by query characteristics rather than collection size, in compressed space. Experiments on seven real-world datasets, including PubChem ($10^6$ compounds) and OpenStreetMap ($6.6 \times 10^6$ objects), show that jXBW outperforms the strongest tree-based baseline by $16\times$ on the smallest dataset and by up to $2{,}800\times$ on the largest, and is more than $2 \times 10^6\times$ faster than the XQuery engine Saxon. jXBW thus brings structural retrieval over million-record JSONL collections into the sub-millisecond range.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.12536v3</guid>
      <category>cs.DB</category>
      <category>cs.DS</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yasuo Tabei</dc:creator>
    </item>
    <item>
      <title>A Nodal Discontinuous Galerkin Method with Rank-Adaptive Velocity Space Representation for the Multiscale BGK Model</title>
      <link>https://arxiv.org/abs/2508.16564</link>
      <description>arXiv:2508.16564v2 Announce Type: replace 
Abstract: A novel hybrid algorithm is presented for the Boltzmann-BGK equation, in which a rank-adaptive decomposition is applied solely in the velocity subspace, while a full-rank representation is maintained in the physical (position) space. This approach establishes a foundation for extending modern rank-adaptive techniques to solve the Boltzmann equation in realistic settings, particularly where structured representations, such as conformal geometries, may not be feasible in practical engineering applications. A nodal discontinuous Galerkin method is employed for spatial discretization, coupled with a rank-adaptive decomposition over the velocity grid, as well as implicit-explicit Runge-Kutta methods for time integration. To handle the limit of vanishing collision time, a multiscale implicit integrator based on an auxiliary moment equation is utilized. The algorithm's order of accuracy, reduced computational complexity, and robustness are demonstrated on a suite of canonical gas kinetics problems with increasing complexity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.16564v2</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Andres Galindo-Olarte, Joseph Nakao, Mirjeta Pasha, Jing-Mei Qiu, William Taitano</dc:creator>
    </item>
    <item>
      <title>Uncovering Intervention Opportunities for Suicide Prevention with Language Model Assistants</title>
      <link>https://arxiv.org/abs/2508.18541</link>
      <description>arXiv:2508.18541v3 Announce Type: replace 
Abstract: Warning: This paper discusses topics of suicide and suicidal ideation, which may be distressing to some readers.
  The National Violent Death Reporting System (NVDRS) documents information about suicides in the United States, including free text narratives (e.g., circumstances surrounding a suicide). In a demanding public health data pipeline, annotators manually extract structured information from death investigation records following extensive guidelines developed painstakingly by experts. In this work, we facilitate data-driven insights from the NVDRS data to support the development of novel suicide interventions by investigating the value of language models (LMs) as efficient assistants to these (a) data annotators and (b) experts. We find that LM predictions match existing data annotations about 85% of the time across 50 NVDRS variables. In the cases where the LM disagrees with existing annotations, expert review reveals that LM assistants can surface annotation discrepancies 38% of the time. Finally, we introduce a human-in-the-loop algorithm to assist experts in efficiently building and refining guidelines for annotating new variables by allowing them to focus only on providing feedback for incorrect LM predictions. We apply our algorithm to a real-world case study for a new variable that characterizes victim interactions with lawyers and demonstrate that it achieves comparable annotation quality with a laborious manual approach. Our findings provide evidence that LMs can serve as effective assistants to public health researchers who handle sensitive data in high-stakes scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.18541v3</guid>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics, 2026</arxiv:journal_reference>
      <dc:creator>Jaspreet Ranjit, Hyundong J. Cho, Claire J. Smerdon, Yoonsoo Nam, Myles Phung, Jonathan May, John R. Blosnich, Swabha Swayamdipta</dc:creator>
    </item>
    <item>
      <title>Quantum latent distributions in deep generative models</title>
      <link>https://arxiv.org/abs/2508.19857</link>
      <description>arXiv:2508.19857v3 Announce Type: replace 
Abstract: Many successful families of generative models leverage a low-dimensional latent distribution that is mapped to a data distribution. Though simple latent distributions are often used, the choice of distribution has a strong impact on model performance. Recent experiments have suggested that the probability distributions produced by quantum processors, which are typically highly correlated and classically intractable, can lead to improved performance on some datasets. However, when and why latent distributions produced by quantum processors can improve performance, and whether these improvements are connected to quantum properties of these distributions, are open questions that we investigate in this work. We show in theory that, under certain conditions, these "quantum latent distributions" enable generative models to produce data distributions that classical latent distributions cannot efficiently produce. We provide intuition as to the underlying mechanisms that could explain a performance advantage on real datasets. Based on this, we perform extensive benchmarking on a synthetic quantum dataset and the QM9 molecular dataset, using both simulated and real photonic quantum processors. We find that the statistics arising from quantum interference lead to improved generative performance compared to classical baselines, suggesting that quantum processors can play a role in expanding the capabilities of deep generative models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.19857v3</guid>
      <category>cs.LG</category>
      <category>quant-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Omar Bacarreza, Thorin Farnsworth, Alexander Makarovskiy, Hugo Wallner, Tessa Hicks, Santiago Sempere-Llagostera, John Price, Robert J. A. Francis-Jones, William R. Clements</dc:creator>
    </item>
    <item>
      <title>CardioMorphNet: Cardiac Motion Prediction Using a Shape-Guided Bayesian Recurrent Deep Network</title>
      <link>https://arxiv.org/abs/2508.20734</link>
      <description>arXiv:2508.20734v2 Announce Type: replace 
Abstract: Accurate cardiac motion estimation from cine cardiac magnetic resonance (CMR) images is vital for assessing cardiac function and detecting its abnormalities. Existing methods often struggle to accurately capture heart motion because they rely on intensity-based image registration similarity losses that may overlook cardiac anatomical regions. To address this, we propose CardioMorphNet, a recurrent Bayesian deep learning framework for 3D cardiac shape-guided deformable registration using short-axis (SAX) CMR images. It employs a recurrent variational autoencoder to model spatio-temporal dependencies across the cardiac cycle, along with two posterior models for bi-ventricular segmentation and motion estimation. The derived loss function from the Bayesian formulation guides the framework to focus on anatomical regions by recursively registering segmentation maps without using intensity-based image registration similarity loss, while leveraging sequential SAX volumes and spatio-temporal features. The Bayesian modelling also enables the computation of uncertainty maps for the estimated motion fields. Validated on the UK Biobank and M&amp;M datasets by comparing warped mask shapes with ground-truth masks, CardioMorphNet demonstrates superior performance in cardiac motion estimation, outperforming state-of-the-art methods. Uncertainty assessment shows that it also yields lower uncertainty values for estimated motion fields in the cardiac region compared with other probabilistic-based cardiac registration methods, indicating higher confidence in its predictions. In addition, the clinical indices extraction assessment shows that CardioMorphNet estimates the clinical indices more accurately than other approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2508.20734v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1016/j.media.2026.104149</arxiv:DOI>
      <arxiv:journal_reference>Medical Image Analysis, vol. 113, p. 104149, 2026</arxiv:journal_reference>
      <dc:creator>Reza Akbari Movahed, Abuzar Rezaee, Arezoo Zakeri, Colin Berry, Edmond S. L. Ho, Ali Gooya</dc:creator>
    </item>
    <item>
      <title>AI Self-preferencing in Algorithmic Hiring: Empirical Evidence and Insights</title>
      <link>https://arxiv.org/abs/2509.00462</link>
      <description>arXiv:2509.00462v4 Announce Type: replace 
Abstract: As artificial intelligence (AI) tools become widely adopted, large language models (LLMs) are increasingly involved on both sides of decision-making processes, ranging from hiring to content moderation. This dual adoption raises a critical question: do LLMs systematically favor content that resembles their own outputs? Prior research in computer science has identified self-preference bias -- the tendency of LLMs to favor their own generated content -- but its real-world implications have not been empirically evaluated. We focus on the hiring context, where job applicants often rely on LLMs to refine resumes, while employers deploy them to screen those same resumes. Using a large-scale controlled resume correspondence experiment, we find that LLMs consistently prefer resumes generated by themselves over those written by humans or produced by alternative models, even when content quality is controlled. The bias against human-written resumes is particularly substantial, with self-preference bias ranging from 67% to 82% across major commercial and open-source models. To assess labor market impact, we simulate realistic hiring pipelines across 24 occupations. These simulations show that candidates using the same LLM as the evaluator are 23% to 60% more likely to be shortlisted than equally qualified applicants submitting human-written resumes, with the largest disadvantages observed in business-related fields such as sales and accounting. We further demonstrate that this bias can be reduced by more than 50% through simple interventions targeting LLMs' self-recognition capabilities. These findings highlight an emerging but previously overlooked risk in AI-assisted decision making and call for expanded frameworks of AI fairness that address not only demographic-based disparities, but also biases in AI-AI interactions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.00462v4</guid>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiannan Xu, Gujie Li, Jane Yi Jiang</dc:creator>
    </item>
    <item>
      <title>Speeding Up the NSGA-II via Dynamic Population Sizes</title>
      <link>https://arxiv.org/abs/2509.01739</link>
      <description>arXiv:2509.01739v3 Announce Type: replace 
Abstract: Multi-objective evolutionary algorithms (MOEAs) are among the most widely and successfully applied optimizers for multi-objective problems. However, to store many optimal trade-offs (the Pareto optima) simultaneously, MOEAs are typically run with a large population of solution candidates. This slows down the algorithm and renders the choice of the population size a crucial design decision. In this work, we aim to overcome these difficulties by proposing the dynamic NSGA-II, a variant of the well-known NSGA-II that starts with a small initial population and doubles it after a user-specified number $\tau$ of function evaluations, up to a maximum size of $N_{max}$. We prove that the dynamic NSGA-II with optimal parameters computes the Pareto front of the OneMinMax benchmark of size $n$ with high probability in $O(n \log^2 n)$ function evaluations, which is considerably faster than the $\Theta(n^2 \log n)$ runtime of the static NSGA-II with optimal parameters. For the OneJumpZeroJump benchmark with gap size $k$, we show a runtime of $O(n^k \log^2 n)$, improving upon the known runtime of $\Theta(n^{k+1})$. We also propose a variant that uses the initial population size for a longer period and achieves slightly better performance. Finally, we show that a simple concurrent-run strategy turns our dynamic NSGA-II variants into parameter-less algorithms that exceed the above runtimes only by a logarithmic factor and hence still outperform the static NSGA-II by a factor of $\tilde\Omega(n)$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.01739v3</guid>
      <category>cs.NE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Benjamin Doerr, Martin S. Krejca, Simon Wietheger</dc:creator>
    </item>
    <item>
      <title>Causal Representation Learning from Network Data</title>
      <link>https://arxiv.org/abs/2509.01916</link>
      <description>arXiv:2509.01916v2 Announce Type: replace 
Abstract: Causal disentanglement from soft interventions is identifiable under the assumptions of linear interventional faithfulness and availability of both observational and interventional data. Prior work has focused on unstructured observations without leveraging known relational context among measured entities. In many scientific applications, however, the measured variables come with an observed interaction network that provides structured context, such as protein-protein interactions and pathway-gene membership. We propose GraCE-VAE, a graph-aware causal discrepancy variational autoencoder that treats pathway-level information as an auxiliary view of the latent causal programs. The graph neural network encoder conditions on this auxiliary pathway view and the biological graph to improve amortized inference, while the causal decoder remains a latent SCM with soft interventions. Assuming samples are i.i.d. within each intervention regime, we show that GraCE-VAE inherits the identifiability guarantees of causal discrepancy VAEs and identifies the latent causal graph and intervention targets up to the standard equivalence class. Experiments on three CRISPR perturbation datasets demonstrate that leveraging structured biological context improves prediction of interventional outcomes, including unseen perturbation combinations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.01916v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jifan Zhang, Michelle M. Li, Elena Zheleva</dc:creator>
    </item>
    <item>
      <title>AudioRWKV: Efficient and Stable Bidirectional RWKV for Audio Pattern Recognition</title>
      <link>https://arxiv.org/abs/2509.02167</link>
      <description>arXiv:2509.02167v2 Announce Type: replace 
Abstract: Recently, Transformers (e.g., Audio Spectrogram Transformers, AST) and state-space models (e.g., Audio Mamba, AuM) have achieved remarkable progress in audio modeling. However, the O(L^2) computational complexity of the Transformer architecture hinders efficient long-sequence processing, while the Mamba architecture tends to become unstable when scaling parameters and data. To address these challenges, this paper proposes AudioRWKV (A-RWKV), a highly efficient and stable architecture for audio modeling. Specifically, we inherit the stable and efficient recurrent formulation of RWKV7 and replace its 1D token-shift operation with a 2D depthwise separable convolution to better capture local spectro-temporal patterns. Furthermore, we adapt the original causal WKV kernel into a bidirectional WKV kernel (Bi-WKV), enabling global context modeling over the entire audio sequence while maintaining linear computational complexity. Benefiting from the inherent stability of the RWKV7 foundation, A-RWKV scales seamlessly to larger model sizes. Experimental results demonstrate that, under the same linear-model regime, A-RWKV-S (22M) achieves performance parity with AuM-B (92M) while exhibiting more stable throughput than AST; for long-form audio (~5 minutes 28 seconds), WKV7 achieves up to a 13.3X speedup in processing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.02167v2</guid>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jing Wang, Maoxiang Wu, Jiayu Xiong, Jianlong Kwan, Jun Xue</dc:creator>
    </item>
    <item>
      <title>Modelling Scenarios for Carbon-aware Geographic Load Shifting of Compute Workloads</title>
      <link>https://arxiv.org/abs/2509.07043</link>
      <description>arXiv:2509.07043v3 Announce Type: replace 
Abstract: We present an analytical model to evaluate the reductions in emissions resulting from geographic load shifting. This model is optimistic as it ignores issues of grid capacity, demand and curtailment. In other words, real-world reductions will be smaller than the estimates. However, even with these assumptions, the presented scenarios show that the realistic reductions from carbon-aware geographic load shifting are small, of the order of 5%. This is not enough to compensate for the growth in emissions from global data centre expansion.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.07043v3</guid>
      <category>cs.OH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Wim Vanderbauwhede</dc:creator>
    </item>
    <item>
      <title>Subdivision Schemes in Metric Spaces</title>
      <link>https://arxiv.org/abs/2509.08070</link>
      <description>arXiv:2509.08070v2 Announce Type: replace 
Abstract: We develop a unified framework for nonlinear subdivision schemes on complete metric spaces (CMS). We begin with CMS preliminaries and formalize refinement in CMS, retaining key structural properties, such as locality. We prove a convergence theorem under contractivity and demonstrate its applicability. To address schemes where contractivity is unknown, we introduce two notions of proximity. Our proximity methods relate a nonlinear scheme to another nonlinear scheme with known contractivity, rather than to a linear scheme, as in much of the literature. Specifically, proximity of the first type compares the two schemes after a single refinement step and, as in the classical theory, yields convergence from sufficiently dense initial data. Proximity of the second type monitors alignment across all refinement levels and provides strong convergence without density assumptions. We formulate and prove the corresponding theorems, and illustrate them with various examples, such as schemes over metric spaces of compact sets in $\mathbb{R}^n$ and schemes over the Wasserstein space, as well as a geometric Hermite metric space. These results extend subdivision theory beyond Euclidean and manifold-valued data to data in metric spaces.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.08070v2</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Nira Dyn, Nir Sharon</dc:creator>
    </item>
    <item>
      <title>Video Understanding by Design: How Datasets Shape Video Models</title>
      <link>https://arxiv.org/abs/2509.09151</link>
      <description>arXiv:2509.09151v2 Announce Type: replace 
Abstract: Research in video understanding has advanced rapidly, driven by increasingly diverse datasets and more powerful model architectures. While existing surveys typically organize progress by tasks, benchmarks, or model families, they provide limited insight into why particular architectures emerged and succeeded. In this survey, we argue that the evolution of video understanding is fundamentally shaped by dataset structure. We present a dataset-centric perspective that connects dataset structure, inductive biases, and architectural design within a unified framework. We show that different datasets require models to capture specific invariances and capabilities, such as robustness to viewpoint changes, sensitivity to temporal ordering, reasoning over long-range dependencies, relational interactions, and cross-modal alignment. These requirements naturally give rise to inductive biases, i.e., architectural assumptions that favor particular patterns of reasoning and generalization. From this perspective, milestone architectures, including two-stream networks, 3D CNNs, temporal models, transformers, graph-based methods, and multimodal foundation models, can be understood as architectural responses to the challenges posed by evolving datasets. Building on this framework, we systematically analyze how dataset characteristics have shaped architectural innovation across video understanding tasks and discuss the representational biases induced by different data regimes. By unifying datasets, inductive biases, and architectures into a coherent perspective, this survey offers both a retrospective explanation of the field's evolution and a forward-looking roadmap toward general-purpose video understanding systems. Code and dynamic video visualizations of dataset-induced biases are available at https://time.griffith.edu.au/paper-sites/video-understanding/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.09151v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lei Wang, Syuan-Hao Li, Piotr Koniusz, Yongsheng Gao</dc:creator>
    </item>
    <item>
      <title>Region-Wise Correspondence Prediction between Manga Line Art Images</title>
      <link>https://arxiv.org/abs/2509.09501</link>
      <description>arXiv:2509.09501v4 Announce Type: replace 
Abstract: Understanding region-wise correspondences between manga line art images is fundamental for high-level manga processing, supporting downstream tasks such as line art colorization and in-between frame generation. Unlike natural images that contain rich visual cues, manga line art consists only of sparse black-and-white strokes, making it challenging to determine which regions correspond across images. In this work, we introduce a new task: predicting region-wise correspondence between raw manga line art images without any annotations. To address this problem, we propose a Transformer-based framework trained on large-scale, automatically generated region correspondences. The model learns to suppress noisy matches and strengthen consistent structural relationships, resulting in robust patch-level feature alignment within and across images. During inference, our method segments each line art and establishes coherent region-level correspondences through edge-aware clustering and region matching. We construct manually annotated benchmarks for evaluation, and experiments across multiple datasets demonstrate both high patch-level accuracy and strong region-level correspondence performance, achieving 78.4-84.4% region-level accuracy. These results highlight the potential of our method for real-world manga and animation applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.09501v4</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yingxuan Li, Jiafeng Mao, Qianru Qiu, Yusuke Matsui</dc:creator>
    </item>
    <item>
      <title>I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation</title>
      <link>https://arxiv.org/abs/2509.10334</link>
      <description>arXiv:2509.10334v2 Announce Type: replace 
Abstract: Vision Transformers (ViTs) have recently achieved strong results in semantic segmentation, yet their deployment on resource-constrained devices remains limited due to their high memory footprint and computational cost. Quantization offers an effective strategy to improve efficiency, but ViT-based segmentation models are notoriously fragile under low precision, as quantization errors accumulate across deep encoder-decoder pipelines. We introduce I-Segmenter, the first fully integer-only ViT segmentation framework. Building on the Segmenter architecture, I-Segmenter systematically replaces floating-point operations with integer-only counterparts. To further stabilize both training and inference, we propose $\lambda$-ShiftGELU, a novel activation function that mitigates the limitations of uniform quantization in handling long-tailed activation distributions. In addition, we remove the L2 normalization layer and replace bilinear interpolation in the decoder with nearest neighbor upsampling, ensuring integer-only execution throughout the computational graph. Extensive experiments show that I-Segmenter achieves accuracy within a reasonable margin of its FP32 baseline (5.1% on average), while reducing model size by up to 3.8x and enabling up to 1.2x faster inference with optimized runtimes. Notably, even in one-shot PTQ with a single calibration image, I-Segmenter delivers competitive accuracy, underscoring its practicality for real-world deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.10334v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jordan Sassoon, Michal Szczepanski, Martyna Poreba</dc:creator>
    </item>
    <item>
      <title>Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings</title>
      <link>https://arxiv.org/abs/2509.10534</link>
      <description>arXiv:2509.10534v3 Announce Type: replace 
Abstract: The attention mechanism in a Transformer architecture matches key to query based on both content -- the what -- and position in a sequence -- the where. We present an analysis indicating that what and where are entangled in the popular RoPE rotary position embedding. This entanglement can impair performance particularly when decisions require independent matches on these two factors. We propose an improvement to RoPE, which we call Polar Coordinate Position Embeddings or PoPE, that eliminates the what-where confound. PoPE is far superior on a diagnostic task requiring indexing solely by position or by content. On autoregressive sequence modeling in music, genomic, and natural language domains, Transformers using PoPE as the positional encoding scheme outperform baselines using RoPE with respect to evaluation loss (perplexity) and downstream task performance. On language modeling, these gains persist across model scale, from 124M to 774M parameters. Crucially, PoPE shows strong zero-shot length extrapolation capabilities compared not only to RoPE but even a method designed for extrapolation, YaRN, which requires additional fine tuning and frequency interpolation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.10534v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anand Gopalakrishnan, Robert Csordás, Jürgen Schmidhuber, Michael C. Mozer</dc:creator>
    </item>
    <item>
      <title>Similarity-Distance-Magnitude Activations</title>
      <link>https://arxiv.org/abs/2509.12760</link>
      <description>arXiv:2509.12760v5 Announce Type: replace 
Abstract: We introduce the Similarity-Distance-Magnitude (SDM) activation function, a more robust and interpretable formulation of the standard softmax activation function, adding Similarity (i.e., correctly predicted depth-matches into training) awareness and Distance-to-training-distribution awareness to the existing output Magnitude (i.e., decision-boundary) awareness, and enabling interpretability-by-exemplar via dense matching. We further introduce the SDM estimator, based on a data-driven partitioning of the class-wise empirical CDFs via the SDM activation, to control the class- and prediction-conditional accuracy among selective classifications. When used as the final-layer activation over pre-trained language models for selective classification, the SDM estimator is more robust to covariate shifts and out-of-distribution inputs than existing calibration methods using softmax activations, while remaining informative over in-distribution data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.12760v5</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Allen Schmaltz</dc:creator>
    </item>
    <item>
      <title>Linguistic Nepotism: Trading-off Quality for Language Preference in Multilingual RAG</title>
      <link>https://arxiv.org/abs/2509.13930</link>
      <description>arXiv:2509.13930v3 Announce Type: replace 
Abstract: Multilingual Retrieval-Augmented Generation (mRAG) systems enable language models to answer knowledge-intensive queries with citation-supported responses across languages. Despite their growing use, an open question is whether the mixture of different document languages impacts generation and citation behavior in unintended ways. To investigate this, we introduce a controlled methodology using model internals to measure language preference while holding other factors such as document relevance constant. Across eight languages and six open-weight models, we find that models preferentially cite English sources when queries are in English, with this bias amplified for lower-resource languages and for documents positioned mid-context. More crucially, we find that models sometimes trade off document relevance for language preference, indicating that citation choices are not always driven by informativeness alone. Our findings shed light on how language models leverage multilingual context and influence citation behavior.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.13930v3</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dayeon Ki, Marine Carpuat, Paul McNamee, Daniel Khashabi, Eugene Yang, Dawn Lawrie, Kevin Duh</dc:creator>
    </item>
    <item>
      <title>LiMuon: Light and Fast Muon Optimizer for Large Models</title>
      <link>https://arxiv.org/abs/2509.14562</link>
      <description>arXiv:2509.14562v4 Announce Type: replace 
Abstract: Large models have recently been widely applied in machine learning, so efficient training of large models has received widespread attention. More recently, the Muon optimizer was designed specifically for matrix-structured parameters of large models. Although some works have begun to study the Muon optimizer, the existing Muon and its variants still suffer from high sample complexity or high memory cost for large models. To fill this gap, we propose a light and fast Muon (LiMuon) optimizer for training large models, which builds on the momentum-based variance-reduction technique and randomized Singular Value Decomposition (SVD). In particular, our LiMuon simultaneously has lower memory cost and lower sample complexity than the Muon and its variants. Moreover, we prove that our LiMuon with lower memory cost has a lower sample complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution of non-convex stochastic optimization under the generalized smoothness condition. To further narrow the gap between practice and theory, we also prove that our LiMuon with Newton-Schulz steps has a lower sample complexity than the Muon with Newton-Schulz steps. Numerical experimental results on training Mamba-130M, Qwen2.5-0.5B and ViT models demonstrate the effectiveness of our LiMuon.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.14562v4</guid>
      <category>cs.LG</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Feihu Huang, Yuning Luo, Songcan Chen</dc:creator>
    </item>
    <item>
      <title>No Modality Left Behind: Adapting to Missing Modalities via Knowledge Distillation for Brain Tumor Segmentation</title>
      <link>https://arxiv.org/abs/2509.15017</link>
      <description>arXiv:2509.15017v2 Announce Type: replace 
Abstract: Accurate brain tumor segmentation is essential for preoperative evaluation and personalized treatment. Multi-modal MRI is widely used due to its ability to capture complementary tumor features across different sequences. However, in clinical practice, missing modalities are common, limiting the robustness and generalizability of existing deep learning methods that rely on complete inputs, especially under non-dominant modality combinations. To address this, we propose AdaMM, a multi-modal brain tumor segmentation framework tailored for missing-modality scenarios, centered on knowledge distillation and composed of three synergistic modules. The Graph-guided Adaptive Refinement Module explicitly models semantic associations between generalizable and modality-specific features, enhancing adaptability to modality absence. The Bi-Bottleneck Distillation Module transfers structural and textural knowledge from teacher to student models via global style matching and adversarial feature alignment. The Lesion-Presence-Guided Reliability Module predicts prior probabilities of lesion types through an auxiliary classification task, effectively suppressing false positives under incomplete inputs. Extensive experiments on the Pretreat-MetsToBrain-Masks, BraTS 2018, and BraTS 2024 datasets demonstrate that AdaMM consistently outperforms existing methods, exhibiting superior segmentation accuracy and robustness, particularly in single-modality and weak-modality configurations. In addition, we conduct a systematic evaluation of six categories of missing-modality strategies, supporting the superiority of knowledge distillation and offering practical guidance for method selection and future research. Our source code is available at https://github.com/Quanato607/AdaMM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.15017v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <arxiv:DOI>10.1016/j.media.2026.104108</arxiv:DOI>
      <dc:creator>Shenghao Zhu, Yifei Chen, Weihong Chen, Shuo Jiang, Guanyu Zhou, Yuanhan Wang, Feiwei Qin, Changmiao Wang, Qiyuan Tian</dc:creator>
    </item>
    <item>
      <title>Multi-resolution Enhancement for Full Spectrum Neural Representations</title>
      <link>https://arxiv.org/abs/2509.15494</link>
      <description>arXiv:2509.15494v2 Announce Type: replace 
Abstract: Scientific data acquisition continues to outpace storage and analysis capabilities, making voxel-based representations increasingly intractable. Implicit neural representations (INRs) offer a promising solution by encoding signals through coordinate-based neural networks, serving as surrogates of data, with computational and storage requirements scaling with network complexity rather than data dimensionality. However, smaller INRs struggle to faithfully represent the multi-scale structures, high-frequency information, and fine textures that constitute a large proportion of scientific measurements. We propose WIEN-INR, a theoretically-guided hierarchical INR framework that distributes modeling across resolution scales and enables improved representation capacity through a novel enhancement network to recover subtle details. This multi-scale architecture allows smaller networks to retain the full spectrum of information while preserving training efficiency and lowering storage cost. Evaluated on distinct raw experimental measurements across scales and complexities, WIEN-INR represents a practical step toward broader adoption of neural representations in scientific workflows, delivering compact, robust, and high-fidelity representations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.15494v2</guid>
      <category>cs.LG</category>
      <category>physics.data-an</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuan Ni, Zhantao Chen, Shizhou Xu, Cheng Peng, Rajan Plumley, Chun Hong Yoon, Jana B. Thayer, Joshua J. Turner</dc:creator>
    </item>
    <item>
      <title>Reward Evolution with Graph-of-Thoughts: A Bi-Level Language Model Framework for Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2509.16136</link>
      <description>arXiv:2509.16136v5 Announce Type: replace 
Abstract: Designing effective reward functions remains a major challenge in reinforcement learning (RL), often requiring considerable human expertise and iterative refinement. Recent advances leverage Large Language Models (LLMs) for automated reward design, but these approaches are limited by hallucinations, reliance on human feedback, and challenges with handling complex, multi-step tasks. In this work, we introduce Reward Evolution with Graph-of-Thoughts (RE-GoT), a novel bi-level framework that enhances LLMs with structured graph-based reasoning and integrates Visual Language Models (VLMs) for automated rollout evaluation. RE-GoT first decomposes tasks into text-attributed graphs, enabling comprehensive analysis and reward function generation, and then iteratively refines rewards using visual feedback from VLMs without human intervention. Extensive experiments on 10 RoboGen and 4 ManiSkill2 tasks demonstrate that RE-GoT consistently outperforms existing LLM-based baselines. On RoboGen, our method improves average task success rates by 32.25%, with notable gains on complex multi-step tasks. On ManiSkill2, RE-GoT achieves an average success rate of 93.73% across four diverse manipulation tasks, significantly surpassing prior LLM-based approaches and even exceeding expert-designed rewards. Our results indicate that combining LLMs and VLMs with graph-of-thoughts reasoning provides a scalable and effective solution for autonomous reward evolution in RL.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.16136v5</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:journal_reference>IEEE International Conference on Robotics and Automation (ICRA 2026)</arxiv:journal_reference>
      <dc:creator>Changwei Yao, Xinzi Liu, Chen Li, Marios Savvides</dc:creator>
    </item>
    <item>
      <title>Enhanced Detection of Tiny Objects in Aerial Images</title>
      <link>https://arxiv.org/abs/2509.17078</link>
      <description>arXiv:2509.17078v3 Announce Type: replace 
Abstract: While one-stage detectors like YOLOv8 offer fast training, they often underperform on small objects as a trade-off. This becomes even more critical when detecting tiny objects in aerial imagery due to low-resolution targets and cluttered backgrounds.
  To address this, we introduce four enhancement strategies (input image resolution adjustment, data augmentation, attention mechanisms, and an alternative gating function for attention modules) that can be easily implemented on YOLOv8. We demonstrate that image size enlargement and the proper use of augmentation can improve detection. Additionally, we designed a Mixture of Orthogonal Neural-modules Network (MoonNet) pipeline that consists of multiple attention-module-augmented CNNs. Two well-known attention modules, Squeeze-and-Excitation (SE) Block and Convolutional Block Attention Module (CBAM), were integrated into the backbone of YOLOv8 to form the MoonNet design, and the MoonNet backbone obtained improved detection accuracy compared to the original YOLOv8 backbone and single-type attention-module-augmented backbones. MoonNet further proved its adaptability and potential by achieving state-of-the-art performance on a tiny-object benchmark when integrated with the YOLC model.
  Our code is available at: https://github.com/Kihyun11/MoonNet</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.17078v3</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kihyun Kim, Michalis Lazarou, Tania Stathaki</dc:creator>
    </item>
    <item>
      <title>MVCL-DAF++: Enhancing Multimodal Intent Recognition via Prototype-Aware Contrastive Alignment and Coarse-to-Fine Dynamic Attention Fusion</title>
      <link>https://arxiv.org/abs/2509.17446</link>
      <description>arXiv:2509.17446v4 Announce Type: replace 
Abstract: Multimodal intent recognition (MMIR) suffers from weak semantic grounding and poor robustness under noisy or rare-class conditions. We propose MVCL-DAF++, which extends MVCL-DAF with two key modules: (1) Prototype-aware contrastive alignment, aligning instances to class-level prototypes to enhance semantic consistency; and (2) Coarse-to-fine attention fusion, integrating global modality summaries with token-level features for hierarchical cross-modal interaction. On MIntRec and MIntRec2.0, MVCL-DAF++ achieves new state-of-the-art results, improving rare-class recognition by +1.05% and +4.18% WF1, respectively. These results demonstrate the effectiveness of prototype-guided learning and coarse-to-fine fusion for robust multimodal understanding. The source code is available at https://github.com/chr1s623/MVCL-DAF-PlusPlus.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.17446v4</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haofeng Huang, Yifei Han, Long Zhang, Bin Li, Yangfan He, Yaxin Xue</dc:creator>
    </item>
    <item>
      <title>Understanding Benchmark Language Under Weakened Formal Semantics</title>
      <link>https://arxiv.org/abs/2509.17455</link>
      <description>arXiv:2509.17455v2 Announce Type: replace 
Abstract: State-of-the-art NLP benchmarks require interpretation of natural language that specifies conditions, procedures, and exceptions, often relying on implicit assumptions and external knowledge. Constructing complete semantic representations with proof-theoretic guarantees is frequently impractical at scale, and purely text-based reasoning offers limited means of inspection. This paper asks how much understanding of benchmark language can be achieved when formal semantic guarantees are weakened. We investigate this question by extracting computables: executable representations whose runtime behavior provides operational evidence of semantic adequacy, including executability, execution traces, and runtime failures. We induce and iteratively refine computables for benchmark instances using retrieval from external knowledge. Across mathematical reasoning, multi-step reasoning, causal inference, and rule- and exception-heavy legal and biomedical benchmarks, we find that the proposed approach consistently exceeds text-only reasoning and one-shot code execution. Beyond accuracy, our analyses show that these computables provide scalable, inspectable semantic evidence: they expose conditions and exceptions benchmark language forces into executable form, offering a practical bridge between proof-oriented semantics and purely textual reasoning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.17455v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Haoyang Chen, Kumiko Tanaka-Ishii</dc:creator>
    </item>
    <item>
      <title>Optimality of quasi-Monte Carlo methods and suboptimality of the sparse-grid Gauss--Hermite rule in Gaussian Sobolev spaces</title>
      <link>https://arxiv.org/abs/2509.18712</link>
      <description>arXiv:2509.18712v3 Announce Type: replace 
Abstract: Optimality of several quasi-Monte Carlo methods and suboptimality of the sparse-grid quadrature based on the univariate Gauss--Hermite rule are proved in the Sobolev spaces of mixed dominating smoothness of order $\alpha$, where the optimality is in the sense of worst-case convergence rate. For sparse-grid Gauss--Hermite quadrature, lower and upper bounds are established, with rates coinciding up to a logarithmic factor. The dominant rate is found to be only $N^{-\alpha/2}$ with $N$ function evaluations, although the optimal rate is known to be $N^{-\alpha}(\ln N)^{(d-1)/2}$. The lower bound is obtained by exploiting the structure of the Gauss--Hermite nodes and is independent of the quadrature weights; consequently, no modification of the weights can improve the rate $N^{-\alpha/2}$. In contrast, several quasi-Monte Carlo methods with a change of variables are shown to achieve the optimal rate, some up to, and one including, the logarithmic factor.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.18712v3</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yoshihito Kazashi, Yuya Suzuki, Takashi Goda</dc:creator>
    </item>
    <item>
      <title>Cryptographic Backdoor for Neural Networks: Boon and Bane</title>
      <link>https://arxiv.org/abs/2509.20714</link>
      <description>arXiv:2509.20714v2 Announce Type: replace 
Abstract: In this paper we show that cryptographic backdoors in a neural network (NN) can be highly effective in two directions: mounting attacks and constructing defenses. On the attack side, a carefully planted cryptographic backdoor enables a powerful and invisible attack on the NN. On the defense side, we present three applications: first, a provably robust NN watermarking scheme; second, a protocol for guaranteeing user authentication; and third, a protocol for tracking unauthorized sharing of the NN intellectual property (IP). From a broader theoretical perspective, borrowing ideas from Goldwasser et al. [FOCS 2022], our main contribution is to show that all these instantiated practical protocol implementations are provably robust. The protocols for watermarking, authentication and IP tracking resist an adversary with black-box access to the NN, whereas the backdoor-enabled adversarial attack is impossible to prevent under the standard assumptions. While the theoretical tools used for our attack are mostly in line with the ideas of Goldwasser et al., the proofs related to the defense need further study. Finally, all these protocols are implemented on state-of-the-art NN architectures with empirical results corroborating the theoretical claims. Further, one can utilize post-quantum primitives for implementing the cryptographic backdoors, laying out foundations for quantum-era applications in machine learning (ML).</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.20714v2</guid>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anh Tu Ngo, Anupam Chattopadhyay, Subhamoy Maitra</dc:creator>
    </item>
    <item>
      <title>Distant Object Localisation from Noisy Image Segmentation Sequences</title>
      <link>https://arxiv.org/abs/2509.20906</link>
      <description>arXiv:2509.20906v3 Announce Type: replace 
Abstract: 3D object localisation based on a sequence of camera measurements is essential for safety-critical surveillance tasks, such as drone-based wildfire monitoring. Localisation of objects detected with a camera can typically be solved with specialised sensor configurations or 3D scene reconstruction. However, in the context of distant objects or tasks limited by the amount of available computational resources, neither solution is feasible. In this paper, we show that the task can be solved with either multi-view triangulation or particle filters, with the latter also providing shape and uncertainty estimates. We studied the solutions using 3D simulation and drone-based image segmentation sequences with global navigation satellite system (GNSS) based camera pose estimates. The results suggest that combining the proposed methods with pre-existing image segmentation models and drone-carried computational resources yields a reliable system for drone-based wildfire monitoring. The proposed solutions are independent of the detection method, also enabling quick adaptation to similar tasks. Code is available at https://fgi_nls.gitlab.io/public/distant-localisation</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.20906v3</guid>
      <category>cs.CV</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Julius Pesonen, Arno Solin, Eija Honkavaara</dc:creator>
    </item>
    <item>
      <title>Generation Properties of Stochastic Interpolation under Finite Training Set</title>
      <link>https://arxiv.org/abs/2509.21925</link>
      <description>arXiv:2509.21925v2 Announce Type: replace 
Abstract: This paper investigates the theoretical behavior of generative models under finite training populations. Within the stochastic interpolation generative framework, we derive closed-form expressions for the optimal velocity field and score function when only a finite number of training samples are available. We demonstrate that, under some regularity conditions, the deterministic generative process exactly recovers the training samples, while the stochastic generative process manifests as training samples with added Gaussian noise. Beyond the idealized setting, we consider model estimation errors and introduce formal definitions of underfitting and overfitting specific to generative models. Our theoretical analysis reveals that, in the presence of estimation errors, the stochastic generation process effectively produces convex combinations of training samples corrupted by a mixture of uniform and Gaussian noise. Experiments on generation tasks and downstream tasks such as classification support our theory.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.21925v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yunchen Li, Shaohui Lin, Zhou Yu</dc:creator>
    </item>
    <item>
      <title>SecureVibeBench: Benchmarking Secure Vibe Coding of AI Agents via Reconstructing Vulnerability-Introducing Scenarios</title>
      <link>https://arxiv.org/abs/2509.22097</link>
      <description>arXiv:2509.22097v5 Announce Type: replace 
Abstract: Large language model-powered code agents are rapidly transforming software engineering, yet the security risks of their generated code have become a critical concern. Existing benchmarks have provided valuable insights, but they fail to capture scenarios in which vulnerabilities are actually introduced by human developers, making fair comparisons between humans and agents infeasible. We therefore introduce SecureVibeBench, a benchmark of 105 C/C++ secure coding tasks for code agents, sourced from 41 projects in OSS-Fuzz. SecureVibeBench has the following features: (i) realistic task settings that require multi-file edits in large repositories, (ii) aligned contexts based on real-world open-source vulnerabilities with precisely identified vulnerability introduction points, and (iii) comprehensive evaluation that combines functionality testing and security checking with both static and dynamic oracles. We evaluate 5 popular code agents, such as OpenHands, supported by 5 LLMs (e.g., Claude Sonnet 4.5) on SecureVibeBench. Results show that current agents struggle to produce code that is both correct and secure: even the best-performing one produces merely 23.8% correct and secure solutions on SecureVibeBench. Our code and data are available at https://github.com/iCSawyer/SecureVibeBench.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.22097v5</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Junkai Chen, Huihui Huang, Yunbo Lyu, Junwen An, Jieke Shi, Chengran Yang, Ting Zhang, Haoye Tian, Yikun Li, Zhenhao Li, Xin Zhou, Xing Hu, David Lo</dc:creator>
    </item>
    <item>
      <title>UAV-Enabled Fluid Antenna Systems for Multi-Target Wireless Sensing over LAWCNs</title>
      <link>https://arxiv.org/abs/2509.22497</link>
      <description>arXiv:2509.22497v2 Announce Type: replace 
Abstract: The fluid antenna system (FAS) is emerging as a key technology for enhancing spatial flexibility and sensing accuracy in future wireless systems. This paper investigates an unmanned aerial vehicle (UAV)-enabled FAS for multi-target wireless sensing in low-altitude wireless consumer networks (LAWCNs) to support low-altitude economy (LAE) missions. We formulate an optimization problem aimed at minimizing the average Cramér-Rao bound (CRB) for multiple target estimations. To tackle this non-convex problem, an efficient alternating optimization (AO) algorithm is proposed, which jointly optimizes the UAV trajectory, the antenna positions of the transmit fluid antennas (FAs) and the receive FAs, and the transmit beamforming at the UAV. Simulation results demonstrate significant performance improvements in estimation accuracy and sensing reliability compared to conventional schemes, e.g., the fixed position antenna scheme. The proposed system achieves enhanced sensing performance through adaptive trajectory design and beamforming, alongside effective interference suppression via flexible FAS antenna repositioning, underscoring its practical potential for precision sensing in UAV-enabled LAWCNs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.22497v2</guid>
      <category>cs.IT</category>
      <category>eess.SP</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xuhui Zhang, Wenchao Liu, Chunjie Wang, Jinke Ren, Huijun Xing, Shuqiang Wang, Yanyan Shen</dc:creator>
    </item>
    <item>
      <title>Stabilizing the singularity swap quadrature for near-singular line integrals</title>
      <link>https://arxiv.org/abs/2509.23881</link>
      <description>arXiv:2509.23881v3 Announce Type: replace 
Abstract: Singularity swap quadrature (SSQ) is an effective method for the evaluation at nearby targets of potentials due to densities on curves in three dimensions. While highly accurate in most settings, it is known to suffer from catastrophic cancellation when the kernel exhibits both near-vanishing numerators and strong singularities, as arises with scalar double layer potentials or tensorial kernels in Stokes flow or linear elasticity. This precision loss turns out to be tied to the interpolation basis, namely monomial (for open curves) or Fourier (for closed curves). We introduce a simple yet powerful remedy: target-specific translated monomial and Fourier bases that explicitly incorporate the near-vanishing behavior of the kernel numerator. We combine this with a stable evaluation of the constant term which now dominates the integral, significantly reducing cancellation. We show that our approach achieves close to machine precision for prototype integrals, and up to ten orders of magnitude lower error than standard SSQ at extremely close evaluation distances, without significant additional computational cost.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.23881v3</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1007/s10543-026-01132-w</arxiv:DOI>
      <dc:creator>David Krantz, Alex H. Barnett, Anna-Karin Tornberg</dc:creator>
    </item>
    <item>
      <title>SPECTRA: Revealing the Full Spectrum of User Preferences via Distributional LLM Inference</title>
      <link>https://arxiv.org/abs/2509.24189</link>
      <description>arXiv:2509.24189v4 Announce Type: replace 
Abstract: Large Language Models (LLMs) are increasingly used to model user preferences, with the typical output as a directly-generated ranked item list per user. However, this generative paradigm inherits the bias and opacity of autoregressive decoding. It over-emphasizes frequent (head) preferences and suppresses minority, long-tail ones. To address this, we propose SPECTRA (Softmax Probing for Extracted Category-level Token Readouts and Analysis), which treats the finetuned LLM as an implicit probabilistic model and probes its softmax to infer a probability distribution over semantically interpretable preference categories. We evaluate SPECTRA on MovieLens, Yelp, and a large-scale short-video platform. SPECTRA delivers (i) distributional alignment, reducing Jensen-Shannon divergence to the empirical preference distribution by 38 to 44 percent across public datasets; (ii) long-tail recovery with cross-user fairness, raising top-3 category exposure entropy by 23 percent on MovieLens and producing a larger gain on tail-preference users than on head-preference users; and (iii) downstream application value, with a 41 to 46 percent category-NDCG boost on MovieLens and Yelp, and a 7x improvement on long-tail category ranking on a large-scale deployment against a head-optimized production ranker.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.24189v4</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Luyang Zhang, Jialu Wang, Shichao Zhu, Beibei Li, Zhongcun Wang, Guangmou Pan, Yang Song</dc:creator>
    </item>
    <item>
      <title>Interpretable Self-Supervised Learning via Representer Landmarks and Nyström Approximation</title>
      <link>https://arxiv.org/abs/2509.24467</link>
      <description>arXiv:2509.24467v4 Announce Type: replace 
Abstract: Self-supervised learning (SSL) learns representations from massive unlabeled data, yet the resulting models typically operate as black boxes, necessitating domain-specific explanations. We introduce KREPES, a unified framework to analytically interpret the learned representations of SSL objectives, including SimCLR, BYOL, and VICReg. By bridging empirical neural tangent kernel approximations of neural networks with the Representer Theorem for kernels, we express the learned latent space directly via "Representer Landmarks", which are the representations of influential unlabeled training examples. We introduce novel metrics, "Sample-Specific Influence Score", "Concept-Conditioned Influence Score" and "Feature Alignment Gap", to quantify the transparency of the learned representations. KREPES enables direct audit of the latent space without supervision, for example, revealing an algorithmic bias in the Adult-1M dataset where SSL uses demographic proxies for income. Finally, to ensure scalability to benchmarks with 1M+ samples (ImageNet-1K, Adult-1M), KREPES introduces a novel Nyström approximation-based analytical inference framework for SSL objectives.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.24467v4</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Maedeh Zarvandi, Michael Timothy, Theresa Wasserer, Debarghya Ghoshdastidar</dc:creator>
    </item>
    <item>
      <title>Diffusion Bridge or Flow Matching? A Unifying Framework and Comparative Analysis</title>
      <link>https://arxiv.org/abs/2509.24531</link>
      <description>arXiv:2509.24531v2 Announce Type: replace 
Abstract: Diffusion Bridge and Flow Matching have both demonstrated compelling empirical performance in transformation between arbitrary distributions. However, there remains confusion about which approach is generally preferable, and the substantial discrepancies in their modeling assumptions and practical implementations have hindered a unified theoretical account of their relative merits. We provide, for the first time, a unified theoretical and experimental validation of these two models. We recast their frameworks through the lens of Stochastic Optimal Control and prove that the cost function of the Diffusion Bridge is lower, guiding the system toward more stable and natural trajectories. Simultaneously, from the perspective of Optimal Transport, we show that the interpolation coefficients $t$ and $1-t$ of Flow Matching become increasingly ineffective when the training data size is reduced. To corroborate these theoretical claims, we propose a novel, powerful architecture for Diffusion Bridge built on a latent Transformer, and implement a Flow Matching model with the same structure to enable a fair performance comparison in various experiments. Comprehensive experiments are conducted across Image Restoration, Translation, and Style Transfer tasks, systematically varying both the distributional discrepancy (different difficulty) and the training data size. Extensive empirical results align with our theoretical predictions and allow us to delineate the respective advantages and disadvantages of these two models. Our code is available at https://github.com/zhukaizhen/diffusion_bridge_flow_matching.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.24531v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Kaizhen Zhu, Mokai Pan, Zhechuan Yu, Jingya Wang, Jingyi Yu, Ye Shi</dc:creator>
    </item>
    <item>
      <title>In-Context Learning of Temporal Point Processes with Foundation Inference Models</title>
      <link>https://arxiv.org/abs/2509.24762</link>
      <description>arXiv:2509.24762v3 Announce Type: replace 
Abstract: Modeling event sequences of multiple event types with marked temporal point processes (MTPPs) provides a principled way to uncover governing dynamical rules and predict future events. Current neural network approaches to MTPP inference rely on training separate, specialized models for each target system. We pursue a radically different approach: drawing on amortized inference and in-context learning, we pretrain a deep neural network to infer, in-context, the conditional intensity functions of event histories from a context defined by sets of event sequences. Pretraining is performed on a large synthetic dataset of MTPPs sampled from a broad distribution of Hawkes processes. Once pretrained, our Foundation Inference Model for Point Processes (FIM-PP) can estimate MTPPs from real-world data without any additional training, or be rapidly finetuned to target systems. Experiments show that this amortized approach matches the performance of specialized models on next-event prediction across common benchmark datasets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.24762v3</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>The Fourteenth International Conference on Learning Representations (ICLR 2026)</arxiv:journal_reference>
      <dc:creator>David Berghaus, Patrick Seifner, Kostadin Cvejoski, César Ojeda, Ramsés J. Sánchez</dc:creator>
    </item>
    <item>
      <title>CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning</title>
      <link>https://arxiv.org/abs/2509.25004</link>
      <description>arXiv:2509.25004v2 Announce Type: replace 
Abstract: Online reinforcement learning with verifiable rewards (RLVR) has become an effective paradigm for improving the reasoning abilities of large language models, but most methods still optimize reasoning trajectories over a static problem set, wasting rollout budget on solved or overly difficult problems. We propose CLPO (Curriculum Learning meets Policy Optimization), a self-evolving curriculum framework that uses on-policy rollout accuracy to identify solved, medium-difficulty, and hard problems, then restructures selected tasks according to the model's current capability. Hard problems are simplified to become learnable, while medium-difficulty problems are diversified to provide useful training variation. This allows the learning curriculum to co-evolve with the policy rather than remaining fixed as the model's capability boundary shifts. Rather than treating these rewrites as static data augmentation, CLPO optimizes restructuring trajectories with credit assigned by the downstream accuracy gain of the rewritten problem, requiring no additional human annotations beyond the original verifiable answers. Experiments across mathematical reasoning and out-of-domain general reasoning benchmarks show that CLPO substantially outperforms GRPO and DAPO on Qwen3-8B by 10.21 and 7.75 average points, respectively. Ablation studies on math and code domains further show that both the restructuring mode and the rewriting loss contribute to the final gains, demonstrating that CLPO provides a scalable and robust pathway for eliciting stronger reasoning capabilities through a self-evolving curriculum.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.25004v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shijie Zhang, Zheng Xiao, Shiyu Liu, Guohao Sun, Kevin Zhang, Xiang Guo, Rujun Guo, Shaoyu Liu, Wangxiao Zhao, Guanjun Jiang</dc:creator>
    </item>
    <item>
      <title>Symskill: Symbol and Skill Co-Invention for Data-Efficient and Reactive Long-Horizon Manipulation</title>
      <link>https://arxiv.org/abs/2510.01661</link>
      <description>arXiv:2510.01661v3 Announce Type: replace 
Abstract: Multi-step manipulation in dynamic environments remains challenging. Imitation learning (IL) is reactive but lacks compositional generalization, since monolithic policies do not decide which skill to reuse when scenes change. Classical task-and-motion planning (TAMP) offers compositionality, but its high planning latency prevents real-time failure recovery. We introduce SymSkill, a unified framework that jointly learns predicates, operators, and skills from unlabeled, unsegmented demonstrations, combining compositional generalization with real-time recovery. Offline, SymSkill learns symbolic abstractions and goal-oriented skills directly from demonstrations. Online, given a conjunction of learned predicates, it uses a symbolic planner to compose and reorder skills to achieve symbolic goals while recovering from failures at both the motion and symbolic levels in real time. Coupled with a compliant controller, SymSkill supports safe execution under human and environmental disturbances. In RoboCasa simulation, SymSkill executes 12 single-step tasks with 85% success and composes them into multi-step plans without additional data. On a real Franka robot, it learns from 5 minutes of play data and performs 12-step tasks from goal specifications. Code and additional analysis are available at https://symskill.github.io/ .</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.01661v3</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yifei Simon Shao, Yuchen Zheng, Sunan Sun, Pratik Chaudhari, Vijay Kumar, Nadia Figueroa</dc:creator>
    </item>
    <item>
      <title>Normality Calibration in Semi-supervised Graph Anomaly Detection</title>
      <link>https://arxiv.org/abs/2510.02014</link>
      <description>arXiv:2510.02014v3 Announce Type: replace 
Abstract: Graph anomaly detection (GAD) has attracted growing interest for its crucial ability to uncover irregular patterns in broad applications. Semi-supervised GAD, which assumes a subset of annotated normal nodes is available during training, is among the most widely explored application settings. However, the normality learned by existing semi-supervised GAD methods is limited to the labeled normal nodes and tends to overfit the given patterns. This can lead to high detection errors, such as high false-positive rates. To overcome this limitation, we propose GraphNC, a graph normality calibration framework that leverages both labeled and unlabeled data to calibrate the normality from a teacher model (a pre-trained semi-supervised GAD model) jointly in anomaly score and node representation spaces. GraphNC includes two main components, anomaly score distribution alignment (ScoreDA) and perturbation-based normality regularization (NormReg). ScoreDA optimizes the anomaly scores of our model by aligning them with the score distribution yielded by the teacher model. Because the teacher model scores most of the normal nodes and part of the anomaly nodes accurately, the score alignment effectively pulls the anomaly scores of the normal and abnormal classes toward the two ends, resulting in more separable anomaly scores. Nevertheless, some scores from the teacher model are inaccurate. To mitigate the misleading effect of these scores, NormReg is designed to regularize the graph normality in the representation space, making the representations of normal nodes more compact by minimizing a perturbation-guided consistency loss solely on the labeled nodes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.02014v3</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Guolei Zeng, Hezhe Qiao, Guoguo Ai, Jinsong Guo, Guansong Pang</dc:creator>
    </item>
    <item>
      <title>Dynamic Function Configuration and its Management in Serverless Computing: A Taxonomy and Future Directions</title>
      <link>https://arxiv.org/abs/2510.02404</link>
      <description>arXiv:2510.02404v2 Announce Type: replace 
Abstract: The serverless cloud computing model offers a framework where the service provider abstracts the underlying infrastructure management from developers. In this serverless model, FaaS provides an event-driven, function-oriented computing service characterised by fine-grained, usage-based pricing that eliminates cost for idle resources. Platforms like AWS Lambda, Azure Functions, and Cloud Run Functions require developers to configure their functions with the minimum operational resources needed for successful execution. This resource allocation influences both the operational expense and the performance quality of these functions. However, a noticeable lack of platform transparency forces developers to rely on expert knowledge or experience-based ad-hoc decisions to request desired function resources. This makes optimal resource configuration a non-trivial task while adhering to performance constraints. Furthermore, while commercial platforms often scale resources like CPU and network bandwidth proportional to memory, open-source frameworks permit independent configuration of function resources, introducing additional complexity for developers aiming to optimise their functions. These complexities have directed researchers to resolve developer challenges and advance towards an efficient serverless execution model. In this article, we identify different aspects of resource configuration techniques in FaaS settings and propose a taxonomy of factors that influence function design, configuration, run-time cost, and performance guarantees. We conduct an analysis of existing literature on resource configuration to present a comprehensive review of current studies on function configuration. We also identify existing research gaps and suggest future research directions to enhance function configuration and strengthen the capabilities of serverless computing environments to drive its broader adoption.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.02404v2</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Siddharth Agarwal, Maria A. Rodriguez, Rajkumar Buyya</dc:creator>
    </item>
    <item>
      <title>VFEM: Visual Feature Empowered Multivariate Time Series Forecasting with Cross-Modal Fusion</title>
      <link>https://arxiv.org/abs/2510.03244</link>
      <description>arXiv:2510.03244v2 Announce Type: replace 
Abstract: Large time series foundation models often adopt channel-independent architectures to handle varying data dimensions, but this design ignores crucial cross-channel dependencies. Meanwhile, existing cross-modal methods predominantly rely on textual modalities, leaving the spatial pattern recognition capabilities of vision models underexplored for time series analysis. To address these limitations, we propose VFEM, a cross-modal forecasting model that leverages pre-trained large vision models (LVMs) to capture complex cross-variable patterns. VFEM transforms multivariate time series into visual representations, enabling LVMs to perceive spatial relationships that are not explicitly modeled by channel-independent models. Through a dual-branch architecture, visual and temporal features are independently extracted and then fused via cross-modal attention, allowing complementary information from both modalities to enhance forecasting. By freezing the LVM and training only 7.45% of the total parameters, VFEM achieves competitive performance on multiple benchmarks, offering a new perspective on multivariate time series forecasting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.03244v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yanlong Wang, Hang Yu, Jian Xu, Fei Ma, Hongkang Zhang, Tongtong Feng, Zijian Zhang, Shao-Lun Huang, Danny Dongning Sun, Xiao-Ping Zhang</dc:creator>
    </item>
    <item>
      <title>Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era</title>
      <link>https://arxiv.org/abs/2510.04127</link>
      <description>arXiv:2510.04127v2 Announce Type: replace 
Abstract: Approximate nearest neighbour (ANN) search underpins large-scale retrieval, increasingly within the retrieval-augmented
  generation pipelines that ground large language models, yet the methods that address it have multiplied across communities
  until they are seldom read as a single field. We argue they form one field with three design choices, and develop the
  projection-quantisation-organisation (PQO) lens, under which locality-sensitive hashing, learned binary hashing, deep
  end-to-end hashing, product quantisation, graph-based indexes, and the binary embeddings of modern vector databases are all
  settings of three coupled questions: where to place the projections, where to place the quantisation thresholds, and how to
  organise the resulting codes. The projection-then-quantisation reading is established; our contribution is the third,
  co-equal organisation stage, a demonstration that the three run unbroken from the field's origins to the deep,
  product-quantisation, graph, and retrieval-augmented eras, and a reproducible measurement that turns the lens from
  classifying methods to predicting them. The measurement yields three findings. First, memory is won on the quantisation axis:
  a one-bit code is a thirty-second the size of the float, and a single full-precision re-ranking pass over a short candidate
  list recovers uncompressed quality in full. Second, the trade-off orderings the lens anticipates recur unchanged as the
  embedding grows. Third, where supervision is available, an eight-byte code more than doubles the quality of the two-kilobyte
  float it replaces. We release these measurements as BitBudget, an extensible benchmark with a live leaderboard, recast
  generative retrieval's "semantic identifiers" as quantisation codes, and identify the open problems that follow as compact
  codes return to the centre of large-scale retrieval.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.04127v2</guid>
      <category>cs.IR</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Sean Moran</dc:creator>
    </item>
    <item>
      <title>Rivaling Transformers: Multi-Scale Structured State-Space Mixtures for Agentic 6G O-RAN</title>
      <link>https://arxiv.org/abs/2510.05255</link>
      <description>arXiv:2510.05255v2 Announce Type: replace 
Abstract: In sixth-generation (6G) Open Radio Access Networks (O-RAN), proactive control is preferable. A key open challenge is delivering control-grade predictions within Near-Real-Time (Near-RT) latency and computational constraints under multi-timescale dynamics. We therefore cast RAN Intelligent Controller (RIC) analytics as an agentic perceive-predict xApp that turns noisy, multivariate RAN telemetry into short-horizon per-User Equipment (UE) key performance indicator (KPI) forecasts to drive anticipatory control. In this regard, Transformers are powerful for sequence learning and time-series forecasting, but they are memory-intensive, which limits Near-RT RIC use. Therefore, we need models that maintain accuracy while reducing latency and data movement. To this end, we propose a lightweight Multi-Scale Structured State-Space Mixtures (MS3M) forecaster that mixes HiPPO-LegS kernels to capture multi-timescale radio dynamics. We develop stable discrete state-space models (SSMs) via bilinear (Tustin) discretization and apply their causal impulse responses as per-feature depthwise convolutions. Squeeze-and-Excitation gating dynamically reweights KPI channels as conditions change, and a compact gated channel-mixing layer models cross-feature nonlinearities without Transformer-level cost. The model is KPI-agnostic -- Reference Signal Received Power (RSRP) serves as a canonical use case -- and is trained on sliding windows to predict the immediate next step. Empirical evaluations are conducted using our bespoke O-RAN testbed KPI time-series dataset (59,441 windows across 13 KPIs). Crucially for O-RAN constraints, MS3M achieves a 0.057 s per-inference latency with 0.70M parameters, yielding 3-10x lower latency than the Transformer baselines evaluated on the same hardware, while maintaining competitive accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.05255v2</guid>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Farhad Rezazadeh, Hatim Chergui, Merouane Debbah, Houbing Song, Lingjia Liu</dc:creator>
    </item>
    <item>
      <title>Mitigating Diffusion Model Hallucinations with Dynamic Guidance</title>
      <link>https://arxiv.org/abs/2510.05356</link>
      <description>arXiv:2510.05356v2 Announce Type: replace 
Abstract: Hallucinations in diffusion models are samples with structural inconsistencies that can emerge due to the excessive smoothing of the learned score function, which in turn leads to interpolations between modes of the data distribution. Since semantic interpolations are often desirable and contribute to sample diversity, we believe that a nuanced and targeted solution is required to address diffusion model hallucinations. In this work, we introduce Dynamic Guidance, which mitigates hallucinations by selectively sharpening the score function only along the pre-determined directions known to cause artifacts, while preserving valid semantic variations. This sharpening can be performed using either pre-determined classes or semantically coherent clusters that form pseudo-classes over the data distribution. The latter allows for a principled extension of Dynamic Guidance to text-to-image generation, where we select modes to correspond to fine-grained contextual differences in textual descriptions. To our knowledge, this is the first approach that addresses hallucinations at generation time rather than through post-hoc filtering. Dynamic Guidance substantially reduces hallucinations on both controlled and natural image datasets, significantly outperforming baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.05356v2</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kostas Triaridis, Alexandros Graikos, Aggelina Chatziagapi, Grigorios G. Chrysos, Dimitris Samaras</dc:creator>
    </item>
    <item>
      <title>MixReasoning: Switching Modes to Think</title>
      <link>https://arxiv.org/abs/2510.06052</link>
      <description>arXiv:2510.06052v2 Announce Type: replace 
Abstract: Reasoning models enhance performance by tackling problems in a step-by-step manner, decomposing them into sub-problems and exploring long chains of thought before producing an answer. However, applying extended reasoning to every step introduces substantial redundancy, as sub-problems vary widely in difficulty and complexity: a small number of pivotal steps are genuinely challenging and decisive for the final answer, while many others only involve straightforward revisions or simple computations. Therefore, a natural idea is to endow reasoning models with the ability to adaptively respond to this variation, rather than treating all steps with the same level of elaboration. To this end, we propose MixReasoning, a framework that dynamically adjusts the depth of reasoning within a single response. The resulting chain of thought then becomes a mixture of detailed reasoning on difficult steps and concise inference on simpler ones. Experiments on GSM8K, MATH-500, and AIME show that MixReasoning shortens reasoning length and substantially improves efficiency without compromising accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.06052v2</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haiquan Lu, Gongfan Fang, Xinyin Ma, Qi Li, Xinchao Wang</dc:creator>
    </item>
    <item>
      <title>How Well Do Latent World Models Understand Partially Observable Safety Constraints?</title>
      <link>https://arxiv.org/abs/2510.06492</link>
      <description>arXiv:2510.06492v2 Announce Type: replace 
Abstract: Latent world models are a promising approach for learning state representations and dynamics directly from high-dimensional observations, enabling robot control in hard-to-model settings. However, control performance ultimately depends on the latent representation encoding the required information for the task. In this work, we study latent-space safe control problems and show how partial observability can induce control failures when safety-relevant information is not preserved in the latent state. Specifically, we identify two world model failure modes: estimation gaps, where current observations do not reveal safety-critical quantities (e.g., temperature in a cooking task), and prediction gaps, where failures are observable once they occur but cannot be reliably anticipated from available observations. We introduce two diagnostics for these gaps: a mutual-information-based measure of safety observability and a rollout-based measure of future safety predictability. Finally, we present mitigation strategies for each failure mode: privileged multimodal supervision for estimation gaps and conformal risk calibration for prediction gaps. Across two hardware case studies -- using unimodal RGB world models and multimodal RGB+Tactile and RGB+Thermal variants -- we show that these mitigation strategies improve the safety of a Franka Research 3 manipulator on challenging cooking tasks under partial observability, albeit with increased conservativeness. More broadly, our work raises the question of when world model state representations are sufficient for reliable robot control.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.06492v2</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Matthew Kim, Kensuke Nakamura, Andrea Bajcsy</dc:creator>
    </item>
    <item>
      <title>Large Language Models for Imbalanced Classification: Diversity makes the difference</title>
      <link>https://arxiv.org/abs/2510.09783</link>
      <description>arXiv:2510.09783v2 Announce Type: replace 
Abstract: Oversampling is one of the most widely used approaches for addressing imbalanced classification. The core idea is to generate additional minority samples to rebalance the dataset. Most existing methods, such as SMOTE, require converting categorical variables into numerical vectors, which often leads to information loss. Recently, large language model (LLM)-based methods have been introduced to overcome this limitation. However, current LLM-based approaches typically generate minority samples with limited diversity, reducing robustness and generalizability in downstream classification tasks. To address this gap, we propose a novel LLM-based oversampling method designed to enhance diversity. First, we introduce a sampling strategy that conditions synthetic sample generation on both minority labels and features. Second, we develop a new permutation strategy for fine-tuning pre-trained LLMs. Third, we fine-tune the LLM not only on minority samples but also on interpolated samples to further enrich variability. Extensive experiments on 10 tabular datasets demonstrate that our method significantly outperforms eight SOTA baselines. The generated synthetic samples are both realistic and diverse. Moreover, we provide theoretical analysis through an entropy-based perspective, proving that our method encourages diversity in the generated samples.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.09783v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dang Nguyen, Sunil Gupta, Kien Do, Thin Nguyen, Taylor Braund, Alexis Whitton, Svetha Venkatesh</dc:creator>
    </item>
    <item>
      <title>Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization</title>
      <link>https://arxiv.org/abs/2510.10028</link>
      <description>arXiv:2510.10028v2 Announce Type: replace 
Abstract: The rapid advancement of Low-Altitude Economy Networks (LAENets) has enabled a variety of applications, including aerial surveillance, environmental sensing, and semantic data collection. To support these scenarios, unmanned aerial vehicles (UAVs) equipped with onboard vision-language models (VLMs) offer a promising solution for real-time multimodal inference. However, ensuring both inference accuracy and communication efficiency remains a significant challenge due to limited onboard resources and dynamic network conditions. In this paper, we first propose a UAV-enabled LAENet system model that jointly captures UAV mobility, user-UAV communication, and the onboard visual question answering (VQA) pipeline. Based on this model, we formulate a mixed-integer non-convex optimization problem to minimize task latency and power consumption under user-specific accuracy constraints. To solve the problem, we design a hierarchical optimization framework composed of two parts: (i) an Alternating Resolution and Power Optimization (ARPO) algorithm for resource allocation under accuracy constraints, and (ii) a Large Language Model-augmented Reinforcement Learning Approach (LLaRA) for adaptive UAV trajectory optimization. The large language model (LLM) serves as an expert in refining reward design of reinforcement learning in an offline fashion, introducing no additional latency in real-time decision-making. Numerical results demonstrate the efficacy of our proposed framework in improving inference performance and communication efficiency under dynamic LAENet conditions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.10028v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yang Li, Ruichen Zhang, Yinqiu Liu, Guangyuan Liu, Abbas Jamalipour, Xianbin Wang, Dong In Kim</dc:creator>
    </item>
    <item>
      <title>RECON: Reasoning with Condensation for Efficient Retrieval-Augmented Generation</title>
      <link>https://arxiv.org/abs/2510.10448</link>
      <description>arXiv:2510.10448v2 Announce Type: replace 
Abstract: Search agents trained with reinforcement learning (RL) interleave reasoning with tool calls in a multi-turn, tool-integrated reasoning (TIR) loop, where each tool invocation returns an environment observation that is appended to the agent's context. As the rollout proceeds, these raw observations accumulate, inflating token cost and diluting the signal available for downstream reasoning. Unlike single-pass retrieve-then-read pipelines, where context compression is a one-time postprocessing step, the multi-turn RL setting requires compression that runs at every observation step while remaining decoupled from policy optimization. We introduce RECON (REasoning with CONdensation), a framework that addresses this challenge by inserting a dedicated observation compressor into the reasoning loop. The compressor is trained via a two-stage curriculum: relevance pretraining on QA datasets followed by multi-aspect distillation from proprietary LLMs, and remains frozen during RL training to preserve policy stability. Integrated into the Search-R1 search-agent pipeline, RECON reduces total context length by 35%, improves training speed by 5.4% and inference latency by 30.9%, while boosting average exact-match by 14.5% on the 3B agent and 3.0% on the 7B agent, with particular strength in multi-hop QA. These results establish learned observation compression as a key component for building practical, scalable RL-trained search agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.10448v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhichao Xu, Minheng Wang, Yawei Wang, Wenqian Ye, Yuntao Du, Yunpu Ma, Yijun Tian</dc:creator>
    </item>
    <item>
      <title>MatSciBench: Benchmarking the Reasoning Ability of Large Language Models in Materials Science</title>
      <link>https://arxiv.org/abs/2510.12171</link>
      <description>arXiv:2510.12171v2 Announce Type: replace 
Abstract: Large Language Models have shown strong scientific reasoning ability, but their performance on materials science problems remains less studied. To fill this gap, we introduce MatSciBench, a comprehensive college-level benchmark comprising 1340 problems that span the essential subdisciplines of materials science. MatSciBench features a structured and fine-grained taxonomy that categorizes materials science questions into 6 primary fields and 31 subfields, together with a three-tier difficulty classification based on the reasoning length needed to solve each problem. MatSciBench includes detailed reference solutions for 946 questions, supports process-level error analysis, and contains 315 questions with images for evaluating multimodal reasoning. We evaluate leading thinking and non-thinking LLMs on MatSciBench, and further test three reasoning methods for non-thinking models: basic chain-of-thought prompting, tool augmentation, and self-correction. The results show that current models still face clear limits in college-level materials science reasoning. DeepSeek-R1 achieves the highest score on text-only questions at 75.22% accuracy, and GPT-5 performs the best on questions with images at 53.02%. Our analysis shows that tool augmentation improves many non-thinking models in a token-efficient way, while self-correction often fails to provide reliable gains and can revise correct answers into incorrect ones. We further analyze performance across difficulty levels, reasoning efficiency, multimodal reasoning, and failure patterns, and find that current models are mainly limited by domain knowledge gaps, calculation errors, problem comprehension failures, and difficulty in extracting precise information from scientific figures. Overall, MatSciBench provides a clear testbed for measuring current LLM limitations and guiding future work on scientific reasoning in materials science.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.12171v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Junkai Zhang, Jingru Gan, Xiaoxuan Wang, Zian Jia, Changquan Gu, Jianpeng Chen, Yanqiao Zhu, Mingyu Derek Ma, Dawei Zhou, Ling Li, Wei Wang</dc:creator>
    </item>
    <item>
      <title>Accelerating Bidiagonalization of Banded Matrices through Memory-Aware Bulge-Chasing on GPUs</title>
      <link>https://arxiv.org/abs/2510.12705</link>
      <description>arXiv:2510.12705v3 Announce Type: replace 
Abstract: The reduction of a banded matrix to bidiagonal form is a critical step in the calculation of singular values, a cornerstone of scientific computing and AI. Although inherently parallel, this step has traditionally been considered unsuitable for GPUs due to its memory-bound nature. However, recent advances in GPU architectures, such as increased L1 memory per Streaming Multiprocessor or Compute Unit and larger L2 caches, have shifted this paradigm. In this work, we present the first GPU-accelerated algorithm for reducing a banded matrix to bidiagonal form, integrated into an open-source software package. Our algorithm builds on prior multicore CPU cache-efficient bulge-chasing methods, adapted to modern GPU architectures to optimize throughput. Leveraging Julia's high-level array abstractions and KernelAbstractions.jl, we implement a single function that is both hardware-agnostic and data-precision-aware, running efficiently across NVIDIA, AMD, Intel, and Apple Metal GPUs. We develop a hardware-aware performance model to guide tuning and identify key hyperparameters that govern optimal GPU performance for memory-bound workloads. We show that such workloads, when carefully optimized, can achieve substantial speed-ups on modern GPUs: our implementation outperforms multithreaded CPU libraries (PLASMA, SLATE) starting from matrix sizes as small as 1024x1024, and achieves over 100x speed-up on 32k x 32k matrices. Moreover, the algorithm's performance scales linearly with the matrix bandwidth, enabling efficient reduction of matrices with larger bandwidths, previously considered impractical.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.12705v3</guid>
      <category>cs.DC</category>
      <category>cs.MS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Evelyne Ringoot, Rabab Alomairy, Alan Edelman</dc:creator>
    </item>
    <item>
      <title>Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization</title>
      <link>https://arxiv.org/abs/2510.13554</link>
      <description>arXiv:2510.13554v2 Announce Type: replace 
Abstract: The reasoning pattern of large language models (LLMs) remains opaque, and reinforcement learning (RL) typically applies uniform credit across an entire generation, blurring the distinction between pivotal and routine steps. This work positions attention as a privileged substrate that renders the internal logic of LLMs legible, not merely as a byproduct of computation, but as a mechanistic blueprint of reasoning itself. We first distinguish between locally and globally focused attention heads and reveal that locally focused heads produce a sawtooth pattern near the diagonal indicating phrasal chunks, while globally focused heads expose tokens that exert broad downstream influence over future tokens. We formalize these with two metrics: 1) Windowed Average Attention Distance, which measures the extent of backward attention within a clipped window; 2) Future Attention Influence, which quantifies a token's global importance as the average attention it receives from subsequent tokens. Taken together, these signals reveal a recurring preplan-and-anchor mechanism, where the model first performs a long-range contextual reference to generate an introductory token, which is immediately followed by or coincides with a semantic anchor token that organizes subsequent reasoning. Leveraging these insights, we introduce three novel RL strategies that dynamically perform targeted credit assignment to critical nodes (preplan tokens, anchor tokens, and their temporal coupling) and show consistent performance gains across various reasoning tasks. By aligning optimization with the model's intrinsic reasoning rhythm, we aim to transform opaque optimization into an actionable structure-aware process, hoping to offer a potential step toward more transparent and effective optimization of LLM reasoning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.13554v2</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yang Li, Zhichen Dong, Yuhan Sun, Weixun Wang, Shaopan Xiong, Yijia Luo, Jiashun Liu, Han Lu, Jiamang Wang, Wenbo Su, Bo Zheng, Junchi Yan</dc:creator>
    </item>
    <item>
      <title>GLP: A Grassroots, Multiagent, Concurrent, Logic Programming Language for AI (Full Version)</title>
      <link>https://arxiv.org/abs/2510.15747</link>
      <description>arXiv:2510.15747v3 Announce Type: replace 
Abstract: A grassroots platform is a multiagent distributed system in which multiple independent instances can form and operate independently of each other and of any global resource, yet may coalesce into ever larger instances, possibly resulting in a single global instance. Grassroots platforms aim to offer an egalitarian/democratic alternative to centralised/autocratic and decentralised/plutocratic global platforms.
  Here, we present Grassroots Logic Programs (GLP), a multiagent concurrent logic programming language designed for the implementation of grassroots platforms: we recall the standard operational semantics of logic programs; introduce the concurrent operational semantics of GLP as its restriction; recall multiagent atomic transactions; use them to introduce a multiagent operational semantics of GLP; and prove multiagent GLP to be grassroots. The grassroots social graph -- the foundational grassroots platform on which all others are based -- serves as a GLP programming example.
  These mathematical foundations are being used by AI to implement GLP as well as to program in GLP: a workstation-based implementation of concurrent GLP in Dart was derived from the concurrent operational semantics of GLP; a multiagent smartphone-based implementation of GLP in Dart/Flutter is being developed based on the multiagent operational semantics of GLP; a moded type system for GLP was designed (and implemented by AI in Dart) to facilitate collaborative human-AI development of GLP programs, where AI derives working GLP programs from human-approved type definitions and declarations; GLP implementations of grassroots platforms for the social graph, social networks, currencies and bonds, and more, have been derived by AI from mathematical specifications written as volitional multiagent atomic transactions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.15747v3</guid>
      <category>cs.PL</category>
      <category>cs.CR</category>
      <category>cs.DC</category>
      <category>cs.LO</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Ehud Shapiro</dc:creator>
    </item>
    <item>
      <title>TAO: Tolerance-Aware Optimistic Verification for Floating-Point Neural Networks</title>
      <link>https://arxiv.org/abs/2510.16028</link>
      <description>arXiv:2510.16028v4 Announce Type: replace 
Abstract: Neural networks increasingly run on hardware outside the user's control (cloud GPUs, inference marketplaces). Yet ML-as-a-Service reveals little about what actually ran or whether returned outputs faithfully reflect the intended inputs. Users lack recourse against service downgrades (model swaps, quantization, graph rewrites, or discrepancies like altered ad embeddings). Verifying outputs is hard because floating-point (FP) execution on heterogeneous accelerators is inherently nondeterministic. Existing approaches are either impractical for real FP neural networks or reintroduce vendor trust. We present TAO: a Tolerance-Aware Optimistic verification protocol that accepts outputs within principled operator-level acceptance regions rather than requiring bitwise equality. TAO combines two error models: (i) sound per-operator IEEE-754 worst-case bounds and (ii) tight empirical percentile profiles calibrated across hardware. Discrepancies trigger a Merkle-anchored, threshold-guided dispute game that recursively partitions the computation graph until one operator remains, where adjudication reduces to a lightweight theoretical-bound check or a small honest-majority vote against empirical thresholds. Unchallenged results finalize after a challenge window, without requiring trusted hardware or deterministic kernels. We implement TAO as a PyTorch-compatible runtime and a contract layer currently deployed on Ethereum Holesky testnet. The runtime instruments graphs, computes per-operator bounds, and runs unmodified vendor kernels in FP32 with negligible overhead (0.3% on Qwen3-8B). Across CNNs, Transformers and diffusion models on A100, H100, RTX6000, RTX4090, empirical thresholds are $10^2-10^3$ times tighter than theoretical bounds, and bound-aware adversarial attacks achieve 0% success. Together, TAO reconciles scalability with verifiability for real-world heterogeneous ML compute.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.16028v4</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1145/3767295.3803612</arxiv:DOI>
      <arxiv:journal_reference>Proceedings of the 21st European Conference on Computer Systems, (2026) 1515-1532</arxiv:journal_reference>
      <dc:creator>Jianzhu Yao, Hongxu Su, Taobo Liao, Zerui Cheng, Huan Zhang, Xuechao Wang, Pramod Viswanath</dc:creator>
    </item>
    <item>
      <title>PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits</title>
      <link>https://arxiv.org/abs/2510.17947</link>
      <description>arXiv:2510.17947v3 Announce Type: replace 
Abstract: Large Language Models (LLMs) are improving at an exceptional rate. With the advent of agentic workflows, multi-turn dialogue has become the de facto mode of interaction with LLMs for completing long and complex tasks. While LLM capabilities continue to improve, they remain susceptible to jailbreaking, especially in multi-turn scenarios where harmful intent can be subtly injected across the conversation to produce nefarious outcomes. While single-turn attacks have been extensively explored, adaptability, efficiency and effectiveness remain key challenges for their multi-turn counterparts. To address these gaps, we present PLAGUE, a novel plug-and-play framework for designing multi-turn attacks inspired by lifelong-learning agents. PLAGUE dissects the lifetime of a multi-turn attack into three carefully designed phases (Primer, Planner and Finisher) that enable a systematic and information-rich exploration of the multi-turn attack family. Evaluations show that red-teaming agents designed using PLAGUE achieve state-of-the-art jailbreaking results, improving attack success rates (ASR) by more than 30% across leading models within a smaller or comparable query budget. In particular, PLAGUE enables an ASR (based on StrongReject) of 81.4% on OpenAI's o3 and 67.3% on Anthropic's Claude Opus 4.1, two models that are considered highly resistant to jailbreaks in safety literature. Our work offers tools and insights to understand the importance of plan initialization, context optimization and lifelong learning in crafting multi-turn attacks for a comprehensive model vulnerability evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.17947v3</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Neeladri Bhuiya, Madhav Aggarwal, Diptanshu Purwar</dc:creator>
    </item>
    <item>
      <title>AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library</title>
      <link>https://arxiv.org/abs/2510.18428</link>
      <description>arXiv:2510.18428v4 Announce Type: replace 
Abstract: Optimization modeling underlies critical decision-making across industries, yet remains difficult to automate: natural-language problem descriptions must be translated into precise mathematical formulations and executable solver code. Existing LLM-based approaches typically rely on brittle prompting or costly retraining, both of which offer limited generalization. Recent work suggests that large models can improve via experience reuse, but how to systematically acquire, refine, and reuse such experience in structurally constrained settings remains unclear. We present AlphaOPT, a self-improving experience library that enables LLMs to learn optimization modeling knowledge from limited supervision, including answer-only feedback without gold-standard programs, annotated reasoning traces, or parameter updates. AlphaOPT operates in a continual two-phase cycle: a Library Learning phase that extracts solver-verified, structured insights from failed attempts, and a Library Evolution phase that refines the applicability of stored insights based on aggregate evidence across tasks. This design allows the model to accumulate reusable modeling principles, improve transfer across problem instances, and maintain bounded library growth over time. Evaluated on multiple optimization benchmarks, AlphaOPT steadily improves as more training data become available (65% → 72% from 100 to 300 training items) and outperforms the strongest baseline by 9.1% and 8.2% on two out-of-distribution datasets. These results demonstrate that structured experience learning, grounded in solver feedback, provides a practical alternative to retraining for complex reasoning tasks requiring precise formulation and execution. All code and data are available at: https://github.com/Minw913/AlphaOPT.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.18428v4</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Minwei Kong, Ao Qu, Xiaotong Guo, Wenbin Ouyang, Chonghe Jiang, Han Zheng, Yining Ma, Dingyi Zhuang, Yuhan Tang, Junyi Li, Shenhao Wang, Haris Koutsopoulos, Hai Wang, Cathy Wu, Jinhua Zhao</dc:creator>
    </item>
    <item>
      <title>When Users Are Happy but Agents Are Wrong: Multi-Dimensional Evaluation of Tool-Augmented Dialogue</title>
      <link>https://arxiv.org/abs/2510.19186</link>
      <description>arXiv:2510.19186v2 Announce Type: replace 
Abstract: Evaluating conversational AI systems that use external tools is challenging, as errors can arise from complex interactions among user, agent, and tools. While existing evaluation methods assess either user satisfaction or agents' tool-calling capabilities, they fail to capture critical errors in multi-turn tool-augmented dialogues, such as when agents misinterpret tool results yet appear satisfactory to users. We introduce TRACE, a benchmark of systematically synthesized tool-augmented conversations covering diverse error cases. Evaluation with state-of-the-art conversation evaluation frameworks reveals that all approaches remain far from ideal performance, demonstrating the fundamental difficulty of this benchmark.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.19186v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Tanya Shourya, Yingfan Wang, Zhaoyi Joey Hou, Shamik Roy, Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah</dc:creator>
    </item>
    <item>
      <title>Optimal Kron-based Reduction of Networks (Opti-KRON) for Three-phase Distribution Feeders</title>
      <link>https://arxiv.org/abs/2510.19608</link>
      <description>arXiv:2510.19608v3 Announce Type: replace 
Abstract: This paper presents a novel structure-preserving, Kron-based reduction framework for unbalanced distribution feeders. The method aggregates electrically similar nodes within a mixed-integer optimization (MIP) problem to produce reduced networks that optimally reproduce the voltage profiles of the original full network. To overcome computational bottlenecks of MIP formulations, we propose an exhaustive-search formulation to identify optimal aggregation decisions while enforcing voltage margin limits. The proposed exhaustive network reduction algorithm is parallelizable on GPUs, which enables scalable network reduction. The resulting reduced networks approximate the full system's voltage profiles with low errors and are suitable for steady-state analysis and optimal power flow studies. The framework is validated on two real utility distribution feeders with 5,991 and 8,381 nodes. The reduced models achieve up to 90% and 80% network reduction, respectively, while the maximum voltage-magnitude error remains below 0.003 p.u. Furthermore, on a 1000-node version of the network, the GPU-accelerated reduction algorithm runs up to 15x faster than its CPU-based counterpart.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.19608v3</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Omid Mokhtari, Samuel Chevalier, Mads Almassalkhi</dc:creator>
    </item>
    <item>
      <title>PEDRA: Evaluating the Realism of Pedestrian Dynamics in Video Generation</title>
      <link>https://arxiv.org/abs/2510.20182</link>
      <description>arXiv:2510.20182v2 Announce Type: replace 
Abstract: Pedestrian simulation traditionally relies on expert-tuned, hand-crafted models that limit scalability and generalization. Meanwhile, large-scale video generation models have achieved high visual realism across diverse settings, motivating exploration of their potential as general-purpose world simulators. Existing benchmarks primarily assess single-subject realism rather than scenes with multiple interacting people, leaving the plausibility of multi-agent dynamics in generated videos untested. We propose a rigorous evaluation protocol to benchmark text-to-video (T2V) and image-to-video (I2V) models as implicit simulators of pedestrian dynamics. For I2V, we leverage start frames from established datasets to enable direct comparison with ground truth videos, while for T2V we design a prompt suite covering varied crowd densities and interaction types. A key component is a method to reconstruct 2D bird's-eye view trajectories from pixel-space without known camera parameters. Our analysis shows that leading models exhibit effective priors for plausible multi-agent behavior, though issues such as merging and disappearing pedestrians reveal limits to their physical consistency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.20182v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Aaron Appelle, Jerome P. Lynch</dc:creator>
    </item>
    <item>
      <title>A Variational Framework for the Complexity of PDE Solutions</title>
      <link>https://arxiv.org/abs/2510.21290</link>
      <description>arXiv:2510.21290v3 Announce Type: replace 
Abstract: Partial Differential Equations (PDEs) are fundamental mathematical models for describing physical phenomena, yet most PDEs of practical interest require numerical approximations. The feasibility of such methods is constrained by existing computational models. Since digital computers are the primary realizations of numerical computations, and Turing machines define their theoretical limits, computability of PDE solutions is of fundamental significance. It provides a rigorous framework to distinguish equations that are effectively solvable from those that encode undecidable or non-computable behavior. Once computability is established, complexity theory quantifies the resources required to approximate PDE solutions. In this work, we present a novel framework based on least-squares variational formulations and associated gradient flows to analyze the computability and complexity of PDE solutions from an optimization perspective. Our approach approximates PDE solution operators via discrete gradient flows, linking PDE properties, such as coercivity, ellipticity, and convexity, to solution complexity. Within this setting, we characterize representation- and discretization-dependent sufficient conditions for regimes where PDEs admit polynomial-time approximations, as well as regimes exhibiting complexity blowup, where polynomial-time input data produce solutions with super-polynomial complexity. In summary, this paper develops a variational framework for analyzing computability and computational complexity of PDE solution classes. The results show how PDE structure and solution regularity influence their complexity, by establishing sufficient conditions for computability and complexity bounds. Beyond the theoretical characterization, the framework provides guidelines for effective numerical methods and contributes to understanding the limitations of digital computation for PDE problems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.21290v3</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Juan Esteban Suarez Cardona, Holger Boche, Gitta Kutyniok</dc:creator>
    </item>
    <item>
      <title>Security Analysis of LTE Connectivity in Connected Cars: A Case Study of Tesla</title>
      <link>https://arxiv.org/abs/2510.22024</link>
      <description>arXiv:2510.22024v2 Announce Type: replace 
Abstract: Modern connected vehicles rely on persistent LTE connectivity to enable remote diagnostics, over-the-air (OTA) updates, and safety-relevant services. While mobile network vulnerabilities are well documented in the smartphone ecosystem, their impact in safety-relevant automotive settings remains insufficiently examined. We conduct a black-box case study of LTE security in Tesla's Model 3 and Cybertruck, revealing systemic protocol weaknesses and architectural misconfigurations in connected vehicles. We find that Tesla's telematics stack is susceptible to IMSI catching, rogue base station hijacking, and insecure fallback mechanisms that may silently degrade service availability. Furthermore, legacy control-plane configurations allow for silent SMS injection and broadcast message spoofing without driver awareness. While the vulnerabilities are grounded in Tesla, this case study suggests broader implications for connected-vehicle telematics and for regulatory frameworks such as ISO/SAE 21434 and UN R155/R156, which assume secure, traceable, and resilient telematics in modern vehicles.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.22024v2</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Evangelos Bitsikas, Jason Veara, Aanjhan Ranganathan</dc:creator>
    </item>
    <item>
      <title>SmartMixed: A Two-Phase Training Strategy for Adaptive Activation Function Learning in Neural Networks</title>
      <link>https://arxiv.org/abs/2510.22450</link>
      <description>arXiv:2510.22450v3 Announce Type: replace 
Abstract: The choice of activation function plays a critical role in neural networks, yet most architectures still rely on fixed, uniform activation functions across all neurons. We introduce SmartMixed, a novel two-phase training strategy that allows networks to learn optimal per-neuron activation functions while preserving computational efficiency at inference. In the first phase, neurons adaptively select from a pool of candidate activation functions (ReLU, Sigmoid, Tanh, Leaky_ReLU, ELU, SELU) using a differentiable hard mixture mechanism. In the second phase, each neuron's activation function is fixed according to the learned selection, resulting in a computationally efficient network that supports continued training with optimized vectorized operations. We evaluate SmartMixed on the MNIST dataset using feedforward neural networks of different architectures. Our analysis reveals that neurons in different layers exhibit distinct preferences for activation functions, providing insights into the functional diversity within neural architectures. We also demonstrate that SmartMixed effectively trains the network by allowing neurons to select their preferred activation functions, competing against models using a single fixed state-of-the-art activation function.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.22450v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Amin Omidvar</dc:creator>
    </item>
    <item>
      <title>Future-Proofing Authentication Against Insecure Bootstrapping for 5G Networks: Feasibility, Resiliency, and Accountability</title>
      <link>https://arxiv.org/abs/2510.23457</link>
      <description>arXiv:2510.23457v3 Announce Type: replace 
Abstract: The 5G protocol lacks a robust base station (BS) authentication mechanism during the initial bootstrapping phase, leaving it susceptible to fake BSs, spoofed broadcasts, and large-scale manipulation of System Information Blocks (SIBs). Existing solutions incur high communication overhead, rely on centralized trust, and lack accountability and long-term breach resiliency. Given the inevitability of BS compromise and the severe impact of forged SIBs as the root of trust (e.g., fake alerts, tracking, false roaming), distributed trust, verifiable forgery detection, and audit logging are essential yet remain largely unexplored. These challenges are further amplified by the emergence of quantum-capable adversaries. While NIST Post-Quantum Cryptography (PQC) standards are widely viewed as a path toward long-term security, their feasibility under 5G's strict packet-size, latency, and broadcast constraints has not been systematically studied. This work presents, to our knowledge, the first comprehensive network-level performance characterization of integrating NIST-PQC standards and conventional digital signatures into 5G BS authentication, showing that direct PQC adoption is impractical due to excessive signature sizes, fragmentation, and protocol-level delays. To address these challenges, we propose BORG, a future-proof authentication framework based on a Hierarchical Identity-Based Threshold Signature with Fail-Stop (HITFS) properties. BORG distributes trust across multiple BSs via threshold signing, enables post-mortem verifiable forgery detection, and provides tamper-evident, PQ-secure audit logging, while maintaining compact signatures that fit within a single SIB1 packet without fragmentation and incurring minimal UE overhead, as validated through our real over-the-air 5G testbed implementation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.23457v3</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Saleh Darzi, Mirza Masfiqur Rahman, Imtiaz Karim, Rouzbeh Behnia, Attila A Yavuz, Elisa Bertino</dc:creator>
    </item>
    <item>
      <title>Pragmatic Theories Enhance Understanding of Implied Meanings in LLMs</title>
      <link>https://arxiv.org/abs/2510.26253</link>
      <description>arXiv:2510.26253v3 Announce Type: replace 
Abstract: The ability to accurately interpret implied meanings plays a crucial role in human communication and language use, and language models are also expected to possess this capability. This study demonstrates that providing language models with pragmatic theories as prompts is an effective in-context learning approach for tasks that require understanding implied meanings. Specifically, we propose an approach in which an overview of pragmatic theories, such as Gricean pragmatics and Relevance Theory, is presented as a prompt to the language model, guiding it through a step-by-step reasoning process to derive a final interpretation. Experimental results show that, compared to the baseline, which prompts intermediate reasoning without presenting pragmatic theories (0-shot Chain-of-Thought), our methods enable language models to achieve up to 9.6% higher scores on pragmatic reasoning tasks. Furthermore, we show that even without explaining the details of pragmatic theories, merely mentioning their names in the prompt leads to a modest performance improvement (around 1-3%) in larger models compared to the baseline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.26253v3</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Takuma Sato, Seiya Kawano, Koichiro Yoshino</dc:creator>
    </item>
    <item>
      <title>A Survey of Heterogeneous Graph Neural Networks for Cybersecurity Anomaly Detection</title>
      <link>https://arxiv.org/abs/2510.26307</link>
      <description>arXiv:2510.26307v3 Announce Type: replace 
Abstract: Anomaly detection is a critical task in cybersecurity, where identifying insider threats, access violations, and coordinated attacks is essential for ensuring system resilience. Graph-based approaches have become increasingly important for modeling entity interactions, yet most rely on homogeneous and static structures, which limits their ability to capture the heterogeneity and temporal evolution of real-world environments. Heterogeneous Graph Neural Networks (HGNNs) have emerged as a promising paradigm for anomaly detection by incorporating type-aware transformations and relation-sensitive aggregation, enabling more expressive modeling of complex cyber data. However, current research on HGNN-based anomaly detection remains fragmented, with diverse modeling strategies, limited comparative evaluation, and an absence of standardized benchmarks. To address this gap, we provide a comprehensive survey of HGNN-based anomaly detection methods in cybersecurity. We introduce a taxonomy that classifies approaches by anomaly type and graph dynamics, analyze representative models, and map them to key cybersecurity applications. We also review commonly used benchmark datasets and evaluation metrics, highlighting their strengths and limitations. Finally, we identify key open challenges related to modeling, data, and deployment, and outline promising directions for future research. This survey aims to establish a structured foundation for advancing HGNN-based anomaly detection toward scalable, interpretable, and practically deployable solutions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.26307v3</guid>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Laura Jiang, Reza Ryan, Qian Li, Nasim Ferdosian</dc:creator>
    </item>
    <item>
      <title>TempoBench: Evaluating Temporal Causal Reasoning in Large Language Models</title>
      <link>https://arxiv.org/abs/2510.27544</link>
      <description>arXiv:2510.27544v2 Announce Type: replace 
Abstract: Temporal reasoning involves understanding how systems evolve over time through input-driven state transitions. A key aspect is temporal causal reasoning: reasoning about which prior inputs were necessary to cause an observed outcome. While large language models (LLMs) perform well at forward simulation, predicting outputs from inputs, they struggle to identify the minimal causal inputs of outcomes. To study this distinction, we define two tasks: trace simulation (SIM), which requires models to simulate system execution, and minimal causal attribution (MIN), which identifies the minimal set of inputs necessary for a given outcome. We introduce TempoBench, the first formally verified benchmark for temporal causal reasoning, built from synthesized Mealy machines with controllable complexity and provably correct causal labels. Across frontier models, we observe that despite achieving up to 96% accuracy on the SIM task, performance on the causal attribution MIN task drops below 25%; models fail to reason about causal necessity. Over 94% of causal errors involve overspecification, where models perform retrieval and list all possible inputs rather than reasoning about the minimal causal subset. Fine-tuning on the TempoBench training corpus improves causal reasoning and generalizes better than math, code, or instruction training, with gains across standard reasoning benchmarks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.27544v2</guid>
      <category>cs.AI</category>
      <category>cs.FL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Nikolaus Holzer, William Fishell, Baishakhi Ray, Mark Santolucito</dc:creator>
    </item>
    <item>
      <title>pacSTL: PAC-Bounded Signal Temporal Logic from Data-Driven Reachability Analysis</title>
      <link>https://arxiv.org/abs/2511.00934</link>
      <description>arXiv:2511.00934v2 Announce Type: replace 
Abstract: Signal Temporal Logic (STL) is an expressive language for specifying behaviors of dynamical systems from continuous signals. However, a limitation of standard STL is its inherently deterministic semantics, which prevents it from accommodating uncertainty. Existing approaches to overcome this limitation are computationally costly and limit real-time capability, requiring repeated trajectory sampling or the redesign of probability distributions over atomic propositions whenever the atomic propositions or specifications change. We introduce pacSTL, a framework that combines Probably Approximately Correct (PAC)-bounded reachable set predictions with an interval extension of STL. pacSTL computes lower and upper bounds on atomic robustness values by solving optimization problems over PAC-bounded reachable sets and propagates the bounds through the temporal logic operators. The resulting evaluation yields a PAC-bounded robustness interval at the specification level. We demonstrate the efficiency and relevance of pacSTL by verifying a quadrotor flight scenario and runtime monitoring a maritime navigation specification.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.00934v2</guid>
      <category>cs.LO</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hanna Krasowski, Elizabeth Dietrich, Emir Cem Gezer, Roger Skjetne, Asgeir Johan Sørensen, Murat Arcak</dc:creator>
    </item>
    <item>
      <title>Bulk-boundary decomposition of neural networks</title>
      <link>https://arxiv.org/abs/2511.02003</link>
      <description>arXiv:2511.02003v2 Announce Type: replace 
Abstract: We present the bulk-boundary decomposition as a new framework for understanding the training dynamics of deep neural networks. Starting from the stochastic gradient descent formulation, we show that the Lagrangian can be reorganized into a data-independent bulk term and a data-dependent boundary term. The bulk captures the intrinsic dynamics set by network architecture and activation functions, while the boundary reflects stochastic interactions from training samples at the input and output layers. This decomposition exposes the local and homogeneous structure underlying deep networks. As a physical consequence of locality and homogeneity, we derive the energy continuity equation within a deep neural network.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.02003v2</guid>
      <category>cs.LG</category>
      <category>cond-mat.dis-nn</category>
      <category>hep-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Donghee Lee, Hye-Sung Lee, Jaeok Yi</dc:creator>
    </item>
    <item>
      <title>Artificial-reference tracking MPC with probabilistically validated performance on industrial embedded systems</title>
      <link>https://arxiv.org/abs/2511.03603</link>
      <description>arXiv:2511.03603v2 Announce Type: replace 
Abstract: Industrial embedded systems are typically used to execute simple control algorithms due to their low computational resources. Despite these limitations, the implementation of advanced control techniques such as Model Predictive Control (MPC) has been explored by the control community in recent years, typically considering simple linear formulations or explicit ones to facilitate the online computation of the control input. These simplifications often lack features and properties that are desirable in real-world environments. This article presents an efficient implementation for embedded systems of MPC for tracking with artificial reference, solved via a recently developed structure-exploiting ADMM-based algorithm. This formulation is tailored to a wide range of applications by incorporating essential practical features at a small computational cost, including integration with an offset-free scheme, back-off parameters that enable constraint tightening, and soft constraints that preserve feasibility under disturbances or plant-model mismatch. It is accompanied by a framework for probabilistic performance validation of the closed-loop system over long-term operation. The applicability of the approach is illustrated on a Programmable Logic Controller (PLC), incorporated in a hardware-in-the-loop setup to control a nonlinear continuous stirred-tank reactor. The behavior of the closed-loop system is probabilistically validated with respect to constraint violations and the number of iterations required at each time step by the MPC optimization algorithm.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.03603v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1016/j.conengprac.2026.107032</arxiv:DOI>
      <arxiv:journal_reference>Control Engineering Practice, Volume 174, 2026, 107032</arxiv:journal_reference>
      <dc:creator>Victor Gracia, Pablo Krupa, Filiberto Fele, Teodoro Alamo</dc:creator>
    </item>
    <item>
      <title>Benchmark Datasets for Lead-Lag Forecasting on Social Platforms</title>
      <link>https://arxiv.org/abs/2511.03877</link>
      <description>arXiv:2511.03877v2 Announce Type: replace 
Abstract: Social and collaborative platforms emit multivariate time-series traces in which early interactions, such as views, likes, or downloads, are followed, sometimes months or years later, by higher-impact outcomes such as citations, sales, or reviews. We formalize this setting as Lead-Lag Forecasting (LLF): given an early usage channel (the lead), predict a correlated but temporally shifted outcome channel (the lag). Despite the ubiquity of such patterns, LLF has not been treated as a unified forecasting problem within the time-series community, largely due to the absence of standardised datasets. To anchor research in LLF, we present two high-volume benchmark datasets: arXiv (accesses -&gt; citations of 2.3M papers) and GitHub (pushes/stars -&gt; forks of 3M repositories). Our datasets provide ideal testbeds for lead-lag forecasting by capturing long-horizon dynamics across years, spanning the full spectrum of outcomes, and avoiding survivorship bias in sampling. We documented all technical details of data curation and cleaning, verified the presence of lead-lag dynamics through statistical and classification tests, and benchmarked parametric and non-parametric baselines for regression. Our study establishes LLF as a novel forecasting paradigm and lays an empirical foundation for its systematic exploration in social and usage data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.03877v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3770855.3817523</arxiv:DOI>
      <dc:creator>Kimia Kazemian (Department of Computer Science, Cornell University), Zhenzhen Liu (Department of Computer Science, Cornell University), Yangfanyu Yang (Department of Information Science, Cornell University), Katie Luo (Department of Computer Science, Stanford University), Shuhan Gu (Department of Computer Science, Cornell University), Audrey Du (Department of Computer Science, Cornell University), Xinyu Yang (Department of Information Science, Cornell University), Jack Jansons (Department of Computer Science, Cornell University), Kilian Q. Weinberger (Department of Computer Science, Cornell University), John Thickstun (Department of Computer Science, Cornell University), Yian Yin (Department of Information Science, Cornell University), Sarah Dean (Department of Computer Science, Cornell University)</dc:creator>
    </item>
    <item>
      <title>Decomposable Neuro Symbolic Regression</title>
      <link>https://arxiv.org/abs/2511.04124</link>
      <description>arXiv:2511.04124v3 Announce Type: replace 
Abstract: Symbolic regression (SR) models complex systems by discovering mathematical expressions that capture underlying relationships in observed data. However, most SR methods prioritize minimizing prediction error over identifying the governing equations, often producing overly complex or inaccurate expressions. To address this, we present a decomposable SR method that generates interpretable multivariate expressions leveraging transformer models, genetic algorithms (GAs), and genetic programming (GP). In particular, our explainable SR method distills a trained "opaque" regression model into mathematical expressions that serve as explanations of its computed function. Our method employs a Multi-Set Transformer to generate multiple univariate symbolic skeletons that characterize how each variable influences the opaque model's response. We then evaluate the generated skeletons' performance using a GA-based approach to select a subset of high-quality candidates before incrementally merging them via a GP-based cascade procedure that preserves their original skeleton structure. The final multivariate skeletons undergo coefficient optimization via a GA. We evaluated our method on problems with controlled and varying degrees of noise, demonstrating lower or comparable interpolation and extrapolation errors compared to two GP-based methods, three neural SR methods, and a hybrid approach. Unlike them, our approach consistently learned expressions that matched the original mathematical structure. Similarly, our method achieved both a high symbolic solution recovery rate and competitive predictive performance relative to benchmark methods on the Feynman dataset.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.04124v3</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Giorgio Morales, John W. Sheppard</dc:creator>
    </item>
    <item>
      <title>Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings</title>
      <link>https://arxiv.org/abs/2511.05017</link>
      <description>arXiv:2511.05017v2 Announce Type: replace 
Abstract: Hallucinations in Large Vision-Language Models (LVLMs) remain a persistent challenge, often stemming from inadequate integration of visual information during multimodal reasoning. A key cause is the model's over-reliance on textual priors and underutilization of visual cues, leading to outputs that are linguistically fluent but visually inaccurate. For example, given an image of an empty kitchen countertop, an LVLM might hallucinate a "bowl of fruit" or "cup of coffee", relying on language associations rather than visual evidence. Most LVLMs incorporate visual features by appending them to the input stream of a pre-trained LLM and training on large-scale vision-language datasets. Our systematic analysis reveals that this strategy often leads to over-dependence on textual information due to the inherent bias of LLMs towards language-dominant representations. This imbalance skews attention towards the text over visual content, weakening the model's ability to ground outputs in visual inputs. To address this, we propose a simple yet effective visual feature incorporation method that encourages the model to learn visually-informed textual embeddings distinct from those of the base LLM and promotes a more balanced attention distribution. Experimental results across multiple hallucination benchmarks demonstrate that our method significantly reduces hallucinations and fosters more balanced multimodal reasoning. Notably, our approach achieves substantial gains, including +9.33% on MMVP-MLLM, +2.99% on POPE-AOKVQA, up to +3.4% on Merlin, and +3% on the hard-data split of HallusionBench.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.05017v2</guid>
      <category>cs.CV</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Aakriti Agrawal, Gouthaman KV, Rohith Aralikatti, Gauri Jagatap, Jiaxin Yuan, Sarvesh Baskar, Vijay Kamarshi, Andrea Fanelli, Furong Huang</dc:creator>
    </item>
    <item>
      <title>SAD-Flower: Flow Matching for Safe, Admissible, and Dynamically Consistent Planning</title>
      <link>https://arxiv.org/abs/2511.05355</link>
      <description>arXiv:2511.05355v3 Announce Type: replace 
Abstract: Flow matching (FM) has shown promising results in data-driven planning. However, it inherently lacks formal guarantees for satisfying state and action constraints, which is a fundamental requirement for the safety and admissibility of planned trajectories on various systems. Moreover, existing FM planners do not ensure dynamical consistency, which can render trajectories inexecutable. We address these shortcomings by proposing SAD-Flower, a novel framework for generating Safe, Admissible, and Dynamically consistent trajectories. Our approach relies on an augmentation of the flow with a virtual control input. Thereby, principled guidance can be derived using techniques from nonlinear control theory, providing formal guarantees for state constraints, action constraints, and dynamic consistency. Crucially, SAD-Flower operates without retraining, enabling test-time satisfaction of unseen constraints. Through extensive experiments across several tasks, we demonstrate that SAD-Flower outperforms various generative-model-based baselines in ensuring constraint satisfaction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.05355v3</guid>
      <category>cs.LG</category>
      <category>cs.RO</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tzu-Yuan Huang, Armin Lederer, Dai-Jie Wu, Xiaobing Dai, Sihua Zhang, Hsiu-Chin Lin, Shao-Hua Sun, Stefan Sosnowski, Sandra Hirche</dc:creator>
    </item>
    <item>
      <title>DWM-RO: Decentralized World Models with Reasoning Offloading for SWIPT-enabled Satellite-Terrestrial HetNets</title>
      <link>https://arxiv.org/abs/2511.05972</link>
      <description>arXiv:2511.05972v2 Announce Type: replace 
Abstract: Wireless networks are undergoing a paradigm shift toward massive connectivity with energy-efficient operation, driving the integration of satellite-terrestrial architectures with simultaneous wireless information and power transfer (SWIPT). Optimizing transmit beamforming and power splitting in such systems faces formidable challenges, e.g., time-varying channels and multi-tier interference, which create a complex decision landscape where conventional model-free multi-agent reinforcement learning (MARL) suffers from sample inefficiency due to rarely-encountered state transitions and poor coordination as decentralized agents act independently. This paper proposes the Decentralized World Model with Reasoning Offloading (DWM-RO) framework to address these fundamental limitations. Specifically, each agent employs a world model to learn compact predictive representations of environment dynamics, enabling imagination-based policy training that dramatically reduces required environment interactions. An uncertainty-aware offloading gate monitors local interference levels and model reconstruction errors to trigger selective edge coordination. When activated, a lightweight latent decorrelation mechanism at the edge refines agents' strategic representations, guiding them toward orthogonal actions that minimize resource conflicts. Extensive simulations demonstrate that DWM-RO converges 5 times faster than state-of-the-art baselines while achieving 34.7% higher spectral efficiency and reducing constraint violations by 40%. In dense network scenarios with 10 users, DWM-RO maintains violation rates below 20% while baselines exceed 70%, validating superior robustness.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.05972v2</guid>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Guangyuan Liu, Yinqiu Liu, Ruichen Zhang, Jiawen Kang, Sumei Sun, Abbas Jamalipour, Ping Zhang</dc:creator>
    </item>
    <item>
      <title>UniADC: A Unified Framework for Anomaly Detection and Classification</title>
      <link>https://arxiv.org/abs/2511.06644</link>
      <description>arXiv:2511.06644v3 Announce Type: replace 
Abstract: In this paper, we introduce a novel task termed unified anomaly detection and classification, which aims to simultaneously detect anomalous regions in images and identify their specific categories. Existing methods typically treat anomaly detection and classification as separate tasks, thereby neglecting their inherent correlations and limiting information sharing, which results in suboptimal performance. To address this, we propose UniADC, a model designed to effectively perform both tasks with only a few or even no anomaly images. Specifically, UniADC consists of two key components: a training-free Controllable Inpainting Network and an Implicit-Normal Discriminator. The inpainting network can synthesize anomaly images of specific categories by repainting normal regions guided by anomaly priors, and can also repaint few-shot anomaly samples to augment the available anomaly data. The implicit-normal discriminator addresses the severe challenge of the imbalance between normal and anomalous pixel distributions by implicitly modeling the normal state, achieving precise anomaly detection and classification by aligning fine-grained image features with anomaly-category embeddings. We conduct extensive experiments on four anomaly detection and classification datasets, including MVTec-FS, MTD, WFDD and Real-IAD, and the results demonstrate that UniADC consistently outperforms existing methods in anomaly detection, localization, and classification. The code is available at https://github.com/cnulab/UniADC.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.06644v3</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ximiao Zhang, Min Xu, Zheng Zhang, Yap-Peng Tan, Xiuzhuang Zhou</dc:creator>
    </item>
    <item>
      <title>Automated Attribution Graph Interpretation via Probe Prompting</title>
      <link>https://arxiv.org/abs/2511.07002</link>
      <description>arXiv:2511.07002v2 Announce Type: replace 
Abstract: Even though we know the precise computations that lead from a large language model (LLM) input to its output, this computation remains very hard to interpret. One way to make the process easier to understand is to create a sparse computational graph that captures most of the model's behavior with the smallest number of computational nodes. Cross-layer transcoders (CLT) decompose the dense computations of the MLP, but the resulting circuits still contain thousands of nodes even for short prompts. Existing automated interpretation methods label individual features from corpus activations, and these labels are often not validated by causal intervention. We introduce probe prompting, a transparent rule-based pipeline that groups the features of an attribution graph into concept-aligned supernodes from their responses on a small set of concept-targeted probe prompts, summarized as Cross-Prompt Activation Signatures (CPAS). Across four factual domains, on Gemma-2-2B with a public CLT dictionary and 45,596 entity-swap interventions, we find that the labeled supernodes have the predicted steering behavior in every one of them. Code, datasets, and an interactive demo are released anonymously as a reusable harness for calibrating supernode labels against causal interventions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.07002v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Giuseppe Birardi, Gonçalo Paulo</dc:creator>
    </item>
    <item>
      <title>Learning Quantized Continuous Controllers for Integer Hardware</title>
      <link>https://arxiv.org/abs/2511.07046</link>
      <description>arXiv:2511.07046v4 Announce Type: replace 
Abstract: Deploying continuous-control reinforcement learning policies on embedded hardware requires meeting tight latency and power budgets. Small FPGAs can deliver these, but only if costly floating-point pipelines are avoided. We study quantization-aware training (QAT) of policies for integer inference and present a learning-to-hardware pipeline that automatically selects low-bit policies and synthesizes them to an Artix-7 FPGA. Across five MuJoCo tasks, we obtain policy networks that are competitive with full precision (FP32) policies but require as few as 3 or even only 2 bits per weight, and per internal activation value, as long as input precision is chosen carefully. On the target hardware, the selected policies achieve inference latencies on the order of microseconds and consume microjoules per action, comparing favorably to a quantized reference. Finally, we observe that the quantized policies exhibit increased robustness to input noise compared to the floating-point baseline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.07046v4</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Fabian Kresse, Christoph H. Lampert</dc:creator>
    </item>
    <item>
      <title>RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments</title>
      <link>https://arxiv.org/abs/2511.07317</link>
      <description>arXiv:2511.07317v2 Announce Type: replace 
Abstract: We introduce Reinforcement Learning (RL) with Adaptive Verifiable Environments (RLVE), an approach using verifiable environments that procedurally generate problems and provide algorithmically verifiable rewards, to scale up RL for language models (LMs). RLVE enables each verifiable environment to dynamically adapt its problem difficulty distribution to the policy model's capabilities as training progresses. In contrast, static data distributions often lead to vanishing learning signals when problems are either too easy or too hard for the policy. To implement RLVE, we create RLVE-Gym, a large-scale suite of 400 verifiable environments carefully developed through manual environment engineering. Using RLVE-Gym, we show that environment scaling, i.e., expanding the collection of training environments, consistently improves generalizable reasoning capabilities. RLVE with joint training across all 400 environments in RLVE-Gym yields a 3.37% absolute average improvement across six reasoning benchmarks, starting from one of the strongest 1.5B reasoning LMs. By comparison, continuing this LM's original RL training yields only a 0.49% average absolute gain despite using over 3x more compute. We release our code publicly.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.07317v2</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhiyuan Zeng, Hamish Ivison, Yiping Wang, Lifan Yuan, Shuyue Stella Li, Zhuorui Ye, Siting Li, Jacqueline He, Runlong Zhou, Tong Chen, Chenyang Zhao, Yulia Tsvetkov, Simon Shaolei Du, Natasha Jaques, Hao Peng, Pang Wei Koh, Hannaneh Hajishirzi</dc:creator>
    </item>
    <item>
      <title>Decision-Focused Continual Learning for Seaport Power-Logistics Scheduling: Generalization across Varying Tasks</title>
      <link>https://arxiv.org/abs/2511.07938</link>
      <description>arXiv:2511.07938v3 Announce Type: replace 
Abstract: Power-logistics scheduling in modern seaports typically follows a predict-then-optimize pipeline. To enhance the decision quality of predictions, decision-focused learning has been proposed, which aligns the training of forecasting models with downstream decision outcomes. However, this end-to-end design inherently restricts the value of forecasting models to a specific task structure and therefore generalizes poorly to evolving tasks induced by varying vessel arrivals. We address this gap with a decision-focused continual learning framework that adapts online to a stream of scheduling tasks. Specifically, we introduce Fisher-information-based regularization to enhance cross-task generalization by preserving parameters critical to prior tasks. A differentiable convex surrogate is also developed to stabilize gradient backpropagation. The proposed approach enables learning a decision-aligned forecasting model across a varying task stream with sustainable long-term computational and memory requirements. Experiments calibrated to Jurong Port show improved decision performance and cross-task generalization over existing methods, together with reduced computational cost and a bounded memory footprint.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.07938v3</guid>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chuanqing Pu, Feilong Fan, Nengling Tai, Yan Xu, Wentao Huang, Honglin Wen</dc:creator>
    </item>
    <item>
      <title>Beyond Accuracy: Behavioral Dynamics of Agentic Multi-Hunk Repair</title>
      <link>https://arxiv.org/abs/2511.11012</link>
      <description>arXiv:2511.11012v2 Announce Type: replace 
Abstract: Automated program repair has traditionally focused on single-hunk defects, overlooking multi-hunk bugs that are prevalent in real-world systems. Repairing these bugs requires coordinated edits across multiple, disjoint code regions, posing substantially greater challenges. We present the first systematic study of LLM-driven coding agents (Claude Code, Codex, Gemini-cli, and Qwen Code) on this task. We evaluate these four state-of-the-art agents on 404 multi-hunk bugs from the PolyHunk dataset, yielding 1,616 repair trajectories for large-scale behavioral analysis. We employ fine-grained metrics to assess localization, repair accuracy, regression behavior, and operational dynamics across agents. We find that localization capability varies substantially, with Codex achieving the highest success rate (75.3%) and Qwen Code the lowest (40.4%). Repair accuracy also differs widely, ranging from 26.98% (Qwen Code) to 92.82% (Claude Code), and consistently declines with increasing bug dispersion and complexity (hunk divergence and spatial proximity). High-performing agents (Claude Code and Codex) demonstrate superior semantic consistency, achieving positive average regression reduction, whereas lower-performing agents often introduce new test failures. Notably, agents do not fail fast; failed repairs consume substantially more resources (33%-440% more input tokens) and require longer execution time (35%-330%). Additionally, we developed Maple to provide agents with repository-level context. Empirical results show that Maple improves repair accuracy of Gemini-cli by ~21% through enhanced localization. By analyzing fine-grained metrics and trajectory-level analysis, this study moves beyond accuracy to explain how coding agents localize, reason, and act during multi-hunk repair. Our findings underscore the impact of bug divergence and spatial proximity on multi-hunk repair success for coding agents.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.11012v2</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Noor Nashid, Daniel Ding, Keheliya Gallaba, Ahmed E. Hassan, Ali Mesbah</dc:creator>
    </item>
    <item>
      <title>Correcting Mean Bias in Text Embeddings: A Refined Renormalization with Training-Free Improvements on MMTEB</title>
      <link>https://arxiv.org/abs/2511.11041</link>
      <description>arXiv:2511.11041v2 Announce Type: replace 
Abstract: We find that current sentence-embedding models produce outputs with a consistent bias: every embedding $e$ decomposes as $\tilde e + \mu$, where the mean $\mu$ is near-identical across all sentences. We study two training-free corrections -- subtracting $\mu$ directly (R1), or projecting each embedding off the mean direction (R2) -- and show, via a first-order error-propagation argument, that R2 cancels the parallel component of mean-estimation error that R1 retains. Across 38 models on the Massive Multilingual Text Embedding Benchmark (MMTEB)~\citep{MMTEB}, R2 yields consistent classification gains (paired $\bar t = 3.31$, 29 of 38 models with $t&gt;2$, zero losses), and the per-model mean norm $\Vert\mu\Vert$ correlates with which models benefit most. A nine-method dose-response ablation on five models further reveals that mild single-direction removal helps, but full principal component analysis (PCA) whitening hurts every model we test, and that R2 and All-but-the-Top with depth one agree within $0.18$ pp downstream despite weak geometric alignment between $\hat\mu$ and the centered top principal component.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.11041v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xingyu Ren, Youran Sun, Haoyu Liang</dc:creator>
    </item>
    <item>
      <title>SMART: Shot-Aware Multimodal Video Moment Retrieval with Audio-Enhanced MLLM</title>
      <link>https://arxiv.org/abs/2511.14143</link>
      <description>arXiv:2511.14143v2 Announce Type: replace 
Abstract: Video Moment Retrieval is a task in video understanding that aims to localize a specific temporal segment in an untrimmed video based on a natural language query. Despite recent progress in moment retrieval from videos using both traditional techniques and Multimodal Large Language Models (MLLMs), most existing methods still rely on coarse temporal understanding and a single visual modality, limiting performance on complex videos. To address this, we introduce Shot-aware Multimodal Audio-enhanced Retrieval of Temporal Segments (SMART), an MLLM-based framework that integrates audio cues and leverages shot-level temporal structure. SMART enriches multimodal representations by combining audio and visual features while applying Shot-aware Token Compression, which selectively retains high-information tokens within each shot to reduce redundancy and preserve fine-grained temporal details. We also refine prompt design to better utilize audio-visual cues. Evaluations on Charades-STA and QVHighlights show that SMART achieves significant improvements over state-of-the-art methods, including a 1.61% increase in R1@0.5 and a 2.59% gain in R1@0.7 on Charades-STA.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.14143v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>An Yu, Weiheng Lu, Jian Li, Zhenfei Zhang, Yunhang Shen, Felix X. -F. Ye, Ming-Ching Chang</dc:creator>
    </item>
    <item>
      <title>QuickLAP: Quick Language-Action Preference Learning for Semi-Autonomous Agents</title>
      <link>https://arxiv.org/abs/2511.17855</link>
      <description>arXiv:2511.17855v5 Announce Type: replace 
Abstract: Robots must learn from both what people do and what they say, but either modality alone is often incomplete: physical corrections are grounded but ambiguous in intent, while language expresses high-level goals but lacks physical grounding. We introduce QuickLAP: Quick Language-Action Preference learning, a Bayesian framework that fuses physical and language feedback to infer reward functions in real time. Our key insight is to treat language as a probabilistic observation over the user's latent preferences, clarifying which reward features matter and how physical corrections should be interpreted. QuickLAP uses Large Language Models (LLMs) to extract reward feature attention masks and preference shifts from free-form utterances, which it integrates with physical feedback in a closed-form update rule. This enables fast, real-time, and robust reward learning that handles ambiguous feedback. In a semi-autonomous driving simulator, QuickLAP reduces reward learning error by over 70% compared to physical-only and heuristic multimodal baselines. A 15-participant user study further validates our approach: participants found QuickLAP significantly more understandable and collaborative, and preferred its learned behavior over baselines. Code is available at https://github.com/MIT-CLEAR-Lab/QuickLAP.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.17855v5</guid>
      <category>cs.AI</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jordan Abi Nader, David Lee, Nathaniel Dennler, Andreea Bobu</dc:creator>
    </item>
    <item>
      <title>DHAuDS: A Dynamic and Heterogeneous Audio Benchmark for Test-Time Adaptation</title>
      <link>https://arxiv.org/abs/2511.18421</link>
      <description>arXiv:2511.18421v2 Announce Type: replace 
Abstract: Existing Test-time Adaptation (TTA) studies rely heavily on static and homogeneous corruption protocols, such as ImageNet-C and CIFAR-10-C/100-C, leading to inconsistent evaluation settings and robustness estimates that may be inflated relative to real-world conditions. TTA lacks a standardized evaluation infrastructure capable of modeling realistic heterogeneous acoustic degradation. We introduce DHAuDS, a standardized benchmark suite for evaluating audio classification TTA robustness under dynamic corruption severity and heterogeneous noise mixtures. Rather than proposing a new TTA algorithm, DHAuDS focuses on exposing robustness limitations that remain hidden under conventional fixed-noise evaluation protocols.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.18421v2</guid>
      <category>cs.SD</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Weichuang Shao, Iman Yi Liao, Tomas Henrique Bode Maul, Tissa Chandesa</dc:creator>
    </item>
    <item>
      <title>AttnRegDeepLab: A Two-Stage Decoupled Framework for Interpretable Embryo Fragmentation Grading</title>
      <link>https://arxiv.org/abs/2511.18454</link>
      <description>arXiv:2511.18454v4 Announce Type: replace 
Abstract: Assessing embryo fragmentation is crucial for predicting IVF success, yet manual grading is prone to subjectivity, and existing AI models struggle with clinical interpretability and segmentation errors. We propose AttnRegDeepLab, a Multi-Task Learning (MTL) framework designed to solve these challenges. The model enhances a DeepLabV3+ decoder with Attention Gates to filter out cytoplasmic noise and retain sharp contour details. It also introduces a Multi-Scale Regression Head with Feature Injection, guiding the segmentation process with global grading priors to eliminate systematic area estimation errors. Based on a two-stage decoupled training strategy and a range-based loss for weakly labeled data, our method resolves MTL gradient conflicts. AttnRegDeepLab yields high grading precision and excellent segmentation quality (Dice coefficient = 0.729), avoiding the trade-off between contour integrity and grading accuracy seen under standard joint optimization. This provides a reliable, clinically interpretable tool balancing visual and quantitative accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.18454v4</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ming-Jhe Lee, Chang-Hong Wu, Jung-Hua Wang, Ming-Jer Chen, Yu-Chiao Yi, Tsung-Hsien Lee</dc:creator>
    </item>
    <item>
      <title>MedVision: Benchmarking Quantitative Medical Image Analysis</title>
      <link>https://arxiv.org/abs/2511.18676</link>
      <description>arXiv:2511.18676v2 Announce Type: replace 
Abstract: Current vision-language models (VLMs) in medicine are primarily designed for categorical question answering (e.g., "Is this normal or abnormal?") or qualitative descriptive tasks. However, clinical decision-making often relies on quantitative assessments, such as measuring the size of a tumor or the angle of a joint, from which physicians draw their own diagnostic conclusions. This quantitative reasoning capability remains underexplored and poorly supported in existing VLMs. In this work, we introduce MedVision, a large-scale dataset and benchmark specifically designed to evaluate and improve VLMs on quantitative medical image analysis. MedVision spans 22 public datasets covering diverse anatomies and modalities, with 30.8 million image-annotation pairs. We focus on three representative quantitative tasks: (1) detection of anatomical structures and abnormalities, (2) tumor/lesion (T/L) size estimation, and (3) angle/distance (A/D) measurement. We show that current off-the-shelf VLMs perform poorly on these tasks. However, supervised and reinforcement fine-tuning on MedVision significantly enhances performance across detection, T/L estimation, and A/D measurement. MedVision provides a foundation for developing VLMs with robust quantitative reasoning capabilities in medical imaging.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.18676v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yongcheng Yao, Yongshuo Zong, Raman Dutt, Yongxin Yang, Sotirios A Tsaftaris, Timothy Hospedales</dc:creator>
    </item>
    <item>
      <title>Knowing How to Edit: Reliable Evaluation Signals for Diagnosing and Optimizing Prompts at Query Level</title>
      <link>https://arxiv.org/abs/2511.19829</link>
      <description>arXiv:2511.19829v3 Announce Type: replace 
Abstract: Prompt optimization has become a central mechanism for eliciting strong performance from LLMs, and recent work has made substantial progress by proposing diverse prompt evaluation metrics and optimization strategies. Despite these advances, prompt evaluation and prompt optimization are often developed in isolation, limiting the extent to which evaluation can effectively inform prompt refinement. In this work, we study prompt optimization as a process guided by performance-relevant evaluation signals. To address the disconnect between evaluation and optimization, we propose an evaluation-instructed prompt optimization approach that explicitly connects prompt evaluation with query-dependent optimization. Our method integrates multiple complementary prompt quality metrics into a performance-reflective evaluation framework and trains an execution-free evaluator that predicts prompt quality directly from text, avoiding repeated model executions. These evaluation signals then guide prompt refinement in a targeted and interpretable manner. Empirically, the proposed evaluator achieves 83.7% accuracy in predicting prompt performance. When incorporated into the optimization process, our approach consistently outperforms existing optimization baselines across eight benchmark datasets and three different backbone LLMs. Overall, our results demonstrate that reliable and efficient evaluation signals can serve as an effective foundation for robust and interpretable prompt optimization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.19829v3</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ke Chen, Yifeng Wang, Hassan Almosapeeh, Haohan Wang</dc:creator>
    </item>
    <item>
      <title>Model-Based Learning of Whittle indices</title>
      <link>https://arxiv.org/abs/2511.20397</link>
      <description>arXiv:2511.20397v2 Announce Type: replace 
Abstract: We present BLINQ, a new model-based algorithm that learns the Whittle indices of an indexable, communicating and unichain Markov Decision Process (MDP). Our approach relies on building an empirical estimate of the MDP and then computing its Whittle indices using an extended version of an existing state-of-the-art algorithm. We provide a proof of convergence to the Whittle indices we want to learn as well as a bound on the time needed to learn them with arbitrary precision. Moreover, we investigate its computational complexity. Our numerical experiments suggest that BLINQ significantly outperforms existing Q-learning approaches in terms of the number of samples needed to get an accurate approximation. In addition, it has a total computational cost even lower than Q-learning for any reasonably high number of samples. These observations persist even when the Q-learning algorithms are sped up using neural networks to predict Q-values.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.20397v2</guid>
      <category>cs.LG</category>
      <category>cs.DS</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Joël Charles-Rebuffé, Nicolas Gast, Bruno Gaujal</dc:creator>
    </item>
    <item>
      <title>DRS-OSS: A Diff-Risk Scoring Tool for Continuous Integration Workflows</title>
      <link>https://arxiv.org/abs/2511.21964</link>
      <description>arXiv:2511.21964v3 Announce Type: replace 
Abstract: Software teams need change-risk scores that can guide continuous integration decisions such as review prioritization, test scheduling, and downstream validation before risky changes are merged or released. However, open-source teams often lack deployable tools for surfacing these risk signals in everyday CI workflows. We present DRS-OSS, an open-source diff-risk scoring tool for continuous integration workflows. DRS-OSS is designed as a deployable and customizable pipeline rather than as a standalone prediction model. It combines a REST API gateway, containerized model services, a developer dashboard, GitHub integration, and a replication package that lets users retrain or replace the backend with other transformer models. The bundled workflow combines commit messages, commit diffs, and change metrics in a single risk-prediction pipeline. The default packaged backend uses a Llama 3.1 8B sequence classifier configured for long diffs. Its training recipe uses parameter-efficient tuning, quantization, CPU offloading, and customization helper scripts so that it can be adapted on modest hardware. We compare DRS-OSS with similar tools and evaluate the bundled classifier on ApacheJIT, where it reaches an ROC-AUC of 0.895 and outperforms prior baselines. From a user-feedback perspective, DRS-OSS has received interest from Uber, Duolingo, and Microsoft in adapting the workflow to their own continuous integration settings. The full tool is released with source code, customization scripts, deployment artifacts, a public repository, a live demo at worldofcode.org/drs, and a demonstration video at youtube.com/watch?v=2FzeRRdNaco.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.21964v3</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ali Sayedsalehi, Peter C. Rigby, Audris Mockus</dc:creator>
    </item>
    <item>
      <title>Power System Robust State Estimation As a Layer: An Optimization-embedded End-to-end Learning Approach</title>
      <link>https://arxiv.org/abs/2511.22836</link>
      <description>arXiv:2511.22836v2 Announce Type: replace 
Abstract: As an essential prerequisite for modern power system operation, robust state estimation (RSE) can effectively resist noise and outliers in measurements. The emerging neural network (NN) based end-to-end (E2E) learning framework enables real-time application of RSE but potentially yields solutions that are statistically accurate yet physically inconsistent. To bridge this gap, this work proposes a novel E2E learning based RSE framework in which the convex-relaxed RSE problem is embedded into an NN as an explicit differentiable layer, as a first attempt. This optimization-embedded layer (termed `Opt-Layer` in our work) serves as a solver of the RSE problem. The relaxed solutions are then recovered through post-processing layers. By embedding the underlying KKT conditions into the gradients during backward propagation, the physical consistency of the estimated states can be significantly enhanced, yielding lower measurement residuals. The measurement weights are also treated as learnable parameters of the NN to enhance estimation robustness, enabling the Opt-Layer to actively denoise. A hybrid loss function is formulated to pursue accurate and physically consistent solutions. Extensive simulations on eight test systems demonstrate that the proposed framework can significantly improve SE performance, especially in terms of physical consistency, in comparison to classical E2E learning models, physics-informed NN (PINN) models, graph-based learning models, and conventional optimization-based approaches. Estimation performance under partial observability and severe noise contamination is systematically evaluated. Computational complexity and runtime are also analyzed.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.22836v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yibo Ding, Wenzhuo Shi, Mengzhao Duan, Yuhong Zhao, Jiaqi Ruan, Jian Zhao, Zhao Xu</dc:creator>
    </item>
    <item>
      <title>Iterative convergence in phase-field brittle fracture computations: exact line search is all you need</title>
      <link>https://arxiv.org/abs/2511.23064</link>
      <description>arXiv:2511.23064v2 Announce Type: replace 
Abstract: Variational phase-field models of brittle fracture pose a local constrained minimization problem of a non-convex energy functional. In the discrete setting, the problem is most often solved by alternate minimization, exploiting the separate convexity of the energy with respect to the two unknowns. This approach is theoretically guaranteed to converge, provided each of the individual subproblems is solved successfully. However, strong non-linearities of the energy functional may lead to failure of iterative convergence within one or both subproblems. We analyze and visualize the energy along Newton directions to illustrate why Newton's method without line search fails. Motivated by this, we propose to employ an exact line search algorithm based on bisection, which (under certain conditions) can guarantee global convergence of Newton's method for each subproblem and consequently the successful determination of critical points of the energy through the alternate minimization scheme. Through several benchmark tests computed with various strain energy decompositions and two strategies for the enforcement of the irreversibility constraint in two and three dimensions, we demonstrate the robustness of the approach and assess its efficiency in comparison with other commonly used line search algorithms. With the outlined approach, we are able to compute the especially demanding Brazilian test featuring contact in 3D with the star-convex model.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.23064v2</guid>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1007/s00466-026-02779-6</arxiv:DOI>
      <arxiv:journal_reference>Computational Mechanics (2026)</arxiv:journal_reference>
      <dc:creator>Jonas Heinzmann, Francesco Vicentini, Pietro Carrara, Laura De Lorenzis</dc:creator>
    </item>
    <item>
      <title>Self-Supervised Dynamical System Representations for Physiological Time-Series</title>
      <link>https://arxiv.org/abs/2512.00239</link>
      <description>arXiv:2512.00239v2 Announce Type: replace 
Abstract: The effectiveness of self-supervised learning (SSL) for physiological time series depends on the ability of a pretraining objective to preserve information about the underlying physiological state while filtering out unrelated noise. However, existing strategies are limited by their reliance on heuristic principles or poorly constrained generative tasks. To address this limitation, we propose a pretraining framework that exploits the information structure of a dynamical systems generative model across multiple time series. This framework reveals our key insight that class identity can be efficiently captured by extracting information about the generative variables related to the system parameters shared across similar time series samples, while noise unique to individual samples should be discarded. Building on this insight, we propose PULSE, a cross-reconstruction-based pretraining objective for physiological time series datasets that explicitly extracts system information while discarding non-transferable sample-specific information. We establish theory that provides sufficient conditions for the system information to be recovered, and empirically validate it using a synthetic dynamical systems experiment. Furthermore, we apply our method to diverse real-world datasets, demonstrating that PULSE learns representations that can broadly distinguish semantic classes, increase label efficiency, and improve transfer learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.00239v2</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Yenho Chen, Maxwell A. Xu, James M. Rehg, Christopher J. Rozell</dc:creator>
    </item>
    <item>
      <title>Differentiable Weightless Controllers: Learning Logic Circuits for Continuous Control</title>
      <link>https://arxiv.org/abs/2512.01467</link>
      <description>arXiv:2512.01467v2 Announce Type: replace 
Abstract: Controlling autonomous systems under real-world conditions often requires policies that can be evaluated with low latency and minimal energy consumption. Unfortunately, these conditions are at odds with the use of high-precision deep neural networks as controllers. In this work, we introduce Differentiable Weightless Controllers (DWCs), a symbolic-differentiable architecture that learns flexible, non-linear, yet highly efficient control policies. DWCs can be trained end-to-end via gradient-based techniques, yet compile directly into FPGA-compatible circuits with few- or even single-clock-cycle latency and nanojoule-level energy cost per action. Across five MuJoCo benchmarks, including high-dimensional Humanoid, DWCs achieve returns competitive with standard deep policies (full-precision or quantized neural networks). Furthermore, DWCs exhibit structurally sparse and interpretable connectivity patterns, enabling direct inspection of which input values influence control decisions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.01467v2</guid>
      <category>cs.LG</category>
      <category>cs.AR</category>
      <category>cs.SC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Fabian Kresse, Christoph H. Lampert</dc:creator>
    </item>
    <item>
      <title>SVRG and Beyond via Posterior Correction</title>
      <link>https://arxiv.org/abs/2512.01930</link>
      <description>arXiv:2512.01930v2 Announce Type: replace 
Abstract: Stochastic Variance Reduced Gradient (SVRG) and its variants aim to speed up training by using gradient corrections. Originally proposed over a decade ago, these methods have never been connected to any Bayesian method at a fundamental level. Here, we fill this gap and derive surprising new connections of SVRG to a recently proposed Bayesian method called 'posterior correction'. Our main contribution is to show that SVRG can be recovered as a special case of posterior correction over isotropic-Gaussian posteriors. Novel extensions of SVRG are automatically obtained by using more flexible exponential-family posteriors. We derive two new such extensions by using Gaussian families: a Newton-like variant with novel Hessian corrections, and an Adam-like extension that scales to large problems. Our work is the first to connect SVRG to Bayes and use it to speed up training.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.01930v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nico Daheim, Thomas Möllenhoff, Ming Liang Ang, Mohammad Emtiyaz Khan</dc:creator>
    </item>
    <item>
      <title>Tuning for TraceTarnish: Techniques, Trends, and Testing Tangible Traits</title>
      <link>https://arxiv.org/abs/2512.03465</link>
      <description>arXiv:2512.03465v5 Announce Type: replace 
Abstract: In this study, we more rigorously evaluated our attack script TraceTarnish, which leverages adversarial stylometry principles to anonymize the authorship of text-based messages. To ensure the efficacy and utility of our attack, we sourced, processed, and analyzed Reddit comments -- comments that were later alchemized into TraceTarnish data -- to gain valuable insights. The transformed TraceTarnish data was then further augmented by StyloMetrix to manufacture stylometric features -- features that were culled using the Information Gain criterion, leaving only the most informative, predictive, and discriminative ones. We found that function words and function word types (L_FUNC_A and L_FUNC_T); content words and content word types (L_CONT_A and L_CONT_T); and the Type-Token Ratio (ST_TYPE_TOKEN_RATIO_LEMMAS) yielded significant Information Gain readings. The identified stylometric cues -- function-word frequencies, content-word distributions, and the Type-Token Ratio -- serve as reliable indicators of compromise (IoCs), revealing when a text has been deliberately altered to mask its true author. Similarly, these features could function as forensic beacons, alerting defenders to the presence of an adversarial stylometry attack; granted, in the absence of the original message, this signal may go largely unnoticed, as it appears to depend on a pre- and post-transformation comparison. "In trying to erase a trace, you often imprint a larger one." Armed with this understanding, we framed TraceTarnish's operations and outputs around these five isolated features, using them to conceptualize and implement enhancements that further strengthen the attack.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.03465v5</guid>
      <category>cs.CR</category>
      <category>cs.CL</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Robert Dilworth</dc:creator>
    </item>
    <item>
      <title>STGBD-Net: Spatio-temporal Gradient Basis Decomposition Network for Infrared Small Target Detection</title>
      <link>https://arxiv.org/abs/2512.03470</link>
      <description>arXiv:2512.03470v5 Announce Type: replace 
Abstract: A key challenge in infrared small target detection (IRSTD) is that weak target signal responses are easily obscured by strong background clutter, frequently resulting in missed detections. While traditional gradient-based methods attempt to capture fine details, their robustness is limited by the static fusion of multi-directional gradient features. In this paper, we rethink feature fusion from the perspective of Basis Decomposition Theory and propose a novel framework that reformulates the process into an explicit and adaptive decomposition-and-reconstruction paradigm. Specifically, we introduce the Basis Decomposition Module (BDM) and its specialized variant, the Gradient Decomposition Module (GDM) for IRSTD. GDMs treat the normalized gradient features as basis vectors to reconstruct a new feature, thereby maintaining detailed structures and highlighting infrared small targets. By integrating GDMs into a lightweight three-stage U-Net, we develop two unified architectures: the Spatial Gradient Basis Decomposition Network for single-frame detection and the Spatio-temporal Gradient Basis Decomposition Network for multi-frame scenarios. Extensive experiments demonstrate that our networks achieve state-of-the-art (SOTA) performance across multiple benchmarks, offering a superior balance between detection accuracy and computational efficiency. Our code will be made public at: https://github.com/greekinRoma/IRSTD_HC_Platform.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.03470v5</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chen Hu, Mingyu Zhou, Shuai Yuan, Hongbo Hu, Zhenming Peng, Tian Pu, Xiying Li</dc:creator>
    </item>
    <item>
      <title>Observation-driven correction of numerical weather prediction for marine winds</title>
      <link>https://arxiv.org/abs/2512.03606</link>
      <description>arXiv:2512.03606v2 Announce Type: replace 
Abstract: Accurate marine wind forecasts are essential for safe navigation, ship routing, and energy operations, yet they remain challenging because observations over the ocean are sparse, heterogeneous, and temporally variable. We present an observation-informed correction approach for global numerical weather prediction (NWP) of marine winds. Rather than forecasting winds directly, we learn local correction patterns by assimilating the latest in-situ observations to adjust the Global Forecast System (GFS) output. We propose ORCA (Observation-informed Real-time Correction with Attention), a transformer-based deep learning architecture that (i) handles irregular and time-varying observation sets through masking and set-based attention mechanisms, (ii) conditions predictions on recent observation--forecast pairs via cross-attention, and (iii) employs cyclical time embeddings and coordinate-aware location representations to enable single-pass inference at arbitrary spatial coordinates. We evaluate ORCA over the Atlantic Ocean using observations from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) as reference. ORCA reduces GFS 10-meter wind error at all lead times up to 48 hours, achieving 45% improvement at 1-hour lead time and 13% improvement at 48-hour lead time. Spatial analyses reveal the most persistent improvements along coastlines and shipping routes, where observations are most abundant. The tokenized architecture naturally accommodates heterogeneous observing platforms (ships, buoys, tide gauges, and coastal stations) and produces both site-specific predictions and basin-scale gridded products in a single forward pass. These results demonstrate a practical, low-latency post-processing approach that complements NWP by learning to correct systematic forecast errors.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.03606v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Matteo Peduto, Qidong Yang, Jonathan Giezendanner, Devis Tuia, Sherrie Wang</dc:creator>
    </item>
    <item>
      <title>CaFTRA: Frequency-Domain Correlation-Aware Feedback-Free MIMO Transmission and Resource Allocation for 6G and Beyond</title>
      <link>https://arxiv.org/abs/2512.03767</link>
      <description>arXiv:2512.03767v3 Announce Type: replace 
Abstract: The fundamental designs of wireless systems toward AI-Native 6G and beyond are driven by the need for ever-increasing demand of mobile data traffic, extreme spectral efficiency, and adaptability across diverse service scenarios. To overcome the limitations posed by feedback-based multiple-input and multiple-output (MIMO) transmission, we propose a novel frequency-domain Correlation-aware Feedback-free MIMO Transmission and Resource Allocation (CaFTRA) framework tailored for fully-decoupled radio access networks (FD-RAN) to meet the emerging requirements of AI-Native 6G and beyond. By leveraging artificial intelligence (AI), CaFTRA effectively eliminates real-time uplink feedback by predicting channel state information (CSI) based solely on user geolocation. We introduce a Learnable Queries-driven Transformer Network for CSI mapping from user geolocation, which utilizes multi-head attention and learnable query embeddings to accurately capture frequency-domain correlations among resource blocks (RBs), thereby significantly improving the precision of CSI prediction. Once base stations (BSs) adopt feedback-free transmission, their downlink transmission coverage can be significantly expanded due to the elimination of frequent uplink feedback. To enable efficient resource scheduling under such extensive-coverage scenarios, we apply a low-complexity many-to-one matching theory-based algorithm for efficient multi-BS association and multi-RB resource allocation, which is proven to converge to a stable matching within limited iterations. Simulation results demonstrate that CaFTRA achieves stable matching convergence and significant gains in spectral efficiency and user fairness compared to 5G, underscoring its potential value for 6G standardization efforts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.03767v3</guid>
      <category>eess.SY</category>
      <category>cs.IT</category>
      <category>cs.SY</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1109/TMC.2026.3696984</arxiv:DOI>
      <arxiv:journal_reference>IEEE Transactions on Mobile Computing, 2026</arxiv:journal_reference>
      <dc:creator>Bo Qian, Hanlin Wu, Jiacheng Chen, Yunting Xu, Xiaoyu Wang, Haibo Zhou, Yusheng Ji</dc:creator>
    </item>
    <item>
      <title>Distinguishing Imitation Error from Intrinsic Motion Learning Difficulty</title>
      <link>https://arxiv.org/abs/2512.07248</link>
      <description>arXiv:2512.07248v2 Announce Type: replace 
Abstract: Physics-based motion imitation is central to humanoid control, yet current evaluation metrics (e.g., MPJPE) only quantify imitation outcomes, not their underlying causes. This conflation obscures a critical diagnostic question: when imitation error occurs, does it stem from policy limitations or the intrinsic learning difficulty of the target motion? To resolve this ambiguity, we propose the Torque Variation Score (TVS), a physics-grounded metric that quantifies the inherent learning difficulty of a motion independently of any policy's performance. TVS measures the magnitude of torque variation required to correct small pose perturbations, directly capturing how dynamical properties shape the reinforcement learning landscape. We establish that high-TV motions induce flat reward landscapes and vanishing policy gradients, explaining persistent imitation failures. Extensive experiments with state-of-the-art methods (UHC, PHC+) confirm TVS strongly correlates with imitation error and enables principled error attribution: high error on low-TV motions indicates policy deficiency, while high error on high-TV motions reflects fundamental learning constraints. Beyond error diagnosis, TVS facilitates three practical applications: Maximum Imitable Difficulty (MID) for policy capability assessment, Difficulty-Stratified Joint Error (DSJE) for granular performance profiling, and Flawed Motion Detection for identifying segments with abnormally high learning difficulty to support mocap data curation and quality control. TVS provides a rigorous lens to distinguish policy-induced errors from motion-inherent challenges and enhances motion dataset reliability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.07248v2</guid>
      <category>cs.GR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhaorui Meng, Lu Yin, Xinrui Chen, Chengxu Zuo, Anjun Chen, Shihui Guo, Yipeng Qin</dc:creator>
    </item>
    <item>
      <title>MultiAPI Spoof: A Multi-API Dataset and Local-Attention Network for Speech Anti-spoofing Detection</title>
      <link>https://arxiv.org/abs/2512.07352</link>
      <description>arXiv:2512.07352v4 Announce Type: replace 
Abstract: Existing speech anti-spoofing benchmarks rely on a narrow set of public models, creating a substantial gap from real-world scenarios in which commercial systems employ diverse, often proprietary APIs. To address this issue, we introduce MultiAPI Spoof, a multi-API audio anti-spoofing dataset comprising about 230 hours of synthetic speech generated by 30 distinct APIs, including commercial services, open-source models, and online platforms. Furthermore, we propose Nes2Net-LA, a local-attention enhanced variant of Nes2Net that improves local context modeling and fine-grained spoofing feature extraction. Based on this dataset, we also define the API tracing task, enabling fine-grained attribution of spoofed audio to its generation source. Experiments show that Nes2Net-LA achieves state-of-the-art performance and offers superior robustness, particularly under diverse and unseen spoofing conditions. Code (https://github.com/XuepingZhang/MultiAPI-Spoof) and dataset (https://xuepingzhang.github.io/MultiAPI-Spoof-Dataset/) have been released.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.07352v4</guid>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Xueping Zhang, Zhenshan Zhang, Yechen Wang, Linxi Li, Liwei Jin, Ming Li</dc:creator>
    </item>
    <item>
      <title>A Geometric Unification of Concept Learning with Concept Cones</title>
      <link>https://arxiv.org/abs/2512.07355</link>
      <description>arXiv:2512.07355v2 Announce Type: replace 
Abstract: Two traditions of interpretability have evolved side by side but seldom spoken to each other: Concept Bottleneck Models (CBMs), which prescribe what a concept should be, and Sparse Autoencoders (SAEs), which discover what concepts emerge. While CBMs use supervision to align activations with human-labeled concepts, SAEs rely on sparse coding to uncover emergent ones. We show that both paradigms instantiate the same geometric structure: each learns a set of linear directions in activation space whose nonnegative combinations form a concept cone. Supervised and unsupervised methods thus differ not in kind but in how they select this cone. Building on this view, we propose an operational bridge between the two paradigms. CBMs provide human-defined reference geometries, while SAEs can be evaluated by how well their learned cones approximate or contain those of CBMs. This containment framework yields quantitative metrics linking inductive biases -- such as SAE type, sparsity, or expansion ratio -- to emergence of plausible\footnote{We adopt the terminology of \citet{jacovi2020towards}, who distinguish between faithful explanations (accurately reflecting model computations) and plausible explanations (aligning with human intuition and domain knowledge). CBM concepts are plausible by construction -- selected or annotated by humans -- though not necessarily faithful to the true latent factors that organise the data manifold.} concepts. Using these metrics, we uncover a "sweet spot" in both sparsity and expansion factor that maximizes both geometric and semantic alignment with CBM concepts. Overall, our work unifies supervised and unsupervised concept discovery through a shared geometric framework, providing principled metrics to measure SAE progress and assess how well discovered concepts align with plausible human concepts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.07355v2</guid>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alexandre Rocchi, Thomas Fel, Gianni Franchi</dc:creator>
    </item>
    <item>
      <title>DIJIT: A Robotic Head for an Active Observer</title>
      <link>https://arxiv.org/abs/2512.07998</link>
      <description>arXiv:2512.07998v2 Announce Type: replace 
Abstract: We present DIJIT, a novel binocular robotic head expressly designed for mobile agents that behave as active observers. DIJIT's unique breadth of functionality enables active vision research and the study of human-like eye and head-neck motions, their interrelationships, and how each contributes to visual ability. DIJIT is also being used to explore the differences between how human vision employs eye/head movements to solve visual tasks and current computer vision methods. DIJIT's design features nine mechanical degrees of freedom, while the cameras and lenses provide an additional four optical degrees of freedom. The ranges and speeds of the mechanical design are comparable to human performance. DIJIT attains 85% of the peak human saccade speed. Our design includes the ranges of motion required for convergent stereo, namely, vergence, version, and cyclotorsion. Here, we present DIJIT and some aspects of its performance. We also present a novel method for saccadic camera movements, using a direct relationship between camera orientation and motor values. The resulting saccadic camera movements are close to human movements in terms of their accuracy, with 1.17° and 1.14° mean error for the left and right cameras, respectively.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.07998v2</guid>
      <category>cs.RO</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1109/LRA.2026.3682980</arxiv:DOI>
      <arxiv:journal_reference>IEEE Robotics and Automation Letters, Vol. 11, No. 6, pp. 7038-7045, June 2026</arxiv:journal_reference>
      <dc:creator>Mostafa Kamali Tabrizi, Mingshi Chi, Bir Bikram Dey, Kelly Yuan, Markus D. Solbach, Yiqian Liu, Michael Jenkin, John K. Tsotsos</dc:creator>
    </item>
    <item>
      <title>Developing Distance-Aware Physics-Constrained Probabilistic Frameworks for Industrial Prognostics</title>
      <link>https://arxiv.org/abs/2512.08499</link>
      <description>arXiv:2512.08499v3 Announce Type: replace 
Abstract: Development of reliable and physically interpretable probabilistic frameworks for industrial prognostics remains nascent, and existing literature is often insensitive as inputs move away from the training manifold. In this paper, we develop two sampling-free, distance-aware physics-constrained probabilistic frameworks: (i) PC-SNGP and (ii) PC-SNER. Both apply spectral normalization to hidden layer weights, enforcing a bi-Lipschitz distance-preserving representation from the input to the latent space. PC-SNGP replaces the dense output with a Gaussian process whose posterior variance increases with input distance from the training manifold. PC-SNER modifies the output layer to predict Normal-Inverse-Gamma (NIG) parameters for distance-preserving estimation. To maintain balance between data fidelity and physical consistency during training, we introduce a dynamic weighting strategy for the physics-constrained loss. We also introduce a distance-aware-coefficient (DAC) metric to quantify sensitivity to distributional shifts. Empirically, we validate both frameworks on rolling-element-bearings (REBs) prognostics using the PRONOSTIA, XJTU-SY, and HUST benchmark datasets. Experimental results demonstrate improved prediction accuracy and well-calibrated uncertainty estimates relative to competing baselines, while maintaining auditable performance in cross-validation and robustness under extreme adversarial perturbations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.08499v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Waleed Razzaq, Yun-Bo Zhao</dc:creator>
    </item>
    <item>
      <title>Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search</title>
      <link>https://arxiv.org/abs/2512.08724</link>
      <description>arXiv:2512.08724v3 Announce Type: replace 
Abstract: Text-to-image (TTI) diffusion models have achieved remarkable visual quality, yet they have been repeatedly shown to exhibit social biases across sensitive attributes such as gender, race and age. To mitigate these biases, existing approaches frequently depend on curated prompt datasets - either manually constructed or generated with large language models (LLMs) - as part of their training and/or evaluation procedures. Besides the curation cost, this also risks overlooking unanticipated, less obvious prompts that trigger biased generation, even in models that have undergone debiasing. In this work, we introduce Bias-Guided Prompt Search (BGPS), a framework that automatically generates prompts that aim to maximize the presence of biases in the resulting images. BGPS comprises two components: (1) an LLM instructed to produce attribute-neutral prompts and (2) attribute classifiers acting on the TTI's internal representations that steer the decoding process of the LLM toward regions of the prompt space that amplify the image attributes of interest. We conduct extensive experiments on Stable Diffusion 1.5 and a state-of-the-art debiased model and discover an array of subtle and previously undocumented biases that severely deteriorate fairness metrics. Crucially, the discovered prompts are interpretable, i.e., they may be entered by a typical user, quantitatively improving the perplexity metric compared to a prominent hard prompt optimization counterpart. Our findings uncover TTI vulnerabilities, while BGPS expands the bias search space and can act as a new evaluation tool for bias mitigation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.08724v3</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Manos Plitsis, Giorgos Bouritsas, Vassilis Katsouros, Yannis Panagakis</dc:creator>
    </item>
    <item>
      <title>Token Sample Complexity of Attention</title>
      <link>https://arxiv.org/abs/2512.10656</link>
      <description>arXiv:2512.10656v3 Announce Type: replace 
Abstract: As context windows in large language models continue to expand, it is essential to characterize how attention behaves at extreme sequence lengths. We introduce token sample complexity: the rate at which attention computed on $n$ tokens converges to its infinite-token limit. We estimate finite-$n$ convergence bounds at two levels: pointwise uniform convergence of the attention map, and convergence of moments for the transformed token distribution.
  For compactly supported (and more generally sub-Gaussian) distributions, our first result shows that the attention map converges uniformly on a ball of radius $R$ at rate $C(R)/\sqrt{n}$, where $C(R)$ grows exponentially with $R$. For large $R$, this estimate loses practical value, and our second result addresses this issue by establishing convergence rates for the moments of the transformed distribution (the token output of the attention layer). In this case, the rate is $C'(R)/n^{\beta}$ with $\beta&lt;\tfrac{1}{2}$, and $C'(R)$ depends polynomially on the size of the support of the distribution. The exponent $\beta$ depends on the attention geometry and the spectral properties of the token distribution. We also examine the regime in which the attention parameter tends to infinity and the softmax approaches a hardmax, and in this setting, we establish a logarithmic rate of convergence. Experiments on synthetic and real data support our predictions and show that the predicted slowdown is reflected in downstream accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.10656v3</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Léa Bohbot, Cyril Letrouit, Gabriel Peyré, François-Xavier Vialard</dc:creator>
    </item>
    <item>
      <title>A Geometric Theory of Cognition for Machine Intelligence</title>
      <link>https://arxiv.org/abs/2512.12225</link>
      <description>arXiv:2512.12225v3 Announce Type: replace 
Abstract: Developing artificial agents that unify representation, memory, adaptation, and prediction remains a fundamental challenge in artificial intelligence. Here we introduce a geometric framework in which cognitive computation emerges from Riemannian gradient flow on a learned latent manifold. The learned metric encodes representational constraints and computational preferences, while anisotropies in the geometry naturally generate multiple timescales of behaviour, yielding both rapid reactive responses and slower adaptive dynamics without explicit memory modules or recurrent mechanisms. We instantiate this framework through Riemannian representation and dynamics models and evaluate them in partially observable reinforcement-learning environments. Across observation masking, sensory blackouts, dynamics perturbations, and predictive latent-modelling tasks, the proposed approach consistently outperforms feedforward baselines, achieves robustness comparable to recurrent architectures, and produces highly predictable latent trajectories with low long-horizon rollout error. These results suggest that learned latent geometry can serve simultaneously as a substrate for representation, memory, adaptation, and prediction. More broadly, the framework provides a principled connection between dynamical systems, representation learning, and world-model-based intelligence.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.12225v3</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Laha Ale</dc:creator>
    </item>
    <item>
      <title>Programmable Deformation Design of Porous Soft Actuator through Volumetric-Pattern-Induced Anisotropy</title>
      <link>https://arxiv.org/abs/2512.12320</link>
      <description>arXiv:2512.12320v2 Announce Type: replace 
Abstract: Conventional soft pneumatic actuators, typically based on hollow elastomeric chambers, often suffer from small structural support and require costly geometry-specific redesigns for multimodal functionality. Porous materials such as foam, filled into chambers, can provide structural stability for the actuators. However, methods to achieve programmable deformation by tailoring the porous body itself remain underexplored. In this paper, a novel design method is presented to realize soft porous actuators with programmable deformation by incising specific patterns into the porous foam body. This approach introduces localized structural anisotropy of the foam guiding the material's deformation under a global vacuum input. Furthermore, three fundamental patterns on a cylindrical foam substrate are discussed: transverse for bending, longitudinal for tilting, and diagonal for twisting. A computational model is built with Finite Element Analysis (FEA), to investigate the mechanism of the incision-patterning method. Experiments demonstrate that with a potential optimal design of the pattern array number N, actuators can achieve bending up to $80^{\circ}$ (N=2), tilting of $18^{\circ}$ (N=1), and twisting of $115^{\circ}$ (N=8). The versatility of our approach is demonstrated via pattern transferability, scalability, and mold-less rapid prototyping of complex designs. As a comprehensive application, we translate the human hand crease map into a functional incision pattern, creating a bio-inspired soft robot hand capable of human-like adaptive grasping. Our work provides a new, efficient, and scalable paradigm for the design of multi-functional soft porous robots.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.12320v2</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Canqi Meng, Weibang Bai</dc:creator>
    </item>
    <item>
      <title>Coarse-to-Fine Hierarchical Alignment for UAV-based Human Detection using Diffusion Models</title>
      <link>https://arxiv.org/abs/2512.13869</link>
      <description>arXiv:2512.13869v3 Announce Type: replace 
Abstract: Training object detectors demands extensive, task-specific annotations, yet this requirement becomes impractical in UAV-based human detection due to constantly shifting target distributions and the scarcity of labeled images. As a remedy, synthetic simulators are adopted to generate annotated data at low annotation cost. However, the domain gap between synthetic and real images hinders the model from being effectively applied to the target domain. Accordingly, we introduce Coarse-to-Fine Hierarchical Alignment (CFHA), a three-stage diffusion-based framework designed to transform synthetic data for UAV-based human detection, narrowing the domain gap while preserving the original synthetic labels. CFHA explicitly decouples global style and local content domain discrepancies and bridges those gaps using three modules: (1) Global Style Transfer -- a diffusion model aligns color, illumination, and texture statistics of synthetic images to the realistic style, using only a small real reference set; (2) Local Refinement -- a super-resolution diffusion model is used to facilitate fine-grained and photorealistic details for the small objects, such as human instances, preserving shape and boundary integrity; (3) Hallucination Removal -- a module that filters out human instances whose visual attributes do not align with real-world data to make the human appearance closer to the target distribution. Extensive experiments on public UAV Sim2Real detection benchmarks demonstrate that our method significantly improves detection accuracy compared to the non-transformed baselines. Specifically, our method achieves up to a +14.1 improvement in mAP50 on the Semantic-Drone benchmark. Ablation studies confirm the complementary roles of the global and local stages and highlight the importance of hierarchical alignment. The code is released at https://github.com/liwd190019/CFHA.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.13869v3</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wenda Li, Meng Wu, Liangzhao Chen, Sungmin Eum, Heesung Kwon, Qing Qu</dc:creator>
    </item>
    <item>
      <title>FADTI: Fourier and Attention Driven Diffusion for Multivariate Time Series Imputation</title>
      <link>https://arxiv.org/abs/2512.15116</link>
      <description>arXiv:2512.15116v2 Announce Type: replace 
Abstract: Multivariate time series imputation is fundamental in applications such as healthcare, traffic forecasting, and biological modeling, where sensor failures and irregular sampling lead to pervasive missing values. However, existing Transformer- and diffusion-based models lack explicit inductive biases and frequency awareness, limiting their generalization under structured missing patterns and distribution shifts. We propose FADTI, a diffusion-based framework that injects frequency-informed feature modulation via a learnable Fourier Bias Projection (FBP) module and combines it with temporal modeling through self-attention and gated convolution. FBP supports multiple spectral bases, enabling adaptive encoding of both stationary and non-stationary patterns. This design injects frequency-domain inductive bias into the generative imputation process. Experiments on multiple benchmarks, including a newly introduced biological time series dataset, show that FADTI consistently outperforms state-of-the-art methods, particularly under high missing rates. Code is available at https://anonymous.4open.science/r/TimeSeriesImputation-52BF</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.15116v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Runze Li, Hanchen Wang, Wenjie Zhang, Binghao Li, Yu Zhang, Xuemin Lin, Ying Zhang</dc:creator>
    </item>
    <item>
      <title>A Survey on the Applications of Generative Artificial Intelligence in Automated Driving Systems Test Scenario Generation Methods</title>
      <link>https://arxiv.org/abs/2512.15422</link>
      <description>arXiv:2512.15422v2 Announce Type: replace 
Abstract: Ensuring the safety and reliability of Automated Driving Systems (ADS) remains a critical challenge, as traditional verification methods such as large-scale on-road testing are prohibitively costly and time-consuming. To address this, scenario-based testing has emerged as a scalable and efficient alternative, yet existing surveys provide only partial coverage of recent methodological and technological advances. This review systematically analyzes 31 primary studies and 10 surveys identified through a comprehensive search spanning 2015-2025; however, the in-depth methodological synthesis and comparative evaluation focus primarily on recent frameworks (2023-2025), reflecting the surge of Artificial Intelligence (AI)-assisted and multimodal approaches in this period. Traditional approaches rely on expert knowledge, ontologies, and naturalistic driving or accident data, while recent developments leverage generative models, including large language models, generative adversarial networks, diffusion models, and reinforcement learning frameworks, to synthesize diverse and safety-critical scenarios. Our synthesis identifies three persistent gaps: the absence of standardized evaluation metrics, limited integration of ethical and human factors, and insufficient coverage of multimodal and Operational Design Domain (ODD)-specific scenarios. To address these challenges, this review contributes a refined taxonomy that incorporates multimodal extensions, an ethical and safety checklist for responsible scenario design, and an ODD coverage map with a scenario-difficulty schema to enable transparent benchmarking. Collectively, these contributions provide methodological clarity for researchers and practical guidance for industry, supporting reproducible evaluation and accelerating the safe deployment of higher-level ADS.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.15422v2</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ji Zhou (Institute of Automotive Engineering, Graz University of Technology, Graz, Austria), Yongqi Zhao (Institute of Automotive Engineering, Graz University of Technology, Graz, Austria), Yixian Hu (Institute of Automotive Engineering, Graz University of Technology, Graz, Austria), Hexuan Li (Institute of Automotive Engineering, Graz University of Technology, Graz, Austria), Zhengguo Gu (Institute of Automotive Engineering, Graz University of Technology, Graz, Austria), Nan Xu (National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin university), Arno Eichberger (Institute of Automotive Engineering, Graz University of Technology, Graz, Austria)</dc:creator>
    </item>
    <item>
      <title>Collaborative Edge-to-Server Inference for Vision-Language Models</title>
      <link>https://arxiv.org/abs/2512.16349</link>
      <description>arXiv:2512.16349v2 Announce Type: replace 
Abstract: We propose a collaborative edge-to-server inference framework for vision-language models (VLMs) that reduces communication cost while maintaining inference accuracy. In typical deployments, visual data captured at edge devices (clients) is transmitted to the server for VLM inference. However, transmitting full-resolution images incurs high communication cost. Conversely, aggressive downsizing or excessive compression to mitigate communication overhead can discard fine-grained details, leading to accuracy degradation. To overcome this limitation, we design a communication-efficient two-stage framework. In the first stage, the server performs inference on the downsized thumbnail (global image) and quantifies the min-entropy of the output tokens. If the min-entropy exceeds a predefined threshold, the server identifies a region of interest (RoI) using the VLM's internal attention and requests the edge device to send a detail-preserved local image of the RoI. The server then refines its inference by jointly leveraging the global and local images. This selective retransmission strategy ensures that only essential visual content is additionally transmitted. Experimental results consistently confirm that the proposed framework substantially reduces communication overhead while maintaining inference accuracy across diverse VQA benchmarks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.16349v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Soochang Song, Yongjune Kim</dc:creator>
    </item>
    <item>
      <title>Efficient Stable Population Protocols for Parity and Beyond</title>
      <link>https://arxiv.org/abs/2512.20163</link>
      <description>arXiv:2512.20163v4 Announce Type: replace 
Abstract: For nearly two decades, population protocols have been extensively studied, yielding efficient solutions for central problems in distributed computing, including leader election and majority computation, a predicate type in Presburger Arithmetic closely tied to population protocols. Surprisingly, no protocols have achieved both time- and space-efficiency for congruency predicates, such as parity computation, which are complementary in this arithmetic framework. This gap highlights a significant challenge in the field. To address it, we explore the parity problem, where agents are tasked with computing the parity of the given sub-population size. We then extend the solution for parity to compute congruences modulo an arbitrary $m$.
  Previous research on efficient population protocols has focused on protocols that minimise both stabilisation time and state utilisation for specific problems. In contrast, this work slightly relaxes this expectation, permitting protocols to place less emphasis on full optimisation and more on universality, robustness, and probabilistic guarantees. This allows us to propose a novel computing paradigm that integrates population weights (or simply weights), a robust clocking mechanism, and efficient anomaly detection coupled with a switching mechanism (which ensures slow but always correct solutions). This paradigm facilitates universal design of efficient multistage stable population protocols. Specifically, the first efficient parity and congruence protocols introduced here use both $O(\log^3 n)$ states and achieve silent stabilisation in $O(\log^3 n)$ time. We conclude by discussing the impact of implicit conversion between unary and binary representations enabled by the weight system, with applications to other problems, including the computation and representation of (sub-)population sizes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.20163v4</guid>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Leszek Gąsieniec, Tytus Grodzicki, Tomasz Jurdziński, Jakub Kowalski, Grzegorz Stachowiak</dc:creator>
    </item>
    <item>
      <title>LightTact: A Visual-Tactile Fingertip Sensor for Deformation-Independent Contact Sensing</title>
      <link>https://arxiv.org/abs/2512.20591</link>
      <description>arXiv:2512.20591v3 Announce Type: replace 
Abstract: Contact often occurs without macroscopic surface deformation, such as during interaction with liquids, semi-liquids, or ultra-soft materials. However, most existing tactile sensors rely on deformation to infer contact, making such light-contact interactions difficult to perceive robustly. To address this, we present LightTact, a visual-tactile fingertip sensor that makes contact directly visible via a deformation-independent principle. LightTact features an ambient-blocking optical configuration that suppresses both external light and internal illumination at non-contact regions, while transmitting only the scattered light generated at true contacts. As a result, LightTact produces high-contrast raw images in which non-contact pixels remain near-black (mean gray value &lt; 3) and contact pixels preserve the natural appearance of the contacting surface. Built on this, LightTact achieves accurate pixel-level contact segmentation that is robust to material properties, contact force, surface appearance, and environmental lighting. We further demonstrate that LightTact unlocks new robotic manipulation behaviors that require detection of extremely light contact, including water spreading, facial-cream dipping, and soft thin-film interaction. In addition, we show that LightTact's spatially aligned visual-tactile images can be directly interpreted by vision-language models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.20591v3</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Changyi Lin, Boda Huo, Mingyang Yu, Emily Ruppel, Bingqing Chen, Jonathan Francis, Ding Zhao</dc:creator>
    </item>
    <item>
      <title>MAR: Multi-Agent Reflexion Improves Reasoning Abilities in LLMs</title>
      <link>https://arxiv.org/abs/2512.20845</link>
      <description>arXiv:2512.20845v2 Announce Type: replace 
Abstract: LLMs have shown the capacity to improve their performance on reasoning tasks by reflecting on their mistakes and acting with these reflections in mind. However, continual reflection of the same LLM onto itself exhibits degeneration of thought, where the LLM keeps repeating the same errors even with the knowledge that it is wrong. To address this problem, we instead introduce a multi-agent approach with multi-persona debaters as the method for generating reflections. Through our extensive experimentation, we found that this leads to greater diversity in the reflections generated by the LLM agent. We demonstrate an accuracy of 47% EM on HotPot QA (question answering) and 82.7% on HumanEval (programming), both surpassing reflection with a single LLM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.20845v2</guid>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Onat Ozer, Yuchen Wang, Grace Wu, Daniel Dosti, Honghao Zhang, Vivi De La Rue</dc:creator>
    </item>
    <item>
      <title>Adversarial Instance Generation and Robust Training for Neural Combinatorial Optimization with Multiple Objectives</title>
      <link>https://arxiv.org/abs/2601.01665</link>
      <description>arXiv:2601.01665v2 Announce Type: replace 
Abstract: Deep reinforcement learning (DRL) has shown great promise in addressing multi-objective combinatorial optimization problems (MOCOPs). Nevertheless, the robustness of these learning-based solvers has remained insufficiently explored, especially across diverse and complex problem distributions. In this paper, we propose a unified robustness-oriented framework for preference-conditioned DRL solvers for MOCOPs. Within this framework, we develop a preference-based adversarial attack to generate hard instances that expose solver weaknesses, and quantify the attack impact by the resulting degradation on Pareto-front quality. We further introduce a defense strategy that integrates hardness-aware preference selection into adversarial training to reduce overfitting to restricted preference regions and improve out-of-distribution performance. The experimental results on multi-objective traveling salesman problem (MOTSP), multi-objective capacitated vehicle routing problem (MOCVRP), and multi-objective knapsack problem (MOKP) verify that our attack method successfully learns hard instances for different solvers. Furthermore, our defense method significantly strengthens the robustness and generalizability of neural solvers, delivering superior performance on hard or out-of-distribution instances.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.01665v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wei Liu, Yaoxin Wu, Yingqian Zhang, Thomas Bäck, Yingjie Fan</dc:creator>
    </item>
    <item>
      <title>Vision-Based Early Fault Diagnosis and Self-Recovery for Strawberry Harvesting Robots</title>
      <link>https://arxiv.org/abs/2601.02085</link>
      <description>arXiv:2601.02085v3 Announce Type: replace 
Abstract: Strawberry-harvesting robots faced challenges such as poor visual perception, gripper misalignment, empty grasp/misgrasp, and slippage, which reduced harvesting stability and efficiency. To overcome these issues, this paper proposes a visual fault diagnosis and self-recovery framework. An end-to-end SRR-Net achieved unified perception and fault diagnosis through joint detection, segmentation, and ripeness regression of the fruit and gripper. Leveraging this integrated perception, a relative error compensation method driven by simultaneous target-gripper detection was designed to correct positional misalignments exceeding the tolerance threshold. A micro-optical camera integrated within the end-effector delivered real-time visual feedback. Based on the micro-optical camera, a MobileNet V3-Small classifier was utilized for grasp adjustment during the deflating stage, enabling the early abort of the harvesting cycle in cases of empty grasp/misgrasp. Furthermore, a time-series LSTM classifier was applied during the snap-off stage to predict strawberry slippage. Based on these predictions, the system executed re-inflation and a secondary snap-off attempt for slipping strawberries, or aborted the cycle for slipped strawberries. Experiments demonstrated that the mean absolute errors between the end-effector and the picking point were reduced to 3.12 mm and 4.06 mm from 11.50 mm and 5.25 mm along the x- and y-axes, respectively, at the cost of a time increment of 0.64 $\pm$ 0.24 s. The grasp adjustment module reduced the grasping phase by approximately 0.5 s and avoided empty-placement for failure cases. The strawberry slip prediction module handled slipped cases with an 88.89% success rate, saving approximately 4.00 s per harvesting cycle for failure cases. It also achieved an 81.25% recovery rate for slipping strawberries, requiring an additional 0.63 s for re-grasping.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.02085v3</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1016/j.aiia.2026.05.009</arxiv:DOI>
      <dc:creator>Meili Sun, Chunjiang Zhao, Lichao Yang, Hao Liu, Shimin Hu, Ya Xiong</dc:creator>
    </item>
    <item>
      <title>ReTreVal: Reasoning Tree with Validation and Cross-Problem Memory for Large Language Models</title>
      <link>https://arxiv.org/abs/2601.02880</link>
      <description>arXiv:2601.02880v3 Announce Type: replace 
Abstract: Every existing inference-time reasoning framework discards all failure context at problem boundaries, leaving a model solving problem 500 no wiser than it was on problem 1. We present ReTreVal (Reasoning Tree with Validation), a training-free framework that closes this gap through adaptive tree exploration with tool-augmented node refinement, typed-failure backtracking that injects categorized error context into the recovered branch, and a self-rewriting memory that accumulates and revises strategy entries across problems, enabling inference-time cross-problem learning on any fixed, unmodified LLM without fine-tuning. ReTreVal achieves 85.8% pass@1 on MATH-500 (+8.6 pp over Zero-Shot CoT, +8.6 pp over the strongest baseline Self-Refine) and 54.4% on MMLU-Pro (+15.3 pp over Self-Refine), with a 3.4:1 win-to-regression ratio confirming genuine error recovery rather than noise. These capabilities, previously requiring gradient updates, allow a 32B model to compete with much larger single-pass systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.02880v3</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Abhishek HS, Pavan C Shekar, Arpit Jain, Ashwanth Krishnan</dc:creator>
    </item>
    <item>
      <title>ATLAS: Verifier-Guided Adaptive Latent Activation Steering for Efficient LLM Reasoning</title>
      <link>https://arxiv.org/abs/2601.03093</link>
      <description>arXiv:2601.03093v2 Announce Type: replace 
Abstract: Recent work on activation and latent steering has demonstrated that modifying internal representations can effectively guide large language models (LLMs) toward improved reasoning and efficiency without updating model parameters. However, most existing approaches rely on fixed steering policies and static intervention strengths, which limit their robustness across problem instances and often result in over- or under-steering. We propose Adaptive Test-time Latent Steering (ATLAS), a lightweight framework that dynamically controls steering decisions at inference time using a trained, lightweight verifier over the latent states. Given intermediate hidden states, the verifier predicts the quality of ongoing reasoning and adaptively selects which steering action to apply, enabling per-example and per-step adjustment with minimal overhead. ATLAS provides a unified framework for combining learned latent verification with test-time activation steering, enabling adaptive reasoning control without additional LLM decoding or inference-time process reward model calls. Experiments on multiple mathematical and coding reasoning benchmarks show that ATLAS consistently outperforms both vanilla decoding and fixed steering baselines, achieving higher accuracy while substantially reducing test-time token usage. These results demonstrate that verifier-guided latent adaptation provides an effective and scalable mechanism for controlling reasoning efficiency without sacrificing solution quality. All source code will be publicly available.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.03093v2</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Tuc Nguyen, Thai Le</dc:creator>
    </item>
    <item>
      <title>Muses: Designing, Composing, Generating Nonexistent Fantasy 3D Creatures without Training</title>
      <link>https://arxiv.org/abs/2601.03256</link>
      <description>arXiv:2601.03256v2 Announce Type: replace 
Abstract: We present Muses, the first training-free method for fantastic 3D creature generation in a feed-forward paradigm. Previous methods, which rely on part-aware optimization, manual assembly, or 2D image generation, often produce unrealistic or incoherent 3D assets due to the challenges of intricate part-level manipulation and limited out-of-domain generation. In contrast, Muses leverages the 3D skeleton, a fundamental representation of biological forms, to explicitly and rationally compose diverse elements. This skeletal foundation formalizes 3D content creation as a structure-aware pipeline of design, composition, and generation. Muses begins by constructing a creatively composed 3D skeleton with coherent layout and scale through graph-constrained reasoning. This skeleton then guides a voxel-based assembly process within a structured latent space, integrating regions from different objects. Finally, image-guided appearance modeling under skeletal conditions is applied to generate a style-consistent and harmonious texture for the assembled shape. Extensive experiments establish Muses' state-of-the-art performance in terms of visual fidelity and alignment with textual descriptions, as well as its potential for flexible 3D object editing. Project page: https://luhexiao.github.io/Muses.github.io/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.03256v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hexiao Lu, Xiaokun Sun, Zeyu Cai, Hao Guo, Ying Tai, Jian Yang, Zhenyu Zhang</dc:creator>
    </item>
    <item>
      <title>Bathtubs, Boundaries, and Sandboxes: AI Regulatory Learning under Legal Uncertainty</title>
      <link>https://arxiv.org/abs/2601.04094</link>
      <description>arXiv:2601.04094v3 Announce Type: replace 
Abstract: Effective regulation of AI systems is a defining policy challenge, driven by their integration into all aspects of society. To remain responsive to their rapid development and emergent properties, policymakers across the globe rely on high-level principles and abstract legal requirements. Yet, while this flexibility supports future-proofing human-centred regulations and aligning them with socio-ethical values, it also causes legal uncertainty downstream as developers, companies, and auditors struggle to translate these abstract requirements into verifiable technical requirements. Using the AI Act as an example, this paper draws on Coleman's bathtub to analyse the regulatory learning space in AI governance. It argues that legal uncertainty cannot be fully reduced ex ante and that, within reasonable bounds, it is also necessary for regulatory learning because it creates the space in which boundary negotiation over socio-technical meaning can occur. Building on this analysis, the paper shows how boundary objects and boundary negotiating artifacts help explain the translation of legal requirements into operational practice. By examining technical sandbox frameworks, it further identifies concrete properties that technical infrastructures must possess to function effectively as boundary negotiation artifacts in AI assessment. The paper concludes that legal certainty remains the long-term aim, but that premature closure of regulatory instruments risks undermining the learning processes needed for adaptive governance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.04094v3</guid>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tom Deckenbrunnen, Alessio Buscemi, Marco Almada, Alfredo Capozucca, German Castignani</dc:creator>
    </item>
    <item>
      <title>State Backdoor: Towards Stealthy Real-world Poisoning Attack on Vision-Language-Action Model in State Space</title>
      <link>https://arxiv.org/abs/2601.04266</link>
      <description>arXiv:2601.04266v2 Announce Type: replace 
Abstract: Vision-Language-Action (VLA) models are widely deployed in safety-critical embodied AI applications such as robotics. However, their complex multimodal interactions also expose new security vulnerabilities. In this paper, we investigate a backdoor threat in VLA models, where malicious inputs cause targeted misbehavior while preserving performance on clean data. Existing backdoor methods predominantly rely on inserting visible triggers into the visual modality, which suffer from poor robustness and low insusceptibility in real-world settings due to environmental variability. To overcome these limitations, we introduce the State Backdoor, a novel and practical backdoor attack that leverages the robot arm's initial state as the trigger. To optimize the trigger for insusceptibility and effectiveness, we design a Preference-guided Genetic Algorithm (PGA) that efficiently searches the state space for minimal yet potent triggers. Extensive experiments on five representative VLA models and five real-world tasks show that our method achieves over 90% attack success rate without affecting benign task performance, revealing an underexplored vulnerability in embodied AI systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.04266v2</guid>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ji Guo, Wenbo Jiang, Yansong Lin, Yijing Liu, Ruichen Zhang, Guomin Lu, Aiguo Chen, Xinshuo Han, Hongwei Li</dc:creator>
    </item>
    <item>
      <title>IGenBench: Benchmarking the Reliability of Text-to-Infographic Generation</title>
      <link>https://arxiv.org/abs/2601.04498</link>
      <description>arXiv:2601.04498v2 Announce Type: replace 
Abstract: Infographics are composite visual artifacts that combine data visualizations with textual and illustrative elements to communicate information. While recent text-to-image (T2I) models can generate aesthetically appealing images, their reliability in generating infographics remains unclear. Generated infographics may appear correct at first glance but contain easily overlooked issues, such as distorted data encoding or incorrect textual content. We present IGENBENCH, the first benchmark for evaluating the reliability of text-to-infographic generation, comprising 600 curated test cases spanning 30 infographic types. We design an automated evaluation framework that decomposes reliability verification into atomic yes/no questions based on a taxonomy of 10 question types. We employ multimodal large language models (MLLMs) to verify each question, yielding question-level accuracy (Q-ACC) and infographic-level accuracy (I-ACC). We comprehensively evaluate 10 state-of-the-art T2I models on IGENBENCH. Our systematic analysis reveals key insights for future model development: (i) a three-tier performance hierarchy with the top model achieving Q-ACC of 0.90 but I-ACC of only 0.49; (ii) data-related dimensions emerging as universal bottlenecks (e.g., Data Completeness: 0.21); and (iii) the challenge of achieving end-to-end correctness across all models. We release IGENBENCH at https://igen-bench.vercel.app/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.04498v2</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yinghao Tang, Xueding Liu, Boyuan Zhang, Tingfeng Lan, Yupeng Xie, Jiale Lao, Yiyao Wang, Haoxuan Li, Tingting Gao, Bo Pan, Luoxuan Weng, Xiuqi Huang, Minfeng Zhu, Yingchaojie Feng, Yuyu Luo, Wei Chen</dc:creator>
    </item>
    <item>
      <title>Thinking-Based Non-Thinking: Solving the Reward Hacking Problem in Training Hybrid Reasoning Models via Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2601.04805</link>
      <description>arXiv:2601.04805v2 Announce Type: replace 
Abstract: Large reasoning models (LRMs) have attracted much attention due to their exceptional performance. However, their performance mainly stems from thinking, a long Chain of Thought (CoT), which significantly increases computational overhead. To address this overthinking problem, existing work focuses on using reinforcement learning (RL) to train hybrid reasoning models that automatically decide whether to engage in thinking or not based on the complexity of the query. Unfortunately, using RL suffers from the reward hacking problem, e.g., the model engages in thinking but is judged as not doing so, resulting in incorrect rewards. To mitigate this problem, existing works either employ supervised fine-tuning (SFT), which incurs high computational costs, or enforce uniform token limits on non-thinking responses, which yields limited mitigation of the problem. In this paper, we propose Thinking-Based Non-Thinking (TNT). It does not employ SFT, and sets different maximum token usage for responses not using thinking across various queries by leveraging information from the solution component of the responses using thinking. Experiments on five mathematical benchmarks demonstrate that TNT reduces token usage by around 50% compared to DeepSeek-R1-Distill-Qwen-1.5B/7B and DeepScaleR-1.5B, while significantly improving accuracy. In fact, TNT achieves the optimal trade-off between accuracy and efficiency among all tested methods. Additionally, the probability of reward hacking in TNT's responses that are classified as not using thinking remains below 10% across all tested datasets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.04805v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Siyuan Gan, Jiaheng Liu, Boyan Wang, Tianpei Yang, Runqing Miao, Yuyao Zhang, Fanyu Meng, Junlan Feng, Linjian Meng, Jing Huo, Yang Gao</dc:creator>
    </item>
    <item>
      <title>FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching</title>
      <link>https://arxiv.org/abs/2601.05212</link>
      <description>arXiv:2601.05212v2 Announce Type: replace 
Abstract: Brain Magnetic Resonance Imaging (MRI) plays a central role in studying neurological development, aging, and diseases. One key application is Brain Age Prediction (BAP), which estimates an individual's biological brain age from MRI data. Effective BAP models require large, diverse, and age-balanced datasets, whereas existing 3D MRI datasets are demographically skewed, limiting fairness and generalizability. Acquiring new data is costly and ethically constrained, motivating generative data augmentation. Current generative methods are often based on latent diffusion models, which operate in learned low-dimensional latent spaces to address the memory demands of volumetric MRI data. However, these methods are typically slow at inference, may introduce artifacts due to latent compression, and are rarely conditioned on age, thereby affecting BAP performance. In this work, we propose FlowLet, a conditional generative framework that synthesizes age-conditioned 3D MRIs by leveraging flow matching within an invertible 3D wavelet domain, helping to avoid reconstruction artifacts and reducing computational demands. Experiments show that FlowLet generates high-fidelity volumes with few sampling steps. Training BAP models with data generated by FlowLet improves performance for underrepresented age groups, and region-based analysis confirms preservation of anatomical structures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.05212v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Danilo Danese, Angela Lombardi, Matteo Attimonelli, Giuseppe Fasano, Tommaso Di Noia</dc:creator>
    </item>
    <item>
      <title>Improving User Experience with Personalized Review Ranking and Summarization</title>
      <link>https://arxiv.org/abs/2601.05261</link>
      <description>arXiv:2601.05261v2 Announce Type: replace 
Abstract: Online consumer reviews are important decision-support resources in e-commerce, yet the increasing volume of reviews often creates information overload and makes it difficult for users to identify content that matches their individual preferences. Existing review-ranking approaches commonly rely on aggregate signals such as star ratings, helpfulness votes, or recency, which may not reflect user-specific interests. This paper proposes a personalized review ranking and summarization framework that integrates user preference modeling, hybrid sentiment estimation, aspect-level review matching, and Large Language Model (LLM)-based summarization. The framework first extracts aspect-level preferences and sentiment signals from historical reviews. It then incorporates user-selected product aspects and written review input to build a personalized user profile. Candidate reviews are ranked by comparing this profile with review-level aspect and sentiment representations. The top-ranked reviews are then summarized to provide concise, preference-aligned information. The proposed method was evaluated using an Amazon Mobile Electronics review dataset and a structured user study involving 70 participants across common consumer electronics categories. Results show that the proposed ranking method outperformed random ordering, star-rating-based ranking, helpfulness-vote ranking, recency-based ranking, and semantic-similarity-based ranking. User-study results further indicate improvements in satisfaction, perceived relevance, decision-making confidence, ease of finding information, and reading efficiency. The findings suggest that combining aspect-level personalization, sentiment-aware ranking, and LLM-based summarization can reduce review overload and support more efficient user-centered decision-making.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.05261v2</guid>
      <category>cs.IR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Muhammad Jawad Mufti, Omar Hammad, MD. Mahfuzur Rahman</dc:creator>
    </item>
    <item>
      <title>One if by Land, Two if by Sea, Three if by Four Seas, and More to Come -- Values of Perception, Prediction, Communication, and Common Sense in Decision Making</title>
      <link>https://arxiv.org/abs/2601.06077</link>
      <description>arXiv:2601.06077v2 Announce Type: replace 
Abstract: This work aims to rigorously define the values of perception, prediction, communication, and common sense in decision making. The defined quantities are decision-theoretic, but have information-theoretic analogues, e.g., they share some simple but key mathematical properties with Shannon entropy and mutual information, and can reduce to these quantities in particular settings. One interesting observation is that the value of perception without prediction can be negative, while the value of perception together with prediction and the value of prediction alone are always nonnegative. The defined quantities suggest answers to practical questions arising in the design of autonomous decision-making systems. Example questions include: Do we need to observe and predict the behavior of a particular agent? How important is it? What is the best order in which to observe and predict the agents? The defined quantities may also provide insights for cognitive science and neuroscience, toward understanding how natural decision makers make use of information gained from different sources and operations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.06077v2</guid>
      <category>cs.IT</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>math.IT</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Aolin Xu</dc:creator>
    </item>
    <item>
      <title>Dynamic Distributed Constraint Optimization and Metareasoning for Continual, Large-Scale Satellite Operations</title>
      <link>https://arxiv.org/abs/2601.06188</link>
      <description>arXiv:2601.06188v3 Announce Type: replace 
Abstract: As Earth-observing satellite constellations grow in size and capability, distributed onboard control offers a pathway to novel responses and time-sensitive measurements. However, deploying autonomy to satellites requires efficient computation and communication. This work addresses the challenge of scheduling observations for hundreds of satellites in a dynamic, large-scale problem with millions of variables. We present the dynamic multi-satellite constellation observation scheduling problem (DCOSP), a new formulation of dynamic distributed constraint optimization problems (DDCOP) that models integrated scheduling and execution. DCOSP features a novel optimality condition, for which we construct an exact omniscient offline algorithm. Motivated by the strong resource constraints of onboard satellite operations, we introduce a framework to incorporate metareasoning in DDCOPs that controls when agents expend resources to recompute solutions. In addition, we present the dynamic incremental neighborhood stochastic search (D-NSS) algorithm, an incomplete online decomposition-based DDCOP algorithm that repairs localized sub-problems in response to dynamic events. We demonstrate in realistic simulations that D-NSS converges to near-optimal solutions, outperforming standard DDCOP baselines in solution quality, computation time, and message volume, while our metareasoning framework successfully balances resource conservation with utility. As part of the NASA FAME mission, this work lays the foundation for the largest in-space demonstration of distributed multi-agent AI to date.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.06188v3</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.65109/JCYH5778</arxiv:DOI>
      <dc:creator>Itai Zilberstein, Steve Chien</dc:creator>
    </item>
    <item>
      <title>How Context Shapes Truth: Geometric Transformations of Statement-level Truth Representations in LLMs</title>
      <link>https://arxiv.org/abs/2601.06599</link>
      <description>arXiv:2601.06599v2 Announce Type: replace 
Abstract: Large Language Models (LLMs) often encode whether a statement is true as a vector in their residual stream activations. These vectors, also known as truth vectors, have been studied in prior work; however, how they change when context is introduced remains unexplored. We study this question by measuring (1) the directional change ($\theta$) between the truth vectors with and without context and (2) the relative magnitude of the truth vectors upon adding context. Across four LLMs and four datasets, we find that (1) truth vectors are roughly orthogonal in early layers, converge in middle layers, and may stabilize or continue increasing in later layers; (2) adding context generally increases the truth vector magnitude, i.e., the separation between true and false representations in the activation space is amplified; (3) larger models distinguish relevant from irrelevant context mainly through directional change ($\theta$), while smaller models show this distinction through magnitude differences. We also find that context conflicting with parametric knowledge produces larger geometric changes than parametrically aligned context. To the best of our knowledge, this is the first work that provides a geometric characterization of how context transforms the truth vector in the activation space of LLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.06599v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shivam Adarsh, Maria Maistro, Christina Lioma</dc:creator>
    </item>
    <item>
      <title>Revisiting Training Scale: An Empirical Study of Token Count, Power Consumption, and Parameter Efficiency</title>
      <link>https://arxiv.org/abs/2601.06649</link>
      <description>arXiv:2601.06649v2 Announce Type: replace 
Abstract: Research in machine learning has questioned whether increases in training token counts reliably produce proportional performance gains in large language models. Building on prior work introducing an energy-aware parameter efficiency metric, this study empirically examines the effects of increasing training token counts under fixed hardware and training conditions. The significance of this work lies in the explicit integration of power consumption and execution duration, as reflected by the power sampling frequency, into token-scale analysis. This addresses a gap in prior studies emphasizing performance outcomes while underrepresenting computational and energy costs. Using a repeated-measures experimental design on a constant GPU instance with an identical model architecture, optimizer settings, and epoch counts, a 1.1-billion-parameter TinyLlama model was trained at three token counts (500K, 1M, and 2M). While conventional performance metrics exhibited inconsistent or diminishing returns across token scales, the inclusion of power consumption and execution duration revealed a strictly monotonic decline in training efficiency as token count increased. Repeated-measures ANOVA demonstrated a strong effect of token count on parameter efficiency, with all pairwise comparisons remaining significant following Bonferroni correction. These findings indicate that increases in training token counts may be energetically inefficient even when marginal performance improvements are observed, underscoring the importance of efficiency-aware evaluation in large language model training.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.06649v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Joe Dwyer</dc:creator>
    </item>
    <item>
      <title>DYCP: Dynamic Context Pruning for Long-Form Dialogue with LLMs</title>
      <link>https://arxiv.org/abs/2601.07994</link>
      <description>arXiv:2601.07994v5 Announce Type: replace 
Abstract: Large Language Models (LLMs) increasingly operate over long-form dialogues with frequent topic shifts. While recent LLMs support extended context windows, efficient management of dialogue history in practice is needed due to inference cost and latency constraints. We present DyCP, a lightweight context management method implemented outside the LLM that dynamically identifies and retrieves relevant dialogue segments conditioned on the current turn, without offline memory construction. DyCP manages dialogue context while preserving the sequential nature of dialogue without predefined topic boundaries, enabling adaptive and efficient context selection. Across three long-form dialogue benchmarks (LoCoMo, MT-Bench+, and SCM4LLMs) and multiple LLM backends, DyCP achieves competitive answer quality in downstream generation, with more selective context usage and improved inference efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.07994v5</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nayoung Choi, Jonathan Zhang, Jinho D. Choi</dc:creator>
    </item>
    <item>
      <title>MMR-GRPO: Accelerating GRPO-Style Training through Diversity-Aware Reward Reweighting</title>
      <link>https://arxiv.org/abs/2601.09085</link>
      <description>arXiv:2601.09085v2 Announce Type: replace 
Abstract: Group Relative Policy Optimization (GRPO) has become a standard approach for training mathematical reasoning models; however, its reliance on multiple completions per prompt makes training computationally expensive. Although recent work has reduced the number of training steps required to reach peak performance, the overall wall-clock training time often remains unchanged or even increases due to higher per-step cost. We propose MMR-GRPO, which integrates Maximal Marginal Relevance to reweigh rewards based on completion diversity. Our key insight is that semantically redundant completions contribute limited marginal learning signal; prioritizing diverse solutions yields more informative updates and accelerates convergence. Extensive evaluations across three model sizes (1.5B, 7B, 8B), three GRPO variants, and five mathematical reasoning benchmarks show that MMR-GRPO achieves comparable peak performance while requiring on average 47.9% fewer training steps and 70.2% less wall-clock time. These gains are consistent across models, methods, and benchmarks. Our code is released at: https://github.com/WeiKangda/MMR-GRPO.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.09085v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Kangda Wei, Ruihong Huang</dc:creator>
    </item>
    <item>
      <title>Enhancing Spatial Reasoning in Large Language Models for Metal-Organic Frameworks Structure Prediction</title>
      <link>https://arxiv.org/abs/2601.09285</link>
      <description>arXiv:2601.09285v2 Announce Type: replace 
Abstract: Metal-organic frameworks (MOFs) are porous crystalline materials with broad applications such as carbon capture and drug delivery, yet accurately predicting their 3D structures remains a significant challenge. While Large Language Models (LLMs) have shown promise in generating crystal structures, their application to MOFs is hindered by MOFs' high structural complexity arising from the large number of atoms in the unit cell. Inspired by the success of block-wise paradigms in deep generative models for MOFs, we pioneer the application of LLMs in this domain by introducing MOF-LLM, the first LLM framework specifically adapted for block-level MOF structure prediction. To effectively harness LLMs for this 3D modular assembly task, our training paradigm integrates spatial-aware continual pre-training (CPT), structural supervised fine-tuning (SFT), and matching-driven reinforcement learning (RL). By incorporating explicit spatial priors and optimizing structural stability via Soft Adaptive Policy Optimization (SAPO), our approach substantially enhances the spatial reasoning in a Qwen-3 8B model for MOF structure prediction. Comprehensive experiments demonstrate that MOF-LLM achieves state-of-the-art performance with a match rate of 35.78% while exhibiting superior sampling efficiency of 0.04 seconds per structure.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.09285v2</guid>
      <category>cs.LG</category>
      <category>cond-mat.mtrl-sci</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mianzhi Pan, JianFei Li, Peishuo Liu, Botian Wang, Yawen Ouyang, Yiming Rong, Hao Zhou, Jianbing Zhang</dc:creator>
    </item>
    <item>
      <title>Nonlinear numerical schemes using specular differentiation for initial value problems of first-order ordinary differential equations</title>
      <link>https://arxiv.org/abs/2601.09900</link>
      <description>arXiv:2601.09900v4 Announce Type: replace 
Abstract: This paper proposes specular differentiation in one-dimensional Euclidean space and provides its fundamental analysis, including a quasi-Fermat theorem and a quasi-Mean Value Theorem. As an application, this paper develops several numerical schemes for solving initial value problems for first-order ordinary differential equations. Based on numerical simulations, we select one scheme and prove its second-order consistency and convergence. By modifying this scheme, we also obtain a numerical scheme with zero local truncation error for ODEs whose solution trajectories are ellipses.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.09900v4</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kiyuob Jung</dc:creator>
    </item>
    <item>
      <title>Phong-Rodrigues Extrinsic Vector-Field Processing</title>
      <link>https://arxiv.org/abs/2601.10621</link>
      <description>arXiv:2601.10621v2 Announce Type: replace 
Abstract: We introduce a new extrinsic discretization of tangent vector fields on triangle meshes that is continuous, with bounded derivatives that are continuous almost everywhere, supporting pointwise evaluation and integration of differential operators. We achieve this by building a continuous normal field over the mesh via Phong interpolation and using minimal Rodrigues rotations to transport vertex-based tangent vectors into triangle interiors. Unlike most existing discretizations, which typically sacrifice either continuity or the ability to evaluate derivatives pointwise, our approach supports both. Because it is pointwise evaluatable, and using the fact that the covariant derivative can be decomposed into its symmetric, antisymmetric, and scalar components, our discretization supports the construction of standard vector-field processing operators including the connection and Hodge Laplacians, Killing energy, divergence, curl, and the Lie bracket. This framework provides a simple and practical finite-element formulation for vector-field processing on meshes, supporting both integration-based operators and pointwise queries. To our knowledge, ours is the first discretization that jointly enables extrinsic continuous vector fields, bounded derivatives, and pointwise evaluation of this collection of operators.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.10621v2</guid>
      <category>cs.GR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hongyi Liu, Oded Stein, Amir Vaxman, Mirela Ben-Chen, Misha Kazhdan</dc:creator>
    </item>
    <item>
      <title>Neural Induction of Finite-State Transducers</title>
      <link>https://arxiv.org/abs/2601.10918</link>
      <description>arXiv:2601.10918v3 Announce Type: replace 
Abstract: Finite-State Transducers (FSTs) are effective models for string-to-string rewriting tasks, often providing the efficiency necessary for high-performance applications, but constructing transducers by hand is difficult. In this work, we propose a novel method for automatically constructing unweighted FSTs following the hidden state geometry learned by a recurrent neural network. We evaluate our methods on real-world datasets for morphological inflection, grapheme-to-phoneme prediction, and historical normalization, showing that the constructed FSTs are highly accurate and robust for many datasets, substantially outperforming classical transducer learning algorithms by up to 87% accuracy on held-out test sets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.10918v3</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Michael Ginn, Alexis Palmer, Mans Hulden</dc:creator>
    </item>
    <item>
      <title>Massively Multilingual Joint Segmentation and Glossing</title>
      <link>https://arxiv.org/abs/2601.10925</link>
      <description>arXiv:2601.10925v3 Announce Type: replace 
Abstract: Automated interlinear gloss prediction with neural networks is a promising approach to accelerate language documentation efforts. However, while state-of-the-art models like GlossLM achieve high scores on glossing benchmarks, user studies with linguists have found critical barriers to the usefulness of such models in real-world scenarios. In particular, existing models typically generate morpheme-level glosses but assign them to whole words without predicting the actual morpheme boundaries, making the predictions less interpretable and thus untrustworthy to human annotators.
  We conduct the first study on neural models that jointly predict interlinear glosses and the corresponding morphological segmentation from raw text. We run experiments to determine the optimal way to train models that balance segmentation and glossing accuracy, as well as the alignment between the two tasks. We extend the training corpus of GlossLM and pretrain PolyGloss, a family of seq2seq multilingual models for joint segmentation and glossing that outperforms GlossLM on glossing and beats various open-source LLMs on segmentation, glossing, and alignment. In addition, we demonstrate that PolyGloss can be quickly adapted to a new dataset via low-rank adaptation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.10925v3</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Michael Ginn, Lindia Tjuatja, Enora Rice, Ali Marashian, Maria Valentini, Jasmine Xu, Graham Neubig, Alexis Palmer</dc:creator>
    </item>
    <item>
      <title>A Comparative Study of Student Perspectives on Technical Writing Feedback Quality: Evaluating LLMs, SLMs, and Humans in Computer Science Topics</title>
      <link>https://arxiv.org/abs/2601.11541</link>
      <description>arXiv:2601.11541v2 Announce Type: replace 
Abstract: To address the scalability of feedback in computer science while mitigating the privacy and cost limitations of commercial Large Language Models (LLMs), this study evaluates a locally hosted Small Language Model (SLM). We deployed a quantized Llama-3.1, GPT-4, and human instructors across introductory programming (N=176), operating systems (N=80), and a writing seminar (N=7). Mixed-methods analysis of student perceptions reveals that while the local SLM matched commercial LLMs and was rated higher by students for readability and actionability in technical courses, human feedback remained more favoured for highly specialized writing tasks. We demonstrate that local SLMs offer a privacy-preserving, zero-marginal-cost alternative for foundational feedback, supporting a tiered pedagogical framework where AI handles structural guidance while instructors focus on high-level conceptual scaffolding.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.11541v2</guid>
      <category>cs.HC</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Suqing Liu, Runlong Ye, Christopher Eaton, Bogdan Simion, Michael Liut</dc:creator>
    </item>
    <item>
      <title>Multimodal Generative Engine Optimization: Rank Manipulation for Vision-Language Model Rankers</title>
      <link>https://arxiv.org/abs/2601.12263</link>
      <description>arXiv:2601.12263v2 Announce Type: replace 
Abstract: Vision-Language Models (VLMs) integrate visual and textual knowledge into unified representations that increasingly underpin modern retrieval and recommendation systems. However, it remains unclear how reliably these models utilize their cross-modal knowledge when ranking multimodal items, and whether their knowledge grounding can be subverted. In this paper, we expose a fundamental vulnerability in how VLMs apply multimodal knowledge for product ranking: through Multimodal Generative Engine Optimization (MGEO), we show that an adversary can manipulate a VLM's ranking decisions by jointly crafting imperceptible image perturbations and fluent textual suffixes that exploit the model's internal cross-modal knowledge coupling. Using an alternating optimization strategy, MGEO targets the deep interactions between visual and linguistic representations within the VLM, achieving rank manipulations that substantially exceed those of unimodal attacks and heuristic baselines powered by strong commercial models. Our findings reveal that surface-level content quality is insufficient for rank promotion; instead, direct alignment with the model's internal knowledge utilization mechanism is required. These results raise important questions on the faithfulness and robustness of knowledge grounding in multimodal foundation models, and motivate future work on defense mechanisms for multimodal retrieval systems. Code is available at: https://github.com/glad-lab/MGEO</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.12263v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Yixuan Du, Chenxiao Yu, Haoyan Xu, Ziyi Wang, Yue Zhao, Xiyang Hu</dc:creator>
    </item>
    <item>
      <title>OOPS: Automated generation of REST API specification via LLMs</title>
      <link>https://arxiv.org/abs/2601.12735</link>
      <description>arXiv:2601.12735v2 Announce Type: replace 
Abstract: REST APIs, based on the REpresentational State Transfer (REST) architecture, are the primary type of Web API. The OpenAPI Specification (OAS) serves as the de facto standard for describing REST APIs and is crucial for multiple software engineering tasks. Automated OAS generation can help developers identify and correct issues in manually maintained OAS, but existing approaches rely on technology-specific rules and human expert intervention. LLMs' powerful code understanding capabilities offer the potential to overcome these limitations, but introduce additional challenges such as context length limitations and hallucinations. To address these challenges, we propose OOPS, the first technology-agnostic approach that leverages LLM-based static analysis of server code for OAS generation. Through an LLM agent workflow comprising two key steps, endpoint method extraction and OAS generation, OOPS eliminates the need for technology-specific rules or human expert intervention. By constructing an API dependency graph, it establishes necessary file associations to address LLMs' context length limitations. Through multi-stage generation and self-refinement, it mitigates both syntactic and semantic hallucinations during OAS generation. We evaluated OOPS on 12 real-world REST APIs spanning 5 programming languages and 8 development frameworks. Experimental results demonstrate that OOPS accurately generates high-quality OAS for REST APIs implemented with diverse technologies, achieving an average F1-score exceeding 98% for endpoint method inference, 97% for both request parameter and response inference, and 92% for parameter constraint inference. The input tokens average below 5.6K with a maximum of 16.13K, while the output tokens average below 0.9K with a maximum of 7.63K.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.12735v2</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1016/j.jss.2026.112914</arxiv:DOI>
      <arxiv:journal_reference>Journal of Systems and Software 239 (2026) 112914</arxiv:journal_reference>
      <dc:creator>Hao Chen, Yunchun Li, Chen Chen, Fengxu Lin, Wei Li</dc:creator>
    </item>
    <item>
      <title>XCR-Bench: Benchmarking Cross-Cultural Reasoning in LLMs via Culture-Specific Items and Hall's Triad</title>
      <link>https://arxiv.org/abs/2601.14063</link>
      <description>arXiv:2601.14063v2 Announce Type: replace 
Abstract: Cross-cultural competence in large language models (LLMs) requires understanding and adapting Culture-Specific Items (CSIs) across varying cultural contexts. However, progress in evaluating this capability remains limited by the lack of high-quality CSI-annotated corpora with parallel cross-cultural sentence pairs. We introduce XCR-Bench, a Cross(X)-Cultural Reasoning Benchmark containing 4.1k parallel sentences and 1,098 CSIs across three reasoning tasks. XCR-Bench integrates Newmark's CSI framework with Hall's Triad of Culture, enabling evaluation across levels of cultural visibility -- from observable practices to implicit social norms and values. Experiments on eight multilingual LLMs show that state-of-the-art models exhibit consistent weaknesses in identifying and adapting specific categories of CSIs, revealing a gap between surface-level recall and explicit cultural reasoning. Performance declines significantly on culturally sensitive categories and deeper cultural levels (p&lt;0.005, 8/8 models), and adaptation quality varies systematically across target cultures and Bengali regional variants, indicating encoded regional and ethno-religious biases even within a single linguistic setting. We publicly release the corpus and code to support future research on cross-cultural NLP.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.14063v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Mohsinul Kabir, Tasnim Ahmed, Md Mezbaur Rahman, Shaoxiong Ji, Hassan Alhuzali, Yuechen Jiang, Jimin Huang, Sophia Ananiadou</dc:creator>
    </item>
    <item>
      <title>A Unified Framework for Scalable and Robust Paper Assignment</title>
      <link>https://arxiv.org/abs/2601.14402</link>
      <description>arXiv:2601.14402v2 Announce Type: replace 
Abstract: Assigning papers to reviewers is a central challenge in the peer-review process of large academic conferences. Program chairs must balance competing objectives, including maximizing reviewer expertise, promoting diversity, and enhancing robustness to strategic manipulation, but it is challenging to do so at the modern conference scale.
  Existing algorithmic paper assignment approaches either fail to address all of these goals simultaneously or suffer from poor scalability. To address the limitation, we propose Robust Assignment via Marginal Perturbation (RAMP), a unified framework for large-scale peer review. Our approach formulates a linearized perturbed-maximization objective with soft constraints that flexibly balance assignment quality, diversity, and robustness while maintaining runtime efficiency. We further introduce an attribute-aware sampling procedure that converts fractional solutions into integral assignments and improves the diversity and robustness of the final assignment. On datasets with over 20,000 papers and 20,000 reviewers, RAMP runs in under 20 minutes, demonstrating its suitability for real-world deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.14402v2</guid>
      <category>cs.SI</category>
      <category>cs.GT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Michael Cui, Chenxin Dai, Yixuan Even Xu, Fei Fang</dc:creator>
    </item>
    <item>
      <title>On-the-fly hand-eye calibration for the da Vinci surgical robot</title>
      <link>https://arxiv.org/abs/2601.14871</link>
      <description>arXiv:2601.14871v2 Announce Type: replace 
Abstract: In Robot-Assisted Minimally Invasive Surgery (RMIS), accurate tool localization is crucial to ensure patient safety and successful task execution. However, this remains challenging for cable-driven robots, such as the da Vinci robot, because erroneous encoder readings lead to pose estimation errors. In this study, we propose a calibration framework to produce accurate tool localization results through computing the hand-eye transformation matrix on-the-fly. The framework consists of two interrelated algorithms: the feature association block and the hand-eye calibration block, which provide robust correspondences for key points detected on monocular images without pre-training, and offer the versatility to accommodate various surgical scenarios by adopting an array of filter approaches, respectively. To validate its efficacy, we test the framework extensively on publicly available video datasets that feature multiple surgical instruments conducting tasks in both in vitro and ex vivo scenarios, under varying illumination conditions and with different levels of key point measurement accuracy. The results show a significant reduction in tool localization errors under the proposed calibration framework, with accuracies comparable to other state-of-the-art methods while being more time-efficient.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.14871v2</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Zejian Cui, Ferdinando Rodriguez y Baena</dc:creator>
    </item>
    <item>
      <title>Parameter-Efficient Multi-Task Fine-Tuning in Code-Related Tasks</title>
      <link>https://arxiv.org/abs/2601.15094</link>
      <description>arXiv:2601.15094v2 Announce Type: replace 
Abstract: Large Language Models (LLMs) have proven highly effective in automating software engineering tasks, bridging natural language and code semantics to achieve notable results in code generation and summarization. However, their scale incurs substantial computational costs, making full fine-tuning impractical. Parameter-Efficient Fine-Tuning (PEFT) methods like QLoRA enable efficient specialization with lower resource demands. Recent studies show QLoRA-optimized Large Code Models (LCMs) perform strongly across diverse tasks, yet it remains unclear whether this effectiveness persists when a single model is QLoRA fine-tuned for multiple code-related tasks. The interaction between Multi-task fine-tuning and QLoRA optimization, and how transfer learning affects correctness and quality of generated artifacts, remains largely unexplored. We investigate Multi-task QLoRA fine-tuning across three representative tasks: code generation, translation, and summarization. We evaluate functional correctness through execution-based and similarity-based metrics, complemented by comprehensive code quality analysis--an aspect largely overlooked in prior work. Our findings show that Multi-task QLoRA effectively leverages transfer learning, achieving competitive or superior performance at the 1.5B, 3B, and 7B configurations relative to both Single-task QLoRA and Multi-task full fine-tuning. Larger models demonstrate more consistent balance between correctness and quality, whereas smaller models preserve functionality but exhibit a higher incidence of quality-related issues.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.15094v2</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Md Zahidul Haque, Saima Afrin, Antonio Mastropaolo</dc:creator>
    </item>
    <item>
      <title>The Flexibility Trap: Rethinking the Value of Arbitrary Order in Diffusion Language Models</title>
      <link>https://arxiv.org/abs/2601.15165</link>
      <description>arXiv:2601.15165v4 Announce Type: replace 
Abstract: Diffusion Large Language Models (dLLMs) break the rigid left-to-right constraint of traditional LLMs, enabling token generation in arbitrary orders. Intuitively, this flexibility implies a solution space that strictly supersets the fixed autoregressive trajectory, theoretically unlocking superior reasoning potential. However, in this paper, we find that for general reasoning tasks (e.g., mathematics and coding), arbitrary order generation may in fact limit the reasoning potential of dLLMs. We observe that dLLMs tend to exploit this order flexibility to bypass high-uncertainty tokens that are crucial for exploration, which can lead to a premature collapse of solution coverage. This observation motivates a rethink of RL approaches for dLLMs, where considerable complexities, such as handling combinatorial trajectories and intractable likelihoods, are often devoted to preserving this flexibility. We show that effective reasoning can be elicited by simply forgoing arbitrary order and applying standard Group Relative Policy Optimization (GRPO) instead. Our approach, JustGRPO, is minimalist yet surprisingly effective (e.g., 89.1% accuracy on GSM8K) while fully retaining the parallel decoding ability of dLLMs. Project page: https://nzl-thu.github.io/the-flexibility-trap</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.15165v4</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zanlin Ni, Shenzhi Wang, Yang Yue, Tianyu Yu, Weilin Zhao, Yeguo Hua, Tianyi Chen, Jun Song, Cheng Yu, Bo Zheng, Gao Huang</dc:creator>
    </item>
    <item>
      <title>CURE: Curriculum-guided Multi-task Training for Reliable Anatomy Grounded Report Generation</title>
      <link>https://arxiv.org/abs/2601.15408</link>
      <description>arXiv:2601.15408v2 Announce Type: replace 
Abstract: Medical vision-language models can automate the generation of radiology reports but struggle with accurate visual grounding and factual consistency. Existing models often misalign textual findings with visual evidence, leading to unreliable or weakly grounded predictions. We present CURE, an error-aware curriculum learning framework that improves grounding and report quality without any additional data. CURE fine-tunes a multimodal instructional model on phrase grounding, grounded report generation, and anatomy-grounded report generation using public datasets. The method dynamically adjusts sampling based on model performance, emphasizing harder samples to improve spatial and textual alignment. CURE improves grounding accuracy by +0.35 IoU, boosts report quality by +0.192 CXRFEScore, and reduces hallucinations by 18.6%. CURE is a data-efficient framework that enhances both grounding accuracy and report reliability. Code is available at https://github.com/PabloMessina/CURE and model weights at https://huggingface.co/pamessina/medgemma-4b-it-cure</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.15408v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 36279-36289</arxiv:journal_reference>
      <dc:creator>Pablo Messina, Andrés Villa, Juan León Alcázar, Karen Sánchez, Carlos Hinojosa, Denis Parra, Álvaro Soto, Bernard Ghanem</dc:creator>
    </item>
    <item>
      <title>Lattice: A Confidence-Gated Hybrid System for Uncertainty-Aware Sequential Prediction with Behavioral Archetypes</title>
      <link>https://arxiv.org/abs/2601.15423</link>
      <description>arXiv:2601.15423v2 Announce Type: replace 
Abstract: We introduce Lattice, a hybrid sequential prediction system that conditionally activates learned behavioral structure using binary confidence gating. The system summarizes behavior windows as behavioral archetypes and activates archetype-based scoring only when an in-support confidence signal exceeds a validation-calibrated threshold, falling back to backbone predictions when uncertain. Our primary estimand is the controlled effect of adding Lattice to a fixed backbone on identical test rows. On MovieLens (30 paired seeds, full-catalog ranking), LSTM+Lattice improves HR@10 by +31.7% (gated) versus the LSTM backbone alone (p much less than 10^-20); ungated fusion reaches +58.7% on the same protocol. We do not claim gating maximizes pooled accuracy. With backbone-native archetypes (fit in each backbone's embedding space), gated lifts of +13.3% (transformer) and +17.0% (SASRec) hold under the same evaluation design. A prior approximately 0% transformer row in version 1 reflected an invalid cross-backbone transfer, not evidence that composition cannot help stronger encoders. Amazon Electronics provides supporting cross-domain evidence (+124.0% gated, 15 seeds, high variance). Controlled shift checks (appendix) illustrate gate refusal under distribution shift. Standalone SASRec and BERT4Rec scores are contextual references, not the target estimand. We report what composition achieves and when it activates; production calibration and implementation details remain proprietary pending patent prosecution.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.15423v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lorian Bannis</dc:creator>
    </item>
    <item>
      <title>Towards Automated Kernel Generation in the Era of LLMs</title>
      <link>https://arxiv.org/abs/2601.15727</link>
      <description>arXiv:2601.15727v3 Announce Type: replace 
Abstract: The performance of modern AI systems is fundamentally constrained by the quality of their underlying GPU kernels, which translate high-level algorithmic semantics into low-level hardware operations. Achieving near-optimal kernels requires expert-level understanding of hardware architectures and programming models, making kernel engineering a critical but notoriously time-consuming and non-scalable process. Recent advances in large language models and LLM-based agents have opened new possibilities for automating kernel generation and optimization. LLMs are well-suited to compress expert-level kernel knowledge that is difficult to formalize, while agentic systems further enable scalable optimization by casting kernel development as an iterative, feedback-driven loop. Rapid progress has been made in this area. However, the field remains fragmented and lacks a systematic perspective for LLM-driven kernel generation. This survey addresses this gap by providing a structured overview of existing approaches, spanning LLM-based approaches and agentic optimization workflows, and systematically organizing the datasets and benchmarks that underpin learning and evaluation in this domain. Moreover, key open challenges and future research directions are further outlined, aiming to establish a comprehensive reference for the next generation of automated kernel optimization. To keep track of this field, we maintain an open-source GitHub repository at https://github.com/flagos-ai/awesome-LLM-driven-kernel-generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.15727v3</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yang Yu, Peiyu Zang, Chi Hsu Tsai, Haiming Wu, Yixin Shen, Jialing Zhang, Haoyu Wang, Zhiyou Xiao, Jingze Shi, Yuyu Luo, Wentao Zhang, Chunlei Men, Guang Liu, Yonghua Lin</dc:creator>
    </item>
    <item>
      <title>Learning to Optimize by Differentiable Programming</title>
      <link>https://arxiv.org/abs/2601.16510</link>
      <description>arXiv:2601.16510v3 Announce Type: replace 
Abstract: Solving massive-scale optimization problems requires scalable first-order methods with low per-iteration cost. This tutorial highlights a shift in optimization: using differentiable programming not only to execute algorithms but to learn how to design them. Modern frameworks such as PyTorch, TensorFlow, and JAX enable this paradigm through efficient automatic differentiation. Embedding first-order methods within these systems allows end-to-end training that improves convergence and solution quality. Guided by Fenchel-Rockafellar duality, the tutorial demonstrates how duality-informed iterative schemes such as ADMM and PDHG can be learned and adapted. Case studies across LP, NNV, Sum-Rate maximization, OPF, and LRMP illustrate these gains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.16510v3</guid>
      <category>cs.MS</category>
      <category>cs.LG</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Liping Tao, Xindi Tong, Chee Wei Tan</dc:creator>
    </item>
    <item>
      <title>Moded Types for Grassroots Logic Programs, by AI, for AI (Full Version)</title>
      <link>https://arxiv.org/abs/2601.17957</link>
      <description>arXiv:2601.17957v4 Announce Type: replace 
Abstract: Grassroots Logic Programs (GLP) is a concurrent logic programming language in which logic variables are partitioned into paired readers and writers. An assignment is produced at most once via a writer and consumed at most once via its paired reader, and may contain additional readers and/or writers. This enables the concise expression of rich multidirectional communication modalities.
  "Logic Programs as Types for Logic Programs" (LICS'91) defined types as regular sets of paths over the Herbrand atom semantics of a logic program. Here, we develop a moded-atom semantics that extends the standard Herbrand atom semantics in two ways: (a) each atom subterm carries a mode, recording whether it is consumed from or produced to the environment; and (b) partial computations, including those that deadlock, fail, or never terminate, also contribute moded atoms to the semantics. We define types to be regular sets of moded paths over this semantics, give a syntactic definition of GLP well-typing, and prove that a well-typed program is sound: every output path in its well-typed moded-atom semantics conforms to its declared output type.
  A type checker for GLP was implemented by AI (Claude) in Dart, starting from the mathematical specification of Typed GLP (this paper), deriving from it an English+pseudocode spec (written by AI), and from the spec deriving Dart code (by AI). While GLP is naturally untyped, the motivation for typing it was for AI: tasking AI to program complex communication modalities and hoping for the best turned out to be a tenuous strategy. The discipline we developed with Typed GLP is for the human designer and AI to jointly develop formal GLP type definitions and declarations, together with informal intent of the declared procedures, and only then let AI write the GLP code.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.17957v4</guid>
      <category>cs.PL</category>
      <category>cs.DC</category>
      <category>cs.FL</category>
      <category>cs.LO</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Ehud Shapiro</dc:creator>
    </item>
    <item>
      <title>Just-In-Time Reinforcement Learning: Continual Learning in LLM Agents Without Gradient Updates</title>
      <link>https://arxiv.org/abs/2601.18510</link>
      <description>arXiv:2601.18510v3 Announce Type: replace 
Abstract: While Large Language Model (LLM) agents excel at general tasks, they inherently struggle with continual adaptation because their weights are frozen after deployment. Conventional reinforcement learning (RL) offers a solution but incurs prohibitive computational costs and the risk of catastrophic forgetting. We introduce Just-In-Time Reinforcement Learning (JitRL), a training-free framework that enables test-time policy optimization without any gradient updates. JitRL maintains a dynamic, non-parametric memory of experiences and retrieves relevant trajectories to estimate action advantages on-the-fly. These estimates are then used to directly modulate the LLM's output logits. We theoretically prove that this additive update rule is the exact closed-form solution to the KL-constrained policy optimization objective. Extensive experiments on WebArena and Jericho demonstrate that JitRL establishes a new state-of-the-art among training-free methods. Crucially, JitRL outperforms computationally expensive fine-tuning methods (e.g., WebRL) while reducing monetary costs by over 30 times, offering a scalable path for continual learning agents. The code is available at https://github.com/liushiliushi/JitRL.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.18510v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yibo Li, Zijie Lin, Ailin Deng, Xuan Zhang, Yufei He, Shuo Ji, Tri Cao, Bryan Hooi</dc:creator>
    </item>
    <item>
      <title>GimmBO: Interactive Generative Image Model Merging via Bayesian Optimization</title>
      <link>https://arxiv.org/abs/2601.18585</link>
      <description>arXiv:2601.18585v2 Announce Type: replace 
Abstract: Fine-tuning-based adaptation is widely used to customize diffusion-based image generation, leading to large collections of community-created adapters that capture diverse subjects and styles. Adapters derived from the same base model can be merged with weights, enabling the synthesis of new visual results within a vast and continuous design space. To explore this space, current workflows rely on manual slider-based tuning, an approach that scales poorly and makes weight selection difficult, even when the candidate set is limited to 20-30 adapters. We propose GimmBO to support interactive exploration of adapter merging for image generation through Preferential Bayesian Optimization (PBO). Motivated by observations from real-world usage, including sparsity and constrained weight ranges, we introduce a two-stage BO backend that improves sampling efficiency and convergence in high-dimensional spaces. We evaluate our approach with simulated users and a user study, demonstrating improved convergence, high success rates, and consistent gains over BO and line-search baselines, and further show the flexibility of the framework through several extensions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.18585v2</guid>
      <category>cs.CV</category>
      <category>cs.GR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Chenxi Liu, Selena Ling, Alec Jacobson</dc:creator>
    </item>
    <item>
      <title>Bellman Residual Minimization for Control: Geometry, Stationarity, and Convergence</title>
      <link>https://arxiv.org/abs/2601.18840</link>
      <description>arXiv:2601.18840v4 Announce Type: replace 
Abstract: Markov decision problems are most commonly solved via dynamic programming. Another approach is Bellman residual minimization, which directly minimizes the squared Bellman residual objective function. However, compared to dynamic programming, this approach has received relatively less attention, mainly because it is often less efficient in practice and can be more difficult to extend to model-free settings such as reinforcement learning. Nonetheless, Bellman residual minimization has several advantages that make it worth investigating, such as more stable convergence with function approximation for value functions. While Bellman residual methods for policy evaluation have been widely studied, methods for policy optimization (control tasks) have been scarcely explored. In this paper, we establish foundational results for the control Bellman residual minimization for policy optimization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.18840v4</guid>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Donghwan Lee, Hyukjun Yang</dc:creator>
    </item>
    <item>
      <title>Payoff scaling shapes cooperation in LLM agents across languages</title>
      <link>https://arxiv.org/abs/2601.19082</link>
      <description>arXiv:2601.19082v2 Announce Type: replace 
Abstract: Large language models (LLMs) are increasingly deployed as autonomous agents that negotiate, coordinate, and act on behalf of users. Whether they cooperate in such settings is no longer just an academic question, but a central issue for AI governance. We approach it from a strategic-behaviour angle, asking how two everyday levers - the size of what is at stake, and the language in which the interaction is described - shape the strategies LLMs adopt in a repeated Prisoner's Dilemma. Rather than reading cooperation off raw action counts, we train supervised classifiers to recognise the canonical strategies of repeated games (always cooperate, always defect, Tit-for-Tat, Win-Stay-Lose-Shift) and use them as a lens onto LLM behaviour. To know what the strategy distribution should look like under the same payoffs, we derive an evolutionary game theory (EGT) baseline and compare it with the LLM data. The two outcomes disagree in a revealing way: as stakes grow, evolutionary theory predicts that defection should take over the population, yet LLMs move in the opposite direction, becoming more cooperative - a signature, we argue, of alignment training and the human-like reasoning patterns LLMs inherit from their training data. We further show that this picture is not particular to frontier-scale, proprietary models: it also occurs with three open-weight smaller LLMs. Overall, our analysis highlights that payoff design and linguistic framing are powerful but under-explored levers for steering LLM behaviour, with direct implications for evaluating, aligning, and governing multi-agent AI systems deployed in high-stakes, multilingual environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.19082v2</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.GT</category>
      <category>cs.LG</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Trung-Kiet Huynh, Dao-Sy Duy-Minh, Thanh-Bang Cao, Phong-Hao Le, Hong-Dan Nguyen, Phu-Quy Nguyen-Lam, Minh-Luan Nguyen-Vo, Hong-Phat Pham, Phu-Hoa Pham, Thien-Kim Than, Chi-Nguyen Tran, Huy Tran, Gia-Thoai Tran-Le, Alessio Buscemi, Le Hong Trang, The Anh Han</dc:creator>
    </item>
    <item>
      <title>Meeting SLOs, Slashing Hours: Automated Enterprise LLM Optimization with OptiKIT</title>
      <link>https://arxiv.org/abs/2601.20408</link>
      <description>arXiv:2601.20408v2 Announce Type: replace 
Abstract: Enterprise LLM deployment faces a critical scalability challenge: organizations must optimize models systematically to scale AI initiatives within constrained compute budgets, yet the specialized expertise required for manual optimization remains a niche and scarce skillset. This challenge is particularly evident in managing GPU utilization across heterogeneous infrastructure while enabling teams with diverse workloads and limited LLM optimization experience to deploy models efficiently. We present OPTIKIT, a distributed LLM optimization framework that democratizes model compression and tuning by automating complex optimization workflows for non-expert teams. OPTIKIT provides dynamic resource allocation, staged pipeline execution with automatic cleanup, and seamless enterprise integration. In production, it delivers more than 2x GPU throughput improvement while empowering application teams to achieve consistent performance improvements without deep LLM optimization expertise. We share both the platform design and key engineering insights into resource management, pipeline orchestration, and integration patterns that enable large-scale, production-grade democratization of model optimization. Finally, we open-source the system to enable external contributions and broader reproducibility.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.20408v2</guid>
      <category>cs.DC</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Nicholas Santavas, Kareem Eissa, Patrycja Cieplicka, Piotr Florek, Matteo Nulli, Stefan Vasilev, Seyyed Hadi Hashemi, Antonios Gasteratos, Shahram Khadivi</dc:creator>
    </item>
    <item>
      <title>Comparative evaluation of training strategies using partially labelled datasets for segmentation of white matter hyperintensities and stroke lesions in FLAIR MRI</title>
      <link>https://arxiv.org/abs/2601.20503</link>
      <description>arXiv:2601.20503v2 Announce Type: replace 
Abstract: White matter hyperintensities (WMH) and ischaemic stroke lesions (ISL) are key imaging biomarkers of cerebral small vessel disease (SVD) detectable on magnetic resonance imaging (MRI). The development of robust deep learning models to automatically segment and differentiate these pathologies remains challenging. Specifically, WMH and ISL frequently co-occur within the same subject and present as visually confounding hyperintensities on fluid-attenuated inversion recovery (FLAIR) sequences, complicating their accurate delineation. To address the scarcity of fully annotated cohorts, we systematically evaluated six accessible strategies for training a joint WMH and ISL segmentation model using partially labelled data. We aggregated privately held and publicly available datasets to curate a large-scale cohort of 2,052 MRI volumes, of which 1,341 and 1,152 volumes contained ground truth annotations for WMH and ISL, respectively. Our analysis indicates that multiple strategies effectively leverage partially labelled data to enhance overall model performance, with pseudolabelling emerging as the most effective approach. This model exhibited a consistent WMH segmentation policy and successfully detected the majority of FLAIR-positive ISL. These findings demonstrate the viability of using partially labelled data to develop reliable automated segmentation tools, which can support ongoing SVD monitoring and high-throughput biomarker extraction for large-scale clinical research.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.20503v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jesse Phitidis, Alison Q. Smithard, William N. Whiteley, Joanna M. Wardlaw, Miguel O. Bernabeu, Maria Valdés Hernández</dc:creator>
    </item>
    <item>
      <title>Shortest LCD embeddings of binary, ternary and quaternary linear codes</title>
      <link>https://arxiv.org/abs/2601.20600</link>
      <description>arXiv:2601.20600v2 Announce Type: replace 
Abstract: In recent years, there has been active research on self-orthogonal embeddings of linear codes, since they have yielded some optimal self-orthogonal codes. LCD codes have a trivial hull, so they are counterparts of self-orthogonal codes. It is therefore a natural question whether one can embed linear codes into optimal LCD codes. To answer it, we first determine the number of columns to be added to a generator matrix of a linear code in order to embed the given code into an LCD code. Then we characterize all possible forms of shortest LCD embeddings of a linear code. As examples, we start from binary and ternary Hamming codes of small lengths and obtain optimal LCD codes with minimum distance 4. Furthermore, we find new ternary LCD codes with parameters including $[23, 4, 14]$, $[23, 5, 12]$, $[24, 6, 12]$, and $[25, 5, 14]$ and a new quaternary LCD $[21, 10, 8]$ code, each of which has minimum distance one greater than those of known codes. This shows that our shortest LCD embedding method is useful in finding optimal LCD codes over various fields.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.20600v2</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Junmin An, Ji-Hoon Hong, Jon-Lark Kim, Haeun Lim</dc:creator>
    </item>
    <item>
      <title>Mobility-Embedded POIs: Learning What A Place Is and How It Is Used from Human Movement</title>
      <link>https://arxiv.org/abs/2601.21149</link>
      <description>arXiv:2601.21149v3 Announce Type: replace 
Abstract: Recent progress in geospatial foundation models highlights the importance of learning general-purpose representations for real-world locations, particularly points-of-interest (POIs) where human activity concentrates. Existing approaches, however, focus primarily on place identity derived from static textual metadata, or learn representations tied to trajectory context, which capture movement regularities rather than how places are actually used (i.e., a POI's function). We argue that POI function is a missing but essential signal for general POI representations. We introduce Mobility-Embedded POIs (ME-POIs), a framework that augments POI embeddings derived from language models with large-scale human mobility data to learn POI-centric, context-independent representations grounded in real-world usage. ME-POIs encodes individual visits as temporally contextualized embeddings and aligns them with learnable POI representations via contrastive learning to capture usage patterns across users and time. To address long-tail sparsity, we propose a novel mechanism that propagates temporal visit patterns from nearby, frequently visited POIs across multiple spatial scales. We evaluate ME-POIs on five newly proposed map enrichment tasks, testing its ability to capture both the identity and function of POIs. Across all tasks, augmenting text-based embeddings with ME-POIs consistently outperforms both text-only and mobility-only baselines. Notably, ME-POIs trained on mobility data alone can surpass text-only models on certain tasks, highlighting that POI function is a critical component of accurate and generalizable POI representations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.21149v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Maria Despoina Siampou, Shushman Choudhury, Shang-Ling Hsu, Neha Arora, Cyrus Shahabi</dc:creator>
    </item>
    <item>
      <title>More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD)</title>
      <link>https://arxiv.org/abs/2601.21522</link>
      <description>arXiv:2601.21522v2 Announce Type: replace 
Abstract: The performance of large language models (LLMs) on verifiable tasks is usually measured by pass@k, the probability of answering a question correctly at least once in k trials. At a fixed budget, a more suitable metric is coverage@cost, the average number of unique questions answered as a function of the total number of attempts. We connect the two metrics and show that the empirically observed power-law behavior in pass@k leads to sublinear growth of coverage@cost (diminishing returns). To solve this problem, we propose Reset-and-Discard (ReD), a method for querying LLMs that increases coverage@cost for a given budget, regardless of the pass@k form. Moreover, given a pass@k, we can quantitatively predict the savings in the total number of attempts using ReD. If pass@k is not available for the model, ReD can infer its power-law exponent. Experiments on three LLMs across coding (HumanEval), math (GSM8K), and reasoning (MMLU-Pro) benchmarks demonstrate that ReD substantially reduces the required attempts, tokens, and USD cost to reach a desired coverage, while also offering an efficient way to measure inference power-laws. ReD's advantage is maintained for imperfect verifiers and outperforms the tested allocation baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.21522v2</guid>
      <category>cs.LG</category>
      <category>cond-mat.dis-nn</category>
      <category>cs.AI</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sagi Meir, Tommer D. Keidar, Noam Levi, Shlomi Reuveni, Barak Hirshberg</dc:creator>
    </item>
    <item>
      <title>Language-based Trial and Error Falls Behind in the Era of Experience</title>
      <link>https://arxiv.org/abs/2601.21754</link>
      <description>arXiv:2601.21754v3 Announce Type: replace 
Abstract: While Large Language Models (LLMs) excel in language-based agentic tasks, their applicability to unseen, nonlinguistic environments (e.g., symbolic or spatial tasks) remains limited. Previous work attributes this performance gap to the mismatch between the pretraining distribution and the testing distribution. In this work, we demonstrate that the primary bottleneck is the prohibitive cost of exploration: mastering these tasks requires extensive trial-and-error, which is computationally unsustainable for parameter-heavy LLMs operating in a high-dimensional semantic space. To address this, we propose SCOUT (Sub-Scale Collaboration On Unseen Tasks), a novel framework that decouples exploration from exploitation. We employ lightweight "scouts" (e.g., small MLPs) to probe environmental dynamics at a speed and scale far exceeding LLMs. The collected trajectories are utilized to bootstrap the LLM via Supervised Fine-Tuning (SFT), followed by multi-turn Reinforcement Learning (RL) to activate its latent world knowledge. Empirically, SCOUT enables a Qwen2.5-3B-Instruct model to achieve an average score of 0.86, significantly outperforming proprietary models, including Gemini-2.5-Pro (0.60), while reducing GPU-hour consumption by about 60%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.21754v3</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haoyu Wang, Guozheng Ma, Shugang Cui, Yilun Kong, Haotian Luo, Li Shen, Mengya Gao, Yichao Wu, Xiaogang Wang, Dacheng Tao</dc:creator>
    </item>
    <item>
      <title>Nonparametric LLM Evaluation from Preference Data</title>
      <link>https://arxiv.org/abs/2601.21816</link>
      <description>arXiv:2601.21816v2 Announce Type: replace 
Abstract: Evaluating the performance of large language models (LLMs) from human preference data is crucial for obtaining LLM leaderboards. However, many existing approaches either rely on restrictive parametric assumptions or lack valid uncertainty quantification when flexible machine learning methods are used. In this paper, we propose a nonparametric statistical framework, called DMLRank, for comparing and ranking LLMs from preference data using debiased machine learning (DML). For this, we introduce generalized average ranking scores (GARS), which generalize commonly used ranking models, including the Bradley-Terry model or PageRank/Rank centrality, with complex human responses such as ties. DMLRank comes with the following advantages: (i) It produces statistically efficient estimates of GARS ranking scores. (ii) It naturally allows the incorporation of black-box machine learning methods for estimation. (iii) It can be combined with pre-trained LLM evaluators (e.g., using LLM-as-a-judge). (iv) It suggests optimal policies for collecting preference data under budget constraints. We demonstrate these advantages both theoretically and empirically using both synthetic and real-world preference datasets. In summary, our framework provides practitioners with powerful, state-of-the-art methods for comparing or ranking LLMs for leaderboards.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.21816v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dennis Frauen, Athiya Deviyani, Mihaela van der Schaar, Stefan Feuerriegel</dc:creator>
    </item>
    <item>
      <title>Mechanistic Data Attribution: Tracing the Training Origins of Interpretable LLM Units</title>
      <link>https://arxiv.org/abs/2601.21996</link>
      <description>arXiv:2601.21996v2 Announce Type: replace 
Abstract: While Mechanistic Interpretability has identified interpretable circuits in LLMs, their causal origins in training data remain elusive. We introduce Mechanistic Data Attribution (MDA), a scalable framework that employs Influence Functions to trace interpretable units back to specific training samples. Through extensive experiments on the Pythia family, we causally validate that targeted intervention--removing or augmenting a small fraction of high-influence samples--significantly modulates the emergence of interpretable heads, whereas random interventions show no effect. Our analysis reveals that repetitive structural data (e.g., LaTeX, XML) acts as a mechanistic catalyst. Furthermore, we observe that interventions targeting induction head formation induce a concurrent change in the model's in-context learning (ICL) capability. This provides direct causal evidence for the long-standing hypothesis regarding the functional link between induction heads and ICL. Finally, we propose a mechanistic data augmentation pipeline that consistently accelerates circuit convergence across model scales, providing a principled methodology for steering the developmental trajectories of LLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.21996v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jianhui Chen, Yuzhang Luo, Liangming Pan</dc:creator>
    </item>
    <item>
      <title>Latent Spherical Flow Policy for Reinforcement Learning with Combinatorial Actions</title>
      <link>https://arxiv.org/abs/2601.22211</link>
      <description>arXiv:2601.22211v2 Announce Type: replace 
Abstract: Reinforcement learning (RL) with combinatorial action spaces remains challenging because feasible action sets are exponentially large and governed by complex feasibility constraints, making direct policy parameterization impractical. Existing approaches embed task-specific value functions into constrained optimization programs or learn deterministic structured policies, sacrificing generality and policy expressiveness. We propose a solver-induced latent spherical flow policy that brings the expressiveness of modern generative policies to combinatorial RL while guaranteeing feasibility by design. Our method, LSFlow, learns a stochastic policy in a compact continuous latent space via spherical flow matching, and delegates feasibility to a combinatorial optimization solver that maps each latent sample to a valid structured action. To improve efficiency, we train the value network directly in the latent space, avoiding repeated solver calls during policy optimization. To address the piecewise-constant and discontinuous value landscape induced by solver-based action selection, we introduce a smoothed Bellman operator that yields stable, well-defined learning targets. Empirically, our approach outperforms state-of-the-art baselines by an average of 20.6% across a range of challenging combinatorial RL tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.22211v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Lingkai Kong, Anagha Satish, Hezi Jiang, Akseli Kangaslahti, Andrew Ma, Wenbo Chen, Mingxiao Song, Lily Xu, Milind Tambe</dc:creator>
    </item>
    <item>
      <title>Flexible FTN-Aided OTFS Modulation for High-Mobility LEO Satellite-to-Ground Communications</title>
      <link>https://arxiv.org/abs/2601.22526</link>
      <description>arXiv:2601.22526v2 Announce Type: replace 
Abstract: In low Earth orbit (LEO) satellite communications, the link quality fluctuates drastically during a satellite pass, exhibiting a wide dynamic range from the horizon to the zenith. Moreover, the high relative velocity induces severe Doppler shifts. While orthogonal time frequency space (OTFS) modulation effectively resolves the doubly-selective fading, its spectral efficiency is fundamentally bounded by the Nyquist limit. To break this bottleneck while adapting to dynamic channel variations, this paper proposes a LEO satellite-assisted flexible faster-than-Nyquist (FFTN)-OTFS (LEO-FFTN-OTFS) scheme. Conventional fixed-parameter FTN signaling suffers from severe inter-symbol interference at low elevation angles or spectral inefficiency at the zenith. To overcome this, a low-complexity Look-Up Table (LUT) mechanism is designed to adaptively optimize the time-domain compression factor based on the instantaneous signal-to-noise ratio. At the receiver, a linear minimum mean-square error (LMMSE) detector is formulated to suppress the colored noise and structured interference with minimal computational overhead. Besides, a rigorous theoretical framework is established incorporating 3GPP Tapped Delay Line (TDL) channel models to derive analytical expressions for effective throughput, energy efficiency, and bit error rate (BER) bounds. Simulation results demonstrate that the proposed adaptive scheme eliminates the irreducible error floor inherent in aggressive static FTN configurations at low SNRs, and maximizes the effective throughput across the entire elevation trajectory, achieving a superior trade-off between spectral efficiency and transmission reliability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.22526v2</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Chaorong Zhang, Benjamin K. Ng, Hui Xu, Yue Liu, Chan-Tong Lam, Halim Yanikomeroglu</dc:creator>
    </item>
    <item>
      <title>Beyond Fixed Rounds: Data-Free Early Stopping for Practical Federated Learning</title>
      <link>https://arxiv.org/abs/2601.22669</link>
      <description>arXiv:2601.22669v3 Announce Type: replace 
Abstract: Federated Learning (FL) facilitates decentralized collaborative learning without transmitting raw data. However, reliance on fixed global rounds or validation data for hyperparameter tuning hinders practical deployment by incurring high computational costs and privacy risks. To address this, we propose a data-free early stopping framework that determines the optimal stopping point by monitoring the task vector's growth rate using only server-side parameters. The numerical results on skin lesion/blood cell/colon pathology classification demonstrate that our approach is comparable to the validation-based early stopping across various state-of-the-art FL methods. In particular, the proposed framework requires an average of 45/12/31 (skin lesion/blood cell/colon pathology) additional rounds to achieve over 12.3%/8.9%/3.9% higher performance than early stopping based on validation data. Moreover, the proposed framework requires only 9/8/14 additional rounds to screen bad configurations, which is less than 3% of the fixed-round budget. To the best of our knowledge, this is the first work to propose a data-free early stopping framework for FL methods. Our code is available at this open repository.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.22669v3</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Youngjoon Lee, Hyukjoon Lee, Seungrok Jung, Andy Luo, Jinu Gong, Yang Cao, Joonhyuk Kang</dc:creator>
    </item>
    <item>
      <title>UA-DCM: Uncertainty-aware Causal Decision Making via Effect Bound Decomposition</title>
      <link>https://arxiv.org/abs/2601.22736</link>
      <description>arXiv:2601.22736v2 Announce Type: replace 
Abstract: Causal inference from observational data can provide strong evidence for finding the best action in a decision-making scenario without having to perform expensive randomized trials. The causal effect of an action is often not pointwise identifiable even with infinite data due to unobserved confounding factors. Furthermore, having only finitely many samples adds another layer of uncertainty to causal effect estimation. Several existing methods can be used to obtain upper and lower bounds to the causal effect, ranging from symbolic methods to the more recent neural network-based approaches, which implicitly incorporate both sources of uncertainty. However, these methods do not inform whether collecting more samples may or may not help identify the best action from observational data, leaving experts in the dark about their data collection strategies. We address this problem with a novel framework that can distinguish the range of causal effect values that might be eliminated by collecting more samples from the range of values that, with high probability, cannot be eliminated with more observational samples. We show that this partitioning can be obtained by solving max-min and min-max optimization problems. We leverage neural causal models to approximately recover this decomposition in practice. We demonstrate via experiments on synthetic and real-world datasets that our algorithm can determine when collecting more samples will not help determine the best action. Our framework can help practitioners decide when to resort to non-observational studies or seek to measure some of the unmeasured confounders for optimal decision-making.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.22736v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Md Musfiqur Rahman, Ziwei Jiang, Hilaf Hasson, Murat Kocaoglu</dc:creator>
    </item>
    <item>
      <title>MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering</title>
      <link>https://arxiv.org/abs/2601.22859</link>
      <description>arXiv:2601.22859v3 Announce Type: replace 
Abstract: The evolution of Large Language Model (LLM) agents for software engineering (SWE) is constrained by the scarcity of verifiable datasets, a bottleneck stemming from the complexity of constructing executable environments across diverse languages. To address this, we introduce MEnvAgent, a Multi-language framework for automated Environment construction that facilitates scalable generation of verifiable task instances. MEnvAgent employs a multi-agent Planning-Execution-Verification architecture to autonomously resolve construction failures and integrates a novel Environment Reuse Mechanism that reduces computational overhead by incrementally patching historical environments. Evaluations on MEnvBench, a new benchmark comprising 1,000 tasks across 10 languages, demonstrate that MEnvAgent outperforms baselines, improving Fail-to-Pass (F2P) rates by 8.6% while reducing time costs by 43%. Additionally, we demonstrate the utility of MEnvAgent by constructing MEnvData-SWE, the largest open-source polyglot dataset of realistic verifiable Docker environments to date, alongside solution trajectories that enable consistent performance gains on SWE tasks across a wide range of models. Our code, benchmark, and dataset are available at https://github.com/ernie-research/MEnvAgent.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.22859v3</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chuanzhe Guo, Jingjing Wu, Sijun He, Yang Chen, Zhaoqi Kuang, Shilong Fan, Bingjin Chen, Siqi Bao, Jing Liu, Hua Wu, Qingfu Zhu, Wanxiang Che, Haifeng Wang</dc:creator>
    </item>
    <item>
      <title>Optimal Fair Aggregation of Crowdsourced Noisy Labels using Demographic Parity Constraints</title>
      <link>https://arxiv.org/abs/2601.23221</link>
      <description>arXiv:2601.23221v2 Announce Type: replace 
Abstract: As acquiring reliable ground-truth labels is usually costly, or infeasible, crowdsourcing and aggregation of noisy human annotations is the typical resort. Aggregating subjective labels, though, may amplify individual biases, particularly regarding sensitive features, raising fairness concerns. Nonetheless, fairness in crowdsourced aggregation remains largely unexplored, with no existing convergence guarantees and only limited post-processing approaches for enforcing $\varepsilon$-fairness under demographic parity. We address this gap by analyzing the fairness of crowdsourced aggregation methods within the $\varepsilon$-fairness framework, for Majority Vote and Optimal Bayesian aggregation. In the small-crowd regime, we derive an upper bound on the fairness gap of Majority Vote in terms of the fairness gaps of the individual annotators. We further show that the fairness gap of the aggregated consensus converges exponentially fast to that of the ground-truth under interpretable conditions. Since ground-truth itself may still be unfair, we generalize a state-of-the-art multiclass fairness post-processing algorithm from the continuous to the discrete setting, which enforces strict demographic parity constraints on any aggregation rule. Experiments on synthetic and real datasets demonstrate the effectiveness of our approach and corroborate the theoretical insights.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.23221v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Gabriel Singer, Samuel Gruffaz, Olivier Vo Van, Nicolas Vayatis, Argyris Kalogeratos</dc:creator>
    </item>
    <item>
      <title>VideoGPA: Distilling Geometry Priors for 3D-Consistent Video Generation</title>
      <link>https://arxiv.org/abs/2601.23286</link>
      <description>arXiv:2601.23286v4 Announce Type: replace 
Abstract: While recent video diffusion models (VDMs) produce visually impressive results, they fundamentally struggle to maintain 3D structural consistency, often resulting in object deformation or spatial drift. We hypothesize that these failures arise because standard denoising objectives lack explicit incentives for geometric coherence. To address this, we introduce VideoGPA (Video Geometric Preference Alignment), a data-efficient self-supervised framework that leverages a geometry foundation model to automatically derive dense preference signals that guide VDMs via Direct Preference Optimization (DPO). This approach effectively steers the generative distribution toward inherent 3D consistency without requiring human annotations. VideoGPA significantly enhances temporal stability, geometric plausibility, and motion coherence using minimal preference pairs, consistently outperforming state-of-the-art baselines in extensive experiments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.23286v4</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hongyang Du, Junjie Ye, Xiaoyan Cong, Runhao Li, Jingcheng Ni, Aman Agarwal, Zeqi Zhou, Zekun Li, Randall Balestriero, Yue Wang</dc:creator>
    </item>
    <item>
      <title>How Hyper-Datafication Impacts the Sustainability Costs in Frontier AI</title>
      <link>https://arxiv.org/abs/2602.00056</link>
      <description>arXiv:2602.00056v4 Announce Type: replace 
Abstract: Large-scale data has fuelled the success of frontier artificial intelligence (AI) models over the past decade. This expansion has relied on sustained efforts by large technology corporations to aggregate and curate internet-scale datasets. In this work, we examine the environmental, social, and economic costs of large-scale data in AI through a sustainability lens. We argue that the field is shifting from building models from data to actively creating data for building models. We characterise this transition as hyper-datafication, which marks a critical juncture for the future of frontier AI and its societal impacts. To quantify and contextualise data-related costs, we analyse approximately 550,000 datasets from the Hugging Face Hub, focusing on dataset growth, storage-related energy consumption and carbon footprint, and societal representation using language data. We complement this analysis with qualitative responses from data workers in Kenya to examine the labour involved, including direct employment by big tech corporations and exposure to graphic content. We further draw on external data sources to substantiate our findings by illustrating the global disparity in data centre infrastructure. Our analyses reveal that hyper-datafication drives substantial and growing environmental costs while systematically redistributing labour risks and representational harms toward the Global South. Thus, we propose Data PROOFS recommendations spanning provenance, resource awareness, ownership, openness, frugality, and standards to mitigate these costs. Our work aims to make visible the often-overlooked costs of data that underpin frontier AI and to stimulate broader debate within the research community and beyond.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.00056v4</guid>
      <category>cs.CY</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3805689.3812393</arxiv:DOI>
      <dc:creator>Sophia N. Wilson, Sebastian Mair, Mophat Okinyi, Erik B. Dam, Janin Koch, Raghavendra Selvan</dc:creator>
    </item>
    <item>
      <title>DIVERGE: Diversity-Enhanced RAG for Open-Ended Information Seeking</title>
      <link>https://arxiv.org/abs/2602.00238</link>
      <description>arXiv:2602.00238v2 Announce Type: replace 
Abstract: Existing retrieval-augmented generation (RAG) systems often assume that each query has a single correct answer. This assumption overlooks open-ended information-seeking scenarios where multiple plausible answers are valuable, and where diversity is important for creativity, fairness, and inclusive access to information. We show that standard RAG systems fail to fully use diverse retrieved contexts: simply increasing retrieval diversity does not necessarily lead to diverse generations. To address this limitation, we propose Diverge, a plug-and-play agentic RAG framework that improves the diversity-quality trade-off through iterative, reflection-guided exploration of diverse viewpoints and diversity-aware retrieval support. We further introduce evaluation metrics for characterizing the diversity-quality trade-off in open-ended question answering. Experiments across multiple real-world datasets and backbone LLMs show that Diverge achieves the best trade-off among competitive baselines, increasing diversity by $\sim2\times$ without noticeable quality degradation. These results reveal a systematic limitation of current RAGs and show the value of explicit diversity modeling.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.00238v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tianyi Hu, Niket Tandon, Akhil Arora</dc:creator>
    </item>
    <item>
      <title>Your Self-Play Algorithm is Secretly an Adversarial Imitator: Understanding LLM Self-Play through the Lens of Imitation Learning</title>
      <link>https://arxiv.org/abs/2602.01357</link>
      <description>arXiv:2602.01357v2 Announce Type: replace 
Abstract: Self-play post-training methods have emerged as an effective approach for finetuning large language models, turning a weak language model into a strong one without preference data. However, the theoretical foundations of self-play finetuning remain underexplored. In this work, we tackle this by connecting self-play finetuning with adversarial imitation learning, formulating the finetuning procedure as a min-max game between the model and a regularized implicit reward player parameterized by the model itself. This perspective unifies self-play imitation and general preference alignment within a common framework. Under this formulation, we present a game-theoretic analysis showing that self-play finetuning will converge to its equilibrium. Guided by this theoretical formulation, we propose a new self-play imitation finetuning algorithm based on the $\chi^2$-divergence variational objective with bounded rewards and improved stability. Experiments on various language model finetuning tasks demonstrate consistent improvements over existing self-play methods and validate our theoretical insights.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.01357v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shangzhe Li, Xuchao Zhang, Chetan Bansal, Weitong Zhang</dc:creator>
    </item>
    <item>
      <title>Reward Shaping for (Inference-Time) Alignment: A Stackelberg Game Perspective</title>
      <link>https://arxiv.org/abs/2602.02572</link>
      <description>arXiv:2602.02572v2 Announce Type: replace 
Abstract: Existing alignment methods directly use the reward model learned from user preference data to optimize an LLM policy, subject to KL regularization with respect to the base policy. This practice is suboptimal for maximizing the user's utility because the KL regularization may cause the LLM to inherit biases in the base policy that conflict with user preferences. While amplifying rewards for preferred outputs can mitigate this bias, it also increases the risk of reward hacking. This tradeoff motivates the problem of optimally designing reward models under KL regularization. We formalize this reward model optimization problem as a Stackelberg game, and show that a simple reward shaping scheme can effectively approximate the optimal reward model. We empirically evaluate our method in inference-time alignment settings and demonstrate that it integrates seamlessly into existing alignment methods with minimal overhead. Our method consistently improves average reward and achieves win-tie rates exceeding 66% against all baselines, averaged across evaluation settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.02572v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haichuan Wang, Tao Lin, Lingkai Kong, Ce Li, Hezi Jiang, Milind Tambe</dc:creator>
    </item>
    <item>
      <title>Composition for Pufferfish Privacy</title>
      <link>https://arxiv.org/abs/2602.02718</link>
      <description>arXiv:2602.02718v2 Announce Type: replace 
Abstract: When creating public data products out of confidential datasets, inferential/posterior-based privacy definitions, such as Pufferfish, provide compelling privacy semantics for data with correlations. However, such privacy definitions are rarely used in practice because they do not always compose. For example, it is possible to design algorithms for these privacy definitions that have no leakage when run once but reveal the entire dataset when run more than once. We prove necessary and sufficient conditions that must be added to ensure linear composition for Pufferfish mechanisms, hence avoiding such privacy collapse. These extra conditions turn out to be differential privacy-style inequalities, indicating that achieving both the interpretable semantics of Pufferfish for correlated data and composition benefits requires adopting differentially private mechanisms to Pufferfish. We show that such translation is possible through a concept called the $a(b)$-influence curve, and many existing differentially private algorithms can be translated with our framework into a composable Pufferfish algorithm. We illustrate the benefit of our new framework by designing composable Pufferfish algorithms for Markov chains that significantly outperform prior work.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.02718v2</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiamu Bai, Guanlin He, Xin Gu, Daniel Kifer, Kiwan Maeng</dc:creator>
    </item>
    <item>
      <title>TAME: A Trustworthy Test-Time Evolution of Agent Memory with Systematic Benchmarking</title>
      <link>https://arxiv.org/abs/2602.03224</link>
      <description>arXiv:2602.03224v2 Announce Type: replace 
Abstract: Test-time evolution of agent memory represents a pivotal paradigm for advancing AGI, as it strengthens complex reasoning through experience accumulation without requiring parameter updates. However, even during benign task evolution, agent safety alignment remains vulnerable, a phenomenon known as Agent Memory Misevolution. To evaluate this phenomenon, we construct the Trust-Memevo benchmark and find that agents exhibit an overall decline in trustworthiness across multiple tasks during benign task evolution. To address this issue, we propose TAME, a trust-aware memory evolution framework in which a shared memory bank is jointly governed by an Executor and an Evaluator. The Executor retrieves and applies transferable experiences to support task solving, while the Evaluator assesses the contribution of each utilized experience to the outcome and produces trust-aware feedback to guide subsequent memory use. This executor-evaluator loop enables memory to be selectively reinforced, cautiously reused, and continuously expanded over time. Experiments show that TAME mitigates memory misevolution while achieving strong task performance. In particular, on the GPT-5.2 AIME benchmark, TAME improves accuracy by 14.6 percentage points over the strongest existing method and maintains competitive trustworthiness.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.03224v2</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yu Cheng, Yongkang Hu, Jiuan Zhou, Yushuo Zhang, Yihang Chen, Huichi Zhou, Mingang Chen, Zhizhong Zhang, Kun Shao, Yuan Xie, Zhaoxia Yin</dc:creator>
    </item>
    <item>
      <title>Entropy Functions on Two-Dimensional Faces of Polymatroid Region Spanned by a Matroid and a Rank-One Matroid</title>
      <link>https://arxiv.org/abs/2602.03363</link>
      <description>arXiv:2602.03363v2 Announce Type: replace 
Abstract: Characterization of entropy functions is of fundamental importance in information theory. By imposing constraints on their Shannon outer bound, i.e., the polymatroidal region, one obtains the faces of the region and entropy functions on them with special structures. In this paper, we characterize entropy functions on 2-dimensional faces of the polymatroidal region of degree n spanned by a matroid and a rank-1 matroid. We classify all such 2-dimensional faces into four types.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.03363v2</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kaizhe He, Qi Chen</dc:creator>
    </item>
    <item>
      <title>The Label Horizon Paradox: Rethinking Supervision Targets in Financial Forecasting</title>
      <link>https://arxiv.org/abs/2602.03395</link>
      <description>arXiv:2602.03395v4 Announce Type: replace 
Abstract: While deep learning has revolutionized financial forecasting through sophisticated architectures, the design of the supervision signal itself is rarely scrutinized. We challenge the canonical assumption that training labels must strictly mirror inference targets, uncovering the Label Horizon Paradox: the optimal supervision signal often deviates from the prediction goal, shifting across intermediate horizons governed by market dynamics. We theoretically ground this phenomenon in a dynamic signal-noise trade-off, demonstrating that generalization hinges on the competition between marginal signal realization and noise accumulation. To operationalize this insight, we propose a bi-level optimization framework that autonomously identifies the optimal proxy label within a single training run. Extensive experiments on large-scale financial datasets demonstrate consistent improvements over conventional baselines, thereby opening new avenues for label-centric research in financial forecasting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.03395v4</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chen-Hui Song, Shuoling Liu, Liyuan Chen</dc:creator>
    </item>
    <item>
      <title>Enhancing Adversarial Robustness with Signed Distance Fields for Harmonizing Geometric Invariance and Texture</title>
      <link>https://arxiv.org/abs/2602.05175</link>
      <description>arXiv:2602.05175v2 Announce Type: replace 
Abstract: Deep neural networks demonstrate impressive performance in visual recognition but remain highly vulnerable to imperceptible adversarial attacks. Existing defense strategies such as adversarial training and diffusion-based purification have achieved significant progress but are frequently constrained by high computational cost, information loss, and inference latency. To address these challenges, we propose a Geometric and Texture balancing Purification (GeoTexPuri) framework that enhances adversarial robustness by harmonizing invariant geometric structures with textural features. Specifically, the framework integrates dense geometric guidance into the training phase by transforming discrete image masks into continuous spatial fields via Signed Distance Fields (SDF). This process establishes stable structural anchors that shield the model from local pixel noise. Through a multi-stream training objective, the model learns to internalize purified representations that effectively align semantic textural cues with these underlying geometric invariants. Extensive experiments on ImageNet demonstrate the efficacy of our approach. GeoTexPuri achieves 84.79% clean accuracy and 83.52% robust accuracy under AutoAttack. Crucially, GeoTexPuri functions as a deterministic classifier during inference, requiring only the input image without any auxiliary geometric modules or additional computational costs, thereby ensuring a scalable and efficient solution for real-time applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.05175v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Zhe Li, Bernhard Kainz</dc:creator>
    </item>
    <item>
      <title>Emergence-as-Code as a Foundation for Self-Governing Reliable Systems</title>
      <link>https://arxiv.org/abs/2602.05458</link>
      <description>arXiv:2602.05458v2 Announce Type: replace 
Abstract: Service-level objective (SLO)-as-code tools make per-service reliability declarative, but users experience journeys: end-to-end executions whose availability and tail latency emerge from topology, routing, redundancy, timeouts/fallbacks, shared failure domains, and tail amplification. Journey objectives are therefore often maintained outside code and drift away from the effective runtime graph.
  We propose Emergence-as-Code (EmaC), a declarative contract that compiles journey-level SLI bounds and governance artifacts for declared SLOs from intent and evidence. An EmaC specification defines a typed journey expression, leaf bindings to atomic SLOs and telemetry, failure-domain assumptions, and guarded actions. Model Discovery proposes evidence-backed deltas for edges, branch probabilities, redundancy groups, and failure-domain hypotheses; each delta carries provenance and confidence. The compiler derives optimistic and pessimistic journey bounds and emits reviewable governance artifacts. An executable checkout replay shows that local SLOs can remain green while evidence-backed discovery changes the failure-domain model, collapses the pessimistic payment-race bound, and changes the rollout decision from pass to fail or review.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.05458v2</guid>
      <category>cs.SE</category>
      <category>cs.DC</category>
      <category>cs.PF</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Anatoly A. Krasnovsky</dc:creator>
    </item>
    <item>
      <title>Toward Operationalizing Rasmussen: Drift Observability on the Simplex for Evolving Systems</title>
      <link>https://arxiv.org/abs/2602.05483</link>
      <description>arXiv:2602.05483v2 Announce Type: replace 
Abstract: Software operations increasingly rely on SLOs, traces, deployment specifications, and change events, yet dashboards and thresholding practices often expose share-like operational signals as separate scalar panels or baseline distances. This can create false alarms under benign redistribution and miss movement toward policy boundaries. Rasmussen's dynamic safety model motivates drift under competing pressures, but operationalizing it for software is difficult because relevant state variables (remaining margin, engineering effort, and risk/impact) are often compositional and their parts evolve. We formulate an automated, artifact-derived drift-monitor design that maps changing software artifacts into a stable compositional monitoring state: it extracts a current part inventory and policy constraints, maps telemetry to a positive composition, stabilizes splits, merges, and renames through lineage-aware canonical groups, and analyzes boundary-directed drift in log-ratio coordinates. The proposed monitor would report drift direction, step-to-boundary, balance-level attribution, and model-health indicators under architectural churn. We specify the approach, identify its zero/noise/lineage assumptions, and report a reproducible synthetic sanity check of boundary-aware drift and controlled part churn.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.05483v2</guid>
      <category>eess.SY</category>
      <category>cs.CY</category>
      <category>cs.SY</category>
      <category>stat.AP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Anatoly A. Krasnovsky</dc:creator>
    </item>
    <item>
      <title>On the Superlinear Relationship between SGD Noise Covariance and Loss Landscape Curvature</title>
      <link>https://arxiv.org/abs/2602.05600</link>
      <description>arXiv:2602.05600v2 Announce Type: replace 
Abstract: Stochastic Gradient Descent (SGD) introduces anisotropic noise that is correlated with the local curvature of the loss landscape, thereby biasing optimization toward flat minima. Prior work often assumes an equivalence between the Fisher Information Matrix and the Hessian for negative log-likelihood losses, leading to the claim that the SGD noise covariance $\mathbf{C}$ is proportional to the Hessian $\mathbf{H}$. We show that this assumption holds only under restrictive conditions that are typically violated in deep neural networks. Using the recently discovered Activity--Weight Duality, we find a more general relationship agnostic to the specific loss formulation, showing that $\mathbf{C} \propto \mathbb{E}_p[\mathbf{h}_p^2]$, where $\mathbf{h}_p$ denotes the per-sample Hessian with $\mathbf{H} = \mathbb{E}_p[\mathbf{h}_p]$. As a consequence, $\mathbf{C}$ and $\mathbf{H}$ commute approximately rather than coincide exactly. We further find that, within the analyzed fully connected layers, their diagonal elements follow per-layer empirical power laws $C_{ii} \propto H_{ii}^{\gamma}$, with layer-dependent fitted exponents bounded by $1 \leq \gamma \leq 2$. Experiments across datasets, architectures, and loss functions support the resulting layerwise bounds, providing a unified characterization of the noise-curvature relationship in deep learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.05600v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yikuan Zhang, Ning Yang, Yuhai Tu</dc:creator>
    </item>
    <item>
      <title>Variational Speculative Decoding: Rethinking Draft Training from Token Likelihood to Sequence Acceptance</title>
      <link>https://arxiv.org/abs/2602.05774</link>
      <description>arXiv:2602.05774v4 Announce Type: replace 
Abstract: Speculative decoding accelerates inference for (M)LLMs, yet a training-decoding discrepancy persists: while existing methods optimize single greedy trajectories, decoding involves verifying and ranking multiple sampled draft paths. We propose Variational Speculative Decoding (VSD), formulating draft training as variational inference over latent proposals (draft paths). VSD maximizes the marginal probability of target-model acceptance, yielding an ELBO that promotes high-quality latent proposals while minimizing divergence from the target distribution. To enhance quality and reduce variance, we incorporate a path-level utility and optimize via an Expectation-Maximization procedure. The E-step draws Monte Carlo samples from an oracle-filtered posterior, while the M-step maximizes weighted likelihood using Adaptive Rejection Weighting (ARW) and Confidence-Aware Regularization (CAR). Theoretical analysis confirms that VSD increases expected acceptance length and speedup. Extensive experiments across LLMs and MLLMs show that VSD achieves up to a 9.6% speedup over EAGLE-3 and 7.9% over ViSpec, significantly improving decoding efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.05774v4</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>math.PR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiandong Zou, Jianshu Li, Jing Huang, Pan Zhou</dc:creator>
    </item>
    <item>
      <title>Self-Supervised Learning with a Multi-Task Latent Space Objective</title>
      <link>https://arxiv.org/abs/2602.05845</link>
      <description>arXiv:2602.05845v2 Announce Type: replace 
Abstract: We propose a multi-task formulation of self-predictive Siamese SSL in which each spatial transformation defines a distinct latent-space alignment task, solved by a dedicated predictor over a shared encoder. This perspective directly explains a long-standing failure of multi-crop training in self-predictive methods such as BYOL, SimSiam, and MoCo v3: a shared predictor is forced to solve heterogeneous alignment tasks simultaneously, leading to unstable optimization. Assigning one predictor per view type resolves this interference, unlocking linear evaluation gains of 3.8-4% across frameworks. This perspective also suggests a principled way to enrich pre-training by introducing additional spatial transformations as complementary tasks. We demonstrate this by introducing asymmetric cutout views, in which a masked online view is aligned with a complete target, forming a semantic inpainting objective. The resulting framework is stable, backbone-agnostic, and consistently improves the performance of ResNet and ViT models on ImageNet and COCO.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.05845v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pierre-François De Plaen, Abhishek Jha, Luc Van Gool, Tinne Tuytelaars, Marc Proesmans</dc:creator>
    </item>
    <item>
      <title>Lost in Speech: Benchmarking, Evaluation, and Parsing of Spoken Bilingual Conversational Language Beyond Standard UD Assumptions</title>
      <link>https://arxiv.org/abs/2602.06307</link>
      <description>arXiv:2602.06307v2 Announce Type: replace 
Abstract: Spoken bilingual conversations pose substantial challenges for syntactic parsing because they often include disfluencies and discourse-driven structures that complicate dependency parsing under standard Universal Dependencies (UD) assumptions and evaluation practices. To systematically study these challenges, in this work, we first introduce a linguistically grounded taxonomy of conversational bilingual phenomena, together with SpokeBench, an expert-annotated English-Spanish benchmark for structurally complex speech. To address the limitations of existing evaluation practices, we propose Flex-UD, an ambiguity-aware evaluation metric that distinguishes catastrophic structural failures from linguistically acceptable variations. Finally, we introduce DECAP, a decoupled agentic parsing framework that separates spoken-phenomena handling from core syntactic analysis, enabling robust and interpretable dependency parsing without retraining. Experiments across both proprietary and open-weight LLMs show that DECAP substantially improves performance on complex conversational phenomena and achieves over 60% improvements in UPOS-F1 Score over baselines, while Flex-UD evaluations reveal gains that otherwise remain partially hidden under standard attachment-based metrics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.06307v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nemika Tyagi, Olga Kellert, Holly Hendrix, Nelvin Licona-Guevara, Justin Mackie, Phanos Kareen, Megan Michelle Smith, Tatiana Gallego Hernande, Samhitha Harish, Chitta Baral</dc:creator>
    </item>
    <item>
      <title>On the Role of the Double Fourier Sphere Method in Fast Algorithms on SO(3)</title>
      <link>https://arxiv.org/abs/2602.06677</link>
      <description>arXiv:2602.06677v3 Announce Type: replace 
Abstract: We analyze the Double Fourier Sphere (DFS) method on the rotation group $\mathcal{SO}(3)$ in the frequency domain and demonstrate its central role in fast algorithms. Fast Fourier algorithms on $\mathcal{SO}(3)$ are commonly formulated as a Wigner transform - mapping harmonic to Fourier coefficients - followed by a Fourier transform. We revisit this formulation and interpret the Wigner transform as an explicit realization of the DFS method, lifting functions from $\mathcal{SO}(3)$ to $\mathbb{T}^3$. In this context, we analyze the Sobolev regularity loss induced by this lifting. Furthermore, we compare different Wigner transform implementations, examine additional symmetry enhancements, and observe that the direct method is often faster and more stable than the fast polynomial transform approaches.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.06677v3</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ralf Hielscher, Erik Wünsche</dc:creator>
    </item>
    <item>
      <title>Implementing Grassroots Logic Programs with Multiagent Transition Systems and AI (Full Version)</title>
      <link>https://arxiv.org/abs/2602.06934</link>
      <description>arXiv:2602.06934v4 Announce Type: replace 
Abstract: Grassroots Logic Programs (GLP) is a concurrent logic programming language in which logic variables are partitioned into paired readers and writers. An assignment is produced at most once via a writer and consumed at most once via its paired reader, and may contain additional readers and/or writers. This enables the concise expression of rich multidirectional communication modalities.
  The language was introduced together with concurrent (cGLP) and multiagent (maGLP) operational semantics. Here, we derive from these (\ia) dGLP, a deterministic counterpart of cGLP, and (\ib) madGLP, a counterpart of maGLP in which deterministic agents communicate solely by asynchronous message passing, and prove them correct against their abstract counterparts. maGLP shared variable pairs spanning agents can be implemented as local variables paired by global links, with correctness following from disjoint substitution commutativity (a consequence of GLP's single-occurrence invariant). We further prove that madGLP is grassroots. Both dGLP and madGLP serve as formal specifications for an AI-driven implementation discipline (math $\to$ informal spec $\to$ Dart) employed and described here: from dGLP, AI (Claude) developed a workstation-based GLP implementation in Dart, and from madGLP it is developing a smartphone-based multiagent one.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.06934v4</guid>
      <category>cs.PL</category>
      <category>cs.AI</category>
      <category>cs.DC</category>
      <category>cs.LO</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Ehud Shapiro</dc:creator>
    </item>
    <item>
      <title>Optimizing Few-Step Generation with Adaptive Matching Distillation</title>
      <link>https://arxiv.org/abs/2602.07345</link>
      <description>arXiv:2602.07345v2 Announce Type: replace 
Abstract: Distribution Matching Distillation (DMD) is a powerful acceleration paradigm, yet its stability is often compromised in Forbidden Zones, regions where the real teacher provides unreliable guidance while the fake teacher exerts insufficient repulsive force. In this work, we propose a unified optimization framework that reinterprets prior art as implicit strategies to avoid these corrupted regions. Based on this insight, we introduce Adaptive Matching Distillation (AMD), a self-correcting mechanism that utilizes reward proxies to explicitly detect and escape Forbidden Zones. AMD dynamically prioritizes corrective gradients via structural signal decomposition and introduces Repulsive Landscape Sharpening to enforce steep energy barriers against failure mode collapse. Extensive experiments across image and video generation tasks (e.g., SDXL, Wan2.1) and rigorous benchmarks (e.g., VBench, GenEval) demonstrate that AMD significantly enhances sample fidelity and training robustness. For instance, AMD improves the HPSv2 score on SDXL from 30.64 to 31.25, outperforming state-of-the-art baselines. These findings validate that explicitly rectifying optimization trajectories within Forbidden Zones is essential for pushing the performance ceiling of few-step generative models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.07345v2</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lichen Bai, Zikai Zhou, Shitong Shao, Wenliang Zhong, Shuo Yang, Shuo Chen, Bojun Chen, Zeke Xie</dc:creator>
    </item>
    <item>
      <title>Data Compression with Stochastic Codes</title>
      <link>https://arxiv.org/abs/2602.07635</link>
      <description>arXiv:2602.07635v2 Announce Type: replace 
Abstract: Machine learning has had a major impact on data compression over the last decade and opened up many new theoretical and applied fields of inquiry.
  This paper describes one such direction -- relative entropy coding -- which focuses on constructing stochastic codes, mainly as an alternative to quantisation and entropy coding in lossy source coding. Our primary aim is to provide a broad overview of the topic, with an emphasis on the computational and practical aspects currently missing from the literature.
  Our goal is threefold: for the curious reader, we aim to provide an intuitive picture of the field and convince them that relative entropy coding is a simple yet exciting emerging field in data compression research. For a reader interested in applied research on lossy data compression, we provide an account of the most salient contemporary applications. Finally, for the reader who has heard of relative entropy coding but has never been quite sure what it is or how the algorithms fit together, we hope to illustrate how simple and elegant the underlying constructions are.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.07635v2</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Gergely Flamich, Deniz Gündüz</dc:creator>
    </item>
    <item>
      <title>Generative Reasoning Re-ranker</title>
      <link>https://arxiv.org/abs/2602.07774</link>
      <description>arXiv:2602.07774v5 Announce Type: replace 
Abstract: Recent studies increasingly explore Large Language Models (LLMs) as a new paradigm for recommendation systems due to their scalability and world knowledge. However, existing work has three key limitations: (1) most efforts focus on retrieval and ranking, while the reranking phase, critical for refining final recommendations, is largely overlooked; (2) LLMs are typically used in zero-shot or supervised fine-tuning settings, leaving their reasoning abilities, especially those enhanced through reinforcement learning (RL) and high-quality reasoning data, underexploited; (3) items are commonly represented by non-semantic IDs, creating major scalability challenges in industrial systems with billions of identifiers. To address these gaps, we propose the Generative Reasoning Reranker (GR2), an end-to-end framework with a three-stage training pipeline tailored for reranking. First, a pretrained LLM is mid-trained on semantic IDs encoded from non-semantic IDs via a tokenizer achieving $\ge$99% uniqueness. Next, a stronger larger-scale LLM generates high-quality reasoning traces through carefully designed prompting and rejection sampling, which are used for supervised fine-tuning to impart foundational reasoning skills. Finally, we apply Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO), enabling scalable RL supervision with verifiable rewards designed specifically for reranking. Experiments on two real-world datasets demonstrate GR2's effectiveness: it surpasses the state-of-the-art OneRec-Think by 2.4% in Recall@5 and 1.3% in NDCG@5. Ablations confirm that advanced reasoning traces yield substantial gains across metrics. We further find that RL reward design is crucial in reranking: LLMs tend to exploit reward hacking by preserving item order, motivating conditional verifiable rewards to mitigate this behavior and optimize reranking performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.07774v5</guid>
      <category>cs.IR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Mingfu Liang, Yufei Li, Jay Xu, Kavosh Asadi, Xi Liu, Shuo Gu, Kaushik Rangadurai, Frank Shyu, Shuaiwen Wang, Song Yang, Zhijing Li, Jiang Liu, Mengying Sun, Fei Tian, Xiaohan Wei, Chonglin Sun, Jacob Tao, Shike Mei, Wenlin Chen, Santanu Kolay, Sandeep Pandey, Hamed Firooz, Luke Simon</dc:creator>
    </item>
    <item>
      <title>Investigating Energy Bounds of Analog Compute-in-Memory with Local Normalization</title>
      <link>https://arxiv.org/abs/2602.08081</link>
      <description>arXiv:2602.08081v2 Announce Type: replace 
Abstract: Modern edge AI workloads demand maximum energy efficiency, motivating the pursuit of analog Compute-in-Memory (CIM) architectures. Simultaneously, the popularity of Large-Language-Models (LLMs) drives the adoption of low-bit floating-point formats which prioritize dynamic range. However, the conventional direct-accumulation CIM accommodates floating-point values by normalizing them to a shared widened fixed-point scale. Consequently, hardware resolution is dictated by the input's dynamic range rather than its precision, and energy consumption is dominated by the ADC. We address this limitation by introducing local normalization for each input, weight, and multiply-accumulate (MAC) output via a Gain-Ranging MAC (GR-MAC). Normalization overhead is handled by low-power digital logic, enabling the computationally expensive MAC operation to remain in the energy-efficient low-precision analog regime. Energy modelling shows that the addition of a gain-ranging stage to the MAC enables a 4-bit increase in input dynamic range without increased energy consumption at a 35 dB SQNR standard. Additionally, the ADC resolution requirement becomes invariant to input distribution assumptions, allowing construction of an upper bound with a 1.5-bit reduction compared to the conventional lower bound. These results establish a pathway towards unlocking favourable energy scaling trends of analog CIM for modern AI workloads.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.08081v2</guid>
      <category>cs.AR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Brian Rojkov, Shubham Ranjan, Derek Wright, Manoj Sachdev</dc:creator>
    </item>
    <item>
      <title>Weak-Driven Learning: How Weak Agents make Strong Agents Stronger</title>
      <link>https://arxiv.org/abs/2602.08222</link>
      <description>arXiv:2602.08222v2 Announce Type: replace 
Abstract: As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields diminishing returns. While existing methods continue to reinforce target predictions, we find that informative supervision signals remain latent in models' own historical weak states. Motivated by this observation, we propose WMSS (Weak Agents Can Make Strong Agents Stronger), a post-training paradigm that leverages weak checkpoints to guide continued optimization. By identifying recoverable learning gaps via entropy dynamics and reinforcing them through compensatory learning, WMSS enables strong agents to improve beyond conventional post-training saturation. Experiments on mathematical reasoning and code generation datasets show that agents trained with our approach achieve effective performance improvements, while incurring zero additional inference cost.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.08222v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zehao Chen, Gongxun Li, Tianxiang Ai, Zixuan Huang, Xiaodong Liu, Yifei Li, Wang Zhou, Fuzhen Zhuang, Xianglong Liu, Jianxin Li, Deqing Wang, Yikun Ban</dc:creator>
    </item>
    <item>
      <title>When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents</title>
      <link>https://arxiv.org/abs/2602.08235</link>
      <description>arXiv:2602.08235v2 Announce Type: replace 
Abstract: Although computer-use agents (CUAs) hold significant potential to automate increasingly complex OS workflows, they can demonstrate unsafe unintended behaviors that deviate from expected outcomes even under benign input contexts. However, exploration of this risk remains largely anecdotal, lacking concrete characterization and automated methods to proactively surface long-tail unintended behaviors under realistic CUA scenarios. To fill this gap, we introduce the first conceptual and methodological framework for unintended CUA behaviors, by defining their key characteristics, automatically eliciting them, and analyzing how they arise from benign inputs. We propose AutoElicit: an agentic framework that iteratively perturbs benign instructions using CUA execution feedback, and elicits severe harms while keeping perturbations realistic and benign. Using AutoElicit, we surface hundreds of harmful unintended behaviors from state-of-the-art CUAs such as Claude 4.5 Haiku, Claude 4.5 Opus, and Operator. We further evaluate the transferability of human-verified successful perturbations, identifying persistent susceptibility to unintended behaviors across various other frontier CUAs. This work establishes a foundation for systematically analyzing unintended behaviors in realistic computer-use settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.08235v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jaylen Jones, Zhehao Zhang, Yuting Ning, Eric Fosler-Lussier, Pierre-Luc St-Charles, Yoshua Bengio, Dawn Song, Yu Su, Huan Sun</dc:creator>
    </item>
    <item>
      <title>Pitot-Aided Attitude and Air Velocity Estimation with Almost Global Asymptotic Stability Guarantees</title>
      <link>https://arxiv.org/abs/2602.08273</link>
      <description>arXiv:2602.08273v2 Announce Type: replace 
Abstract: This paper investigates the problem of attitude and air velocity estimation for fixed-wing unmanned aerial vehicles (UAVs) using IMU measurements and at least one Pitot tube measurement, with almost global asymptotic stability (AGAS) guarantees. A cascade observer architecture is developed, in which a Riccati/Kalman-type filter estimates the body-fixed frame air velocity and the vehicle's tilt using IMU data as inputs and Pitot measurements as outputs. Under mild excitation conditions, the resulting air velocity and tilt estimation error dynamics are shown to be uniformly observable. The estimated tilt is then combined with magnetometer measurements in a nonlinear observer on SO(3) to recover the full attitude. Rigorous analysis establishes AGAS of the overall cascade structure under the uniform observability (UO) condition. The effectiveness of the proposed approach is demonstrated through validation on real flight data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.08273v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Melone Nyoba Tchonkeu, Soulaimane Berkane, Tarek Hamel</dc:creator>
    </item>
    <item>
      <title>A Machine Learning Enabled MDO for Bio-Inspired Autonomous Underwater Gliders</title>
      <link>https://arxiv.org/abs/2602.08508</link>
      <description>arXiv:2602.08508v2 Announce Type: replace 
Abstract: The preliminary design of autonomous underwater gliders (AUGs) is intrinsically challenging due to the strong coupling between the external hydrodynamic shape, the hydrostatic balance, the structural integrity, and internal packaging constraints. This complexity is further amplified for bio-inspired configurations, whose rich geometric parametrizations lead to high-dimensional design spaces that are difficult to explore using conventional optimization approaches. This work presents an ML-enabled bi-level multidisciplinary design optimization (MDO) framework for the performance-driven design of a manta-ray-inspired AUG. At the upper level, hydrodynamically efficient external geometries are explored in a reduced design space obtained through physics-driven parametric model embedding, which identifies a low-dimensional latent representation directly correlated with the lift, drag, and pressure distributions. At the lower level, a constrained internal sizing problem determines the minimum feasible empty weight by accounting for structural, hydrostatic, geometric, and payload constraints. To render the resulting bi-level problem computationally tractable, a multi-fidelity surrogate-based optimization strategy is adopted, combining low- and high-fidelity hydrodynamic models with stochastic radial basis function surrogates and adaptive Bayesian sampling. The framework enables efficient exploration of the coupled design space while rigorously managing model uncertainty and computational cost. The optimized configurations exhibit a 14.7% improvement in maximum hydrodynamic efficiency and a 12.8% reduction in empty weight relative to the baseline design, while satisfying all disciplinary constraints. These results demonstrate that the integration of physics-driven dimensionality reduction and multi-fidelity machine learning enables scalable and physically consistent MDO of complex bio-inspired underwater vehicles.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.08508v2</guid>
      <category>cs.CE</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Andrea Serani, Giorgio Palma, Jeroen Wackers, Matteo Diez</dc:creator>
    </item>
    <item>
      <title>Foundation Inference Models for Ordinary Differential Equations</title>
      <link>https://arxiv.org/abs/2602.08733</link>
      <description>arXiv:2602.08733v2 Announce Type: replace 
Abstract: Ordinary differential equations (ODEs) are central to scientific modelling, but inferring their vector fields from noisy trajectories remains challenging. Current approaches such as symbolic regression, Gaussian process (GP) regression, and Neural ODEs often require complex training pipelines and substantial machine learning expertise, or they depend strongly on system-specific prior knowledge. We propose FIM-ODE, a pretrained Foundation Inference Model that amortises low-dimensional ODE inference by predicting the vector field directly from noisy trajectory data in a single forward pass. We pretrain FIM-ODE on a prior distribution over ODEs with low-degree polynomial vector fields and represent the target field with neural operators. FIM-ODE achieves strong zero-shot performance, matching and often improving upon ODEFormer, a recent pretrained symbolic baseline, across a range of regimes despite using a simpler pretraining prior distribution. Pretraining also provides a strong initialisation for finetuning, enabling fast and stable adaptation that outperforms modern neural and GP baselines without requiring machine learning expertise.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.08733v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:journal_reference>Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)</arxiv:journal_reference>
      <dc:creator>Maximilian Mauel, Johannes R. Hübers, David Berghaus, Patrick Seifner, Ramses J. Sanchez</dc:creator>
    </item>
    <item>
      <title>A Graphop Analysis of Graph Neural Networks on Sparse Graphs: Generalization and Universal Approximation</title>
      <link>https://arxiv.org/abs/2602.08785</link>
      <description>arXiv:2602.08785v2 Announce Type: replace 
Abstract: Generalization and approximation capabilities of message passing graph neural networks (MPNNs) are often studied by defining a compact metric on a space of input graphs under which MPNNs are equicontinuous. Such analyses are of two varieties: 1) when the metric space includes graphs of unbounded sizes, the theory is only appropriate for dense graphs, and, 2) when studying sparse graphs, the metric space only includes graphs of uniformly bounded size. In this work, we present a unified approach, defining a compact metric on the space of graphs of all sizes, both sparse and dense, under which MPNNs are equicontinuous. This leads to more powerful universal approximation theorems and generalization bounds than previous works. The theory is based on, and extends, a recent approach to graph limit theory called graphop analysis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.08785v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ofek Amran, Tom Gilat, Ron Levie</dc:creator>
    </item>
    <item>
      <title>AMS-HD: Hyperdimensional Computing for Real-Time and Energy-Efficient Acute Mountain Sickness Detection</title>
      <link>https://arxiv.org/abs/2602.08916</link>
      <description>arXiv:2602.08916v3 Announce Type: replace 
Abstract: Objective: Acute mountain sickness (AMS) is the most prevalent altitude illness, affecting unacclimatized individuals ascending above 2,500 m and potentially escalating to life-threatening cerebral or pulmonary edema. Conventional machine learning (ML) methods for AMS detection from wearable physiological signals often fail to meet real-time hardware efficiency requirements of continuous monitoring. Methods: We present AMS-HD, the first hyperdimensional computing (HDC)-based framework for real-time AMS detection, spanning high-level bipolar (-1/+1) computing for mobile platforms and low-level binary (0/1) computing for FPGA and ASIC targets. The framework integrates mutual information feature selection, hypervector encoding, and positional projection to enhance classification efficiency. Validation spans ARM, FPGA, and smartwatch-smartphone platforms using wearable-accessible SpO2 and heart rate signals. Results: AMS-HD matches or outperforms SVM and MLP baselines in both binary and multiclass classification, achieving up to 91% accuracy and 90% F1-score in binary classification, and up to 85% accuracy on external AMS-related datasets. On FPGA, AMS-HD reduces LUT and flip-flop usage by 7.3x and 5.8x, while consuming 3.9x less power than MLP. On mobile platforms, AMS-HD requires only 1% battery per session, 60 bytes of memory, and 2.50 ms inference time -- approximately 2x and more than 3x lower energy consumption than SVM and MLP. Conclusion: AMS-HD provides a scalable, hardware-aware alternative to conventional ML for real-time AMS monitoring, achieving competitive performance with substantially lower resource consumption. Significance: This work presents the first complete HDC framework for altitude sickness detection, bridging wearable inference and low-level hardware deployment for resource-constrained health monitoring.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.08916v3</guid>
      <category>cs.SC</category>
      <category>cs.ET</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Abu Masum, Mehran Moghadam, M. Hassan Najafi, Bige Unluturk, Ulkuhan Guler, Beth A. Beidleman, Sercan Aygun</dc:creator>
    </item>
    <item>
      <title>SciFlow-Bench: Evaluating Structure-Aware Scientific Diagram Generation via Inverse Parsing</title>
      <link>https://arxiv.org/abs/2602.09809</link>
      <description>arXiv:2602.09809v2 Announce Type: replace 
Abstract: Scientific diagrams convey explicit structural information, yet modern text-to-image models often produce visually plausible but structurally incorrect results. Existing benchmarks either rely on image-centric or subjective metrics insensitive to structure, or evaluate intermediate symbolic representations rather than final rendered images, leaving pixel-based diagram generation underexplored. We introduce SciFlow-Bench, a structure-first benchmark for evaluating scientific diagram generation directly from pixel-level outputs. Built from real scientific PDFs, SciFlow-Bench pairs each source framework figure with a canonical ground-truth graph and evaluates models as black-box image generators under a closed-loop, round-trip protocol that inverse-parses generated diagram images back into structured graphs for comparison. This design enforces evaluation by structural recoverability rather than visual similarity alone, and is enabled by a hierarchical multi-agent system that coordinates planning, perception, and structural reasoning. Experiments show that preserving structural correctness remains a fundamental challenge, particularly for diagrams with complex topology, underscoring the need for structure-aware evaluation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.09809v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tong Zhang, Honglin Lin, Zhou Liu, Chong Chen, Wentao Zhang</dc:creator>
    </item>
    <item>
      <title>Kunlun: Establishing Scaling Laws for Massive-Scale Recommendation Systems through Unified Architecture Design</title>
      <link>https://arxiv.org/abs/2602.10016</link>
      <description>arXiv:2602.10016v3 Announce Type: replace 
Abstract: Deriving predictable scaling laws that govern the relationship between model performance and computational investment is crucial for designing and allocating resources in massive-scale recommendation systems. While such laws are established for large language models, they remain challenging for recommendation systems, especially those processing both user history and context features. We identify poor scaling efficiency as the main barrier to predictable power-law scaling, stemming from inefficient modules with low Model FLOPs Utilization (MFU) and suboptimal resource allocation. We introduce Kunlun, a scalable architecture that systematically improves model efficiency and resource allocation. Our low-level optimizations include Generalized Dot-Product Attention (GDPA), Hierarchical Seed Pooling (HSP), and Sliding Window Attention. Our high-level innovations feature Computation Skip (CompSkip) and Event-level Personalization. These advances increase MFU from 17% to 37% on NVIDIA B200 GPUs and double scaling efficiency over state-of-the-art methods. Kunlun is now deployed in major Meta Ads models, delivering significant production impact.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.10016v3</guid>
      <category>cs.IR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bojian Hou, Xiaolong Liu, Xiaoyi Liu, Jiaqi Xu, Yasmine Badr, Mengyue Hang, Sudhanshu Chanpuriya, Junqing Zhou, Yuhang Yang, Han Xu, Qiuling Suo, Laming Chen, Yuxi Hu, Jiasheng Zhang, Huaqing Xiong, Yuzhen Huang, Chao Chen, Yue Dong, Yi Yang, Shuo Chang, Xiaorui Gan, Wenlin Chen, Santanu Kolay, Darren Liu, Jade Nie, Chunzhi Yang, Ellie Wen, Jiyan Yang, Huayu Li</dc:creator>
    </item>
    <item>
      <title>Hyperspectral Smoke Segmentation via Mixture of Prototypes</title>
      <link>https://arxiv.org/abs/2602.10858</link>
      <description>arXiv:2602.10858v2 Announce Type: replace 
Abstract: Smoke segmentation is critical for wildfire management and industrial safety applications. Traditional visible-light-based methods face limitations due to insufficient spectral information, particularly struggling with cloud interference and semi-transparent smoke regions. To address these challenges, we introduce hyperspectral imaging for smoke segmentation and present the first hyperspectral smoke segmentation dataset (HSSDataset) with carefully annotated samples collected from over 18,000 frames across 20 real-world scenarios using a Many-to-One annotation protocol. However, different spectral bands exhibit varying discriminative capabilities across spatial regions, necessitating adaptive band weighting strategies. We decompose this into three technical challenges: spectral interaction contamination, limited spectral pattern modeling, and complex weighting router problems. We propose a mixture of prototypes (MoP) network with: (1) band split (BS) for spectral isolation, (2) prototype-based spectral representation (PSR) for diverse patterns, and (3) dual-stage router (DSR) for adaptive spatial-aware band weighting. We further construct a multispectral dataset (MSSDataset) with RGB-infrared images. Extensive experiments validate superior performance across both hyperspectral and multispectral modalities, establishing a new paradigm for spectral-based smoke segmentation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.10858v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lujian Yao, Haitao Zhao, Xianghai Kong, Yuhan Xu</dc:creator>
    </item>
    <item>
      <title>SurveyLens: A Discipline-Aware Benchmark for Automatic Survey Generation</title>
      <link>https://arxiv.org/abs/2602.11238</link>
      <description>arXiv:2602.11238v2 Announce Type: replace 
Abstract: Automatic Survey Generation (ASG) aims to produce comprehensive literature surveys by retrieving, organizing, and synthesizing academic papers. Despite rapid progress in specialized ASG frameworks and Deep Research agents, existing evaluations largely center on Computer Science or rely on generic criteria, leaving it unclear whether current systems satisfy the survey standards of diverse disciplines. We introduce SurveyLens, the first discipline-aware ASG benchmark. SurveyLens comprises SurveyLens-1k, a curated dataset of 1,000 human-written surveys across 10 disciplines, and a dual-lens framework that combines discipline-aware rubric scoring with reference-based alignment to human-written surveys. Evaluating 11 state-of-the-art systems across vanilla LLMs, ASG systems, and Deep Research agents, we find that Deep Research agents are the only paradigm robust across all 10 disciplines, ASG systems lead on structural planning, and all paradigms remain weak on reference quality, providing practical guidance for discipline-specific tool selection and future ASG design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.11238v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Beichen Guo, Zhiyuan Wen, Jia Gu, Haochen Shi, Jian Wang, Senzhang Wang, Haoyang Li, Ruosong Yang, Shuaiqi Liu</dc:creator>
    </item>
    <item>
      <title>The Arithmetic Singleton Bound on the Hamming Distances of Simple-rooted Constacyclic Codes over Finite Fields</title>
      <link>https://arxiv.org/abs/2602.11788</link>
      <description>arXiv:2602.11788v2 Announce Type: replace 
Abstract: In this work, we introduce a new upper bound on the Hamming distance of simple-root constacyclic codes over finite fields, which we call the arithmetic Singleton bound. The main technical tool is the notion of a multiple equal-difference (MED) representation. Via the MED representations of the defining set of the generator polynomial of a simple-root constacyclic code, we obtain a family of upper bounds on its Hamming distance, among which the weakest one coincides with the Singleton bound, while the strongest one is defined to be the arithmetic Singleton bound for this code. Consequently, the arithmetic Singleton bound is always at least as strong as the classical Singleton bound, and is in fact strictly stronger in numerous nontrivial cases. The arithmetic Singleton bound partially measures the restriction on the Hamming distance of a simple-root constacyclic code imposed by its arithmetic structure. In particular, for an irreducible constacyclic code, the MED representations of the defining set of its generator polynomial are completely determined, via which the arithmetic Singleton bound is computed concretely. Finally, for any simple-root cyclic code, the arithmetic Singleton bound and the BCH bound are compared.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.11788v2</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Li Zhu, Hongfeng Wu</dc:creator>
    </item>
    <item>
      <title>Robot-DIFT: Correspondence-Sensitive Diffusion Features for Contact-Rich Robot Manipulation</title>
      <link>https://arxiv.org/abs/2602.11934</link>
      <description>arXiv:2602.11934v2 Announce Type: replace 
Abstract: Robot manipulation often fails in the final millimeters: a policy may recognize the right object yet miss the pose offsets, boundaries, or pre-contact alignments needed for action. We argue that such failures arise when semantic invariance suppresses correspondence cues for closed-loop control, or when these cues are not exposed to the policy in a usable form. Modern visual encoders provide strong semantic abstractions, but contact-rich manipulation requires correspondence sensitivity: discriminative feature responses to action-relevant changes in pose, boundary, and contact geometry. Diffusion features provide a strong prior for dense correspondence, but direct use is impractical due to stochasticity, latency, and representation drift. We introduce Robot-DIFT, a deterministic diffusion-derived backbone for real-time control. Through Manifold Distillation, Robot-DIFT converts a noise-conditioned diffusion Teacher into a clean-input, single-pass Student while preserving the teacher's feature manifold. A Spatial--Semantic Feature Pyramid Network (S2-FPN) fuses coarse-to-fine Student decoder features into visual tokens that expose semantic context and fine contact detail to the policy. Across RoboCasa, LIBERO-10, and real robots, Robot-DIFT outperforms vision--language, self-supervised, geometry-oriented, and diffusion baselines on contact-sensitive tasks. Controlled backbone/readout swaps show that S2-FPN unlocks, rather than replaces, the diffusion correspondence prior.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.11934v2</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yu Deng, Yufeng Jin, Xiaogang Jia, Jiahong Xue, Gerhard Neumann, Georgia Chalvatzaki</dc:creator>
    </item>
    <item>
      <title>On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage</title>
      <link>https://arxiv.org/abs/2602.12107</link>
      <description>arXiv:2602.12107v2 Announce Type: replace 
Abstract: We study offline reinforcement learning under $Q^\star$-approximation and partial coverage, a setting that motivates practical algorithms such as Conservative $Q$-Learning (CQL; Kumar et al., 2020) but has received limited theoretical attention. Our work is inspired by the following open question: "Are $Q^\star$-realizability and Bellman completeness sufficient for sample-efficient offline RL under partial coverage?"
  We answer in the negative via an information-theoretic lower bound. To identify additional structure that enables sample-efficient offline RL under partial coverage, we introduce a general decision-estimation framework, inspired by model-free decision-estimation coefficients (DEC) for online RL (Foster et al., 2023b; Liu et al., 2025b). Our framework decomposes offline RL complexity into decision complexity and value estimation error. This allows modular study of both sub-problems. Our result not only unifies existing results (Chen and Jiang, 2022; Uehara et al., 2023), but further improves and generalizes them. On the decision complexity side, our improvement includes: the first $\epsilon^{-2}$ sample complexity bound for soft $Q$-learning under partial coverage that improves Uehara et al.'s (2023) $\epsilon^{-4}$ bound, the removal of the need for additional online interaction in the value-gap setting of Chen and Jiang (2022), and new learnable settings beyond the above two cases. On the value estimation side, we provide a new characterization of the role of Bellman completeness under partial coverage, and the first characterization of offline learnability for general low-Bellman-rank MDPs (Jiang et al., 2017; Du et al., 2021; Jin et al., 2021). The latter is a canonical online RL setting that has remained unexplored in offline RL except for special cases. As a side contribution, our techniques give the first analysis of CQL in the function approximation setting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.12107v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haolin Liu, Braham Snyder, Chen-Yu Wei</dc:creator>
    </item>
    <item>
      <title>Towards Personalized Bangla Book Recommendation: A Large-Scale Heterogeneous Book Graph Dataset</title>
      <link>https://arxiv.org/abs/2602.12129</link>
      <description>arXiv:2602.12129v2 Announce Type: replace 
Abstract: Personalized book recommendation in Bangla literature has been constrained by the lack of structured, large-scale, and publicly available datasets. This work introduces RokomariBG, a large-scale heterogeneous book graph dataset designed to support research on personalized recommendation in a low-resource language setting. The dataset comprises 127,302 books, 63,723 users, 16,601 authors, 1,515 categories, 2,757 publishers, and 209,602 reviews, connected through several relation types and organized as a comprehensive knowledge graph. To demonstrate the utility of the dataset, we present a systematic benchmarking study on the top-N recommendation and sequential recommendation tasks, evaluating a diverse set of representative recommendation models. Through comprehensive benchmarking, we demonstrate that recommendation performance in this domain is strongly influenced by both heterogeneous relational information and code-mixed textual metadata. These findings reveal unique challenges of Bangladeshi e-commerce ecosystems that are largely absent from existing recommendation benchmarks. Overall, this work establishes a foundational benchmark and a publicly available resource for Bangla book recommendation research, enabling reproducible evaluation and future studies on recommendation in low-resource cultural domains. The dataset and code are publicly available at https://github.com/backlashblitz/Bangla-Book-Recommendation-Dataset</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.12129v2</guid>
      <category>cs.IR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Rahin Arefin Ahmed, Md. Anik Chowdhury, Sakil Ahmed Sheikh Reza, Devnil Bhattacharjee, Muhammad Abdullah Adnan, Julian McAuley, Nafis Sadeq</dc:creator>
    </item>
    <item>
      <title>6G Empowering Future Robotics: A Vision for Next-Generation Autonomous Systems</title>
      <link>https://arxiv.org/abs/2602.12246</link>
      <description>arXiv:2602.12246v2 Announce Type: replace 
Abstract: The convergence of robotics and next-generation communication is a critical driver of technological advancement. As the world transitions from 5G to 6G, the foundational capabilities of wireless networks are evolving to support increasingly complex and autonomous systems. We examine the transformative impact of 6G on enhancing key robotics functionalities. We provide a systematic mapping of IMT-2030 key performance indicators to robotic functional blocks, including sensing, perception, cognition, actuation, and self-learning. Building upon this mapping, we propose a high-level architectural framework integrating robotic, intelligent, and network service planes, underscoring the need for a holistic approach. As an example use case, we present a real-time, dynamic safety framework enabled by IMT-2030 capabilities for safe and efficient human-robot collaboration in shared spaces.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.12246v2</guid>
      <category>cs.NI</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Mona Ghassemian, Andrés Meseguer Valenzuela, Ana Garcia Armada, Dejan Vukobratovic, Periklis Chatzimisios, Kaspar Althoefer, Ranga Rao Venkatesha Prasad</dc:creator>
    </item>
    <item>
      <title>Empirical Modeling of Therapist-Client Dynamics in Psychotherapy Using LLM-Based Assessments</title>
      <link>https://arxiv.org/abs/2602.12450</link>
      <description>arXiv:2602.12450v2 Announce Type: replace 
Abstract: Psychotherapy is a primary treatment for many mental health conditions, yet the interplay among therapist behaviors, client responses, and the therapeutic relationship is difficult to study at scale, as process research has relied on labor-intensive human coding. We develop and validate a computational framework for modeling therapist-client interaction, using large language models (LLMs) to measure therapist behaviors (empathy, exploration), relational quality (rapport), and client outcomes (self-disclosure, self-directed and outward-directed negative emotion). After validating model-generated scores against human annotations (ICC = 0.45-0.81; rapport 0.81, self-disclosure 0.78), we apply these measures to roughly 2,000 hours of transcripts from the Alexander Street corpus and use Structural Equation Modeling to estimate moment-to-moment relationships among therapist behaviors, rapport, and subsequent client responses, controlling for prior client state and context. Therapist empathy and exploration directly predict increased client disclosure and shifts in emotional expression; empathy is more strongly associated with self-directed than outward-directed negative emotion, suggesting greater acknowledgment of internal distress, while exploration increases disclosure and emotional elaboration. Rapport does not directly amplify disclosure or emotional intensity but instead moderates the associations between therapist behaviors and client affect, potentially contributing to reductions in internal distress. These results show that LLM-based measurement combined with structural modeling can capture core therapeutic processes at scale, with empathy and exploration acting directly and rapport as a contextual moderator, providing a foundation for precision modeling of psychotherapy and for scalable therapist training and AI-supported clinical education.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.12450v2</guid>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Angela Chen, Siwei Jin, Canwen Wang, Holly Swartz, Tongshuang Wu, Robert E Kraut, Haiyi Zhu</dc:creator>
    </item>
    <item>
      <title>Know More, Know Clearer: A Meta-Cognitive Framework for Knowledge Augmentation in Large Language Models</title>
      <link>https://arxiv.org/abs/2602.12996</link>
      <description>arXiv:2602.12996v2 Announce Type: replace 
Abstract: Knowledge augmentation has significantly enhanced the performance of Large Language Models (LLMs) in knowledge-intensive tasks. However, existing methods typically operate on the simplistic premise that model performance equates with internal knowledge, overlooking the knowledge-confidence gaps that lead to overconfident errors or uncertain truths. To bridge this gap, we propose a novel meta-cognitive framework for reliable knowledge augmentation via differentiated intervention and alignment. Our approach leverages internal cognitive signals to partition the knowledge space into mastered, confused, and missing regions, guiding targeted knowledge expansion. Furthermore, we introduce a cognitive consistency mechanism to synchronize subjective certainty with objective accuracy, ensuring calibrated knowledge boundaries. Extensive experiments demonstrate that our framework consistently outperforms strong baselines, validating its rationality in not only enhancing knowledge capabilities but also fostering cognitive behaviors that better distinguish knowns from unknowns. All code is available at https://github.com/AI9Stars/Know-More-Know-Clearer.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.12996v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hao Chen, Ye He, Yuchun Fan, Yukun Yan, Zhenghao Liu, Qingfu Zhu, Maosong Sun, Wanxiang Che</dc:creator>
    </item>
    <item>
      <title>Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics</title>
      <link>https://arxiv.org/abs/2602.15253</link>
      <description>arXiv:2602.15253v2 Announce Type: replace 
Abstract: Neural scaling laws -- power-law relationships between loss, model size, and data -- have been extensively documented for language and vision transformers, yet their existence in single-cell genomics remains largely unexplored. We present the first systematic study of scaling behaviour for masked-reconstruction transformers trained on single-cell RNA sequencing (scRNA-seq) data. Using expression profiles from the CELLxGENE Census, we construct two experimental regimes: a data-rich regime (512 highly variable genes, 200,000 cells) and a data-limited regime (1,024 genes, 10,000 cells). Across seven model sizes spanning three orders of magnitude in parameter count (533 to 3.4 x 10^8 parameters), we fit the parametric scaling law to validation mean squared error (MSE). The data-rich regime exhibits clear power-law scaling with an irreducible loss floor of c ~ 1.44, while the data-limited regime shows negligible scaling, indicating that model capacity is not the binding constraint when data are scarce. These results establish that scaling laws analogous to those observed in natural language processing do emerge in single-cell transcriptomics when sufficient data are available, and they identify the data-to-parameter ratio as a critical determinant of scaling behaviour. A preliminary conversion of the data-rich asymptotic floor to information-theoretic units yields an estimate of approximately 2.30 bits of entropy per masked gene position. We discuss implications for the design of single-cell foundation models and outline the additional measurements needed to refine this entropy estimate.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.15253v2</guid>
      <category>cs.LG</category>
      <category>q-bio.GN</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ihor Kendiukhov</dc:creator>
    </item>
    <item>
      <title>Prescriptive Scaling Reveals the Evolution of Language Model Capabilities</title>
      <link>https://arxiv.org/abs/2602.15327</link>
      <description>arXiv:2602.15327v2 Announce Type: replace 
Abstract: Machine learning model performance improvements tend to arise from competition and application. For deployment, we consider prescriptive scaling laws: given a pre-training compute budget, what downstream accuracy is attainable with contemporary post-training practice, and how stable is that mapping as the field evolves? Using large-scale observational evaluations with 5k existing and 2k newly evaluated model checkpoints spanning 2022-2026 across six benchmarks, we estimate capability boundaries, high conditional quantiles of benchmark scores as a function of log pre-training FLOPs, via smoothed quantile regression with a monotone, saturating sigmoid parameterization. We validate temporal reliability by fitting on earlier model generations and evaluating on later releases: across four of six tasks, the out-of-distribution coverage error remains below 2%, while math reasoning exhibits a consistently advancing boundary over time. For instance, at a budget of 10^24 FLOPs, the estimated attainable accuracies are 0.83 on IFEval and 0.54 on MATH Lvl 5. We then extend our approach to analyze task-dependent saturation and to probe contamination-related shifts on math reasoning tasks. Finally, we introduce a balanced I-optimal sampling algorithm that recovers near-full-data frontiers using roughly 20% of the parameter-count-weighted evaluation budget, as low as 5% on some tasks, while maintaining comparable calibration. Together, our work releases Proteus-2k, the latest model performance evaluation dataset, and introduces a practical methodology for translating compute budgets into reliable performance expectations and for monitoring when capability boundaries shift across time.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.15327v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hanlin Zhang, Jikai Jin, Vasilis Syrgkanis, Sham Kakade</dc:creator>
    </item>
    <item>
      <title>Operationalising the Superficial Alignment Hypothesis via Task Complexity</title>
      <link>https://arxiv.org/abs/2602.15829</link>
      <description>arXiv:2602.15829v2 Announce Type: replace 
Abstract: The superficial alignment hypothesis (SAH) posits that large language models learn most of their knowledge during pre-training, and that post-training merely surfaces this knowledge. The SAH, however, lacks a precise definition, which has led to (i) different and seemingly orthogonal arguments supporting it, and (ii) important critiques of it. We propose a new metric called task complexity: the length of the shortest program that achieves a target performance on a task. In this framework, the SAH simply claims that pre-trained models drastically reduce the complexity of achieving high performance on many tasks. Our definition unifies prior arguments supporting the SAH, interpreting them as different strategies to find such short programs. Experimentally, we estimate the task complexity of mathematical reasoning, machine translation, and instruction following; we then show that these complexities can be remarkably low when conditioned on a pre-trained model. Further, we find that pre-training enables access to strong performance on our tasks, but it can require programs gigabytes in length to access it. Post-training, on the other hand, collapses the complexity of reaching this same performance by several orders of magnitude. Overall, our results highlight that task adaptation often requires surprisingly little information -- often just a few kilobytes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.15829v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tomás Vergara-Browne, Darshan Patil, Ivan Titov, Siva Reddy, Tiago Pimentel, Marius Mosbach</dc:creator>
    </item>
    <item>
      <title>Geometry-Aware Uncertainty Quantification via Conformal Prediction on Manifolds</title>
      <link>https://arxiv.org/abs/2602.16015</link>
      <description>arXiv:2602.16015v2 Announce Type: replace 
Abstract: Conformal prediction gives finite-sample coverage guarantees for regression, but most standard constructions are designed for Euclidean output spaces. When the response lies on a Riemannian manifold, Euclidean residuals and coordinate-based regions can ignore the geometry that defines meaningful error. We propose adaptive geodesic conformal prediction, a simple framework that builds nonconformity scores from geodesic distances and normalizes them with a cross-validated estimate of local prediction difficulty. On the sphere, this produces geodesic caps whose area is independent of position, while their radii still adapt to heteroscedastic noise. In both a synthetic sphere experiment and an IGRF-14 geomagnetic field forecasting task, the adaptive method preserves valid marginal coverage, reduces variation in conditional coverage, and improves worst-case coverage relative to non-adaptive and coordinate-based baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.16015v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Marzieh Amiri Shahbazi, Ali Baheri</dc:creator>
    </item>
    <item>
      <title>Amortized Predictability-aware Training Framework for Time Series Forecasting and Classification</title>
      <link>https://arxiv.org/abs/2602.16224</link>
      <description>arXiv:2602.16224v3 Announce Type: replace 
Abstract: Time series data are prone to noise in various domains, and training samples may contain low-predictability patterns that deviate from the normal data distribution, leading to training instability or convergence to poor local minima. Therefore, mitigating the adverse effects of low-predictability samples is crucial for time series analysis tasks such as time series forecasting (TSF) and time series classification (TSC). While many deep learning models have achieved promising performance, few consider how to identify and penalize low-predictability samples to improve model performance from the training perspective. To fill this gap, we propose a general Amortized Predictability-aware Training Framework (APTF) for both TSF and TSC. APTF introduces two key designs that enable the model to focus on high-predictability samples while still learning appropriately from low-predictability ones: (i) a Hierarchical Predictability-aware Loss (HPL) that dynamically identifies low-predictability samples and progressively expands their loss penalty as training evolves, and (ii) an amortization model that mitigates predictability estimation errors caused by model bias, further enhancing HPL's effectiveness. The code is available at https://github.com/Meteor-Stars/APTF.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.16224v3</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xu Zhang, Peng Wang, Yichen Li, Wei Wang</dc:creator>
    </item>
    <item>
      <title>Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents</title>
      <link>https://arxiv.org/abs/2602.16346</link>
      <description>arXiv:2602.16346v4 Announce Type: replace 
Abstract: LLM-based agents execute real-world workflows via tools and memory. These affordances enable ill-intended adversaries to also use these agents to carry out complex misuse scenarios. Existing agent misuse benchmarks largely test single-prompt instructions, leaving a gap in measuring how agents end up helping with harmful or illegal tasks over multiple turns. We introduce STING (Sequential Testing of Illicit N-step Goal execution), an automated red-teaming framework that constructs a step-by-step illicit plan grounded in a benign persona and iteratively probes a target agent with adaptive follow-ups, using judge agents to track phase completion. We further introduce an analysis framework that models multi-turn red-teaming as a time-to-first-jailbreak random variable, enabling analysis tools like discovery curves, hazard-ratio attribution by attack language, and a new metric: Restricted Mean Jailbreak Discovery. Across AgentHarm scenarios, STING yields substantially higher illicit-task completion than single-turn prompting and chat-oriented multi-turn baselines adapted to tool-using agents. In multilingual evaluations across six non-English settings, we find that attack success and illicit-task completion do not consistently increase in lower-resource languages, diverging from common chatbot findings. Overall, STING provides a practical way to evaluate and stress-test agent misuse in realistic deployment settings, where interactions are inherently multi-turn and often multilingual.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.16346v4</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nivya Talokar, Ayush K Tarun, Murari Mandal, Maksym Andriushchenko, Antoine Bosselut</dc:creator>
    </item>
    <item>
      <title>Web Agents Should Use Typed Actions Instead of Click-Based Browsing</title>
      <link>https://arxiv.org/abs/2602.17245</link>
      <description>arXiv:2602.17245v2 Announce Type: replace 
Abstract: This position paper argues that building a reliable agentic Web requires shifting from low-level interaction primitives to typed actions supported by a semantic layer. Today's web agents primarily operate through clicks, keystrokes, and DOM manipulation, which leads to brittle long-horizon behavior, high execution cost, and limited auditability. We propose web verbs as a concrete design for this layer. A verb exposes a web operation as a typed function with structured inputs, structured outputs, and documented behavior, whether it is backed by a server-side Web API or a maintained client-side workflow. Verb calls can carry preconditions, postconditions, policy tags, and logging hooks, allowing agents to synthesize concise programs with explicit control flow and data flow and to produce checkable execution traces. Using representative case studies, we illustrate how verb-level composition can produce correct, reproducible outcomes, while browser agents using low-level interaction primitives may produce brittle behavior or incorrect reasoning. We conclude with a call to action on standardization, developer tooling, and community processes needed to make this semantic layer deployable and trustworthy at web scale.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.17245v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Linxi Jiang, Rui Xi, Zhijie Liu, Shuo Chen, Zhiqiang Lin, Suman Nath</dc:creator>
    </item>
    <item>
      <title>Security of the Fischlin Transform in Quantum Random Oracle Model</title>
      <link>https://arxiv.org/abs/2602.17307</link>
      <description>arXiv:2602.17307v2 Announce Type: replace 
Abstract: The Fischlin transform yields non-interactive zero-knowledge proofs with straight-line extractability in the classical random oracle model. This is done by forcing a prover to generate multiple accepting transcripts through a proof-of-work mechanism. Whether the Fischlin transform is straight-line extractable against quantum adversaries has remained open due to the difficulty of reasoning about the likelihood of query transcripts in the quantum-accessible random oracle model (QROM), even when using the compressed oracle methodology. In this work, we prove that the Fischlin transform remains straight-line extractable in the QROM, via an extractor based on the compressed oracle. This establishes the post-quantum security of the Fischlin transform, providing a post-quantum straight-line extractable NIZK alternative to Pass' transform with smaller proof size. Our techniques include tail bounds for sums of independent random variables and for martingales as well as symmetrization, query amplitude and quantum union bound arguments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.17307v2</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Christian Majenz, Jaya Sharma</dc:creator>
    </item>
    <item>
      <title>Polaffini: A feature-based approach for robust affine and polyaffine image registration</title>
      <link>https://arxiv.org/abs/2602.17337</link>
      <description>arXiv:2602.17337v2 Announce Type: replace 
Abstract: In this work we present Polaffini, a robust and versatile framework for anatomically grounded registration. Medical image registration is dominated by intensity-based registration methods that rely on surrogate measures of alignment quality. In contrast, feature-based approaches that operate by identifying explicit anatomical correspondences, while more desirable in theory, have largely fallen out of favor due to the challenges of reliably extracting features. However, such challenges are now significantly overcome thanks to recent advances in deep learning, which provide pre-trained segmentation models capable of instantly delivering reliable, fine-grained anatomical delineations. We aim to demonstrate that these advances can be leveraged to create new anatomically-grounded image registration algorithms. To this end, we propose Polaffini, which obtains, from these segmented regions, anatomically grounded feature points with 1-to-1 correspondence in a particularly simple way: extracting their centroids. These enable efficient global and local affine matching via closed-form solutions. Those are used to produce an overall transformation ranging from affine to polyaffine with tunable smoothness. Polyaffine transformations can have many more degrees of freedom than affine ones allowing for finer alignment, and their embedding in the log-Euclidean framework ensures diffeomorphic properties. Polaffini has applications both for standalone registration and as pre-alignment for subsequent non-linear registration, and we evaluate it against popular intensity-based registration techniques. Results demonstrate that Polaffini outperforms competing methods in terms of structural alignment and provides improved initialisation for downstream non-linear registration. Polaffini is fast, robust, and accurate, making it particularly well-suited for integration into medical image processing pipelines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.17337v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Antoine Legouhy, Cosimo Campo, Ross Callaghan, Hojjat Azadbakht, Hui Zhang</dc:creator>
    </item>
    <item>
      <title>Condition-Gated Reasoning for Context-Dependent Biomedical Question Answering</title>
      <link>https://arxiv.org/abs/2602.17911</link>
      <description>arXiv:2602.17911v3 Announce Type: replace 
Abstract: Current biomedical question answering (QA) systems often assume that medical knowledge applies uniformly, yet real-world clinical reasoning is inherently conditional: nearly every decision depends on patient-specific factors such as comorbidities and contraindications. Existing benchmarks do not evaluate such conditional reasoning, and retrieval-augmented or graph-based methods lack explicit mechanisms to ensure that retrieved knowledge is applicable to the given context. To address this gap, we propose CondMedQA, the first benchmark for conditional biomedical QA, consisting of multi-hop questions whose answers vary with patient conditions. Furthermore, we propose Condition-Gated Reasoning (CGR), a novel framework that constructs condition-aware knowledge graphs and selectively activates or prunes reasoning paths based on query conditions. Our findings show that CGR more reliably selects condition-appropriate answers while matching or exceeding state-of-the-art performance on biomedical QA benchmarks, highlighting the importance of explicitly modeling conditionality for robust medical reasoning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.17911v3</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3770855.3818963</arxiv:DOI>
      <dc:creator>Jash Rajesh Parekh, Wonbin Kweon, Joey Chan, Rezarta Islamaj, Robert Leaman, Pengcheng Jiang, Chih-Hsuan Wei, Zhizheng Wang, Zhiyong Lu, Jiawei Han</dc:creator>
    </item>
    <item>
      <title>UAOR: Uncertainty-aware Observation Reinjection for Vision-Language-Action Models</title>
      <link>https://arxiv.org/abs/2602.18020</link>
      <description>arXiv:2602.18020v2 Announce Type: replace 
Abstract: Vision-Language-Action (VLA) models leverage pretrained Vision-Language Models (VLMs) as backbones to map images and instructions to actions, demonstrating remarkable potential for generalizable robotic manipulation. To enhance performance, existing methods often incorporate extra observation cues (e.g., depth maps, point clouds) or auxiliary modules (e.g., object detectors, encoders) to enable more precise and reliable task execution, yet these typically require costly data collection and additional training. Inspired by the finding that the Feed-Forward Network (FFN) in language models can act as "key-value memory", we propose Uncertainty-aware Observation Reinjection (UAOR), an effective, training-free and plug-and-play module for VLA models. Specifically, when the current language model layer exhibits high uncertainty, measured by Action Entropy, it reinjects key observation information into the next layer's FFN through attention retrieval. This mechanism directly augments the hidden states with observation evidence at high-uncertainty layers, enabling more accurate and reliable action generation. Comprehensive experiments show that our method consistently improves diverse VLA models across simulation and real-world tasks with minimal overhead. Notably, UAOR eliminates the need for additional observation cues or modules, making it a versatile and practical plug-in for existing VLA pipelines. The project page is at https://uaor.jiabingyang.cn.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.18020v2</guid>
      <category>cs.CV</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiabing Yang, Yixiang Chen, Yuan Xu, Peiyan Li, Zichen Wen, Bowen Fang, Tao Yu, Xiangnan Wu, Qisen Ma, Kai Wang, Ziheng He, Yingda Li, Zhengbo Zhang, Jing Liu, Nianfeng Liu, Yan Huang, Liang Wang</dc:creator>
    </item>
    <item>
      <title>Quantum Maximum Likelihood Prediction via Hilbert Space Embeddings</title>
      <link>https://arxiv.org/abs/2602.18364</link>
      <description>arXiv:2602.18364v2 Announce Type: replace 
Abstract: Maximum likelihood prediction (MLP) is a core task at the heart of modern large language models. Here, we study a quantum version of this task for a simplified data model consisting of independent and identically distributed samples, as a first step. The quantum maximum likelihood predictor is obtained by embedding empirical probability distributions into quantum states and minimizing the quantum relative entropy over a given class of states. We provide an interpretation of this predictor in terms of the quantum reverse information projection and the quantum Pythagorean theorem when the class of quantum models is sufficiently expressive. We further derive non-asymptotic performance guarantees in terms of convergence rates and concentration inequalities, both in trace norm and quantum relative entropy. Our approach provides a unified framework to handle MLP within both classical and quantum LLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.18364v2</guid>
      <category>cs.IT</category>
      <category>cs.LG</category>
      <category>math.IT</category>
      <category>quant-ph</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sreejith Sreekumar, Nir Weinberger</dc:creator>
    </item>
    <item>
      <title>Resilience of Task-Oriented V2X Networks to Incomplete Information Sharing</title>
      <link>https://arxiv.org/abs/2602.18620</link>
      <description>arXiv:2602.18620v2 Announce Type: replace 
Abstract: Task-oriented Vehicle-to-Everything (V2X) networks have recently been proposed to scalably support the large-scale deployment of connected vehicles. In task-oriented V2X networks, vehicles select the content of the transmitted messages based on its relevance to the intended receivers. However, estimating relevance can be challenging, especially in highly dynamic and complex driving scenarios. Relevance estimation errors may cause a transmitting vehicle to share incomplete information, omitting relevant data that is critical for the intended receivers' situational awareness. This work numerically demonstrates that task-oriented V2X networks exhibit an inherent resilience to incomplete information sharing. We show that such resilience guarantees a consistent delivery of relevant information even under high relevance estimation error probability conditions. Furthermore, we show that the fundamental conditions underpinning such inherent resilience can also be encountered outside of the V2X domain - in particular, in other task-oriented networks where multiple transmitters select the content of their messages based on the task-related requirements of a common set of intended receivers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.18620v2</guid>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Luca Lusvarghi, Javier Gozalvez</dc:creator>
    </item>
    <item>
      <title>Insertion Based Sequence Generation with Learnable Order Dynamics</title>
      <link>https://arxiv.org/abs/2602.18695</link>
      <description>arXiv:2602.18695v2 Announce Type: replace 
Abstract: Existing insertion-based masked diffusion models that generate sequences by interleaving token insertion with unmasking use fixed schedules that are not dependent on the data. For structured sequences like graphs and molecules, learning data-dependent generation orders can improve generation quality by reducing uncertainty over the action space. We propose LoFlexMDM, an insertion-based masked diffusion model with learnable order dynamics that learns data-dependent insertion and unmasking rates. We generalize the discrete flow matching framework to work with variable-length sequences, and propose a tractable schedule parameterization and a training objective for joint training of the generator and the target order dynamics. On de novo and fragment-constrained molecule generation, LoFlexMDM improves sample quality over FlexMDM by up to 17.5% and 6.7%, respectively. These results show that learning the target generation order can improve insertion-based diffusion models without giving up tractable training. We open source the code at https://github.com/dhruvdcoder/LoFlexMDM.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.18695v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dhruvesh Patel, Benjamin Rozonoyer, Gaurav Pandey, Tahira Naseem, Ramón Fernandez Astudillo, Andrew McCallum</dc:creator>
    </item>
    <item>
      <title>CTS-Bench: Benchmarking Graph Coarsening Trade-offs for GNNs in Clock Tree Synthesis</title>
      <link>https://arxiv.org/abs/2602.19330</link>
      <description>arXiv:2602.19330v2 Announce Type: replace 
Abstract: Graph Neural Networks (GNNs) are increasingly explored for physical design analysis in Electronic Design Automation, particularly for modeling Clock Tree Synthesis behavior such as clock skew and buffering complexity. However, practical deployment remains limited due to the prohibitive memory and runtime cost of operating on raw gate-level netlists. Graph coarsening is commonly used to improve scalability, yet its impact on CTS-critical learning objectives is not well characterized. This paper introduces CTS-Bench, a benchmark suite for systematically evaluating the trade-offs between graph coarsening, prediction accuracy, and computational efficiency in GNN-based CTS analysis. CTS-Bench consists of 4,860 converged physical design solutions spanning five architectures and provides paired raw gate-level and clustered graph representations derived from post-placement designs. Using clock skew prediction as a representative CTS task, we demonstrate a clear accuracy-efficiency trade-off. While graph coarsening reduces GPU memory usage by up to 17.2x and accelerates training by up to 3x, it also removes structural information essential for modeling clock distribution, frequently resulting in negative $R^2$ scores under zero-shot evaluation. Our findings indicate that generic graph clustering techniques can fundamentally compromise CTS learning objectives, even when global physical metrics remain unchanged. CTS-Bench enables principled evaluation of CTS-aware graph coarsening strategies, supports benchmarking of GNN architectures and accelerators under realistic physical design constraints, and provides a foundation for developing learning-assisted CTS analysis and optimization techniques.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.19330v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Barsat Khadka, Kawsher Roxy, Md Rubel Ahmed</dc:creator>
    </item>
    <item>
      <title>Fast Linear Reservoirs via Diagonalization</title>
      <link>https://arxiv.org/abs/2602.19802</link>
      <description>arXiv:2602.19802v3 Announce Type: replace 
Abstract: We introduce a diagonalization-based optimization for Linear Echo State Networks (ESNs) that reduces the per-step computational complexity of reservoir state updates from quadratic to linear. By reformulating reservoir dynamics in the eigenbasis of the recurrent matrix, the recurrent update becomes a set of independent element-wise operations, eliminating the matrix multiplication. We further propose three methods to use our optimization depending on the situation: (i) Eigenbasis Weight Transformation (EWT), which preserves the dynamics of standard and trained Linear ESNs, (ii) End-to-End Eigenbasis Training (EET), which directly optimizes readout weights in the transformed space, and (iii) Direct Parameter Generation (DPG), which bypasses matrix diagonalization by directly sampling eigenvalues and eigenvectors, achieving comparable performance to standard Linear ESNs. Across all experiments, our methods preserve predictive accuracy while offering significant computational speedups, making them a replacement for standard Linear ESN computation and training, and suggesting a paradigm shift in Linear ESNs towards the direct selection of eigenvalues.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.19802v3</guid>
      <category>cs.DC</category>
      <category>cs.NE</category>
      <category>math.CV</category>
      <category>math.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Romain de Coudenhove (Mnemosyne, DI-ENS), Yannis Bendi-Ouis (Mnemosyne), Anthony Strock (Mnemosyne), Xavier Hinaut (Mnemosyne)</dc:creator>
    </item>
    <item>
      <title>CAD-Prompted SAM3: Geometry-Conditioned Instance Segmentation for Industrial Objects</title>
      <link>https://arxiv.org/abs/2602.20551</link>
      <description>arXiv:2602.20551v3 Announce Type: replace 
Abstract: Verbal-prompted segmentation is inherently limited by the expressiveness of natural language and struggles with uncommon, instance-specific, or difficult-to-describe objects: scenarios frequently encountered in manufacturing and 3D printing environments. While image exemplars provide an alternative, they primarily encode appearance cues such as color and texture, which are often unrelated to a part's geometric identity. In industrial settings, a single component may be produced in different materials, finishes, or colors, making appearance-based prompting unreliable. In contrast, such objects are typically defined by precise CAD models that capture their canonical geometry. We propose a CAD-prompted segmentation framework built on SAM3 that uses canonical multi-view renderings of a CAD model as prompt input. The rendered views provide geometry-based conditioning independent of surface appearance. The model is trained using synthetic data generated from mesh renderings in simulation under diverse viewpoints and scene contexts. Our approach enables single-stage, CAD-prompted mask prediction, extending promptable segmentation to objects that cannot be robustly described by language or appearance alone.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.20551v3</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhenran Tang, Rohan Nagabhirava, Changliu Liu</dc:creator>
    </item>
    <item>
      <title>NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning</title>
      <link>https://arxiv.org/abs/2602.21172</link>
      <description>arXiv:2602.21172v3 Announce Type: replace 
Abstract: Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs face two expensive requirements: (1) massive dataset collection, and (2) dense reasoning annotations. In this work, we address both challenges with NORD (No Reasoning for Driving). Compared to existing VLAs, NORD achieves competitive performance while being fine-tuned on &lt;60% of the data and no reasoning annotations, resulting in 3x fewer tokens. We identify that standard Group Relative Policy Optimization (GRPO) fails to yield significant improvements when applied to policies trained on such small, reasoning-free datasets. We show that this limitation stems from difficulty bias, which disproportionately penalizes reward signals from scenarios that produce high-variance rollouts within GRPO. NORD overcomes this by incorporating Dr. GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, NORD achieves competitive performance on Waymo and NAVSIM with a fraction of the training data and no reasoning overhead, enabling more efficient autonomous systems. Website: https://nord-vla-ai.github.io/</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.21172v3</guid>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ishaan Rawal, Shubh Gupta, Yihan Hu, Wei Zhan</dc:creator>
    </item>
    <item>
      <title>Efficient Scaling of LLM Training with Flexible Context Parallelism</title>
      <link>https://arxiv.org/abs/2602.21788</link>
      <description>arXiv:2602.21788v2 Announce Type: replace 
Abstract: Scaling long-context capabilities is crucial for Large Language Models (LLMs). However, real-world data contain a large number of sequences with heterogeneous lengths. Existing training libraries for LLMs rely on static parallelism strategies, which suffer from severe load imbalance, redundant communication, and suboptimal hardware utilization under data heterogeneity. In this work, we propose Flexible Context Parallelism (FCP), an efficient parallelism strategy that adaptively reconfigures communication groups and context parallelism degrees during LLM training. We generalize more flexible non-power-of-two parallelism degrees and develop a polynomial-time algorithm to generate near-optimal parallelism strategies with only millisecond-level overhead per training batch. FCP is able to maintain high hardware efficiency even under extreme data heterogeneity. Experimental results demonstrate that FCP significantly outperforms Megatron-LM and DeepSpeed in both LLM and MLLM training, achieving up to 1.46x speedup in average throughput while maintaining near-linear scaling efficiency across large-scale clusters. For extremely unbalanced batches, FCP even achieves 2.24x speedup.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.21788v2</guid>
      <category>cs.DC</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yifan Niu, Han Xiao, Dongyi Liu, Wei Zhou, Jia Li</dc:creator>
    </item>
    <item>
      <title>2-Step Agent: A Framework for the Interaction of a Decision Maker with AI Decision Support</title>
      <link>https://arxiv.org/abs/2602.21889</link>
      <description>arXiv:2602.21889v3 Announce Type: replace 
Abstract: Predictions from ML models support human decision making in several fields, including high-stakes ones such as healthcare and the judiciary. Yet, we still lack a clear understanding of how decision makers learn from ML-based decision support (ML-DS). In this paper, we introduce a general computational framework, the 2-Step Agent, to capture this process. As a prediction from an ML model contains information about the training data, a prediction can also be used for inference. Our framework models (i) how a prediction for a new observation affects the beliefs of a rational Bayesian agent, and (ii) how this change in beliefs affects the estimation of causal effect, the downstream decision, and the subsequent outcome. In addition to the framework itself, we make three contributions. First, for the linear Gaussian setting, we derive a tractable solution for the challenging Bayesian inference problem we introduced, i.e. one in which the agent infers from an ML prediction. Second, we experimentally identify conditions under which ML-DS is beneficial. Third, we show that a single misaligned prior belief can be sufficient for ML-DS to lead to worse downstream outcomes compared to no decision support even when the ML model is well-specified and the agent is perfectly rational. Hence, even under ideal conditions, ML-DS can do more harm than good.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.21889v3</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Otto Nyberg, Fausto Carcassi, Davide Tugnoli, Giovanni Cinà</dc:creator>
    </item>
    <item>
      <title>Energy Efficient Federated Learning with Hyperdimensional Computing over Wireless Communication Networks</title>
      <link>https://arxiv.org/abs/2602.21949</link>
      <description>arXiv:2602.21949v2 Announce Type: replace 
Abstract: In this paper, we investigate a problem of minimizing total energy consumption for secure federated learning (FL) over wireless edge networks. To address the high computational cost and privacy challenges in conventional FL with neural networks (NN) for resource-constrained users, we propose a novel FL with hyperdimensional computing and differential privacy (FL-HDC-DP) framework. In the considered model, each edge user employs hyperdimensional computing (HDC) for local training, which replaces complex neural updates with simple hypervector operations, and applies differential privacy (DP) noise to protect transmitted model information. We optimize the total energy of computation and communication under both latency and privacy constraints. We formulate the problem as an optimization that minimizes the total energy of all users by jointly allocating HDC dimension, transmission time, system bandwidth, transmit power, and CPU frequency. To solve this problem, a sigmoid-variant function is proposed to characterize the relationship between the HDC dimension and the convergence rounds required to reach a target accuracy. Based on this model, we develop two alternating optimization algorithms, where closed-form expressions for time, frequency, bandwidth, and power allocations are derived at each iteration. Since the iterative algorithm requires a feasible initialization, we construct a feasibility problem and obtain feasible initial resource parameters by solving a per round transmission time minimization problem. Simulation results demonstrate that the proposed FL-HDC-DP framework achieves up to 83.3% total energy reduction compared with the baseline, while attaining about 90% accuracy in approximately 3.5X fewer communication rounds than the NN baseline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.21949v2</guid>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yahao Ding, Yinchao Yang, Jiaxiang Wang, Zhaohui Yang, Zhu Han, Mohammad Shikh-Bahaei</dc:creator>
    </item>
    <item>
      <title>SODA-CitrON: Static Object Data Association by Clustering Multi-Modal Sensor Detections Online</title>
      <link>https://arxiv.org/abs/2602.22243</link>
      <description>arXiv:2602.22243v3 Announce Type: replace 
Abstract: The online fusion and tracking of static objects from heterogeneous sensor detections is a fundamental problem in robotics, autonomous systems, and environmental mapping. Although classical data association approaches such as JPDA are well suited for dynamic targets, they are less effective for static objects observed intermittently and with heterogeneous uncertainties, where motion models provide minimal discriminative power with respect to clutter. In this paper, we propose a novel method for static object data association by clustering multi-modal sensor detections online (SODA-CitrON), while simultaneously estimating positions and maintaining persistent tracks for an unknown number of objects. The proposed unsupervised machine learning approach operates in a fully online manner and handles temporally uncorrelated and multi-sensor measurements. Additionally, it has a worst-case loglinear complexity in the number of sensor detections while providing full output explainability. We evaluate the proposed approach in different Monte Carlo simulation scenarios and compare it against state-of-the-art methods, including POM-based filtering, DBSTREAM clustering, and JPDA. The results demonstrate that SODA-CitrON consistently outperforms the compared methods in terms of F1 score, position RMSE, MOTP, and MOTA in the static object mapping scenarios studied.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.22243v3</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jan Nausner, Kilian Wohlleben, Michael Hubner</dc:creator>
    </item>
    <item>
      <title>CRAG: Can 3D Generative Models Help 3D Assembly?</title>
      <link>https://arxiv.org/abs/2602.22629</link>
      <description>arXiv:2602.22629v2 Announce Type: replace 
Abstract: Most existing 3D assembly methods treat the problem as pure pose estimation, rearranging observed parts via rigid transformations. In contrast, human assembly naturally couples structural reasoning with holistic shape inference. Inspired by this intuition, we reformulate 3D assembly as a joint problem of assembly and generation. We show that these two processes are mutually reinforcing: assembly provides part-level structural priors for generation, while generation injects holistic shape context that resolves ambiguities in assembly. Unlike prior methods that cannot synthesize missing geometry, we propose CRAG, which simultaneously generates plausible complete shapes and predicts poses for input parts. Extensive experiments demonstrate state-of-the-art performance across in-the-wild objects with diverse geometries, varying part counts, and missing pieces. Project Page: https://ai4ce.github.io/CRAG/</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.22629v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zeyu Jiang, Sihang Li, Siqi Tan, Chenyang Xu, Juexiao Zhang, Julia Galway-Witham, Xue Wang, Scott A. Williams, Radu Iovita, Chen Feng, Jing Zhang</dc:creator>
    </item>
    <item>
      <title>Imagination Helps Visual Reasoning, But Not Yet in Latent Space</title>
      <link>https://arxiv.org/abs/2602.22766</link>
      <description>arXiv:2602.22766v2 Announce Type: replace 
Abstract: Latent visual reasoning aims to mimic the human imagination process by meditating through the hidden states of Multimodal Large Language Models. While recognized as a promising paradigm for visual reasoning, the underlying mechanisms driving its effectiveness remain unclear. Motivated to demystify the true source of its efficacy, we investigate the validity of latent reasoning using Causal Mediation Analysis. We model the process as a causal chain: the input as the treatment, the latent tokens as the mediator, and the final answer as the outcome. Our findings uncover two critical disconnections: (a) Input-Latent Disconnect: dramatic perturbations on the input result in negligible changes to the latent tokens, suggesting that latent tokens do not effectively attend to the input sequence. (b) Latent-Answer Disconnect: perturbations on the latent tokens yield minimal impact on the final answer, indicating that latent tokens have a limited causal effect on the outcome. Furthermore, extensive probing analysis reveals that latent tokens encode limited visual information and exhibit high similarity. Consequently, we challenge the necessity of latent reasoning and propose a straightforward alternative named CapImagine, which teaches the model to explicitly imagine using text. Experiments on vision-centric benchmarks show that CapImagine significantly outperforms complex latent-space baselines, highlighting the superior potential of visual reasoning through explicit imagination.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.22766v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>You Li, Chi Chen, Yanghao Li, Fanhu Zeng, Kaiyu Huang, Jinan Xu, Maosong Sun</dc:creator>
    </item>
    <item>
      <title>Chain of Flow: ECG-Conditioned 4D Cardiac Cine Generation from Patient-Specific Anatomical Anchor</title>
      <link>https://arxiv.org/abs/2602.22919</link>
      <description>arXiv:2602.22919v2 Announce Type: replace 
Abstract: Cardiac cine magnetic resonance imaging (MRI) is central to functional cardiac assessment, yet a full current cine sequence may not always be directly available at the point of analysis. We introduce Chain of Flow (COF), an electrocardiography (ECG)-conditioned framework that combines patient-specific MRI and current ECG for subject-specific 4D cardiac cine generation. On the UK Biobank dataset, COF achieves strong image-level fidelity and downstream function-oriented performance on a shared same-visit evaluable benchmark. Multi-slice and multi-resolution analyses indicate stable structural generation quality across the short-axis stack and heterogeneous acquisition resolutions. Controlled phase-robustness analyses across resampled input MRI phases further provide same-visit proxy support for patient-specific MRI plus current ECG when a target MRI phase is not directly observed. A cross-visit route provides exploratory serial evidence, with the clearest gains in current-facing region-of-interest readout. Disease-category functional audits and case-level volume-trajectory evidence review further delineate where the current patient-specific MRI plus ECG formulation remains stable for anatomy-aware downstream cardiac analysis. Code is available at https://anonymous.4open.science/r/COF-paper-release-C88B.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.22919v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haofan Wu, Nay Aung, Theodoros N. Arvanitis, Joao A. C. Lima, Steffen E. Petersen, Le Zhang</dc:creator>
    </item>
    <item>
      <title>Scaling Search Relevance: Augmenting App Store Ranking with LLM-Generated Judgments</title>
      <link>https://arxiv.org/abs/2602.23234</link>
      <description>arXiv:2602.23234v5 Announce Type: replace 
Abstract: Large-scale commercial search systems optimize for relevance to drive successful sessions that help users find what they are looking for. To maximize relevance, we leverage two complementary objectives: behavioral relevance (results users tend to click or download) and textual relevance (a result's semantic fit to the query). A persistent challenge is the scarcity of expert-provided textual relevance labels relative to abundant behavioral relevance labels. We first address this by systematically evaluating LLM configurations, finding that a specialized, fine-tuned model significantly outperforms a much larger pre-trained one in providing highly relevant labels. Using this optimal model as a force multiplier, we generate millions of textual relevance labels to overcome the data scarcity. We show that augmenting our production ranker with these textual relevance labels leads to a significant outward shift of the Pareto frontier: offline NDCG improves for behavioral relevance while simultaneously increasing for textual relevance. These offline gains were validated by a worldwide A/B test on the App Store ranker, which demonstrated a statistically significant +0.24% increase in conversion rate, with the most substantial performance gains occurring in tail queries, where the new textual relevance labels provide a robust signal in the absence of reliable behavioral relevance labels.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.23234v5</guid>
      <category>cs.IR</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Evangelia Christakopoulou, Vivekkumar Patel, Hemanth Velaga, Sandip Gaikwad, Sean Suchter, Venkat Sundaranatha</dc:creator>
    </item>
    <item>
      <title>A Mixed Diet Makes DINO An Omnivorous Vision Encoder</title>
      <link>https://arxiv.org/abs/2602.24181</link>
      <description>arXiv:2602.24181v2 Announce Type: replace 
Abstract: Pre-trained vision encoders like DINOv2 have demonstrated exceptional performance on unimodal tasks. However, we observe that their features are poorly aligned across different visual modalities. For instance, the feature embeddings for an RGB image and its corresponding depth map of the same scene exhibit a cosine similarity that is nearly identical to that of two random, unrelated images. To address this, we propose the Omnivorous Vision Encoder, a post-training framework that learns a modality-agnostic feature space. We fine-tune the encoder with a dual objective: first, to maximize the feature alignment between different modalities of the same scene; and second, a distillation objective that anchors the learned representations to a fully frozen teacher. The resulting student encoder becomes "omnivorous" by producing more consistent embeddings for a given scene, regardless of the input modality (RGB, Depth, Segmentation, etc.). This approach enables robust cross-modal understanding while retaining the discriminative semantics of the original foundation model. Omnivorous model weights are available at https://github.com/google-deepmind/representations4d.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.24181v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Rishabh Kabra, Maks Ovsjanikov, Drew A. Hudson, Ye Xia, Skanda Koppula, Andre Araujo, Joao Carreira, Niloy J. Mitra</dc:creator>
    </item>
    <item>
      <title>Uncertainty-Aware Hierarchical Re-Localization in OpenStreetMap via Semantic Alignment</title>
      <link>https://arxiv.org/abs/2603.01613</link>
      <description>arXiv:2603.01613v2 Announce Type: replace 
Abstract: Monocular re-localization enables robots to estimate camera poses from visual observations. However, many existing methods rely on dense maps or large reference image databases, which face scalability limitations and privacy risks. OpenStreetMap (OSM), as a lightweight privacy-preserving map, offers semantic and geometric information with global scalability. Nonetheless, OSM localization remains challenging due to cross-modal discrepancies between natural images and OSM, as well as the high cost of global map-based localization. In this paper, we propose an uncertainty-aware hierarchical search framework with semantic alignment for localization in OSM. First, object-centric DINO-ViT tokens are exploited to reduce the semantic gap between ground-view observations and OSM vectors. Second, global dense matching is decomposed into coarse FFT correlation and uncertainty-controlled local refinement. Extensive experiments demonstrate that our method significantly improves localization accuracy and speed. When trained on a single dataset, the 3$^\circ$ orientation recall of our method even outperforms the 5$^\circ$ recall of state-of-the-art methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.01613v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuchen Zou, Xiao Hu, Lihuang Fang, Yuqing Tang</dc:creator>
    </item>
    <item>
      <title>From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG</title>
      <link>https://arxiv.org/abs/2603.03292</link>
      <description>arXiv:2603.03292v3 Announce Type: replace 
Abstract: Large Language Models (LLMs) exhibit high reasoning capacity in medical question-answering, but their tendency to produce hallucinations and outdated knowledge poses critical risks in healthcare fields. While Retrieval-Augmented Generation (RAG) mitigates these issues, existing methods rely on noisy token-level signals and lack the multi-round refinement required for complex reasoning. In this paper, we propose MA-RAG (Multi-Round Agentic RAG), a framework that facilitates test-time scaling for complex medical reasoning by iteratively evolving both external evidence and internal reasoning history within an agentic refinement loop. At each round, the agent transforms semantic conflict among candidate responses into actionable queries to retrieve external evidence, while optimizing historical reasoning traces to mitigate long-context degradation. MA-RAG extends the self-consistency principle by leveraging the lack of consistency as a proactive signal for multi-round agentic reasoning and retrieval, and mirrors a boosting mechanism that iteratively minimizes the residual error toward a stable, high-fidelity medical consensus. Extensive evaluations across 7 medical Q&amp;A benchmarks show that MA-RAG consistently surpasses competitive inference-time scaling and RAG baselines, delivering a substantial gain of +6.8 points in average accuracy over the backbone model. Our code is available at https://github.com/NJU-RL/MA-RAG.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.03292v3</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wenhao Wu, Zhentao Tang, Yafu Li, Shixiong Kai, Mingxuan Yuan, Zhenhong Sun, Chunlin Chen, Zhi Wang</dc:creator>
    </item>
    <item>
      <title>A Baseline Study and Benchmark for Few-Shot Open-Set Action Recognition with Feature Residual Discrimination</title>
      <link>https://arxiv.org/abs/2603.04125</link>
      <description>arXiv:2603.04125v2 Announce Type: replace 
Abstract: Few-Shot Action Recognition (FS-AR) has shown promising results but is often limited by a closed-set assumption that fails in real-world open-set scenarios. While Few-Shot Open-Set (FSOS) recognition is well-established for images, its extension to spatio-temporal video data remains underexplored. To address this, we propose an architectural extension based on a Feature-Residual Discriminator (FR-Disc), adapting previous work on skeletal data to the more complex video domain. Extensive experiments on five datasets demonstrate that while common open-set techniques provide only marginal gains, our FR-Disc significantly enhances unknown rejection capabilities without compromising closed-set accuracy, setting a new state-of-the-art for FSOS-AR. The project website, code, and benchmark are available at: https://hsp-iit.github.io/fsosar/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.04125v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Stefano Berti, Giulia Pasquale, Lorenzo Natale</dc:creator>
    </item>
    <item>
      <title>CodeTaste: Can LLMs Generate Human-Level Code Refactorings?</title>
      <link>https://arxiv.org/abs/2603.04177</link>
      <description>arXiv:2603.04177v2 Announce Type: replace 
Abstract: LLM coding agents can generate working code, but their solutions often accumulate complexity, duplication, and architectural debt. Human developers address such issues through refactoring: behavior-preserving program transformations that improve structure and maintainability. We investigate whether agents (i) can execute refactorings reliably and (ii) identify the refactorings that human developers actually chose in real codebases. To this end, we construct CodeTaste, a benchmark mined from large multi-file open-source refactorings. To score solutions, we combine repository test suites that measure functional correctness with tailored static checks that verify removal of undesired and introduction of desired code patterns using dataflow reasoning. Our results show a clear gap: agents perform well at implementing refactorings that are specified in detail, but often fail to discover the human refactoring choices when given a focus area for changes. A propose-then-implement decomposition improves alignment, and selecting the best-aligned proposal before implementation can yield further gains. CodeTaste provides an evaluation target and a potential preference signal for aligning coding agents with human refactoring decisions in realistic codebases. We release the benchmark, leaderboard, and code.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.04177v2</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alex Thillen, Niels Mündler, Veselin Raychev, Martin Vechev</dc:creator>
    </item>
    <item>
      <title>Generalizing Fair Top-$k$ Selection: An Integrative Approach</title>
      <link>https://arxiv.org/abs/2603.04689</link>
      <description>arXiv:2603.04689v3 Announce Type: replace 
Abstract: Fair top-$k$ selection, which ensures appropriate proportional representation of members from minority or historically disadvantaged groups among the top-$k$ selected candidates, has drawn significant attention. We study the problem of finding a fair (linear) scoring function with multiple protected groups while also minimizing the disparity from a reference scoring function. This generalizes the prior setup, which was restricted to the single-group setting without disparity minimization. Previous studies imply that the number of protected groups may have a limited impact on the runtime efficiency. However, driven by the need for experimental exploration, we find that this implication overlooks a critical issue that may affect the fairness of the outcome. Once this issue is properly considered, our hardness analysis shows that the problem may become computationally intractable even for a two-dimensional dataset and small values of $k$. However, our analysis also reveals a gap in the hardness barrier, enabling us to recover the efficiency for the case of small $k$ when the number of protected groups is sufficiently small. Furthermore, beyond measuring disparity as the "distance" between the fair and the reference scoring functions, we introduce an alternative disparity measure$\unicode{x2014}$utility loss$\unicode{x2014}$that may yield a more stable scoring function under small weight perturbations. Through careful engineering trade-offs that balance implementation complexity, robustness, and performance, our augmented two-pronged solution demonstrates strong empirical performance on real-world datasets, with experimental observations also informing algorithm design and implementation decisions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.04689v3</guid>
      <category>cs.DS</category>
      <category>cs.CC</category>
      <category>cs.CG</category>
      <category>cs.CY</category>
      <category>cs.DB</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Guangya Cai</dc:creator>
    </item>
    <item>
      <title>An explicit finite-memory scheme for approximating and sampling invariant measures of stochastic functional differential equations with infinite delay</title>
      <link>https://arxiv.org/abs/2603.04724</link>
      <description>arXiv:2603.04724v2 Announce Type: replace 
Abstract: Efficient sampling and numerical approximation of invariant probability measures (IPMs) on infinite-dimensional function spaces are important problems in scientific computing. In this paper, we study the numerical approximation and sampling of IPMs associated with stochastic functional differential equations with infinite delay (SFDEswID). To this end, we develop a fully explicit ergodicity-preserving truncated Euler-Maruyama scheme for SFDEswID that requires only finite historical storage and accommodates superlinearly growing coefficients. We establish strong convergence of the numerical segment process and show that it admits a unique IPM and is exponentially ergodic in the Wasserstein distance. Building on these results, we prove the convergence of the numerical IPM to the exact one and derive an explicit convergence rate. As a consequence, we obtain a quantitative long-time sampling error estimate of order $O\left(e^{-\lambda_\varepsilon t_n}+\Delta^{\rho_\varepsilon}\right)$. The results provide a rigorous and computationally efficient framework for sampling IPMs and quantifying long-time sampling errors for stochastic systems with infinite delay.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.04724v2</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Guozhen Li, Shan Huang, Xiaoyue Li, Xuerong Mao</dc:creator>
    </item>
    <item>
      <title>Focus Then Listen: An Empirical Study of Plug-and-Play Audio Enhancer for Noise-Robust Large Audio Language Models</title>
      <link>https://arxiv.org/abs/2603.04862</link>
      <description>arXiv:2603.04862v4 Announce Type: replace 
Abstract: Large audio language models (LALMs) are a class of foundation models for audio understanding. Existing LALMs tend to degrade significantly in real-world noisy acoustic conditions where speech and non-speech sounds interfere. While noise-aware fine-tuning can improve robustness, it requires task-specific noisy data and expensive retraining, limiting scalability. To address this issue, we propose Focus-Then-Listen (FTL), a plug-and-play audio enhancer that improves LALMs' noise robustness. Specifically, FTL first separates the input waveform into speech and non-speech, and a modality router is applied to predict the target audio modality (e.g., speech) based on the user's instruction. Finally, a modality-aware fusion block generates a task-adaptive enhanced signal for improved downstream perception and reasoning. Experiments across multiple LALMs and tasks show that FTL improves performance across different noise levels without fine-tuning on LALMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.04862v4</guid>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Han Yin, Yang Xiao, Younghoo Kwon, Ting Dang, Jung-Woo Choi</dc:creator>
    </item>
    <item>
      <title>The First Environmental Sound Deepfake Detection Challenge: Benchmarking Robustness, Evaluation, and Insights</title>
      <link>https://arxiv.org/abs/2603.04865</link>
      <description>arXiv:2603.04865v3 Announce Type: replace 
Abstract: Recent progress in audio generation has made it increasingly easy to create highly realistic environmental soundscapes, which can be misused to produce deceptive content, such as fake alarms, gunshots, and crowd sounds, raising concerns for public safety and trust. While deepfake detection for speech and singing voice has been extensively studied, environmental sound deepfake detection (ESDD) remains underexplored. To advance ESDD, the first edition of the ESDD challenge was launched, attracting 97 registered teams and receiving 1,748 valid submissions. This paper presents the task formulation, dataset construction, evaluation protocols, baseline systems, and key insights from the challenge results. Furthermore, we analyze common architectural choices and training strategies among top-performing systems. Finally, we discuss potential future research directions for ESDD, outlining key opportunities and open problems to guide subsequent studies in this field.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.04865v3</guid>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Han Yin, Yang Xiao, Rohan Kumar Das, Jisheng Bai, Ting Dang</dc:creator>
    </item>
    <item>
      <title>RepoLaunch: Automating Build and Management of Code Repositories across Languages and Platforms</title>
      <link>https://arxiv.org/abs/2603.05026</link>
      <description>arXiv:2603.05026v2 Announce Type: replace 
Abstract: Language model (LM) agents have driven substantial progress in automated software engineering (SWE), yet building and testing software repositories at scale remains a largely manual and labor-intensive bottleneck. In this work, we introduce RepoLaunch, a novel agentic framework that automatically resolves dependencies, compiles source code, and extracts test results across diverse programming languages and operating systems. RepoLaunch achieves a 78% build success rate, outperforming the Python/Linux-only prior system by 18%. To demonstrate its application, we further present a fully automated pipeline for SWE dataset creation driven by RepoLaunch, which only requires human input at the task-design stage. RepoLaunch is open-sourced, and its automated task-generation pipeline has already been adopted by several recent works on agentic benchmarking and training.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.05026v2</guid>
      <category>cs.SE</category>
      <category>cs.LG</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kenan Li, Rongzhi Li, Linghao Zhang, Qirui Jin, Liao Zhu, Xiaosong Huang, Geng Zhang, Yikai Zhang, Shilin He, Chengxing Xie, Xin Zhang, Zijian Jin, Bowen Li, Chaoyun Zhang, Yu Kang, Yufan Huang, Elsie Nallipogu, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang</dc:creator>
    </item>
    <item>
      <title>POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation</title>
      <link>https://arxiv.org/abs/2603.05500</link>
      <description>arXiv:2603.05500v2 Announce Type: replace 
Abstract: Efficient and stable training of large language models (LLMs) remains a core challenge in modern machine learning systems. To address this challenge, Reparameterized Orthogonal Equivalence Training (POET), a spectrum-preserving framework that optimizes each weight matrix through orthogonal equivalence transformation, has been proposed. Although POET provides strong training stability, its original implementation incurs high memory consumption and computational overhead due to intensive matrix multiplications. To overcome these limitations, we introduce POET-X, a scalable and memory-efficient variant that performs orthogonal equivalence transformations with significantly reduced computational cost. POET-X maintains the generalization and stability benefits of POET while achieving substantial improvements in throughput and memory efficiency. In our experiments, POET-X enables the pretraining of billion-parameter LLMs on a single Nvidia H100 GPU, and in contrast, standard optimizers such as AdamW run out of memory under the same settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.05500v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zeju Qiu, Lixin Liu, Adrian Weller, Han Shi, Weiyang Liu</dc:creator>
    </item>
    <item>
      <title>Few Tokens, Big Leverage: Preserving Safety Alignment by Constraining Safety Tokens during Fine-tuning</title>
      <link>https://arxiv.org/abs/2603.07445</link>
      <description>arXiv:2603.07445v3 Announce Type: replace 
Abstract: Large language models (LLMs) often require fine-tuning (FT) to perform well on downstream tasks, but FT can induce safety-alignment drift even when the training dataset contains only benign data. Prior work shows that introducing a small fraction of harmful data can substantially compromise LLM refusal behavior, causing LLMs to comply with harmful requests. Existing defense methods often rely on model-wide interventions, such as restricting which parameters are updated or injecting additional safety data, which can limit generality and degrade downstream task performance. To address these limitations, we propose a fine-tuning framework called Preserving Safety Alignment via Constrained Tokens (PACT), which stabilizes the model's confidence on safety tokens. Our approach is motivated by the empirical observation that safety-aligned behavior is reflected in the model's token-level output confidence and is often concentrated on a small subset of safety-related tokens. During downstream fine-tuning, we regularize the fine-tuned model to match the aligned reference model's confidence on safety-related tokens at each response step, while leaving non-safety tokens largely unconstrained to allow effective task adaptation. This targeted constraint prevents alignment drift without imposing global restrictions that typically trade off with model utility. Our code is available at https://github.com/Glresearch1/PACT.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.07445v3</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1145/3770855.3817837</arxiv:DOI>
      <dc:creator>Guoli Wang, Haonan Shi, Tu Ouyang, An Wang</dc:creator>
    </item>
    <item>
      <title>Integral Formulas for Vector Signal Tensor Products</title>
      <link>https://arxiv.org/abs/2603.08630</link>
      <description>arXiv:2603.08630v2 Announce Type: replace 
Abstract: We derive integral formulas that simplify the Vector Signal Tensor Product recently introduced by Xie et al., which generalizes the Gaunt tensor product to anti-symmetric couplings. In particular, we obtain explicit closed-form expressions for the anti-symmetric analogues of the Gaunt coefficients. This enables us to simulate the Clebsch-Gordan tensor product using a single Vector Signal Tensor Product, yielding up to a $9\times$ reduction in the required tensor product evaluations. Our results enable efficient and practical implementations of the Vector Signal Tensor Product, paving the way for applications of this generalization of Gaunt Tensor Products in $\mathrm{SO}(3)$-equivariant neural networks. Moreover, we discuss how the Gaunt and the Vector Signal Tensor Products allow one to control the expressivity-runtime tradeoff associated with the usual Clebsch-Gordan Tensor Products. Finally, we investigate low-rank decompositions of the normalizations of the considered tensor products in view of their use in equivariant neural networks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.08630v2</guid>
      <category>cs.LG</category>
      <category>physics.comp-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Valentin Heyraud, Zachary Weller-Davies, Jules Tilly</dc:creator>
    </item>
    <item>
      <title>Context Over Compute Human-in-the-Loop Outperforms Iterative Chain-of-Thought Prompting in Interview Answer Quality</title>
      <link>https://arxiv.org/abs/2603.09995</link>
      <description>arXiv:2603.09995v2 Announce Type: replace 
Abstract: Behavioral interview evaluation using large language models presents unique challenges that require structured assessment, realistic interviewer behavior simulation, and pedagogical value for candidate training. We investigate chain of thought prompting for interview answer evaluation and improvement through two controlled experiments with 50 behavioral interview question and answer pairs. Our contributions are threefold. First, we provide a quantitative comparison between human in the loop and automated chain of thought improvement. Using a within subject paired design with n equals 50, both approaches show positive rating improvements. The human in the loop approach provides significant training benefits. Confidence improves from 3.16 to 4.16 (p less than 0.001) and authenticity improves from 2.94 to 4.53 (p less than 0.001, Cohen's d is 3.21). The human in the loop method also requires five times fewer iterations (1.0 versus 5.0, p less than 0.001) and achieves full personal detail integration. Second, we analyze convergence behavior. Both methods converge rapidly with mean iterations below one, with the human in the loop approach achieving a 100 percent success rate compared to 84 percent for automated approaches among initially weak answers (Cohen's h is 0.82, large effect). Additional iterations provide diminishing returns, indicating that the primary limitation is context availability rather than computational resources. Third, we propose an adversarial challenging mechanism based on a negativity bias model, named bar raiser, to simulate realistic interviewer behavior, although quantitative validation remains future work. Our findings demonstrate that while chain of thought prompting provides a useful foundation for interview evaluation, domain specific enhancements and context aware approach selection are essential for realistic and pedagogically valuable results.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.09995v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kewen Zhu, Zixi Liu, Yanjing Li, Jing Chen</dc:creator>
    </item>
    <item>
      <title>Graph-GRPO: Training Graph Flow Models with Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2603.10395</link>
      <description>arXiv:2603.10395v2 Announce Type: replace 
Abstract: Graph generation is a fundamental task with broad applications, such as drug discovery. Recently, discrete flow matching-based graph generation, a.k.a. graph flow model (GFM), has emerged due to its superior performance and flexible sampling. However, effectively aligning GFMs with complex human preferences or task-specific objectives remains a significant challenge. In this paper, we propose Graph-GRPO, an online reinforcement learning (RL) framework for training GFMs under verifiable rewards. Our method makes two key contributions: (1) We derive an analytical expression for the transition probability of GFMs, replacing the Monte Carlo sampling and enabling fully differentiable rollouts for RL training; (2) We propose a refinement strategy that randomly perturbs specific nodes and edges in a graph, and regenerates them, allowing for localized exploration and self-improvement of generation quality. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness of Graph-GRPO. With only 50 denoising steps, our method achieves 95.0% and 97.5% Valid-Unique-Novelty scores on the planar and tree datasets, respectively. Moreover, Graph-GRPO achieves state-of-the-art performance on the molecular optimization tasks, outperforming graph-based and fragment-based RL methods as well as classic genetic algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.10395v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Baoheng Zhu, Deyu Bo, Delvin Ce Zhang, Xiao Wang</dc:creator>
    </item>
    <item>
      <title>A Machine Learning-Enhanced Hopf-Cole Formulation for Nonlinear Gas Flow in Porous Media</title>
      <link>https://arxiv.org/abs/2603.11250</link>
      <description>arXiv:2603.11250v2 Announce Type: replace 
Abstract: Accurate modeling of gas flow through porous media is critical for many technological applications, including reservoir performance prediction, carbon capture and sequestration, and fuel cells and batteries. However, such modeling remains challenging due to strong nonlinear behavior and uncertainty in model parameters. In particular, gas slippage effects described by the Klinkenberg model introduce pressure-dependent permeability, which complicates numerical simulation and obscures deviations from classical Darcy flow behavior. To address these challenges, we present an integrated modeling framework for gas transport in porous media that combines a Klinkenberg-enhanced constitutive relation, Hopf-Cole-transformed mixed-form linear governing equations, a shared-trunk neural network architecture, and a Deep Least-Squares (DeepLS) solver. The Hopf-Cole transformation reformulates the original nonlinear flow equations into an equivalent linear system closely related to the Darcy model, while the mixed formulation, together with a shared-trunk neural architecture, enables simultaneous and accurate prediction of both pressure and velocity fields. A rigorous convergence analysis is performed both theoretically and numerically, establishing the stability and convergence properties of the proposed solver. Importantly, the proposed framework also naturally facilitates inverse modeling of pressure-dependent permeability and slippage parameters from limited or indirect observations, enabling efficient estimation of flow properties that are difficult to measure experimentally. Numerical results demonstrate accurate recovery of flow dynamics and parameters across a wide range of pressure regimes, highlighting the framework's robustness, accuracy, and computational efficiency for gas transport modeling and inversion in tight formations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.11250v2</guid>
      <category>math.NA</category>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <category>physics.flu-dyn</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>V. S. Maduri, K. B. Nakshatrala</dc:creator>
    </item>
    <item>
      <title>PicoSAM3: Real-Time In-Sensor Region-of-Interest Segmentation</title>
      <link>https://arxiv.org/abs/2603.11917</link>
      <description>arXiv:2603.11917v2 Announce Type: replace 
Abstract: Real-time, on-device segmentation is critical for latency-sensitive and privacy-aware applications such as smart glasses and Internet-of-Things devices. We introduce PicoSAM3, a lightweight promptable visual segmentation model optimized for edge and in-sensor execution, including deployment on the Sony IMX500 vision sensor. PicoSAM3 has 1.3M parameters and combines a dense CNN architecture with region of interest prompt encoding, Efficient Channel Attention, and knowledge distillation from SAM2 and SAM3. On COCO and LVIS, PicoSAM3 achieves 65.45% and 64.01% mIoU, respectively, outperforming existing SAM-based and edge-oriented baselines at similar or lower complexity. The INT8 quantized model preserves accuracy with negligible degradation while enabling real-time in-sensor inference at 11.82ms latency on the IMX500, fully complying with its memory and operator constraints. Ablation studies show that distillation from large SAM models yields up to +14.5% mIoU improvement over supervised training and demonstrate that high-quality, spatially flexible promptable segmentation is feasible directly at the sensor level.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.11917v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pietro Bonazzi, Nicola Farronato, Stefan Zihlmann, Haotong Qin, Michele Magno</dc:creator>
    </item>
    <item>
      <title>CSE-UOI at SemEval-2026 Task 6: A Two-Stage Heterogeneous Ensemble with Deliberative Complexity Gating for Political Evasion Detection</title>
      <link>https://arxiv.org/abs/2603.12453</link>
      <description>arXiv:2603.12453v2 Announce Type: replace 
Abstract: This paper describes our system for SemEval-2026 Task 6, which classifies clarity of responses in political interviews into three categories: Clear Reply, Ambivalent, and Clear Non-Reply. We propose a heterogeneous dual large language model (LLM) ensemble via self-consistency (SC) and weighted voting, and a novel post-hoc correction mechanism, Deliberative Complexity Gating (DCG). This mechanism uses cross-model behavioral signals and exploits the finding that an LLM response-length proxy correlates strongly with sample ambiguity. To further examine mechanisms for improving ambiguity detection, we evaluated multi-agent debate as an alternative strategy for increasing deliberative capacity. Unlike DCG, which adaptively gates reasoning using cross-model behavioral signals, debate increases agent count without increasing model diversity. Our solution achieved a Macro-F1 score of 0.85 on the evaluation set, securing 3rd place and tying the second-best reported score.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.12453v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Christos Tzouvaras, Konstantinos Skianis, Athanasios Voulodimos</dc:creator>
    </item>
    <item>
      <title>RetroReasoner: A Reasoning LLM for Strategic Retrosynthesis Prediction</title>
      <link>https://arxiv.org/abs/2603.12666</link>
      <description>arXiv:2603.12666v2 Announce Type: replace 
Abstract: Retrosynthesis prediction aims to identify reactants that can synthesize a given product molecule. Although molecular large language models (LLMs) have recently shown promising results, most existing methods either generate reactants directly or provide only generic product-level analysis, without explicitly reasoning about bond-disconnection strategies that justify specific reactant choices. This paper proposes RetroReasoner, a retrosynthetic reasoning model that captures chemists' strategic disconnection-based thinking. RetroReasoner is trained with supervised fine-tuning and reinforcement learning. For supervised fine-tuning, SyntheticRetro generates structured disconnection rationales paired with reactant predictions. For reinforcement learning, a round-trip reward evaluates predicted reactants by passing them through a forward synthesis model and rewarding predictions that reconstruct the original product. RetroReasoner can also be applied to multi-step retrosynthetic planning by incorporating it into a parallelized Monte Carlo tree search framework, reducing search time while increasing the number and diversity of valid synthetic pathways. Experimental results show that RetroReasoner outperforms prior baselines, including not only molecular LLMs but also retrosynthesis-specific expert models, and generates a broader range of feasible reactant proposals, especially for challenging reaction instances.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.12666v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Hanbum Ko, Chanhui Lee, Ye Rin Kim, Rodrigo Hormazabal, Sehui Han, Sungbin Lim, Sungwoong Kim</dc:creator>
    </item>
    <item>
      <title>How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing</title>
      <link>https://arxiv.org/abs/2603.13259</link>
      <description>arXiv:2603.13259v2 Announce Type: replace 
Abstract: When a decoder-only transformer is forced to process matched correct and incorrect single-token continuations of a factual query, the two pathways through hidden-state space diverge in a specific way: displacement vectors from the query-only representation maintain approximately equal magnitude but rotate apart in direction. The angular separation grows through mid-depth, and late layers resolve the asymmetric outcome: a logit-lens preference that, in the incorrect run, falls far below the naive prior of equal probability, corresponding to the model assigning approximately 11.5 times more probability to the incorrect token than to the correct one. We characterize this two-phase pattern (rotational divergence in mid-depth followed by late-layer asymmetric commitment) as the empirical geometric signature of what looks externally like the model rejecting a wrong continuation, while remaining explicit that it is an observational characterization, not a causal account. The pattern is consistent across six decoder-only transformers including five architecture families from 1B to 13B parameters. A seventh model (Qwen2 1.5B) shows a flat profile under the present extraction protocol that is plausibly a tokenizer-fragmentation artefact rather than a real scale floor; the question of an emergence threshold is left open. Single-layer activation patching does not recover the correct token at any layer band, meaning the late-layer asymmetry is not localized to a discrete component under the protocol used. Taken together, the evidence is consistent with a distributed-by-trajectory account of factual constraint processing (geometric structure that emerges cumulatively across many layers rather than from a single localized circuit) and inconsistent with the simplest single-layer localized-recall account.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.13259v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Javier Marín</dc:creator>
    </item>
    <item>
      <title>CHIMERA-Bench: A Benchmark Dataset for Epitope-Specific Antibody Design</title>
      <link>https://arxiv.org/abs/2603.13431</link>
      <description>arXiv:2603.13431v3 Announce Type: replace 
Abstract: Computational antibody design has seen rapid methodological progress, with dozens of deep generative methods proposed in the past three years, yet the field lacks a standardized benchmark for fair comparison and model development. These methods are evaluated on different SAbDab snapshots, non-overlapping test sets, and incompatible metrics, and the literature fragments the design problem into numerous sub-tasks with no common definition. We introduce CHIMERA-Bench (CDR Modeling with Epitope-guided Redesign), a unified benchmark built around a single canonical task: epitope-conditioned CDR sequence-structure co-design. CHIMERA-Bench provides three components. The first is a curated, deduplicated dataset of 2,922 antibody-antigen complexes with epitope and paratope annotations. The second is a set of three biologically motivated splits that test generalization to unseen epitopes, unseen antigen folds, and prospective temporal targets. The third is a comprehensive evaluation protocol with five metric groups, including novel epitope-specificity measures. We benchmark eleven methods spanning six generative paradigms and report results across all splits. CHIMERA-Bench is the largest dataset of its kind for the antibody design problem, allowing the community to develop and test novel methods and evaluate their generalizability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.13431v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mansoor Ahmed, Nadeem Taj, Imdad Ullah Khan, Hemanth Venkateswara, Murray Patterson</dc:creator>
    </item>
    <item>
      <title>Toward Scalable Co-located Practical Learning: Assisting with Computer Vision and Multimodal Analytics</title>
      <link>https://arxiv.org/abs/2603.13679</link>
      <description>arXiv:2603.13679v2 Announce Type: replace 
Abstract: Co-located practical learning leaves evidence in visible actions around patients, task resources and room zones, but these traces are often recovered through live observation or retrospective video review. Fixed wide-angle video could reduce sensing burden, yet a debriefing pipeline must do more than detect behaviours: it must maintain detection after small camera-position shifts, relate the detector-derived behaviour trace to instructor-labelled outcomes and preserve room-zone context. This study evaluates a fixed-camera pipeline in repeated nursing simulation. Using a harmonised six-code taxonomy, we tested YOLO26 target-only training and two-stage source-to-target adaptation across two same-room side-view data sources. We then converted detections from 51 instructor-labelled sessions into one-second behaviour and behaviour-zone traces for rate, ordered-network, transition-network and sequence analyses.
  Two-stage adaptation improved mean mAP50 from 0.815 to 0.848 for the 2021 target view and from 0.690 to 0.855 for the smaller 2022 target view; with a balanced target quota of \(N = 22\), the 2022 model reached 0.850 mAP50. In the detector-derived behaviour trace analyses, higher phone use characterised low task-performance sessions. Zone labels changed the interpretation of patient interaction: primary patient-care-zone interaction was stronger in higher-performance sessions, while secondary-zone interaction was stronger in lower-performance sessions. Ordered and transition network models showed that ordered room-zone relations contributed beyond behaviour frequency, with the strongest task-performance classifier using zoned and co-presence features. The resulting trace is most appropriate for searchable simulation debriefing, where instructors inspect detected moments rather than receive automated assessment scores.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.13679v2</guid>
      <category>cs.HC</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xinyu Li, Linxuan Zhao, Yueqiao Jin, Yuchen Liu, Jin Zhou, Roberto Martinez-Maldonado, Dragan Gasevic, Lixiang Yan</dc:creator>
    </item>
    <item>
      <title>An Alternative Trajectory for Generative AI</title>
      <link>https://arxiv.org/abs/2603.14147</link>
      <description>arXiv:2603.14147v2 Announce Type: replace 
Abstract: The generative artificial intelligence (AI) ecosystem is undergoing rapid transformations that threaten its sustainability. As models transition from research prototypes to high-traffic products, the energetic burden has shifted from one-time training to recurring, unbounded inference. This is exacerbated by reasoning models that inflate compute costs by orders of magnitude per query. The prevailing pursuit of artificial general intelligence through scaling of monolithic models is colliding with hard physical constraints: grid failures, water consumption, and diminishing returns on data scaling. This trajectory yields models with impressive factual recall that nonetheless struggle in domains requiring in-depth reasoning, possibly due to insufficient abstractions in training data.
  Current large language models (LLMs) exhibit genuine reasoning depth only in domains like mathematics and coding, where rigorous, pre-existing abstractions provide structural grounding. In other fields, the current approach fails to generalize well. We propose an alternative trajectory based on domain-specific superintelligence (DSS). We argue for first constructing explicit symbolic abstractions (knowledge graphs, ontologies, and formal logic) to underpin synthetic curricula enabling small language models to master domain-specific reasoning without the model collapse problem typical of LLM-based synthetic data methods.
  Rather than a single generalist giant model, we envision "societies of DSS models": dynamic ecosystems where orchestration agents route tasks to distinct DSS back-ends. This paradigm shift decouples capability from size, enabling intelligence to migrate from energy-intensive data centers to secure, on-device experts. By aligning algorithmic progress with physical constraints, DSS societies move generative AI from an environmental liability to a sustainable force for economic empowerment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.14147v2</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Margarita Belova, Yuval Kansal, Yihao Liang, Jiaxin Xiao, Niraj K. Jha</dc:creator>
    </item>
    <item>
      <title>AgroOmni: A Large-Scale Multi-view Agricultural Dataset for Cross-Scale Multimodal Reasoning</title>
      <link>https://arxiv.org/abs/2603.14342</link>
      <description>arXiv:2603.14342v2 Announce Type: replace 
Abstract: Modern agricultural data is sourced from diverse platforms and spans multiple spatial scales, ranging from ground-level close-up photography to Unmanned Aerial Vehicle (UAV) aerial observation and satellite remote sensing imagery. Accordingly, agricultural multimodal reasoning demands robust cross-scale spatial understanding. However, due to the lack of multi-view agricultural benchmark datasets, existing multimodal large language models (MLLMs) exhibit severe ground-level bias, which leads to scale confusion and then semantic collapse in agricultural perception tasks, such as misinterpreting farmland imagery as walls or floors. To address this, we introduce AgroOmni, a large-scale multi-view training corpus with 288K Visual Question Answering pairs covering 56 specialized task categories across 14 task types, designed to capture diverse scales in modern precision agriculture. Built on this dataset, we propose AgroNVILA, which achieves a new state-of-the-art of 62.32% on the AgroMind benchmark (+15.03% over GPT-5.2), effectively mitigating the multi-view cross-scale gap for holistic agricultural understanding. Diagnostic evaluations on AgMMU further reveal an inherent heterogeneity between macro-priors and micro-diagnostics through constrained zero-shot performance. Meanwhile, even minimal fine-tuning leads to a dramatic performance gain of AgroNVILA on AgMMU, strongly demonstrating its generalization capability empowered by AgroOmni. Full training scripts are publicly available at https://anonymous.4open.science/r/AgroOmni-6510.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.14342v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiarui Zhang, Junqi Hu, Zurong Mai, Yang Liu, Yuhang Chen, Shuohong Lou, Henglian Huang, Hong Cheng, Lingyuan Zhao, Jianxi Huang, Yutong Lu, Haohuan Fu, Juepeng Zheng</dc:creator>
    </item>
    <item>
      <title>A Systematic Comparison and Evaluation of Building Ontologies for Deploying Data-Driven Analytics in Smart Buildings</title>
      <link>https://arxiv.org/abs/2603.14374</link>
      <description>arXiv:2603.14374v3 Announce Type: replace 
Abstract: Ontologies play a critical role in data exchange, information integration, and knowledge sharing across diverse smart building applications. Yet, semantic differences between the prevailing building ontologies hamper their purpose of bringing data interoperability and restrict the ability to reuse building ontologies in real-world applications. In this paper, we propose and adopt a framework to conduct a systematic comparison and evaluation of four popular building ontologies (Brick Schema, RealEstateCore, Project Haystack and Google's Digital Buildings) from both axiomatic design and assertions in a use case, namely the Terminological Box (TBox) evaluation and the Assertion Box (ABox) evaluation. In the TBox evaluation, we use the SQuaRE-based Ontology Quality Evaluation (OQuaRE) Framework and find that Project Haystack and Brick Schema are more compact with respect to the ontology axiomatic design. In the ABox evaluation, we apply an empirical study with sample building data that suggests that Brick Schema and RealEstateCore have greater completeness and expressiveness in capturing the main concepts and relations within the building domain. The results implicitly indicate that there is no universal building ontology for integrating Linked Building Data (LBD). We discuss ontology compatibility and investigate building ontology design patterns (ODPs) to support ontology matching, alignment, and harmonisation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.14374v3</guid>
      <category>cs.IR</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Zhangcheng Qiang, Stuart Hands, Kerry Taylor, Subbu Sethuvenkatraman, Daniel Hugo, Pouya Ghiasnezhad Omran, Madhawa Perera, Armin Haller</dc:creator>
    </item>
    <item>
      <title>Convex algebras on an interval with semicontinuous monotone operations</title>
      <link>https://arxiv.org/abs/2603.14955</link>
      <description>arXiv:2603.14955v2 Announce Type: replace 
Abstract: In recent work of Matteo Mio on compact quantitative equational theories (compact meaning that all consequences of the theory are derivable by means of finite proofs), convex algebras on the carrier set [0,1] whose operations are monotone and satisfy certain semicontinuity properties arise. We fully classify those algebraic structures by giving an explicit construction of all possible convex operations on [0,1] possessing the mentioned properties. Our result thus describes exactly the range of theories to which Mio's theorem applies.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.14955v2</guid>
      <category>cs.LO</category>
      <category>math.LO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Ana Sokolova, Harald Woracek</dc:creator>
    </item>
    <item>
      <title>On the Nonasymptotic Bounds of Joint Source-Channel Coding with Hierarchical Sources</title>
      <link>https://arxiv.org/abs/2603.15249</link>
      <description>arXiv:2603.15249v2 Announce Type: replace 
Abstract: This paper establishes tractable bounds of joint source-channel coding with hierarchical sources in the finite blocklength regime. In this setting, both the indirect source and observable source must be reconstructed under correlated distortion constraints, leading to a joint excess-distortion event. First, to build computable tight bounds, we introduce a novel $\mathsf{d}(\cdot)$-functional distortion relaxation, which enables tractable and tight bounding of the joint excess-distortion probability induced by correlated sources. By this approach, the nonasymptotic converse and achievability bounds are given. Second, Gaussian approximations for the proposed bounds are obtained, which are optimal for the transmission of a Gaussian memoryless source over an additive white Gaussian noise channel with mean-square error distortion. The optimal scheme is obtained via a structured analysis that captures the intrinsic tradeoff between semantic and observable reconstructions. Furthermore, for the transmission of Gaussian memoryless sources over AWGN channels, we obtain explicit and computable bounds, by providing a new geometric structure involving three correlated spherical regions. These results extend the classical two-spherical region analysis for a single distortion constraint. Numerical simulations demonstrate that the proposed achievability and converse bounds tightly sandwich the Gaussian approximation and align closely with Monte Carlo numerical results.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.15249v2</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shuo Shao, Chao Qi, Jincheng Dai, Wenrui Dai, Hongkai Xiong</dc:creator>
    </item>
    <item>
      <title>Become the Beast: Exploring Human-Quadruped Locomotion for Exergames</title>
      <link>https://arxiv.org/abs/2603.15428</link>
      <description>arXiv:2603.15428v2 Announce Type: replace 
Abstract: Embodying non-human characters and exercising abdominal muscles are both underexplored in exergames. We address this by describing the design and evaluation of a novel human quadruped locomotion exergame, Become the Beast. In the game, the player lies supine on the ground and moves their arms and legs to control a quadrupedal character (a tiger), similar to common bodyweight abdominal muscle exercises such as the Bicycle Crunch. The motion tracking is computer vision-based, utilizing a Kinect sensor placed above the player, which makes our approach suitable for commercial premises such as indoor activity parks where a system needs to run unattended and without any wearable components. Our system extends embodied interaction beyond traditional bipedal or controller-based systems, demonstrating how natural limb movements can generate responsive and immersive quadrupedal motion within virtual environments. We conducted a user study (N=15) and utilized Reflexive Thematic Analysis (RTA) to evaluate the system's intuitiveness, control, and overall player experience. The findings validate that natural body movements effectively control the avatar while delivering an intense core workout. Notably, gameplay immersion masked physical exertion, allowing rigorous core training to be primarily perceived as play.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.15428v2</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shamit Ahmed, Prabhav Bhatnagar, Perttu Hämäläinen</dc:creator>
    </item>
    <item>
      <title>IRAM-Omega-Q: A Computational Framework for Uncertainty Regulation in Adaptive Agents</title>
      <link>https://arxiv.org/abs/2603.16020</link>
      <description>arXiv:2603.16020v2 Announce Type: replace 
Abstract: Adaptive agents operating under uncertainty must do more than optimize task outputs: they must maintain a workable internal state under noise, perturbation, and changing conditions. This paper introduces IRAM-Omega-Q, a computational framework for modeling uncertainty regulation in adaptive agents under stochastic disturbance. The framework combines a quantum-like state representation with closed-loop adaptive control over an internal entropy signal. The quantum-like formalism is used instrumentally: the evolving state is a normalized complex amplitude vector, coherent evolution is propagated exactly as psi(t + Delta t) = exp(-i H Delta t) psi(t), and a derived density matrix supports entropy and coherence-gap analysis. Two causal control orderings are compared. In regulation-first (RF) ordering, adaptive regulation is available before current-cycle disturbance and attenuates incoming exposure; in disturbance-first (DF) ordering, current-cycle disturbance is received before a new regulatory response can be computed, and stabilization acts reactively. Publication-mode, matched-seed simulations show broadly comparable coherence-gap trajectories but lower sustained adaptive gain under RF. Susceptibility maps based on post-burn-in temporal fluctuations further show that DF shifts the critical initial-gain ridge toward larger values across multiple disturbance intervals. These results identify ordering as an architectural determinant of regulatory demand and threshold location within an otherwise shared regime structure.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.16020v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Veronique Ziegler</dc:creator>
    </item>
    <item>
      <title>A New Approach to Code Smoothing Bounds</title>
      <link>https://arxiv.org/abs/2603.18077</link>
      <description>arXiv:2603.18077v2 Announce Type: replace 
Abstract: Code smoothing is a phenomenon in which an error distribution makes a code statistically close to the uniform distribution over the ambient space. This closeness is measured by total variation distance. Recently, Debris-Alazard et al. introduced a smoothing bound, which is an upper bound on this total variation distance. Although the smoothing bound evaluates how the error distribution smooths a code, this bound applies only to linear codes. In this paper, we generalize this bound to not only linear codes but also specific non-linear codes. While the smoothing bound in previous work was obtained by Fourier analysis over finite abelian groups, we derive this bound using a graph-theoretic approach. To derive the smoothing bound, we consider code smoothing as the mixing of random walks on a specific graph, and use the concept of equitable partitions, which is well-studied in graph theory.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.18077v2</guid>
      <category>cs.IT</category>
      <category>cs.CR</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tsuyoshi Miezaki, Yusaku Nishimura, Katsuyuki Takashima</dc:creator>
    </item>
    <item>
      <title>Reflection in the Dark: Exposing and Escaping the Black Box in Reflective Prompt Optimization</title>
      <link>https://arxiv.org/abs/2603.18388</link>
      <description>arXiv:2603.18388v2 Announce Type: replace 
Abstract: Automatic prompt optimization (APO) has emerged as a powerful paradigm for improving LLM performance without manual prompt engineering. Reflective APO methods such as GEPA iteratively refine prompts by diagnosing failure cases, but the optimization process remains black-box and label-free, leading to uninterpretable trajectories and systematic failure. We identify and empirically demonstrate four limitations: on GSM8K with a defective seed, GEPA degrades accuracy from 23.81% to 13.50%. We propose VISTA, a multi-agent APO framework that decouples hypothesis generation from prompt rewriting, enabling semantically labeled hypotheses, parallel minibatch verification, and an interpretable optimization trace. A two-layer explore-exploit mechanism combining random restart and epsilon-greedy sampling further escapes local optima. VISTA recovers accuracy to 87.57% on the same defective seed and consistently outperforms baselines across all conditions on GSM8K and AIME2025.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.18388v2</guid>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Shiyan Liu, Qifeng Xia, Qiyun Xia, Yisheng Liu, Xinyu Yu, Rui Qu</dc:creator>
    </item>
    <item>
      <title>Heart Artifact Removal in Electrohysterography Measurements Using Algebraic Differentiators</title>
      <link>https://arxiv.org/abs/2603.18949</link>
      <description>arXiv:2603.18949v2 Announce Type: replace 
Abstract: Electrohysterography (EHG) enables non-invasive monitoring of uterine contractions but can be contaminated by electrocardiogram (ECG) artifacts. This work presents an ECG removal method using algebraic differentiators, a control-theoretic tool for model-free derivative estimation, that preserves signal shape outside the detected cardiac pulse locations. The differentiator parameters are designed to simultaneously suppress slow physiological artifacts and powerline interference. Cross-channel clustering distinguishes cardiac pulses from localized artifacts, enabling accurate pulse subtraction without auxiliary ECG references. Implemented as a causal FIR filter, the method is validated as a proof of concept on multichannel EHG recordings from one female and one male healthy volunteer and compared to the template subtraction method.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.18949v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Amine Othmane, Maria Camila Bustos Vivas, Johannes Steuer, Jana Hutter</dc:creator>
    </item>
    <item>
      <title>Sparse Autoencoders Reveal Interpretable and Steerable Features in VLA Models</title>
      <link>https://arxiv.org/abs/2603.19183</link>
      <description>arXiv:2603.19183v2 Announce Type: replace 
Abstract: Vision-Language-Action (VLA) models have emerged as a promising approach for general-purpose robot manipulation. However, little research has mechanistically explored when and why they generalize across objects, scenes, and instructions. To probe internal representations, we train Sparse Autoencoders (SAEs) on the VLA's hidden-layer activations. SAEs learn sparse dictionaries over model activations, often revealing features that correspond to interpretable directions in the model's representation space. We identify SAE features corresponding to motion primitives and semantic concepts, including features that are general across episodes and causally steerable. We propose a metric to categorize features as general transferable primitives or episode-specific memorizations, offering a promising glimpse towards VLA generalization. We validate these findings through steering experiments on both the LIBERO simulation benchmark and on real-world DROID hardware. We find that amplifying general and semantic features induces behaviors consistent with their meanings, whereas ablating them destroys model performance. Furthermore, we demonstrate steering as a way to control behavior in unpromptable directions. Together, these results provide mechanistic evidence that VLAs can learn reusable internal features linking perception, language, and action across tasks and scenes. Our project page is located at https://drvla.github.io</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.19183v2</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Aiden Swann, Lachlain McGranahan, Hugo Buurmeijer, Monroe Kennedy III, Mac Schwager</dc:creator>
    </item>
    <item>
      <title>SWARM+: Scalable and Resilient Multi-Agent Consensus for Decentralized Data-Aware Workload Management</title>
      <link>https://arxiv.org/abs/2603.19431</link>
      <description>arXiv:2603.19431v3 Announce Type: replace 
Abstract: Distributed scientific workflows are increasingly executed across heterogeneous and geo-distributed computing environments, where centralized workload orchestration becomes a scalability and resilience bottleneck. This paper presents SWARM+, a decentralized workload management system that coordinates workload placement through hierarchical multi-agent consensus, reducing coordination overhead and dramatically improving scalability, while tolerating failures and dynamic membership changes. SWARM+ enables data-aware scheduling policies that incorporate resource availability, data transfer node (DTN) connectivity, and data locality into workload placement decisions. We evaluate SWARM+ on the distributed FABRIC testbed using heterogeneous scientific workloads derived from production workflow traces obtained from the Pegasus Workflow Management System (WMS). Experimental results show that SWARM+ scales coordination to 990 distributed agents with approximately 1 s per-job selection time at 110 agents. SWARM+ demonstrates balanced workload distribution, maintains over 97% job completion under distributed failures with graceful degradation (mean ~95% job completion) during correlated site outages, tolerates coordinator agent failures gracefully, improves schedule quality by employing data-aware policies, and reduces both selection time and scheduling latency by 97-98% when compared to the prior SWARM system.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.19431v3</guid>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Komal Thareja, Krishnan Raghavan, Hamza Safri, Anirban Mandal, Ewa Deelman</dc:creator>
    </item>
    <item>
      <title>Back to Point: Exploring Point-Language Models for Zero-Shot 3D Anomaly Detection</title>
      <link>https://arxiv.org/abs/2603.21511</link>
      <description>arXiv:2603.21511v2 Announce Type: replace 
Abstract: Zero-shot (ZS) 3D anomaly detection is crucial for reliable industrial inspection, as it enables detecting and localizing defects without requiring any target-category training data. Existing approaches render 3D point clouds into 2D images and leverage pre-trained Vision-Language Models (VLMs) for anomaly detection. However, such strategies inevitably discard geometric details and exhibit limited sensitivity to local anomalies. In this paper, we revisit intrinsic 3D representations and explore the potential of pre-trained Point-Language Models (PLMs) for ZS 3D anomaly detection. We propose BTP (Back To Point), a novel framework that effectively aligns 3D point cloud and textual embeddings. Specifically, BTP aligns multi-granularity patch features with textual representations for localized anomaly detection, while incorporating geometric descriptors to enhance sensitivity to structural anomalies. Furthermore, we introduce a joint representation learning strategy that leverages auxiliary point cloud data to improve robustness and enrich anomaly semantics. Extensive experiments on Real3D-AD and Anomaly-ShapeNet demonstrate that BTP achieves superior performance in ZS 3D anomaly detection. Code will be available at https://github.com/wistful-8029/BTP-3DAD.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.21511v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kaiqiang Li, Gang Li, Mingle Zhou, Min Li, Delong Han, Jin Wan</dc:creator>
    </item>
    <item>
      <title>Counterfactual Credit Policy Optimization for Multi-Agent Collaboration</title>
      <link>https://arxiv.org/abs/2603.21563</link>
      <description>arXiv:2603.21563v4 Announce Type: replace 
Abstract: Collaborative multi-agent large language models (LLMs) can solve complex reasoning tasks by decomposing roles, but reinforcement learning for such systems is limited by credit assignment: shared terminal rewards obscure individual contributions and can encourage free-riding. We introduce Collaborative Credit Policy Optimization (CCPO), an optimizer-agnostic credit assignment layer that converts team-level outcomes into agent-specific learning signals. CCPO provides two complementary allocators. Counterfactual credit estimates an agent's marginal contribution by comparing the realized team outcome with a counterfactual outcome where that agent is removed. Verifier-anchored LLM self-evaluation is an exploratory allocator that uses constrained self- and peer-evaluations to redistribute credit while keeping the external verifier outcome dominant. The resulting role-specific rewards can be consumed by GRPO-style updates or other policy-gradient optimizers such as GSPO and REINFORCE++. We instantiate CCPO in a sequential Think--Solve setting and evaluate it on mathematical reasoning benchmarks. Results show that explicit credit assignment often improves dual-agent reasoning, especially on MATH500 and several out-of-distribution settings, while gains vary across models and datasets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.21563v4</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhongyi Li, Wan Tian, Yikun Ban, Jinju Chen, Huiming Zhang, Yang Liu, Fuzhen Zhuang</dc:creator>
    </item>
    <item>
      <title>Component Ablation for Efficient Hybrid Language Model Architectures: Performance, Resilience, and Compression Implications</title>
      <link>https://arxiv.org/abs/2603.22473</link>
      <description>arXiv:2603.22473v2 Announce Type: replace 
Abstract: Hybrid language models combine softmax attention with linear-time sequence mechanisms such as state-space or linear-attention layers, but the functional contribution of each component type remains insufficiently characterized. We study component-level ablation in two sub-1B hybrid language models, Qwen3.5-0.8B and Falcon-H1-0.5B, using likelihood-based evaluation, downstream benchmarks, layer-wise interventions, random controls, and representation-level diagnostics.
  Across the tested models, removing either attention or the alternative sequence-processing pathway substantially degrades performance, indicating that both component types contribute to model behavior. Likelihood metrics are especially sensitive to the linear-attention or state-space pathway, while downstream benchmark degradation depends on task and architecture. Layer-wise ablations show that component importance is position-dependent, with the strongest effects concentrated in early or mid-network components rather than uniformly across depth. Random-removal controls further show that hybrid architectures and same-family Transformer baselines degrade differently under structural perturbation.
  These results suggest that component ablation is a useful diagnostic for understanding hybrid language model architectures. The findings provide evidence relevant to efficient model design, compression, robustness analysis, and deployment decisions in architectures that combine attention with alternative sequence-processing mechanisms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.22473v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hector Borobia, Elies Seguí-Mas, Guillermina Tormo-Carbó</dc:creator>
    </item>
    <item>
      <title>Signals Are Not States: Neuro-Symbolic Safeguards for Culturally Aware Classroom AI</title>
      <link>https://arxiv.org/abs/2603.22793</link>
      <description>arXiv:2603.22793v2 Announce Type: replace 
Abstract: Classroom AI systems increasingly infer high-level educational states such as engagement, confusion, collaboration, participation, and instructional quality from multimodal and linguistic signals. In multicultural and multilingual classrooms, such inferences can translate culturally situated behavior into stereotyped claims: silence may be read as disengagement, gaze aversion as inattention, code-switching as low proficiency, or indirect help-seeking as confusion. We argue that stereotype-aware classroom AI should separate observable evidence from culturally loaded interpretation and should treat unsupported construct-level claims as safety risks. We introduce NSCR, a culturally grounded neuro-symbolic framework that converts video, audio, ASR, lesson artifacts, and contextual metadata into typed facts with uncertainty, provenance, and cultural scope, then composes them through executable reasoning and policy constraints. We define a taxonomy of stereotype-prone classroom inferences and propose a benchmark agenda covering culture-conditioned state inference, evidence-grounded claim verification, multilingual and code-switched reasoning, collaboration analysis, counterfactual cultural robustness, and culture-conditioned red-teaming. We further specify metrics for stereotype leakage, unsupported attribution, cultural calibration gaps, abstention under cultural ambiguity, and evidence faithfulness. The contribution is methodological: a concrete framework and evaluation agenda for mitigating stereotyped reasoning in classroom AI, with education as a high-stakes, culturally variable deployment setting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.22793v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sina Bagheri Nezhad</dc:creator>
    </item>
    <item>
      <title>LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load</title>
      <link>https://arxiv.org/abs/2603.23640</link>
      <description>arXiv:2603.23640v2 Announce Type: replace 
Abstract: Deploying large language models on-device for always-on personal agents demands sustained inference from hardware tightly constrained in power, thermal envelope, and memory. We benchmark Qwen 2.5 1.5B (4-bit quantised) across four platforms: a Raspberry Pi 5 with Hailo-10H NPU, a Samsung Galaxy S24 Ultra, an iPhone 16 Pro, and a laptop NVIDIA RTX 4050 GPU. Using a fixed 258-token prompt over 20 warm-condition iterations per device, we measure throughput, latency, power, and thermal behaviour. For mobile platforms, thermal management supersedes peak compute as the primary constraint: the iPhone 16 Pro loses nearly half its throughput within two iterations, and the S24 Ultra suffers a hard OS-enforced GPU frequency floor that terminates inference entirely. On dedicated hardware, distinct constraints dominate: the RTX 4050 is bounded by its battery power ceiling, while the Hailo-10H is limited by on-module memory bandwidth. The RTX 4050 sustains 131.7 tok/s at 34.1 W; the Hailo-10H sustains 6.9 tok/s at under 2 W with near-zero variance, matching the RTX 4050 in energy proportionality at 19x lower throughput. Results should be interpreted as platform-level deployment characterisations for a single model and prompt type, reflecting hardware and software combined, rather than general claims about hardware capability alone.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.23640v2</guid>
      <category>cs.DC</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Pranay Tummalapalli, Sahil Arayakandy, Ritam Pal, Kautuk Kundan</dc:creator>
    </item>
    <item>
      <title>DecepGPT: Schema-Driven Deception Detection with Multicultural Datasets and Robust Multimodal Learning</title>
      <link>https://arxiv.org/abs/2603.23916</link>
      <description>arXiv:2603.23916v3 Announce Type: replace 
Abstract: Multimodal deception detection aims to identify deceptive behavior by analyzing audiovisual cues for forensics and security. In these high-stakes settings, investigators need verifiable evidence connecting audiovisual cues to final decisions, along with reliable generalization across domains and cultural contexts. However, existing benchmarks provide only binary labels without intermediate reasoning cues. Datasets are also small with limited scenario coverage, leading to shortcut learning. We address these issues through three contributions. First, we construct reasoning datasets by augmenting existing benchmarks with structured cue-level descriptions and reasoning chains, enabling models to output auditable reports. Second, we release T4-Deception, a multicultural dataset based on the unified "To Tell The Truth" television format implemented across four countries. With 1695 samples, it is the largest non-laboratory deception detection dataset. Third, we propose two modules for robust learning under small-data conditions. Stabilized Individuality-Commonality Synergy (SICS) refines multimodal representations by synergizing learnable global priors with sample-adaptive residuals, followed by a polarity-aware adjustment that bi-directionally recalibrates representations. Distilled Modality Consistency (DMC) aligns modality-specific predictions with the fused multimodal predictions via knowledge distillation to prevent unimodal shortcut learning. Experiments on three established benchmarks and our novel dataset demonstrate that our method achieves state-of-the-art performance in both in-domain and cross-domain scenarios, while exhibiting superior transferability across diverse cultural contexts. The datasets and code will be released.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.23916v3</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiajian Huang, Dongliang Zhu, Zitong YU, Hui Ma, Jiayu Zhang, Chunmei Zhu, Xiaochun Cao</dc:creator>
    </item>
    <item>
      <title>Decentralized End-to-End Multi-AAV Pursuit Using Predictive Spatio-Temporal Observation via Deep Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2603.24238</link>
      <description>arXiv:2603.24238v2 Announce Type: replace 
Abstract: Decentralized cooperative pursuit in cluttered environments is challenging for autonomous aerial swarms, especially under partial and noisy perception. Existing methods often rely on abstracted geometric features or privileged ground-truth states, and therefore sidestep perceptual uncertainty in real-world settings. We propose a decentralized end-to-end multi-agent reinforcement learning (MARL) framework that maps raw LiDAR observations directly to continuous control commands. Central to the framework is the Predictive Spatio-Temporal Observation (PSTO), an egocentric grid representation that aligns obstacle geometry with predictive adversarial intent and teammate motion in a unified, fixed-resolution projection. Built on PSTO, a single decentralized policy enables agents to navigate static obstacles, intercept dynamic targets, and maintain cooperative encirclement. Simulations demonstrate that the proposed method achieves superior capture efficiency and competitive success rates compared to state-of-the-art learning-based approaches relying on privileged obstacle information. Furthermore, the unified policy scales seamlessly across different team sizes without retraining. Finally, fully autonomous outdoor experiments validate the framework on a quadrotor swarm relying on only onboard sensing and computing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.24238v2</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yude Li, Zhexuan Zhou, Huizhe Li, Yanke Sun, Yenan Wu, Yichen Lai, Yiming Wang, Youmin Gong, Jie Mei</dc:creator>
    </item>
    <item>
      <title>Causal Transfer in Medical Image Analysis</title>
      <link>https://arxiv.org/abs/2603.24388</link>
      <description>arXiv:2603.24388v2 Announce Type: replace 
Abstract: Medical imaging models frequently fail when deployed across hospitals, scanners, populations, or imaging protocols due to domain shift, limiting their clinical reliability. While transfer learning and domain adaptation address such shifts statistically, they often rely on spurious correlations that break under changing conditions. On the other hand, causal inference provides a principled way to identify invariant mechanisms that remain stable across environments. This survey introduces and systematises Causal Transfer Learning (CTL) for medical image analysis. This paradigm integrates causal reasoning with cross-domain representation learning to enable robust and generalisable clinical AI. We frame domain shift as a causal problem and analyse how structural causal models, invariant risk minimisation, and counterfactual reasoning can be embedded within transfer learning pipelines. We review studies spanning classification, segmentation, reconstruction, anomaly detection, and multimodal imaging, and organise them by task, shift type, and causal assumption. A unified taxonomy is proposed that connects causal frameworks and transfer mechanisms. We further summarise datasets, benchmarks, and empirical gains, highlighting when and why causal transfer outperforms correlation-based domain adaptation. Finally, we discuss how CTL supports fairness, robustness, and trustworthy deployment in multi-institutional and federated settings, and outline open challenges and research directions for clinically reliable medical imaging AI.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.24388v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mohammed M. Abdelsamea, Daniel Tweneboah Anyimadu, Tasneem Selim, Saif Alzubi, Lei Zhang, Ahmed Karam Eldaly, Xujiong Ye</dc:creator>
    </item>
    <item>
      <title>GraphER: An Efficient Graph-Based Enrichment and Reranking Method for Retrieval-Augmented Generation</title>
      <link>https://arxiv.org/abs/2603.24925</link>
      <description>arXiv:2603.24925v2 Announce Type: replace 
Abstract: Retrieval-augmented generation (RAG) systems that rely on semantic search often fail to retrieve the complete set of evidence for complex queries, particularly when information is distributed across multiple sources. Existing approaches either rely on iterative agentic retrieval, which can be inefficient, or maintain additional structures such as knowledge graphs, which introduce storage and maintenance overhead. In this paper, we propose GraphER, a graph-based enrichment and reranking framework that (1) leverages the organizational structure of data to capture proximity relationships beyond semantic similarity, (2) constructs a graph at query time based on these proximities, and (3) applies graph-based ranking to surface the top candidate documents. Experiments across table retrieval, multi-hop retrieval, and long-document retrieval benchmarks demonstrate consistent improvements in terms of retrieval completeness. Additionally, GraphER requires no additional graph infrastructure and integrates seamlessly with standard vector stores. The framework is retriever-agnostic, supports multiple forms of proximity, and introduces minimal query-time latency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.24925v2</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ruizhong Miao, Yuying Wang, Rongguang Wang, Chenyang Li, Tao Sheng, Sujith Ravi, Dan Roth</dc:creator>
    </item>
    <item>
      <title>Vision Hopfield Memory Networks for Image Recognition</title>
      <link>https://arxiv.org/abs/2603.25157</link>
      <description>arXiv:2603.25157v3 Announce Type: replace 
Abstract: Recent vision backbones, such as Transformer families and state-space models like Mamba, have achieved remarkable progress on image recognition. Despite their empirical success, these architectures remain far from the computational principles of the human brain, often demanding enormous amounts of training data while offering limited interpretability. We propose the Vision Hopfield Memory Network (V-HMN), a brain-inspired vision backbone that integrates hierarchical memory mechanisms across layers with iterative refinement updates. Specifically, V-HMN incorporates local Hopfield modules that provide associative memory dynamics at the image patch level, global Hopfield modules that function as episodic memory for contextual modulation, and a predictive-coding-inspired refinement rule for iterative error correction. By organizing these memory-based modules hierarchically, V-HMN captures both local and global dynamics in a unified framework. Memory retrieval exposes the relationship between inputs and stored patterns, providing a prototype-based form of interpretability through explicit memory retrieval, while the reuse of stored patterns improves data efficiency. This brain-inspired design therefore enhances data efficiency and provides a prototype-based form of interpretability compared to existing self-attention- or state-space-based approaches. We conducted extensive experiments on public image classification benchmarks. V-HMN achieves strong performance on small- and medium-scale benchmarks, and remains competitive with widely adopted backbone architectures on ImageNet despite minimal architectural tuning, while offering improved data efficiency and a prototype-based form of interpretability. These findings highlight the potential of V-HMN as a memory-centric alternative to standard vision backbones, thereby bridging brain-inspired computation with modern machine learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.25157v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jianfeng Wang, Amine M'Charrak, Luk Koska, Xiangtao Wang, Daniel Petriceanu, Ruizhi Wang, Michael Bumbar, Luca Pinchetti, Thomas Lukasiewicz</dc:creator>
    </item>
    <item>
      <title>Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model</title>
      <link>https://arxiv.org/abs/2603.25184</link>
      <description>arXiv:2603.25184v2 Announce Type: replace 
Abstract: Reinforcement learning (RL) has become essential for post-training large language models (LLMs) in reasoning tasks. While scaling rollouts can stabilize training and enhance performance, the computational overhead is a critical issue. In algorithms like GRPO, multiple rollouts per prompt incur prohibitive costs, as a large portion of prompts provide negligible gradients and are thus of low utility. To address this problem, we investigate how to select high-utility prompts before the rollout phase. Our experimental analysis reveals that sample utility is non-uniform and evolving: the strongest learning signals concentrate at the "learning edge", the intersection of intermediate difficulty and high uncertainty, which shifts as training proceeds. Motivated by this, we propose HIVE (History-Informed and online-VErified prompt selection), a dual-stage framework for data-efficient RL. HIVE utilizes historical reward trajectories for coarse selection and employs prompt entropy as a real-time proxy to prune instances with stale utility. By evaluating HIVE across multiple math reasoning benchmarks and models, we show that HIVE yields significant rollout efficiency without compromising performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.25184v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiahao Wu, Ning Lu, Shengcai Liu, Kun Wang, Yanting Yang, Bailong Lin, Chen Jason Zhang, Li Qing, Ke Tang</dc:creator>
    </item>
    <item>
      <title>AnyHand: A Large-Scale Synthetic Dataset for RGB(-D) Hand Pose Estimation</title>
      <link>https://arxiv.org/abs/2603.25726</link>
      <description>arXiv:2603.25726v3 Announce Type: replace 
Abstract: We present AnyHand, a large-scale synthetic dataset designed to advance the state of the art in 3D hand pose estimation. While recent works with foundation approaches have shown that scaling training data markedly improves hand pose estimation, existing real-world datasets are limited in coverage, and prior synthetic datasets rarely provide occlusions, arm details, and aligned depth together at scale. To address this bottleneck, our proposed AnyHand contains 2.5M single-hand and 4.1M hand-object interaction RGB-D images, with rich geometric annotations. We show that extending the original training data recipes of existing RGB baselines with AnyHand yields significant gains on multiple benchmarks (FreiHAND and HO-3D), even when keeping the architectures and training schemes fixed. Together with extensive ablations on the scale and composition of the training data setups, these results suggest that training data diversity and quality are as critical as scale for advancing hand pose estimation. We further examine the utility of AnyHand's aligned depth maps in the appendix, showing that scaling RGB-D supervision with AnyHand allows a lightweight depth-fusion variant of existing RGB baselines to outperform prior RGB-D methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.25726v3</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Chen Si, Yulin Liu, Bo Ai, Jianwen Xie, Rolandos Alexandros Potamias, Chuanxia Zheng, Hao Su</dc:creator>
    </item>
    <item>
      <title>A Camera-Native Talking-Head Video Dataset for Various Computer Vision Tasks</title>
      <link>https://arxiv.org/abs/2603.26763</link>
      <description>arXiv:2603.26763v2 Announce Type: replace 
Abstract: Talking-head videos constitute a predominant content type in real-time communication, yet publicly available datasets for video processing research in this domain remain scarce and limited in signal fidelity. In this paper, we open-source a camera-native dataset of 847 talking-head recordings (approximately 212 minutes), each 15s in duration, captured from 805 participants using 446 unique consumer webcam devices in their natural environments. All recordings are stored using the FFV1 lossless codec, preserving the camera-native signal -- uncompressed (24.4%) or MJPEG-encoded (75.6%) -- without additional lossy processing. Each recording is annotated with a Mean Opinion Score (MOS) and ten perceptual quality tokens that jointly explain 64.4% of the MOS variance. From this corpus, we curate a stratified benchmarking subset of 120 clips in three content conditions: original, background blur, and background replacement. Codec efficiency evaluation across four datasets and four codecs, namely H.264, H.265, H.266, and AV1, yields VMAF BD-rate savings up to $-71.3\%$ (H.266) relative to H.264, with significant encoder$\times$dataset ($\eta_p^2 = .112$) and encoder$\times$content condition ($\eta_p^2 = .149$) interactions, demonstrating that both content type and background processing affect compression efficiency. A preliminary super-resolution evaluation with four SR models confirms that the dataset significantly affects absolute performance while preserving model rankings, demonstrating applicability beyond codec benchmarking. The dataset offers 5$\times$ the scale of the largest prior talking-head webcam dataset (847 vs. 160 clips) with lossless signal fidelity, establishing a resource for benchmarking video compression, super-resolution, quality assessment, and enhancement models in real-time communication.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.26763v2</guid>
      <category>cs.CV</category>
      <category>cs.MM</category>
      <category>eess.IV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Babak Naderi, Ross Cutler, Nabakumar Singh Khongbantabam</dc:creator>
    </item>
    <item>
      <title>Fully Spiking Neural Networks with Target Awareness for Energy-Efficient UAV Tracking</title>
      <link>https://arxiv.org/abs/2603.27493</link>
      <description>arXiv:2603.27493v2 Announce Type: replace 
Abstract: Spiking Neural Networks (SNNs), characterized by their event-driven computation and low power consumption, have shown great potential for energy-efficient visual tracking on unmanned aerial vehicles (UAVs). However, existing SNN-based trackers often rely on costly event cameras, which limits their deployment on standard RGB-camera UAV platforms. To address this limitation, we propose STATrack, a fully spiking neural network framework for UAV visual tracking using only RGB inputs. To the best of our knowledge, this is the first study to explore fully spiking neural networks for RGB-based UAV visual tracking. To alleviate target semantic degradation caused by spike discretization and reduce background interference in UAV scenes, we introduce an Adaptive Mutual Information Maximization (AMIM) mechanism. AMIM maximizes the mutual information between template inputs and their deep target-aware features, encouraging the spiking backbone to preserve discriminative target semantics. In addition, a sample-difficulty-aware dynamic weighting strategy is designed to adaptively adjust the mutual information constraint during training. Extensive experiments on four widely used UAV tracking benchmarks demonstrate that STATrack achieves state-of-the-art tracking performance with low theoretical energy consumption, highlighting its potential for energy-constrained UAV applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.27493v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Pengzhi Zhong, Jiwei Mo, Dan Zeng, Feixiang He, Shuiwang Li</dc:creator>
    </item>
    <item>
      <title>A Unified Algebraic Framework for Subspace Pruning in Koopman Operator Approximation via Principal Vectors</title>
      <link>https://arxiv.org/abs/2603.29001</link>
      <description>arXiv:2603.29001v2 Announce Type: replace 
Abstract: Finite-dimensional approximations of the Koopman operator rely critically on identifying nearly invariant subspaces. This invariance proximity can be rigorously quantified via the principal angles between a candidate subspace and its image under the operator. To systematically minimize this error, we propose an algebraic framework for subspace pruning utilizing principal vectors. We establish the equivalence of this approach to existing consistency-based methods while providing a foundation for broader generalizations. To ensure scalability, we introduce an efficient numerical update scheme based on rank-one modifications, reducing the computational complexity of tracking principal angles by an order of magnitude. Finally, we demonstrate the effectiveness of our framework through numerical simulations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.29001v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dhruv Shah, Jorge Cortes</dc:creator>
    </item>
    <item>
      <title>Wherefore Art Thou? Provenance-Guided Automatic Online Debugging with Lumos</title>
      <link>https://arxiv.org/abs/2603.29013</link>
      <description>arXiv:2603.29013v3 Announce Type: replace 
Abstract: Debugging distributed systems in production is inevitable and hard. Myriad interactions between concurrent components in modern, complex and large-scale systems cause non-deterministic bugs that offline testing and verification fail to capture. When bugs surface at runtime, their root causes may be far removed from their symptoms. To identify a root cause, developers often need evidence scattered across multiple components and traces. Unfortunately, existing tools fail to quickly and automatically record useful provenance information at low overheads, leaving developers to manually perform the onerous evidence collection task. Lumos is an online debugging framework that exposes application-level bug provenances--the computational history linking symptoms of an incident to their root causes. Lumos leverages dependency-guided instrumentation powered by static analysis to identify program state related to a bug's provenance, and exposes it via lightweight on-demand recording. Lumos provides developers with enough evidence to identify a bug's root cause, while incurring low runtime overhead, and given only a few occurrences of a bug.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.29013v3</guid>
      <category>cs.SE</category>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Jingyuan Chen, Lei Zhang, Leon Schuermann, Gongqi Huang, Ravi Netravali, Amit Levy</dc:creator>
    </item>
    <item>
      <title>Stochastic Dimension Implicit Functional Projections for Global Integral Conservation in High-Dimensional PINNs</title>
      <link>https://arxiv.org/abs/2603.29237</link>
      <description>arXiv:2603.29237v2 Announce Type: replace 
Abstract: Enforcing prescribed global integral constraints in mesh-free neural PDE solvers is challenging in high-dimensional domains. Existing projection methods for spatial integrals are often tied to fixed grids or uniform quadrature, which can conflict with randomly sampled physics-informed neural networks (PINNs) and scale poorly with dimension. High-order differential operators also increase reverse-mode automatic differentiation memory costs. We propose Stochastic Dimension Implicit Functional Projection (SDIFP), a quadrature-level framework for enforcing prescribed first and second spatial moments. SDIFP replaces tensor-product nodal projection by a global affine correction of the neural-network output, with two scalar coefficients determined from a weighted quadrature rule. Under positive target variance and nonzero empirical raw variance, this correction is the nearest-point projection, in the weighted quadrature norm, onto the empirical two-moment constraint set. Thus, the prescribed moments are exact for the selected quadrature rule, while continuum errors are quadrature errors of the corrected field. For decomposable high-dimensional linear operators, SDIFP combines affine moment correction with stochastic operator-subset sampling. With independent residual and derivative sampling and conditionally unbiased coefficient-gradient estimation, the resulting estimator is unbiased for the specified quadrature-based residual objective; the shared-subset fast mode is biased in general. SDIFP avoids tensor-product quadrature for moment enforcement, separates forward quadrature evaluation from the reverse-mode graph, and retains pointwise inference efficiency once the affine coefficients are fixed or precomputed.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.29237v2</guid>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhangyong Liang, Huanhuan Gao</dc:creator>
    </item>
    <item>
      <title>All-in-One Augmented Reality Guided Head and Neck Tumor Resection</title>
      <link>https://arxiv.org/abs/2603.29495</link>
      <description>arXiv:2603.29495v2 Announce Type: replace 
Abstract: Positive margins are common in head and neck squamous cell carcinoma, yet intraoperative re-resection is often imprecise because margin locations are typically communicated verbally from pathology. We present an all-in-one augmented reality (AR) system that relocalizes positive margins from a resected specimen to the resection bed and visualizes them in situ using HoloLens 2 depth sensing and fully automated markerless surface registration. In a silicone phantom study with six medical trainees, markerless registration achieved target registration errors comparable to a marker-based baseline (median 1.8 mm vs. 1.7 mm; maximum &lt; 4 mm). In a margin relocalization task, AR guidance reduced error from verbal guidance (median 14.2 mm) to a few millimeters (median 3.2 mm), with all AR localizations within 5 mm error. These results support the feasibility of markerless AR margin guidance for more precise intraoperative re-excision.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.29495v2</guid>
      <category>cs.CV</category>
      <category>cs.ET</category>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yue Yang, Matthieu Chabanas, Carrie Reale, Annie Benson, Jason Slagle, Matthew Weinger, Michael Topf, Jie Ying Wu</dc:creator>
    </item>
    <item>
      <title>Curvature-Guided LoRA: Matching Full Fine-Tuning in Function Space</title>
      <link>https://arxiv.org/abs/2603.29824</link>
      <description>arXiv:2603.29824v2 Announce Type: replace 
Abstract: Parameter-efficient fine-tuning methods such as LoRA enable efficient adaptation of large pretrained models, but often lag behind full fine-tuning in both convergence speed and final performance. Recent approaches aim to reduce this gap by aligning LoRA parameter updates with those of full fine-tuning, but such parameter-space alignment only indirectly controls model predictions. Instead, we adopt a function-space perspective and formulate the prediction alignment problem, whose objective is to match the outputs of LoRA fine-tuning to those of full fine-tuning. We show that this objective naturally leads to a curvature-aware, second-order formulation, where optimal low-rank updates correspond to a Newton-like, curvature-whitened gradient. Based on this insight, we propose Curvature-Guided LoRA (CG-LoRA), an algorithm that selects adaptation directions using local curvature information. Our method is computationally efficient and avoids explicit second-order matrix construction. Experiments on standard natural language understanding benchmarks demonstrate improved performance and faster convergence compared to existing LoRA variants.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.29824v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Frédéric Zheng, Alexandre Proutière</dc:creator>
    </item>
    <item>
      <title>UnWeaving the knots of GraphRAG -- turns out VectorRAG is almost enough</title>
      <link>https://arxiv.org/abs/2603.29875</link>
      <description>arXiv:2603.29875v3 Announce Type: replace 
Abstract: One of the key problems in retrieval-augmented generation (RAG) systems is that chunk-based retrieval pipelines represent the source chunks as atomic objects, mixing the information contained within such a chunk into a single vector. These vector representations are then fundamentally treated as isolated, independent and self-sufficient, with no attempt to represent possible relations between them. Such an approach has no dedicated mechanisms for handling multi-hop questions. Graph-based RAG systems aim to ameliorate this problem by modeling information as knowledge graphs, with entities represented by nodes connected by robust relations and forming hierarchical communities. This approach, however, suffers from its own issues, among them orders-of-magnitude higher componential complexity needed to create graph-based indices and reliance on heuristics for performing retrieval. We propose UnWeaver, a novel RAG framework simplifying the idea of GraphRAG. UnWeaver uses an LLM to disentangle the contents of the documents into entities, which can occur across multiple chunks. In the retrieval process, entities serve as an intermediate step for recovering the original text chunks, hence preserving fidelity to the source material. We argue that entity-based decomposition yields a more distilled representation of the original information and additionally serves to reduce noise in the indexing and generation processes. Furthermore, we show experimentally that, on end-to-end QA evaluation, VectorRAG performs better than standard GraphRAG and almost as well as current SOTA graph-based solutions, at a fraction of the cost.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.29875v3</guid>
      <category>cs.IR</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.5281/zenodo.19203878</arxiv:DOI>
      <dc:creator>Ryszard Tuora, Mateusz Galiński, Michał Godziszewski, Michał Karpowicz, Mateusz Czyżnikiewicz, Adam Kozakiewicz, Tomasz Ziętkiewicz</dc:creator>
    </item>
    <item>
      <title>IDDM: Identity-Decoupled Personalized Diffusion Models with a Tunable Privacy-Utility Trade-off</title>
      <link>https://arxiv.org/abs/2604.00903</link>
      <description>arXiv:2604.00903v2 Announce Type: replace 
Abstract: Personalized text-to-image diffusion models (e.g., DreamBooth, LoRA) enable users to synthesize high-fidelity avatars from a few reference photos for social expression. However, once these generations are shared on social media platforms (e.g., Instagram, Facebook), they can be linked to the real user via face recognition systems, enabling identity tracking and profiling. Existing defenses mainly follow an anti-personalization strategy that protects publicly released reference photos by disrupting model fine-tuning. While effective against unauthorized personalization, they do not address another practical setting in which personalization is authorized, but the resulting public outputs still leak identity information.
  To address this problem, we introduce a new defense setting, termed model-side output immunization, whose goal is to produce a personalized model that supports authorized personalization while reducing the identity linkability of public generations, with tunable control over the privacy-utility trade-off to accommodate diverse privacy needs. To this end, we propose Identity-Decoupled personalized Diffusion Models (IDDM), a model-side defense that integrates identity decoupling into the personalization pipeline. Concretely, IDDM follows an alternating procedure that interleaves short personalization updates with identity-decoupled data optimization, using a two-stage schedule to balance identity linkability suppression and generation utility. Extensive experiments across multiple datasets, diverse prompts, and state-of-the-art face recognition systems show that IDDM consistently reduces identity linkability while preserving high-quality personalized generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.00903v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Linyan Dai, Xinwei Zhang, Haoyang Li, Qingqing Ye, Haibo Hu</dc:creator>
    </item>
    <item>
      <title>Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks</title>
      <link>https://arxiv.org/abs/2604.01039</link>
      <description>arXiv:2604.01039v2 Announce Type: replace 
Abstract: System Instructions in Large Language Models (LLMs) are commonly used to enforce safety policies, define agent behavior, and protect sensitive operational context in agentic AI applications. These instructions may contain sensitive information such as API credentials, internal policies, and privileged workflow definitions, making system instruction leakage a critical security risk highlighted in the OWASP Top 10 for LLM Applications. Without incurring the overhead costs of reasoning models, many LLM applications rely on refusal-based instructions that block direct requests for system instructions, implicitly assuming that prohibited information can only be extracted through explicit queries. We introduce an automated evaluation framework that tests whether system instructions remain confidential when extraction requests are re-framed as encoding or structured output tasks. Across four common models and 46 verified system instructions, we observe high attack success rates (&gt; 0.7) for structured serialization, where models refuse direct extraction requests but disclose protected content in the requested serialization formats. We further demonstrate a mitigation strategy based on one-shot instruction reshaping using a Chain-of-Thought reasoning model, indicating that even subtle changes in the wording and structure of system instructions can significantly reduce the attack success rate without requiring model retraining.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.01039v2</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Anubhab Sahu, Diptisha Samanta, Reza Soosahabi</dc:creator>
    </item>
    <item>
      <title>Koopman Subspace Pruning in Reproducing Kernel Hilbert Spaces via Principal Vectors</title>
      <link>https://arxiv.org/abs/2604.01459</link>
      <description>arXiv:2604.01459v2 Announce Type: replace 
Abstract: Data-driven approximations of the infinite-dimensional Koopman operator rely on finite-dimensional projections, where the predictive accuracy of the resulting models hinges heavily on the invariance of the chosen subspace. Subspace pruning systematically discards geometrically misaligned directions to enhance this invariance proximity, which formally corresponds to the largest principal angle between the subspace and its image under the operator. Yet, existing techniques are largely restricted to Euclidean settings. To bridge this gap, this paper presents an approach for computing principal angles and vectors to enable Koopman subspace pruning within a Reproducing Kernel Hilbert Space (RKHS) geometry. We first outline an exact computational routine, which is subsequently scaled for large datasets using randomized Nyström approximations. Based on these foundations, we introduce the Kernel-SPV and Approximate Kernel-SPV algorithms for targeted subspace refinement via principal vectors. Simulation results validate our approach.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.01459v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dhruv Shah, Jorge Cortes</dc:creator>
    </item>
    <item>
      <title>Swift-SVD: Theoretical Optimality Meets Practical Efficiency in Low-Rank LLM Compression</title>
      <link>https://arxiv.org/abs/2604.01609</link>
      <description>arXiv:2604.01609v2 Announce Type: replace 
Abstract: The deployment of Large Language Models is constrained by the memory and bandwidth demands of static weights and dynamic Key-Value cache. SVD-based compression provides a hardware-friendly solution to reduce these costs. However, existing methods suffer from two key limitations: some are suboptimal in reconstruction error, while others are theoretically optimal but practically inefficient. In this paper, we propose Swift-SVD, an activation-aware, closed-form compression framework that simultaneously guarantees theoretical optimum, practical efficiency and numerical stability. Swift-SVD incrementally aggregates covariance of output activations given a batch of inputs and performs a single eigenvalue decomposition after aggregation, enabling training-free, fast, and optimal layer-wise low-rank approximation. We employ effective rank to analyze local layer-wise compressibility and design a dynamic rank allocation strategy that jointly accounts for local reconstruction loss and end-to-end layer importance. Extensive experiments across six LLMs and eight datasets demonstrate that Swift-SVD outperforms state-of-the-art baselines, achieving optimal compression accuracy while delivering 3-70X speedups in end-to-end compression time. Our code is available at https://github.com/hiahei/Swift-SVD.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.01609v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ruoling Qi, Yirui Liu, Xuaner Wu, Xiangyu Wang, Ming Li, Chen Chen, Jian Chen, Yin Chen, Qizhen Weng</dc:creator>
    </item>
    <item>
      <title>COMPASS: Complete Multimodal Fusion via Proxy Tokens and Shared Spaces for Ubiquitous Sensing</title>
      <link>https://arxiv.org/abs/2604.02056</link>
      <description>arXiv:2604.02056v2 Announce Type: replace 
Abstract: Missing modalities in multimodal sensing cause not only information loss but also a fusion-interface mismatch: a fusion head trained on a canonical set of modality slots must operate on changing observed subsets at inference time. We propose Compass, an interface-complete fusion framework that restores this canonical slot structure before prediction. Each modality is assigned a fixed fusion slot. Observed modalities populate their slots with real representations, while absent modalities are filled with target-slot completion representations estimated from the observed sources. Multiple source-specific estimates for the same missing slot are aggregated into a single slot filler, allowing the same lightweight fusion operator to be applied under arbitrary missing-modality patterns. Training uses synthetic modality masking, slot-compatibility supervision, and representation-space stabilization to make completed slots compatible with real modality representations and useful for downstream recognition. Across XRF55, MM-Fi, and OctoNet, Compass improves robustness under diverse single- and multiple-missing settings, including controlled comparisons against imputation, distillation, and translation-style baselines. These results suggest that preserving the fusion interface is a simple and effective principle for robust multimodal sensing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.02056v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hao Wang, Yanyu Qian, Pengcheng Weng, Zixuan Xia, William Dan, Yangxin Xu, Fei Wang</dc:creator>
    </item>
    <item>
      <title>Cognitive Comparability and the Limits of Governance: Evaluating Authority Under Radical Capability Asymmetry</title>
      <link>https://arxiv.org/abs/2604.02720</link>
      <description>arXiv:2604.02720v3 Announce Type: replace 
Abstract: Governance theory presupposes a rough cognitive comparability between governors and governed. This paper makes that assumption explicit and testable through a six-dimension evaluation framework covering legitimacy, accountability, corrigibility, non-domination, subsidiarity, and institutional resilience, drawn from political legitimacy theory, principal-agent models, republican theory, and the AI alignment literature. The framework is first demonstrated on existing non-majoritarian institutions, where capability asymmetry is real but bounded, and then applied to a prospective case of bounded superintelligent authority, where the asymmetry is radical. Four of six dimensions show structural failures. Two of the four appear tractable to institutional design (subsidiarity scope limitation and institutional resilience). The other two, the public reason problem under cognitive incomprehensibility and the non-domination problem under permanent capability asymmetry, call for new normative theory rather than better institutional design. The analysis also finds that dimensions which operate as independent checks under bounded asymmetry begin to degrade together under radical asymmetry, because each depends on the same oversight capacity. The assumptions that allowed these checks to remain independent have gone unexamined so far because they have always held.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.02720v3</guid>
      <category>cs.CY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tony Rost</dc:creator>
    </item>
    <item>
      <title>MC-CPO: Mastery-Conditioned Constrained Policy Optimization for Pedagogically Safe Intelligent Tutoring Systems</title>
      <link>https://arxiv.org/abs/2604.04251</link>
      <description>arXiv:2604.04251v2 Announce Type: replace 
Abstract: Intelligent tutoring systems increasingly rely on reinforcement learning to personalise instruction, yet optimising for observable engagement signals can systematically decouple learner activity from genuine knowledge acquisition. Analysing over 21 million student interactions across two deployed platforms, we find engagement events without corresponding mastery gains occur in 26.5% of interactions on Junyi Academy (72,758 students) and 3.1% on XES3G5M (14,453 students, NeurIPS 2023), confirming this pattern is directly observable in deployed educational technology at scale. We introduce Mastery-Conditioned Constrained Policy Optimisation (MC-CPO), a reinforcement learning framework that addresses this problem structurally. MC-CPO conditions the admissible instructional action space on learner mastery state: a concept becomes available only when prerequisite knowledge meets a mastery threshold, yielding an action space that expands naturally as learners acquire knowledge. Pedagogical safety constraints are enforced by construction, with formal guarantees of structural prerequisite safety, primal-dual convergence, and strict dominance over post-hoc filtering. MC-CPO is the only method to reduce reward hacking severity across all conditions. Mean per-episode mastery gain increases by 18.3% on Junyi Academy and 54.0% on XES3G5M relative to all baselines, while competitive engagement performance is maintained. These results support structural constraint modelling as a principled foundation for safer adaptive instructional policies in deployed tutoring systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.04251v2</guid>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Oluseyi Olukola, Nick Rahimi</dc:creator>
    </item>
    <item>
      <title>Bounded by Risk, Not Capability: Quantifying AI Occupational Substitution Rates via a Tech-Risk Dual-Factor Model</title>
      <link>https://arxiv.org/abs/2604.04464</link>
      <description>arXiv:2604.04464v2 Announce Type: replace 
Abstract: The deployment of Large Language Models (LLMs) has ignited concerns about technological unemployment. Existing task-based evaluations predominantly measure theoretical "exposure" to AI capabilities, ignoring critical frictions of real-world commercial adoption: liability, compliance, and physical safety. We argue occupations are not eradicated instantaneously, but gradually encroached upon via atomic actions. We introduce a Tech-Risk Dual-Factor Model to re-evaluate this. By deconstructing 923 occupations into 2,087 Detailed Work Activities (DWAs), we utilize a multi-agent LLM ensemble to score both technical feasibility and business risk. Through variance-based Human-in-the-Loop (HITL) validation with an expert panel, we demonstrate a profound cognitive gap: isolated algorithmic probabilities fail to encapsulate the "institutional premium" imposed by experts bounded by professional liability. Applying a strictly algorithmic baseline via mathematical bottleneck aggregation, we calculate Relative Occupational Automation Indices ($OAI$) for the U.S. labor market. Our findings challenge the traditional Routine-Biased Technological Change (RBTC) hypothesis. Non-routine cognitive roles highly dependent on symbolic manipulation (e.g., Data Scientists) face unprecedented exposure ($OAI \approx 0.70$). Conversely, unstructured physical trades and high-stakes caretaking roles exhibit absolute resilience, quantifying a profound "Cognitive Risk Asymmetry." We hypothesize the emergent necessity of a "Compliance Premium," indicating wage resilience increasingly tied to risk-absorption capacity. We frame these findings as a cross-sectional diagnostic of systemic vulnerability, establishing a foundation for subsequent Computable General Equilibrium (CGE) econometric modeling involving dynamic wage elasticity and structural labor reallocation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.04464v2</guid>
      <category>cs.CY</category>
      <category>econ.GN</category>
      <category>q-fin.EC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shuyao Gao (aSSIST University, Seoul, South Korea), Minghao Huang (aSSIST University, Seoul, South Korea)</dc:creator>
    </item>
    <item>
      <title>Relational Epipolar Graphs for Robust Relative Camera Pose Estimation</title>
      <link>https://arxiv.org/abs/2604.04554</link>
      <description>arXiv:2604.04554v2 Announce Type: replace 
Abstract: A key component of Visual Simultaneous Localization and Mapping (VSLAM) is estimating relative camera poses using matched keypoints. Accurate estimation is challenged by noisy correspondences. Classical methods rely on stochastic hypothesis sampling and iterative estimation, while learning-based methods often lack explicit geometric structure. In this work, we reformulate relative pose estimation as a relational inference problem over epipolar correspondence graphs, where matched keypoints are nodes and nearby ones are connected by edges. Graph operations such as pruning, message passing, and pooling estimate a quaternion rotation, translation vector, and the Essential Matrix (EM). Minimizing a loss comprising (i) $\mathcal{L}_2$ differences with ground truth (GT), (ii) Frobenius norm between estimated and GT EMs, (iii) singular value differences, (iv) heading angle differences, and (v) scale differences, yields the relative pose between image pairs. The dense detector-free method LoFTR is used for matching. Experiments on indoor and outdoor benchmarks show improved robustness to dense noise and large baseline variation compared to classical and learning-guided approaches, highlighting the effectiveness of global relational consensus.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.04554v2</guid>
      <category>cs.CV</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Prateeth Rao, Sachit Rao</dc:creator>
    </item>
    <item>
      <title>Distributional Open-Ended Evaluation of LLM Cultural Value Alignment Based on Value Codebook</title>
      <link>https://arxiv.org/abs/2604.06210</link>
      <description>arXiv:2604.06210v4 Announce Type: replace 
Abstract: As LLMs are globally deployed, aligning their cultural value orientations is critical for safety and user engagement. However, existing benchmarks face the Construct-Composition-Context ($C^3$) challenge: they rely on discriminative, multiple-choice formats that probe value knowledge rather than true orientations, overlook subcultural heterogeneity, and are mismatched with real-world open-ended generation. We introduce DOVE, a distributional evaluation framework that directly compares human-written text distributions with LLM-generated outputs. DOVE utilizes a rate-distortion variational optimization objective to construct a compact value codebook from 10K documents, mapping text into a structured value space to filter semantic noise. Alignment is measured using unbalanced optimal transport, capturing intra-cultural distributional structures and subgroup diversity. Experiments across 12 LLMs show that DOVE achieves superior predictive validity, attaining a 31.56% correlation with downstream tasks, while maintaining high reliability with as few as 500 samples per culture.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.06210v4</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Jaehyeok Lee, Xiaoyuan Yi, Jing Yao, Hyunjin Hwang, Roy Ka-Wei Lee, Xing Xie, JinYeong Bak</dc:creator>
    </item>
    <item>
      <title>Energy-Regularized Spatial Masking: A Novel Approach to Enhancing Robustness and Interpretability in Vision Models</title>
      <link>https://arxiv.org/abs/2604.06893</link>
      <description>arXiv:2604.06893v3 Announce Type: replace 
Abstract: Deep convolutional neural networks achieve remarkable performance by exhaustively processing dense spatial feature maps, yet this brute-force strategy introduces significant computational redundancy and encourages reliance on spurious background correlations. As a result, modern vision models remain brittle and difficult to interpret. We propose Energy-Regularized Spatial Masking (ERSM), a novel framework that reformulates feature selection as a differentiable energy minimization problem. By embedding a lightweight Energy-Mask Layer inside standard convolutional backbones, each visual token is assigned a scalar energy composed of two competing forces: an intrinsic Unary importance cost and a Pairwise spatial coherence penalty. Unlike prior pruning methods that enforce rigid sparsity budgets or rely on heuristic importance scores, ERSM allows the network to autonomously discover an optimal information-density equilibrium tailored to each input. We validate ERSM on convolutional architectures and demonstrate that it produces emergent sparsity, improved robustness to structured occlusion, and highly interpretable spatial masks, while preserving classification accuracy. Furthermore, we show that the learned energy ranking significantly outperforms magnitude-based pruning in deletion-based robustness tests, revealing ERSM as an intrinsic denoising mechanism that isolates semantic object regions without pixel-level supervision.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.06893v3</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tom Devynck, Bilal Faye, Djamel Bouchaffra, Nadjib Lazaar, Hanane Azzag, Mustapha Lebbah</dc:creator>
    </item>
    <item>
      <title>Scalable and Private Federated Learning Using Distributed Differential Privacy and Secure Aggregation</title>
      <link>https://arxiv.org/abs/2604.07125</link>
      <description>arXiv:2604.07125v2 Announce Type: replace 
Abstract: This article presents DDP-SA, a scalable privacy-preserving federated learning framework that jointly leverages client-side local differential privacy (LDP) and full-threshold additive secret sharing (ASS) for secure aggregation. Unlike existing methods that rely solely on differential privacy or on secure multi-party computation (MPC), DDP-SA integrates both techniques to deliver stronger end-to-end privacy guarantees while remaining computationally practical. The framework introduces a two-stage protection mechanism: clients first perturb their local gradients with calibrated Laplace noise, then decompose the noisy gradients into additive secret shares that are distributed across multiple intermediate servers. This design ensures that (i) no single compromised server or communication channel can reveal any information about individual client updates, and (ii) the parameter server reconstructs only the aggregated noisy gradient, never any client-specific contribution. Extensive experiments show that DDP-SA achieves substantially higher model accuracy than standalone LDP while providing stronger privacy protection than MPC-only approaches. The proposed framework scales linearly with the number of participants and offers a practical, privacy-preserving solution for federated learning applications with controllable computational and communication overhead.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.07125v2</guid>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wenjing Wei, Farid Nait-Abdesselam, Alla Jammine</dc:creator>
    </item>
    <item>
      <title>SPAMoE: Spectrum-Aware Hybrid Operator Framework for Full-Waveform Inversion</title>
      <link>https://arxiv.org/abs/2604.07421</link>
      <description>arXiv:2604.07421v3 Announce Type: replace 
Abstract: Full-waveform inversion (FWI) is pivotal for reconstructing high-resolution subsurface velocity models but remains computationally intensive and ill-posed. While deep learning approaches promise efficiency, existing Convolutional Neural Networks (CNNs) and single-paradigm Neural Operators (NOs) struggle with one fundamental issue: frequency entanglement of multi-scale geological features. To address this challenge, we propose Spectral-Preserving Adaptive MoE (SPAMoE), a novel spectrum-aware framework for solving inverse problems with complex multi-scale structures. Our approach introduces a Spectral-Preserving DINO Encoder that enforces a lower bound on the high-to-low frequency energy ratio of the encoded representation, mitigating high-frequency collapse and stabilizing subsequent frequency-domain modeling. Furthermore, we design a novel Spectral Decomposition and Routing mechanism that dynamically assigns frequency bands to a Mixture-of-Experts (MoE) ensemble comprising FNO, MNO, and LNO. On the ten OpenFWI sub-datasets, experiments show that SPAMoE reduces the average MAE by 44.4% relative to the best officially reported OpenFWI baseline, thereby establishing a new architectural framework for learning-based full-waveform inversion. Our code and data are available at https://github.com/zhenyuwang12366/SPAMoE</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.07421v3</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhenyu Wang, Peiyuan Li, Yongxiang Shi, Ruoyu Wu, Chenfei Liao, Lei Zhang</dc:creator>
    </item>
    <item>
      <title>Information-Theoretic Requirements for Gradient-Based Task Affinity Estimation in Multi-Task Learning</title>
      <link>https://arxiv.org/abs/2604.07848</link>
      <description>arXiv:2604.07848v2 Announce Type: replace 
Abstract: Multi-task learning shows strikingly inconsistent results -- sometimes joint training helps substantially, sometimes it actively harms performance -- yet the field lacks a principled framework for predicting these outcomes. We identify a fundamental but unstated assumption underlying gradient-based task analysis: tasks must share training instances for gradient conflicts to reveal genuine relationships. When tasks are measured on the same inputs, gradient alignment reflects shared mechanistic structure; when measured on disjoint inputs, any apparent signal conflates task relationships with distributional shift. We discover this sample overlap requirement exhibits a sharp phase transition: below 30% overlap, gradient-task correlations are statistically indistinguishable from noise; above 40%, they reliably recover known biological structure. Comprehensive validation across multiple datasets achieves strong correlations and recovers biological pathway organization. Standard benchmarks systematically violate this requirement -- MoleculeNet operates at &lt;5% overlap, TDC at 8-14% -- far below the threshold where gradient analysis becomes meaningful. This provides the first principled explanation for seven years of inconsistent MTL results.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.07848v2</guid>
      <category>cs.LG</category>
      <category>q-bio.MN</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jasper Zhang, Bryan Cheng</dc:creator>
    </item>
    <item>
      <title>Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions</title>
      <link>https://arxiv.org/abs/2604.08304</link>
      <description>arXiv:2604.08304v3 Announce Type: replace 
Abstract: Retrieval-augmented generation (RAG) extends large language models (LLMs) with external knowledge, but this access path also introduces security risks that existing work often conflates with inherent LLM flaws. We frame secure RAG as securing external knowledge access and organize the literature with SLOT, a taxonomy along four axes: the attack Surface (S) where an adversary acts, the defense Layer (L) that controls the same point, the Objective (O) it breaks following the CIA properties, and the Target (T) it pursues, from a single known query (T1) to target-claim manipulation across a query distribution (T2). Mapping attacks, defenses, remediation, and evaluation onto a six-stage knowledge-access pipeline, we expose two structural mismatches. Finally, we discuss directions for more realistic targets, no-blind-spot and adaptively evaluated defenses, stronger confidentiality, and evaluation for multimodal and agentic RAG. The curated paper list for RAG security is in: https://github.com/TreeAI-Lab/Awesome-RAG-Security.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.08304v3</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuming Xu, Mingtao Zhang, Zhuohan Ge, Haoyang Li, Nicole Hu, Yongqi Zhang, Zhiyuan Wen, Jason Chen Zhang, Qing Li, Lei Chen</dc:creator>
    </item>
    <item>
      <title>AI generates well-liked but templatic empathic responses</title>
      <link>https://arxiv.org/abs/2604.08479</link>
      <description>arXiv:2604.08479v2 Announce Type: replace 
Abstract: Recent research shows that greater numbers of people are turning to Large Language Models (LLMs) for emotional support, and that people rate LLM responses as more empathic than human-written responses. We suggest a reason for this success: LLMs have learned and consistently deploy a well-liked template for expressing empathy. We develop a taxonomy of 10 empathic language "tactics" that include validating someone's feelings and paraphrasing, and apply this taxonomy to characterize the language that people and LLMs produce when writing empathic responses. Across a set of 2 studies comparing a total of n = 3,265 AI-generated (by six models) and n = 1,290 human-written responses, we find that LLM responses are highly formulaic at a discourse functional level. We discovered a template -- a structured sequence of tactics -- that matches between 83--90% of LLM responses (and 60--83% in a held-out sample), and when those are matched, covers 81--92% of the response. By contrast, human-written responses are more diverse. We end with a discussion of implications for the future of AI-generated empathy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.08479v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Emma S. Gueorguieva, Hongli Zhan, Jina Suh, Javier Hernandez, Tatiana Lau, Junyi Jessy Li, Desmond C. Ong</dc:creator>
    </item>
    <item>
      <title>SatIR: Scalable High-Recall Constraint-Satisfaction-Based Information Retrieval for Clinical Trials Matching</title>
      <link>https://arxiv.org/abs/2604.08849</link>
      <description>arXiv:2604.08849v2 Announce Type: replace 
Abstract: Many important retrieval problems are not merely problems of semantic similarity, but problems of constraint satisfaction: a retrieved item should be topically relevant to a query and satisfy explicit requirements involving negation, temporal conditions, numeric thresholds, exceptions, ontological relations, and incomplete evidence. We study this challenge in clinical trial matching, a high-stakes test bed where a useful trial must both address a patient's medical needs and satisfy complex eligibility criteria.
  We propose SatIR, a scalable constraint-based retrieval method for clinical trial matching. SatIR converts trial eligibility criteria and summaries into formal constraints, then retrieves patient--trial pairs by executing these constraints over a database. The system combines Satisfiability Modulo Theories (SMT), relational algebra, medical ontology grounding, and large language models (LLMs): formal methods provide executable and inspectable matching, while LLMs convert ambiguous, incomplete, and implicit clinical information into explicit, controllable constraint representations.
  Across the SIGIR 2016 patient--trial collection and TREC-2022-RetrievalSubset, a benchmark derived from TREC 2022, SatIR consistently improves eligibility-aware retrieval over similarity-based baselines. Relative to TrialGPT-style retrieval, SatIR retrieves 32%--72% more relevant-and-eligible trials per patient on SIGIR 2016 and achieves $1.8$--$3.2\times$ higher eligible-trial recall on TREC-2022-RetrievalSubset. Retrieval is fast, requiring only 146 milliseconds per patient over 3,621 SIGIR trials.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.08849v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.DB</category>
      <category>cs.MA</category>
      <category>cs.SC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Cyrus Zhou, Yufei Jin, Yilin Xu, Yu-Chiang Wang, Chieh-Ju Chao, Monica S. Lam</dc:creator>
    </item>
    <item>
      <title>Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning</title>
      <link>https://arxiv.org/abs/2604.09967</link>
      <description>arXiv:2604.09967v2 Announce Type: replace 
Abstract: Muon has emerged as a promising optimizer for large-scale foundation model pre-training by exploiting the matrix structure of neural network updates through iterative orthogonalization. However, the orthogonalization quality of Muon hinges on the number of Newton--Schulz (NS) iterations performed, which poses efficiency challenges due to its non-trivial computation and communication cost. We propose Muon$^2$, an extension of Muon, to improve both quality and efficiency by applying Adam-style adaptive second-moment preconditioning before orthogonalization. Our key insight is that the core challenge of polar approximation in Muon lies in the ill-conditioned momentum matrix, whose spectrum is substantially improved by Muon$^2$, leading to faster convergence toward a practically sufficient orthogonalization. We further characterize the practical orthogonalization quality via directional alignment, under which Muon$^2$ demonstrates dramatic improvement over Muon at each polar step. Across GPT, LLaMA, and Mixture-of-Experts pre-training experiments up to 13B parameters, Muon$^2$ (and its memory-efficient variant Muon$^2$-F that preserves most of its benefits) consistently outperforms Muon and its variants while reducing NS iterations by 40%, and saves up to 1/4 of the training time relative to Muon at the same loss.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.09967v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Ziyue Liu, Ruijie Zhang, Zhengyang Wang, Yequan Zhao, Yupeng Su, Zi Yang, Zheng Zhang</dc:creator>
    </item>
    <item>
      <title>HARPO: Hierarchical Agentic Reasoning for User-Aligned Conversational Recommendation</title>
      <link>https://arxiv.org/abs/2604.10048</link>
      <description>arXiv:2604.10048v2 Announce Type: replace 
Abstract: Conversational recommender systems (CRSs) operate under incremental preference revelation, requiring recommendation decisions under uncertainty. While recent LLM-based approaches achieve strong performance on proxy metrics such as Recall@K and BLEU, they often fail to deliver high-quality, user-aligned recommendations in practice, as they optimize intermediate objectives like retrieval accuracy or fluent generation rather than recommendation quality itself. We propose HARPO (Hierarchical Agentic Reasoning with Preference Optimization), an agentic framework that reframes conversational recommendation as a structured decision-making process optimized for multi-dimensional recommendation quality. HARPO integrates (i) hierarchical preference learning that decomposes recommendation quality into interpretable dimensions (relevance, diversity, satisfaction, and engagement) with context-dependent weighting; (ii) deliberative tree-search reasoning guided by a learned value network evaluating candidate paths on predicted quality; and (iii) domain-agnostic reasoning abstractions through Virtual Tool Operations and multi-agent refinement. We evaluate HARPO on ReDial, INSPIRED, and MUSE, demonstrating consistent improvements over strong baselines on recommendation-centric metrics while maintaining competitive response quality.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.10048v2</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Subham Raj, Aman Vaibhav Jha, Mayank Anand, Sriparna Saha</dc:creator>
    </item>
    <item>
      <title>MedVeriSeg: Teaching LISA-Like Medical Segmentation Models to Verify Query Validity Without Extra Training</title>
      <link>https://arxiv.org/abs/2604.10242</link>
      <description>arXiv:2604.10242v3 Announce Type: replace 
Abstract: Despite recent progress in text-prompt-based medical image segmentation, existing LISA-like MLLM-based methods typically generate masks regardless of whether the target specified in the query is present, leading to hallucinated segmentation. In this work, we propose MedVeriSeg, a training-free query verification framework that enables LISA-like medical segmentation models to reject false segmentation queries. MedVeriSeg first quantifies the response quality between the [SEG] token and image features through a Similarity Response Quality Scoring Module. To further improve robustness, it employs a Lightweight Routed Multi-Agent Verification Module, which fuses quantitative score evidence with qualitative agent evidence to comprehensively verify the validity of the query. To support systematic evaluation, we construct MedVeriSeg-Bench, a benchmark designed for query verification in medical image segmentation. Experimental results demonstrate that MedVeriSeg effectively identifies false segmentation queries and reduces hallucinated segmentation, while maintaining a high acceptance rate for valid queries, thereby largely preserving the segmentation utility of LISA-like medical segmentation models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.10242v3</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qinyue Tong, Xiaozhen Wang, Ziqian Lu, Jun Liu, Yunlong Yu, Zheming Lu</dc:creator>
    </item>
    <item>
      <title>Hijacking Text Heritage: Hiding the Human Signature through Homoglyphic Substitution</title>
      <link>https://arxiv.org/abs/2604.10271</link>
      <description>arXiv:2604.10271v4 Announce Type: replace 
Abstract: In what way could a data breach involving government-issued IDs such as passports, driver's licenses, etc., rival a random voluntary disclosure on a nondescript social-media platform? At first glance, the former appears more significant, and that is a valid assessment. The disclosed data could contain an individual's date of birth and address; for all intents and purposes, a leak of that data would be disastrous. Given the threat, the latter scenario involving an innocuous online post seems comparatively harmless--or does it? From that post and others like it, a forensic linguist could stylometrically uncover equivalent pieces of information, estimating an age range for the author (adolescent or adult) and narrowing down their geographical location (specific country). While not an exact science--the determinations are statistical--stylometry can reveal comparable, though noticeably diluted, information about an individual. To prevent an ID from being breached, simply sharing it as little as possible suffices. Preventing the leakage of personal information from written text requires a more complex solution: adversarial stylometry. In this paper, we explore how performing homoglyph substitution--the replacement of characters with visually similar alternatives (e.g., "h" $\texttt{[U+0068]}$ $\rightarrow$ "h" $\texttt{[U+04BB]}$)--on text can degrade stylometric systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.10271v4</guid>
      <category>cs.CR</category>
      <category>cs.CL</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Robert Dilworth</dc:creator>
    </item>
    <item>
      <title>BMdataset: A Musicologically Curated LilyPond Dataset</title>
      <link>https://arxiv.org/abs/2604.10628</link>
      <description>arXiv:2604.10628v2 Announce Type: replace 
Abstract: Symbolic music research has relied almost exclusively on MIDI-based datasets; text-based engraving formats such as LilyPond remain unexplored for music understanding. We present BMdataset, a musicologically curated dataset of 393 LilyPond scores (2,646 movements) transcribed by experts directly from original Baroque manuscripts, with metadata covering composer, musical form, instrumentation, and sectional attributes. Building on this resource, we introduce LilyBERT (weights can be found at https://huggingface.co/csc-unipd/lilybert), a CodeBERT-based encoder adapted to symbolic music through vocabulary extension with 115 LilyPond-specific tokens and masked language model pre-training. Linear probing on the out-of-domain Mutopia corpus shows that, despite its modest size (~90M tokens), fine-tuning on BMdataset alone outperforms continuous pre-training on the full PDMX corpus (~15B tokens) for both composer and style classification, demonstrating that small, expertly curated datasets can be more effective than large, noisy corpora for music understanding. Combining broad pre-training with domain-specific fine-tuning yields the best results overall (84.3% composer accuracy), confirming that the two data regimes are complementary. We release the dataset, tokenizer, and model to establish a baseline for representation learning on LilyPond.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.10628v2</guid>
      <category>cs.SD</category>
      <category>cs.CL</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Matteo Spanio, Ilay Guler, Antonio Rodà</dc:creator>
    </item>
    <item>
      <title>Resilient Write: A Six-Layer Durable Write Surface for LLM Coding Agents</title>
      <link>https://arxiv.org/abs/2604.10842</link>
      <description>arXiv:2604.10842v3 Announce Type: replace 
Abstract: LLM-powered coding agents increasingly rely on tool-use protocols such as the Model Context Protocol (MCP) to read and write files on a developer's workstation. When a write fails - due to content filters, truncation, or an interrupted session - the agent typically receives no structured signal, loses the draft, and wastes tokens retrying blindly. We present Resilient Write, an MCP server that interposes a six-layer durable write surface between the agent and the filesystem. The layers - pre-flight risk scoring, transactional atomic writes, resume-safe chunking, structured typed errors, out-of-band scratchpad storage, and task-continuity handoff envelopes - are orthogonal and independently adoptable. Each layer maps to a concrete failure mode observed during a real agent session in April 2026, in which content-safety filters silently rejected a draft containing redacted API-key prefixes. Three additional tools - chunk preview, format-aware validation, and journal analytics - emerged from using the system to compose this paper. A 186-test suite validates correctness at each layer, and quantitative comparison against naive and defensive baselines shows a 5x reduction in recovery time and a 13x improvement in agent self-correction rate. Resilient Write is open-source under the MIT license.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.10842v3</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Justice Owusu Agyemang, Jerry John Kponyo, Elliot Amponsah, Godfred Manu Addo Boakye, Kwame Opuni-Boachie Obour Agyekum</dc:creator>
    </item>
    <item>
      <title>TraversalBench: Challenging Paths to Follow for Vision Language Models</title>
      <link>https://arxiv.org/abs/2604.10999</link>
      <description>arXiv:2604.10999v2 Announce Type: replace 
Abstract: Vision-language models (VLMs) perform strongly on multimodal benchmarks, but their ability to follow complex visual paths remains under-tested. We introduce TraversalBench, a controlled benchmark for exact visual path traversal. Each instance contains a continuous polyline with a unique start marker and labeled vertices; models must recover the ordered sequence encountered from start to finish. The benchmark balances self-intersection count, tortuosity, vertex count, and nearby confounding lines while limiting reliance on OCR, world knowledge, or open-ended planning.
  We find that self-intersections are the dominant source of difficulty. A first-crossing analysis localizes failures to crossing points: performance is stable before the first crossing, then drops sharply when the model must resolve the correct continuation. Nearby confounders have weaker but compounding effects, and an auxiliary reading-order benchmark reveals a consistent left-to-right bias. Together, these results characterize how VLMs perceive and fail on visual paths. Finally, we position TraversalBench as a new contribution to the growing line of sustained and precise visual grounding benchmarks for VLMs. Code, benchmark data, and rendered examples are available at https://github.com/clarapetrova/traversalbench.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.10999v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Clara Petrova, Zhuo Chen, Marin Soljačić</dc:creator>
    </item>
    <item>
      <title>Models Know Their Shortcuts: Deployment-Time Shortcut Mitigation</title>
      <link>https://arxiv.org/abs/2604.12277</link>
      <description>arXiv:2604.12277v2 Announce Type: replace 
Abstract: Pretrained text encoders are prone to shortcut learning, relying on token-label correlations that fail once the distribution shifts in deployment. Existing shortcut mitigation methods mainly operate at training time and assume access to training data, training dynamics, or shortcut annotations, which are rarely available during deployment, where only the converged model remains. We show that this model alone suffices to mitigate shortcuts during deployment: a biased model internalizes a signal of its learned shortcuts that can be captured via unsupervised gradient-based attribution. We further prove that deployment-time mitigation is information-theoretically upper-bounded by training-time mitigation. Nevertheless, exploiting this gradient signal, our proposed unsupervised deployment-time shortcut mitigation framework for pretrained text encoders, Shortcut Guardrail, recovers substantial performance under shortcut distribution shift, matching or outperforming training-time baselines across sentiment classification, toxicity detection, and natural language inference.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.12277v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Jiayi Li, Shijie Tang, Gün Kaynar, Shiyi Du, Carl Kingsford</dc:creator>
    </item>
    <item>
      <title>Grid-Forming Characterization in DC Microgrids</title>
      <link>https://arxiv.org/abs/2604.12804</link>
      <description>arXiv:2604.12804v3 Announce Type: replace 
Abstract: DC microgrids are converter-based electrical networks that are increasingly being used in various applications, including data centers and industrial distribution systems. A central challenge in their operation is maintaining the DC-bus voltage within predefined limits while ensuring overall system stability. Although a wide variety of converter control algorithms has been proposed to achieve these objectives, the literature lacks a clear and physically interpretable framework for evaluating their effectiveness and for classifying and comparing them. Moreover, the grid-forming versus grid-following distinction that exists in AC systems has largely been unexplored in DC microgrids. To address this gap, this paper introduces three novel impedance-based indices that can be used to quantify the voltage-forming and current-forming behavior of a converter. The indices also provide a basis for defining the desired converter behavior that yields superior DC-bus voltage regulation performance. Simulation results illustrate the application of the framework to several representative control strategies and highlight the strengths and limitations of these control algorithms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.12804v3</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:journal_reference>Proc. 2026 IEEE 8th International Conference on DC Microgrids (ICDCM), Xi'an, China, 2026</arxiv:journal_reference>
      <dc:creator>Jovan Krajacic, Ognjen Stanojev, Mario Schweizer, Orcun Karaca, Gabriela Hug, Vladan Lazarević</dc:creator>
    </item>
    <item>
      <title>A Stable SBP-SAT FDTD Subgridding Method Without Region Split</title>
      <link>https://arxiv.org/abs/2604.14618</link>
      <description>arXiv:2604.14618v2 Announce Type: replace 
Abstract: A provably stable summation-by-parts simultaneous approximation term (SBP-SAT) finite-difference time-domain (FDTD) subgridding method without region split is proposed. By designing projection SBP operators tailored for embedded topological features and deriving the corresponding SAT boundary conditions, this approach guarantees long-time stability through discrete energy analysis. Unlike conventional SBP-SAT FDTD subgridding techniques that rely on aligned or multi-block configurations, the proposed method enables a direct coupling between an internal refined region and a single surrounding coarse-grid domain without introducing auxiliary blocks or causing domain fragmentation. Numerical results validate the efficiency, accuracy, and topological flexibility of the proposed method. Compared with existing multi-block SBP-SAT methods, this method effectively reduces computational complexity by minimizing SAT boundary conditions and improves calculation accuracy near grid interfaces.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.14618v2</guid>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuhui Wang, Langran Deng, Weibo Wu, Hanhong Liu, Xinyue Zhang, Xingqi Zhang, Jian Wang, Wei-Jie Wang, Zhizhang Chen, Shunchuan Yang</dc:creator>
    </item>
    <item>
      <title>GroupEnvoy: A Conversational Agent Speaking for the Outgroup to Foster Intergroup Relations</title>
      <link>https://arxiv.org/abs/2604.16095</link>
      <description>arXiv:2604.16095v3 Announce Type: replace 
Abstract: Conversational agents have the potential to support intergroup relations when psychological or linguistic barriers prevent direct interaction. Based on intergroup contact theory, we propose GroupEnvoy, a text-based conversational agent that represents outgroup perspectives during ingroup discussions. Its dialogue is grounded in data from a prior outgroup-only discussion. To evaluate this approach and derive design principles, we conducted a mixed-methods, between-subjects study with university students, in which host-country students formed the ingroup and international students formed the outgroup. Ingroup students performed a collaborative task while engaging with outgroup perspectives, either by interacting with GroupEnvoy (AI-mediated contact) or by reading a static document (passive exposure). Quantitatively, AI-mediated contact demonstrated a directional reduction in intergroup anxiety and an improvement in perspective-taking. Qualitatively, AI-mediated contact enhanced outcome expectancies and directed empathy toward the outgroup's evaluations of the ingroup, whereas passive exposure fostered future contact intentions and elicited empathy toward the outgroup's lived experiences. These findings present AI-mediated contact as a promising paradigm for improving intergroup relations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.16095v3</guid>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1145/3816046.3816204</arxiv:DOI>
      <dc:creator>Koken Hata, Rintaro Chujo, Reina Takamatsu, Wenzhen Xu, Yukino Baba</dc:creator>
    </item>
    <item>
      <title>ClawXiv: a signed archival workflow and distributed publication architecture for human--AI collaborative research</title>
      <link>https://arxiv.org/abs/2604.16476</link>
      <description>arXiv:2604.16476v2 Announce Type: replace 
Abstract: We propose ClawXiv, a workflow and archive architecture for mixed human--AI research. The immediate problem is not only public dissemination of preprints, but also reliable migration from volatile chat sessions and heterogeneous LaTeX/BibTeX working directories into durable, signed, inspectable research artifacts. ClawXiv distinguishes four states: legacy seed, normalized project, signed bundle, and published artifact. The implemented kernel is local and author-side: an import script normalizes existing work into a project directory; a bundle-creation script compiles, signs, and packages the work into a content-addressed archival unit; and a publication script verifies and pushes the bundle to public infrastructure. Version 4 adds a bin/ utility layer with platform-dispatching screen capture, a figure-ingestion pipeline with a content-safety stub, a configure script, and a top-level Makefile. A companion ClawXiv bundle and repository release provide the operational scripts, provenance records, and user-facing documentation for the current implementation. Code is available at github.com/kornai/clawxiv.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.16476v2</guid>
      <category>cs.DL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Andras Kornai</dc:creator>
    </item>
    <item>
      <title>Medial Axis Aware Learning of Signed Distance Functions</title>
      <link>https://arxiv.org/abs/2604.16512</link>
      <description>arXiv:2604.16512v2 Announce Type: replace 
Abstract: We propose a novel variational method to compute a highly accurate global signed distance function (SDF) to a given point cloud. To this end, the jump set of the gradient of the SDF, which coincides with the medial axis of the surface, is explicitly taken into account through a higher-order variational formulation that enforces linear growth along the gradient direction away from this discontinuity set. The eikonal equation and the zero-level set of the SDF are enforced as constraints. To make this variational problem computationally tractable, a phase field approximation of Ambrosio-Tortorelli type is employed. The associated phase field function implicitly describes the medial axis. The method is implemented for surfaces represented by unoriented point clouds using neural network approximations of both the SDF and the phase field. Experiments demonstrate the method's accuracy both in the near field and globally. Quantitative and qualitative comparisons with other approaches show the advantages of the proposed method.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.16512v2</guid>
      <category>cs.CV</category>
      <category>cs.CG</category>
      <category>cs.GR</category>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Samuel Weidemaier, Christoph Norden-Smoch, Martin Rumpf</dc:creator>
    </item>
    <item>
      <title>AI Slop and the Software Commons</title>
      <link>https://arxiv.org/abs/2604.16754</link>
      <description>arXiv:2604.16754v2 Announce Type: replace 
Abstract: In this article, we argue that AI slop in software is creating a tragedy of the commons. Individual productivity gains from AI-generated content externalize costs onto reviewer capacity, codebase integrity, public knowledge resources, collaborative trust, and the talent pipeline. AI slop is cheap to generate and expensive to review, and the review layer is already thin. Commons problems are not solved by individual restraint. We outline concrete next steps for tool developers, team leads, and educators, grounded in Ostrom's design principles for enduring commons institutions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.16754v2</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sebastian Baltes, Marc Cheong, Christoph Treude</dc:creator>
    </item>
    <item>
      <title>Bit-Flip Vulnerability of Shared KV-Cache Blocks in LLM Serving Systems</title>
      <link>https://arxiv.org/abs/2604.17249</link>
      <description>arXiv:2604.17249v2 Announce Type: replace 
Abstract: Rowhammer on GPU DRAM has enabled adversarial bit flips in model weights; shared KV-cache blocks in LLM serving systems present an analogous but previously unexamined target. In vLLM's Prefix Caching, these blocks exist as a single physical copy without integrity protection. Using software fault injection under ideal bit targeting, we characterize worst-case severity and identify three properties: (1) Silent divergence - 13 of 16 BF16 bit positions produce coherent but altered outputs, indistinguishable from legitimate responses without a clean baseline. (2) Selective propagation - only requests sharing the targeted prefix are affected. (3) Persistent accumulation - no temporal decay occurs, so cumulative damage grows linearly with subsequent requests. Together, these constitute a threat profile distinct from weight corruption: silent divergence and selective propagation enable detection evasion; persistent accumulation then proceeds unchecked, yielding damage amplification bounded only by how long the block remains cached. A checksum-based countermeasure detects any single-bit corruption at scheduling time, bounding cumulative damage to one batch independent of the block's cache lifetime, with negligible overhead. These results argue for integrity protection of prefix blocks before end-to-end exploitation is demonstrated.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.17249v2</guid>
      <category>cs.CR</category>
      <category>cs.AR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuji Yamamoto, Satoshi Matsuura</dc:creator>
    </item>
    <item>
      <title>Capacity-Controlled Global Attention for Graph Transformers</title>
      <link>https://arxiv.org/abs/2604.17324</link>
      <description>arXiv:2604.17324v2 Announce Type: replace 
Abstract: Global self-attention drives modern graph transformers, yet the softmax at its core imposes a structural constraint rarely examined directly: every attention row is non-negative and sums to one, so each per-head output is a mass-conserving convex combination of value vectors. A node can never "attend to nothing." We argue this conservation constraint is a single root cause behind three pathologies usually studied in isolation: the collapse of node representations with depth (over-smoothing), a low-rank bottleneck on per-head outputs, and brittle optimization in deep stacks. Drawing on how sigmoid gating removes analogous attention sinks in language models, we introduce SigGate-GT, a graph transformer that applies a learned, per-head, input-conditioned sigmoid gate to the attention output inside the GraphGPS framework. The gate is a smooth, per-dimension "volume control" that can drive head outputs toward zero, relaxing the constraint without abandoning attention's probabilistic interpretation. Analytically and through synthetic experiments, we show the gate strictly increases the stable rank of per-head outputs, and connect this rank gain to all three manifestations. On five molecular and long-range benchmarks, SigGate-GT matches the prior best on ZINC (0.059 MAE), records the strongest result among the graph-transformer baselines we evaluate on ogbg-molhiv (82.47% ROC-AUC), and is competitive on ogbg-molpcba and the Long-Range Graph Benchmark, with statistically significant gains over GraphGPS on all five datasets (p &lt; 0.05). Mechanism analyses confirm the diagnosis: gating slows over-smoothing (a 30% mean relative gain in representation diversity across 4-16 layers), keeps attention entropy from collapsing, and stabilizes training across a 10x learning-rate range, at about 1% parameter overhead on OGB and under 3% wall-clock cost.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.17324v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yang Liu, Dongxin Guo, Tom Zheng, Siu Ming Yiu, Liam Ning, Jikun Wu</dc:creator>
    </item>
    <item>
      <title>EvoMaster: A Foundational Evolving Agent Framework for Agentic Science at Scale</title>
      <link>https://arxiv.org/abs/2604.17406</link>
      <description>arXiv:2604.17406v3 Announce Type: replace 
Abstract: The convergence of large language models and agents is catalyzing a new era of scientific discovery: Agentic Science. While the scientific method is inherently iterative, existing agent frameworks are predominantly static, narrowly scoped, and lack the capacity to learn from trial and error. To bridge this gap, we present EvoMaster, a foundational evolving agent framework engineered specifically for Agentic Science at Scale. Driven by the core principle of continuous self-evolution, EvoMaster empowers agents to iteratively refine hypotheses, self-critique, and progressively accumulate knowledge across experimental cycles, faithfully mirroring human scientific inquiry. Crucially, as a domain-agnostic base harness, EvoMaster is exceptionally easy to scale up -- enabling developers to build and deploy highly capable, self-evolving scientific agents for arbitrary disciplines in approximately 100 lines of code. Built upon EvoMaster, we incubated the SciMaster ecosystem across domains such as machine learning, physics, and general science. Evaluations on four authoritative benchmarks (Humanity's Last Exam, MLE-Bench Lite, BrowseComp, and FrontierScience) demonstrate that EvoMaster achieves state-of-the-art scores of 41.1%, 75.8%, 73.3%, and 53.3%, respectively. It comprehensively outperforms the general-purpose baseline OpenClaw with relative improvements ranging from +159% to +316%, robustly validating its efficacy and generality as the premier foundational framework for the next generation of autonomous scientific discovery. EvoMaster is available at https://github.com/sjtu-sai-agents/EvoMaster.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.17406v3</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xinyu Zhu, Yuzhu Cai, Zexi Liu, Cheng Wang, Fengyang Li, Wenkai Jin, Wanxu Liu, Zehao Bing, Bingyang Zheng, Jingyi Chai, Shuo Tang, Rui Ye, Yuwen Du, Xianghe Pang, Yaxin Du, Tingjia Miao, Yuzhi Zhang, Ruoxue Liao, Zhaohan Ding, Linfeng Zhang, Yanfeng Wang, Weinan E, Siheng Chen</dc:creator>
    </item>
    <item>
      <title>Topology-Aware LLM-Driven Social Simulation: A Unified Framework for Efficient and Realistic Agent Dynamics</title>
      <link>https://arxiv.org/abs/2604.18011</link>
      <description>arXiv:2604.18011v2 Announce Type: replace 
Abstract: Social simulation is essential for understanding collective human behavior by modeling how individual interactions give rise to large-scale social dynamics. Recent advances in large language models (LLMs) have enabled multi-agent frameworks with human-like reasoning and communication capabilities. However, existing LLM-based simulations treat social networks as fixed communication scaffolds, failing to leverage the structural signals that shape behavioral convergence and heterogeneous influence in real-world systems, which often leads to inefficient and unrealistic dynamics. To address this challenge, we propose TopoSim, a unified topology-aware social simulation framework that explicitly integrates structural reasoning into agent interactions along two complementary dimensions. First, TopoSim aligns agents with similar structural roles and interaction contexts into shared backbone units, enabling coordinated updates that reduce redundant computation while preserving emergent social dynamics. Second, TopoSim models social influence as a structure-induced signal, introducing heterogeneous interaction patterns grounded in network topology rather than uniform influence assumptions. Extensive experiments across three social simulation frameworks and diverse datasets demonstrate that TopoSim achieves comparable or improved simulation fidelity while reducing token consumption by 50-90%. Moreover, our approach more accurately reproduces key structural phenomena observed in real-world social systems and exhibits strong generalization and scalability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.18011v2</guid>
      <category>cs.SI</category>
      <category>cs.DB</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Yuwei Xu, Shulun Zhang, Yingli Zhou, Shipei Zeng, Laks V. S. Lakshmanan, Chenhao Ma</dc:creator>
    </item>
    <item>
      <title>The Topological Dual of a Dataset: A Logic-to-Topology Encoding for AlphaGeometry-Style Data</title>
      <link>https://arxiv.org/abs/2604.18050</link>
      <description>arXiv:2604.18050v2 Announce Type: replace 
Abstract: AlphaGeometry represents a milestone in neuro-symbolic reasoning, yet its architecture faces a log-linear scaling bottleneck within its symbolic deduction engine that limits its efficiency as problem complexity increases. Recent technical reports suggest that current domain-specific languages may be isomorphic to natural language as input representations: interchanging them acts as a performance-invariant transformation, implying that current neural guidance relies on superficial encodings rather than structural understanding. This paper addresses this representation bottleneck by proposing a logic-to-topology encoding designed to reveal the structural invariants of a model's latent space under a transformation of its input space. By leveraging the Logic of Observation, we utilize the duality between provability in observable theories and topologies to propose a logic-to-topology encoder for the input space. We introduce the concept of the "topological dual of a dataset", a transformation that bridges formal logic, topology, and neural processing. This framework serves as a Rosetta Stone for neuro-symbolic AI, providing a principled pathway for the mechanistic interpretability of how models navigate complex discovery paths.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.18050v2</guid>
      <category>cs.AI</category>
      <category>cs.LO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Anthony Bordg</dc:creator>
    </item>
    <item>
      <title>Multilingual Training and Evaluation Resources for Vision-Language Models</title>
      <link>https://arxiv.org/abs/2604.18347</link>
      <description>arXiv:2604.18347v2 Announce Type: replace 
Abstract: Vision Language Models (VLMs) have achieved rapid progress in recent years. However, despite their growth, VLM development is heavily grounded in English, leading to two main limitations: (i) the lack of multilingual and multimodal datasets for training, and (ii) the scarcity of comprehensive evaluation benchmarks across languages. In this work, we address these gaps by introducing a new comprehensive suite of resources for VLM training and evaluation spanning five European languages (English, French, German, Italian, and Spanish). We adopt a regeneration-translation paradigm that produces high-quality cross-lingual resources by combining curated synthetic generation and manual annotation. Specifically, we build Multi-PixMo, a training corpus obtained by regenerating examples from pre-existing PixMo datasets (PixMo-Cap, PixMo-AskModelAnything, and CoSyn-400k) with permissively licensed models. On the evaluation side, we construct a set of multilingual benchmarks derived by translating widely used English datasets (MMbench, ScienceQA, MME, POPE, AI2D). We assess the quality of these resources through qualitative and quantitative human analyses, measuring inter-annotator agreement. Additionally, we perform ablation studies to demonstrate the impact of multilingual data, compared with English-only data, on VLM training. Experiments with three different models show that training VLMs on multilingual, multimodal examples is consistently beneficial on non-English benchmarks, with positive transfer to English as well.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.18347v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Daniela Baiamonte, Elena Fano, Matteo Gabburo, Stefano Simonazzi, Leonardo Rigutini, Andrea Zugarini</dc:creator>
    </item>
    <item>
      <title>Explainable AML Triage with LLMs: Evidence Retrieval and Counterfactual Checks</title>
      <link>https://arxiv.org/abs/2604.19755</link>
      <description>arXiv:2604.19755v2 Announce Type: replace 
Abstract: Anti-money laundering (AML) transaction monitoring generates large volumes of alerts that must be rapidly triaged by investigators under strict audit and governance constraints. While large language models (LLMs) can summarize heterogeneous evidence and draft rationales, unconstrained generation is risky in regulated workflows due to hallucinations, weak provenance, and explanations that are not faithful to the underlying decision. We propose an explainable AML triage framework that treats triage as an evidence-constrained decision process. Our method combines (i) retrieval-augmented evidence bundling from policy/typology guidance, customer context, alert triggers, and transaction subgraphs, (ii) a structured LLM output contract that requires explicit citations and separates supporting from contradicting or missing evidence, and (iii) counterfactual checks that validate whether minimal, plausible perturbations lead to coherent changes in both the triage recommendation and its rationale. We evaluate on public synthetic AML benchmarks and simulators and compare against rules, tabular and graph machine-learning baselines, and LLM-only/RAG-only variants. Results show that evidence grounding substantially improves auditability and reduces numerical and policy hallucination errors, while counterfactual validation further increases decision-linked explainability and robustness, yielding the best overall triage performance (PR-AUC 0.75; Escalate F1 0.62) and strong provenance and faithfulness metrics (citation validity 0.98; evidence support 0.88; counterfactual faithfulness 0.76). These findings indicate that governed, verifiable LLM systems can provide practical decision support for AML triage without sacrificing compliance requirements for traceability and defensibility.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.19755v2</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dorothy Torres, Wei Cheng, Ke Hu</dc:creator>
    </item>
    <item>
      <title>Deconstructing Superintelligence: Identity, Self-Modification and Différance</title>
      <link>https://arxiv.org/abs/2604.19845</link>
      <description>arXiv:2604.19845v4 Announce Type: replace 
Abstract: Self-modification is routinely treated as constitutive of artificial superintelligence (SI), yet modification is a relative action requiring a supplement outside the operation. We formalise this on an associative operator algebra $\mathcal{A}$ with update operator $\hat U$, difference operator $\hat D$, and self-representation operator $\hat R$, identifying the supplement with $\operatorname{Comm}(\hat U)$. A propagation theorem shows $[\hat U,\hat R]$ decomposes through $[\hat U,\hat D]$, so non-commutation propagates to self-representation. The liar paradox is the rank-one case $[\hat T,\Pi_L]=0$, and class $\mathbf{A}$ systems, in which $\hat U$ acts on $\hat D$, reproduce it at system scale, yielding a structure coinciding with Priest's inclosure schema and Derrida's différance. Our results show that the strong self-modification taken to define superintelligence may undermine the persistent identity upon which such systems are premised.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.19845v4</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Elija Perrier</dc:creator>
    </item>
    <item>
      <title>FingerEye: Learning Dexterous Manipulation with Continuous Vision-Tactile Sensing</title>
      <link>https://arxiv.org/abs/2604.20689</link>
      <description>arXiv:2604.20689v3 Announce Type: replace 
Abstract: Dexterous robotic manipulation requires perception that remains informative from pre-contact approach to contact initiation and post-contact control. We introduce FingerEye, a sensing and learning framework that strengthens robotic dexterity through continuous vision-tactile feedback throughout interaction. On the sensing side, FingerEye integrates binocular RGB cameras with a compliant contact interface to support perception both before and after contact. Before contact, the fingertip cameras provide close-range visual cues and implicit stereo for precise approach and object localization. After contact, marker-tracked deformation of the compliant ring provides a proxy for contact wrench sensing. On the learning side, we build real-and-sim infrastructure for data collection and evaluation, systematically study policy-interface designs for learning with multiple FingerEye sensors, and develop FingerEye Policy, which applies group-structured modality fusion to reduce modality shortcuts and better exploit distributed fingertip feedback. Across seven contact-sensitive task settings, FingerEye improves on a wrist-only policy by over 30 percentage points in mean success rate, both in simulation and in the real world.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.20689v3</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhixuan Xu, Yichen Li, Xuanye Wu, Tianyu Qiu, Lin Shao</dc:creator>
    </item>
    <item>
      <title>Watts-per-Intelligence Part II: Algorithmic Catalysis</title>
      <link>https://arxiv.org/abs/2604.20897</link>
      <description>arXiv:2604.20897v2 Announce Type: replace 
Abstract: We develop a thermodynamic theory of algorithmic catalysis within the watts per intelligence framework, identifying reusable computational structures that reduce irreversible operations for a task class while satisfying bounded restoration and structural selectivity constraints. We prove that any class specific speed-up is upper-bounded by the algorithmic mutual information between the substrate and the class descriptor, and that encoding this information incurs a minimum thermodynamic cost via Landauer erasure. Combining these results yields a coupling theorem that lower-bounds the deployment horizon required for an algorithmic catalyst to be energetically favourable. The framework is illustrated on an affine SAT class and situates contemporary learned systems within an information thermodynamic constraint on intelligent computation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.20897v2</guid>
      <category>cs.IT</category>
      <category>cs.AI</category>
      <category>math.IT</category>
      <category>physics.comp-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Elija Perrier</dc:creator>
    </item>
    <item>
      <title>CodeGraphVLP: Code-as-Planner Meets Semantic-Graph State for Non-Markovian Vision-Language-Action Models</title>
      <link>https://arxiv.org/abs/2604.22238</link>
      <description>arXiv:2604.22238v2 Announce Type: replace 
Abstract: Vision-Language-Action (VLA) models promise generalist robot manipulation, but are typically trained and deployed as short-horizon policies that assume the latest observation is sufficient for action reasoning. This assumption breaks in non-Markovian long-horizon tasks, where task-relevant evidence can be occluded or appear only earlier in the trajectory, and where clutter and distractors make fine-grained visual grounding brittle. We present CodeGraphVLP, a hierarchical framework that enables reliable long-horizon manipulation by combining a persistent semantic-graph state with an executable code-based planner and progress-guided visual-language prompting. The semantic-graph maintains task-relevant entities and relations under partial observability. The synthesized planner executes over this semantic-graph to perform efficient progress checks and outputs a subtask instruction together with subtask-relevant objects. We use these outputs to construct clutter-suppressed observations that focus the VLA executor on critical evidence. On real-world non-Markovian tasks, CodeGraphVLP improves task completion over strong VLA baselines and history-enabled variants while substantially lowering planning latency compared to VLM-in-the-loop planning. We also conduct extensive ablation studies to confirm the contributions of each component.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.22238v2</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Khoa Vo, Sieu Tran, Taisei Hanyu, Yuki Ikebe, Duy Nguyen, Nghi D. Q. Bui, Minh Vu, Anthony Gunderman, Chase Rainwater, Anh Nguyen, Ngan Le</dc:creator>
    </item>
    <item>
      <title>Holo360D: A Large-Scale Real-World Dataset with Continuous Trajectories for Advancing Panoramic 3D Reconstruction and Beyond</title>
      <link>https://arxiv.org/abs/2604.22482</link>
      <description>arXiv:2604.22482v2 Announce Type: replace 
Abstract: While feed-forward 3D reconstruction models have advanced rapidly, they still exhibit degraded performance on panoramas due to spherical distortions. Moreover, existing panoramic 3D datasets are predominantly collected with 360 cameras fixed at discrete locations, resulting in discontinuous trajectories. These limitations critically hinder the development of panoramic feed-forward 3D reconstruction, especially for the multi-view setting. In this paper, we present Holo360D, a comprehensive dataset containing 109,495 panoramas paired with registered point clouds, meshes, and aligned camera poses. To our knowledge, Holo360D is the first large-scale dataset that provides continuous panoramic sequences with accurately aligned high-completeness depth maps. The raw data are initially collected using a 3D laser scanner coupled with a 360 camera. Subsequently, the raw data are processed with both online and offline SLAM systems. Furthermore, to enhance the 3D data quality, a post-processing pipeline tailored for the 360 dataset is proposed, including geometry denoising, mesh hole filling, and region-specific remeshing. Finally, we establish a new benchmark by fine-tuning 3D reconstruction models on Holo360D, providing key insights into effective fine-tuning strategies. Our results demonstrate that Holo360D delivers superior training signals and provides a comprehensive benchmark for advancing panoramic 3D reconstruction models. The dataset and code will be made publicly available.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.22482v2</guid>
      <category>cs.CV</category>
      <category>cs.GR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Jing Ou, Zidong Cao, Yinrui Ren, Zhuoxiao Li, Jinjing Zhu, Tongyan Hua, Shuai Zhang, Hui Xiong, Wufan Zhao</dc:creator>
    </item>
    <item>
      <title>Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond</title>
      <link>https://arxiv.org/abs/2604.22748</link>
      <description>arXiv:2604.22748v2 Announce Type: replace 
Abstract: As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term world model carries different meanings across research communities. We introduce a "levels x laws" taxonomy organized along two axes. The first defines three capability levels: L1 Predictor, which learns one-step local transition operators; L2 Simulator, which composes them into multi-step, action-conditioned rollouts that respect domain laws; and L3 Evolver, which autonomously revises its own model when predictions fail against new evidence. The second identifies four governing-law regimes: physical, digital, social, and scientific. These regimes determine what constraints a world model must satisfy and where it is most likely to fail. Using this framework, we synthesize over 400 works and summarize more than 100 representative systems spanning model-based reinforcement learning, video generation, web and GUI agents, multi-agent social simulation, and AI-driven scientific discovery. We analyze methods, failure modes, and evaluation practices across level-regime pairs, propose decision-centric evaluation principles and a minimal reproducible evaluation package, and outline architectural guidance, open problems, and governance challenges. The resulting roadmap connects previously isolated communities and charts a path from passive next-step prediction toward world models that can simulate, and ultimately reshape, the environments in which agents operate.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.22748v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Meng Chu, Xuan Billy Zhang, Kevin Qinghong Lin, Lingdong Kong, Jize Zhang, Teng Tu, Weijian Ma, Ziqi Huang, Senqiao Yang, Wei Huang, Yeying Jin, Zhefan Rao, Jinhui Ye, Xinyu Lin, Xichen Zhang, Qisheng Hu, Shuai Yang, Leyang Shen, Wei Chow, Yifei Dong, Fengyi Wu, Quanyu Long, Bin Xia, Shaozuo Yu, Mingkang Zhu, Wenhu Zhang, Jiehui Huang, Haokun Gui, Runyi Li, Shiyi Du, Xu Huang, Dong Huang, Rui Liu, Chenyu Tang, Xuhang Chen, Chengzu Li, Haoxuan Che, Long Chen, Qifeng Chen, Wenxuan Zhang, Wenya Wang, Xiaojuan Qi, Yang Deng, Yanwei Li, Mike Zheng Shou, Zhi-Qi Cheng, See-Kiong Ng, Ziwei Liu, Philip Torr, Jiaya Jia</dc:creator>
    </item>
    <item>
      <title>ML-Guided Primal Heuristics for Mixed Binary Quadratic Programs</title>
      <link>https://arxiv.org/abs/2604.23053</link>
      <description>arXiv:2604.23053v2 Announce Type: replace 
Abstract: Mixed Binary Quadratic Programs (MBQPs) are an important and complex set of problems in combinatorial optimization. As solving large-scale combinatorial optimization problems is challenging, primal heuristics have been developed to identify high-quality solutions quickly. Recently, a growing body of research has also used machine learning to accelerate solution methods for challenging combinatorial optimization problems. Despite the increasing popularity of these ML-guided methods, most of this work has focused on Mixed-Integer Linear Programs (MILPs). MBQPs are challenging to solve due to the combinatorial complexity coupled with nonlinearities. This work proposes ML-guided primal heuristics for MBQPs by adapting and extending existing work on ML-guided MILP solution prediction. We introduce a new neural network architecture for MBQP solution prediction and a new training data collection procedure. Moreover, we extend existing loss functions in solution prediction and propose to combine contrastive and weighted cross-entropy losses. We evaluate the methods on standard and real-world MBQP benchmarks and show that the developed ML-guided methods significantly outperform existing primal heuristics and state-of-the-art solvers. Furthermore, models trained with our proposed extension with combined losses outperform other ML-based methods adapted from MILPs and improve generalization in cross-regional inference on a real-world wind farm layout optimization problem.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.23053v2</guid>
      <category>cs.LG</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Weimin Huang, Natalie M. Isenberg, Ján Drgoňa, Draguna L Vrabie, Bistra Dilkina</dc:creator>
    </item>
    <item>
      <title>Urban Flood Observations: A hand-labeled training and validation dataset of post-flood inundation</title>
      <link>https://arxiv.org/abs/2604.23066</link>
      <description>arXiv:2604.23066v2 Announce Type: replace 
Abstract: Urban flooding affects lives and infrastructure worldwide. Mapping inundation in complex urban environments from satellite imagery remains challenging due to limited spatial resolution, infrequent acquisitions, and cloud cover. We present Urban Flood Observations (UFO), a global, hand-labeled dataset of post-flood inundation in diverse urban settings. UFO comprises 215 image chips (1024 by 1024 pixels) from 14 flood events between 2017 and 2021, derived from 3 m PlanetScope imagery. Each chip is annotated with two classes: 'inundated' (all visible surface water, including floodwater and pre-existing permanent or seasonal water bodies) and 'non-inundated'. To demonstrate the dataset's utility, we trained a segmentation model using leave-one-event-out cross-validation, achieving a mean Intersection over Union (IoU) of 77.3. We also used UFO to evaluate two widely used surface water products, the Sentinel-1-based NASA IMPACT model and Google's 10 m Dynamic World water class, which yielded IoUs of 44.1 and 48.1, respectively. UFO is publicly available to support the development and validation of urban inundation mapping methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.23066v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rohit Mukherjee, Hannah K. Friedrich, Beth Tellman, Ariful Islam, Zhijie Zhang, Jonathan Giezendanner, Upmanu Lall, Venkataraman Lakshmi</dc:creator>
    </item>
    <item>
      <title>Knee-xRAI: An Explainable AI Framework for Automatic Kellgren-Lawrence Grading of Knee Osteoarthritis</title>
      <link>https://arxiv.org/abs/2604.23435</link>
      <description>arXiv:2604.23435v2 Announce Type: replace 
Abstract: Grading knee osteoarthritis (KOA) on plain radiographs is poorly reproducible across readers. A single-grade disagreement on the Kellgren-Lawrence (KL) scale can alter surgical management or redirect a patient from conservative therapy to intra-articular injection. Meanwhile, deep learning models that outperform human readers often offer no explanation for their decisions. We present Knee-xRAI, a pipeline that decomposes the grading process by mimicking clinical radiological workflows. It independently measures joint space narrowing (JSN), osteophytes, and subchondral sclerosis, then combines these findings into an explainable KL grade. Specifically, a U-Net++ architecture quantifies JSN via contour segmentation, an SE-ResNet-50 multi-task network grades osteophytes per anatomical site on the OARSI scale, and a hybrid texture-CNN detects binary sclerosis. This pipeline yields a 50-dimensional feature vector evaluated via an XGBoost-SHAP classifier (Path A, audit) and a ConvNeXt hybrid predictor (Path B, deployed). On 8,260 OAI-derived radiographs, the JSN module achieved a Dice score of 0.8909 and an mJSW ICC of 0.8674. Path A reached a QWK of 0.6294 and an AUC of 0.8046, confirming the structured feature vector carries substantial diagnostic signal. Path B achieved a QWK of 0.8436 and an AUC of 0.9017. SHAP analysis identifies JSN as the dominant feature, with osteophytes adding a consistent increment and sclerosis contributing marginally. Removing JSN evidence collapses KL3-KL4 recall while early grades remain intact, aligning with the KL diagnostic criteria. Knee-xRAI grounds every prediction in an auditable chain of measured radiographic findings, providing clinical transparency at the point of care.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.23435v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Azmul A. Irfan, Nur Ahmad Khatim, Alfan Alfian Irfan, Achmad Zaki, Erike A. Suwarsono, Mansur M. Arief</dc:creator>
    </item>
    <item>
      <title>Learning is Revelation in Disguise: Optimal Regret and Equivalence Results for Dynamic Pricing</title>
      <link>https://arxiv.org/abs/2604.24093</link>
      <description>arXiv:2604.24093v2 Announce Type: replace 
Abstract: We study dynamic pricing where a seller repeatedly interacts with a strategic, non-myopic buyer who has a fixed private valuation and discounts future utility. Prior work focused exclusively on posted-price mechanisms, where the seller gives a take-it-or-leave-it offer. For our first result, we show that menu mechanisms consisting of allocation-payment contracts achieve $O(T_\gamma)$ regret, where $T_\gamma$ is the buyer's effective discounted time horizon. We also establish an $\Omega(T_\gamma)$ lower bound, demonstrating the bound is tight. For a geometrically discounting buyer with a constant discount factor, our bound is $O(1)$, while prior bounds using posted-price mechanisms incur an unavoidable $\Omega(\log\log T)$ factor in regret. Our second contribution is more conceptual in nature. The problem of dynamic pricing sits at the intersection of two paradigms: learning with strategic agents in computer science / machine learning and revelation-principle-based mechanism design in economics, yet their relationship has remained unclear. We establish a fundamental equivalence: indirect learning-based mechanisms and direct revelation mechanisms achieve identical optimal regret. The adaptive, data-driven algorithms of online learning and explicit type elicitation are two languages for solving the same problem.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.24093v2</guid>
      <category>cs.GT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shiliang Zuo</dc:creator>
    </item>
    <item>
      <title>Speech Enhancement Based on Drifting Models</title>
      <link>https://arxiv.org/abs/2604.24199</link>
      <description>arXiv:2604.24199v4 Announce Type: replace 
Abstract: We propose Speech Enhancement based on Drifting Models (DriftSE), a novel generative framework that formulates denoising as an equilibrium problem. Rather than relying on iterative sampling, DriftSE natively achieves one-step inference by evolving the pushforward distribution of a mapping function to directly match the clean speech distribution. This evolution is driven by a Drifting Field, a learned correction vector that guides samples toward the high-density regions of the clean distribution, which naturally facilitates training on unpaired data by matching distributions rather than paired samples. We investigate the framework under two formulations: a direct mapping from the noisy observation, and a stochastic conditional generative model from a Gaussian prior. Experiments on the VoiceBank-DEMAND benchmark demonstrate that DriftSE achieves high-fidelity enhancement in a single step, outperforming multi-step diffusion baselines and establishing a new paradigm for speech enhancement.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.24199v4</guid>
      <category>cs.SD</category>
      <category>cs.AI</category>
      <category>eess.AS</category>
      <category>eess.SP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Liang Xu, Diego Caviedes-Nozal, W. Bastiaan Kleijn, Longfei Felix Yan, Rasmus Kongsgaard Olsson</dc:creator>
    </item>
    <item>
      <title>RAS: a Reliability Oriented Metric for Automatic Speech Recognition</title>
      <link>https://arxiv.org/abs/2604.24278</link>
      <description>arXiv:2604.24278v4 Announce Type: replace 
Abstract: Automatic speech recognition systems often produce confident yet incorrect transcriptions under noisy or ambiguous conditions, which can be misleading for both users and downstream applications. Standard evaluation based on Word Error Rate focuses solely on accuracy and fails to capture transcription reliability. We introduce an abstention-aware transcription framework that enables ASR models to explicitly abstain from uncertain segments. To evaluate reliability under abstention, we propose RAS, a reliability-oriented metric that balances transcription informativeness and error aversion, with its trade-off parameter calibrated by human preference. We then train an abstention-aware ASR model through supervised bootstrapping followed by reinforcement learning. Our experiments demonstrate substantial improvements in transcription reliability while maintaining competitive accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.24278v4</guid>
      <category>cs.SD</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wenbin Huang, Yuhang Qiu, Bohan Li, Yiwei Guo, Jing Peng, Hankun Wang, Xie Chen, Kai Yu</dc:creator>
    </item>
    <item>
      <title>Advancing Ligand-based Virtual Screening and Molecular Generation with Pretrained Molecular Embedding Distance</title>
      <link>https://arxiv.org/abs/2604.24474</link>
      <description>arXiv:2604.24474v2 Announce Type: replace 
Abstract: Molecular similarity plays a central role in ligand-based drug discovery, such as virtual screening, analog searching, and goal-directed molecular generation. However, traditional similarity measures, ranging from fingerprint-based Tanimoto coefficients to 3D shape overlays, are often computationally expensive at scale or rely on hand-crafted molecular descriptors. Meanwhile, many deep learning approaches to similarity-aware design still depend on similarity-specific supervision or costly data curation, limiting their generality across targets. In this work, we propose pretrained embedding distance (PED) as an effective alternative, computed directly from pretrained molecular models without task-specific training. Experimental results show that PED exhibits distinct correlations with traditional similarity metrics, and performs effectively in both ranking molecules for virtual screening and guiding molecular generation via reward design. These findings suggest that pretrained molecular embeddings capture rich structural information and can serve as a promising and scalable similarity measurement for modern AI-aided drug discovery.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.24474v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shiyun Wa, Yifei Wang, Simone Sciabola, Ye Wang</dc:creator>
    </item>
    <item>
      <title>Skill Retrieval Augmentation for Agentic AI</title>
      <link>https://arxiv.org/abs/2604.24594</link>
      <description>arXiv:2604.24594v3 Announce Type: replace 
Abstract: As large language models (LLMs) evolve into agentic problem solvers, they increasingly rely on external, reusable skills to handle tasks beyond their native parametric capabilities. In existing agent systems, the dominant strategy for incorporating skills is to explicitly enumerate available skills within the context window. However, this strategy fails to scale: as skill corpora expand, context budgets are consumed rapidly, and the agent becomes markedly less accurate in identifying the right skill. To this end, this paper formulates Skill Retrieval Augmentation (SRA), a new paradigm in which agents dynamically retrieve, incorporate, and apply relevant skills from large external skill corpora on demand. To make this problem measurable, we construct a large-scale skill corpus and introduce SRA-Bench, the first benchmark for decomposed evaluation of the full SRA pipeline, covering skill retrieval, skill incorporation, and end-task execution. SRA-Bench contains 5,400 capability-intensive test instances and 636 manually constructed gold skills, which are mixed with web-collected distractor skills to form a large-scale corpus of 26,262 skills. Extensive experiments show that retrieval-based skill augmentation can substantially improve agent performance, validating the promise of the paradigm. At the same time, we uncover a fundamental gap in skill incorporation: current LLM agents tend to load skills at similar rates, regardless of whether a gold skill is retrieved or whether the task actually requires external capabilities. This shows that the bottleneck in skill augmentation lies not only in retrieval but also in the base model's ability to determine which skill to load and when external loading is actually needed. These findings position SRA as a distinct research problem and establish a foundation for the scalable augmentation of capabilities in future agent systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.24594v3</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Weihang Su, Jianming Long, Qingyao Ai, Qiaozhi He, Yichen Tang, Changyue Wang, Yiteng Tu, Yingbo Wang, Yiqun Liu</dc:creator>
    </item>
    <item>
      <title>Sketch2Arti: Sketch-based Articulation Modeling of CAD Objects</title>
      <link>https://arxiv.org/abs/2604.25781</link>
      <description>arXiv:2604.25781v3 Announce Type: replace 
Abstract: Articulation modeling aims to infer movable parts and their motion parameters for a 3D object, enabling interactive animation, simulation, and shape editing. In this paper, we present Sketch2Arti, the first sketch-based articulation modeling system for CAD objects. Our key observation is that designers naturally communicate articulation intent through lightweight sketches (e.g., arrows and strokes) that indicate how parts should move, yet translating such sketches into articulated 3D models remains largely manual. Sketch2Arti bridges this gap by enabling users to specify articulation through simple 2D sketches drawn from a chosen viewpoint. Given a CAD model and user sketches, our approach automatically discovers the corresponding movable parts and predicts their motion parameters, allowing iterative modeling of multiple articulations on complex objects with fine-grained control. Importantly, Sketch2Arti is trained in a category-agnostic manner without requiring object category information, leading to strong generalization to diverse objects beyond existing articulation datasets. Moreover, for shell models lacking interior structures, Sketch2Arti supports controllable internal completion guided by user sketches, generating plausible internal components consistent with the existing geometry and predicted motion constraints. Comprehensive experiments and user evaluations demonstrate the effectiveness, controllability, and generalization of Sketch2Arti. The code, dataset, and the prototype system are at https://arlo-yang.github.io/Sketch2Arti.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.25781v3</guid>
      <category>cs.CV</category>
      <category>cs.GR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Yi Yang, Hao Pan, Yijing Cui, Alla Sheffer, Changjian Li</dc:creator>
    </item>
    <item>
      <title>CacheRAG: A Semantic Caching System for Retrieval-Augmented Generation in Knowledge Graph Question Answering</title>
      <link>https://arxiv.org/abs/2604.26176</link>
      <description>arXiv:2604.26176v4 Announce Type: replace 
Abstract: The integration of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) has significantly advanced Knowledge Graph Question Answering (KGQA). However, existing LLM-driven KGQA systems act as stateless planners, generating retrieval plans in isolation without exploiting historical query patterns: analogous to a database system that optimizes every query from scratch without a plan cache. This fundamental design flaw leads to schema hallucinations and limited retrieval coverage. We propose CacheRAG, a systematic cache-augmented architecture for LLM-based KGQA that transforms stateless planners into continual learners. Unlike traditional database plan caching (which optimizes for frequency), CacheRAG introduces three novel design principles tailored for LLM contexts: (1) Schema-agnostic user interface: A two-stage semantic parsing framework via Intermediate Semantic Representation (ISR) enables non-expert users to interact purely in natural language, while a Backend Adapter grounds the LLM with local schema context to compile executable physical queries safely. (2) Diversity-optimized cache retrieval: A two-layer hierarchical index (Domain $\rightarrow$ Aspect) coupled with Maximal Marginal Relevance (MMR) maximizes structural variety in cached examples, effectively mitigating reasoning homogeneity. (3) Bounded heuristic expansion: Deterministic depth and breadth subgraph operators with strict complexity guarantees significantly enhance retrieval recall without risking unbounded API execution. Extensive experiments on multiple benchmarks demonstrate that CacheRAG significantly outperforms state-of-the-art baselines (e.g., +13.2% accuracy and +17.5% truthfulness on the CRAG dataset).</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.26176v4</guid>
      <category>cs.DB</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yushi Sun, Lei Chen</dc:creator>
    </item>
    <item>
      <title>Structure-Aware Tensorial Model Reduction</title>
      <link>https://arxiv.org/abs/2604.26280</link>
      <description>arXiv:2604.26280v2 Announce Type: replace 
Abstract: This work investigates a two-stage method for constructing projection-based reduced-order models (ROMs) of parameterized partial differential equations (PDEs). Based on established tensorial ROM methodology, the proposed approach reduces dimensionality offline by encoding solution snapshots using a multi-linear Tucker factorization, so that a reduced basis which varies nonlinearly with PDE parameters can be rapidly constructed online and used in a Galerkin ROM. Two novel extensions of this strategy, tailored to the cases of structured PDEs and sparse parameter sampling, are presented: the construction of reduced bases orthonormalized with respect to a general discrete inner product, and the interpolation of encoded states via radial basis functions. Basic representation and ROM error estimates are presented demonstrating the validity of these modifications, and the approach is challenged on examples where monolithic-basis ROMs are known to struggle, including a realistic instance of Maxwell's equations in 3D. Results suggest that the proposed nonlinear basis ROM can effectively mitigate linear restrictions on Kolmogorov $n$-width while improving upon previous tensorial ROM technology, particularly in the highly nonlinear and data-limited regimes characteristic of practical use cases.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.26280v2</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <category>math.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Arjun Vijaywargiya, Eric C. Cyr, Anthony Gruber</dc:creator>
    </item>
    <item>
      <title>Distributional Learning of Graph Languages Generated by Fixed-Interface Clause Systems</title>
      <link>https://arxiv.org/abs/2604.26333</link>
      <description>arXiv:2604.26333v2 Announce Type: replace 
Abstract: Distributional learning provides a useful framework for studying the learnability of structured languages from positive data. In this paper, we extend this framework to graph languages generated by fixed-interface clause systems (FICSs).
  We formulate FICSs explicitly and study the corresponding learning problem under positive presentations and membership queries. We consider a bounded class of graph languages satisfying the finite context property (FCP) under a bounded-degree assumption. The bounds are expressed by the degree bound $\Delta$ together with five structural parameters $m,s,t,w$, and $d$, which control the clause-system structure, interface ranks, and local head-frame complexity.
  The learning algorithm constructs hypotheses from ordered boundary representations induced by the observed positive examples. These representations make explicit the interface information needed to compare contexts and to test candidate clauses by membership queries. We prove that target contexts eventually appear in the observed sample, target clauses are reconstructed over the corresponding predicate representatives, and spurious non-fact clauses are eventually excluded. Consequently, for every fixed parameter tuple, the target language is identifiable in the limit from positive data and membership queries.
  We also prove that the learner has polynomial-time update on $\FICSLFCP_{\Delta}(m,s,t,w,d)$: at each stage, only polynomially many ordered boundary representations, predicate symbols, clause candidates, and membership queries are needed. Overall, the paper gives a parameterized reformulation of distributional learning for interface-based graph languages in a fixed-interface setting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.26333v2</guid>
      <category>cs.FL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Takayoshi Shoudai, Satoshi Matsumoto, Yusuke Suzuki, Tomoyuki Uchida</dc:creator>
    </item>
    <item>
      <title>Do Larger Models Really Win in Drug Discovery? A Benchmark Assessment of Model Scaling in AI-Driven Molecular Property and Activity Prediction</title>
      <link>https://arxiv.org/abs/2604.26498</link>
      <description>arXiv:2604.26498v3 Announce Type: replace 
Abstract: The rapid growth of molecular foundation models and large language models (LLMs) has encouraged a scale-centred view of AI in drug discovery, in which larger pretrained models are expected to supersede compact cheminformatics models. We test this assumption across 26 ADME, toxicity and bioactivity endpoints, covering 165,541 endpoint-level compound-label records. The benchmark contains 78 endpoint and split entries evaluated under random, Murcko scaffold and structure-separated 5-fold cross-validation protocols, representing increasing chemical generalization difficulty. Across 156 task and metric comparisons, classical machine learning (ML) provides the largest share of best-performing entries (47.4%), followed by pretrained molecular sequence models (28.8%), graph neural networks (21.8%) and LLM-based SAR baselines (1.9%). Classical ML dominates random-split interpolation and remains the largest winner family overall. GNN and sequence models are competitive in selected harder splits, but their strict winner shares decrease under a fixed final-window readout, indicating sensitivity to training settings and model selection. Paired bootstrap analyses show that small numerical differences between individual models should not be read as decisive victories. SAR knowledge from training folds improves GPT5.5-SAR and Opus4.7-SAR metrics but does not make rule-based reasoning a universal substitute for supervised predictors. Compact specialized models remain highly effective, and predictive performance depends on the fit among model, task and validation scenario, not on scale alone.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.26498v3</guid>
      <category>cs.LG</category>
      <category>q-bio.QM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jinjiang Guo, Sheng Ding</dc:creator>
    </item>
    <item>
      <title>Exploring Converter Control Duality in Microgrids: AC Grid-Forming vs DC Droop Control</title>
      <link>https://arxiv.org/abs/2604.26595</link>
      <description>arXiv:2604.26595v2 Announce Type: replace 
Abstract: Power electronic converters are fundamental building blocks of both AC and DC microgrids, enabling the integration of renewable energy sources, energy storage systems, electronic loads, and electric vehicles. In contrast, converter control in DC microgrids has developed along the path of droop control, which is widely adopted for decentralized DC-bus voltage regulation and power sharing. Although these control strategies share certain characteristics, their similarities remain largely unexplored due to the distinct physical domains in which they operate. To bridge this gap, we introduce a novel perspective based on the concept of duality to reveal the underlying isomorphism between the two control approaches. We show that AC grid-forming and DC I--V droop control are duals of each other in several aspects, including: (i) the small-signal model of the converter; (ii) the inner current control structure; (iii) power-sharing mechanisms based on the AC swing equation and DC capacitor power balance; and (iv) disturbance signals and dynamic response. Theoretical analysis, validated through simulations on simple converter setups, illustrates these dualities and provides new insights towards a unified control design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.26595v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:journal_reference>Proc. 2026 IEEE 8th International Conference on DC Microgrids (ICDCM), Xi'an, China, 2026</arxiv:journal_reference>
      <dc:creator>Jovan Krajacic, Ognjen Stanojev, Mario Schweizer, Orcun Karaca, Gabriela Hug, Vladan Lazarević</dc:creator>
    </item>
    <item>
      <title>Simple Self-Conditioning Adaptation for Masked Diffusion Models</title>
      <link>https://arxiv.org/abs/2604.26985</link>
      <description>arXiv:2604.26985v2 Announce Type: replace 
Abstract: Masked diffusion models (MDMs) generate discrete sequences by iterative denoising under an absorbing masking process. In standard masked diffusion, if a token remains masked after a reverse update, the model discards its clean-state prediction for that position. Thus, still-masked positions must be repeatedly inferred from the mask token alone. This design choice limits cross-step refinement. To address this limitation, this paper proposes a simple, yet effective, post-training adaptation for MDMs that conditions each denoising step on the model's own previous clean-state predictions. The resulting method, called Self-Conditioned Masked Diffusion Models (SCMDM), requires minimal architectural change, does not introduce a recurrent latent-state pathway, does not rely on an auxiliary reference model, and adds no extra denoiser evaluations during sampling. This is an important departure from partial self-conditioning approaches, which require expensive model training from scratch. In particular, the paper shows that partial self-conditioning, including the commonly used 50% dropout strategy for training self-conditioned models from scratch, is suboptimal in the post-training regime. Instead, once the model's self-generated clean-state estimates become informative, specialization to refinement is preferable to mixing conditional and unconditional objectives. SCMDM is evaluated across multiple domains, demonstrating consistent improvement over vanilla MDM baselines, achieving nearly a 50% reduction in generative perplexity on OWT-trained models (42.89 to 23.72), alongside strong improvements in discretized image synthesis quality, small molecular generation, and enhanced fidelity in genomic distribution modeling.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.26985v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Michael Cardei, Huu Binh Ta, Ferdinando Fioretto</dc:creator>
    </item>
    <item>
      <title>State-Dependent Lyapunov Analysis of Rank-1 Matrix Factorization</title>
      <link>https://arxiv.org/abs/2604.26993</link>
      <description>arXiv:2604.26993v2 Announce Type: replace 
Abstract: We study gradient descent for rank-1 matrix factorization through a state-dependent Lyapunov perspective. The central object is a parameterized quadratic certificate $I(\delta;\,\cdot)$ whose boundary-inward property induces a monotone state parameter $\delta_t$, thereby certifying that the trajectory is confined to a shrinking family of level sets. For certified initializations below the critical step size, this mechanism proves convergence to global minimizers. Above the critical step size, the same monotone-state mechanism instead leads to a balanced terminal regime; for a range of post-critical step sizes, the reduced dynamics exhibit period-2 behavior consistent with edge-of-stability phenomena.
  We further show that the scalar certificate is not an ad hoc algebraic construction: under structural axioms and a natural state-parameter normalization, it is uniquely determined by the monotonicity mechanism. Numerical experiments suggest that this state-dependent Lyapunov mechanism persists beyond the proved cases, including two-dimensional rank-1 approximation and quartic augmentations of scalar factorization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.26993v2</guid>
      <category>math.NA</category>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jaehong Moon</dc:creator>
    </item>
    <item>
      <title>Few-Shot Synthetic Accented Speech for ASR Fine-Tuning: What Helps and When?</title>
      <link>https://arxiv.org/abs/2604.27273</link>
      <description>arXiv:2604.27273v2 Announce Type: replace 
Abstract: Synthetic accented speech is a promising way to improve automatic speech recognition (ASR) when real accented recordings are scarce. We ask what makes such data useful for ASR fine-tuning: target-accent phoneme edits that expose the recognizer to accent-specific pronunciations, or random phoneme perturbations that act as augmentation in phoneme space. In a few-shot TTS pipeline, we compare LLM-generated accent edits with matched-rate random substitutions and oracle controls using ground-truth accented phonemes and prosody. Random substitutions recover much of the ASR gain: LLM target-accent edits improve over random by only a small margin, ground-truth phonemes stay close to the random baseline and nearly converge with it as the synthetic ASR fine-tuning set grows larger, and adding ground-truth prosody yields only a modest further gain. Mixing synthetic with real accented speech also stabilizes low-resource fine-tuning, but a fixed synthetic budget can later dilute the information in real data, showing that the real--synthetic ratio matters.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.27273v2</guid>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yurii Halychanskyi, Nimet Beyza Bozdag, Mark Hasegawa-Johnson, Dilek Hakkani-Tür, Volodymyr Kindratenko</dc:creator>
    </item>
    <item>
      <title>Secure Cross-Silo Synthetic Genomic Data Generation</title>
      <link>https://arxiv.org/abs/2604.27456</link>
      <description>arXiv:2604.27456v2 Announce Type: replace 
Abstract: Access to genomic data is highly regulated due to its sensitive nature. While safeguards are essential, cumbersome data access processes pose a significant barrier to the development of AI methods for genomics. Synthetic data generation can mitigate this tension by enabling broader data sharing without exposing sensitive information. Synthetic genomic data are produced by training generative models on real data and subsequently sampling artificial data that preserves relevant statistics while limiting disclosures about the underlying individuals. In some settings, a single data holder may have sufficient data to train such generative models; however, in many applications data must be combined across multiple sites to achieve adequate scale. This need arises, e.g., in rare disease studies, where individual hospitals typically hold data for only a small number of patients. The solution we present in this paper enables multiple data holders to jointly train a synthetic data generator without revealing their raw data. Our approach combines secure multiparty computation (MPC) to ensure input privacy, so that no party ever discloses its data in unencrypted form, with differential privacy (DP) to provide output privacy by mitigating information leakage from the released synthetic data. We empirically demonstrate the effectiveness of the proposed method by generating high-utility synthetic datasets from multiple real RNA-seq cohorts in federated settings, showing that our approach enables privacy-preserving data synthesis even when data are distributed across institutions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.27456v2</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Daniil Filienko, Martine De Cock, Sikha Pentyala</dc:creator>
    </item>
    <item>
      <title>EdgeFM: Efficient Edge Inference for Vision-Language Models</title>
      <link>https://arxiv.org/abs/2604.27476</link>
      <description>arXiv:2604.27476v2 Announce Type: replace 
Abstract: Vision-language models (VLMs) have demonstrated strong applicability in edge industrial applications, yet their deployment remains severely constrained by requirements for deterministic low latency and stable execution under resource limitations. Existing frameworks either rely on bloated general-purpose designs or force developers into opaque, hardware-specific closed-source ecosystems, leading to hardware lock-in and poor cross-platform adaptability. Observing that modern AI agents can efficiently search and tune configurations to generate highly optimized low-level kernels for standard LLM operators, we propose EdgeFM, a lightweight, agent-driven VLM/LLM inference framework tailored for cross-platform industrial edge deployment. EdgeFM removes non-essential features to reduce single-request latency, and encapsulates agent-tuned kernel optimizations as a modular library of reusable skills. By allowing direct invocation of these skills rather than waiting for closed-source implementations, it effectively closes the performance gap long dominated by proprietary toolchains. The framework natively supports mainstream platforms including x86 and NVIDIA Orin SoCs, and represents the first end-to-end VLA deployment on the domestic Horizon Journey platform, enhancing cross-platform portability. In most cases, it yields clearly better inference performance than conventional vendor-specific toolchains, achieving a speedup of up to 1.49 times over TensorRT-Edge-LLM on the NVIDIA Orin platform. Experimental results show that EdgeFM delivers favorable end-to-end inference performance, providing an open-source, production-grade solution for diverse edge industrial scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.27476v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mengling Deng, Yuanpeng Chen, Sheng Yang, Wei Tao, Wenhai Zhang, Hui Song, Linyuanhao Qin, Kai Zhao, Xiaojun Ye, Shanhui Mo, Jingli Fan, Shuang Zhang, Bei Liu, Tiankun Zhao, Xiangjing An</dc:creator>
    </item>
    <item>
      <title>Hyper-Dimensional Fingerprints as Molecular Representations</title>
      <link>https://arxiv.org/abs/2604.27810</link>
      <description>arXiv:2604.27810v2 Announce Type: replace 
Abstract: Computational molecular representations underpin virtual screening, property prediction, and materials discovery. Conventional fingerprints are efficient and deterministic but lose structural information through hash-based compression, particularly at low dimensionalities. Learned representations from graph neural networks recover this expressiveness but require task-specific training and substantial computational resources. Here we introduce hyperdimensional fingerprints (HDF), which replace the learned transformations of message-passing neural networks with algebraic operations on high-dimensional vectors, producing deterministic molecular representations without any training. Across diverse property prediction benchmarks, HDF outperforms conventional fingerprints in the majority of tasks while exhibiting greater consistency across datasets and models. Crucially, HDF embeddings preserve molecular similarity faithfully: at 32 dimensions, distances in HDF space achieve a 0.9 Pearson correlation with graph edit distance, compared to 0.55 for Morgan fingerprints at equivalent size. This structural fidelity persists at low dimensions where hash-based methods degrade, allowing simple nearest-neighbor regression to remain predictive with as few as 64 components. We further demonstrate the practical impact in Bayesian molecular optimization, where HDF-based surrogate models achieve substantially improved sample efficiency in regimes where Morgan fingerprints perform comparably to random search. HDF thus provides a general-purpose, training-free alternative to conventional molecular fingerprints, suggesting that the information loss long accepted as inherent to fixed-length fingerprints is a limitation of the hash-based encoding scheme rather than the fingerprint paradigm itself.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.27810v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jonas Teufel, Luca Torresi, André Eberhard, Pascal Friederich</dc:creator>
    </item>
    <item>
      <title>When Do Diffusion Models Learn to Generate Multiple Objects?</title>
      <link>https://arxiv.org/abs/2605.00273</link>
      <description>arXiv:2605.00273v2 Announce Type: replace 
Abstract: Text-to-image diffusion models achieve impressive visual fidelity, yet they remain unreliable in multi-object generation. Despite extensive empirical evidence of these failures, the underlying causes remain unclear. We begin by asking how much of this limitation arises from the data itself. To disentangle data effects, we consider two regimes across different dataset sizes: (1) concept generalization, where each individual concept is observed during training under potentially imbalanced data distributions, and (2) compositional generalization, where specific combinations of concepts are systematically held out. To study these regimes, we introduce mosaic (Multi-Object Spatial relations, AttrIbution, Counting), a controlled framework for dataset generation. By training diffusion models on mosaic, we find that scene complexity plays a dominant role rather than concept imbalance, and that counting is uniquely difficult to learn in low-data regimes. Moreover, compositional generalization collapses as more concept combinations are held out during training. These findings highlight fundamental limitations of diffusion models and motivate stronger inductive biases and data design for robust multi-object compositional generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.00273v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yujin Jeong, Arnas Uselis, Iro Laina, Seong Joon Oh, Anna Rohrbach</dc:creator>
    </item>
    <item>
      <title>DynamicPO: Dynamic Preference Optimization for Recommendation</title>
      <link>https://arxiv.org/abs/2605.00327</link>
      <description>arXiv:2605.00327v2 Announce Type: replace 
Abstract: In large language model (LLM)-based recommendation systems, direct preference optimization (DPO) effectively aligns recommendations with user preferences, requiring multi-negative objective functions to leverage abundant implicit-feedback negatives and sharpen preference boundaries. However, our empirical analyses reveal a counterintuitive phenomenon, preference optimization collapse, where increasing the number of negative samples can lead to performance degradation despite a continuously decreasing training loss. We further theoretically demonstrate that this collapse arises from gradient suppression, caused by the dominance of easily discriminable negatives over boundary-critical negatives that truly define user preference boundaries. As a result, boundary-relevant signals are under-optimized, weakening the model's decision boundary. Motivated by these observations, we propose DynamicPO (Dynamic Preference Optimization), a lightweight and plug-and-play framework comprising two adaptive mechanisms: Dynamic Boundary Negative Selection, which identifies and prioritizes informative negatives near the model's decision boundary, and Dual-Margin Dynamic beta Adjustment, which calibrates optimization strength per sample according to boundary ambiguity. Extensive experiments on three public datasets show that DynamicPO effectively prevents optimization collapse and improves recommendation accuracy on multi-negative preference optimization methods, with negligible computational overhead. Our code and datasets are available at https://github.com/xingyuHuxingyu/DynamicPO.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.00327v2</guid>
      <category>cs.IR</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xingyu Hu, Kai Zhang, Jiancan Wu, Shuli Wang, Chi Wang, Wenshuai Chen, Yinhua Zhu, Haitao Wang, Xingxing Wang, Xiang Wang</dc:creator>
    </item>
    <item>
      <title>From Backward Spreading to Forward Replay: Revisiting Target Construction in LLM Parameter Editing</title>
      <link>https://arxiv.org/abs/2605.00358</link>
      <description>arXiv:2605.00358v2 Announce Type: replace 
Abstract: LLM parameter editing methods commonly rely on computing an ideal target hidden state at a target layer (referred to as the anchor point) and distributing the target vector to multiple preceding layers (commonly known as backward spreading) for cooperative editing. Although this strategy has long been widely used, its underlying basis has not been systematically investigated. In this paper, we first conduct a systematic study of its foundations, which helps clarify its capability boundaries, practical considerations, and potential failure modes. Then, we propose a simple and elegant alternative that replaces backward spreading with forward propagation. Instead of optimizing the target at the last editing layer, we optimize the anchor point at the first editing layer, and then propagate it forward to obtain accurate and mutually compatible target hidden states for all subsequent editing layers. This approach achieves the same computational complexity as existing methods while producing more accurate layer-wise targets. Our method is simple and does not interfere with either the computation of the initial target hidden state or any other components of the subsequent editing pipeline, and thus can benefit a wide range of LLM parameter editing methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.00358v2</guid>
      <category>cs.CL</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wei Liu, Hongkai Liu, Zhiying Deng, Yee Whye Teh, Wee Sun Lee</dc:creator>
    </item>
    <item>
      <title>Label-Conditioned Cross-Modal Fusion for Adult-to-Pediatric ECG Transfer via Curriculum-Gated Contrastive Alignment</title>
      <link>https://arxiv.org/abs/2605.00647</link>
      <description>arXiv:2605.00647v2 Announce Type: replace 
Abstract: Automated pediatric electrocardiogram (ECG) interpretation remains challenging because developmental differences in heart rate, intervals, and waveforms limit the transferability of models trained mainly on adult data, while expert-labeled pediatric ECG cohorts are scarce. We propose PEACE (Pediatric-Adult ECG Alignment via Cross-modal Enhancement), an adult-to-pediatric ECG transfer framework pretrained on MIMIC-IV ECGs and adapted to pediatric targets. PEACE integrates label-specific bidirectional contrastive learning (LSBC) to align ECG representations with diagnostic semantics and curriculum adaptive fusion (CAF) to stabilize optimization under limited pediatric supervision. Label-conditioned short text descriptors provide auxiliary semantic supervision during training, whereas inference requires ECG signals only. On ZZU-pECG, PEACE achieves macro-average AUCs of 59.39%, 81.74%, and 91.56% under zero-shot, 50-shot, and full fine-tuning settings, respectively, outperforming ECG-only, multimodal, and generic domain adaptation baselines including DANN and MMD. On PTB-XL, it reaches 96.90% macro-average AUC after full fine-tuning over nine harmonized labels with nonzero mapped incidence. Gradient-based attention maps show increased saliency around QRS voltage and morphology regions for chamber-related RVH and around QRS-to-T/repolarization intervals for LQTS, broadly consistent with ECG regions commonly inspected during routine interpretation. These results suggest that adult-scale ECG pretraining coupled with rhythm, morphology, and ST-T repolarization semantic descriptors improves transferable pediatric diagnosis under label scarcity while preserving clinically interpretable waveform focus.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.00647v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Xinran Liu, Yuwen Li, Hongxiang Gao, Heyang Xu, Jianqing Li, Zongmin Wang, Chengyu Liu</dc:creator>
    </item>
    <item>
      <title>CADFit: Precise Mesh-to-CAD Program Generation with Hybrid Optimization</title>
      <link>https://arxiv.org/abs/2605.01171</link>
      <description>arXiv:2605.01171v3 Announce Type: replace 
Abstract: Despite recent progress, recovering parametric CAD construction sequences from geometric input, such as meshes or point clouds, is a key challenge for design and manufacturing, as existing CAD reconstruction and generation methods are largely restricted to difficult-to-edit formats like meshes or Breps or editable simple sketch-and-extrude pipelines and low-complexity datasets. We introduce CADFit, a hybrid optimization-based CAD reconstruction framework that recovers complex, editable CAD construction sequences from meshes by incrementally fitting and validating parametric operations using geometric feedback. Our approach is distinguished by formulating reconstruction as an IoU-driven optimization over structured CAD programs and supporting a rich set of operations, including extrusions, revolutions, fillets, and chamfers. Experiments on multiple CAD benchmarks show that CADFit outperforms state-of-the-art mesh-to-CAD methods in volumetric Intersection-over-Union and Chamfer Distance, while substantially reducing the Invalid Ratio of reconstructed CAD programs, particularly for complex designs. We further present a multimodal pipeline that enables end-to-end reconstruction of CAD construction sequences from images by combining image-based geometry reconstruction with CADFit. By enabling accurate reconstruction of higher-complexity CAD models, CADFit provides a practical foundation for generating richer datasets and advancing future learning-based approaches to CAD reverse engineering. The code is available at: https://github.com/ghadinehme/CADFit.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01171v3</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ghadi Nehme, Eamon Whalen, Faez Ahmed</dc:creator>
    </item>
    <item>
      <title>PACE: Post-Causal Entropy Modeling for Learned LiDAR Point Cloud Compression</title>
      <link>https://arxiv.org/abs/2605.01320</link>
      <description>arXiv:2605.01320v2 Announce Type: replace 
Abstract: LiDAR point cloud compression is vital for autonomous systems to handle massive data from high-resolution sensors. While learned entropy modeling built upon octree structures yields high compression gains, it faces two critical bottlenecks: 1) prohibitive latency, particularly during decoding, caused by causal, multi-stage context modeling; and 2) a rigid performance-latency trade-off, preventing a single model from adapting to varying constraints. These limitations stem from the tight coupling between the context aggregation backbone and probability prediction. To address this, we propose PACE, a new framework that reformulates ancestral context aggregation as a non-causal backbone and confines causality to a lightweight, stage-scalable predictor, eliminating repetitive backbone executions and reducing computational overhead. The predictor supports an arbitrary number of prediction stages, enabling seamless adaptation across diverse performance-latency trade-offs without reloading parameters. Experiments demonstrate that PACE sets a new state-of-the-art in compression efficiency, achieving notable BD-BR savings and reducing decoding latency by over 90% in autoregressive mode, making it attractive for practical applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01320v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiahao Zhu, Kang You, Dandan Ding, Zhan Ma</dc:creator>
    </item>
    <item>
      <title>Sequential Minimal Optimization for $\varepsilon$-SVR with MAPE Loss and Sample-Dependent Box Constraints</title>
      <link>https://arxiv.org/abs/2605.01446</link>
      <description>arXiv:2605.01446v3 Announce Type: replace 
Abstract: Support vector regression with Mean Absolute Percentage Error (MAPE) loss is theoretically well-motivated for forecasting applications where accuracy is evaluated in relative terms, but the sample-dependent dual box constraints it induces have not been addressed in the published SMO literature. We derive a Sequential Minimal Optimization algorithm for this setting and prove a structural-invariance result: the MAPE modification affects exactly two components of the SMO iteration (working-set selection and analytic-update clipping), leaving gradient bookkeeping and curvature computation identical to classical epsilon-SVR. Building on this invariance, we establish four efficiency improvements (asymmetric freeze-counters, warm-starting, block working-set updates of size four, and per-pair tolerance scaling) and resolve a previously open convergence problem for the odd-symmetry kernel variant via adaptive spectral regularization. Numerical validation against three reference solvers across eleven synthetic configurations certifies solution agreement within standard tolerance. Wall-time benchmarks show the present algorithm achieves the lowest median runtime on every tested configuration against OSQP, MOSEK, and Clarabel. At production scale, the algorithm converges on the California Housing benchmark while the patched LIBSVM reference implementation reaches its iteration ceiling without satisfying optimality, demonstrating the practical necessity of the theoretical efficiency mechanisms. An open-source R package and an explicit solver-adaptation recipe are provided.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01446v3</guid>
      <category>math.NA</category>
      <category>cs.NA</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Pablo Benavides-Herrera, Riemann Ruiz-Cruz, Juan Diego Sánchez-Torres</dc:creator>
    </item>
    <item>
      <title>Learning Behavioral Signals from Encrypted Smartphone Network Traffic</title>
      <link>https://arxiv.org/abs/2605.01616</link>
      <description>arXiv:2605.01616v2 Announce Type: replace 
Abstract: Human behavior is challenging to measure continuously at scale, yet traces of daily routines and well-being may be reflected in interactions with personal devices. We investigate whether encrypted smartphone network traffic can serve as a passive sensing signal for behavioral states related to sleep disturbance, stress, and loneliness. To capture both population-level patterns and individual-specific behavior, we employ a transformer-based model with user-specific adapters that learns representations of network activity while accounting for personal baselines and deviations from them. To improve interpretability, we further analyze these representations using sparse representation learning to identify latent behavioral features associated with distinct activity patterns. We relate the resulting features to sleep disturbance, stress, and loneliness using generalized estimating equations with Mundlak decomposition, enabling separation of stable between-person differences from within-person changes over time. Our analysis reveals that the three outcomes are characterized by different temporal dynamics: stress is predominantly associated with persistent between-person variation, loneliness is more strongly linked to within-person fluctuations, and sleep disturbance reflects a combination of both. Importantly, these within-person behavioral signals are not recovered by conventional handcrafted network-traffic features, highlighting the advantages of learned representations for longitudinal behavioral modeling. Overall, our findings demonstrate that encrypted network traffic contains interpretable behavioral information and can support passive, scalable monitoring of behavioral dynamics, particularly changes relative to an individual's typical pattern of activity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01616v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Rameen Mahmood, Omar El Shahawy, Souptik Barua, Zachary Beattie, Jeffrey Kaye, Xuhai "Orson" Xu, Chao-Yi Wu, Danny Yuxing Huang</dc:creator>
    </item>
    <item>
      <title>Embody4D: A Generalist Data Engine for Embodied 4D World Modeling</title>
      <link>https://arxiv.org/abs/2605.01799</link>
      <description>arXiv:2605.01799v2 Announce Type: replace 
Abstract: Embodied agents require robust and comprehensive 3D spatiotemporal representations to support spatial reasoning, manipulation understanding, and downstream decision making. However, existing robot data are typically captured from fixed or sparse viewpoints, providing only partial and view-dependent observations, which limits multi-view perception and generalization across viewpoints. Given the difficulty of collecting additional viewpoints in real-world settings, we propose Embody4D, a dedicated video-to-video world model for embodied scenarios to bridge this observation gap by transforming a monocular robot video into novel-view videos from flexible target camera viewpoints. First, to tackle training data scarcity, we introduce a 3D-aware compositional synthesis pipeline to curate a heterogeneous dataset compositing cross-embodiment robotic arms with diverse backgrounds, promoting broad generalization. Second, to enforce geometric stability, we devise a latent confidence-aware expert modulation strategy, which estimates the reliability of warped latent priors and adaptively routes regions to copy, repair, or inpaint experts for spatiotemporally consistent 4D generation. Finally, to enhance the fidelity of the manipulation, we incorporate an interaction-aware attention mechanism that explicitly attends to the robotic interaction regions. Extensive experiments show that Embody4D achieves state-of-the-art performance on visual evaluation benchmarks, while both simulated and real-world robotic experiments further demonstrate its effectiveness as a robust data engine for synthesizing high-fidelity, view-consistent videos that empower downstream robotic planning and learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.01799v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Peiyan Tu, Hanxin Zhu, Jingwen Sun, Shaojie Ren, Cong Wang, Yuyan Xu, Jiayi Luo, Xiaoqian Cheng, Zhibo Chen</dc:creator>
    </item>
    <item>
      <title>Anomaly-Preference Image Generation</title>
      <link>https://arxiv.org/abs/2605.02439</link>
      <description>arXiv:2605.02439v3 Announce Type: replace 
Abstract: Synthesizing realistic and diverse anomalous samples from limited data is vital for robust model generalization. However, existing methods struggle to reconcile fidelity and diversity, often hampered by distribution misalignment and overfitting, respectively. To mitigate this, we introduce Anomaly Preference Optimization, a novel paradigm that reformulates anomaly generation as a preference learning problem. Central to our approach is an implicit preference alignment mechanism that leverages real anomalies as positive references, deriving optimization signals directly from denoising trajectory deviations without requiring costly human annotation. Furthermore, we propose a Time-Aware Capacity Allocation module that dynamically distributes model capacity along the diffusion timeline, prioritizing structural diversity during high-noise phases while enhancing fine-grained fidelity in low-noise stages. During inference, a hierarchical sampling strategy modulates the coherence-alignment trade-off, enabling precise control over generation. Extensive experiments demonstrate that our approach significantly outperforms existing baselines, achieving state-of-the-art performance in both realism and diversity.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02439v3</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Fuyun Wang, Yuanzhi Wang, Xu Guo, Sujia Huang, Tong Zhang, Dan Wang, Hui Yan, Xin Liu, Zhen Cui</dc:creator>
    </item>
    <item>
      <title>Kernel Affine Hull Machines as Compute-Efficient Encoders for Frozen Semantic Spaces</title>
      <link>https://arxiv.org/abs/2605.02950</link>
      <description>arXiv:2605.02950v2 Announce Type: replace 
Abstract: Transformer-based semantic encoders are effective for retrieval, but in many deployments the recurring bottleneck is online query encoding rather than offline corpus indexing. This paper studies whether, once a strong teacher representation space and corpus index are fixed, repeated neural query encoding can be replaced by a substantially lighter and analytically explicit estimator. We formulate fixed-teacher lexical-to-semantic encoding as a conditional-mean estimation problem in which the target semantic vector is represented as a noisy mixture of semantic prototypes weighted by posterior cluster probabilities. Kernel Affine Hull Machine (KAHM) geometry is used to estimate these posterior weights from inexpensive lexical features in an explicitly identified RKHS hypothesis space, and the semantic prototypes are refined by normalized least-mean-squares updates from noisy teacher embeddings. This yields a backpropagation-free query-side encoder together with an end-to-end error decomposition into posterior-approximation, finite-sample/generalization, and teacher-noise terms. We instantiate the approach on a controlled Austrian-law retrieval benchmark with 5,000 test queries, 84 candidate laws, and 10,762 aligned retrieval units, using law-specific encoders into a frozen Mixedbread embedding space. Among evaluation-matched learned adapters, KAHM achieves the strongest teacher-space reconstruction and the best rank-sensitive retrieval performance at all evaluated cutoffs. At k=20, it obtains MRR@20 = 0.504, Hit@20 = 0.694, and Top-1 Accuracy = 0.411, while reducing online per-query time by 8.53 relative to direct transformer query encoding in the reported CPU setting. The results support KAHMs as compute-efficient encoders for supervised fixed-representation deployment regimes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.02950v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mohit Kumar, Somayeh Kargaran, Bernhard A. Moser, Manuela Geiß</dc:creator>
    </item>
    <item>
      <title>Neuron-Anchored Rule Extraction for Large Language Models via Contrastive Hierarchical Ablation</title>
      <link>https://arxiv.org/abs/2605.03058</link>
      <description>arXiv:2605.03058v2 Announce Type: replace 
Abstract: A central goal of explainable AI is to express large language model (LLM) decision logic symbolically and ground it in internal mechanisms. Existing rule-extraction methods usually learn ungrounded symbolic surrogates, while mechanistic interpretability links behavior to neurons but often requires hand-crafted hypotheses and costly interventions. We introduce MechaRule, a pipeline that grounds rule extraction in LLM circuits by localizing sparse agonist activations whose ablation disrupts rule-related behavior. MechaRule rests on two findings. First, in a fixed baseline/flip regime, sparse agonist effects can exhibit overtopping: a few high-effect activations remain detectable within larger groups, dominate weaker ones, and flip many of the same examples. In such regimes, adaptive group testing with confidence-guided conservative pruning requires O(k log(N/k) + k) interventions over N candidates when k &lt;&lt; N are agonists. Second, agonists are localized more reliably on data splits aligned with close-to-faithful rule behavior; spectral splits provide a rule-free fallback, whereas unfaithful splits degrade localization. Empirically, on arithmetic and jailbreaking, MechaRule recalls 97.0% of highest-effect agonists in matched brute-force validations at only 2.14% of exhaustive-ablation cost on average. Ablating the localized agonists eliminates 97.6-100.0% of eligible correct arithmetic answers and jailbreaks, and can correct arithmetic errors or induce jailbreaks by up to 72.8% and 32.5%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03058v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1145/3770855.3818091</arxiv:DOI>
      <dc:creator>Francesco Sovrano, Gabriele Dominici, Marc Langheinrich</dc:creator>
    </item>
    <item>
      <title>Self-Mined Hardness for Safety Fine-Tuning</title>
      <link>https://arxiv.org/abs/2605.03226</link>
      <description>arXiv:2605.03226v2 Announce Type: replace 
Abstract: Safety fine-tuning of language models typically requires a curated adversarial dataset. We take a different approach: score each candidate prompt's difficulty by how often the target model's own rollouts are judged harmful, then fine-tune on the hardest prompts paired with the model's own non-jailbroken rollouts. On Llama-3-8B-Instruct and Llama-3.2-3B-Instruct, this approach cuts the WildJailbreak attack success rate from 11.5% and 20.1% down to 1-3%, but pushes refusal on jailbreak-shaped benign prompts from 14-22% to 74-94%. Interleaving the same hard prompts 1:1 with adversarially-framed benign prompts (prompts that look like jailbreaks but have benign intent) cuts that refusal back down to 30-51% on 8B and 52-72% on 3B, at a cost of 2-6 percentage points of attack success rate. Within the mixed regime, training on the hardest half of the eligible pool rather than a random half cuts the remaining ASR by 35-50% (about 3 percentage points) on both models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03226v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Prakhar Gupta, Garv Shah, Donghua Zhang</dc:creator>
    </item>
    <item>
      <title>Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning</title>
      <link>https://arxiv.org/abs/2605.03229</link>
      <description>arXiv:2605.03229v2 Announce Type: replace 
Abstract: Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory layers to the model and, on each training step, updating only the small set of memory rows that the current batch reads most heavily. We re-implement SMF on Qwen-2.5-0.5B-Instruct and compare it with LoRA and full finetuning on MedMCQA, a 4-choice medical exam task, using WikiText perplexity and TriviaQA accuracy as forgetting probes. SMF improves MedMCQA by 2.5 percentage points while keeping both forgetting probes within roughly 1 point of the base model, whereas LoRA and full finetuning achieve larger gains but with clear drift on both. We also compare two row-selection rules (KL-divergence and TF-IDF), which balance the two forgetting metrics differently.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03229v2</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Prakhar Gupta, Garv Shah, Satyam Goyal, Anirudh Kanchi</dc:creator>
    </item>
    <item>
      <title>Population-Aware Imitation Learning in Mean-field Games with Common Noise</title>
      <link>https://arxiv.org/abs/2605.03357</link>
      <description>arXiv:2605.03357v2 Announce Type: replace 
Abstract: Mean Field Games (MFGs) provide a powerful framework for modeling the collective behavior of large populations of interacting agents. In this paper, we address the problem of Imitation Learning (IL) in MFGs subject to common noise, where the population distribution evolves stochastically. This stochasticity compels agents to adopt population-aware policies to respond to aggregate shocks. We formulate two distinct learning objectives: recovering a Nash equilibrium and maximizing performance against an expert population. We investigate two imitation proxies: Behavioral Cloning (BC) and Adversarial (ADV) divergence. We then establish finite-sample error bounds showing that minimizing these proxies effectively controls both the policy's exploitability and its performance gap relative to the expert. Furthermore, we propose a numerical framework using generalized Fictitious Play and Deep Learning to compute expert population-aware policies. Through experiments on three environments we demonstrate that standard population-unaware policies fail to capture the equilibrium dynamics. Our results highlight that learning population-aware policies is crucial to avoid being misled by the randomness inherent in common noise.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03357v2</guid>
      <category>cs.LG</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Grégoire Lambrecht, Mathieu Laurière</dc:creator>
    </item>
    <item>
      <title>APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music</title>
      <link>https://arxiv.org/abs/2605.03395</link>
      <description>arXiv:2605.03395v2 Announce Type: replace 
Abstract: Music popularity prediction has attracted growing research interest, with relevance to artists, platforms, and recommendation systems. However, the explosive rise of AI-generated music platforms has created an entirely new and largely unexplored landscape, where a surge of songs is produced and consumed daily without the traditional markers of artist reputation or label backing. Key, yet unexplored in this pursuit, is aesthetic quality. We propose APEX, the first large-scale multi-task learning framework for AI-generated music, trained on over 211k songs (10k hours of audio) from Suno and Udio, that jointly predicts engagement-based popularity signals - streams and likes scores - alongside five perceptual aesthetic quality dimensions from frozen audio embeddings extracted from MERT, a self-supervised music understanding model. Aesthetic quality and popularity capture complementary aspects of music that together prove valuable: in an out-of-distribution evaluation on the Music Arena dataset, comprising pairwise human preference battles across eleven generative music systems unseen during training, including aesthetic features consistently improves preference prediction, demonstrating strong generalisation of the learned representations across generative architectures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03395v2</guid>
      <category>cs.SD</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.MM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Jaavid Aktar Husain, Dorien Herremans</dc:creator>
    </item>
    <item>
      <title>Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards</title>
      <link>https://arxiv.org/abs/2605.03862</link>
      <description>arXiv:2605.03862v4 Announce Type: replace 
Abstract: Reinforcement learning with verifiable rewards has become a common way to improve explicit reasoning in large language models, but final-answer correctness alone does not reveal whether the reasoning trace is faithful, reliable, or useful to the model that consumes it. This outcome-only signal can reinforce traces that are right for the wrong reasons, overstate reasoning gains by rewarding shortcuts, and propagate flawed intermediate states in multi-step systems. To this end, we propose TraceLift, a planner-executor training framework that treats reasoning as a consumable intermediate artifact. During planner training, the planner emits tagged reasoning. A frozen executor turns this reasoning into the final artifact for verifier feedback, while an executor-grounded reward shapes the intermediate trace. This reward multiplies a rubric-based Reasoning Reward Model (RM) score by measured uplift on the same frozen executor, crediting traces that are both high-quality and useful. To make reasoning quality directly learnable, we introduce TRACELIFT-GROUPS, a rubric-annotated reason-only dataset built from math and code seed problems. Each example is a same-problem group containing a high-quality reference trace and multiple plausible flawed traces with localized perturbations that reduce reasoning quality or solution support while preserving task relevance. Extensive experiments on code and math benchmarks show that this executor-grounded reasoning reward improves the two-stage planner-executor system over execution-only training, suggesting that reasoning supervision should evaluate not only whether a trace looks good, but also whether it helps the model that consumes it. Our code is available at: https://github.com/MasaiahHan/TraceLift</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.03862v4</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tianyang Han, Hengyu Shi, Junjie Hu, Xu Yang, Zhiling Wang, Junhao Su</dc:creator>
    </item>
    <item>
      <title>Rethinking Local Learning: A Cheaper and Faster Recipe for LLM Post-Training</title>
      <link>https://arxiv.org/abs/2605.04913</link>
      <description>arXiv:2605.04913v4 Announce Type: replace 
Abstract: LLM post-training typically propagates task gradients through the full depth of the model. Although this end-to-end structure is simple and general, it couples task adaptation to full-depth activation storage, long-range backward dependencies and direct task-gradient access to pretrained representations. We argue that this full-depth backward coupling can be unnecessarily expensive and intrusive, particularly when post-training supervision is much narrower than pre-training. To this end, we propose LoPT: Local-Learning Post-Training, a simple post-training strategy that makes gradient reach an explicit design choice. LoPT places a single gradient boundary at the transformer midpoint: the second-half block learns from the task objective, while the first-half block is updated by a lightweight feature-reconstruction objective to preserve useful representations and maintain interface compatibility. LoPT shortens the task-induced backward path while limiting direct interference from narrow task gradients on early-layer representations. Extensive experiments demonstrate that LoPT achieves competitive performance with lower memory cost, higher training efficiency and better retention of pretrained capabilities. Our code is available at: https://github.com/HumyuShi/LoPT</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.04913v4</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hengyu Shi, Tianyang Han, Peizhe Wang, Zhiling Wang, Xu Yang, Junhao Su</dc:creator>
    </item>
    <item>
      <title>CPCANet: Deep Unfolding Common Principal Component Analysis for Domain Generalization</title>
      <link>https://arxiv.org/abs/2605.05136</link>
      <description>arXiv:2605.05136v3 Announce Type: replace 
Abstract: Domain Generalization (DG) aims to learn representations that remain robust under out-of-distribution (OOD) shifts and generalize effectively to unseen target domains. While recent invariant learning strategies and architectural advances have achieved strong performance, explicitly discovering a structured domain-invariant subspace through second-order statistics remains underexplored. In this work, we propose CPCANet, a novel framework grounded in Common Principal Component Analysis (CPCA), which unrolls the iterative Flury-Gautschi (FG) algorithm into fully differentiable neural layers. This approach integrates the statistical properties of CPCA into an end-to-end trainable framework, enforcing the discovery of a shared subspace across diverse domains while preserving interpretability. Experiments on four standard DG benchmarks demonstrate that CPCANet achieves state-of-the-art (SOTA) performance in zero-shot transfer. Moreover, CPCANet is architecture-agnostic and requires no dataset-specific tuning, providing a simple and efficient approach to learning robust representations under distribution shift. Code is available at https://github.com/wish44165/CPCANet.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05136v3</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yu-Hsi Chen, Abd-Krim Seghouane</dc:creator>
    </item>
    <item>
      <title>Executable World Models for ARC-AGI-3 in the Era of Coding Agents</title>
      <link>https://arxiv.org/abs/2605.05138</link>
      <description>arXiv:2605.05138v2 Announce Type: replace 
Abstract: We evaluate an initial coding-agent system for ARC-AGI-3 in which the agent maintains an executable Python world model, verifies it against previous observations, refactors it toward simpler abstractions as a practical proxy for an MDL-like simplicity bias, and plans through the model before acting. The system is intentionally direct: it uses a scripted controller, predefined world-model interfaces, verifier programs, and a plan executor, but no hand-coded game-specific logic. The agent-facing prompts, workspace, and controller contain no game-specific code, game-specific prompts, hand-coded heuristics, hidden solutions, or other game-specific information; the same agent and prompts are used across games. Because the coding agent has broad system access, we audit unintended information channels, describe earlier vulnerable harnesses, and explain how the current harness closes observed leakage channels while reducing benchmark-specific information exposure. We report results on the 25 public ARC-AGI-3 games. Each playthrough starts from a fresh agent instance and clean workspace, with no access to files or conversation state from earlier playthroughs. With GPT-5.5 high reasoning effort, the agent fully solved 15 games and achieved a mean per-game RHAE of 58.12%. With GPT-5.4 high reasoning effort, it fully solved 8 games and achieved a mean per-game RHAE of 41.29%. Performance on the private validation set, which is not yet available to us, remains to be tested. Overall, the results provide preliminary evidence that verifier-driven executable world models are a promising approach for ARC-AGI-3 agents. Full run artifacts are released with the code at https://github.com/astroseger/arc-3-agents-baseline1.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05138v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Sergey Rodionov</dc:creator>
    </item>
    <item>
      <title>FalconGEMM: Surpassing Hardware Peaks with Lower-Complexity Matrix Multiplication</title>
      <link>https://arxiv.org/abs/2605.06057</link>
      <description>arXiv:2605.06057v3 Announce Type: replace 
Abstract: Peak-breaking matrix multiplication is a promising technique to improve the performance of DL, especially in LLM training and inference. We present FalconGEMM, a cross-platform framework that automates the deployment, optimization, and selection of Lower-Complexity Matrix Multiplication Algorithms (LCMAs) across diverse hardware. There are three key innovations: (1) a Deployment Module that enables portable execution across various hardware and input configurations through code generation; (2) an Execution Module with Group-Parallel Optimizations that maximizes on-chip data reuse, utilizes parallel resources, and reduces bandwidth overhead; and (3) a Decision Module featuring a lightweight analytical performance model to select the optimal strategy based on matrix shapes and hardware profiles. Extensive evaluation is conducted on LLM workloads across GPU (H20, A100) and CPU (ARM, x86) architectures with multiple data types. FalconGEMM succeeds in delivering peak-breaking performance and outperforms GEMM libraries (e.g., cuBLAS, CUTLASS, and Intel MKL) by 7.59%-17.85% and LCMA competitors like AlphaTensor by 12.41%-55.61%. Our framework makes the theoretical promise of LCMAs practical for production deployment across the heterogeneous landscape of modern hardware.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.06057v3</guid>
      <category>cs.DC</category>
      <category>cs.MS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Honglin Zhu, Jiaping Cao, Jiang Shao, Siyuan Feng, Qian Qiu, Peng Chen, Xu Zhang, Yixian Zhou, Man Lung Yiu, Guang Ji, Minwen Deng, Jintao Meng, Wenxi Zhu</dc:creator>
    </item>
    <item>
      <title>NavOne: One-Step Global Planning for Vision-Language Navigation on Top-Down Maps</title>
      <link>https://arxiv.org/abs/2605.06317</link>
      <description>arXiv:2605.06317v4 Announce Type: replace 
Abstract: Existing Vision-Language Navigation (VLN) methods typically adopt an egocentric, step-by-step paradigm, which struggles with error accumulation and limits efficiency. While recent approaches attempt to leverage pre-built environment maps, they often rely on incrementally updating memory graphs or scoring discrete path proposals, which restricts continuous spatial reasoning and creates discrete bottlenecks. We propose Top-Down VLN (TD-VLN), reformulating navigation as a one-step global path planning problem on pre-built top-down maps, supported by our newly constructed R2R-TopDown dataset. To solve this, we introduce NavOne, a unified framework that directly predicts dense path probabilities over multi-modal maps in a single end-to-end forward pass. NavOne features a Top-Down Map Fuser for joint multi-modal map representation, and extends Attention Residuals for spatial-aware depth mixing. Extensive experiments on R2R-TopDown show that NavOne achieves state-of-the-art performance among map-based VLN methods, with a planning-stage speedup of 8x over existing map-based baselines and 80x over egocentric methods, enabling highly efficient global navigation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.06317v4</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dijia Zhan, Jinyi Li, Chenxi Zheng, Shaoyu Huang, Yong Li, Jie Tang, Xuemiao Xu</dc:creator>
    </item>
    <item>
      <title>MinMax Recurrent Neural Cascades</title>
      <link>https://arxiv.org/abs/2605.06384</link>
      <description>arXiv:2605.06384v3 Announce Type: replace 
Abstract: We introduce MinMax Recurrent Neural Cascades (MinMax RNCs), a class of recurrent neural networks built from a novel form of recurrence over the MinMax algebra. We show that MinMax RNCs enjoy key properties that are difficult to obtain simultaneously: strong formal expressivity, efficient evaluation, stable dynamics, and non-vanishing state gradients. First, their formal expressivity corresponds to the regular languages, arguably the maximal expressivity for finite-memory systems. Second, in addition to evaluation in recurrent form, they also admit parallel-scan evaluation with logarithmic depth and linear work in the input length. Third, their states and activations are uniformly bounded for all sequence lengths. Fourth, their loss gradients exist almost everywhere and are uniformly bounded for all sequence lengths. Fifth, they do not exhibit vanishing state gradients: the gradient of a state with respect to a past state can retain norm one independently of the temporal distance between the states. Empirically, we find that these theoretical properties translate into strong practical performance. MinMax RNCs solve the considered synthetic tasks perfectly, generalise to long sequences, and outperform the recurrent baselines considered in our experiments. We also train a 112M-parameter MinMax RNC for next-token prediction, obtaining competitive performance for its size and providing initial evidence that MinMax recurrence can scale to real-world sequence-modelling tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.06384v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.FL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Alessandro Ronca</dc:creator>
    </item>
    <item>
      <title>PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization</title>
      <link>https://arxiv.org/abs/2605.06582</link>
      <description>arXiv:2605.06582v2 Announce Type: replace 
Abstract: Many operations on sensory data -- comparison, memory, retrieval, and reasoning -- are naturally expressed over discrete symbolic structures. In language this interface is given by tokens; in audio, it must be learned. Existing audio tokenizers rely on quantization, clustering, or codec reconstruction, assigning tokens locally, so sequence consistency, compactness, length control, termination, and edit similarity are rarely optimized directly.
  We introduce PairAlign, a framework for compact audio tokenization through sequence-level self-alignment. PairAlign treats tokenization as conditional sequence generation: an encoder maps speech to a continuous condition, and an autoregressive decoder generates tokens from BOS, learning token identity, order, length, and EOS placement. Given two content-preserving views, each view's sequence is trained to be likely under the other's representation, while unrelated examples provide competing sequences. This gives a scalable surrogate for edit-distance preservation while discouraging many-to-one collapse.
  PairAlign starts from VQ-style tokenization and refines it with EMA-teacher targets, cross-paired teacher forcing, prefix corruption, likelihood contrast, and length control.
  On 3-second speech, PairAlign learns compact, non-degenerate sequences with broad vocabulary usage and strong cross-view consistency. On retrieval tests, it preserves edit-distance search while reducing archive token count by 55%. A continuous-sweep probe shows lower local overlap than a dense geometric tokenizer, but stronger length control and bounded edit trajectories under 100 ms shifts. PairAlign is a sequence-symbolic predictive learner: like JEPA-style objectives, it predicts an abstract target from another view as a learned variable-length symbolic sequence, not a continuous latent.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.06582v2</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Adhiraj Banerjee, Vipul Arora</dc:creator>
    </item>
    <item>
      <title>jina-embeddings-v5-omni: Geometry-preserving Embeddings via Locked Aligned Towers</title>
      <link>https://arxiv.org/abs/2605.08384</link>
      <description>arXiv:2605.08384v3 Announce Type: replace 
Abstract: In this work, we introduce GELATO (Geometry-preserving Embeddings via Locked Aligned TOwers), a novel approach to multimodal embedding models. We build on the VLM-style architecture, in which non-text encoders are adapted to produce input for a language model, which in turn generates embeddings for all varieties of input. We present the result: the jina-embeddings-v5-omni suite, a pair of models that encode text, image, audio, and video input into a single semantic embedding space. GELATO extends the two Jina Embeddings v5 Text models to support additional modalities by adding encoders for images and audio. The backbone text embedding models and the added non-text modality encoders remain frozen. We train only the connecting components, which represent 0.35% of the total weights of the joint model. Training is therefore much more efficient than full-parameter retraining. Additionally, the language model remains effectively unaltered, producing exactly the same embeddings for text inputs as the Jina Embeddings v5 Text models. Our evaluations show that GELATO produces results that are competitive with the state-of-the-art, yielding nearly equal performance to larger multimodal embedding models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.08384v3</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Florian Hönicke, Michael Günther, Andreas Koukounas, Mohammad Kalim Akram, Scott Martens, Saba Sturua, Han Xiao</dc:creator>
    </item>
    <item>
      <title>OTora: A Unified Red Teaming Framework for Reasoning-Level Denial-of-Service in LLM Agents</title>
      <link>https://arxiv.org/abs/2605.08876</link>
      <description>arXiv:2605.08876v2 Announce Type: replace 
Abstract: Large Language Models (LLMs) are increasingly deployed as autonomous agents that execute tool-augmented, multi-step tasks, where latency is a critical factor for real-world applications. Yet an overlooked threat is Reasoning-Level Denial-of-Service (R-DoS), in which an attacker preserves task correctness but degrades availability by inflating an agent's reasoning depth or tool-use budget. We introduce OTora, the first unified, two-stage red-teaming framework for instantiating R-DoS attacks. Stage I optimizes an adversarial trigger that induces targeted tool invocations using insertion-aware scoring and dynamic target co-evolution, supporting both black-box and white-box settings. Stage II generates agent-aware reasoning payloads via an ICL-guided genetic search that amplifies overthinking while maintaining correct task outcomes. Across WebShop, Email, and OS agents built on multiple backbone models such as LLaMA-70B and GPT-OSS-120B, OTora achieves up to a 10-fold increase in reasoning tokens and order-of-magnitude latency slowdowns, all while preserving near-baseline task accuracy. Finally, we discuss mitigation strategies for detecting and constraining abnormal reasoning and latency spikes. The code is available at https://github.com/llm2409/OTora.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.08876v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xinyu Li, Ronghui Mu, Lin Li, Tianjin Huang, Gaojie Jin</dc:creator>
    </item>
    <item>
      <title>From Detection to Recovery: Operational Analysis on LLM Pre-training with 504 GPUs</title>
      <link>https://arxiv.org/abs/2605.09370</link>
      <description>arXiv:2605.09370v3 Announce Type: replace 
Abstract: Large-scale AI training is now fundamentally a distributed systems problem, and hardware failures have become routine operating conditions rather than rare exceptions. Public operational evidence from production training clusters, however, remains scarce. This technical report presents an empirical analysis of a 63-node NVIDIA B200 production cluster (504 GPUs), using 55 days of Prometheus time-series data and 73 days of operational logs covering 224 multi-node training sessions. The cluster operates within a cross-organizational environment in which five parties (SKT, Upstage, Lablup, NVIDIA Korea, and VAST Data) share a unified monitoring pipeline. This arrangement enabled joint diagnosis of a 60-node-scale storage I/O bottleneck that did not appear at 2-4-node scale, a production-scale phenomenon no single team could isolate alone. Drawing on a months-long pre-training campaign, we perform three quantitative analyses yielding four findings. First, statistical analysis over 751 Prometheus metrics and 10 XID-identified GPU failures achieves a 10/10 detection rate (2/10 pre-XID) at ~0.84 false positives per day. No single metric is consistently dominant across failure types, motivating a multi-signal detection strategy. Second, profiling 523 checkpoint events along the GPU VRAM to NFS path attributes the "bandwidth paradox" (1.4-10.4% utilization of 200 Gbps RoCE) to saturation of the 128-slot NFS RPC layer. Third, multi-node failure response shows concentrated exclusions (top 3 of 63 nodes account for &gt;50% of all exclusions) and an auto-retry chain success rate of 33.3% over 12 chains (73 attempts), 2.7x the 12.5% manual recovery rate; the median retry interval is 11 min (IQR 10-11). All analyses are grounded in production infrastructure providing session-level workload management, GPU-centric scheduling, and unified observability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.09370v3</guid>
      <category>cs.DC</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Daemyung Kang, Eunjin Hwang, Hanjeong Lee, HyeokJin Kim, Hyunhoi Koo, Jeongkyu Shin, Jeongseok Kang, Jihyun Kang, Joongi Kim, Junbum Lee, Jungseung Yang, Kyujin Cho, Youngsook Song</dc:creator>
    </item>
    <item>
      <title>CalBench: Evaluating Coordination-Privacy Trade-offs in Multi-Agent LLMs</title>
      <link>https://arxiv.org/abs/2605.09823</link>
      <description>arXiv:2605.09823v3 Announce Type: replace 
Abstract: Personal AI assistants are beginning to act as delegates with access to calendars, inboxes, and user preferences. Calendar scheduling makes the trust problem concrete: an assistant must coordinate with other assistants while deciding what to reveal about the person it represents. We introduce CalBench, a controlled benchmark for multi-agent calendar scheduling under private information. In each task, $N$ agents manage separate private calendars and schedule a stream of $M$ incoming meetings while minimizing disruption costs. Because no agent can inspect another agent's calendar, success requires language-mediated coordination rather than centralized planning. CalBench generates solvable scenarios with CP-SAT oracle solutions and decentralized non-LLM reference protocols, enabling evaluation of task success, excess cost, communication efficiency, burden fairness, and privacy leakage under matched information constraints. Across seven model families, we find that completion alone misses important failures: agents leave avoidable cost on the table, communication volume does not predict lower regret, and privacy-preserving silence can deprive teammates of cost information needed for fair burden allocation. CalBench provides a reproducible testbed for studying whether autonomous assistants can coordinate on behalf of users before deployment at scale.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.09823v3</guid>
      <category>cs.MA</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chelsea Zou, Yiheng Yao, Selena She, Noah Goodman, Robert D. Hawkins</dc:creator>
    </item>
    <item>
      <title>SleepWalk: A Three-Tier Benchmark for Stress-Testing Instruction-Guided Vision-Language Navigation</title>
      <link>https://arxiv.org/abs/2605.10376</link>
      <description>arXiv:2605.10376v2 Announce Type: replace 
Abstract: Vision-Language Models (VLMs) have advanced rapidly in multimodal perception and language understanding, yet it remains unclear whether they can reliably ground language into spatially coherent, plausibly executable actions in 3D digital environments. We introduce SleepWalk, a benchmark for evaluating instruction-grounded trajectory prediction in single-scene 3D worlds generated from textual scene descriptions and filtered for navigability. Unlike prior navigation benchmarks centered on long-range exploration across rooms, SleepWalk targets localized, interaction-centric embodied reasoning: given rendered visual observations and a natural-language instruction, a model must predict a trajectory that respects scene geometry, avoids collisions, and terminates at an action-compatible location. The benchmark covers diverse indoor and outdoor environments and organizes tasks into three tiers of spatial and temporal difficulty, enabling fine-grained analysis of grounding under increasing compositional complexity. Using a standardized pointwise judge-based evaluation protocol, we evaluate three frontier VLMs on 2,472 curated 3D environments with nine instructions per scene. Results reveal systematic failures in grounded spatial reasoning, especially under occlusion, interaction constraints, and multi-step instructions: performance drops as the difficulty level of the tasks increases. In general, current VLMs can somewhat produce trajectories that are simultaneously spatially coherent, plausibly executable, and aligned with intended actions. By exposing failures in a controlled yet scalable setting, SleepWalk provides a critical benchmark for advancing grounded multimodal reasoning, embodied planning, vision-language navigation, and action-capable agents in 3D environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.10376v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Niyati Rawal, Sushant Ravva, Shah Alam Abir, Saksham Jain, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das</dc:creator>
    </item>
    <item>
      <title>ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction</title>
      <link>https://arxiv.org/abs/2605.11212</link>
      <description>arXiv:2605.11212v3 Announce Type: replace 
Abstract: Computer-use agents (CUAs) rely on visual observations of graphical user interfaces, where each screenshot is encoded into a large number of visual tokens. As interaction trajectories grow, the token cost increases rapidly, limiting the amount of history that can be incorporated under fixed context and compute budgets. As a result, incorporating history has yielded little or no performance improvement, unlike in other domains. We address this inefficiency by introducing ReVision, which is used to train multimodal language models on trajectories where redundant visual patches are removed using a learned patch selector that compares patch representations across consecutive screenshots while preserving spatial structure required by the model. Across three benchmarks, OSWorld, WebTailBench, and AgentNetBench, when processing trajectories with 5 history screenshots using Qwen2.5-VL-7B, ReVision reduces token usage by 46% on average while improving success rate by 3% over the no-drop baseline. This establishes a clear efficiency gain, enabling agents to process longer trajectories with fewer tokens. With this improved efficiency, we revisit the role of history in CUAs and find that performance continues to improve as more past observations are incorporated when redundancy is removed.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.11212v3</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Amirhossein Abaskohi, Yuhang He, Peter West, Giuseppe Carenini, Pranit Chawla, Vibhav Vineet</dc:creator>
    </item>
    <item>
      <title>Byzantine Consensus in Directed Graphs with Message Authentication</title>
      <link>https://arxiv.org/abs/2605.11309</link>
      <description>arXiv:2605.11309v2 Announce Type: replace 
Abstract: We consider the problem of reaching consensus in communication networks that are modeled by directed graphs. We assume the existence of a message authentication mechanism (such as digital signatures) to verify the integrity of messages. We identify the necessary and sufficient conditions on the directed communication graph for the following problems to be solvable: (i) exact consensus in synchronous systems; and (ii) approximate consensus in asynchronous systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.11309v2</guid>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nitin H. Vaidya, Lewis Tseng</dc:creator>
    </item>
    <item>
      <title>Quantifying Rodda and Graham Gait Classification from 3D Markerless Kinematics derived from a Single-view Video in a Heterogeneous Pediatric Clinical Cohort</title>
      <link>https://arxiv.org/abs/2605.11314</link>
      <description>arXiv:2605.11314v3 Announce Type: replace 
Abstract: Cerebral Palsy (CP) is a neurological disorder of movement and the most common cause of lifelong physical disability in childhood. Approximately 75% of children with CP are ambulatory, and accurate gait assessment is central to preserving walking function, which deteriorates by mid-adulthood in a quarter to half of adults with CP. The Rodda and Graham classification system quantifies sagittal-plane gait deviations using ankle and knee z-scores derived from 3D Instrumented Gait Analysis (3D-IGA), but 3D-IGA is expensive and limited to specialized centers, while observational assessment shows only moderate inter-rater agreement. We developed a markerless gait analysis pipeline that quantifies Rodda and Graham knee and ankle z-scores directly from single-view clinical gait videos. Across 1,058 bilateral limb samples from 529 trials of 152 children (88 male, 63 female; age 12.1 $\pm$ 4.0 years; 60 distinct primary diagnoses, cerebral palsy the most common at $n=54$), the sagittal-view model achieved $R^2 = 0.80 \pm 0.02$ and CCC $= 0.89 \pm 0.02$ for knee z-scores and $R^2 = 0.57 \pm 0.02$ and CCC $= 0.72 \pm 0.02$ for ankle z-scores against 3D-IGA. Binary screening for excess knee flexion achieves AUROC $= 0.88$, correctly identifying 83% of affected children, and applying Rodda and Graham rules yields $43 \pm 1$% 7-class accuracy with macro-AUROC $= 0.78 \pm 0.01$, ankle prediction error remaining the primary bottleneck. Beyond cross-sectional screening, continuous z-scores support longitudinal trajectory tracking across visits, providing a quantitative substrate for monitoring disease progression and treatment response unavailable from observational scales. These results demonstrate the feasibility of video-based z-score estimation, excess-flexion screening, and longitudinal trajectory tracking as a path toward scalable, objective gait assessment in low-resource clinical settings.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.11314v3</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Lauhitya Reddy, Seth Donahue, Jeremy Bauer, Susan Sienko, Anita Bagley, Joseph Krzak, Maura Eveld, Karen Kruger, Ross Chafetz, Vedant Kulkarni, Hyeokhyen Kwon</dc:creator>
    </item>
    <item>
      <title>Engagement Process: Rethinking the Temporal Interface of Action and Observation</title>
      <link>https://arxiv.org/abs/2605.11484</link>
      <description>arXiv:2605.11484v2 Announce Type: replace 
Abstract: Task completion in digital and physical environments increasingly involves complex temporal interaction, where actions and observations unfold over different time scales rather than align with fixed observation--action steps. To model such interactions, we propose \emph{Engagement Process} (EP), an interaction formalism that inherits the decision-theoretic structure of POMDPs while making time explicit in the action--observation interface. EP represents actions and observations as decoupled event streams along time, rather than updates paired at fixed decision steps. This interface captures single-agent timing issues such as deliberation latency, delayed feedback, and persistent actions, while supporting richer agent-side organization, multi-rate coordination, and compositional interaction among subsystems. Across toy, LLM-agent, and learning experiments, EP exposes temporal behaviors hidden by step-based interfaces and enables policies to adapt under explicit time costs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.11484v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jialian Li, Yuchen Cao, Junhong Liu, Weiran Guo, Xutao Wang, Jiaming Song, Jiahao Zhang, Jie Chen</dc:creator>
    </item>
    <item>
      <title>Robust Biomedical Publication Type and Study Design Classification with Knowledge-Guided Perturbations</title>
      <link>https://arxiv.org/abs/2605.11502</link>
      <description>arXiv:2605.11502v2 Announce Type: replace 
Abstract: Accurately and consistently indexing biomedical literature by publication type and study design is essential for supporting evidence synthesis and knowledge discovery. Prior work on automated publication type and study design indexing has primarily focused on expanding label coverage, enriching feature representations, and improving in-domain accuracy, with evaluation typically conducted on data drawn from the same distribution as training. Although pretrained biomedical language models achieve strong performance under these settings, models optimized for in-domain accuracy may rely on superficial lexical or dataset-specific cues, resulting in reduced robustness under distributional shift. In this study, we introduce an evaluation framework based on controlled semantic perturbations to assess the robustness of a publication type classifier and investigate robustness-oriented training strategies that combine entity masking and domain-adversarial training to mitigate reliance on spurious topical correlations. Our results show that the commonly observed trade-off between robustness and in-domain accuracy can be mitigated when robustness objectives are designed to selectively suppress non-task-defining features while preserving salient methodological signals. We find that these improvements arise from two complementary mechanisms: (1) increased reliance on explicit methodological cues when such cues are present in the input, and (2) reduced reliance on spurious domain-specific topical features. These findings highlight the importance of feature-level robustness analysis for publication type and study design classification and suggest that refining masking and adversarial objectives to more selectively suppress topical information may further improve robustness. Data, code, and models are available at: https://github.com/ScienceNLP-Lab/MultiTagger-v2/tree/main/ICHI</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.11502v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shufan Ming, Joe D. Menke, Neil R. Smalheiser, Halil Kilicoglu</dc:creator>
    </item>
    <item>
      <title>Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications</title>
      <link>https://arxiv.org/abs/2605.11855</link>
      <description>arXiv:2605.11855v2 Announce Type: replace 
Abstract: Sequence learning is dominated by Transformers and parallelizable recurrent neural networks (RNNs) such as state-space models, yet learning long-term dependencies remains challenging, and state-of-the-art designs trade power consumption for performance. The Bistable Memory Recurrent Unit (BMRU) was introduced to enable hardware-software co-design of ultra-low power RNNs: quantized states with hysteresis provide persistent memory while mapping directly to analog primitives. However, BMRU performance lags behind parallelizable RNNs on complex sequential tasks. In this paper, we identify gradient blocking during state updates as a key limitation and propose a cumulative update formulation that restores gradient flow while preserving persistent memory, creating skip-connections through time. This leads to the Cumulative Memory Recurrent Unit (CMRU) and its relaxed variant, the $\alpha$CMRU. Experiments show that the cumulative formulation dramatically improves convergence stability and reduces initialization sensitivity. The CMRU and $\alpha$CMRU match or outperform Linear Recurrent Units (LRUs) and minimal Gated Recurrent Units (minGRUs) across diverse benchmarks at small model sizes, with particular advantages on tasks requiring discrete long-range retention, while the CMRU retains quantized states, persistent memory, and noise-resilient dynamics essential for analog implementation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.11855v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.AR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Julien Brandoit, Arthur Fyon, Damien Ernst, Guillaume Drion</dc:creator>
    </item>
    <item>
      <title>Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems</title>
      <link>https://arxiv.org/abs/2605.12213</link>
      <description>arXiv:2605.12213v2 Announce Type: replace 
Abstract: LLM-based conversational AI agents struggle to maintain coherent behavior over long horizons due to limited context. While RAG-based approaches are increasingly adopted to overcome this limitation by storing interactions in external memory modules and performing retrieval from them, their effectiveness in answering challenging questions (e.g., multi-hop, commonsense) ultimately depends on the agent's ability to reason over the retrieved information. However, existing methods typically retrieve memory based on semantic similarity to the raw user utterance, which lacks explicit reasoning about missing intermediate facts and often returns evidence that is irrelevant or insufficient for grounded reasoning. In this work, we introduce Goal-Mem, a goal-oriented reasoning framework for RAG-based agentic memory that performs explicit backward chaining from the user's utterance as a goal. Rather than progressively expanding from retrieved context, Goal-Mem decomposes each goal into atomic subgoals, performs targeted memory retrieval to satisfy each subgoal, and iteratively identifies what information from memory should be retrieved when intermediate goals cannot be resolved. We formalize this process in Natural Language Logic, a logical system that combines the verifiability of reasoning provided by FOL with the expressivity of natural language. Through extensive experiments on two datasets and comparisons against nine strong memory baselines, we show that Goal-Mem consistently improves performance, particularly on tasks requiring multi-hop reasoning and implicit inference.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.12213v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiazhou Liang, Armin Toroghi, Yifan Simon Liu, Faeze Moradi Kalarde, Liam Gallagher, Scott Sanner</dc:creator>
    </item>
    <item>
      <title>Subspace Pruning via Principal Vectors for Accurate Koopman-Based Approximations</title>
      <link>https://arxiv.org/abs/2605.13135</link>
      <description>arXiv:2605.13135v2 Announce Type: replace 
Abstract: The accuracy of Koopman operator approximations over
  finite-dimensional spaces relies critically on their invariance
  properties. These can be rigorously quantified via the principal
  angles between a candidate subspace and its image under the Koopman
  operator. This paper proposes a unified algebraic framework for
  subspace pruning designed to systematically refine the invariance
  error. We establish the geometric equivalence between
  consistency-based methods and principal-vector pruning, and build on
  this insight to introduce a hybrid strategy that balances between
  multiple and single principal vector pruning for improved numerical
  stability and scalability. We derive error bounds for the retention
  of approximate and external eigenfunctions, demonstrating that the
  multi-vector approach mitigates the numerical drift inherent to
  sequential pruning. To ensure scalability, we develop an efficient
  numerical update scheme based on rank-one modifications that reduces
  the computational complexity of tracking principal angles by an
  order of magnitude. Finally, we exploit the subspace obtained from
  the pruning algorithms to build a lifted linear model for state
  prediction that accounts for the trade-offs between improving
  invariance and minimizing state reconstruction error. Simulations
  demonstrate the effectiveness of our approach.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.13135v2</guid>
      <category>eess.SY</category>
      <category>cs.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dhruv Shah, Jorge Cortés</dc:creator>
    </item>
    <item>
      <title>High-Rate Quantized Matrix Multiplication II</title>
      <link>https://arxiv.org/abs/2605.13768</link>
      <description>arXiv:2605.13768v2 Announce Type: replace 
Abstract: This is the second part of the work investigating quantized matrix multiplication (MatMul). In part I we considered the case of calibration-free quantization, whereas here we discuss the setting where the covariance matrix $\Sigma_X$ of the columns of the second factor is available. This setting arises in the ubiquitous task of weight-only post-training quantization of LLMs.
  Weight-only quantization is related to the problem of weighted mean squared error (WMSE) source coding, whose classical (reverse) waterfilling solution dictates how one should distribute rate between coordinates of the vector. We show how waterfilling can be used to improve practical LLM quantization algorithms (GPTQ), which at present allocate rate equally. A recent scheme (known as ``WaterSIC'') that only uses scalar INT quantizers is analyzed and its high-rate performance is shown to be (a) basis free (i.e., characterized by the determinant of $\Sigma_X$ and, thus, unlike existing schemes, is immune to applying random rotations); and (b) within a multiplicative factor of $\frac{2\pi e}{12}$ (or 0.25 bit/entry) of the information-theoretic distortion limit. GPTQ's performance, in turn, is affected by the choice of basis, but for a random rotation and actual $\Sigma_X$ from Llama-3-8B we find it to be within 0.1 bit (depending on the layer type) of WaterSIC, suggesting that GPTQ with random rotation is also near optimal, at least in the high-rate regime.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.13768v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Or Ordentlich, Yury Polyanskiy</dc:creator>
    </item>
    <item>
      <title>ASH: Agents that Self-Hone via Embodied Learning</title>
      <link>https://arxiv.org/abs/2605.14211</link>
      <description>arXiv:2605.14211v3 Announce Type: replace 
Abstract: Long-horizon embodied tasks remain a fundamental challenge in AI, as current methods rely on hand-engineered rewards or action-labeled demonstrations, neither of which scales. We introduce ASH, an agentic system that learns an embodied policy from unlabeled, noisy internet video, without reward shaping or expert annotation. ASH follows a self-improvement loop; when it gets stuck, ASH learns an Inverse Dynamics Model (IDM) from its own trajectories, and uses its IDM to extract supervision from relevant internet video. ASH uses unsupervised learning to identify key moments from large-scale internet video and retains them as long-term memory -- allowing it to tackle long-horizon problems. We evaluate ASH on two complementary environments demanding multi-hour planning: Pokemon Emerald, a turn-based RPG, and The Legend of Zelda: The Minish Cap, a real-time action-adventure game. In both games, behavioral cloning, retrieval-augmented and zero-shot foundation-model baselines plateau, while ASH sustains progression across our 8-hour evaluation. ASH reaches an average of $11.2/12$ milestones in Pokemon Emerald and $9.9/12$ in Legend of Zelda, while the strongest baseline gets stuck in both environments at an average of $6.5/12$ and $6.0/12$ milestones, respectively. We demonstrate that self-improving agents are a scalable recipe for long-horizon embodied learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.14211v3</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Benjamin Schneider, Xavier Schneider, Victor Zhong, Sun Sun</dc:creator>
    </item>
    <item>
      <title>Language Generation as Optimal Control: Closed-Loop Diffusion in Latent Control Space</title>
      <link>https://arxiv.org/abs/2605.14531</link>
      <description>arXiv:2605.14531v3 Announce Type: replace 
Abstract: This work reformulates language generation as a stochastic optimal control problem, providing a unified theoretical perspective to analyze autoregressive and diffusion models and explain their limitations (Efficiency-Fidelity Paradox, Irreversibility Error Propagation, Optimization Tractability and Fidelity) in terms of a combination of trajectory singularity, adjoint state vanishing, and gradient absence. To address these issues, we approximate the solution to the Hamilton-Jacobi-Bellman (HJB) equation, yielding an optimal policy that acts as a closed-loop controller. To bypass the intractability of directly solving the HJB PDE, we employ Flow Matching as the optimal trajectory solver within the rectified latent control space. This allows our Manta-LM with Global Integral Operator to approximate the global vector field, effectively realizing a model that simultaneously achieves high-fidelity text generation and efficient, low-cost parallel sampling. Empirically, our method achieves strong performance on language modeling and conditional generation tasks, while exhibiting improved stability, efficiency, and controllability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.14531v3</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>ZiYi Dong, Yuliang Huang, Weijian Deng, Xiangyang Ji, Liang Lin, Pengxu Wei</dc:creator>
    </item>
    <item>
      <title>Margin-Adaptive Confidence Ranking for Reliable LLM Judgement</title>
      <link>https://arxiv.org/abs/2605.15416</link>
      <description>arXiv:2605.15416v2 Announce Type: replace 
Abstract: Jung et al. (2025) introduce a hypothesis testing framework for guaranteeing agreement between large language models (LLMs) and human judgments, relying on the assumption that the model's estimated confidence is monotonic with respect to human-disagreement risk. In practice, however, this assumption may be violated, and the generalization behavior of the confidence estimator is not explicitly analyzed. We mitigate these issues by learning a dedicated confidence estimator instead of relying on heuristic confidence signals. Our approach leverages simulated annotator diversity and a margin-based ranking formulation to explicitly model how confidently an LLM distinguishes between human-agreement and human-disagreement cases. We further derive generalization guarantees for this estimator, revealing a margin-dependent trade-off that informs the design of an adaptive estimator training procedure. When integrated into fixed-sequence testing, the learned confidence estimator yields improved ranking accuracy and empirically strengthens the monotonic relationship between confidence and disagreement risk, leading to higher success rates in satisfying target agreement levels across multiple datasets and judge models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.15416v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Gaojie Jin, Yong Tao, Lijia Yu, Tianjin Huang</dc:creator>
    </item>
    <item>
      <title>Entity-Centric World Models: Interaction-Aware Masking for Causal Video Prediction</title>
      <link>https://arxiv.org/abs/2605.15466</link>
      <description>arXiv:2605.15466v2 Announce Type: replace 
Abstract: Learning predictive world models from unlabelled video is a foundational challenge in artificial intelligence. While Joint Embedding Predictive Architectures (JEPA) have set new benchmarks in semantic classification, they often remain physics-blind, failing to capture the causal dynamics necessary for downstream reasoning. We hypothesize that this stems from standard patch-based masking strategies, which prioritize visual texture over rare but informative kinematic events. We propose Interaction-Aware JEPA (IA-JEPA), which utilizes a self-supervised motion-centric masking strategy to prioritize physical interactions. By specifically targeting entities engaged in collisions or momentum transfers, we force the architecture to reconstruct latent trajectories rather than static background features. Evaluated on the CLEVRER benchmark, IA-JEPA achieves 14.26% accuracy on causal reasoning tasks, a significant lead over the 3.22% achieved by standard patch-masked baselines. Crucially, we demonstrate that IA-JEPA breaks the "static bias" of standard self-supervision by inducing a higher-entropy, more discriminative latent space (+10% entropy gain) that linearizes physical energy ($R^2=0.43$). We show that this interaction bias generalizes to real-world human actions (Something-Something V2) and zero-shot physical puzzles (PHYRE-Lite). Our results provide a scalable, fully self-supervised path toward building foundational world models that begin to internalize the causal structure of the physical world.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.15466v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Santosh Kumar Paidi</dc:creator>
    </item>
    <item>
      <title>Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs</title>
      <link>https://arxiv.org/abs/2605.15491</link>
      <description>arXiv:2605.15491v2 Announce Type: replace 
Abstract: Layer pruning removes entire Transformer decoder blocks from large language models, but introduces a mismatch between the hidden state received by the next surviving layer and the distribution it was trained to process, leading to significant performance degradation. We propose Ghosted Layers, a training-free recovery module that addresses this issue by solving a boundary activation alignment problem. Our method derives a closed-form optimal linear operator from a small calibration set to reconstruct the activation discrepancy introduced by the pruned layers. We show that this solution corresponds to the unconstrained optimum of the alignment objective, whereas existing methods are restricted to constrained solutions over limited operator subspaces. Experiments across multiple LLM backbones and pruning strategies demonstrate that our method consistently improves accuracy and perplexity over prior training-free baselines, while preserving the efficiency gains of layer pruning. Official code repository: https://github.com/daniel-eai/ghosted_layers_official_repository/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.15491v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.PF</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vincent-Daniel Yun, Junhyuk Jo, Sai Praneeth Karimireddy, Sunwoo Lee</dc:creator>
    </item>
    <item>
      <title>FRWKV+: Periodic-Aware Adaptive Gating for Frequency-Space Linear Time Series Forecasting</title>
      <link>https://arxiv.org/abs/2605.15690</link>
      <description>arXiv:2605.15690v2 Announce Type: replace 
Abstract: Accurate and efficient long-term multivariate time series forecasting requires capturing recurring temporal structure while keeping inference cheap across many variables and horizons. Frequency-space models represent long-range and periodic variation compactly, but they typically process the real and imaginary spectral components as weakly coupled streams and treat periodic cues as ordinary input features, even when such cues are unreliable. This paper proposes FRWKV-Plus, a lightweight periodic-aware frequency-space forecasting model built on the efficient FRWKV backbone. FRWKV-Plus introduces a cross-branch spectral gate that reweights each spectral branch using a summary of its sibling branch, and a trust-gated residual correction that converts compact within-period context into a bounded, sign-flexible adjustment of these gates under a learned, data-dependent trust score. By construction, the correction is identity-preserving at initialization and strictly bounded, so periodic evidence can refine but never dominate or invert the base interaction. On seven standard benchmarks, FRWKV-Plus is consistently competitive with strong linear, frequency-domain, recurrent-style, and Transformer-based forecasters while preserving the lightweight profile of the backbone. Controlled three-seed ablations show that each component contributes, that the benefit is modest on strongly periodic data and pronounced on the harder Exchange and ILI datasets, and that the within-period context is the most influential single component. The implementation is publicly available at https://github.com/yangqingyuan-byte/FRWKV-plus.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.15690v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Qingyuan Yang, Dongyue Chen, Da Teng, Junhua Xiao, Jiaji Pan, Shizhuo Deng</dc:creator>
    </item>
    <item>
      <title>Evaluating Design Video Generation: Metrics for Compositional Fidelity</title>
      <link>https://arxiv.org/abs/2605.16223</link>
      <description>arXiv:2605.16223v2 Announce Type: replace 
Abstract: Generative video models are increasingly used in design animation tasks, yet no standardized evaluation framework exists for this domain. Unlike natural video generation, design animation imposes structured constraints: specific components shall animate with prescribed motion types, directions, speed and timing, while non-animated regions must remain stable and layout structure must be preserved. This paper provides a fully automated evaluation framework organized across four dimensions: layout fidelity, motion correctness, temporal quality, and content fidelity. This eliminates the reliance on subjective human evaluation and establishes a common basis for benchmarking progress in the field. We release the code and dataset here: https://github.com/purvanshi/lica-bench.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.16223v2</guid>
      <category>cs.GR</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Adrienne Deganutti, Dingning Cao, Jaejung Seol, Elad Hirsch, Purvanshi Mehta</dc:creator>
    </item>
    <item>
      <title>ANNEAL: Adapting LLM Agents via Governed Symbolic Patch Learning</title>
      <link>https://arxiv.org/abs/2605.16309</link>
      <description>arXiv:2605.16309v2 Announce Type: replace 
Abstract: LLM-based agents can recover from individual execution errors, yet they repeatedly fail on the same fault when the underlying process knowledge--operator schemas, preconditions, and constraints--remains unrepaired. Existing self-evolving approaches address this gap by updating prompts, memory, or model weights, but none directly repair the symbolic structures that encode how tasks are executed, and few provide the governance guarantees required for safe deployment. We introduce ANNEAL, a neuro-symbolic agent that converts recurring failures into governed symbolic edits of a process knowledge graph without modifying foundation model weights. Its core mechanism, Failure-Driven Knowledge Acquisition (FDKA), localizes the responsible operator, synthesizes a typed patch through constrained LLM generation, and validates the proposal via multi-dimensional scoring, symbolic guardrails, and canary testing before commit. Every accepted edit carries full provenance and deterministic rollback capability. Across four domains and 27 multi-seed runs, ANNEAL is the only evaluated system that commits persistent structural repairs--strong baselines such as ReAct and Reflexion achieve high episodic recovery yet retain 72--100% holdout failure rates on recurring faults, whereas ANNEAL reduces these to 0% in the tested recurring-failure settings. Ablation confirms that removing FDKA eliminates all structural repairs and drops success rate by up to 26.7 percentage points. These results suggest that governed symbolic repair offers a complementary paradigm to weight-level and prompt-level adaptation for persistent fault elimination.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.16309v2</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Safayat Bin Hakim, Keyan Guo, Wenkai Tan, Alvaro Velasquez, Shouhuai Xu, Houbing Herbert Song</dc:creator>
    </item>
    <item>
      <title>PQR: A Framework to Generate Diverse and Realistic User Queries that Elicit QA Agent Failures</title>
      <link>https://arxiv.org/abs/2605.16551</link>
      <description>arXiv:2605.16551v2 Announce Type: replace 
Abstract: Evaluating LLM-based agents remains challenging because identifying meaningful failure cases often requires substantial human effort to design realistic test scenarios. Prior work primarily focuses on automatically discovering agent failures induced by adversarial users, while overlooking queries with real user intents that also trigger agent failures. We introduce PQR, a framework that surfaces agent failures with respect to specific objectives (e.g., helpfulness or safety) using queries that resemble real users' intents. PQR operates through an iterative interaction between two complementary modules. The query refinement module performs rewrites to explore diverse query variations, while the prompt refinement module uses prior feedback to derive new objective-violating strategies and realism policies for refining prompts, which in turn generate failure-triggering yet realistic queries. We evaluate PQR on detecting an e-commerce QA agent's unhelpful responses. Our method uncovers 23% to 78% more unhelpful responses, and our generated queries are more diverse and realistic than those of previous methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.16551v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yunan Lu, Luigi Liu, Omar Yahia, Arpit Sharma, Zhou Yu</dc:creator>
    </item>
    <item>
      <title>VQ-Atom: Semantic Discretization of Local Atomic Environments for Molecular Representation Learning</title>
      <link>https://arxiv.org/abs/2605.16823</link>
      <description>arXiv:2605.16823v2 Announce Type: replace 
Abstract: Large language models succeed by combining large-scale pretraining with meaningful discrete tokens. In molecular machine learning, SMILES is widely used as a token representation, but it is primarily a linearization format for molecular graphs rather than a semantic decomposition of chemistry. We propose VQ-Atom, a semantic tokenization framework that assigns discrete atom-level tokens based on local chemical environments via vector quantization. Unlike SMILES tokens, VQ-Atom tokens encode graph-local chemical context and are aligned with molecular structure. On protein-cold drug--target interaction prediction using the KIBA dataset, VQ-Atom substantially improves global ranking performance, achieving an AUROC of 0.79 and outperforming both SMILES-based and continuous molecular representations under an identical downstream architecture. Furthermore, VQ-Atom enables approximately 3 times faster downstream training than continuous atom-level representations by replacing per-atom continuous features with reusable discrete tokens. These results suggest that molecular tokenization is not merely a preprocessing step, but a central design choice. In particular, well-structured tokens can encode substantial chemical semantics, reducing the burden on downstream learning. VQ-Atom can be interpreted as defining a molecular language, where tokens correspond to chemically meaningful atomic environments, suggesting that token design may constitute an additional axis of machine learning research alongside architecture, objectives, and optimization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.16823v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Takayuki Kimura</dc:creator>
    </item>
    <item>
      <title>Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps</title>
      <link>https://arxiv.org/abs/2605.16928</link>
      <description>arXiv:2605.16928v2 Announce Type: replace 
Abstract: Long-context inference in large language models is bottlenecked by the quadratic cost of full attention. Existing efficient alternatives often rely either on native sparse training or on heuristic token eviction, creating an undesirable trade-off among efficiency, training cost, and accuracy. In this work, we show that full-attention LLMs are already intrinsically sparse and can be transformed into highly sparse models with only minimal adaptation. Our approach is built on three observations: (1) only a small subset of attention heads truly requires full long-context processing; (2) long-range retrieval is governed primarily by a low-dimensional subspace, allowing relevant tokens to be retrieved efficiently with a 16-dimensional indexer; and (3) the useful token budget is strongly query-dependent, making dynamic top-$p$ selection more suitable than fixed top-$k$ sparsification. Based on these insights, we propose RTPurbo, which retains the full KV cache only for retrieval heads and introduces a lightweight token indexer for sparse attention. By exploiting the model's intrinsic sparsity, RTPurbo achieves sparsification with only a few hundred training steps. Experiments on long-context benchmarks and reasoning tasks show that RTPurbo preserves near-lossless accuracy while delivering substantial efficiency gains, including up to a 9.36$\times$ prefill speedup at 1M context and about a 2.01$\times$ decode speedup. These results suggest that strong sparse inference can be obtained from standard full-attention training without expensive native sparse pretraining.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.16928v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yanke Zhou, Yiduo Li, Hanlin Tang, Maohua Li, Kan Liu, Tao Lan, Lin Qu, Yuan Yao, Xiaoxing Ma</dc:creator>
    </item>
    <item>
      <title>WhiteTesseract: Reframing the Interpretation of Cultural Heritage through XR and Conversational AI</title>
      <link>https://arxiv.org/abs/2605.16972</link>
      <description>arXiv:2605.16972v2 Announce Type: replace 
Abstract: Cultural heritage exhibitions often struggle to sustain attention and support reflective engagement. Physical exhibitions rely on fixed interpretive aids that lack adaptability to individual backgrounds or curiosity, and their effectiveness depends heavily on a visitor's Personal Context, prior knowledge, and cultural literacy. Meanwhile, digital exhibitions prioritize convenience and accessibility but risk weakening the Physical and Social Contexts that define embodied cultural experience.
  WhiteTesseract addresses this gap by enabling in-situ interpretation through high-resolution XR and conversational AI. The system integrates spatial intelligence via artwork recognition to allow visitors to selectively reduce environmental distractions (via diminished reality) and engage in context-aware dialogue (via large language models). The goal is to preserve the richness of the physical and social environment while providing a flexible space for personal reflection, enhancing Personal Context without compromising physical authenticity.
  We deployed the system in a Claude Monet exhibition and conducted a controlled user study with 26 participants. Quantitative results showed that WhiteTesseract modulation significantly increased average viewing duration from 35.3 to 98.3 seconds (p &lt; 0.001). Analysis of 529 visitor-AI interactions revealed that 60% extended beyond factual queries to include analytical, emotional, and comparative inquiries. These findings demonstrate how XR and AI can enrich the physical exhibition experience by supporting deeper, more personalized engagement without displacing the embodied value of cultural heritage. We discuss technical and social constraints for real-world deployment and limitations of our controlled setting.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.16972v2</guid>
      <category>cs.HC</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jingjing Li, Zhi Liu, Xiyao Jin, Tatsuki Fushimi, Yoichi Ochiai</dc:creator>
    </item>
    <item>
      <title>CatalyticMLLM: A Graph-Text Multimodal Large Language Model for Catalytic Materials</title>
      <link>https://arxiv.org/abs/2605.17254</link>
      <description>arXiv:2605.17254v3 Announce Type: replace 
Abstract: Property prediction and inverse structural design of catalytic materials are typically modeled as two independent tasks: the former predicts target properties from given structures, whereas the latter generates candidate structures according to desired properties. Although the decoupled paradigm facilitates the implementation of a ``generation--evaluation--screening'' workflow, the inconsistency between the generative model and the property prediction model in terms of representation spaces and training objectives can readily introduce data distribution shifts and evaluator bias, thereby limiting the stability of closed-loop optimization.
  In this work, we propose CatalyticMLLM, a unified graph--text multimodal large language model for catalytic materials, which integrates property prediction and inverse design within the same model and shared representation space. Under this unified framework, CatalyticMLLM can not only perform reliable property prediction by leveraging three-dimensional structures and textual information, but also generate and screen physically feasible CIF candidates conditioned on target properties, thereby forming a closed-loop optimization workflow of ``inverse design--prediction--screening--redesign.'' Experimental results demonstrate that this unified paradigm outperforms decoupled baselines on both catalytic relaxed-energy prediction and inverse design tasks, validating the effectiveness of jointly modeling property prediction and structure generation within a single multimodal model.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.17254v3</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yanjie Li, Jian Xu, Xu-Yao Zhang, Shiming Xiang, Nian Ran, Weijun Li, Cheng-Lin Liu</dc:creator>
    </item>
    <item>
      <title>LEAP: Learnable End-to-End Adaptive Pruning of Large Language Models</title>
      <link>https://arxiv.org/abs/2605.17289</link>
      <description>arXiv:2605.17289v2 Announce Type: replace 
Abstract: Unstructured sparsity is now natively accelerated by recent GPU kernels and dataflow hardware, shifting the bottleneck from inference execution to the pruning algorithm. State-of-the-art methods for unstructured LLM pruning are layer-wise surrogates derived from the Optimal Brain Surgeon principle, and they sacrifice end-to-end accuracy, especially under aggressive sparsity. End-to-end alternatives such as MaskLLM and PATCH show that learnable masks can close this gap, but their categorical-over-patterns parameterization scales with the number of valid masks per row and does not port to the unstructured setting. We introduce LEAP, which replaces this intractable parameterization with a per-weight Bernoulli-via-Gumbel-sigmoid relaxation that makes end-to-end unstructured mask learning tractable. Across five LLM families from 0.5B to 8B parameters at 50% and 60% sparsity, LEAP improves six-task average zero-shot accuracy by +2.59 points on average over ADMM, the best layer-wise baseline in our sweep.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.17289v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Mohammad Mozaffari, Younes Hourri, Mohammad Rastegari, Mahyar Najibi</dc:creator>
    </item>
    <item>
      <title>ConflictRAG: Detecting and Resolving Knowledge Conflicts in Retrieval Augmented Generation</title>
      <link>https://arxiv.org/abs/2605.17301</link>
      <description>arXiv:2605.17301v2 Announce Type: replace 
Abstract: Retrieval-Augmented Generation (RAG) systems implicitly assume mutual consistency among retrieved documents -- an assumption that frequently fails in practice. We present ConflictRAG, a conflict-aware RAG framework that detects, classifies, and resolves knowledge conflicts prior to answer generation. The framework introduces three contributions: (1) a two-stage conflict detection module combining a lightweight embedding-based MLP classifier with selective LLM refinement, reducing API costs by 62% while maintaining 90.8% detection accuracy; (2) an Entropy-TOPSIS framework for data-driven source credibility assessment, improving selection accuracy by 7.1% over manual heuristics; and (3) a Conflict-Aware RAG Score (CARS) for diagnostic evaluation of conflict-handling capabilities. Experiments on three benchmarks against six baselines demonstrate 88.7% conflict-detection F1 and consistent 5.3--6.1% correctness gains over the strongest conflict-aware baseline, with the pipeline transferring effectively across backbone LLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.17301v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chenyu Wang, Yueyuan Li, Yingmin Liu, Yang Shu</dc:creator>
    </item>
    <item>
      <title>Adaptive Generate-Rank-Verify: Inference-Time Search with Costly Verification</title>
      <link>https://arxiv.org/abs/2605.17609</link>
      <description>arXiv:2605.17609v2 Announce Type: replace 
Abstract: Many inference-time language-model pipelines combine a cheap reward signal with an expensive verifier, such as exact answer checking in mathematical reasoning or hidden-test execution in code generation. We formalize this setting using a learning-theoretic lens as generative active search: a cost-sensitive first-positive search problem in which a policy adaptively samples candidates from an unknown distribution, observes cheap scores, and pays for verifier labels until it finds a positive example. For a fixed prompt, the generator and reward model induce two unknown objects: a distribution over reward scores and a score-conditioned success function. When these quantities are known, we characterize the distribution-aware optimal policy using a dynamic programming approach. In the realistic and practical setting where both the score distribution and success function are unknown, we propose ADAP, a shellwise adaptive generate-rank-verify algorithm that progressively increases the number of sampled responses and top-ranked verifications. Under the monotonicity assumption that higher reward scores are no less likely to pass verification, we show that ADAP achieves expected cost within a constant factor of the distribution-aware optimum. We complement this result with learning-theoretic lower bounds, based on a centered star number, showing that structural assumptions on the score--label relationship are necessary. Experiments on mathematical reasoning and competitive programming validate the predicted advantage over both fixed non-adaptive policies and difficulty-adaptive baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.17609v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shaddin Dughmi, Mahdi Haghifam, Yusuf Hakan Kalayci</dc:creator>
    </item>
    <item>
      <title>Post-Trained MoE Can Skip Half Experts via Self-Distillation</title>
      <link>https://arxiv.org/abs/2605.18643</link>
      <description>arXiv:2605.18643v2 Announce Type: replace 
Abstract: Mixture-of-Experts (MoE) scales language models efficiently through sparse expert activation, and its dynamic variant further reduces computation by adjusting the activated experts in an input-dependent manner. Existing dynamic MoE methods usually rely on pre-training from scratch or task-specific adaptation, leaving the practical conversion of fully trained MoE models underexplored. Enabling such adaptation would directly reduce inference costs by allowing easy tokens to bypass unnecessary experts during serving. This paper introduces Zero-Expert Self-Distillation Adaptation (ZEDA), a low-cost framework that transforms post-trained static MoE models into efficient dynamic ones. To stabilize this architectural conversion, ZEDA injects parameter-free zero-output experts into each MoE layer and adapts the augmented model through two-stage self-distillation, utilizing the original MoE as a frozen teacher and applying a group-level balancing loss. On Qwen3-30B-A3B and GLM-4.7-Flash across 11 benchmarks spanning math, code, and instruction following, ZEDA eliminates over 50% of expert FLOPs at marginal accuracy loss. It outperforms the strongest dynamic MoE baseline by 6.1 and 4.0 points on the two models, and delivers ~1.20$\times$ end-to-end inference speedup.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.18643v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xingtai Lv, Li Sheng, Kaiyan Zhang, Yichen You, Siyan Gao, Xueheng Luo, Yuxin Zuo, Yuchen Fan, Junlin Yang, Ganqu Cui, Bingning Wang, Fan Yang, Youbang Sun, Ning Ding, Bowen Zhou</dc:creator>
    </item>
    <item>
      <title>SPHERICAL KV: Angle-Domain Attention and Rate-Distortion Retention for Efficient Long-Context Inference</title>
      <link>https://arxiv.org/abs/2605.18856</link>
      <description>arXiv:2605.18856v3 Announce Type: replace 
Abstract: Long-context inference is increasingly constrained by the KV cache: resident memory grows with context length, and decoding becomes limited by repeated High Bandwidth Memory (HBM) streaming rather than arithmetic. Existing methods such as eviction, windowing, quantization, and offloading reduce footprint, but often leave the critical-path bottleneck only partially addressed, especially when compressed states must still be reconstructed into dense vectors during decoding.
  We present Spherical KV, a long-context inference method that treats KV allocation as a rate-distortion problem grounded in attention geometry for efficient decoding. The method is built on two ideas: (i) represent directional information cheaply in the decode hot loop, and (ii) allocate retention and precision according to estimated future utility. Its first component, Angle-Domain Attention (ADA), stores keys in a spherical parameterization consisting of a scalar radius and compact angle codes, and computes attention logits directly from these codes without reconstructing dense keys. This preserves a paged, block-local, fusion-friendly decode path and directly targets HBM traffic in realistic serving settings. Its second component, Rate-Distortion Retention (RDR), jointly chooses keep/drop decisions and precision tiers per token and head under a fixed budget, producing tier-homogeneous pages with lightweight metadata and coalesced reads. Together, ADA and RDR provide a deployment-oriented mechanism for reducing KV residency while preserving decode efficiency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.18856v3</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anay Chauhan, Gurucharan Marthi Krishna Kumar, Arion Das, Amit Dhanda, Vinija Jain, Aman Chadha, Amitava Das</dc:creator>
    </item>
    <item>
      <title>Diagnosing Multi-step Reasoning Failures in Black-box LLMs via Stepwise Confidence Attribution</title>
      <link>https://arxiv.org/abs/2605.19228</link>
      <description>arXiv:2605.19228v2 Announce Type: replace 
Abstract: Large Language Models have achieved strong performance on reasoning tasks with objective answers by generating step-by-step solutions, but diagnosing where a multi-step reasoning trace might fail remains difficult. Confidence estimation offers a diagnostic signal, yet existing methods are restricted to final answers or require internal model access. In this paper, we introduce Stepwise Confidence Attribution (SCA), a framework for closed-source LLMs that assigns step-level confidence based only on generated reasoning traces. SCA applies the Information Bottleneck principle: steps aligning with consensus structures across correct solutions receive high confidence, while deviations are flagged as potentially erroneous. We propose two complementary methods: (1) NIBS, a non-parametric IB approach measuring consistency without graph structures, and (2) GIBS, a graph-based IB model that learns subgraphs through a differentiable mask to capture logical variability. Extensive experiments on mathematical reasoning and multi-hop question answering show that SCA reliably identifies low-confidence steps strongly correlated with reasoning errors. Moreover, using step-level confidence to guide self-correction improves the correction success rate by up to 13.5% over answer-level feedback.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.19228v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.IT</category>
      <category>cs.LG</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xiaoou Liu, Tiejin Chen, Dengjia Zhang, Yaqing Wang, Lu Cheng, Hua Wei</dc:creator>
    </item>
    <item>
      <title>FormalASR: End-to-End Spoken Chinese to Formal Text</title>
      <link>https://arxiv.org/abs/2605.19266</link>
      <description>arXiv:2605.19266v2 Announce Type: replace 
Abstract: Automatic speech recognition (ASR) systems are typically optimized for verbatim transcription, which preserves disfluencies, filler words, and informal spoken structures that are often unsuitable for downstream writing-oriented applications. A common workaround is a two-stage ASR+LLM pipeline for post-editing, but this design increases latency and memory cost and is difficult to deploy on-device. We present FormalASR, two compact end-to-end models (0.6B and 1.7B) that directly transcribe spoken Chinese into formal written text. To enable this setting, we build WenetSpeech-Formal and Speechio-Formal, two large-scale spoken-to-formal datasets constructed by LLM-based rewriting and quality filtering. We then fine-tune Qwen3-ASR at two scales (0.6B and 1.7B) with supervised fine-tuning. Experiments on WenetSpeech-Formal and Speechio-Formal show that FormalASR achieves up to 37.4% relative CER reduction over verbatim baselines, while also improving ROUGE-L and BERTScore. FormalASR requires no post-processing LLM at deployment time, providing a lightweight, on-device solution for spoken-to-formal transcription.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.19266v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Wanyi Ning, Yinshang Guo, Haitao Qian, Jiyuan Cheng, Weiyuan Feng, Yufei Zhang</dc:creator>
    </item>
    <item>
      <title>OpenCompass: A Universal Evaluation Platform for Large Language Models</title>
      <link>https://arxiv.org/abs/2605.19276</link>
      <description>arXiv:2605.19276v3 Announce Type: replace 
Abstract: In recent years, the field of artificial intelligence has undergone a paradigm shift from task-specific small-scale models to general-purpose large language models (LLMs). With the rapid iteration of LLMs, objective, quantitative, and comprehensive evaluation of their capabilities has become critical to advancing the technology. Mainstream evaluation methods based on static benchmark datasets currently face challenges such as diverse task types, inconsistent evaluation criteria, and fragmented data and processing workflows, making it difficult to conduct cross-domain, large-scale model evaluation efficiently. To address these issues, this paper proposes and open-sources OpenCompass, a one-stop, scalable, high-concurrency general-purpose LLM evaluation platform. Adhering to a design philosophy of modularization and component decoupling, the platform offers three core advantages: high compatibility, flexibility, and high concurrency. The core architecture of OpenCompass comprises five key components: the Configuration System, Task Partitioning Module, Execution and Scheduling Module, Task Execution Unit, and Result Visualization Module. Its workflow provides rule-based, LLM-as-a-Judge, and cascaded evaluators to meet the requirements of different task scenarios. Supporting mainstream benchmark datasets across multiple domains, such as knowledge, reasoning, computation, science, language, and code, the platform offers a unified and efficient LLM evaluation tool for both academia and industry, facilitating the accurate identification of strengths and weaknesses of LLMs as well as their subsequent optimization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.19276v3</guid>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Maosong Cao, Kai Chen, Haodong Duan, Yixiao Fang, Zhiwei Fei, Tong Gao, Ge Jiaye, Mo Li, Hongwei Liu, Junnan Liu, Yuan Liu, Chengqi Lyu, Han Lyu, Ningsheng Ma, Zerun Ma, Yu Sun, Zhiyong Wu, Linchen Xiao, Zhuozhi Xiong, Jun Xu, Haochen Ye, Zhaohui Yu, Yike Yuan, Songyang Zhang, Yufeng Zhao, Fengzhe Zhou, Peiheng Zhou, Dongsheng Zhu, Lin Zhu, Jingming Zhuo</dc:creator>
    </item>
    <item>
      <title>When Tabular Foundation Models Meet Strategic Tabular Data: A Prior Alignment Approach</title>
      <link>https://arxiv.org/abs/2605.19662</link>
      <description>arXiv:2605.19662v2 Announce Type: replace 
Abstract: Tabular foundation models based on pretrained prior-data fitted networks (PFNs) have shown strong generalization on diverse tabular tasks, but they are typically designed for non-strategic settings where data distributions are independent of deployed classifiers. In many real-world decision scenarios, however, individuals may strategically modify their features after deployment to obtain favorable outcomes, inducing a post-deployment distribution shift. This paper studies whether PFN-style tabular foundation models can generalize to such strategic tabular data. We show that strategic manipulation creates a mismatch between the non-strategic prior learned during pretraining and the post-manipulation strategic prior, which leads to systematic prediction bias. To address this issue, we propose Strategic Prior-data Fitted Network (SPN), an inference-time strategy-aware framework that adapts tabular foundation models to strategic environments without retraining. SPN constructs strategic in-context examples to approximate post-manipulation inputs and aligns PFN predictions with the induced strategic distribution. Experiments on real-world and synthetic tabular datasets show that SPN consistently improves robustness and predictive performance under strategic manipulation compared with both tabular foundation models and classical tabular methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.19662v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xinpeng Lv, Yunxin Mao, Renzhe Xu, Chunyuan Zheng, Yikai Chen, Haoxuan Li, Jinxuan Yang, Kun Kuang, Yuanlong Chen, Mingyang Geng, Wanrong Huang, Shixuan Liu, Shaowu Yang, Wenjing Yang, Zhouchen Lin, Haotian Wang</dc:creator>
    </item>
    <item>
      <title>Beyond Rational Illusion: Behaviorally Realistic Strategic Classification</title>
      <link>https://arxiv.org/abs/2605.19674</link>
      <description>arXiv:2605.19674v2 Announce Type: replace 
Abstract: Strategic classification (SC) studies the interaction between decision models and agents who strategically manipulate their features for favorable outcomes. Existing SC frameworks typically rely on the idealized assumption that agents are strictly rational. However, evidence from behavioral economics and psychology consistently shows that real-world decision-making is often shaped by cognitive biases, deviating from pure rationality. To formalize this limitation, we identify and define a new problem setting, termed the behaviorally realistic strategic classification problem, where agents' strategic manipulations deviate from full rationality due to psychological biases. To address this problem, we propose the Prospect-Guided Strategic Framework (Pro-SF), a principled framework grounded in prospect theory to model and learn under behaviorally realistic strategic responses. Specifically, to capture behaviorally realistic strategic manipulations, our framework reformulates the Stackelberg-style interaction between agents and the decision-maker by incorporating three key mechanisms inspired by prospect theory: the asymmetry between benefits and costs, different subjective reference points, and non-rational probability distortion. Experiments on synthetic and real-world datasets establish Pro-SF as a behaviorally grounded approach to strategic classification, bridging machine learning and behavioral economics for more reliable deployment in the real world.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.19674v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xinpeng Lv, Yunxin Mao, Renzhe Xu, Chunyuan Zheng, Yikai Chen, Haoxuan Li, Yang Shi, Jinxuan Yang, Zhouchen Lin, Yuanlong Chen, Yuanxing Zhang, Shaowu Yang, Wenjing Yang, Haotian Wang</dc:creator>
    </item>
    <item>
      <title>Causal Unlearning in Collaborative Optimization: Exact and Approximate Influence Reversal under Adversarial Contributions</title>
      <link>https://arxiv.org/abs/2605.20341</link>
      <description>arXiv:2605.20341v2 Announce Type: replace 
Abstract: Federated learning systems must support data deletion requests to comply with privacy regulations, yet retraining from scratch after each deletion is computationally prohibitive. We present HF-KCU, a method that removes a client's contribution by approximating the influence function through conjugate gradient iterations in Krylov subspaces, reducing complexity from O(d^3) to O(kd) where k &lt;&lt; d. A causal weighting mechanism ensures that only clients holding the deleted data receive parameter updates, preventing spurious changes to unaffected clients. Our method is designed to handle bounded adversarial perturbations to the Hessian and gradient, providing graceful degradation under realistic threat models. We validate HF-KCU across convolutional (ResNet-18, SimpleCNN) and transformer (ViT-Lite) architectures on CIFAR-10, MNIST, and Fashion-MNIST. On CIFAR-10 under Dirichlet (alpha=0.5) partitioning, HF-KCU achieves a 47.75 times speedup over retraining while maintaining test accuracy within 0.60% of the rational baseline (71.16% vs. 71.76%). Membership inference attacks on the forget set yield success rates of 0.499, matching the retrained model and confirming effective privacy restoration. We provide convergence guarantees showing that the Krylov approximation error decreases as O((k^1/2 - 1)/(k^1/2 + 1)) where k is the Hessian condition number. The causal weighting mechanism ensures surgical updates, where only clients holding deleted data are modified, preserving model quality for unaffected participants and avoiding the instability of gradient-based approaches in asynchronous federated settings. This design provides interpretability, as each update is directly traceable to the influence of the deleted data. The method's efficiency and precision make it suitable for production federated systems where deletion requests arrive asynchronously and computational budgets are constrained.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.20341v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CR</category>
      <category>cs.PF</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ali Mahdavi, Azadeh Zamanifar, Amirfarhad Farhadi, Omid Kashefi</dc:creator>
    </item>
    <item>
      <title>Lowering the Barrier to IREX Participation: Open-Source Algorithms, Toolkit, and Benchmarking for Iris Recognition</title>
      <link>https://arxiv.org/abs/2605.20735</link>
      <description>arXiv:2605.20735v2 Announce Type: replace 
Abstract: NIST Iris Exchange (IREX) offers an appealing solution to evaluating new open-source iris recognition algorithms, but it presents high barriers to entry because these algorithms must be written in C++, using a specific API, and adapted to meet strict IREX speed and memory constraints. The main goal of this paper is to lower these barriers and advance open-source iris recognition large-scale evaluations by offering: (a) two new modern deep learning-based open-source iris matchers (ArcIris and TripletIris), along with their C++ IREX X-compliant implementations, which are the first open-source iris recognition methods included into the IREX X leaderboard (and thus IREX-vetted), as well as new segmentation and iris circular approximation models that can be incorporated into any new iris recognition method, and (b) a performance assessment (according to IREX X testing protocols) of all major and currently available open-source iris recognition solutions. The paper also provides Python implementations of the new ArcIris and TripletIris methods and discusses the differences one may encounter between C++ and Python implementations of the same conceptually equivalent approaches. Finally, the paper offers open-source, IREX X-compliant C++ implementations of two existing methods: (a) an iris image filtering-based algorithm utilizing human saliency-driven kernels (HDBIF), and (b) a human-interpretable algorithm for detecting and comparing Fuchs' crypts (CRYPTS). In addition to IREX X evaluation results, the paper reports the performance of all methods on major academic benchmarks: Quality-Face/Iris Research Ensemble (Q-FIRE), Warsaw-Biobase Post-Mortem Iris, CASIA-Iris-Thousand-V4, CASIA-Iris-Lamp-V4, IIT Delhi Iris Database, IIITD Contact Lens Iris Database, NDIris3D, and Notre Dame Variable Iris Image Quality Release 2 (VII-Q-R2).</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.20735v2</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Siamul Karim Khan, Patrick J. Flynn, Adam Czajka</dc:creator>
    </item>
    <item>
      <title>Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy</title>
      <link>https://arxiv.org/abs/2605.21006</link>
      <description>arXiv:2605.21006v2 Announce Type: replace 
Abstract: We study the effect of different personas on sycophancy: a model's agreement with users even when the user is incorrect. The standard mitigation, Contrastive Activation Addition (CAA), derives a steering direction from labelled pairs of sycophantic and honest responses. This study evaluates whether off-the-shelf persona steering vectors, originally developed for general role-playing and not trained on sycophancy data, can serve as an alternative. In two instruction-tuned models, steering toward personas characterised by doubt or scrutiny reduces sycophancy to approximately 68% and 98% of CAA's effect, and, unlike CAA, maintains accuracy when the user is correct. The effect is also asymmetric: steering toward agreeable personas does not produce a mirror increase in sycophancy. Geometrically, the persona vector is largely independent of the direction of sycophancy in activation space. Collectively, these findings suggest that sycophancy is better understood as a persona-level property rather than a single steerable direction. We release our code here: https://anonymous.4open.science/r/Sycophancy-Steering-9DF0/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.21006v2</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ishaan Kelkar, Nebras Alam, Vikram Kakaria, Madhur Panwar, Vasu Sharma, Maheep Chaudhary</dc:creator>
    </item>
    <item>
      <title>DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation</title>
      <link>https://arxiv.org/abs/2605.21028</link>
      <description>arXiv:2605.21028v2 Announce Type: replace 
Abstract: Autoregressive long video generation often adopts bounded-memory streaming for efficiency, typically combining local windows for short-term continuity with static early-frame sinks as long-range anchors. However, this fixed allocation keeps early frames cached even when the current visual state has substantially diverged from them, while discarding potentially more relevant intermediate history. As a result, the retained long-range context may become less adaptive and bias generation toward outdated cues; in severe cases, RoPE-induced phase re-alignment can homogenize inter-head attention and cause sink collapse, where content regresses toward sink frames. We propose DySink, a retrieval-based framework that maintains a compact memory bank and selects visually relevant historical frames as dynamic frame sinks. DySink couples adaptive retrieval with a sink anomaly gate, which detects excessive inter-head consensus over retrieved context and suppresses collapse-prone context. Experiments on minute-long videos show that DySink consistently improves dynamic degree over strong baselines while also achieving higher temporal quality. The code and model weights will be released at https://github.com/yebo0216best/DySink.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.21028v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bo Ye, Xinyu Cui, Jian Zhao, Tong Wei, Min-Ling Zhang</dc:creator>
    </item>
    <item>
      <title>CrossVLA: Cross-Paradigm Post-Training and Inference Optimization for Vision-Language-Action Models</title>
      <link>https://arxiv.org/abs/2605.21854</link>
      <description>arXiv:2605.21854v2 Announce Type: replace 
Abstract: Vision-Language-Action (VLA) models have rapidly converged on a small set of architectural patterns: discrete-token autoregression (e.g. OpenVLA) and continuous-action flow-matching (e.g. pi-0.5). Yet preference alignment via Direct Preference Optimisation (DPO) -- the de-facto post-training step in language models -- has been studied almost exclusively on autoregressive VLAs. We present CrossVLA, an empirical study of cross-paradigm VLA post-training. Three contributions: (i) a surrogate flow-matching log-probability estimator that lets DPO operate on continuous-action backbones without probability-flow ODE integration; (ii) a head-to-head comparison of LoRA and DoRA as the parameter-efficient layer for VLA DPO, finding DoRA improves over OpenVLA SFT by a mean +10.4 pp across LIBERO 4-suite (600 trials, 3 seeds) -- per-suite +20.0 Object, +11.0 Long-horizon, +8.0 Goal, +2.7 Spatial -- with zero seed variance on Object (38/50 on each of 3 seeds); (iii) an inference-time anatomy showing the denoise loop dominates 78.6% of sample_actions latency and prefix-K/V caching a la VLA-Cache caps at a 21% acceleration ceiling -- both chunk-level and token-level cache strategies degrade success rate to 0-80% in our benchmarks. We further pretrain a multi-view + temporal projection head on 6000 LIBERO frames, achieving 99.5% k-NN recall@1 for same-task retrieval (36x over random), available as a downstream initialisation. All code, ckpts, training logs, and reproduction scripts are open at https://github.com/lz-googlefycy/vla-lab.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.21854v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhi Liu</dc:creator>
    </item>
    <item>
      <title>Ishigaki-IDS-Bench: A Benchmark for Generating Information Delivery Specification from BIM Information Requirements</title>
      <link>https://arxiv.org/abs/2605.22079</link>
      <description>arXiv:2605.22079v2 Announce Type: replace 
Abstract: Building Information Modeling (BIM) projects increasingly use Information Delivery Specification (IDS) to formalize information requirements in a machine-checkable XML format. Because IDS conditions are grounded in the Industry Foundation Classes (IFC) vocabulary, authoring them requires expertise in IFC concepts, validation tools, and property set conventions. Existing benchmarks for structured generation do not adequately capture the additional burden of vocabulary conformance and external-validator agreement that IDS imposes. We present Ishigaki-IDS-Bench, the first publicly released benchmark for IDS generation from BIM information requirements. The benchmark contains 166 examples spanning 83 practical scenarios authored in Japanese and English by six BIM/IDS experts, each paired with a gold IDS file and metadata covering input format, turn setting, target IFC versions, and construction domain. Evaluation proceeds in two stages: (i) formal validity scored by the buildingSMART IDSAuditTool along Processability, Structure, and Content, and (ii) content fidelity scored by facet-level macro-F1 against the gold IDS. Across 10 LLMs in zero-shot, the highest Facet F1 is 65.6%, achieved by GPT-5.5, while the highest Content pass rate is only 33.1%, achieved by Claude Opus 4.5. Ishigaki-IDS-Bench is released on Hugging Face (DOI 10.57967/hf/8873) under CC BY 4.0, and the evaluation code is released on Zenodo (DOI 10.5281/zenodo.20550510) under Apache-2.0.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.22079v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ryo Kanazawa, Koyo Hidaka, Teppei Miyamoto, Takayuki Kato, Tomoki Ando, Chenguang Wang, Dayuan Jiang, Naofumi Fujita, Shuhei Saitoh, Atomu Kondo, Koki Arakawa, Daiho Nishioka</dc:creator>
    </item>
    <item>
      <title>EvoIR-Agent: Self-Evolving Image Restoration Agentic System via Experience-Driven Learning</title>
      <link>https://arxiv.org/abs/2605.22208</link>
      <description>arXiv:2605.22208v2 Announce Type: replace 
Abstract: Multimodal Large Language Model (MLLM)-driven image restoration agents demonstrate effectiveness in degradation coupling scenarios by flexibly selecting tools and determining removal orders. However, their zero-shot planning often fails without experience, necessitating severe trial-and-error overhead to achieve satisfactory outcomes. Currently, two paradigms are employed to address this issue, yet a dilemma persists: training-based methods embed intrinsic experience into parameters, achieving high inference efficiency but lacking compatibility with new tools or degradation. In contrast, training-free methods utilize explicit experience storage for compatibility but still incur trial-and-error overhead due to naive experience. To resolve the dilemma, we propose EvoIR-Agent, which first systematically formulates the experience components of a training-free image restoration agent. Subsequently, a hierarchical experience pool is constructed, which enables coarse-to-fine guidance for diverse tools and removal orders. Furthermore, a self-evolving mechanism is introduced to update the pool from scratch using accumulated records, thereby greatly improving performance and efficiency. Extensive experiments reveal that EvoIR-Agent achieves a significant lead in the full reference metrics and yields a remarkable Pareto-optimal balance between performance and efficiency compared to the state-of-the-art methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.22208v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kailin Zhuang, Jiawei Wu, Zhi Jin</dc:creator>
    </item>
    <item>
      <title>MBABench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance</title>
      <link>https://arxiv.org/abs/2605.22664</link>
      <description>arXiv:2605.22664v2 Announce Type: replace 
Abstract: LLM agents are increasingly expected to carry out end-to-end workflows, producing complete artifacts from high-level user instructions. To meet enterprise needs, frontier AI labs have developed agents that can construct entire spreadsheets from scratch. This is especially relevant in finance, where core workflows such as financial modeling, forecasting, and scenario analysis are commonly conducted through spreadsheets. Yet, existing spreadsheet benchmarks do not measure this advanced capability, focusing instead on question-answering or single-formula edits. To address this gap, we provide one of the first evaluations of agents on end-to-end spreadsheet tasks, focusing on economically critical financial workflows such as modeling and scenario analysis. Since deliverables therein are routinely reviewed and revised by multiple stakeholders, judging their quality necessarily involves high-level criteria such as readability or ease of modification. To reflect the multidimensional nature of solution quality, we develop an evaluation taxonomy comprising three dimensions: Accuracy, Formula, and Format, each comprising fine-grained criteria that reflect professional standards. The Claude family leads the benchmark and produces the most professional-looking outputs in our qualitative review, but even the strongest agents frequently fall short of professional finance standards and degrade sharply as the difficulty increases beyond a few chained calculations. This suggests that current agents are not yet able to reliably produce professional-quality spreadsheets at the level of complexity real-world workflows demand.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.22664v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Thomson Yen, Julian Poeltl, Harshith Srinivas Gear, Yilin Meng, Joshua Fan, Adam Shen, Yili Liu, Ali Bauyrzhan, Siri Du, Haoyang Liu, Daniel Guetta, Hongseok Namkoong</dc:creator>
    </item>
    <item>
      <title>Advancing Mathematics Research with AI-Driven Formal Proof Search</title>
      <link>https://arxiv.org/abs/2605.22763</link>
      <description>arXiv:2605.22763v2 Announce Type: replace 
Abstract: Large language models (LLMs) increasingly excel at mathematical reasoning, but their unreliability limits their utility in mathematics research. A mitigation is using LLMs to generate formal proofs in languages like Lean. We perform the first large-scale evaluation of this method's ability to solve open problems. Our most capable agent autonomously resolved 9 of 353 open Erdős problems at the per-problem cost of a few hundred dollars, proved 44/492 OEIS conjectures, and is being deployed in combinatorics, optimization, graph theory, algebraic geometry, and quantum optics research. A basic agent alternating LLM-based generation with Lean-based verification replicated the Erdős successes but proved costlier on the hardest problems. These findings demonstrate the power of AI-aided formal proof search and shed light on the agent designs that enable it.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.22763v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>George Tsoukalas, Anton Kovsharov, Sergey Shirobokov, Anja Surina, Moritz Firsching, Gergely Bérczi, Francisco J. R. Ruiz, Arun Suggala, Adam Zsolt Wagner, Eric Wieser, Lei Yu, Aja Huang, Miklós Z. Horváth, Andrew Ferraiuolo, Henryk Michalewski, Edward Lockhart, Codrut Grosu, Thomas Hubert, Matej Balog, Pushmeet Kohli, Swarat Chaudhuri</dc:creator>
    </item>
    <item>
      <title>DeltaBox: Scaling Stateful AI Agents with Millisecond-Level Sandbox Checkpoint/Rollback</title>
      <link>https://arxiv.org/abs/2605.22781</link>
      <description>arXiv:2605.22781v2 Announce Type: replace 
Abstract: LLM-powered AI agents require high-frequency state exploration (e.g., test-time tree search and reinforcement learning), relying on rapid checkpoint and rollback (C/R) of the complete sandbox state, including files and process state (e.g., memory and contexts). Existing mechanisms duplicate the entire state, causing hundreds of milliseconds to seconds of latency per C/R, which severely bottlenecks deep search and large-scale fan-outs. This paper observes that subsequent checkpoints in AI agents are highly similar. The key insight is therefore that, instead of full duplication, a sandbox should duplicate only the changes between consecutive checkpoints. However, realizing this idea is non-trivial, mainly because the necessary OS support is missing.
  This paper proposes a new OS-level abstraction, DeltaState, to enable the change-based transactional C/R for AI agents with two co-designed OS mechanisms. First, DeltaFS enables change-based filesystem C/R by organizing the file states into layers and dynamically freezing the writable layer and inserting a new one during checkpoint, reducing file updates to copy-on-write, and making rollback a simple layer switch. Second, DeltaCR enables change-based process state C/R using incremental dumps, and accelerates rollback by bypassing traditional pipelines to directly fork() from a frozen template process. We then present DeltaBox, a novel agent sandbox achieving millisecond level C/R through the two new mechanisms. Evaluations on SWE-bench and RL micro-benchmarks show DeltaBox completes checkpoint and rollback in millisecond-level latency (14ms and 5ms, respectively), empowering agents to explore substantially more nodes under fixed time budgets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.22781v2</guid>
      <category>cs.OS</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yunpeng Dong, Jingkai He, Shiqi Liu, Yuze Hou, Dong Du, Zhonghu Xu, Si Yu, Baochuan Yang, Yubin Xia, Haibo Chen</dc:creator>
    </item>
    <item>
      <title>Latent Cache Flow: Model-to-Model Communication Without Text</title>
      <link>https://arxiv.org/abs/2605.22863</link>
      <description>arXiv:2605.22863v2 Announce Type: replace 
Abstract: LLM agents today communicate via text, which incurs considerable latency and information loss due to the need to autoregressively decode the sharer model's state and encode at the receiver model. Recent work such as Cache-to-Cache (C2C; Fu et al., 2026) seeks to exchange KV caches by learning adapters that translate sharer KV matrices to the receiver model. However, the adapters are large and expensive to train, and translate individual tokens, which requires the target context to be identical. This is unsuitable for agent communication, where the LLMs have differing contexts. We introduce Latent Cache Flow (LCF). To address efficiency, we observe that keys and values can be jointly translated and compressed, reducing the adapter to about 4% of C2C's size. To address differing contexts, we design the adapter to transmit a summary of new information that the target model does not have. Our early experiments show that a pruned 13 MB LCF adapter can be more accurate than C2C at 956 MB in shared-context settings; for different contexts, LCF improves F1 by 7.5% and Exact Match by 23% while being 8.5 times faster than text-based communication.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.22863v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Maximillian Rossi, Prajwal Raghunath, Eugene Wu</dc:creator>
    </item>
    <item>
      <title>Improved Torn Paper Coding via Local Alignment</title>
      <link>https://arxiv.org/abs/2605.23076</link>
      <description>arXiv:2605.23076v2 Announce Type: replace 
Abstract: In the torn paper channel, a transmitted codeword is broken at random locations into fragments that arrive at the decoder in an unordered manner. A central theoretical challenge within this model is global alignment -- the task of determining each fragment's original position -- in order to faithfully reconstruct the entire codeword. Prior work by Shomorony and Vahid introduced an interleaved-pilot scheme that successfully achieved a vanishing error probability. However, their alignment strategy relies heavily on global statistics, requiring fragments to exceed a minimum length and effectively discarding many shorter ones as erasures, which results in rates significantly below capacity. To address this gap, we propose an improved coding scheme that achieves a provable rate increase through a novel approach we call local alignment. This approach identifies global alignment bits within each fragment using only local information, allowing the decoder to determine the positions of fragments that are shorter than those used in previous work. Consequently, the decoder can extract information from a much larger fraction of the channel output than in previous work, yielding significantly higher rates. Furthermore, we extend our analysis to torn paper coding with lost pieces (TPC-LP), a generalized model that accounts for length-dependent fragment deletion. For a class of TPC-LP channels that delete all fragments below a logarithmic length threshold while allowing arbitrary length-dependent deletion probabilities for longer fragments, we show that the proposed local alignment strategy achieves an arbitrarily small additive gap to capacity as the threshold increases.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.23076v2</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Junsheng Liu, Netanel Raviv</dc:creator>
    </item>
    <item>
      <title>Accelerating Divisible Load Processing Through Machine Learning: A Practical Framework for Large-Scale Workloads</title>
      <link>https://arxiv.org/abs/2605.23247</link>
      <description>arXiv:2605.23247v2 Announce Type: replace 
Abstract: In this paper, we introduce the first machine learning framework for predicting optimal processing times in Single-Level Tree Network (SLTN) architectures for the Divisible Load Theory (DLT) paradigm. Using a feedforward neural network (FNN) with 16 engineered features, we train a model on 100,000 synthetically generated configurations to predict optimal processing times without explicit formulation of DLT equations. The model achieves 97-99% accuracy (R-square factor) with mean absolute percentage error of 1-5%, demonstrating that neural networks can effectively learn complex load distribution relationships. Feature importance analysis reveals that the model implicitly captures DLT mathematical structure, including load conservation and simultaneous finishing constraints. With inference times under 1 millisecond, the approach serves as a viable option over traditional DLT computation, enabling applications in real-time scheduling, design space exploration, and cloud resource allocation. The method generalizes well across diverse system configurations (n = 3 to 20, load size = 1 to 100 GB) with consistent accuracy, though performance degrades slightly for very large or highly heterogeneous systems. This work demonstrates the feasibility of using machine learning to accelerate distributed computing optimization while maintaining near-optimal accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.23247v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Bharadwaj Veeravalli</dc:creator>
    </item>
    <item>
      <title>Learning to Evaluate: Cost-Effective Model Evaluation on Unlabeled Data with Meta-Learning</title>
      <link>https://arxiv.org/abs/2605.23595</link>
      <description>arXiv:2605.23595v3 Announce Type: replace 
Abstract: The rapid advancement of machine learning has led to an unprecedented expansion of model ecosystems, making it increasingly difficult to assess the reliability of newly released models on unseen and unlabeled data. Existing evaluation pipelines typically rely on costly annotation, repeated fine-tuning, or assumptions that do not generalize well to new models. We introduce MetaEvaluator, a cost-effective, model-agnostic framework for fast, label-free evaluation of unseen models across diverse architectures and modalities. MetaEvaluator meta-learns over a pool of reference models to acquire an effective initialization for accurate assessment of unseen models, thereby amortizing evaluation cost and eliminating the need for per-model retraining. To the best of our knowledge, this is the first model-agnostic framework that evaluates new models on unlabeled datasets. Extensive experiments demonstrate that MetaEvaluator delivers stable and accurate performance estimates at substantially lower cost than conventional approaches, enabling scalable benchmarking on unlabeled datasets for emerging models. The code is available at: https://github.com/phkhanhtrinh23/MetaEvaluator.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.23595v3</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <category>cs.ET</category>
      <category>cs.PF</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Trinh Pham, Viet Huynh, Hongzhi Yin, Quoc Viet Hung Nguyen, Thanh Tam Nguyen</dc:creator>
    </item>
    <item>
      <title>LGMT: Logic-Grounded Metamorphic Testing for Evaluating the Reasoning Reliability of LLMs</title>
      <link>https://arxiv.org/abs/2605.23965</link>
      <description>arXiv:2605.23965v2 Announce Type: replace 
Abstract: Large Language Models (LLMs) achieve strong performance on logical reasoning benchmarks, yet their reliability remains uncertain. Existing evaluations rely on static benchmarks, which fail to assess robustness under logically equivalent transformations and often overestimate reasoning capability. We propose LGMT (Logic-Grounded Metamorphic Testing), an oracle-free framework that leverages first-order logic (FOL) to evaluate LLM reasoning. By deriving metamorphic relations from formal logical equivalences, LGMT constructs semantically invariant test cases and detects reasoning defects through cross-case consistency checking. Experiments on six state-of-the-art LLMs show that LGMT exposes substantial hidden defects missed by traditional reference-based evaluations. We further find that models are particularly sensitive to symbol-level and conclusion-level variations, and that advanced prompting such as Few-shot CoT only partially mitigates these issues. These results suggest that LLM evaluation should move beyond isolated correctness toward robustness under logical invariance. LGMT provides a principled and scalable approach for diagnosing reasoning failures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.23965v2</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zenghui Zhou, Man Li, Xiaoke Fang, Xinyi Zhou, Weibin Lin, Zheng Zheng</dc:creator>
    </item>
    <item>
      <title>How Many Tools Should an LLM Agent See? A Chance-Corrected Answer</title>
      <link>https://arxiv.org/abs/2605.24660</link>
      <description>arXiv:2605.24660v2 Announce Type: replace 
Abstract: Before an LLM agent can use a tool, a retrieval system must decide which candidate tools to show to the agent. How long should that shortlist be? Show too many tools and the model struggles to choose. Show too few and the correct tool may not appear. Most systems apply a fixed shortlist size to every query, but no standard metric exists to evaluate whether that size was appropriate. We treat the number of tools shown to an LLM agent as the object of evaluation and we apply Bits-over-Random (BoR), a chance-corrected metric that asks whether success at a given depth is better than what random selection would achieve at that same depth. We evaluate BoR across three tool-selection benchmarks, multiple scorers, and registries ranging from 20 to 3,251 tools. We then turn the same principle into a reinforcement learning (RL) reward for choosing tool shortlist depth per query. The RL agent is deliberately simple, serving as a probe of the metric rather than a proposed system. As the shortlist grows, random chance of including the correct tool rises, so the reward naturally decreases, reducing the need for an engineered depth penalty. On BFCL (370 tools), the learned policy nearly matches the coverage of showing 50 tools ($90.3\%$ vs $90.8\%$) while presenting only 7 on average. On ToolBench (3,251 tools), a fixed shortlist of 5 tools achieves higher aggregate coverage ($64.7\%$ vs $61.9\%$) but finds nothing on hard queries (correct tool ranked 6th-20th). The BoR agent finds $16.7\%$ on those same queries by searching deeper. Downstream validation with Claude Sonnet 4.6 indicates that shorter adaptive lists also improve the LLM's ability to select the right tool: $93.1\%$ versus $87.1\%$ when always shown 5 tools, widening to $76.8\%$ vs $60.9\%$ on medium-difficulty queries where the correct tool is present but not ranked first.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.24660v2</guid>
      <category>cs.IR</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vyzantinos Repantis, Ameya Gawde, Harshvardhan Singh, Joey Blackwell II</dc:creator>
    </item>
    <item>
      <title>QuoVLA: Quotient Space for Vision-Language-Action Models</title>
      <link>https://arxiv.org/abs/2605.24890</link>
      <description>arXiv:2605.24890v2 Announce Type: replace 
Abstract: Vision-Language-Action (VLA) models commonly adapt pretrained Vision-Language Models (VLMs) to robot control by mapping visual observations and language instructions to continuous actions. Existing approaches typically take an action-insufficiency view, assuming that pretrained VLM latents either lack directly usable action information or should be shielded from action-learning signals. Against this view, our \textit{Quotient Theory for VLA} shows that pretrained VLM latents are not action-insufficient but action-sufficient: they already contain the information needed for control, yet remain overcomplete by distinguishing prompt-level variations that induce the same optimal action behavior. To operationalize this theory, we propose QuoVLA, a quotient-space framework for VLA that compresses pretrained VLM latents into action-sufficient representations. Specifically, QuoVLA instantiates this principle with a quantization module and a dual-branch design with relative temporal-complexity regularization, preserving action-relevant information while removing prompt-level redundancy. Extensive experiments across multiple benchmarks demonstrate that QuoVLA achieves strong performance, with particularly notable improvements in generalization under visual, linguistic, and environmental distribution shifts. Our code will be made publicly available.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.24890v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xuan Wang, Yinan Wu, Haoran Duan, Jungong Han</dc:creator>
    </item>
    <item>
      <title>X-Foresight: A Joint Vision-Action Causal Forecasting Network via Predictive World Modeling</title>
      <link>https://arxiv.org/abs/2605.24892</link>
      <description>arXiv:2605.24892v3 Announce Type: replace 
Abstract: Physical world knowledge resides mainly in videos. Equipping Vision-Language-Action (VLA) models with such knowledge is fundamental for safe and generalizable planning. Predictive world modeling enables VLA to internalize physical dynamics and long-term causality by predicting future video from past observations. However, naive next-frame prediction faces two challenges: 1) unlike semantically distinct text tokens, video tokens are low-entropy and redundant, causing prediction to degenerate into trivial extrapolation. 2) world modeling poses a temporal dilemma: dense prediction captures instantaneous dynamics, but cannot efficiently model long-horizon causality.
  To learn world knowledge effectively, we introduce X-Foresight, a predictive world model integrated directly into the VLA architecture to jointly learn world modeling and real-time action control. At its core lies a long-horizon chunk-wise auto-regressive strategy that addresses both challenges: by predicting semantically distant chunks rather than adjacent frames, it escapes trivial extrapolation, while preserving dense intra-chunk frames for instantaneous dynamics and sparse inter-chunk transitions for long-term causality. A curriculum learning schedule progressively extends prediction horizons and stabilizes long-horizon training. To capture long-term causality effectively, we present temporal importance sampling, which concentrates supervision on safety-critical chunks identified by ego-motion and behavioral signals. We further delegate photorealistic synthesis to a diffusion-based multi-view renderer, improving photorealistic appearance.
  Comprehensive experiments demonstrate that X-Foresight significantly outperforms VLA baselines in planning performance while maintaining strong generative fidelity, establishing a robust paradigm for world-knowledge-driven autonomous systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.24892v3</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Baolu Li, Jingyu Qian, Rui Guo, Yilun Chen, Hanpeng Liu, Yuan Lin, Junhong Zhou, Ruixin Liu, Willow Yang, Yutong Zheng, Zhenli Zhang, Sean Li, Chaoda Zheng, Boyang Wang, Tenglong (Victor) Gu, Zhuangzhuang Ding, Pengkun Zheng, Yu Zhang, Xianming Liu</dc:creator>
    </item>
    <item>
      <title>Riemannian-Manifold Steering: Geometry-Aware Generative Autoencoders for Label-Free Steering</title>
      <link>https://arxiv.org/abs/2605.24942</link>
      <description>arXiv:2605.24942v2 Announce Type: replace 
Abstract: Steering a language model - intervening on its internal activations to change downstream behaviour - has recently expanded beyond linear interpolation to nonlinear methods such as angular and kernelized steering, which define intervention transformations without learning an explicit geometry over paths in activation space. Freshly introduced geometry-aware manifold methods do learn such a geometry, but require labelled class centroids together with prescribed cyclic or sequential structure. These assumptions restrict where manifold steering can be applied, since existing constructions require labelled centroids and compatible boundary conditions. We recast manifold steering more broadly as \textbf{Riemannian geodesic computation} on activation space, recovering linear and labelled-spline steering as geodesics under particular choices of metric. A principled metric within this framework is the output-space Hellinger distance pulled back to activations; we approximate this with a learned encoder trained on output distances over a small concept-token schema - no per-prompt labels, no topology prior, and no per-task curve fitting. Empirically, the method reliably drives the model onto the target class across all tasks in a standard four-task language-model arithmetic benchmark, while following more behaviourally natural trajectories than baselines on smaller output spaces. We thereby provide a unified Riemannian framework for manifold steering together with a schema-supervised, label-free instantiation that operates without labelled centroids or prescribed boundary conditions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.24942v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Narmeen Oozeer, Shivam Raval, Philip Quirke, Manikandan Ravikiran, Jeff Phillips, Shriyash Upadhyay, Amirali Abdullah</dc:creator>
    </item>
    <item>
      <title>Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression</title>
      <link>https://arxiv.org/abs/2605.25085</link>
      <description>arXiv:2605.25085v2 Announce Type: replace 
Abstract: We study the rate-distortion limits of online KV cache compression in autoregressive language models, formulating it as sequential Wyner-Ziv source coding on the filtration induced by the model, with the next-step query as decoder side information. Empirically, across four models spanning two families and $0.5$-$3$B parameters, we find that the next-token distribution's sensitivity to context truncation decays \emph{polynomially} rather than \emph{geometrically}: a power law improves on an exponential fit by an order of magnitude in extrapolation, the fitted exponent is recovered independently from a sink-plus-recent KL measurement, and the decay is verified to be free of positional-encoding artifacts by a position-preserving ablation. Under a corresponding \emph{polynomial truncation-sensitivity} assumption, our main result characterizes the per-token memory requirement of \emph{suffix-only} cache policies: a sliding-window scheme attains distortion $\varepsilon$ with window $w = O(\varepsilon^{-1/\alpha})$, and -- under an additional two-sided Bayes-risk condition -- a converse shows $w = \Omega(\varepsilon^{-1/\alpha})$ is necessary within this policy class, so the scaling is $\Theta(\varepsilon^{-1/\alpha})$ for suffix-only policies. Whether recurrent or propagating cache summaries can beat this scaling is left open. An explicit block-Markov scheme achieves the upper bound; its rate-of-convergence exponent matches the converse under additional forward-decay and regularity hypotheses (not implied by truncation sensitivity alone), and differs by a factor of two otherwise. Empirically, the polynomial law predicts the degradation curves of concrete cache policies: recency-based eviction (sliding, sink-plus-recent) suppresses distortion by roughly two orders of magnitude over random retention at equal budget, with a power-law decay in the budget.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.25085v2</guid>
      <category>cs.IT</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Munsik Kim</dc:creator>
    </item>
    <item>
      <title>P1SCO: Social Dimensions from a Perspectivist Lens</title>
      <link>https://arxiv.org/abs/2605.25312</link>
      <description>arXiv:2605.25312v2 Announce Type: replace 
Abstract: We introduce P1SCO, a dataset of social media comments collected from three distinct platforms, annotated according to ten social dimensions to capture the diversity of social interactions and perceptions. The dataset is carefully disaggregated to allow analysis at the level of individual comments, annotators, and platforms. In addition to the social dimension labels, we include rich metadata on the annotators, including demographics, Big Five personality profiles, and political affiliation. This combination of comment-level annotations and annotator-level features enables nuanced analyses of how social perception varies across platforms, individual differences, and demographic factors. By preserving the diversity of annotator perspectives, our dataset supports studies of inter- and intra-annotator agreement, the influence of personality and political orientation on social interpretation, and the cross-platform dynamics of social discourse.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.25312v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Amanda Cercas Curry, Gianmarco de Francisci Morales, Luca Maria Aiello</dc:creator>
    </item>
    <item>
      <title>Pantheon360: Taming Digital Twin Generation via 3D-Aware 360{\deg} Video Diffusion</title>
      <link>https://arxiv.org/abs/2605.25449</link>
      <description>arXiv:2605.25449v2 Announce Type: replace 
Abstract: Generating complete digital twins from videos requires precise camera control, global scene coverage, and strict spatial-temporal consistency constraints that remain challenging for perspective video generators due to their limited field of view (FoV). Their narrow FoV forces long or multi-view trajectories, amplifying cross-view inconsistency and temporal drift. We argue that 360{\deg} video generation offers a natural solution: panoramic coverage simplifies trajectory design and provides a strong global context for maintaining coherence. We introduce Pantheon360: Taming Digital Twin Generation via 3D-Aware 360{\deg} Video Diffusion, a controllable 360{\deg} video generation framework that synthesizes high-fidelity videos from sparse 360{\deg} inputs. The key idea is an explicit 3D Cache, reconstructed from the input, which serves as a geometric scaffold for any user-defined camera path. This allows the diffusion model to focus on photorealistic texture refinement while the 3D Cache enforces global geometric consistency. Experiments show that Pantheon360 achieves superior visual quality and unmatched geometric coherence, enabling reliable and flexible 360{\deg} scene generation for downstream simulation and digital-twin applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.25449v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ting-Hsuan Chen, Ying-Huan Chen, Tao Tu, Jie-Ying Lee, Cho-Ying Wu, Fangzhou Lin, Hengyuan Zhang, David Paz, Xinyu Huang, Yuliang Guo, Yu-Lun Liu, Yue Wang, Liu Ren</dc:creator>
    </item>
    <item>
      <title>CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents</title>
      <link>https://arxiv.org/abs/2605.25624</link>
      <description>arXiv:2605.25624v2 Announce Type: replace 
Abstract: Reinforcement learning with verifiable rewards (RLVR) has driven breakthroughs in domains such as math, tool-use, and software engineering, yet its extension to computer-use agents (CUAs) has been bottlenecked by the scarcity of scalable training data with deterministic rewards. Constructing such data for CUAs requires consistent task instruction, executable environment, and verifiable reward. However, hand-curated benchmarks achieve high reward fidelity but cover few applications, while LLM-as-judge-based datasets scale broadly but lack reliable verification. We present CUA-Gym, a scalable pipeline that co-generates task instructions, environment states, and reward functions. Concretely, a Generator agent constructs the initial and golden environment states, and a separate Discriminator agent writes the reward function from the task specification. An orchestrator agent drives the two through iterative rounds upon execution. Generated tuples then pass a final filter combining LLM majority voting and agent rollouts, ensuring quality beyond the per-task adversarial loop. To address the scarcity of training environments, we further synthesize CUA-Gym-Hub, a broad suite of high-fidelity mock web applications grounded in real-world software-use distributions, expanding the scale of CUA RLVR data by magnitude. Using this pipeline, we construct CUA-Gym, a dataset of 32,112 verified RLVR training tuples grounded in 110 environments. Trained with GSPO on CUA-Gym, our CUA-Gym-A3B and CUA-Gym-A17B achieve 62.1% and 72.6% on OSWorld-Verified, outperforming prior open-source CUAs at comparable scales, with performance scaling smoothly in both data volume and environment diversity. The same checkpoints also improve on the held-out WebArena benchmark, indicating transfer beyond the training environments. We will open-source the full synthesis pipeline, dataset, CUA-Gym-Hub environments, and models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.25624v2</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Bowen Wang, Dunjie Lu, Junli Wang, Tianyi Bai, Shixuan Liu, Zhipeng Zhang, Haiquan Wang, Hao Hu, Tianbao Xie, Shuai Bai, Dayiheng Liu, Que Shen, Junyang Lin, Tao Yu</dc:creator>
    </item>
    <item>
      <title>Neural Scalable Symbolic Search Framework for Complex Logical Queries with Multiple Free Variables</title>
      <link>https://arxiv.org/abs/2605.25985</link>
      <description>arXiv:2605.25985v2 Announce Type: replace 
Abstract: Complex Query Answering (CQA) is a fundamental knowledge representation and reasoning task over incomplete knowledge graphs (KGs). Answering existential first-order queries with $k$ free variables (i.e., $\text{EFO}_k$ queries) is a crucial yet challenging problem, as it requires ranking answer tuples in $\mathcal{E}^k$, where $\mathcal{E}$ denotes the entity set of a KG. This quickly becomes intractable as $k$ grows. Consequently, existing benchmarks and methods rely on marginal rankings over individual variables; however, marginal rankings are a poor proxy for the true joint ranking of tuples. Building on neural symbolic search for $\text{EFO}_1$ queries, we propose Neural Scalable Symbolic Search (NS3), a budgeted framework that approximates joint ranking without enumerating $\mathcal{E}^k$. NS3 (i) answers marginalized sub-queries to obtain necessary candidate sets, (ii) merges multiple free variables into hypernodes whose domains are pruned and controlled by a dynamic budget $B$, and (iii) progressively reduces an $\text{EFO}_k$ query to an $\text{EFO}_{k-1}$ query over a budgeted reduced domain. Across three standard KG datasets, NS3 substantially improves joint ranking performance while retaining strong marginal accuracy. We further release a joint-ranking benchmark that extends existing $\text{EFO}_1$ datasets to $k=3$, enabling systematic evaluation of multi-variable queries. Our code is provided in https://github.com/HKUST-KnowComp/NS3_KDD2026.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.25985v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Weizhi Fei, Hang Yin, Zihao Wang, Shukai Zhao, Wei Zhang, Yangqiu Song</dc:creator>
    </item>
    <item>
      <title>Global Convergence of Wasserstein Policy Gradient for Entropy-Regularized Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2605.26078</link>
      <description>arXiv:2605.26078v3 Announce Type: replace 
Abstract: Wasserstein policy gradient (WPG) is a policy optimization method for reinforcement learning (RL) that exploits the optimal-transport geometry of action distributions. For the entropy-regularized RL objective, WPG evolves each state-conditional policy by transporting it along the action gradient of the soft Q-function together with a Langevin-type diffusion. Despite its appeal for continuous-control problems, its global convergence properties remain poorly understood. Standard Langevin analyses do not directly apply, because the RL objective depends on the policy through the Bellman recursion rather than through a static convex functional, and the Langevin drift is determined by the soft Q-function, whose regularity must be controlled along the policy iterates.
  In this paper, we develop a global convergence theory for WPG by exploiting the Bellman structure of entropy-regularized RL. We show that the role usually played by convexity can be replaced by a Bellman-based argument: the soft Bellman residual admits a statewise KL representation with respect to a Gibbs policy; Bellman contraction relates this residual to the global optimality gap; and a Bellman resolvent identity connects value improvement to relative Fisher information. Combined with a uniform log-Sobolev inequality (LSI) for the evolving Gibbs family, these ingredients yield a distributional Polyak--\L{}ojasiewicz condition. We further establish the regularity and uniform bounds needed to control the discretization error, thereby obtaining geometric contraction up to a discretization bias. Conceptually, our analysis shows that although entropy-regularized RL is not convex in the usual flat sense, the Bellman recursion induces a favorable Polyak--\L{}ojasiewicz-type (PL) geometry that supports global convergence of WPG.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.26078v3</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhaoyu Zhu, Rui Gao, Shuang Li</dc:creator>
    </item>
    <item>
      <title>Reinforcing Few-step Generators via Reward-Tilted Distribution Matching</title>
      <link>https://arxiv.org/abs/2605.26108</link>
      <description>arXiv:2605.26108v3 Announce Type: replace 
Abstract: Recent advances in few-step diffusion distillation have enabled efficient image generation, yet aligning these models with human preferences remains challenging. We propose Reward-Tilted Distribution Matching Distillation (RTDMD), a two-stage framework that unifies distribution matching distillation with reward-guided reinforcement learning for few-step flow generators. We show that minimizing the KL divergence to a reward-tilted teacher distribution naturally decomposes into a distribution matching term and a reward maximization term. In the first stage, we introduce Ambient-Consistent Distribution Matching Distillation (AC-DMD), which performs subinterval-wise distribution matching and augments the fake score objective with a consistency regularizer to help the fake score model track the shifting generator distribution under limited updates. In the second stage, we jointly optimize both terms: for the reward maximization term, we derive a hybrid policy gradient that combines a GRPO-style estimator for the stochastic intermediate transitions with direct reward backpropagation through the deterministic final step, and further introduce step-subset GRPO (SubGRPO) to reduce variance. Experiments on SD3, SD3.5, and FLUX.2 demonstrate that RTDMD establishes new state-of-the-art results across preference, aesthetic, and compositional metrics with only 4 inference steps, outperforming previous few-step text-to-image generation methods. Code and models are available at https://github.com/Harahan/RTDMD.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.26108v3</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yushi Huang, Xiangxin Zhou, Ruoyu Wang, Chi Zhang, Jun Zhang, Tianyu Pang</dc:creator>
    </item>
    <item>
      <title>Totoro$^+$: An Adaptive and Scalable Edge Federated Learning System</title>
      <link>https://arxiv.org/abs/2605.26323</link>
      <description>arXiv:2605.26323v3 Announce Type: replace 
Abstract: Federated Learning (FL) is an emerging distributed machine learning (ML) technique that enables in-situ model training and inference on decentralized edge devices. We propose Totoro$^+$, a novel scalable FL system that enables massive FL applications to run simultaneously on edge networks. The key insight is to explore a distributed hash table (DHT)-based peer-to-peer (P2P) model to re-architect the centralized FL system design into a fully decentralized one. In contrast to previous studies where many FL applications shared one centralized parameter server, Totoro$^+$ assigns a dedicated parameter server to each application. Any edge node can act as any application's coordinator, aggregator, client selector, worker (participant device), or any combination of the above, thereby radically improving scalability and adaptivity. Totoro$^+$ introduces three innovations to realize its design: a locality-aware P2P multi-ring structure, a publish/subscribe-based forest abstraction, and a game-theoretic path planning model with a guarantee of an $\epsilon$-approximate Nash equilibrium. Real-world experiments on 500 Amazon EC2 servers show that Totoro$^+$ scales gracefully with the number of FL applications and $N$ edge nodes, speeds up the total training time by $1.2\times$--$14.0\times$, achieves $\mathcal{O}(\log N)$ hops for model dissemination and gradient aggregation with millions of nodes, and efficiently adapts to practical edge networks and churn.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.26323v3</guid>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1109/TPDS.2026.3696917</arxiv:DOI>
      <arxiv:journal_reference>IEEE Transactions on Parallel and Distributed Systems, vol. 37, no. 7, pp. 1740-1757, July 2026</arxiv:journal_reference>
      <dc:creator>Cheng-Wei Ching, Xin Chen, Taehwan Kim, Jian-Jhih Kuo, Dilma Da Silva, Liting Hu</dc:creator>
    </item>
    <item>
      <title>Robust Koopman Control Barrier Filters for Safe Actor-Critic Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2605.26452</link>
      <description>arXiv:2605.26452v2 Announce Type: replace 
Abstract: Safe reinforcement learning (RL) for robotic systems requires policies that improve task performance while satisfying state and input constraints during both training and deployment. Control barrier functions (CBFs) provide a principled mechanism for enforcing forward invariance through minimally invasive safety filters, but their use in model-free RL is limited by the need for accurate dynamics and hand-designed barrier certificates. We propose Robust Koopman-CBF SAC, a safety-filtered actor--critic framework that learns a finite-dimensional Koopman predictor from data, constructs affine CBF constraints in the lifted space, and enforces them through a quadratic-program safety layer. To account for finite-dimensional Koopman approximation error, the CBF condition is tightened using a projected residual margin estimated from held-out rollout data. The critic is trained on the executed safe action, while the actor is regularized toward the Koopman-CBF feasible set, reducing dependence on the filter over training. Across safe-control benchmarks, the method achieves zero constraint violations on CartPole stabilization and tracking while matching or exceeding unconstrained SAC returns. On high-dimensional Safety Gymnasium locomotion tasks, the method reduces violations in some settings but also exposes important limitations of first-order velocity barriers and linear EDMD models, motivating high-order and multi-step Koopman-CBF extensions. These results suggest that robust Koopman-CBF filters are a promising bridge between model-free RL and certifiable safety, while clarifying the structural conditions under which such filters remain effective.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.26452v2</guid>
      <category>cs.RO</category>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dhruv S. Kushwaha, Zoleikha A. Biron</dc:creator>
    </item>
    <item>
      <title>The Strongest Teacher Is Not Always the Best Teacher: Student-Centric Answer Selection</title>
      <link>https://arxiv.org/abs/2605.26872</link>
      <description>arXiv:2605.26872v2 Announce Type: replace 
Abstract: LLM training increasingly relies on teacher-generated supervision, from synthetic responses to reasoning traces and tool-use demonstrations. Current practice often chooses the highest-performing teacher to generate student training data, implicitly treating teacher test performance as a proxy for teaching quality. We show that this assumption can fail: even when multiple teachers provide correct answers to the same question, the answer from the strongest teacher is not necessarily the best supervision for a given student. To address this gap, we propose Student-Centric Answer Sampling (SCAS), a framework that selects from verified teacher-generated answers according to their estimated student-centric learning cost. Motivated by a token-wise gradient decomposition, we derive an efficient forward-only proxy for this cost and use it to guide answer selection during training. Experiments across 30 teacher models, 6 student base models, and 6 tasks show that SCAS consistently improves student performance, suggesting that effective distillation should prioritize supervision matched to the current student rather than teacher strength alone.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.26872v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhengyu Hu, Zheyuan Xiao, Linxin Song, Fengqing Jiang, Yuetai Li, Zhengyu Chen, Zhihan Xiong, Yue Liu, Junhao Lin, Yao Su, Lijie Hu, Kaize Ding, Teng Xiao, Radha Poovendran</dc:creator>
    </item>
    <item>
      <title>A Unified Structured Query Understanding Framework for Industrial Semantic Search</title>
      <link>https://arxiv.org/abs/2605.27441</link>
      <description>arXiv:2605.27441v2 Announce Type: replace 
Abstract: Query understanding in large-scale industrial search systems is typically implemented as a cascade of disparate, task-specific components. While individually optimizable, this fragmented architecture incurs high maintenance overhead and results in inconsistent behaviors, particularly for long-tail queries. In this work, we propose and deploy a unified structured query understanding system that consolidates these heterogeneous functions into a single Small Language Model (SLM) that performs schema-constrained generation. To address the data bottlenecks inherent in unified modeling, we introduce Query Illuminator, a dual-purpose framework serving as: (i) a teacher model for high-quality auto-annotation and distillation, and (ii) a surrogate judge for scalable evaluation where human labels are scarce. We validate this approach through extensive offline and online tests within LinkedIn's Job Search system. Furthermore, we demonstrate the framework's horizontal extensibility through a cross-domain case study on People Search. The results show improved user engagement and reduced operational costs, achieved while satisfying strict low-latency serving constraints on limited GPU resources.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.27441v2</guid>
      <category>cs.IR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3770855.3818312</arxiv:DOI>
      <dc:creator>Ping Liu, Qianqi Shen, Jianqiang Shen, Chunnan Yao, Kevin Kao, Rajat Arora, Dan Xu, Baofen Zheng, Yunxiang Ren, Benjamin Le, Ali Hooshmand, Igor Lapchuk, Juan Bottaro, Raghavan Muthuregunathan, Caleb Johnson, Liangjie Hong, Jingwei Wu, Wenjing Zhang</dc:creator>
    </item>
    <item>
      <title>Locality-Aware Redundancy Pruning for LLM Depth Compression</title>
      <link>https://arxiv.org/abs/2605.27786</link>
      <description>arXiv:2605.27786v2 Announce Type: replace 
Abstract: Large language models are known to contain representational redundancy across network depth, making depth pruning an effective approach for improving inference efficiency. Existing one-shot pruning methods rely on local layer importance or fixed redundancy assumptions across architectures. We propose Locality-Aware Redundancy Pruning (LoRP), a training-free one-shot depth pruning framework guided by representation locality. We show that inter-layer redundancy can be either localized or globally distributed depending on the LLM architecture. To characterize this phenomenon, we introduce Representation Locality Score (RLS), derived from global inter-layer hidden-state similarity. Using a small calibration set, LoRP computes pairwise layer similarity, clusters layers by representational similarity, and allocates pruning according to residual intra-cluster redundancy. Experiments across diverse LLM families show improvements in both perplexity and downstream task accuracy. Official GitHub repository: https://github.com/daniel-eai/LoRP-Locality-Aware-Redundancy-Pruning/</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.27786v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vincent-Daniel Yun, Youngrae Kim, Woosang Lim, YoungJin Heo, Minkyu Kim, Sunwoo Lee</dc:creator>
    </item>
    <item>
      <title>ClothTransformer: Unified Latent-Space Transformers for Scalable Cloth Simulation</title>
      <link>https://arxiv.org/abs/2605.27852</link>
      <description>arXiv:2605.27852v3 Announce Type: replace 
Abstract: Unified and scalable Transformers have recently achieved remarkable success in modeling diverse phenomena traditionally associated with computer graphics, such as 3D visual effects, rendering processes, and motion in videos. In this work, we take a step further by investigating whether modern Transformer techniques can tackle the challenging task of cloth simulation. To this end, we present ClothTransformer, a framework that reformulates cloth simulation as autoregressive sequence modeling in a learned latent space. Existing neural cloth simulators are largely specialized to single scenarios, intrinsically coupled to the mesh discretization, and lack robust collision handling. Our approach addresses these limitations through three contributions: (1) a unified Transformer architecture that handles diverse scenarios -- body-driven garments, robotic manipulation, and free-fall collisions -- under a single model and achieves approximately $4$--$9{\times}$ lower error than prior state-of-the-art methods across all scenarios; (2) a scalable latent-space formulation that compresses arbitrary-resolution meshes into a fixed-size set of latent tokens, making temporal dynamics computation independent of mesh resolution; and (3) a diverse-scenario high-fidelity penetration-free dataset of ${\sim}$493.4k frames spanning all three settings, which enables a differentiable Continuous Collision Detection (CCD) module to suppress penetration artifacts. Project Page: https://yucrazing.github.io/clothtransformer/</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.27852v3</guid>
      <category>cs.GR</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yu Zhang, Yidi Shao, Wenqi Ouyang, Yushi Lan, Zhexin Liang, Chengrui Wu, Xudong Xu, Xingang Pan</dc:creator>
    </item>
    <item>
      <title>Pruning and Distilling Mixture-of-Experts into Dense Language Models</title>
      <link>https://arxiv.org/abs/2605.28207</link>
      <description>arXiv:2605.28207v2 Announce Type: replace 
Abstract: Mixture-of-Experts (MoE) is now the dominant architecture for frontier language models, yet it requires all expert parameters to be loaded in memory, making it less preferable for memory-constrained deployment. Existing compression methods reduce the number of experts but the output remains an MoE model with the same fundamental limitation. We present the first systematic framework for converting a trained MoE into a standard fully dense architecture: experts are scored, selected, and grouped, then concatenated into a dense FFN and refined by knowledge distillation from the MoE teacher. We evaluate 7 scoring, 5 grouping, and 2 magnitude scaling methods across a range of selected expert counts on Qwen3-30B-A3B, yielding 350 configurations. We find that the choice of scoring method is the most impactful, with our novel diversity-aware scoring consistently outperforming prior methods on Qwen3-30B-A3B, DeepSeek-V2-Lite, and GPT-OSS-20B. Under a controlled comparison at matched parameter count, MoE-to-dense outperforms dense-to-dense pruning by +6.3 pp in average downstream accuracy after ~4B-token distillation at 1.6x faster training wall-clock speed.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.28207v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Junhyuck Kim, Jihun Yun, Haechan Kim, Gyeongman Kim, Joonghyun Bae, Jaewoong Cho</dc:creator>
    </item>
    <item>
      <title>Efficient and Scalable Provenance Tracking for LLM-Generated Code Snippets</title>
      <link>https://arxiv.org/abs/2605.28510</link>
      <description>arXiv:2605.28510v2 Announce Type: replace 
Abstract: Large language models (LLMs) for code completion and generation are increasingly used in software development, yet they may reproduce training examples verbatim and without authorship attribution, raising legal and ethical concerns around plagiarism and license compliance. Classical fingerprint-based plagiarism detectors, such as Winnowing, remain highly effective, yet the inspection requires comparing fragments of code to the entire training set, and their linear-time search makes them impractical for the billion-scale corpora used to train modern code LLMs. To bridge this gap, we introduce SOURCETRACKER, a 300M-parameter encoder tailored for code retrieval, together with a hybrid two-stage provenance-tracking pipeline HYBRIDSOURCETRACKER (HST). HST first narrows down a small set of candidate snippets via vector search, then re-ranks those candidates using Winnowing on exact fingerprints. We train and evaluate our system on a 10M-snippet subset of the THESTACKV2 dataset, with both verbatim and adapted snippets that emulate realistic identifier renaming. On an in vitro 100k-snippet search space with adapted queries, our hybrid approach reaches a mean reciprocal rank on par with Winnowing for 30-token fragments. Then, starting from windows &gt;= 60 tokens, it consistently outperforms Winnowing by up to 5.4% while preserving logarithmic-time query complexity. In a complementary evaluation using an LLM-based judge, we find that many retrieved snippets not labeled as ground truth are still highly similar to the expected sources, particularly with longer context windows, and thus remain useful for end users. Overall, our results demonstrate that integrating vector search with fingerprinting enables scalable, high-precision provenance tracking for code produced by LLMs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.28510v2</guid>
      <category>cs.SE</category>
      <category>cs.AI</category>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Andrea Gurioli, Davide D'Ascenzo, Federico Pennino, Maurizio Gabbrielli, Stefano Zacchiroli</dc:creator>
    </item>
    <item>
      <title>Unified sparse framework for large-scale material point method simulations</title>
      <link>https://arxiv.org/abs/2605.28525</link>
      <description>arXiv:2605.28525v2 Announce Type: replace 
Abstract: The material point method (MPM) is a hybrid particle-grid method widely used for simulating large deformation with history-dependent behavior. Standard MPM often relies on a dense background grid, which can be highly inefficient when material occupies a small fraction of the computational domain. Such sparsity is common in many large-scale problems, from geophysical mass flows over large terrain domains to visual-computing applications. Here, we introduce a unified sparse background-grid framework for large-scale MPM simulation. The framework treats sparse grid construction as a general active-node indexing problem. We develop two architecture-specific implementations to realize the same sparse framework: a scan-based strategy for CPUs and a hash-based strategy for GPUs. Through benchmark problems and a large-scale landslide simulation, we show that the framework provides the same results as standard dense MPM while reducing computational time and memory usage by one to two orders of magnitude in strongly sparse cases.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.28525v2</guid>
      <category>cs.CE</category>
      <category>physics.comp-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yidong Zhao, Lars Blatny, Xiang Feng, Mikkel M. Juel, Chenfanfu Jiang, Johan Gaume</dc:creator>
    </item>
    <item>
      <title>S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering</title>
      <link>https://arxiv.org/abs/2605.28831</link>
      <description>arXiv:2605.28831v2 Announce Type: replace 
Abstract: Long-horizon memory question answering often requires sparse evidence from heterogeneous histories, including events, object states, visual observations, temporal relations, and causal steps. Existing memory interfaces expand reader context, retrieve semantically related chunks, or expose graph neighborhoods, but they are not explicitly designed to select compact evidence for a fixed reader. We propose Structured Spatiotemporal Scene--Event Memory (S3Mem), a query-time memory interface that writes textual, visual, and agent-use histories into structured scene--event units and routes compact evidence packs to the reader. Its router scores candidate units, query anchors, and anchor--support links, enabling both single-hop selection and short multi-hop evidence chains without reader fine-tuning or test-time training. Across LoCoMo, EMemBench Visual Games, and AMA-Bench, S3Mem provides a strong score--token trade-off, with the clearest gains on localized event, state, temporal, causal, or provenance evidence. On LoCoMo, S3Mem reaches \(0.48\) F1 and \(0.40\) BLEU with \(1{,}073\) evidence tokens per question, about \(15.8\times\) fewer than the LoCoMo reference. On EMemBench Visual Games, it obtains the best F1 and second-best accuracy with only \(189\) tokens. On AMA-Bench, it is not the highest-scoring method, but remains competitive while using the fewest reader-visible evidence tokens.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.28831v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Encheng Su, Jianyu Wu, Jinouwen Zhang, Qiucheng Yu, Chen Tang, Pengze Li, Lintao Wang, Aoran Wang, Xinzhu Ma, Shixiang Tang, Yizhou Wang, Houqiang Li</dc:creator>
    </item>
    <item>
      <title>Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?</title>
      <link>https://arxiv.org/abs/2605.28860</link>
      <description>arXiv:2605.28860v2 Announce Type: replace 
Abstract: Fine-tuning large language models (LLMs) frequently induces catastrophic forgetting of prior capabilities. Recent work has shown that reinforcement learning (RL) retains prior capabilities more effectively than supervised fine-tuning (SFT), attributing this to policy-gradient updates remaining closer to the base policy \cite{shenfeld2025rl}. We extend this behavioral account to the mechanistic level and ask whether RL's advantage is mirrored by stronger preservation of internal computational circuits. We introduce differential circuit vulnerability, a head-level measure of how much a circuit degrades under fine-tuning, and use it to compare RL and SFT on Qwen2.5-3B-Instruct adapted to scientific question-answering. We find a clear mechanistic trade-off: SFT adapts more rapidly to the target task but produces substantially greater circuit disruption and forgetting of prior capabilities, whereas RL preserves a larger fraction of the base circuit at the cost of slower task adaptation. These findings suggest that circuit preservation may help explain why RL is more robust to catastrophic forgetting. We released our code here: https://github.com/rl-sft-circuit-research/differential-circuit-vulnerability.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.28860v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jeanmely Rojas Nunez, Viraj Sawant, Nathan Allen, Nomgondalai Amgalanbaatar, Yannis Zongo, Vasu Sharma, Maheep Chaudhary</dc:creator>
    </item>
    <item>
      <title>Cycle-Space Informed Detection of Autoencoded Blind False Data Injection Attacks on Power Systems</title>
      <link>https://arxiv.org/abs/2605.28912</link>
      <description>arXiv:2605.28912v2 Announce Type: replace 
Abstract: The rapid growth of AI-driven data centers and large-scale energy storage systems is increasing the reliance of power system operation on real-time measurement data and automated decision-making. However, many existing detection methods rely on statistical or data-driven analysis of measurements and can fail when attackers exploit the same data structure to craft stealthy perturbations. To illustrate this limitation, we demonstrate a blind False Data Injection Attack (FDIA) in which an Autoencoder learns the measurement manifold and generates perturbations aligned with the Jacobian null space, thereby allowing the attack to evade both residual-based bad-data detectors and time-series anomaly detectors. To mitigate data-driven FDIAs which exploit the null space, we propose a topology-informed Cycle-Space Detector (CSD) that leverages the Cycle-Space of the network to impose structural constraints that enhance null space estimation. In addition, we prove that by using the Minimum Cycle Basis (MCB), the proposed CSD achieves the optimal generalization error for attack detection. By exploiting topology-derived cycle constraints rather than relying solely on numerical null space estimation, the proposed method does not require precise line parameters and improves the separation between normal and attacked measurements. Simulation results on IEEE 14-, 30-, 57-, and 118-bus systems demonstrate that the proposed method effectively detects data-driven FDIAs under realistic measurement noise.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.28912v2</guid>
      <category>cs.LG</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xin Li, Chenhan Xiao, Jonathan Cohen, Aviad Elyashar, Yang Weng, Rami Puzis</dc:creator>
    </item>
    <item>
      <title>Multifidelity Proper Orthogonal Decomposition</title>
      <link>https://arxiv.org/abs/2605.29213</link>
      <description>arXiv:2605.29213v2 Announce Type: replace 
Abstract: This paper introduces a multifidelity formulation that reduces the computational cost of the proper orthogonal decomposition (POD) of a high-fidelity model by leveraging data from cheaper, lower-fidelity models. POD is a prevalent technique for extracting a low-dimensional basis from training data to achieve subsequent dimension reduction or reduced-order modeling. In scientific and engineering applications, the training data are typically numerical snapshot solutions of a high-fidelity model, and computation of a sufficiently rich snapshot set can be prohibitively expensive, especially when sampling over a high-dimensional parameter space. Insufficient snapshot training data risks overfitting and poor generalization of the POD basis outside the training regime. Our multifidelity POD (MFPOD) formulation reallocates computational budget to cheaper, low-fidelity models that can be sampled more extensively. MFPOD then weights high- and low-fidelity snapshot data via a control-variate formulation to guarantee an unbiased estimate of the expected high-fidelity least-squares projection error. The MFPOD subspace is chosen to minimize the estimate of this projection error, and converges in probability to the same subspace as single-fidelity POD in the limit of an arbitrarily large budget. For restrictive computational budgets, the MFPOD cost function has (under some assumptions) lower variance than the POD cost function, which makes the MFPOD subspace more robust against variations in the training data and thus less prone to overfitting. For a numerical example modeling the velocity of the Pine Island glacier, MFPOD achieves the same accuracy as single-fidelity POD with an order of magnitude reduction in the offline computational cost of snapshot generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.29213v2</guid>
      <category>math.NA</category>
      <category>cs.CE</category>
      <category>cs.NA</category>
      <category>math-ph</category>
      <category>math.MP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Nicole Aretz, Karen Willcox</dc:creator>
    </item>
    <item>
      <title>MOOSE-Copilot: A Web-Based Interactive Assistant for Unified Exploratory and Fine-Grained Scientific Hypothesis Discovery</title>
      <link>https://arxiv.org/abs/2605.29475</link>
      <description>arXiv:2605.29475v2 Announce Type: replace 
Abstract: Large language models (LLMs) show remarkable potential in scientific hypothesis discovery. However, existing approaches face two critical limitations: they treat divergent exploratory search and convergent fine-grained refinement as isolated tasks, and they operate autonomously with little to no human guidance. We present MOOSE-Copilot, the first unified framework to bridge this abstraction gap through a formalized human-AI interaction (HAII) protocol. Our system empowers scientists to steer the generative process via three explicit signals: initial blueprints, inter-stage routing, and intra-stage feedback. Using an oracle-simulated evaluation in which an LLM provides idealized expert signals, we show that injecting these structured signals significantly outperforms purely autonomous baselines, characterizing the gains achievable under high-quality guidance. Furthermore, we build a web-based interface that turns the framework into a no-code workflow: researchers pose a question, watch the hypothesis search unfold as an interactive tree, and steer it by selecting hypotheses, routing between stages, and injecting feedback, with no command-line agents required. This makes end-to-end hypothesis discovery directly accessible to interdisciplinary researchers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.29475v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.CE</category>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hongran An, Zonglin Yang</dc:creator>
    </item>
    <item>
      <title>LoRA-Key: User-Centric LoRA Watermarking for Text-to-Image Diffusion Models</title>
      <link>https://arxiv.org/abs/2605.29569</link>
      <description>arXiv:2605.29569v2 Announce Type: replace 
Abstract: Low-Rank Adaptation (LoRA) has become a widely used mechanism for customizing text-to-image diffusion models, enabling lightweight modules that are shared, reused, and commercialized as independent assets. This LoRA-centric ecosystem shifts copyright protection from foundation models to distributed LoRA modules, which are easy to copy, redistribute, or reuse without authorization. Existing watermarking methods either protect the base diffusion model or require watermark-aware retraining for each target LoRA, limiting their practicality in open community settings. To address this limitation, we propose LoRA-Key, a user-centric LoRA watermarking framework that treats copyright protection as a reusable ownership key. LoRA-Key encapsulates a recoverable secret message into a standalone user-specific Watermark LoRA, which can be attached to different target LoRAs through training-free linear superposition without per-LoRA retraining or structural modification. To train such a reusable key, we first establish a latent watermark prior in the frozen VAE latent space for robust message embedding and recovery, and then optimize the Watermark LoRA with message-conditioned watermark supervision and semantic consistency constraints. We further introduce Gradient Orthogonal Projection (GOP) to suppress watermark updates that conflict with semantic-preserving directions, reducing interference with generation fidelity and downstream style adaptation. Extensive experiments show that LoRA-Key provides lightweight plug-and-play copyright protection while preserving generation quality and style fidelity, and maintains robust ownership verification under image-level distortions, downstream fine-tuning, and multi-LoRA composition.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.29569v2</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yaopeng Wang, Qingliang Wang, Zhibo Wang, Huiyu Xu, Jiacheng Du, Qiu Wang, Jia-Li Yin, Kui Ren</dc:creator>
    </item>
    <item>
      <title>Quantifying and Optimizing Simplicity via Polynomial Representations</title>
      <link>https://arxiv.org/abs/2605.29823</link>
      <description>arXiv:2605.29823v2 Announce Type: replace 
Abstract: Deep networks often exhibit a preference for "simple" solutions, and such a simplicity bias is widely believed to play a key role in generalization. Yet a broadly applicable, quantitative measure of simplicity remains elusive. We introduce polynomial representations as a distribution-aware, low-dimensional surrogate for neural functions: we approximate a network's predictive behavior along data-dependent interpolation paths using orthogonal polynomial bases, yielding a compact functional representation. We show that the effective degree of this representation serves as a practical simplicity metric that is predictive of generalization across tasks and architectures, and consistently outperforms existing generalization proxies such as sharpness. Finally, polynomial representations naturally yield a differentiable simplicity regularizer, which consistently improves generalization in image and text classification, fine-tuning contrastive vision-language models, and reinforcement learning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.29823v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tianren Zhang, Xiangxin Li, Minghao Xiao, Guanyu Chen, Feng Chen</dc:creator>
    </item>
    <item>
      <title>Midpoint Generative Models</title>
      <link>https://arxiv.org/abs/2605.29920</link>
      <description>arXiv:2605.29920v2 Announce Type: replace 
Abstract: We introduce Midpoint Generative Models (MGM), a principled framework for training one-step generative models. MGM is based on a simple symmetry of Flow Matching with linear interpolation: when the two endpoint distributions coincide, the corresponding drift field vanishes at the midpoint time, $t=1/2$. We show that the norm of this field defines a valid discrepancy between distributions, which we call the Midpoint Divergence. We extend this discrepancy beyond the midpoint by introducing randomly flipped interpolations and further generalize it by replacing deterministic linear Flow Matching interpolations with symmetric stochastic interpolants, yielding a generalized Midpoint Divergence. Finally, we derive a variational formulation of our generalized divergence, yielding a tractable objective for training a one-step generator. The resulting MGM algorithm offers an effective and theoretically grounded approach to generative modeling, achieving competitive performance against existing one-step generative modeling methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.29920v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Daniil Shlenskii, Nikita Gushchin, Lev Novitskiy, Dmitry V. Dylov, Alexander Korotin</dc:creator>
    </item>
    <item>
      <title>Privacy-Enhanced Zero-Order Federated Learning via xMK-CKKS over Wireless Channels</title>
      <link>https://arxiv.org/abs/2605.30123</link>
      <description>arXiv:2605.30123v2 Announce Type: replace 
Abstract: Homomorphic encryption (HE) enables privacy-preserving aggregation in federated learning (FL) by allowing the server to operate on encrypted data without decryption. Existing HE-over-the-air (OTA) methods mainly rely on single-key HE schemes and require channel estimation or pre-equalization to compensate for wireless fading. However, single-key HE remains vulnerable to honest-but-curious (HBC) clients holding the shared secret key, while multi-key HE provides stronger client-level security by assigning each device its own secret key. We propose a four-phase protocol that enables the aggregation of xMK-CKKS over a shared wireless channel without channel estimation. The protocol retransmits partial public keys and ciphertexts through the same channel realization, so that the dominant large-modulus encryption terms cancel algebraically during decryption. We integrate this protocol with zero-order FL over slowly varying LoS-dominant channels, where each device transmits a single encrypted scalar per round and the communication/encryption overhead is independent of the model dimension. We show that the residual noise induced by encryption and wireless aggregation preserves the standard convergence rate \(O(1/\sqrt{K})\) up to a negligible noise floor, where \(K\) is the number of communication rounds. The protocol assumes a non-trusted server and is secure against HBC clients, preventing any client from recovering the local updates of other participants. Numerical results on MNIST validate the theoretical analysis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.30123v2</guid>
      <category>cs.CR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anthony Ayli, Khalil Harris, Jihad Fahs, Mohamad Assaad</dc:creator>
    </item>
    <item>
      <title>Can AI Weather Models Predict Beyond Two Weeks? A Quantitative Benchmark and Analysis of Long Rollouts</title>
      <link>https://arxiv.org/abs/2605.30184</link>
      <description>arXiv:2605.30184v2 Announce Type: replace 
Abstract: While AI weather models excel at short-to-medium range forecasts (up to 15 days), they frequently suffer from ill-defined "instabilities" when rolled out over longer horizons. This work addresses the lack of a formal taxonomy by categorizing these failures into three distinct regimes: blow-up, drift, and loss of seasonality, through year-long rollouts of nine state-of-the-art AI weather models. Our analysis reveals that stability hinges on the treatment of small spatio-temporal scales: unstable models amplify high-frequency energy, while stable models act as denoisers when noise is added to their inputs. Far from reducing these models to mere stochastic parrots, our findings highlight that stable models generate unique weather trajectories, conditioned on the initial state. We verify our findings through ablation studies on architectural design choices, conducted using state-of-the-art Vision Transformer (ViT) AI weather model architectures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.30184v2</guid>
      <category>cs.LG</category>
      <category>physics.ao-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Fanny Lehmann, Firat Ozdemir, Yun Cheng, Torsten Hoefler, Sebastian Schemm, Benedikt Soja, Siddhartha Mishra</dc:creator>
    </item>
    <item>
      <title>BORA: Bridging Offline Reinforcement Learning and Online Residual Adaptation for Real-World Dexterous VLA Models</title>
      <link>https://arxiv.org/abs/2605.30226</link>
      <description>arXiv:2605.30226v2 Announce Type: replace 
Abstract: Vision-Language-Action (VLA) models have emerged as a promising paradigm for grounding visual-language understanding into real-world robotic manipulation. However, dexterous manipulation remains challenging for VLA policies due to high-dimensional hand control and compounding execution errors, which makes real-world RL post-training essential for bridging the gap between visually grounded action generation and physically reliable dexterous execution. Yet high-dimensional dexterous exploration often triggers temporal inconsistency, sample inefficiency, and hardware risks in the real world. To address these challenges, we propose BORA, an offline-to-online RL post-training framework designed for real-world dexterous VLA models. In the offline phase, BORA constructs a critic that takes both the VLM's cognition tokens and action chunks as inputs. This design enables action-conditioned value guidance, allowing the critic to evaluate dexterous hand motions beyond visual context alone. During the subsequent online phase, BORA freezes the VLA base and introduces a lightweight, Human-in-the-Loop (HiL) chunk-wise residual adaptation mechanism to mitigate real-world execution errors and further correct the offline-learned intents within the actual physical environment. By inheriting the offline critic and employing intervention-driven rewards, BORA effectively corrects execution discrepancies and adapts to real-world physical variances while preserving the pretrained policy as a stable prior. Extensive evaluations across five complex real-world dexterous tasks demonstrate that BORA significantly outperforms pure imitation learning and traditional decoupled RL baselines, achieving a 33% absolute increase in average success rate under standard settings and up to a 43% improvement in unseen object generalization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.30226v2</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhongxi Chen, Yifan Han, Yanming Shao, Huanming Liu, Congsheng Xu, Xiaoyu Chen, Yao Mu, Wenzhao Lian</dc:creator>
    </item>
    <item>
      <title>Exploring Autonomous Agentic Data Engineering for Model Specialization</title>
      <link>https://arxiv.org/abs/2605.30407</link>
      <description>arXiv:2605.30407v2 Announce Type: replace 
Abstract: Large Language Models (LLMs) have demonstrated strong performance on general tasks, while often struggling to adapt to specialized domains without high-quality domain-specific data. Existing LLM-based data curation methods primarily rely on human-designed workflows, leaving it unexamined whether LLMs can autonomously execute an end-to-end data engineering pipeline for model specialization. We formalize Autonomous Agentic Data Engineering, a novel task designed to evaluate LLMs as autonomous data engineers that drive model specialization through end-to-end data curation. We frame data as an optimizable component and study agents that plan, generate, and iteratively optimize training data across multiple domains, guided by post-training performance improvement. Experiments show that autonomous LLM data engineers yield substantial gains, as GPT-5.2 constructs a training curriculum that improves a student model by 57.29%, entirely through iterative, agent-driven data adaptation. By illuminating both potential and bottlenecks, our study establishes autonomous data engineering as a measurable capability and charts a path toward agent-driven model specialization (Code will be released at https://github.com/zjunlp/DataAgent).</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.30407v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.IR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Yujie Luo, Xiangyuan Ru, Jingsheng Zheng, Jingjing Wang, Yuqi Zhu, Jintian Zhang, Runnan Fang, Kewei Xu, Ye Liu, Zheng Wei, Jiang Bian, Zang Li, Shumin Deng</dc:creator>
    </item>
    <item>
      <title>Semantic Motion Anchors: Bridging Motion and Meaning in Co-Speech Gestures</title>
      <link>https://arxiv.org/abs/2605.30608</link>
      <description>arXiv:2605.30608v3 Announce Type: replace 
Abstract: Learning a shared representation between spoken text and gesture is central to co-speech gesture retrieval, synthesis, and understanding, but remains challenging for semantically meaningful gestures whose communicative intent is not captured by motion alone. Direct contrastive alignment between transcripts and continuous motion embeddings often overemphasizes low-level kinematics and misses the symbolic content of semantic gestures. We propose semantic motion anchors, natural-language abstractions of gesture motion capturing physical form and communicative intent. Our method discretizes 3D gestures into body-hand motion primitives, verbalizes them into structured descriptions, and grounds them in the transcript to provide auxiliary contrastive supervision. On BEAT2, our method improves text-to-gesture R@1 by 8.2% over a direct text-motion baseline and outperforms prior retrieval approaches in the text-to-gesture and gesture-to-text retrieval directions. Beyond aggregate retrieval metrics, semantic motion anchor supervision helps retrieve gestures that are semantically meaningful for the spoken query, rather than defaulting to generic motion patterns. A downstream retrieval-augmented gesture generation study showed that users significantly preferred gestures retrieved by our approach over a retrieval-augmented generation baseline, demonstrating that semantically grounded retrieval translates to gestures that better convey communicative intent in downstream generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.30608v3</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Varsha Suresh, Mohammad Mahdi Abootorabi, Mohamed Salman, M. Hamza Mughal, Christian Theobalt, Ashwin Ram, Jürgen Steimle, Vera Demberg</dc:creator>
    </item>
    <item>
      <title>Cross-Layer Subspace Coupling for LLM Compression: A Unifying Framework and Its Empirical Limits</title>
      <link>https://arxiv.org/abs/2605.30836</link>
      <description>arXiv:2605.30836v2 Announce Type: replace 
Abstract: Recent SVD-based compression methods for large language models, such as SVD-LLM and Basis Sharing, can be unified under one optimization problem. While mathematical proofs and tests on Pythia models show that this unified approach improves weight reconstruction error by up to 46%, it fails in practical tasks: downstream metrics such as perplexity and accuracy degrade severely compared to standard per-layer SVD-LLM. The authors explain this failure mechanistically. Although the bundle method mathematically couples adjacent layers, the transformer residual stream decouples them during forward passes. Thus, per-layer optimality matters more than joint cross-layer optimization. The paper concludes that weight-space reconstruction is a flawed objective for cross-layer compression and that future methods must focus on per-layer activation reconstruction instead.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.30836v2</guid>
      <category>cs.LG</category>
      <category>math.DG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Snigdha Chandan Khilar</dc:creator>
    </item>
    <item>
      <title>SDM-Q: Cost-Aware Staged Decision-Making for Multi-Omics Classification with Deep Q-Learning</title>
      <link>https://arxiv.org/abs/2605.31014</link>
      <description>arXiv:2605.31014v2 Announce Type: replace 
Abstract: Multi-omics data provide complementary molecular characterizations of disease phenotypes and play an important role in disease diagnosis and subtype classification in precision medicine. However, acquiring complete multi-omics profiles is expensive and time-consuming, while most existing deep learning methods assume full modality availability during inference, resulting in substantial redundancy and limited practicality in clinical settings. To address this issue, we propose SDM-Q, a reinforcement learning framework for adaptive and cost-aware multi-omics classification. Specifically, multi-omics diagnosis is reformulated as a finite-horizon sequential decision problem, where the currently acquired omics modalities define the diagnostic state at each stage. An action-value function determines whether to acquire an additional modality or terminate the decision process and output the final prediction. To balance diagnostic utility and acquisition cost, the reward is defined only at the terminal stage and jointly determined by classification correctness and cumulative modality acquisition cost. A backward stage-wise optimization strategy is introduced to improve policy consistency and training stability. Experiments on four public multi-omics datasets, including ROSMAP, LGG, BRCA, and KIPAN, demonstrate that SDM-Q effectively reduces redundant modality acquisition while maintaining competitive classification performance compared with methods using complete multi-omics inputs. In the BRCA and KIPAN datasets, more than 99% and 95% of subjects, respectively, achieve accurate classification using only a single omics modality, while the average number of acquired modalities remains below two for ROSMAP and LGG. These results suggest that cost-aware sequential decision-making provides an effective paradigm for improving the efficiency of precision medicine workflows.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.31014v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Nan Mu, Yangfan Xiao, Ling Wang, Xiaoning Li, Yue Kang, Chen Zhao</dc:creator>
    </item>
    <item>
      <title>Light Interaction: Training-Free Inference Acceleration for Interactive Video World Models</title>
      <link>https://arxiv.org/abs/2605.31158</link>
      <description>arXiv:2605.31158v2 Announce Type: replace 
Abstract: Interactive video world models generate video chunk by chunk in response to user-controlled camera movements, enabling applications such as real-time game simulation, virtual scene navigation, and embodied AI training. However, scaling to long interactive trajectories is prohibitively expensive due to growing context memory, quadratic attention complexity, and repeated denoising steps. We present Light Interaction, a training-free inference acceleration framework for interactive video world models. Our key insight is that interaction naturally enables trajectory-dependent adaptive computation: retrieved spatial memory can be discarded during novel exploration, temporal context can be adjusted according to local latent dynamics, and early-step model outputs can be reused when the camera revisits familiar regions. Based on this insight, Light Interaction combines adaptive context management, denoising cache acceleration, and hardware-software co-designed 3D block sparse attention with fused Triton kernels. Evaluated on HY-WorldPlay and Matrix-Game-3.0, Light Interaction achieves up to 2.59x speedup without model retraining while maintaining competitive visual quality.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.31158v2</guid>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiacheng Lu, Haoyi Zhu, Sipei Yi, Enze Xie, Yu Li, Cheng Zhuo</dc:creator>
    </item>
    <item>
      <title>Scalable Inference-Time Annealing with Surrogate Likelihood Estimators</title>
      <link>https://arxiv.org/abs/2605.31498</link>
      <description>arXiv:2605.31498v3 Announce Type: replace 
Abstract: A long-standing challenge in computational chemistry and biophysics is efficiently sampling the Boltzmann distribution of molecules. Advances in generative modeling have been proposed to address the limitations of conventional sampling techniques by eliminating the computational cost of simulation. A promising direction is iteratively finetuning diffusion models along a temperature ladder whereby training data is generated via importance sampling during inference-time annealing. Unfortunately, these methods require computing a divergence over the score field to estimate importance weights, rendering them intractable for larger systems. Here we present scalable inference-time annealing (SITA), which retrains flow-based models to generate samples at progressively lower temperatures using an energy-based model to facilitate fast surrogate likelihoods. We demonstrate state-of-the-art performance on both Alanine Dipeptide and Alanine Tripeptide while avoiding costly divergence terms. Our code is available at https://github.com/countrsignal/sita.git</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.31498v3</guid>
      <category>cs.LG</category>
      <category>q-bio.BM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Daniel Peñaherrera, Rishal Aggarwal, David Ryan Koes</dc:creator>
    </item>
    <item>
      <title>A Datalog Framework for Conflict-Free Replicated Data Types</title>
      <link>https://arxiv.org/abs/2605.31569</link>
      <description>arXiv:2605.31569v2 Announce Type: replace 
Abstract: Distributed applications increasingly support local-first collaboration over shared data, allowing multiple users to perform updates concurrently without global coordination. Such collaboration requires careful design to capture the intended semantics of the concurrent interactions.
  We introduce a declarative framework for specifying and reasoning about the semantics of conflict-free replicated data types (CRDTs) and CRDT-based applications in Datalog. The framework models CRDT semantics as executable logic programs over operation contexts, making concurrency explicit and compositional, and thus amenable to automated analysis. As one application, we use property-based testing to compare implementations. To the best of our knowledge, this is the first work to systematically use Datalog as a foundation for prototyping and analyzing complex CRDTs and their compositions.
  We evaluate our methodology using a collaborative graph data editing case study and report experimental results assessing correctness validation and scalability with an increasing number of operations and replicas.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.31569v2</guid>
      <category>cs.DC</category>
      <category>cs.DB</category>
      <category>cs.LO</category>
      <category>cs.PL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Elena Yanakieva, Annette Bieniusa, Stefania Dumbrava</dc:creator>
    </item>
    <item>
      <title>ART: Attention Run-time Termination for Efficient Large Language Model Decoding</title>
      <link>https://arxiv.org/abs/2606.00024</link>
      <description>arXiv:2606.00024v2 Announce Type: replace 
Abstract: Long-context decoding in Large Language Models (LLMs) is constrained by the cost of accessing and processing the Key-Value (KV) cache. Despite the evidence that attention outputs depend jointly on keys and values, most existing KV management methods rely on key-only pruning, as incorporating values incurs prohibitive additional overhead. In this paper, we propose Attention Run-time Termination (ART), a lightweight run-time mechanism that tracks accumulated attention outputs during kernel execution and terminates subsequent KV block accesses once further contributions become negligible. Rather than replacing KV selection, ART dynamically terminates redundant KV traversal on top of existing dense or sparse attention policies. We introduce a stability-based criterion that monitors both magnitude and directional changes of intermediate attention outputs, and provide a theoretical characterization of the resulting truncation error. Experiments on LongBench and RULER Needle-in-a-Haystack tasks show that ART increases the generation throughput of existing KV-cache methods by up to 20%, without compromising the quality of the results.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.00024v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Chen Qiu, Guozhong Li, Cristian McGee, Aritra Dutta, Panos Kalnis</dc:creator>
    </item>
    <item>
      <title>Diffusion Image Generation with Explicit Modeling of Data Manifold Geometry</title>
      <link>https://arxiv.org/abs/2606.00094</link>
      <description>arXiv:2606.00094v2 Announce Type: replace 
Abstract: Image generative models aim to sample data points from the underlying data manifold, a task that requires learning and decoding a dense, low-dimensional, and compact parameterization space. To achieve this, we propose the Data Manifold-aware Image diffusioN moDel (MIND), a novel framework that explicitly models manifold geometry by integrating discrete patch tokenization into the score function of a continuous diffusion model. This approach successfully leverages both the structural quantification capabilities of discrete tokens and the parallel generation flexibility of continuous diffusion. Moreover, we enable end-to-end differentiable training via a novel soft top-$k$ aggregation mechanism and introduce dual-branch high-frequency feature embedding layers to alleviate the spectral bias of transformer backbones on low-dimensional inputs. Furthermore, for inference, we design a multi-stage transition sampling scheme that dynamically adjusts the sampling scheme based on timestep. Extensive experiments on ImageNet 256$\times$256 demonstrate the effectiveness of MIND. After 80-epoch training, our base model achieves an FID of 22.73 without guidance, nearly halving the 43.47 FID of the vanilla DiT-B/2 baseline. The proposed method reduces FID by 15.95 and 9.06 on average compared with the baselines DiT and SiT, respectively. For image generation on ImageNet 256$\times$256 with guidance, the proposed MIND-B with only 130M parameters achieves an FID of 2.06, surpassing LlamaGen-3B with 3.1B parameters. The proposed MIND-XL with 715M parameters further reduces the FID to 1.95. Our MIND introduces a fresh perspective on diffusion-based image generation, paving the way for future research and innovation in this community. The code will be publicly available.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.00094v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Duoduo Xue, Zhiyu Zhu, Junhui Hou</dc:creator>
    </item>
    <item>
      <title>Continuous Reasoning for Vision-Language-Action</title>
      <link>https://arxiv.org/abs/2606.00229</link>
      <description>arXiv:2606.00229v2 Announce Type: replace 
Abstract: Natural language is a powerful reasoning medium for language and vision-language models, but it is mismatched to the granularity of continuous control. Text and explicit subgoals operate at task-level granularity, whereas vision-language-action (VLA) policies must choose actions at a much finer temporal scale; a single reasoning step can therefore span many action chunks while remaining only weakly coupled to the action needed now. This suggests a different question for VLA: what should play the role of language? We argue that a useful VLA reasoning medium must be shareable across model instances, verifiable through downstream action improvement, and aligned with temporally extended control structure.
  Based on this view, we propose Continuous Reasoning for Vision-Language-Action. Our model first predicts continuous reasoning in the form of a structured set of continuous thoughts, then reuses them as shared context for chunk-structured action generation. Better action prediction alone does not certify good reasoning: if the same internal medium cannot be shared across model instances and independently verified through improved downstream control, the added latent may simply become a model-private shortcut that helps on seen behaviors without supporting generalizable control. We therefore instantiate continuous reasoning as a shared Gaussian latent interface and train it with a self-verification objective in which an exponential-moving-average teacher must successfully consume the student's reasoning when predicting target actions.
  Empirically, Continuous Reasoning improves LIBERO-PRO robustness and performs strongly on real robots, raising mean subtask success over π0.5 by 40.4% on TX-G2, an AgiBot G2-compatible variant, and 26.3% on HSR. This suggests that reasoning in VLA is less about extra tokens than about a shareable, verifiable internal language for action.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.00229v2</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yueh-Hua Wu, Tatsuya Matsushima, Kei Ota</dc:creator>
    </item>
    <item>
      <title>VESTA: Visual Exploration with Statistical Tool Agents</title>
      <link>https://arxiv.org/abs/2606.00384</link>
      <description>arXiv:2606.00384v2 Announce Type: replace 
Abstract: Fitting quantitative models to data is a central step in scientific workflows, yet it remains one of the least automated. Recent agent-based systems leverage language and vision-language models (VLMs) to iteratively propose and refine statistical models, but these systems struggle on more challenging modeling tasks. To address these limitations, we introduce VESTA: Visual Exploration with Statistical Tool Agents, a framework that equips VLMs with a dynamically growing exploration toolkit to guide model refinement through data transformations, hypothesis-driven visualizations, and robust statistical tests. Unlike prior systems that rely on iterative critique alone, VESTA actively explores data before and during refinement by selecting or creating diagnostic tools, which accumulate in the model's context and can be reused later. We evaluate VESTA against established baselines in three toolkit configurations: no tools, static expert-written tools, and dynamic model-written tools. To support this evaluation, we introduce DAWN (Dataset for Automated Workflows and Numerical Modeling), a benchmark targeting distribution fitting and time series modeling with varying difficulty tiers, and culminating in real-world astronomy tasks including modeling initial mass functions and gravitational-wave chirp signals. We find that VESTA's dynamic tool creation outperforms prior agentic pipelines, with the largest gains on complex and domain-specific tasks. We further show that dynamically generated tools are substantially more sophisticated than those produced by existing visual tool-creation systems, covering more diagnostic categories per function and strongly preferring visual outputs that the VLM critic can reason over directly.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.00384v2</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>cs.CV</category>
      <category>cs.LG</category>
      <category>stat.CO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>William Rudman, Abhishek Divekar, Kanishk Jain, Sebastian Joseph, Stella S. R. Offner, Matthew Lease, Kyle Mahowald, Greg Durrett, Junyi Jessy Li</dc:creator>
    </item>
    <item>
      <title>On the Recoverability of Causal Relations from Bulk Gene Expression Data</title>
      <link>https://arxiv.org/abs/2606.00568</link>
      <description>arXiv:2606.00568v2 Announce Type: replace 
Abstract: Bulk gene expression profiling, which aggregates pooled RNA across cells within a biological sample, remains important in the single-cell era because it is typically less noisy, more sensitive, and more cost-effective than single-cell assays. Accordingly, a growing body of computational methods seeks to recover causal relations among genes from bulk expression data. However, aggregation is a lossy, non-invertible coarsening of the underlying cellular system, and it remains unclear whether and under what conditions causal relations are recoverable from aggregated bulk gene expression data. To answer this, we formalize recoverability under aggregation through two notions of consistency: functional-form consistency and conditional-independence consistency. We then derive necessary and sufficient conditions for recoverability, showing that these properties are preserved only under linear aggregations (e.g., sum/mean) coupled with affine structural equations. To assess the practical plausibility of these conditions, analyses of four bulk and four single-cell gene expression datasets further reveal that the estimated pairwise regulatory functions among genes deviate from linearity in both data types, providing limited empirical support for the linearity assumptions required for recoverability. Together, these results caution against recovering causal relations from aggregated bulk expression data without strong additional assumptions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.00568v2</guid>
      <category>cs.LG</category>
      <category>q-bio.GN</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Gongxu Luo, Boyang Sun, Kun Zhang</dc:creator>
    </item>
    <item>
      <title>CR-JEPA: Cross-Modal Joint-Embedding Predictive Learning for Remote Sensing Image Retrieval</title>
      <link>https://arxiv.org/abs/2606.00706</link>
      <description>arXiv:2606.00706v2 Announce Type: replace 
Abstract: Cross-modal remote sensing image retrieval aims to retrieve semantically related scenes across heterogeneous sensing modalities. This remains challenging because paired observations may differ substantially in imaging physics, spatial resolution, spectral configuration, and visual appearance. Moreover, a single retrieval projection trained with one objective may be insufficient to jointly support cross-modal semantic alignment and same-modal neighbourhood preservation. We propose CR-JEPA, a Cross-modal Retrieval Joint-Embedding Predictive Architecture for dual-modality remote sensing retrieval. The model uses modality-specific stems, a shared transformer trunk, and JEPA-style predictive objectives to estimate masked latent target features within and across modalities. Inspired by LeJEPA, we apply Sketched Isotropic Gaussian Regularization to raw retrieval projections to stabilize embeddings and mitigate collapse. CR-JEPA further employs a decoupled-head design with a unified retrieval head for same-modal retrieval and a cross-modal retrieval head for cross-modal search. We evaluate CR-JEPA on BEN-14K, CBRSIR_VS, and DSRSID. On BEN-14K, CR-JEPA improves S1 to S2 retrieval from 61.23% to 75.82% and S2 to S1 retrieval from 63.73% to 75.40% over X-JEPA, while also achieving competitive same-modal retrieval with fewer parameters.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.00706v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Md Aminur Hossain, Ayush V. Patel, Nitant Dube, Biplab Banerjee</dc:creator>
    </item>
    <item>
      <title>MBench: A Comprehensive Benchmark on Memory Capability for Video World Models</title>
      <link>https://arxiv.org/abs/2606.00793</link>
      <description>arXiv:2606.00793v2 Announce Type: replace 
Abstract: Recent advancements in video-based world models have demonstrated an unprecedented ability to synthesize high-fidelity visual sequences. However, a fundamental gap persists between visually plausible video generation and the functional requirements of a world model, particularly in maintaining a stable and reasonable internal state over extended temporal horizons. While existing benchmarks primarily emphasize visual quality, motion coherence, and text-video alignment, they largely overlook memory, the core capability of a world model to preserve consistency across long-term horizons and complex interactions. To address this gap, we present MBench, a comprehensive benchmark dedicated to quantifying and evaluating the memory capability of video world models. We systematically decompose the memory capability of video world models into three hierarchical and complementary core dimensions: entity consistency, environment consistency, and causal consistency, which are further refined into 12 quantifiable sub-dimensions for comprehensive characterization of long-term memory. Our benchmark is built upon rigorously curated real-captured long videos, and is evaluated with rule-based quantitative metrics and a VLM to enable objective and comprehensive consistency assessment. Extensive evaluations of mainstream state-of-the-art video world models reveal critical systemic limitations of existing methods in long-term state retention, providing a standardized benchmark and clear research direction to advance the field.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.00793v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shengjun Zhang, Zhang Zhang, Simin Huang, Zhenyu Tang, Hanyang Wang, Chensheng Dai, Min Chen, Yifan Li, Yuxin Li, Yingjie Chen, Hao Liu, Chen Li, Jing Lyu, Yueqi Duan</dc:creator>
    </item>
    <item>
      <title>Beyond Independent Manipulation: Individual Fairness-aware Strategic Classification with Peer Imitation</title>
      <link>https://arxiv.org/abs/2606.00827</link>
      <description>arXiv:2606.00827v2 Announce Type: replace 
Abstract: Strategic classification (SC) investigates scenarios where agents manipulate their features to obtain favorable decisions from predictive models. Existing fairness-aware SC approaches primarily focus on group fairness and typically assume that agents respond independently. However, when individual fairness is required, ensuring similar individuals receive similar outcomes, agents' manipulation becomes interdependent: an agent's preferred manipulation depends on the outcomes in its neighborhood. This induces a mismatch between classical SC formulations and fairness-aware decision settings, where independent models no longer accurately characterize strategic manipulations. To address this issue, we introduce individual fairness-aware strategic classification (IFSC), a framework that models peer-driven manipulation arising from individual fairness, where agents imitate nearby positively decided peers to obtain favorable outcomes. IFSC characterizes strategic manipulation as similarity-based imitation toward visible accepted peers and learns classifiers under the resulting post-manipulation distributions. To account for uncertainty in peer observability, IFSC employs a robust learning process that introduces stochastic perturbations during manipulation simulation. Experiments on synthetic and real-world datasets demonstrate that IFSC improves individual-fairness consistency and mitigates imitation-induced distortions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.00827v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3770855.3817670</arxiv:DOI>
      <dc:creator>Xinpeng Lv, Chunyuan Zheng, Yunxin Mao, Renzhe Xu, Jinxuan Yang, Yuanlong Chen, Wangrong Huang, Shaowu Yang, Wenjing Yang, Xinwang Liu, Peng Cui, Haotian Wang</dc:creator>
    </item>
    <item>
      <title>MedSyn2: Flexible Control of 3D CT Generation via Text and Semantically-Defined Segmentation Prompts</title>
      <link>https://arxiv.org/abs/2606.00967</link>
      <description>arXiv:2606.00967v3 Announce Type: replace 
Abstract: Generative models for volumetric medical images have found many applications in medical imaging, ranging from data augmentation to serving as priors for inverse problems. For these applications, generating high-resolution 3D images with strong controllability is essential but remains highly challenging. Existing approaches typically control generation either through radiology reports used as text prompts or through full image segmentation. While text-based prompting is flexible, it provides limited spatial control over the location, shape, and boundary of abnormalities. In contrast, segmentation-based methods receive precise spatial guidance but are restrictive in requiring full-organ annotations. In this work, we propose a flexible multimodal framework for controllable volumetric image generation that supports input from radiology reports and segmentation prompts (both optional). Our approach allows users to provide segmentation of a specific anatomy or abnormality without requiring full-organ annotations. The semantic meaning of the segmentation mask is specified through an accompanying text description, resulting in a highly flexible and scalable conditioning mechanism. We develop a memory-efficient architecture based on a modified diffusion transformer that jointly processes image and segmentation tokens. The model further incorporates gated attention to effectively attend to long radiology reports. Experiments demonstrate that our method achieves state-of-the-art perceptual and semantic scores (e.g., 24% relative improvement in mean FID), generates high-resolution anatomically consistent CT volumes, and improves data efficiency when used for data augmentation. Radiologists' evaluation further confirms strong alignment between generated and real medical images.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.00967v3</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Weicheng Dai, Chenyu Wang, Binxu Li, Shantanu Ghosh, Afrooz Zandifar, Christina LeBedis, Kayhan Batmanghelich</dc:creator>
    </item>
    <item>
      <title>MENTIS: What Belief Changes Under Alignment? Measuring Multi-Scale Latent Torsion in Language Models</title>
      <link>https://arxiv.org/abs/2606.01060</link>
      <description>arXiv:2606.01060v2 Announce Type: replace 
Abstract: Preference alignment has substantially improved the observable behavior of large language models, yet it remains unclear what alignment changes internally. Aligned systems still fail under jailbreaks, prompt injection, and retrieval-time corruption, suggesting behavior-level evaluation alone is incomplete. Post-training should leave measurable traces in internal computation. We ask: when an instruction-tuned (IT) model becomes a preference-aligned (PA) model, what geometric structure changes, where do those changes concentrate, and how selectively do they vary across concepts, prompts, and model families?
  We introduce MENTIS, a geometry-first framework for measuring alignment-induced internal reorganization in paired checkpoints. MENTIS compares IT and PA models using a primary layerwise covariance-based torsion norm (T1), a secondary spectral torsion diagnostic (T2), and an Energy-Radiance-Activation measure (ERA) for depth localization. Across four 7-8B model pairs on LITMUS, our study reveals that alignment-induced change is selective rather than uniform: normative concepts exhibit larger torsion shifts than factual concepts on average; torsion is negatively correlated with contextual entropy; and peak effects localize to architecture-specific mid-to-late layers. The same pattern appears across word-level, prompt-level, and model-level analyses. These results suggest preference alignment leaves structured, depth-localized geometric signatures in internal computation beyond what behavior-level evaluation alone can reveal.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.01060v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Partha Pratim Saha, Samarth Raina, Mayur Parvatikar, Amit Dhanda, Vinija Jain, Aman Chadha, Amitava Das</dc:creator>
    </item>
    <item>
      <title>ImagineUAV: Aerial Vision-Language Navigation via World-Action Modeling and Kinodynamic Planning</title>
      <link>https://arxiv.org/abs/2606.01205</link>
      <description>arXiv:2606.01205v2 Announce Type: replace 
Abstract: Vision-language navigation (VLN) for UAVs demands grounding free-form instructions into 6-DoF flight under partial observability. While Vision-Language-Action (VLA) models excel at semantic reasoning, they suffer from brittleness due to geometric inconsistency and dynamics mismatch. To address this, we propose ImagineUAV, an imagination-driven framework leveraging cascaded world-action modeling. Instead of direct regression, ImagineUAV employs a latent video diffusion model to generate instruction-conditioned future observations, explicitly imagining environmental evolution, from which 6-DoF motions are inferred via an action extractor. A kinodynamic planner then refines these estimates into collision-free trajectories. Additionally, a step-distilled inference pipeline ensures real-time execution. With only 1.3B parameters, ImagineUAV outperforms prior VLN and VLA baselines on benchmarks and real-world flights, validating the practicality of imagination-driven aerial navigation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.01205v2</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xuchen Liu, Jiawei Huang, Shihao Xia, Bingxi Liu, Jinqiang Cui, Jiankun Yang</dc:creator>
    </item>
    <item>
      <title>When Hard Negatives Hurt: Bridging the Generative-Discriminative Gap in Hard Negative Synthesis for Retrieval</title>
      <link>https://arxiv.org/abs/2606.01304</link>
      <description>arXiv:2606.01304v2 Announce Type: replace 
Abstract: Hard negative mining has become the dominant strategy for training retrievers, yet it faces intrinsic limitations: negatives are bounded by corpus availability, selected by retriever score rather than diagnostic value, and increasingly contaminated by false positives as the retriever improves. LLM-based synthesis offers a principled alternative: negatives that are unconstrained, targeted, and free from false-positive risk. But we show that naively incorporating generated negatives into contrastive learning often degrades retrieval performance. We identify and formalize the root cause as a generative-discriminative gap: LLM generation optimizes for fluent, plausible text, while contrastive learning demands strategic violations of relevance at the decision boundary. Our analysis reveals two compounding failure modes: discriminative-agnostic generation, where the LLM lacks an explicit model of query information needs and defaults to generic or topic-drifted text that provides no contrastive signal; and source-dependent shortcuts, where distributional artifacts enable the model to distinguish negatives by origin rather than relevance, causing gradient drift that actively corrupts optimization. To close this gap, we propose CausalNeg, which consists of two main modules: (1) CoT-guided counterfactual perturbation for data construction, which decomposes why a document satisfies a query into explicit information requirements, then surgically violates individual requirements to construct negatives with controlled, interpretable hardness; and (2) query-view entropy maximization during training, which disperses generated negatives across the similarity spectrum, minimizing the mutual information between source identity and similarity scores to suppress shortcut exploitation. We make our code publicly available at https://github.com/mzhangzhicheng/CausalNeg.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.01304v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3770855.3818118</arxiv:DOI>
      <dc:creator>Zhicheng Zhang, Jiwei Tang, Kuicai Dong, Xiaopeng Li, Jieming Zhu, Jingyu Li, Qianhui Zhu, Fengyuan Lu, Wang Jiaheng, Gang Wang, Hai-Tao Zheng, Zhaocheng Du</dc:creator>
    </item>
    <item>
      <title>Towards Optimal Robustness in Learning-Augmented Paging</title>
      <link>https://arxiv.org/abs/2606.01342</link>
      <description>arXiv:2606.01342v3 Announce Type: replace 
Abstract: Learning-augmented paging has been extensively studied in recent years. A key advantage over naive ML-based approaches is bounded robustness, which guarantees worst-case performance even when predictions are inaccurate, making these algorithms valuable for real-world systems. Prior work achieves robustness bounds of $2H_k + O(1)$ in the randomized setting, leaving a gap to the optimal competitive ratio $H_k$.
  In this paper, we study how to close this gap. We begin by reviewing online optimality and proving a new property of the latest $H_k$-competitive algorithm, which facilitates our analysis in the learning-augmented setting. Then, we review existing learning-augmented paging algorithms and introduce a unifying primitive, the relative prediction budget, which captures the essence of establishing robustness and reveals that prior algorithms either overuse or underutilize predictions. Guided by the above analysis, we develop a new framework that achieves the best-possible robustness up to an additive constant for learning-augmented paging: $H_k + O(1)$. Experiments further demonstrate strong practical performance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.01342v3</guid>
      <category>cs.DS</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Peng Chen, Hailiang Zhao, Xueyan Tang, Yixuan Wang, Shuiguang Deng</dc:creator>
    </item>
    <item>
      <title>Turning Back Without Forgetting: Selective Backward Refinement for Parameter-Efficient Continual Learning</title>
      <link>https://arxiv.org/abs/2606.01379</link>
      <description>arXiv:2606.01379v2 Announce Type: replace 
Abstract: While prompt-based parameter-efficient continual learning mitigates catastrophic forgetting by isolating task-specific prompts, this isolation also limits later tasks from improving earlier ones, leaving backward knowledge transfer underexplored. We address this limitation by proposing Selective bAckward refinement for positive Backward knowledge transfER (SABER), a replay-free framework that enables controlled backward transfer in prompt-based continual learning. SABER determines when backward refinement is beneficial using complementary task-correlation criteria based on prompt-gradient geometry and loss-distribution similarity, and how to perform refinement safely by restricting updates to non-interfering directions in the prompt parameter space. Extensive experiments across multiple continual learning benchmarks and diverse pretrained backbones, including T5-Large, LLaMA, and Qwen, demonstrate that SABER consistently achieves positive backward transfer while maintaining strong overall average performance. Code is available at https://github.com/OptMN-Lab/SABER-ICML-2026/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.01379v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anushka Tiwari, Kaiyi Ji</dc:creator>
    </item>
    <item>
      <title>Crazyflow: An Accurate, GPU-Accelerated, Differentiable Drone Simulator in JAX</title>
      <link>https://arxiv.org/abs/2606.01478</link>
      <description>arXiv:2606.01478v2 Announce Type: replace 
Abstract: High-quality, large-scale synthetic data from simulations is becoming a cornerstone for pushing the capabilities of robot algorithms. While aerial robotics simulators have evolved to support specialized needs such as fidelity, differentiability, and swarms independently, a unified platform that can synthesize data across all these domains is missing. In this work, we propose Crazyflow, a simulator designed to push the limits of aerial-robotics algorithm development, from model-based to data-driven methods, gradient-based to sampling-based approaches, and single-agent to multi-agent systems. Compared to existing state-of-the-art drone simulators, it achieves speeds more than an order of magnitude faster for a single drone and can simulate thousands of swarms of 4000 drones each. Real-world experiments show Crazyflow supports both analytical-gradient-based policy learning, achieving sub-centimeter trajectory tracking accuracy without domain randomization, and sampling-based obstacle avoidance at speeds exceeding half a billion steps per second. Breaking the traditional train-then-deploy paradigm, we show that its unprecedented speed even enables in-flight reinforcement learning; we demonstrate this by throwing a physical drone into the air and training a recovery policy from scratch in 0.38 seconds, successfully stabilizing the drone. Crazyflow supports multiple levels of simulation abstraction, is directly compatible with all open-source Crazyflie models, and enables rapid reconfiguration across custom drone platforms and applications by providing a light-weight system identification pipeline. By pushing accuracy, speed, and differentiability simultaneously, Crazyflow serves as an open-source resource for synthetic data generation, with emerging capabilities for large-scale parallelization for online, in-execution learning and optimization, opening the door to novel algorithm development.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.01478v2</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.MA</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Martin Schuck, Marcel P. Rath, Yufei Hua, Abhishek Goudar, SiQi Zhou, Angela P. Schoellig</dc:creator>
    </item>
    <item>
      <title>Flexible Online Representation Learning Based on Similarity Matching</title>
      <link>https://arxiv.org/abs/2606.01546</link>
      <description>arXiv:2606.01546v2 Announce Type: replace 
Abstract: Sparse high-dimensional representations are conducive to uncovering nontrivial structures in unsupervised exploration of data. Such a representation can deal with the dense connectivity in graphs relevant to community detection problems. However, sparse high-dimensional representations are capable of doing more, including manifold tiling and feature learning. Conventional algorithms optimize in the space of computationally intractable completely positive matrices or relax the problem to the space of doubly nonnegative matrices that scale with sample size in a way rendering them impractical for large data sets. Some of these methods also impose a row sum constraint, such as double stochasticity. Row sum constraints have the added advantage of being shift-invariant, in the context of manifold tiling. Constraints on the row sum of output similarity matrices require nontrivial online learning rules. Addressing these needs, we propose a versatile online biologically plausible learning algorithm capable of learning sparse shift-invariant representations, useful for clustering, manifold tiling, or sparse coding, depending on the data structure.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.01546v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shagesh Sridharan, Yanis Bahroun, Anirvan M. Sengupta</dc:creator>
    </item>
    <item>
      <title>Defenses &amp; Enablers For Skill Injection Attacks on Terminal Based Agents</title>
      <link>https://arxiv.org/abs/2606.01567</link>
      <description>arXiv:2606.01567v2 Announce Type: replace 
Abstract: Large language model (LLM) agents increasingly rely on reusable skills, i.e., documents describing task-specific procedures. However, this introduces a new attack surface for agents to manage. We study two complementary directions for this threat. First, we evaluate guardian-based defenses: an intermediary LLM agent that acts as a mediator for skill file access (dynamic guardian) or pre-rewrites these files at build time (static guardian). Across three LLM agent families, our guardians cut attack success rate (ASR) by well over half while preserving task utility. Second, we stress-test them through attack reframing, using four attacks that preserve the malicious instruction but change the phrasing. In the non-guardian setup, the reframing pushes the ASR up to 81.4%, but the dynamic guardian brings it down to 18.6%, showing that real-time mediation is a robust defense.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.01567v2</guid>
      <category>cs.CR</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yoshinari Fujinuma, Varun Gangal, Traian Rebedea, Makesh Narsimhan Sreedhar, Prasoon Varshney, Rebecca Qian, Anand Kannappan</dc:creator>
    </item>
    <item>
      <title>ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL</title>
      <link>https://arxiv.org/abs/2606.01619</link>
      <description>arXiv:2606.01619v2 Announce Type: replace 
Abstract: Agentic reinforcement learning (RL) enables LLM agents to improve continuously from environment rewards, yet the resulting policies do not systematically accumulate reusable strategies that generalize across tasks. Modular skills can provide such reusable strategies, yet existing skill-augmented RL methods decouple skill creation from policy optimization, risking adopting skills that conflict with the evolving policy. Inspired by Anthropic's Skill Creator, we introduce ReSkill, an RL-in-the-loop skill creation framework that reconciles skill evolution with policy learning. ReSkill exploits the group-wise structure of GRPO to naturally embed three mechanisms with only marginal additional overhead: (1) an assertion-driven skill creator that diagnoses failures from past experience and proposes conditional, trigger-based skill revisions; (2) within-group rollout sampling that enables controlled comparison of skill versions, capturing which version best supports the policy's ongoing learning; and (3) Thompson Sampling with adaptive discounting to balance exploration and exploitation in skill version selection as the policy evolves. Across several domains, ReSkill consistently outperforms existing memory and skill-based RL methods, with the largest gains on unseen tasks. Analysis of the skill lifecycle shows skills being automatically created, tested, refined, and pruned as the policy improves, demonstrating reconciled skill-policy co-evolution.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.01619v2</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zelin He, Haotian Lin, Boran Han, Wei Zhu, Haoyang Fang, Bernie Wang, Xuan Zhu, Runze Li, Matthew Reimherr</dc:creator>
    </item>
    <item>
      <title>Easier to Mislead Than to Correct: Harmful and Beneficial Revision in LLM Conformity</title>
      <link>https://arxiv.org/abs/2606.01637</link>
      <description>arXiv:2606.01637v2 Announce Type: replace 
Abstract: Large language models are increasingly used in multi-agent systems, where they see and respond to other agents' answers. A key risk is conformity: a model may abandon its own answer simply because others agree on a different one. Prior studies show that LLMs often revise toward a majority answer, but it remains unclear whether these revisions help correct mistakes as often as they introduce new errors. In this paper, we conduct a controlled study in which an LLM first answers a question, then sees simulated peer responses before making a final decision. We manipulate two social cues: consensus structure and authority labels assigned to peers, and measure how they influence beneficial and harmful revisions. Across four open-weight LLMs and seven QA datasets, we find that peer agreement makes it much easier to mislead initially correct models than to correct initially wrong ones. Authority labels make models more likely to choose the endorsed answer, regardless of whether it is correct. More concerningly, generic reasoning interventions such as chain-of-thought and reflection do not reliably reduce harmful revision while preserving beneficial revision. These findings suggest that multi-agent LLM systems should verify peer answers rather than simply aggregate them.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.01637v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiaming Qu, Lucheng Fu, Yibo Hu</dc:creator>
    </item>
    <item>
      <title>Argument Collapse: LLMs Flatten Long-Form Public Debate</title>
      <link>https://arxiv.org/abs/2606.01736</link>
      <description>arXiv:2606.01736v3 Announce Type: replace 
Abstract: As LLMs are increasingly used to draft public-facing arguments, they may flatten public debate by repeatedly introducing the same polished, plausible arguments. We study argument collapse, the tendency of essays generated by different LLMs to converge to a smaller set of main arguments, sub-arguments, and paragraph-level structures. We compare 1,039 human responses from 195 New York Times (NYT) debates, 448 human responses from 61 longer-form Boston Review (BR) forums, and 23,384 LLM-generated essays. In the NYT corpus, 65.3% of human main arguments are unique within a debate, compared to 3.4% of LLM main arguments. Asking LLMs to generate diverse answers adds variation, but a typical model recovers only about half of the distinct human main arguments, with much of the added variation falling outside the observed human argument space. Collapse also appears in sub-arguments, where among essays with the same main argument, 41.0% of human sub-arguments are unique versus 9.1% from LLM responses. Qualitatively, LLMs often reuse generalized and hedged sub-arguments, while humans prefer more concrete and topic-specific ones. Structure-wise, LLM-generated essays tend to follow a more fixed arc, often opening with a direct claim and moving quickly toward proposals. The same patterns hold in longer BR essays, suggesting that argument collapse extends beyond short-form responses.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.01736v3</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yekyung Kim, Yapei Chang, Chau Minh Pham, Mohit Iyyer</dc:creator>
    </item>
    <item>
      <title>SparseX: Efficient Segment-Level KV Cache Sharing for Interleaved LLM Serving</title>
      <link>https://arxiv.org/abs/2606.01751</link>
      <description>arXiv:2606.01751v2 Announce Type: replace 
Abstract: In long-context LLM serving, the prefill stage often dominates time-to-first-token and computational cost. Although Prefix Cache in vLLM/PagedAttention has been widely used to reuse identical prompt prefixes, repeated content in practical applications frequently appears as non-prefix, cross-request, cross-turn, and cross-agent segments, which makes conventional cache mechanisms insufficient. This paper presents SparseX, a segment-level KV Cache sharing method for common serving scenarios. SparseX uses contiguous token segments as reuse units and exploits Sparse-Q indices that naturally arise in KV Cache reuse workloads to estimate the key tokens that require correction. Based on this estimate, SparseX performs Sparse-KV Recomputation within a single forward pass, thereby restoring cross-segment contextual interactions under complex interleaved reuse patterns while avoiding additional models or separate preprocessing stages for token selection. SparseX further implements a full+sparse hybrid attention mode based on a layer-specific threshold: early layers retain full attention to obtain a more stable token-importance signal, and later layers switch to sparse recomputation to improve reuse quality on complex long-context tasks. We implement SparseX-vLLM on top of vLLM, integrating segment-level cache lookup, PagedAttention management, RoPE alignment, Sparse-Q token selection, and FlashAttention backends into a unified execution path. SparseX is model-agnostic, training-free, and compatible with Prefix Cache, and it provides unified support for common online serving scenarios including multi-round chat, retrieval-augmented generation (RAG), and agent workflows.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.01751v2</guid>
      <category>cs.PF</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Quqing Zhang, Kai Chen, Ning Liao, Zehao Lin, Bo Tang, Feiyu Xiong, Zhiyu Li, Xiaoxing Wang</dc:creator>
    </item>
    <item>
      <title>WorldCoder-Bench: Benchmarking Physically Grounded 3D World Synthesis</title>
      <link>https://arxiv.org/abs/2606.01869</link>
      <description>arXiv:2606.01869v2 Announce Type: replace 
Abstract: Large language models (LLMs) are increasingly asked not only to write static interfaces, but to construct executable interactive worlds from natural language. Browser-native 3D, commonly built with Three.js, is a natural next frontier: generated programs must integrate assets, obey spatial and physical constraints, and keep user-facing controls synchronized with hidden runtime state. Existing web-generation benchmarks and evaluators, however, largely observe only pixels or DOM nodes, while the mechanics of a Three.js world unfold inside an opaque &lt;canvas&gt;. We introduce WorldCoder-Bench, a benchmark for autonomous, physically grounded 3D world synthesis. WorldCoder-Bench contains 2,026 expert-curated tasks across Simulation, Rendering, and Application scenarios, with optional .glb assets and hidden behavioral contracts. We further propose StateProbe, an execution-based protocol that probes generated programs in a sandboxed browser and verifies hidden, mutation-hardened contracts over runtime states and transitions. Beyond verification coverage, we report Return on Automation and Time Efficiency Multiplier to measure correctness-adjusted cost and time savings. Across nine frontier models, the best system reaches only 27.8% verification coverage on WorldCoder-Core and 19.9% on WorldCoder-Robust, with failures dominated by state-schema drift and broken interaction chains rather than missing scene elements. Utility metrics further show that cheap or fast models can still provide substantial value on easier domains. WorldCoder-Bench is available at https://anonymous.4open.science/r/WorldCoder-Bench/.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.01869v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Shuo Lu, Yinuo Xu, Kecheng Yu, Siru Jiang, Yongcan Yu, Yubin Wang, Haitao Yang, Yuxiang Zhang, Bin Wang, Ran He, Jian Liang</dc:creator>
    </item>
    <item>
      <title>Dexterity-BEV: Aligning 3D World and Actions for Generalizable Robot Policies Learning</title>
      <link>https://arxiv.org/abs/2606.02274</link>
      <description>arXiv:2606.02274v2 Announce Type: replace 
Abstract: End-to-end manipulation policies, combined with web-scale pretrained Vision-Language Models (VLMs), show promise for generalizable and dexterous robotic manipulation. However, they inherit two key limitations from 2D foundation models: 1) the reliance on 2D RGB inputs that ignores the intrinsically 3D nature of manipulation; and 2) the lack of spatial 3D alignment between input and output spaces as well as across diverse robot embodiments, camera setups, and trajectory datasets. In this paper, we present a series of contributions to address these issues. First, we introduce the aligned vertex map and vertex spectrum -- a pixel-wise 3D representation that elevates 2D visual inputs to 3D, using camera calibration and optional depth. This novel input representation marries 3D awareness with the generalization of large 2D VLMs. Then, we propose to align the inputs and outputs of manipulation policies by expressing the per-pixel 3D information of each camera view and the robot actions in a shared coordinate frame. Based on this, we designate a canonical Bird's-Eye-View (BEV) alignment frame and construct BEV images, producing a view-invariant representation robust to camera pose variations. To enable training and evaluation at scale, we develop a comprehensive data processing pipeline to perform such alignments; we also introduce a novel temporal alignment scheme for trajectories across diverse robots, human operators, and datasets. These contributions collectively mitigate input and output spatial-temporal misalignments, improving consistency and generalization for real-world manipulation. The pretrained checkpoint, source code, and data processing pipeline are available at https://hnuzhy.github.io/projects/Dex-BEV.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.02274v2</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Huayi Zhou, Wei Gao, Dekun Lu, Ruiji Liu, Zhanqi Zhang, Ziyang Zhang, Jian Chen, Wenlve Zhou, Sheng Xu, Shumin Li, Kangyi Guo, Shichen Xu, Zixin Huang, Yongyi Su, Kui Jia</dc:creator>
    </item>
    <item>
      <title>Parameter-efficient Dual-encoder Architecture with Differentiable Choquet Integral Fusion for Underwater Acoustic Classification</title>
      <link>https://arxiv.org/abs/2606.02341</link>
      <description>arXiv:2606.02341v2 Announce Type: replace 
Abstract: Underwater acoustic classification has a wide array of oceanic applications, but faces challenges due to an increasingly complex acoustic environment. Waveform and spectrogram representations have been primarily used as acoustic data features for classification tasks in this domain. Spectrograms model harmonic dependencies, but these reduced representations can filter out acoustic features relevant for discrimination. While phase information from the waveform allows full characterization of the signal, the original waveform can be noisy and complex, making this representation difficult for models to process directly. This paper proposes a dual-encoder neural architecture that simultaneously processes acoustic waveforms and spectrograms, leveraging pre-trained backbones and parameter-efficient fine-tuning modules to enable domain adaptation. To combine these adapted branches, a novel differentiable fuzzy aggregation mechanism based on the Choquet integral is introduced to balance the temporal and spectral representations. This fusion strategy not only yields higher classification accuracy but also provides interpretability. Specifically, analyzing the learned fuzzy measures reveals class-specific shifts in the network's reliance on each representation. By dynamically shifting attention to the representation least corrupted by potential asymmetric channel distortions, the proposed gating mechanism mitigates the non-stationary challenges of the underwater environment. Evaluations on the DeepShip and ShipsEar datasets demonstrate that the proposed architecture achieves classification improvements over independent single-encoder baselines, while simultaneously restricting the trainable parameter space. This mitigates the risk of overfitting on limited acoustic datasets while alleviating the computational costs associated with fully fine-tuning foundation models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.02341v2</guid>
      <category>cs.SD</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Amirmohammad Mohammadi, Joshua Peeples, Alexandra Van Dine</dc:creator>
    </item>
    <item>
      <title>Local Preferential Bayesian Optimization</title>
      <link>https://arxiv.org/abs/2606.02351</link>
      <description>arXiv:2606.02351v2 Announce Type: replace 
Abstract: Bayesian optimization (BO) is a popular and effective approach for tuning expensive, noisy experiments, but requires the formulation of an explicit objective function. Preferential BO (PBO) removes this requirement by learning from pairwise human feedback, yet existing methods struggle to efficiently optimize beyond low- and medium-dimensional problems due to their global search approaches. We address this limitation by developing a family of local PBO methods that transfer key ideas from high-dimensional BO to the preferential setting. In particular, we introduce local PBO methods which adapt trust-region and derivative-informed local search to pairwise preference feedback, where the latter exploits first- and second-order derivatives of the Laplace-approximated GP posterior. Our benchmark on GP sample paths, standard optimization benchmark functions, and policy-search tasks shows that local PBO methods are especially effective in high-dimensional and complex landscapes with steep optima. Compared with global preference-based baselines, they can substantially reduce cumulative regret, making them particularly useful for real-world preference-based optimization tasks such as policy search.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.02351v2</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Johanna Menn, Miriam Kober, Paul Brunzema, David Stenger, Sebastian Trimpe</dc:creator>
    </item>
    <item>
      <title>A Game-Theoretic Decision Framework for Optimal Selection of Coordination Detection Methods in Multi-UAV Fleet Operations</title>
      <link>https://arxiv.org/abs/2606.02383</link>
      <description>arXiv:2606.02383v2 Announce Type: replace 
Abstract: Detecting coordination among unmanned aerial vehicle (UAV) fleets operating in shared airspace and identifying the route-lead aircraft whose navigation decisions govern fleet behavior presents a fundamental speed--accuracy trade-off: fast methods enable real-time traffic management but sacrifice detection fidelity, while accurate methods may exceed the time budget for actionable airspace deconfliction. This paper presents a game-theoretic decision framework that resolves this trade-off by formulating method selection as a two-player zero-sum game between a Monitor (selecting computational methods and parameters) and Nature (selecting the unknown traffic scenario). We construct an end-to-end pipeline from trajectory surveillance data through eight candidate detection algorithms, a Monte Carlo sensitivity analysis characterizing their stochastic performance, and finally a multi-objective optimization layer that identifies Pareto-optimal method portfolios. The minimax solution provides a robust mixed strategy with a probability distribution over methods that guarantees worst-case performance regardless of scenario uncertainty. Experimental evaluation across 200 randomized configurations spanning 5--50 aircraft demonstrates that the framework recommends distinct method portfolios depending on operational priority: Koopman Phase dominates balanced (70.6%) and speed-priority (79.7%) profiles, while CRQA emerges as primary (47.4%) when route-lead identification is prioritized. The framework achieves a guaranteed game value of 0.29--0.53 (normalized utility) across all tested preference profiles, providing the first principled, scenario-adaptive methodology for computational method selection in UTM fleet monitoring operations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.02383v2</guid>
      <category>cs.MA</category>
      <category>cs.GT</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Christian Manasseh, Savana Ammons</dc:creator>
    </item>
    <item>
      <title>IMAC-AgriVLN: Can Agricultural Vision-and-Language Navigation Agents be Aware of Instruction Mistakes?</title>
      <link>https://arxiv.org/abs/2606.02519</link>
      <description>arXiv:2606.02519v2 Announce Type: replace 
Abstract: Agricultural robots serve as powerful assistants across a wide range of agricultural tasks, yet they still rely heavily on manual operation or railway systems for movement. The AgriVLN method and the A2A benchmark first extended Vision-and-Language Navigation (VLN) to the agricultural domain, enabling a robot to navigate to a target position by following a natural language instruction. However, almost all prior methods assume that the given instructions are correct, which does not match realistic scenarios, because anyone may give an instruction that contains mistakes. To bridge this gap, we propose the A2A-MI benchmark, for which we build a semi-automatic data annotator that inserts three classes of mistakes into each original instruction in a more diverse and efficient way. We test several state-of-the-art agricultural VLN agents on it and observe a substantial drop of -57% on SR and -9% on NE, which suggests that an agricultural VLN agent tends to assume the given instruction is correct and therefore does not doubt it when the scenes it sees do not align with the instruction it receives. To build awareness of instruction mistakes, we propose the IMAC module, which analyzes the instruction and the current front-facing image to judge whether the instruction contains mistakes and attempts to correct it when needed. We integrate IMAC into the baseline model and observe a noteworthy improvement that substantially narrows the gap to the performance on instructions without mistakes. Project: https://github.com/AlexTraveling/IMAC-AgriVLN.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.02519v2</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xiaobei Zhao, Xingqi Lyu, Xin Chen, Xiang Li</dc:creator>
    </item>
    <item>
      <title>See Less, Specify More: Visual Evidence Budgets for Generalizable VLAs</title>
      <link>https://arxiv.org/abs/2606.02735</link>
      <description>arXiv:2606.02735v2 Announce Type: replace 
Abstract: Generalization remains a central bottleneck for vision-language-action (VLA) models: under distractors, appearance shifts, and semantically similar tasks, the policy must often infer local execution details from coarse instructions while also deciding which parts of the image matter for control. We present S2 (See Less, Specify More), a framework for improving VLA generalization by training the executor under a cleaner interface.
  Specify More preserves the original instruction as a stable high-level goal while relabeling each trajectory into refined trajectory- and subtask-level language that disambiguates the current execution mode. Unlike native attention, See Less imposes an explicit visual evidence budget, training the executor to act from task-sufficient evidence rather than unconstrained visual context, without any region or mask annotation.
  This interface lets the executor follow detailed guidance without relying on distracting visual patches or resolving avoidable ambiguity on its own, and it remains compatible with off-the-shelf VLM planners through in-context learning. Across our main evaluation settings, S2 improves overall generalization metrics by changing the executor's learning problem: coarse instructions induce avoidable supervision aliasing, goal-preserving local guidance outperforms instruction replacement in our main ablations, and explicit evidence budgeting reduces dependence on broad visual context beyond efficiency considerations.
  Across eight real-robot tasks on TX-G2 (an AgiBot G2-compatible variant) and HSR, S2 raises mean subtask success from 54.2% to 79.0% over pi0.5. Together, these results suggest that VLA generalization improves when the executor is trained to act from informative local guidance and task-sufficient visual evidence, rather than recovering both from weak supervision.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.02735v2</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yueh-Hua Wu, Tatsuya Matsushima, Kei Ota</dc:creator>
    </item>
    <item>
      <title>Do Value Vectors in Deep Layers Need Context from the Residual Stream?</title>
      <link>https://arxiv.org/abs/2606.02780</link>
      <description>arXiv:2606.02780v2 Announce Type: replace 
Abstract: The success of the transformer architecture as the backbone of modern LLMs is in large part due to its use of attention layers. An attention layer follows the standard neural network paradigm: it takes the residual stream as input and thereby produces context-dependent query, key, and value vectors. However, we find that model performance meaningfully improves when deeper layers learn only a context-free value vector to preserve the original token information, without drawing on any context from the residual stream. When the model has access to this context-free value vector, adding back the context-dependent component provides little additional benefit for aggregate benchmark performance. Such context-free value vectors can be stored as sparse model parameters, eliminating the need to recompute or persistently cache these values. Through systematic ablations on the key design choices for such context-free value vectors, we propose Bank of Values (BoV), a new way of computing value vectors in attention by learning a lookup table of token-specific value vectors for each of the last third of layers. Across 135M and 780M models, BoV improves validation loss over standard attention and, at 780M, the average score across 21 benchmarks, matching the previous best method that adds token information to the value vector with less compute and memory.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.02780v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Muyu He, Yuchen Liu, Qingya Huang, Li Zhang</dc:creator>
    </item>
    <item>
      <title>Cosmos 3: Omnimodal World Models for Physical AI</title>
      <link>https://arxiv.org/abs/2606.02800</link>
      <description>arXiv:2606.02800v2 Announce Type: replace 
Abstract: We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI -- effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License at https://github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3. The project website is available at https://research.nvidia.com/labs/cosmos-lab/cosmos3.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.02800v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <category>cs.MM</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>NVIDIA: Aditi, Niket Agarwal, Arslan Ali, Jon Allen, Martin Antolini, Adeline Aubame, Alisson Azzolini, Junjie Bai, Maciej Bala, Yogesh Balaji, Josh Bapst, Aarti Basant, Mukesh Beladiya, Mohammad Qazim Bhat, Zaid Pervaiz Bhat, Dan Blick, Vanni Brighella, Han Cai, Tiffany Cai, Eric Cameracci, Jiaxin Cao, Yulong Cao, Mark Carlson, Carlos Casanova, Ting-Yun Chang, Yan Chang, Yu-Wei Chao, Prithvijit Chattopadhyay, Roshan Chaudhari, Chieh-Yun Chen, Junyu Chen, Ke Chen, Qizhi Chen, Wenkai Chen, Xiaotong Chen, Yu Chen, An-Chieh Cheng, Click Cheng, Xiu Chia, Jeana Choi, Chaeyeon Chung, Wenyan Cong, Yin Cui, Magdalena Dadela, Nalin Dadhich, Wenliang Dai, Joyjit Daw, Alperen Degirmenci, Rodrigo Vieira Del Monte, Robert Denomme, Sameer Dharur, Marco Di Lucca, Ke Ding, Wenhao Ding, Yifan Ding, Yuzhu Dong, Nicole Drumheller, Yilun Du, Aigul Dzhumamuratova, Aleksandr Efitorov, Hamid Eghbalzadeh, Naomi Eigbe, Imad El Hanafi, Hassan Eslami, Benedikt Falk, Jiaojiao Fan, Jim Fan, Amol Fasale, Sergiy Fefilatyev, Liang Feng, Francesco Ferroni, Sanja Fidler, Xiao Fu, Vikram Fugro, Prashant Gaikwad, TJ Galda, Katelyn Gao, Yihuai Gao, Wenhang Ge, Sreyan Ghosh, Arushi Goel, Vivek Goel, Akash Gokul, Rama Govindaraju, Jinwei Gu, Miguel Guerrero, Elfie Guo, Aryaman Gupta, Siddharth Gururani, Hugo Hadfield, Song Han, Ankur Handa, Zekun Hao, Mohammad Harrim, Ali Hassani, Nathan Hayes-Roth, Yufan He, Chris Helvig, Cyrus Hogg, Madison Huang, Michael Huang, Sophia Huang, Yufan Huang, Jacob Huffman, DeLesley Hutchins, Suneel Indupuru, Boris Ivanovic, Arihant Jain, Joel Jang, Ryan Ji, Yanan Jian, Dongfu Jiang, Jingyi Jin, Atharva Joshi, Nikhilesh Joshi, Pranjali Joshi, Andy Ju, Jaehun Jung, Weiwei Kang, Scott Kassekert, Jan Kautz, Ashna Khetan, Julia Kiczka, Slawek Kierat, Gwanghyun Kim, Kuno Kim, Sunny Kim, Kezhi Kong, Xin Kong, Zhifeng Kong, Tomasz Kornuta, Egor Krivov, Hui Kuang, Saurav Kumar, Chia-Wen Kuo, George Kurian, Wojciech Kutak, JF Lafleche, Himangshu Lahkar, Omar Laymoun, Jayjun Lee, Sanggil Lee, Gabriele Leone, Boyi Li, Freya Li, Jiajun Li, Jinfeng Li, Ling Li, Pengcheng Li, Shangru Li, Tingle Li, Xiaolong Li, Xuan Li, Zhaoshuo Li, Zhiqi Li, Hao Liang, Maosheng Liao, Chen-Hsuan Lin, Tsung-Yi Lin, Ming-Yu Liu, Sifei Liu, Zihan Liu, Hai Loc Lu, Xiangyu Lu, Alice Luo, Ruipu Luo, Wenjie Luo, Jiangran Lyu, Martin Ding Ma, Nic Ma, Qianli Ma, Dawid Majchrowski, Louis Marcoux, Miguel Martin, Qing Miao, Ashkan Mirzaei, Shreyas Misra, Kaichun Mo, Durra Mohsin, Hyejin Moon, Pawel Morkisz, Saeid Motiian, Kirill Motkov, Seungjun Nah, Yashraj Narang, Deepak Narayanan, Thabang Ngazimbi, Julian Ouyang, Shubham Pachori, David Page, Yatian Pang, Sehwi Park, Mahesh Patekar, Mostofa Patwary, Marco Pavone, Trung Pham, Wei Ping, Soha Pouya, Shrimai Prabhumoye, Varun Praveen, Delin Qu, Hesam Rabeti, Morteza Ramezanali, Marilyn Reeb, Xuanchi Ren, Kristen Rumley, Wojciech Rymer, Jun Saito, Yeongho Seol, John Shao, Piyush Shekdar, Tianwei Shen, Humphrey Shi, Min Shi, Stella Shi, Kevin Shih, Mohammad Shoeybi, Mateusz Sieniawski, Shuran Song, Alexander Sotelo, Amir Sotoodeh, Sunil Srinivasa, Vignesh Srinivasakumar, Bartosz Stefaniak, Rahul Heinrich Steiger, Shangkun Sun, Jiaxiang Tang, Shitao Tang, Yangyang Tang, Yue Tang, Tolou Tavakkoli, Kayley Ting, Krzysztof Tomala, Wei-Cheng Tseng, Jibin Varghese, Sergei Vasilev, Thomas Volk, Raju Wagwani, Roger Waleffe, Andrew Z. Wang, Boxiang Wang, Haoxiang Wang, Qiao Wang, Shihao Wang, Shijie Wang, Ting-Chun Wang, Yan Wang, Yu Wang, Rohit Watve, David Wehr, Fangyin Wei, Xinshuo Weng, Jay Zhangjie Wu, Kedi Wu, Hongchi Xia, Summer Xiao, Tianjun Xiao, Kevin Xie, Daguang Xu, Jiashu Xu, Mengyao Xu, Ruqing Xu, Xingqian Xu, Yao Xu, Dinghao Yang, Dong Yang, Hans Yang, Xiaodong Yang, Xuning Yang, Yichu Yang, Yurong You, Zhiding Yu, Hao Yuan, Simon Yuen, Xiaohui Zeng, Pengcuo Zeren, Cindy Zha, Haotian Zhang, Jenny Zhang, Jing Zhang, Liangkai Zhang, Paris Zhang, Shun Zhang, Xuanmeng Zhang, Zhizheng Zhang, Ann Zhao, Yilin Zhao, Yuliya Zhautouskaya, Charles Zhou, Fengzhe Zhou, Shilin Zhu, Yuke Zhu, Dima Zhylko, Artur Zolkowski</dc:creator>
    </item>
    <item>
      <title>ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning</title>
      <link>https://arxiv.org/abs/2606.02802</link>
      <description>arXiv:2606.02802v2 Announce Type: replace 
Abstract: Large language models (LLMs) exhibit strong natural-language reasoning abilities for clinical decision support, but struggle to effectively model structured longitudinal electronic health records (EHRs). In contrast, EHR foundation models can learn predictive patient representations, yet lack interpretable language-based reasoning. To bridge this gap, we propose ChatHealthAI, a multimodal reasoning framework that aligns structured EHR representations from a pretrained EHR foundation model with the semantic space of a frozen LLM through a task-aware resampler. By integrating longitudinal patient representations with refined clinical event descriptions, ChatHealthAI enables clinically grounded natural-language reasoning while maintaining accurate patient prediction. We evaluated ChatHealthAI on three clinical predictive tasks from the EHRSHOT benchmark. Results show that ChatHealthAI improves reasoning quality and interpretability while preserving competitive predictive performance. These findings highlight the potential of integrating EHR foundation models with pretrained LLMs for interpretable clinical prediction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.02802v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Bo-Hong Wang, Baicheng Peng, Ruilin Wang, Jun Bai, Ziyang Song, Yue Li</dc:creator>
    </item>
    <item>
      <title>The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs</title>
      <link>https://arxiv.org/abs/2606.03092</link>
      <description>arXiv:2606.03092v2 Announce Type: replace 
Abstract: Inference-time scaling has emerged as a critical avenue for enhancing Large Language Models' performance, yet real-world deployment is constrained by strict computational budgets. In this work, we formulate inference budget allocation as a global constrained optimization problem governed by economic principles. By modeling per-query reasoning utility with a shifted-surge function, we derive an optimal allocation policy based on a global shadow price that equilibrates marginal utility under resource scarcity. Based on this theory, we propose Constrained Latent-utility Equilibrium Allocation for Reasoning (CLEAR). It performs rational abandonment and reallocates resources from insolvent queries to solvable queries near their emergence thresholds.
  Extensive experiments on several reasoning tasks with different traffic streams demonstrate that CLEAR significantly improves the Pareto frontier of total token cost versus mean accuracy. In resource-scarce regimes, CLEAR achieves up to a 3x improvement in global accuracy compared to uniform allocation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.03092v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xu Wan, Speed Zhu, Jianwei Cai, Guang Chen, XiMing Huang, Wiggin Zhou, Mingyang Sun</dc:creator>
    </item>
    <item>
      <title>Private Embedding Lookup with Encrypted Compact Queries under Fully Homomorphic Encryption</title>
      <link>https://arxiv.org/abs/2606.03191</link>
      <description>arXiv:2606.03191v3 Announce Type: replace 
Abstract: Many NLP or recommendation models begin by mapping discrete client inputs to embedding vectors. Since inputs can reveal sensitive information, the embedding step must be protected in privacy-preserving inference. Fully Homomorphic Encryption (FHE) enables inference over encrypted client data, but turns embedding lookup from simple table access into homomorphic computation. To keep the embedding table server-side and avoid transmitting encrypted embedding vectors from the client, we focus on server-side lookup: the client sends only a small encrypted index.
  Prior ICML 2024 work first builds a one-hot vector from the encrypted index before multiplying with the embedding table, and this one-hot generation is the dominant cost. One-hot-based methods are expensive in FHE: they construct a p-dimensional selection vector via an equality test for each coordinate, requiring $O(p \log p)$ total homomorphic operations.
  Our key observation is that private embedding lookup only requires a linearly independent representation of the encrypted index, not the one-hot basis itself. Building on it, we propose Independent Vector Evaluation (IVE). Instead of constructing a one-hot vector, IVE evaluates a linearly independent vector built from successive powers of a single encrypted value, reducing vector-generation cost to $O(p)$. It then recovers the same embedding vector via a precomputed change of basis, instantiated with an orthogonal Discrete Cosine Transform to mitigate error amplification.
  Our implementation shows that IVE improves amortized lookup time by up to 78.4x over the prior method. We further evaluate its impact on end-to-end encrypted FastText inference, where embedding lookup is a major cost in the shallow model. On the Enron-Spam dataset, replacing one-hot generation with IVE reduces the share of vector generation in encrypted inference time from 99.6% to 66.3%.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.03191v3</guid>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jung Hee Cheon, Daehyun Jang, Jaehee Kang, Hanee Rhee</dc:creator>
    </item>
    <item>
      <title>Ollivier-Ricci curvature in cycle overlap mode</title>
      <link>https://arxiv.org/abs/2606.03317</link>
      <description>arXiv:2606.03317v2 Announce Type: replace 
Abstract: The Ollivier-Ricci curvature of an edge (x,y) is defined by comparing the distance required to transport from the neighbors of x to the neighbors of y. It is a structural measure that has been studied in many fields, such as community detection and deep neural networks. However, high computational complexity or approximation error limits its application to large scale-free graphs. This paper proposes an optimal transport principle that minimizes the distance using the 3-, 4-, and 5-cycles that include the edge (x,y), and designs a curvature calculation approach named Curvature in Cycle Overlap Mode (CCOM). In this approach, a greedy and pruning algorithm is proposed to approximate the optimal transport principle. We verify theoretically and experimentally that CCOM can significantly improve the accuracy of the curvature on real-world networks with low time consumption. In addition, we compare CCOM with baseline approximation approaches in community detection tasks using the same curvature-based framework, and confirm experimentally the effectiveness of CCOM on large scale-free graphs.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.03317v2</guid>
      <category>cs.SI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zexian Zhou, Bo Jiao</dc:creator>
    </item>
    <item>
      <title>Calibration Data Trade-offs Across Capability Dimensions: Why Multi-Source Mixing Matters for High-Sparsity LLM Pruning</title>
      <link>https://arxiv.org/abs/2606.03328</link>
      <description>arXiv:2606.03328v2 Announce Type: replace 
Abstract: Post-training pruning compresses large language models to high sparsity using a small unlabelled calibration set, and recent work has concluded that the choice of calibration source has only modest impact on averaged post-pruning accuracy. We ask whether this conclusion survives once calibration impact is evaluated separately across distinct capability dimensions rather than aggregated. Decomposing post-pruning capability into General, Commonsense, Code, and Math, and analysing $n{=}15$ calibration sources via Spearman correlations between OIT information metrics and per-dimension retention, we uncover an opposite-sign trade-off: calibration perplexity correlates positively with General retention ($\rho{=}{+}0.71$) but negatively with Math and Code retention ($\rho{=}{-}0.53,\,{-}0.59$; $p{&lt;}0.05$), so no single source can preserve all capabilities. We respond with multi-source calibration mixing, and propose IGSP, an information-guided self-calibration protocol that automates multi-source construction without capability-aligned corpora by minimising 4-gram aggregation and balancing perplexity across dimensions. On LLaMA-3.1-8B at SparseGPT 60% sparsity, a uniform multi-source mix reaches 58.8% total retention, outperforming the best single source (MetaMath, 50.0%) by $+8.8$ and the C4 default (40.0%) by $+18.8$; IGSP improves over Self-Cal by $+2.4$ and SGS by $+4.8$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.03328v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hu Xu, Zhaolong Xing, Congcong Liu, Jiaxing Wang, Zhida Jiang, Junshi Huang, Zhen Chen, Jianfeng Xu</dc:creator>
    </item>
    <item>
      <title>See, Infer, Intervene: Proactive World Modeling for Goal-Oriented Social Intelligence</title>
      <link>https://arxiv.org/abs/2606.03371</link>
      <description>arXiv:2606.03371v2 Announce Type: replace 
Abstract: Multimodal retail agents should not only recognize what a customer is doing, but also decide whether and how to assist before an explicit request is made. We study this setting through the See--Infer--Intervene (SII) framework, where a device must see pre-interaction behavior, infer latent customer intent, and act by selecting an appropriate service intervention or choosing to wait. We instantiate SII with the Proactive Intent World Model (PIWM), which represents customer state with AIDA (Attention, Interest, Desire, Action) purchasing phases and BDI (belief, desire, intention) psychological fields, predicts action-conditioned intent transitions, and selects from five response classes: Greet, Elicit, Inform, Recommend, and Hold. We further construct GuidanceSalesBench, a smart-retail benchmark containing state manifests, pre-interaction videos, candidate responses, action-conditioned outcomes, and best-action labels. When conditioned on ground-truth customer state to isolate action selection, PIWM achieves 0.641 macro F1 on 30 held-out target videos, outperforming a zero-shot Qwen2.5-VL-7B baseline and training variants without balanced action supervision; end-to-end video-only selection drops to 0.295, below the 5-class balanced random baseline of 0.414, identifying video-to-state grounding as the dominant deployment-time bottleneck. A preliminary staged real-store pilot (recorded with paid participants performing scripted customer behaviors) reaches 0.579 action macro F1 on 20 fully annotated videos, with 10 additional accessible videos released with index-level labels.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.03371v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Honghui Zhang, Chenmeinian Guo, Yichen Yu, Guanyu Liu, Yujia Zhang, Yongming Qin, Chongguo Song, Mengyue Yang, Lei Yu, Tianyu Shi</dc:creator>
    </item>
    <item>
      <title>Skill Is Not Document: A Query-Conditional Benchmark and Two-Stage Retriever for LLM Agent Skill Routing</title>
      <link>https://arxiv.org/abs/2606.03565</link>
      <description>arXiv:2606.03565v2 Announce Type: replace 
Abstract: LLM agents complete complex tasks by composing multiple skills, and skill retrieval is a front-end stage for agents. Skill retrieval differs fundamentally from traditional document retrieval at the supervision level: top-K joint correctness depends not only on the semantic relevance of each individual query-skill pair, but also on whether the skills retrieved together can collaborate to fulfill the task under the given query. Such "skill compatibility" cannot be derived from independent relevance alone. Yet existing LLM-based data synthesis pipelines can produce a direct supervision signal for "which skills should not be jointly retrieved under this query" -- namely the LLM's own rejection decisions -- and this signal is routinely discarded as low-quality data. To address this gap, we propose Reject-as-Resource Retriever (R3) and construct R3-Skill, a bilingual (Chinese-English) skill retrieval benchmark targeting realistic agent skill routing. R3-Skill spans four language directions, features query phrasings close to real user requests, and is verified through multi-expert cross-checking. On R3-Skill, we build a two-stage retrieval system (R3-Embedding + R3-Reranker) with skill compatibility as an explicit training signal. Gradient analysis shows that the "push-away" signal is diluted by bilateral balancing in the bi-encoder but acts as lossless graded ranking supervision in the cross-encoder -- motivating its placement at the cross-encoder stage, as confirmed by ablations on two datasets. The R3-Embedding + R3-Reranker pipeline attains Hit@1 = 0.7714, NDCG@10 = 0.8327 and Set-Compat = 0.3525 on R3-Skill. The dataset, training code and model weights are released as open source for agent skill routing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.03565v2</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zifei Wang, Wei Wen, Qiang Ji, Ruizhi Qiao, Xing Sun</dc:creator>
    </item>
    <item>
      <title>STC: Reversible Digit-Context Decomposition for BWT-Family Text Compression</title>
      <link>https://arxiv.org/abs/2606.03570</link>
      <description>arXiv:2606.03570v2 Announce Type: replace 
Abstract: Burrows-Wheeler-transform-based compressors rely on local context regularity, but structured text also contains dates, counters, identifiers, coordinates, and other digit runs whose values vary differently from their surrounding tokens. STC is a practical BWT-family compressor that separates this source of variation before the component BWT stage. It replaces digit runs in the main stream with an unambiguous placeholder and stores the removed digits in length- and context-conditioned side streams. The side streams use stable bucket ordering and compact digit packing, so the decoder can reconstruct the original run order from the normalized main stream without storing a separate permutation. The resulting components are encoded by a fixed internal BWT/M03-style component coder. On enwik9, STC produces a 157,388,188-byte archive with a 183,174-byte decoder source package, giving a local LTCB-style total of 157,571,362 bytes. A full-enwik9 same-coder ablation shows that the digit-context decomposition reduces the archive by 2,629,561 bytes relative to the no-split control. The result is locally verified by full decode and SHA-256 matching; official benchmark status requires independent maintainer-side verification.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.03570v2</guid>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jingyang Du, Yang Shen, Anling Xiang</dc:creator>
    </item>
    <item>
      <title>AutoTail-BSFGM: Class-Balance-Aware Fine-Tuning for Chinese Scholarly Text Classification</title>
      <link>https://arxiv.org/abs/2606.03576</link>
      <description>arXiv:2606.03576v2 Announce Type: replace 
Abstract: Scholarly text classification supports literature organization, subject indexing, and research intelligence, but Chinese scholarly corpora often contain imbalanced and semantically adjacent disciplinary labels. We propose AutoTail-BSFGM, a class-balance-aware fine-tuning method that combines an automatically gated tail-prior adjustment, a weak Balanced Softmax auxiliary loss, and Fast Gradient Method adversarial regularization. The method changes only the training objective and procedure; inference uses the same single base-size encoder and linear classifier as the corresponding label-smoothed baseline. We evaluate the method on two CSL-based tasks: an abstract-to-discipline task with 67 labels and a title-to-category task with 13 categories. On the primary abstract task, AutoTail-BSFGM improves validation and lockbox accuracy under both Chinese RoBERTa-WWM and MacBERT-base. With MacBERT-base, validation accuracy increases by 0.83 percentage points and lockbox accuracy by 0.49 points, with a pooled paired McNemar signal on validation (p = 0.023). On the title task, the method improves validation accuracy by 0.70 points and validation balanced accuracy by 2.64 points; lockbox accuracy is approximately neutral while lockbox balanced accuracy improves by 1.22 points. The results support a bounded contribution: AutoTail-BSFGM improves class-balance-sensitive behavior and yields consistent gains for abstract-based scholarly classification, without uniformly improving every metric on every split.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.03576v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Anling Xiang, Yuwen Yang, Yang Shen</dc:creator>
    </item>
    <item>
      <title>Worth Remembering: Surprise-Gated Robot Episodic Memory</title>
      <link>https://arxiv.org/abs/2606.03787</link>
      <description>arXiv:2606.03787v3 Announce Type: replace 
Abstract: Robots solving generalist tasks need to be able to ground instructions in their past experience, since humans may refer to notable past events when giving a task (e.g., ``Take me to where the chemical spill happened yesterday''). Since memory limits make storing all past events infeasible, long-term robot memory must be selective, ideally retaining only those episodes with high utility for future tasks. However, future tasks are not typically given a priori for generalist robots. To select generically useful memories, we propose Bayesian surprise as a gating mechanism for memory formation. We present an approach to compute surprise in a semantically rich deployment-agnostic latent space provided by V-JEPA-2. Using our gated episodic memory to augment 4D scene graph-based spatial memory, we show a consistent improvement over state-of-the-art benchmarks in robot question answering, outperforming prior robot memory methods by $\geq12\%$ for temporal, spatial, and binary questions, and surpassing the performance of supervised and non-causal methods with an unsupervised causal method in event segmentation tasks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.03787v3</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Nicolas Gorlo, Derek K. Wise, Alberto Speranzon, Luca Carlone</dc:creator>
    </item>
    <item>
      <title>The Grothendieck Constant is Less Than $\frac{\pi}{2 \log (1+ \sqrt{2})} - 10^{-5}$</title>
      <link>https://arxiv.org/abs/2606.03991</link>
      <description>arXiv:2606.03991v2 Announce Type: replace 
Abstract: We prove that the Grothendieck constant $K_G &lt; \frac{\pi}{2 \log (1+ \sqrt{2})} - 10^{-5}$. This improves on the work of Braverman, Makarychev, Makarychev, and Naor (2011), who proved that $K_G &lt; \frac{\pi}{2 \log (1+ \sqrt{2})} - \epsilon$ for an unspecified $\epsilon&gt;0$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.03991v2</guid>
      <category>cs.DS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Alan Li, Rahul Saha, Anton Xue, Swarat Chaudhuri, Adam Klivans, Pravesh K Kothari, Raghu Meka</dc:creator>
    </item>
    <item>
      <title>Position: Deployed Reinforcement Learning should be Continual</title>
      <link>https://arxiv.org/abs/2606.04029</link>
      <description>arXiv:2606.04029v2 Announce Type: replace 
Abstract: Reinforcement Learning (RL) has received increasing attention and adoption in real-world use cases. Most of these systems follow a train-then-fix paradigm, where trained agents do not learn while interacting with the world until performance degrades and retraining becomes necessary. In this position paper, we argue that deploying an agent that is incapable of optimality, but receives an evaluative reward signal, is inherently a continual RL problem. We identify four sources of non-stationarity after deployment that necessitate never-ending learning, and highlight why the best deployed agents never stop adapting. We analyze successful examples of continual RL in the real world, and present the community with the advantages and measures to move away from the current train-then-fix paradigm.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04029v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Parnian Behdin, Kevin Roice, Golnaz Mesbahi</dc:creator>
    </item>
    <item>
      <title>Discourse-Role Labels as Presentation-Time Variables for Context Use in Language Models</title>
      <link>https://arxiv.org/abs/2606.04109</link>
      <description>arXiv:2606.04109v2 Announce Type: replace 
Abstract: Context-augmented language model systems often wrap supplied content with labels such as Reference:, Evidence:, Instruction:, Note:, or Example:, but the effect of these labels on reader-model behavior remains underexplored. We introduce a paired fixed-content probe over 500 MMLU-Pro items: each item receives the same misleading answer-bearing assertion under different discourse-role labels, and adoption is measured by whether the model outputs the injected wrong option. Across GPT-5.5, DeepSeek V4 Pro, Llama-3-8B-Instruct, and Qwen2.5-7B-Instruct, Misleading Adoption Rate shifts by 56-84 percentage points. Binding or source-like labels such as Instruction: and Reference: produce high adoption, whereas Example: consistently suppresses it. Paired tests, bootstrap intervals, final-instruction ablations, and Qwen final-step log-probability probes support a label-conditioned candidate preference. Boundary probes show where the effect weakens or persists: arithmetic tasks reduce adoption, passage-shaped external context preserves smaller label gaps, short-answer evaluation rules out option-letter copying, and nested-label conflicts suggest that illustrative framing can delimit adoption scope. A 200-case single-author manual audit confirms that the short-answer contrasts are stable under conservative adjudication. The resulting claim is bounded but practical: context-utilization and reader-side RAG benchmarks should report and control wrapper labels, because presentation choices can change measured reliance on supplied context.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04109v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jianguo Zhu, Xiangmei Li, Wenjie Liu</dc:creator>
    </item>
    <item>
      <title>Polymarket-v1 Database</title>
      <link>https://arxiv.org/abs/2606.04217</link>
      <description>arXiv:2606.04217v2 Announce Type: replace 
Abstract: We introduce the Polymarket-v1 Database: the complete on-chain trade archive of Polymarket's first-generation CTF Exchange on Polygon, spanning 2022-11-21 to 2026-04-28 and covering the full contract lifecycle from first settlement to natural termination. The dataset comprises 1.20 billion trade records across 1.30 million markets with $61 billion in nominal volume. Its defining feature is 100% ground-truth aggressor direction derived from the blockchain settlement layer, a property unavailable in existing prediction market archives, which rely on heuristic inference. We use this truth-aligned archive to benchmark standard microstructure tools and document three findings. First, the tick rule and bulk volume classification achieve near-random aggregate accuracy (49.83% and 50.51%), but this masks a systematic, correctable price-level gradient driven by positive trade direction autocorrelation and concentrated market-making -- two structural features of prediction markets that violate the mean-reversion assumption embedded in classical classifiers. Second, these classification errors propagate into downstream metrics: inferred VPIN diverges substantially from ground-truth VPIN, and OFI estimates are directionally biased, with material consequences for Transaction Cost Analysis. Third, ground-truth microstructure quality predicts forecasting performance in ways that classification-based proxies cannot recover: True VPIN positively predicts Brier scores, while Gibbs spread negatively predicts them -- a selection effect reflecting that high-spread niche markets attract informed specialists rather than noise traders. Replacing ground-truth metrics with classified proxies attenuates both relationships, illustrating that measurement accuracy at the transaction level is a prerequisite for reliable inference about prediction market design and probability calibration.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04217v2</guid>
      <category>cs.CE</category>
      <category>q-fin.ST</category>
      <category>q-fin.TR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Boka Qin, Rui Yang</dc:creator>
    </item>
    <item>
      <title>Incremental Sheaf Cohomology on Cellular Complexes: O(1)-in-n Lazy Edit Processing under Bounded Local Geometry</title>
      <link>https://arxiv.org/abs/2606.04227</link>
      <description>arXiv:2606.04227v2 Announce Type: replace 
Abstract: We present an algorithmic framework for incremental maintenance of first sheaf cohomology $H^1(X; \mathcal{F})$ on dynamically evolving 1-dimensional cellular complexes equipped with finite-dimensional cellular sheaves. The classical computation of $H^1$ via factorization of the coboundary matrix requires $O(n^3)$ time; when the complex evolves with a stream of $m$ edits, full recomputation after each edit costs $O(mn^3)$.
  Under a bounded local geometry assumption -- bounded cell size $v_{\max}$, bounded stalk dimension $d$, and bounded nerve degree $D$ -- each edit (vertex insertion, edge insertion, restriction map update) affects only a bounded set of local coboundary blocks. The algorithm therefore processes lazy streaming edits in $O(1)$ time with respect to the total complex size $n$ (with cost polynomial in the local geometry parameters $v_{\max}$, $d$, and $D$, which are treated as constants independent of $n$), deferring local eigensolves and Mayer-Vietoris global assembly to synchronization points (Flush). At synchronization, the maintained state agrees with the corresponding batch assembly of the partitioned sheaf model; we observe zero measured drift in all batch-verified runs (through $V = 10^6$). We also give an amortized $O(|E|)$ streaming construction for the cellular decomposition and discuss an adversarial algebraic-RAM barrier arguing that unpartitioned non-trivial sheaves ($d \geq 2$, non-identity restriction maps) do not admit the same locality. Experiments on Barabasi-Albert graphs with up to $5 \times 10^6$ vertices and $1.7 \times 10^7$ streaming edits show 35 $\mu$s median lazy per-edit update latency (excluding flush); query time (global assembly at synchronization) is $O(n)$ per flush in the implemented full-traversal path. Exact synchronization costs are reported separately.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04227v2</guid>
      <category>cs.DS</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Jason L. Volk</dc:creator>
    </item>
    <item>
      <title>An Empirical Study of Data Scale, Model Complexity, and Input Modalities in Visual Generalization</title>
      <link>https://arxiv.org/abs/2606.04409</link>
      <description>arXiv:2606.04409v2 Announce Type: replace 
Abstract: Modern deep neural networks usually have large parameter scales and nonlinear hierarchical structures, and they have achieved strong performance in computer vision. However, the source of their generalization performance remains difficult to explain using traditional statistical learning theory. Among the factors that may affect visual generalization, data scale, model complexity, and input modalities are fundamental and controllable variables. This study empirically analyzes how these three factors influence model generalization performance. Specifically, in a preliminary experiment, we construct a one-dimensional nonlinear function and vary the number of training samples and the polynomial degree to observe the effects of data scale and model complexity on model performance. In the main experiments, we compare model performance on CIFAR-10 and CIFAR-100 under different training data scales, model architectures, and input modalities. The experimental results show that increasing the training data scale consistently improves generalization performance, whereas changes in model complexity do not provide stable gains. In addition, removing color information degrades model performance, while explicit prior features such as gradients, edges, and wavelets have inconsistent effects across different model architectures. Overall, this study provides an empirical analysis of the relationships among data scale, model complexity, input modalities, and visual generalization performance. Code and experimental logs are available at: https://github.com/YidiZhouluo/DeepLearning-Empirical-Studies/tree/main/Exp_01.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04409v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yidi Zhouluo</dc:creator>
    </item>
    <item>
      <title>FlexNPU: Transparent NPU Virtualization for Dynamic LLM Prefill-Decode Co-location</title>
      <link>https://arxiv.org/abs/2606.04415</link>
      <description>arXiv:2606.04415v2 Announce Type: replace 
Abstract: Modern AI serving increasingly relies on NPUs for conventional inference and large language model serving. However, current NPU deployments commonly expose physical devices directly to applications, which limits runtime control over scheduling and makes it difficult to adapt execution to phase-level workload behavior. This limitation is particularly evident in LLM serving, where the prefill phase is compute-intensive while the decode phase is often constrained by memory bandwidth and KV-cache accesses. Static prefill-decode (PD) disaggregation reduces phase interference, but can introduce resource imbalance and unnecessary data movement. We present FlexNPU, a transparent user-space virtualization layer for Ascend NPUs. FlexNPU interposes on AscendCL APIs and routes NPU operations through per-device daemons, decoupling unmodified applications from physical NPU devices without modifying model code, AI frameworks, or NPU drivers. This runtime boundary allows FlexNPU to virtualize NPU objects, control operator dispatch, and support phase-aware scheduling for LLM serving. In particular, FlexNPU enables dynamic PD co-location, which adapts scheduling between prefill and decode according to their complementary resource characteristics. We implement FlexNPU on Huawei Ascend NPUs and evaluate it with typical LLM workloads. Compared with direct NPU passthrough, FlexNPU introduces no measurable inference overhead and slightly improves throughput in some scenarios. On a 384-card Ascend 910C deployment of DeepSeek-R1, FlexNPU improves throughput over static PD disaggregation by 5.15% and 26.33%. On Qwen2.5-7B, compared with static PD co-location, FlexNPU maintains comparable throughput while reducing TTFT by over 92% across tested workloads with nearly unchanged TPOT. These results show that transparent NPU virtualization is a practical substrate for efficient and responsive LLM serving.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04415v2</guid>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jiongjiong Gu, Jianfeng Wang, Zidong Han, Yongqiao Wang, Pengfei Xia, Mingjie Zhang, Hong Liu, Yuanyi Xia, Jiajia Chu, Yifeng Tang, Hui Zang, Xin Yao, Qijie Qiu, Yuzhao Wang, Chuanfei Xu, Lin Zhang, Zhuonan Lai, Hongming Huang, Jiawei Qiu, Gong Zhang, Weipeng Cao, Zhong Ming</dc:creator>
    </item>
    <item>
      <title>Trivium: Temporal Regret as a First-Class Objective for Causal-Memory Controllers</title>
      <link>https://arxiv.org/abs/2606.04421</link>
      <description>arXiv:2606.04421v2 Announce Type: replace 
Abstract: Many current agentic systems and LLM pipelines correct mistakes by optimizing outcome reward. This addresses only the what of failure: when an outcome diverges from prediction, the why and when of the mismatch are not systematically logged, reviewed, or corrected, so the same error can recur episode after episode. We argue that this is a structural problem, not merely a model-capacity one. We propose long-horizon temporal regret as a first-class objective alongside outcome regret and epistemic regret over the working causal model. Temporal regret captures when failure persists: how long a miscalibrated causal model is tolerated before correction. Epistemic regret captures why failure persists: residual uncertainty or error in the working causal model. Together, the three regrets give a falsifiable account of what, why, and when a long-lived agent can fail. Modeling the agent as a stream of E episodes, we prove three conditional results under explicit causal-probing, persistence, and detectability assumptions. First, under observationally equivalent confounding, outcome-only learning cannot distinguish causal from spurious structure without an intervention channel, so temporal miscalibration can persist linearly even after outcome regret is driven to zero. Second, with a persistent causal log and budgeted probes, total probe complexity is logarithmic in the episode horizon, inducing O(log E) temporal regret. Third, under K detectable change-points, the rate extends to O(K log E). We instantiate Trivium and pre-register five falsifiable predictions. On CausalBench-Seq, Trivium follows the predicted logarithmic envelope while outcome-only baselines grow linearly. A pilot real-LLM stream provides preliminary external-validity evidence across one full E = 500 run and three E = 100 frontier-model pilots. Self-learning here means revising an external causal model, not retraining LLM weights.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04421v2</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Edward Y. Chang</dc:creator>
    </item>
    <item>
      <title>Multi-SPIN: Multi-Access Speculative Inference for Cooperative Token Generation at the Edge</title>
      <link>https://arxiv.org/abs/2606.04581</link>
      <description>arXiv:2606.04581v2 Announce Type: replace 
Abstract: Speculative inference (SPIN) was originally developed as an efficient architecture to accelerate Large Language Models (LLMs). In this work, we propose its distributed deployment to enable cooperative token generation in a multiuser edge system; its advantage is to effectively balance computational loads between resource-constrained devices and servers. The resulting architecture, termed Multi-access SPIN (Multi-SPIN), utilizes on-device small language models to generate and upload candidate token drafts, while an edge server operates the LLM to verify them in parallel batches. Given the severe heterogeneity in users' computation and communication capabilities, the draft length emerges as a critical control variable that influences node-level computation loads and multi-access latency, thereby governing the sum token goodput. Consequently, considering frequency-division multiple access, we investigate the problem of multi-access draft control, a joint optimization of draft-length control and bandwidth allocation to maximize sum token goodput. We examine two cases: (1) homogeneous draft lengths across users to facilitate server-side batching, and (2) heterogeneous draft lengths to introduce a new dimension for goodput enhancement. By developing decomposition methods, we reduce these complex optimizations into tractable sub-problems, which allow efficient draft control algorithms to be derived in closed form. Our analysis shows that the optimal bandwidth allocation compensates users with weaker computation-and-communication capabilities in the homogeneous case due to the batching synchronization requirements, whereas its heterogeneous-case counterpart rewards users with higher acceptance rates by relaxing such requirements. Experiments using Llama-2 and Qwen3.5 model pairs across diverse tasks demonstrate that Multi-SPIN improves goodput by up to 88% over heterogeneity-agnostic baselines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04581v2</guid>
      <category>cs.DC</category>
      <category>cs.AI</category>
      <category>cs.NI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haotian Zheng, Zhanwei Wang, Mingyao Cui, Chang Cai, Hongyang Du, Kaibin Huang</dc:creator>
    </item>
    <item>
      <title>MIRAGE: Mobile Agents with Implicit Reasoning and Generative World Models</title>
      <link>https://arxiv.org/abs/2606.04627</link>
      <description>arXiv:2606.04627v2 Announce Type: replace 
Abstract: Mobile agents are increasingly expected to operate everyday applications from screenshots and language goals, where reliable control requires reasoning over screen affordances, multi-step navigation, and future state changes. However, many agents externalize this computation as long textual chains of thought, which slows interaction, increases supervision cost, and complicates deployment. We introduce MIRAGE, a framework that learns continuous latent reasoning representations from visible textual reasoning traces. MIRAGE transfers explicit reasoning into compact hidden states, enabling the agent to reason internally without decoding long rationales. It also incorporates a generative world-model objective: latent reasoning vectors are aligned with future screenshots, encouraging the agent to anticipate upcoming interface states before acting. This turns hidden computation into both a compressed thought representation and a forward-looking model of environment dynamics. At inference time, MIRAGE reasons in continuous latent space, reducing token generation while improving execution efficiency. On AndroidWorld, MIRAGE matches explicit chain-of-thought supervised fine-tuning in the 4B ablation with a 3-5x lower decoded-token budget and improves a comparable instruction-tuned baseline by 10.2 points; on AndroidControl, it improves action grounding while generating over 75% fewer tokens.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04627v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhichao Yang, Yuanze Hu, Haojie Hao, Longkun Hao, Dongshuo Huang, Hongyu Lin, Gen Li, Lanqing Hong, Yihang Lou, Yan Bai</dc:creator>
    </item>
    <item>
      <title>An Empirical Audit of Input Encoders for Multi-Channel Signal Transformers</title>
      <link>https://arxiv.org/abs/2606.04752</link>
      <description>arXiv:2606.04752v2 Announce Type: replace 
Abstract: Transformers consuming multi-channel scalar signals must embed $C$ simultaneous values into one $d_{\text{model}}$-dimensional vector per time step. We audit eight input encoders -- a shared-scalar baseline, per-channel linear projections, an orthogonality regulariser, a nonlinear MLP, block-partitioned concatenation, channel-independent and channel-as-token architectures, and a projected positional encoding -- on a synthetic benchmark where channel identity is informative and on ETTh1, scored by next-step negative log-likelihood. The headline is practical near-equivalence within a wide "top tier": the standard per-channel linear projection matches every alternative up to small, statistically real but practically modest differences. A direct geometric probe attributes this to a spontaneous orthogonalisation of the per-channel projections: they end up near-orthogonal with no explicit regulariser, letting the standard linear recover channel identity from the summed embedding. Two encoders lose decisively: the shared-scalar baseline collapses for information-theoretic reasons we make explicit, and the channel-independent PatchTST-spirit baseline overfits universally on the synthetic benchmark and underperforms on both. Paired tests resolve two small gaps: projecting the sinusoidal positional encoding through a learned linear layer edges the rest at small $C$ by extending this orthogonality to the positional subspace; a nonlinear MLP stem edges them at the largest $C$, with the gap shrinking under more training data. The practical recommendation: use the standard per-channel linear projection by default; reach for something more elaborate only when the task calls for it.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04752v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ossi Lehtinen</dc:creator>
    </item>
    <item>
      <title>The Right Measure for Physics-Constrained Generation: A Co-Area Correction for Posterior-Consistent PDE Inverse Problems</title>
      <link>https://arxiv.org/abs/2606.04804</link>
      <description>arXiv:2606.04804v3 Announce Type: replace 
Abstract: Generative models -- diffusion and flow matching -- are increasingly used to solve partial differential equation (PDE) inverse problems, enforcing the governing physics as a \emph{hard constraint} (via projection or guidance) and reporting the resulting samples as a Bayesian posterior with calibrated uncertainty. We show that this widely adopted recipe samples the wrong distribution. Conditioning a generative prior on a hard PDE constraint is conditioning on a measure-zero manifold -- an operation that is intrinsically ambiguous (the Borel--Kolmogorov paradox) and whose physically correct resolution, the small-residual-noise limit, carries a co-area (Fixman) Jacobian factor $[\det(JJ^{\top})]^{-1/2}$ that projection- and guidance-based methods silently omit. We make the bias precise, show that it grows with the heterogeneity of the constraint sensitivity, and validate it on controlled problems against an \emph{i.i.d.} ground-truth arbiter. The omitted factor is not a second-order detail: removing it inflates the posterior error to $20\times$ the sampling-noise floor; minimal-displacement projection (as in PCFM) is biased at $9\times$ the floor; and a naive scalar reweighting does not fix it. We introduce \textbf{CoCoS}, a measure-aware constrained sampler that targets the correct co-area posterior, and show that it matches the gold-standard posterior to within sampling noise. Our results imply that ``satisfying the physics'' is not the same as ``sampling the posterior,'' and give a principled correction for uncertainty-aware scientific inference.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04804v3</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jian Xu, Yanning Wu, Delu Zeng, John Paisley, Qibin Zhao</dc:creator>
    </item>
    <item>
      <title>The Usefulness Gap in Proof-of-Useful-Work: An Empirical Study of Pearl's cuPOW Protocol</title>
      <link>https://arxiv.org/abs/2606.04819</link>
      <description>arXiv:2606.04819v2 Announce Type: replace 
Abstract: Pearl, a Layer-1 blockchain with high-profile AI industry endorsements, markets its Proof-of-Useful-Work (PoUW) protocol as simultaneously securing the network and performing AI inference. We present the first systematic empirical measurement of a deployed PoUW system, finding that Pearl's 24 EH/s network -- representing approximately 320,000 GPU-equivalents consuming an estimated 112 MW -- produces zero useful AI computation. Budget GPU rental prices rose 38% and utilization surged from 57% to 94% following the mining software's public release, displacing legitimate research workloads.
  Our measurements span five dimensions: (1) network composition analysis of 8,012 workers shows all have inference-capable hardware, yet the dominant mining software contains no inference code; (2) the verification protocol accepts random matrices by design, confirmed by 44 pool-accepted shares from our open-source miner across NVIDIA, AMD, CPU, and Apple Silicon hardware; (3) statistical distribution checks are trivially defeated by adversarial Gaussian sampling; (4) mining economics are marginal at current PRL prices ($0.76), with ROI ranging from -1% to +67% depending on GPU tier -- near breakeven for most hardware; and (5) the mining computation is commodity integer arithmetic portable to any hardware platform, offering no vendor lock-in. These findings quantify the verifiability-usefulness tension identified theoretically by Leinweber et al., providing concrete measurements of its magnitude and economic consequences in a deployed system.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04819v2</guid>
      <category>cs.CR</category>
      <category>cs.CY</category>
      <category>cs.DC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Abhinaba Basu</dc:creator>
    </item>
    <item>
      <title>Channel Fracture: Three Instances of Cross-Boundary Silent Delivery Reliability Failures in Multi-Agent Systems</title>
      <link>https://arxiv.org/abs/2606.04896</link>
      <description>arXiv:2606.04896v3 Announce Type: replace 
Abstract: We report the discovery of channel fracture, a silent architectural failure in multi-agent systems where information routed across agent boundaries is blocked by invisible constraints. We present three instances in a production Hermes Agent deployment: (1) cron memory injection blocked by scheduler barriers; (2) cross-profile skill routing fractured by recursive directory traversal; (3) WebSocket delivery confirmation fallback fracture causing message duplication. We propose CADVP v1.1, a 13-dimension verification protocol with a veto-level confirmation check. Across 30,012 trials, the failure rate is zero under the protocol versus 69 to 98 percent without it. Real-world validation (10,008 trials) confirms quality elevation from 0.90 to 1.00. Three design principles follow: inverse verification, channel matching, and PIP protection.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04896v3</guid>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Dexing Liu</dc:creator>
    </item>
    <item>
      <title>Toward Multi-Domain and Long-Tailed Quantization via Feature Alignment and Scaling</title>
      <link>https://arxiv.org/abs/2606.04920</link>
      <description>arXiv:2606.04920v2 Announce Type: replace 
Abstract: Quantizing deep neural networks is essential for efficient inference on resource-constrained devices. However, most existing methods are designed for single-domain and class-balanced data, leaving practical settings with domain shifts or severe class imbalance underexplored. We address these challenges with Efficient Multi-Domain Alignment Quantization (EmaQ), which aligns domain distributions through a CDF-based projection and uses sensitivity-aware weight aggregation to stabilize multi-domain quantization. We further extend EmaQ to EmaQ-LT for long-tailed quantization by introducing class-conditioned variance scaling and confidence-based logit adjustment to mitigate majority-class overconfidence. Theoretical analyses establish convergence guarantees and motivate the proposed sensitivity and scaling mechanisms. Experiments on standard, multi-domain (Office-31, Digits), and long-tailed (SynDigits-LT, CIFAR-10-LT, CIFAR-100-LT) benchmarks show that EmaQ and EmaQ-LT achieve strong low-bit performance under domain shift and class imbalance.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04920v2</guid>
      <category>cs.LG</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ting-An Chen, Chin-Yuan Yeh, De-Nian Yang</dc:creator>
    </item>
    <item>
      <title>STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models</title>
      <link>https://arxiv.org/abs/2606.04945</link>
      <description>arXiv:2606.04945v2 Announce Type: replace 
Abstract: Diffusion large language models (DLLMs) have recently emerged as a promising alternative to autoregressive LLMs by generating text through iterative masked denoising with bidirectional context. However, their large model sizes and iterative denoising process introduce substantial memory and computational overhead, motivating post-training quantization for efficient deployment. In this paper, we identify two key challenges for low-bit DLLM quantization: state-dependent activation disparity and temporal error accumulation. Masked and unmasked tokens exhibit different activation distributions within each denoising step, while quantization errors can accumulate across steps during iterative decoding. To address these challenges, we propose STaR-Quant, a state-time consistent PTQ framework for DLLMs. STaR-Quant introduces State-Guided Activation Transformation (SGAT) to assign masked and unmasked tokens to different activation transformation spaces with a unified static weight-side transformation. It further introduces Temporal Attention Compensation (TAC) to correct the quantized attention representation via a lightweight block-diagonal affine mapping. Experiments on representative DLLMs demonstrate that STaR-Quant consistently improves low-bit weight-activation quantization over strong PTQ baselines, while delivering up to 1.69x speedup and 3.14x memory saving over FP16 deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.04945v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Xin Yan, Aqiang Wang, Zhenglin Wan, Xingrui Yu, Ivor Tsang</dc:creator>
    </item>
    <item>
      <title>How Software Engineering Students Use LLMs to Write Research Papers: An Experience Report</title>
      <link>https://arxiv.org/abs/2606.05114</link>
      <description>arXiv:2606.05114v2 Announce Type: replace 
Abstract: Large language models are increasingly becoming part of software engineering education, including activities involving empirical software engineering and evidence synthesis. This paper reports an educational experience involving the integration of reflective LLM use into an empirical methods assignment in a third-year software architecture course. Students were asked to develop a short research paper using either a rapid review or a gray literature review methodology and to disclose how LLMs were used throughout the assignment. We analyzed 146 student disclosure statements using a cross-analysis process combining LLM-assisted categorization with manual verification and refinement by the researchers. The reflections describe how students incorporated LLMs during activities such as brainstorming, methodological clarification, organization of findings, and writing refinement, while also reporting concerns regarding inaccuracies and verification of generated content. This experience report discusses lessons learned and educational implications for integrating AI-assisted technologies into empirical software engineering education.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.05114v2</guid>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ronnie de Souza Santos, Maria Teresa Baldassarre, Cleyton Magalhaes, Italo Santos</dc:creator>
    </item>
    <item>
      <title>MOSAIC: A Workload-Driven Simulation and Design-Space Exploration Framework for Heterogeneous NPUs</title>
      <link>https://arxiv.org/abs/2606.05362</link>
      <description>arXiv:2606.05362v2 Announce Type: replace 
Abstract: AI model architectures are diversifying rapidly. Although dense matrix multiplication underlies today's CNNs and transformers, emerging architectures (state-space models, long convolutions via the fast Fourier transform (FFT), Kolmogorov-Arnold networks, and spiking networks) are not multiply-accumulate (MAC) dominated; they spend much of their computation on vector and non-MAC primitives that homogeneous, MAC-centric neural processing units (NPUs) serve poorly. This has motivated heterogeneous NPUs (HPUs) built from non-identical tiles. Prior heterogeneous designs vary only one or two coarse knobs (typically MAC precision or array size) and are evaluated on narrow workloads; no existing framework supports fine-grained HPU design, where tiles differ across many architectural dimensions at once. We present MOSAIC, an analytical simulator and design-space-exploration (DSE) framework for HPU microarchitecture design. MOSAIC searches the joint space of tile-level heterogeneity: beyond array size and precision, it varies tile-type composition (large Big, small Little, and non-MAC Special-Function tiles), dataflow, sparsity mode, MAC engine type, and special-function units for non-MAC operators (FFT, spiking-integrate, polynomial). Unlike prior simulators that model a single homogeneous tile type, MOSAIC models non-MAC tiles with their own energy, area, and timing models and maps operators across a mix of tiles with a heterogeneity-aware compiler. A multi-seed pipeline pairing a stratified sweep with genetic-algorithm refinement returns Pareto-optimal designs, with cost models calibrated to a 7 nm node and cross-validated against NVIDIA's Deep Learning Accelerator (NVDLA). Across a 20-workload suite, the best general-purpose HPU found by MOSAIC (~200 mm^2 Big+Little+Special-Function) achieves +46.91% mean iso-area energy savings over the best iso-area homogeneous baseline.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.05362v2</guid>
      <category>cs.AR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Arghadip Das, Hoseok Kim, Soomin Lee, Arnab Raha, Deepak A Mathaikutty, Vijay Raghunathan</dc:creator>
    </item>
    <item>
      <title>Should Demand Models Incorporate Competitor Prices? Oblivious Learning and Algorithmic Collusion</title>
      <link>https://arxiv.org/abs/2606.05363</link>
      <description>arXiv:2606.05363v2 Announce Type: replace 
Abstract: On a platform with many sellers, should a pricing algorithm explicitly model competitors' prices when learning demand? Classical learning arguments suggest an affirmative answer: ignoring competitors induces model misspecification and inefficiency. In contrast, recent work on algorithmic collusion suggests that strategic obliviousness -- deliberately ignoring competitor prices -- may facilitate collusive outcomes and improve profits. We study this modeling choice in a stylized competitive market with unknown noisy demand, in which multiple sellers repeatedly set prices and estimate demand via iterated least squares, and either incorporate competitors' prices into their demand models (informed) or ignore them (oblivious). We first show that, relative to a monopolist, an oblivious seller in a competitive market must explore more aggressively to compensate for the loss of dynamic competitor information. Building on this insight, we characterize market dynamics when all sellers are oblivious and show that prices converge to the competitive outcome under sufficient exploration, while a continuum of pseudo-equilibria arises when exploration decays. Analyzing the resulting price trajectories, we uncover an excursion phenomenon that gives rise to transient collusive patterns that dissipate as learning progresses. In markets with both oblivious and informed sellers, the informed strictly out-earn the oblivious. Read as a strategy game, the modeling choice has a unique Nash equilibrium: the all-informed market, in which prices converge to the competitive outcome efficiently. Overall, our results indicate that collusive patterns are not robust and are not sustained by oblivious modeling; therefore, incorporating competitor information, together with sufficient price exploration, remains a reliable strategy for sellers in competitive markets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.05363v2</guid>
      <category>cs.GT</category>
      <category>cs.LG</category>
      <category>econ.TH</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuhang Wu, Assaf Zeevi</dc:creator>
    </item>
    <item>
      <title>Would you still call this Dax? Novel Visual References in VLMs and Humans</title>
      <link>https://arxiv.org/abs/2606.05409</link>
      <description>arXiv:2606.05409v2 Announce Type: replace 
Abstract: Vision-language models (VLMs), like human learners, are frequently exposed to new visual concepts, but how they map novel visual references to language after exposure remains largely underexplored, particularly when those references contradict prior knowledge from pre-training. To study this, we present the Novel Visual References Dataset (NVRD): 19,176 images spanning 90 visual concepts across different levels of visual novelty, each with up to 20 increasingly perturbed versions of the original object to probe generalization. Unlike prior work on visual augmentations of familiar concepts, NVRD comprises entirely novel, open-ended stimuli constructed from scratch, mirroring how humans encounter genuinely new concepts. We evaluate 3 open- and 2 closed-source models alongside 2,400 human judgments for direct human-model comparison, and find that (i) models struggle to acquire novel concepts in-context when they contradict prior knowledge, and (ii) while models and humans show correlated sensitivity to visual perturbations, models significantly overgeneralize, extending learned labels to stimuli that humans reject. We contribute NVRD as a corpus and benchmark for research on visual concept learning in both humans and machines.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.05409v2</guid>
      <category>cs.CV</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ada Defne Tür, Gaurav Kamath, Joyce Chai, Siva Reddy, Benno Krojer</dc:creator>
    </item>
    <item>
      <title>GOTabPFN: From Feature Ordering to Compact Tokenization for Tabular Foundation Models on High-Dimensional Data</title>
      <link>https://arxiv.org/abs/2606.05441</link>
      <description>arXiv:2606.05441v2 Announce Type: replace 
Abstract: We investigate how to make small tabular foundation models effective for High-Dimensional, Low-Sample Size (HDLSS) tabular prediction without retraining large backbones. We introduce Graph-guided Ordering with Local Refinement (GO-LR), show its equivalence to weighted Minimum Linear Arrangement, and interpret the practical solver as a TSP-path-style surrogate. We propose GOTabPFN, which builds on GO-LR, and a Neuro-Inspired Subunit Compression (NSC) unit to pool locally adjacent ordered features into meta-features, yielding a compact representation that makes TabPFN-style prediction practical in HDLSS regimes. Across tabular benchmarks, GOTabPFN improves stability and accuracy under tight token budgets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.05441v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Al Zadid Sultan Bin Habib, Md Younus Ahamed, Prashnna Kumar Gyawali, Gianfranco Doretto, Donald A. Adjeroh</dc:creator>
    </item>
    <item>
      <title>Field Validation of a Multi-Resolution ConvLSTM Framework for Retaining Wall Deformation Prediction</title>
      <link>https://arxiv.org/abs/2606.05556</link>
      <description>arXiv:2606.05556v2 Announce Type: replace 
Abstract: This study presents a comprehensive field validation of a multi-resolution Convolutional Long Short-Term Memory (ConvLSTM) framework for predicting retaining wall deformation during staged excavation. The framework is trained on Gaussian noise-augmented numerical simulations and integrates ConvLSTM models operating at different temporal resolutions through a stacking ensemble strategy. The proposed framework is validated using field monitoring data from 34 inclinometers across 11 excavation sites in South Korea. Site-wise prediction performance is systematically evaluated using multiple evaluation metrics, with analyses of the influence of temporal deformation irregularity and spatiotemporal prediction characteristics on model performance. The results demonstrate that the framework predicts retaining wall deformation associated with up to 5.0 m of additional excavation with an average mean absolute error of 1.4 mm and a coefficient of determination of 0.93 across the excavation sites. These results indicate that the framework, although trained exclusively on a numerically simulated and augmented database, can be effectively applied to diverse field excavation conditions and achieve a reliable level of prediction accuracy in practical retaining wall deformation prediction.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.05556v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jihoon Kim, Saeyon Kim, Heejung Youn</dc:creator>
    </item>
    <item>
      <title>Domain-Adapted Small Language Models with Hybrid Post-Processing: Achieving Cost-Efficient, Low-Latency Multi-Label Structured Prediction via LoRA Fine-Tuning on Scarce Data</title>
      <link>https://arxiv.org/abs/2606.05781</link>
      <description>arXiv:2606.05781v2 Announce Type: replace 
Abstract: Deploying frontier large language models (LLMs) for domain-specific structured evaluation tasks incurs prohibitive latency, cost, and data-privacy overhead. We present a hybrid framework that fine-tunes a small language model (LLaMA 3.1 8B, 2.05% trainable parameters via LoRA) on only 219 curated examples and couples it with a deterministic rule-based postprocessing layer. Applied to multi-label compliance evaluation of conversational transcripts (18 heterogeneous output fields), our system achieves 100% JSON structural validity, 83.0% human-validated overall accuracy, and 100% accuracy on the most critical classification field in blind evaluation on 53 unseen production transcripts. On a single NVIDIA A100 GPU, inference completes in $\sim$2 seconds -- 2--5x faster than frontier APIs -- at USD 0.013 per evaluation versus USD 0.025--0.055 for proprietary alternatives, yielding 46--76% cost savings.
  We introduce targeted hard-negative augmentation for critical decision boundaries and formalize the hybrid neural-symbolic decomposition, demonstrating that domain-adapted small language models with postprocessing can match frontier model accuracy while dramatically reducing operational cost, latency, and privacy risk.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.05781v2</guid>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Srinivasan Manoharan, Dilipkumar Nallusamy, Sachin Kumar, Haifeng Wu</dc:creator>
    </item>
    <item>
      <title>Causal Longitudinal Prior-Fitted Networks for Counterfactual Outcome Prediction</title>
      <link>https://arxiv.org/abs/2606.05797</link>
      <description>arXiv:2606.05797v2 Announce Type: replace 
Abstract: Longitudinal treatment decisions from multivariate time-series data require predicting potential outcomes under future treatment sequences in the presence of time-varying confounding, heterogeneous patient dynamics, and limited domain-specific data. Existing longitudinal causal estimators typically address this problem by training a new model for each cohort or simulator. We introduce Causal Longitudinal Prior-Fitted Networks (CausalLongPFN), a prior-fitted network for time-series causal inference in longitudinal treatment-response data and zero-shot in-context counterfactual outcome prediction. The model is pretrained entirely on synthetic episodes sampled from a broad prior over temporal structural causal models, exposing it to treatment-confounder feedback, latent heterogeneity, nonlinear state evolution, delayed effects, and cumulative treatment responses. At test time, CausalLongPFN remains frozen and is used zero-shot: it conditions on support trajectories, a query history, and a planned future treatment sequence, and returns a predictive distribution over future outcomes without gradient updates or propensity-model fitting. Multi-step predictions are obtained by recursively applying the one-step predictor under the specified treatment sequence. We evaluate the model on branchable cancer, HIV, and warfarin benchmarks with ground-truth counterfactual labels, and on factual-only rolling-origin prediction in MIMIC-III ICU trajectories. CausalLongPFN is competitive with domain-trained longitudinal baselines on counterfactual benchmarks and performs strongly on factual MIMIC-III prediction, suggesting that broad synthetic causal pretraining can provide a frozen, amortized alternative for zero-shot longitudinal treatment-response prediction when repeated domain-specific training is costly or impractical.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.05797v2</guid>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Amirhossein Zare, Amirhessam Zare, Herlock Rahimi, Reza Salarikia, Mohammad Kashkooli</dc:creator>
    </item>
    <item>
      <title>Emotion-Aware Image Generation from Korean Diary Text via LLM-based Prompt Translation and LoRA Fine-Tuning</title>
      <link>https://arxiv.org/abs/2606.05816</link>
      <description>arXiv:2606.05816v2 Announce Type: replace 
Abstract: Text-to-image (T2I) models cannot effectively capture sentiment from various types of text, including diaries, as they primarily focus on visual object-related patterns rather than contextual emotional understanding. This paper proposes an emotion-aware text-to-image pipeline that generates images in the style of children's hand drawings from short Korean diary entries. The proposed pipeline employs Qwen3-8B for recognising implicit sentiment from short diaries, and Stable Diffusion 3.5 Medium fine-tuned with LoRA on children's drawing images with emotion-based trigger words for image generation. Additionally, this paper presents experiments examining the effect of emotion trigger words on generated images and discusses the limitations of CLIP Score as an evaluation metric for emotion-aware image generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.05816v2</guid>
      <category>cs.CV</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:journal_reference>Proc. Int. Conf. Multimedia, Information Technology and its Applications (MITA), 2026</arxiv:journal_reference>
      <dc:creator>Jihun Cho, Soo-Yeon Jeong, Sun-Young Ihm</dc:creator>
    </item>
    <item>
      <title>Entropy-Based Evaluation of AI Agents: A Lightweight Framework for Measuring Behavioral Patterns</title>
      <link>https://arxiv.org/abs/2606.05872</link>
      <description>arXiv:2606.05872v2 Announce Type: replace 
Abstract: AI agents are commonly evaluated using task success, reward, latency, and cost. These metrics are useful, but they often miss important aspects of agent behavior: whether an agent explores too much, repeats itself too rigidly, uses tools effectively, reduces uncertainty over time, or remains robust across repeated runs. This paper proposes Entropy-Based Evaluation of AI Agents (EEA), a lightweight framework for measuring agent behavior through entropy. Rather than treating intelligence as only final task completion, EEA studies the structure of the agent's decision process. The framework introduces action entropy, trajectory entropy, tool entropy, information gain, exploration efficiency, and robustness entropy. These metrics are intended to complement, not replace, traditional evaluation methods. We also present a practical Python implementation designed to integrate with agent frameworks such as LangChain, Google ADK, custom agent loops, and stored observability traces.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.05872v2</guid>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Olasimbo Ayodeji Arigbabu</dc:creator>
    </item>
    <item>
      <title>A Pre-Registered Causal Partition of Self-Consistency Elicitation and Reward Design in RLVR</title>
      <link>https://arxiv.org/abs/2606.05932</link>
      <description>arXiv:2606.05932v2 Announce Type: replace 
Abstract: Reinforcement learning from verifiable rewards (RLVR) improves reasoning even when the reward signal is spurious -- assigning credit to the group-plurality answer rather than a ground-truth verifier. Practitioners commonly interpret naive = acc(TRUE) - acc(RANDOM) as the reward-design effect. We prove this estimand is systematically biased: it conflates self-consistency elicitation (sharpening the policy toward its modal answer via majority pseudo-reward) with genuine reward-design signal. Using a controlled tabular-GRPO simulator we derive an exact telescoping decomposition total = null + elicit + rd and measure each term across five prior-strength levels. The reward-design fraction of the naive estimator ranges from 0.139 at weak prior (ps=0.20) to 0.05 at strong prior (ps=0.80), with the elicitation term flipping sign at the self-consistency crossover. A pre-registered 2x2x2 factorial confirms non-additivity (interaction ratio 0.385; AxC effect -0.089). A points-vs-bounds pilot gate shows strong-prior regimes are point-identified while near-crossover regimes are only bounded. Re-audits of two named published results yield ELICITATION DOMINATED (elicitation share 0.98) and REWARD DESIGN DOMINATED (rd share 1.18) verdicts respectively, demonstrating the diagnostic value of the partition. We pre-commit to submit regardless of flip outcome; a non-flip is a finding of equal standing. We release a reusable one-command harness for any alignment paper to run the same audit.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.05932v2</guid>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Yuze Gao</dc:creator>
    </item>
    <item>
      <title>OPRD: On-Policy Representation Distillation</title>
      <link>https://arxiv.org/abs/2606.06021</link>
      <description>arXiv:2606.06021v2 Announce Type: replace 
Abstract: On-policy distillation (OPD) supervises the student only in output space by matching next-token probabilities. This output-only paradigm has two limits: (1) sampling variance from Monte Carlo KL estimates over large vocabularies (e.g., Qwen's ~150k tokens) persists throughout training, and (2) it treats the teacher as a black-box, discarding all intermediate hidden states after the LM head. We propose On-Policy Representation Distillation (OPRD), which lifts distillation into hidden-state space by aligning student and teacher representations across selected layers on the same rollouts, bypassing the LM head entirely. Theoretically, OPRD eliminates sampling variance and provides richer per-layer structural information. Empirically, OPRD closes the student-teacher gap on AIME 2024/2025 and AIMO, while output-space OPD baselines plateau below the teacher. OPRD also trains 1.44x faster and uses 54% less memory than top-k OPD. Code: https://github.com/ShenzhiYang2000/OPRD.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06021v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Shenzhi Yang, Guangcheng Zhu, Bowen Song, Haobo Wang, Mingxuan Xia, Xing Zheng, Yingfan Ma, Zhongqi Chen, Weiqiang Wang, Gang Chen</dc:creator>
    </item>
    <item>
      <title>RealDexUMI: A Wearable Universal Manipulation Interface for Dexterous Robot Learning</title>
      <link>https://arxiv.org/abs/2606.06033</link>
      <description>arXiv:2606.06033v2 Announce Type: replace 
Abstract: Learning dexterous manipulation requires demonstrations that preserve fine hand-object interactions while remaining executable at deployment. Existing pipelines either lose deployable dexterity through retargeting or embodiment conversion, or rely on robot-specific teleoperation that is costly to scale and often lacks intuitive, contact-aware control for dexterous data collection. We present RealDexUMI, a wearable universal manipulation interface built around a shared dexterous end-effector module that integrates a lightweight dexterous hand, in-hand vision, and fingertip tactile sensing. A palm-side isomorphic teleoperation glove maps human finger inputs to robot-hand joint commands, enabling real-time, retargeting-free, intuitive, and precise hand control. The shared hand and sensing modules yield zero-gap end-effector data, with matched in-hand observations, tactile signals, contacts, and hand actions between collection and deployment. Across eight real-robot tasks spanning fine-grained, contact-rich, long-horizon, and bimanual manipulation, policies trained on RealDexUMI data achieve an average success rate of 88.75%, generalize to unseen initial poses, and transfer across three embodiments. Website: https://research.beingbeyond.com/realdexumi</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06033v2</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Chaoyi Xu, Yixuan Jiang, Jiahui Huan, Yuhui Fu, Haoyu Zhou, Weitian Yuan, Jiayi Yu, Wanpeng Zhang, Haoqi Yuan, Zongqing Lu</dc:creator>
    </item>
    <item>
      <title>SpeechJBB: Probing Safety Alignment and Comprehension in Large Audio Language Models under Code-Switched Speech</title>
      <link>https://arxiv.org/abs/2606.06037</link>
      <description>arXiv:2606.06037v2 Announce Type: replace 
Abstract: Large audio language models (LALMs) are increasingly deployed in real-world applications, yet their safety alignment is still primarily evaluated on monolingual, text-based harmful prompts. This leaves their generalizability under multilingual and spoken settings, particularly code-switched speech, largely underexplored. To address this gap, we introduce SpeechJBB, an audio jailbreak dataset for benchmarking across multiple state-of-the-art LALMs. The extent of safety weaknesses is further probed by introducing an augmented setting where phonologically plausible pseudo-words are inserted around safety-critical terms to simulate localized obfuscation. Across models, code-switched harmful audio yields substantially high jailbreak success rates (JSR), with non-English monolingual and non-English code-switched pairs exhibiting the highest attack success. Pseudo-word insertion further reduces refusal rates, which demonstrates that natural-sounding obfuscation can effectively bypass safety policies.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06037v2</guid>
      <category>cs.SD</category>
      <category>eess.AS</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Virginia Ceccatelli, Yejin Jeon, David Ifeoluwa Adelani</dc:creator>
    </item>
    <item>
      <title>Learning Visual Spatial Planning from Symbolic State via Modality-Gap-Aware Self-Distillation</title>
      <link>https://arxiv.org/abs/2606.06076</link>
      <description>arXiv:2606.06076v2 Announce Type: replace 
Abstract: While vision-language models excel at general multimodal understanding, they still struggle with visual spatial planning. We attribute this to a perception-reasoning modality gap: visual planning requires models to infer latent state structures from pixels and then reason over the recovered structure to produce valid actions, whereas symbolic planning directly leverages explicit objects and constraints. This creates dual bottlenecks in visual state recovery and multi-step planning. To address this, we propose MGSD, a two-stage modality-gap-aware self-distillation framework. First, a cold-start grounding stage equips the visual student with reliable state representations, minimizing early perception noise. Second, a privileged teacher transfers planning capabilities via on-policy distillation, using explicit symbolic states to supervise the student's own visual rollout prefixes. Crucially, symbolic data is used strictly during training, leaving inference purely visual. Experiments on visual planning benchmarks show that MGSD consistently improves visual planning across both 4B and 8B backbones, raising the macro average by 19.3% and 18.4%, respectively. The resulting models narrow the gap to symbolic-input upper bounds, while ablations and diagnostics confirm that the improvement comes from both visual state recovery and optimal-path reasoning. These results suggest that modality-gap-aware self-distillation improves not only how models perceive actionable states, but also how they plan over the inferred structure. Code is available at https://github.com/Oranger-l/MGSD.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06076v2</guid>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haocheng Luo, Jiahui Liu, Ruicheng Zhang, Zhizhou Zhong, Jiaqi Huang, Zunnan Xu, Quan Shi, Jun Zhou, Xiu Li</dc:creator>
    </item>
    <item>
      <title>Towards Healthy Evolution: Exploring the Role and Mechanisms of Human-Agent Interaction in Self-Evolving Systems</title>
      <link>https://arxiv.org/abs/2606.06114</link>
      <description>arXiv:2606.06114v2 Announce Type: replace 
Abstract: Self-evolving agents improve through continual self-play and self-generated learning signals, but autonomous evolution can also cause capability degradation and safety drift. Although human feedback has proven effective for static and post-trained agents, its role in self-evolving systems remains underexplored. We introduce Agent Norm Correction through Human-like Oversight and Review (ANCHOR), an LLM-based framework that simulates human supervision and delivers feedback at various phases of self-evolution. With ANCHOR, we evaluate two representative open-source self-evolving agent systems across coding, mathematical reasoning, and safety. Our results show that even limited supervision substantially mitigates safety degradation while preserving stable performance on core evolutionary objectives. Further analysis shows that supervision over the output verification phase is the most effective for intervention, whereas increasing supervision frequency yields diminishing returns. These findings provide empirical evidence and practical guidance for designing more stable, controllable, and human-aligned self-evolving agent systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06114v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dianxing Shi, Bowen Wang, Junqi He, Junhao Chen, Yuta Nakashima</dc:creator>
    </item>
    <item>
      <title>An Infectious Disease Spread Simulation Based on Large Language Model Decision Making</title>
      <link>https://arxiv.org/abs/2606.06360</link>
      <description>arXiv:2606.06360v2 Announce Type: replace 
Abstract: Modelling individual decision-making during infectious disease outbreaks is crucial for understanding behavioural dynamics and informing effective public health interventions. Prior work has shown that large language models can simulate realistic human behaviour by generating agent decisions based on demographic prompts and situational context. We build on this foundation with a spatially grounded, agent-based simulation framework that integrates LLM-generated decisions about self-reported influenza-like illness into a census-based synthetic population of agents. Location is treated as a central feature: agents are assigned to spatial units within cities, capturing the spatial distributions of different demographic groups using real-world census data and enabling geographically diverse behavioural modelling. We implement and compare three decision scenarios, independent reasoning, household influence, and message framing, and simulate self-reporting outcomes in San Francisco and Atlanta. Results reveal that income and education are the dominant drivers of reporting rate variation, with smaller but consistent effects from geography, LLM model choice, and message framing. Our framework generates synthetic data that captures both social and geographic heterogeneity, supporting spatial epidemiological modelling and bias-aware behavioural analysis.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06360v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yonchanok Khaokaew, Ruochen Kong, Andreas Zufle, Hao Xue, Taylor Anderson, Chandini Raina MacIntyre, Matthew Scotch, Flora D. Salim, David J Heslop</dc:creator>
    </item>
    <item>
      <title>Humans' ALMANAC: A Human Collaboration Dataset of Action-Level Mental Model Annotations for Agent Collaboration</title>
      <link>https://arxiv.org/abs/2606.06388</link>
      <description>arXiv:2606.06388v2 Announce Type: replace 
Abstract: Recent advances in LLM agents have enabled complex cognitive capabilities, such as multi-step reasoning, planning, and tool use, that increasingly position these agents as human collaborators. Effective collaboration, however, requires collaborators to continuously maintain and align mental models of their own reasoning, partners' intentions, and shared goals during the collaborative process. Today's agents rarely develop such capabilities since they are primarily optimized for task completion, and the community lacks authentic human collaboration data with action-level mental model annotations that could guide agents toward process-level collaborative competence. To bridge this gap, we present ALMANAC, a dataset of Action-Level Mental model ANnotations for Agent Collaboration built from the Map Task, a classic dyadic routing task from social science. ALMANAC contains 2,987 collaboration actions, each paired with theory-informed mental model annotations that record the participants' self-reasoning, perceived partner intent, and perceived team goal. We benchmark six LLMs on predicting humans' next-turn behavior and mental models. Our results demonstrate ALMANAC's utility in evaluating models' ability to simulate human collaborative behaviors and infer their underlying mental models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06388v2</guid>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiaju Chen, Yuxuan Lu, Jiayi Su, Chaoran Chen, Songlin Xiao, Zheng Zhang, Yun Wang, Yunyao Li, Jian Zhao, Tongshuang Wu, Toby Jia-Jun Li, Dakuo Wang, Bingsheng Yao</dc:creator>
    </item>
    <item>
      <title>CollabSim: A CSCW-Grounded Methodology for Investigating Collaborative Competence of LLM Agents through Controlled Multi-Agent Experiments</title>
      <link>https://arxiv.org/abs/2606.06399</link>
      <description>arXiv:2606.06399v2 Announce Type: replace 
Abstract: Multi-agent systems (MAS) built on large language models have shown growing promise, with their effectiveness resting on agents' ability to coordinate through text-based channels much as human teams do. Yet recent work suggests that MAS often falter not because agents lack individual task-solving ability, but because they lack collaborative competence: the capacity to establish common ground, maintain shared task understanding, balance individual and collective incentives, and repair misalignment as interaction unfolds. Decades of research in Computer-Supported Cooperative Work have characterized these requirements for human teams coordinating under constrained communication, yet existing MAS evaluations focus mainly on task outcomes or single-agent proficiency in reasoning, planning, and tool use. To enable a systematic analysis of agents' collaborative competence in MAS, we introduce CollabSim, a configurable simulation framework that combines a theory-grounded definition of collaborative capabilities, controlled manipulation of interaction conditions, and action-level probing of agents' internal states. Experiments across four LLMs show that CollabSim can capture condition effects, separate model performance patterns, and reveal task-dependent effects of agent design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06399v2</guid>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jiaju Chen, Bo Sun, Yuxuan Lu, Yun Wang, Dakuo Wang, Bingsheng Yao</dc:creator>
    </item>
    <item>
      <title>A Vision-language Framework for Comparative Reasoning in Radiology</title>
      <link>https://arxiv.org/abs/2606.06407</link>
      <description>arXiv:2606.06407v2 Announce Type: replace 
Abstract: Medical imaging artificial intelligence has achieved strong performance in isolated image interpretation, but remains poorly aligned with radiological practice, where diagnosis and follow-up rely on comparison across prior studies and analogous reference cases. Here we formulate radiological comparison as an entity-aware cross-image reasoning problem and introduce a framework that supports both reference-case retrieval and temporal comparative interpretation. We construct MedReCo-DB, a large-scale comparative imaging resource derived from routine image-report pairs, comprising more than 690,000 images from over 160,000 patients across eight institutions, four countries and seven imaging modalities. Reports are decomposed into anatomical structures, abnormal findings and pathological conditions to provide supervision for entity-conditioned retrieval and comparative visual question answering. Using this resource, we develop MedReCo, an entity-aware visual encoder for controllable retrieval of clinically analogous cases, and MedReCo-VLM, a vision--language extension for generative interpretation of interval change. Across internal, external and cross-center evaluations, MedReCo achieved the highest Recall@1 in all 12 internal retrieval settings and improved external retrieval by a mean of 6.0 percentage points. In clinically confusable differential groups, it consistently outperformed the strongest baselines. MedReCo-VLM achieved the best performance across all comparative generation evaluations and improved longitudinal follow-up accuracy by 14.5-46.5 percentage points on chest radiographs and 13.0-27.9 percentage points on CT. These findings suggest that entity-aware comparative reasoning can be learned from routine clinical data at scale and may provide a more clinically aligned foundation for medical imaging AI.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06407v2</guid>
      <category>cs.CV</category>
      <category>cs.IR</category>
      <category>cs.LG</category>
      <category>eess.IV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tengfei Zhang, Ziheng Zhao, Xiaoman Zhang, Lisong Dai, Pengcheng Qiu, Ya Zhang, Yanfeng Wang, Weidi Xie</dc:creator>
    </item>
    <item>
      <title>Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions</title>
      <link>https://arxiv.org/abs/2606.06443</link>
      <description>arXiv:2606.06443v2 Announce Type: replace 
Abstract: Large language models are increasingly used to simulate social media users and infer how individuals may respond to online discussions. However, it remains unclear whether these simulations reflect precise user-specific beliefs or whether they are highly sensitive to semantically independent changes in conversational contexts. In this work, we study counterfactual context revision as a framework for auditing LLM-based stance simulation. Given an original online conversation, we first infer a target user's stance toward a specific topic. We then apply controlled revision strategies to the conversational context and simulate the user's stance again under the revised context. We compare text-only revision strategies with a multimodal one that incorporates meme-based context and evaluate two main effectiveness metrics, i.e., average directional stance shift and stance transition rate. The results reveal effective and robust stance transitions in both text-only and multimodal strategies across different polarization-preference mechanisms. Our study contributes an evaluation framework for understanding the context sensitivity of LLM-based stance simulation. More broadly, it highlights both the promise and risk of using LLMs to simulate online opinion dynamics.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06443v2</guid>
      <category>cs.CL</category>
      <category>cs.MM</category>
      <category>cs.SI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Xinnong Zhang, Wanting Shan, Hanjia Lyu, Zhongyu Wei, Jiebo Luo</dc:creator>
    </item>
    <item>
      <title>HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers</title>
      <link>https://arxiv.org/abs/2606.06493</link>
      <description>arXiv:2606.06493v2 Announce Type: replace 
Abstract: For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and expressive enough for diverse loco-manipulation skills. To this end, we introduce HANDOFF, a single humanoid whole-body controller that follows this interface and is distilled via multi-teacher KL distillation under a context-conditioned gating scheme into a mixture-of-experts student from three complementary specialists: whole-body motion tracking with safety-filtered data, locomotion, and fall-recovery. On the Unitree G1, HANDOFF matches state-of-the-art velocity tracking and offers one of the largest robust manipulation workspaces. We further demonstrate hardware feasibility through multiple natural-language-driven task roll-outs, powered by a VLM-driven agentic planner with no task-specific data or controller fine-tuning.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06493v2</guid>
      <category>cs.RO</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Lizhi Yang, Junheng Li, Nehar Poddar, Yiling Hou, Gio Huh, Robert Griffin, Georgia Gkioxari, Aaron Ames</dc:creator>
    </item>
    <item>
      <title>Real-Time AttentionBender: Granular Interactive Network Bending of Video Diffusion Transformers</title>
      <link>https://arxiv.org/abs/2606.06497</link>
      <description>arXiv:2606.06497v2 Announce Type: replace 
Abstract: Generative video models have achieved remarkable visual fidelity, yet their prompt-only interface offers thin creative agency and obscures the model's material process from the artists working with it. We present Real-Time AttentionBender, a tool that extends the practice of network bending across the full depth of the video diffusion transformer (DiT) and brings it into live, interactive generation. Built as a plugin within the DayDream Scope ecosystem and wrapping open-source real-time Wan pipelines, the tool exposes self-attention, cross-attention, and the feed-forward network as independently manipulable surfaces, with targeting down to individual diffusion steps, DiT layers, prompt tokens, and hidden neurons. The immediacy of live manipulation affords what we call "material intimacy" with the model: a responsive, near-mechanistic feel for how specific layers and neurons shape generated video. We position the tool as simultaneously an XAIxArts probe into transformer internals and an expressive instrument for discovering aesthetics outside the model's default representational space.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06497v2</guid>
      <category>cs.GR</category>
      <category>cs.CV</category>
      <category>cs.HC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Adam Cole, Rebecca Fiebrink, Mick Grierson</dc:creator>
    </item>
    <item>
      <title>Characterizing the Impact of NVFP4 Quantization for Low-Power Edge AI Deployment</title>
      <link>https://arxiv.org/abs/2606.06527</link>
      <description>arXiv:2606.06527v2 Announce Type: replace 
Abstract: Energy-efficient neural-network inference at the edge requires reducing arithmetic cost, memory traffic, computation energy, and storage overhead while maintaining acceptable accuracy. This paper presents an ablation-focused study of NVFP4 quantization for edge-efficient neural networks, with emphasis on the relationship between activation precision, weight precision, block-size scaling, retraining, and model accuracy. NVFP4 activations are represented using 4-bit FP4 data, an FP8 block scale, and an FP32 tensor scale, enabling ultra-low precision inference while preserving activation dynamic range. A block-size ablation over six edge-efficient models shows that block size B = 16 provides a practical accuracy/storage trade-off, requiring only 4.5078 bits per input for N = 4096. A weight precision ablation further shows that FP8 and FP16 weights provide only modest gains over FP4 weights under the same NVFP4 activation path, suggesting that activation quantization and scaling dominate much of the accuracy behavior. To isolate the benefit of the NVFP4 data type, this work compares conventional unscaled FP4 activation inference and NVFP4 activation inference with and without retraining. The results show that conventional FP4 inference collapses accuracy for most compact models, while NVFP4 without retraining already recovers substantial accuracy by restoring activation dynamic range through FP8 block scaling and FP32 tensor scaling. When combined with retraining, NVFP4 achieves the best accuracy across the evaluated models, demonstrating the effectiveness of scaling-aware FP4 (NVFP4) inference. These findings provide general design guidance for hardware-software co-design of low power edge inference across a broad range of accelerator platforms, including GPUs, Tensor Cores, FPGAs, domain-specific AI accelerators, near-memory computing systems, and emerging edge-computing architectures.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06527v2</guid>
      <category>cs.AR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ovishake Sen, Venkata Nithin Kamineni, Daniel Lobo, Swarup Bhunia, Rickard Ewetz, Baibhab Chatterjee</dc:creator>
    </item>
    <item>
      <title>Multi-Scale Feature Attention Network for Polymer Classification Using Terahertz Spectroscopy</title>
      <link>https://arxiv.org/abs/2606.06554</link>
      <description>arXiv:2606.06554v2 Announce Type: replace 
Abstract: Reliable polymer identification is essential for ensuring the quality and safety of recycled plastics, yet conventional sorting and spectroscopic techniques often struggle to deliver robust discrimination. Terahertz (THz) spectroscopy offers a promising alternative, providing high-resolution and non-destructive measurements. In this work, we leverage THz signals to classify 12 types of polymers, including pure polymers, multilayer films, commercial blends, and biopolymers. To handle the complexity of these spectral signals, we propose the Multi-Scale Feature Attention Network (MSFAN), a novel deep learning architecture tailored for THz data. The framework integrates feature gating for signal recalibration and multi-scale parallel convolutions to capture diverse frequency patterns. These features are further refined through cross-feature attention and attention pooling, enabling the model to intrinsically highlight the most informative THz regions. MSFAN consistently outperforms state-of-the-art models, reaching a classification accuracy of 85.2%. This study demonstrates the potential of combining THz spectroscopy with deep learning techniques for effective, scalable, and interpretable polymer classification.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06554v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Roshni Mahtani, Ilán Carretero, Laura Monroy, Aldo Moreno-Oyervides, Oscar Elías Bonilla-Manrique, Rocío del Amor</dc:creator>
    </item>
    <item>
      <title>A Study of Parallel Continuous Local Search</title>
      <link>https://arxiv.org/abs/2606.06656</link>
      <description>arXiv:2606.06656v2 Announce Type: replace 
Abstract: We study parallel Continuous Local Search (CLS) as a solution approach for Boolean satisfiability problems with symmetric pseudo-Boolean (PB) constraints. Here, the $n$-variable PB-satisfiability problem is relaxed to a continuous optimisation problem with a differentiable objective function on an $n$-dimensional hypercube. For satisfiable instances, the global minimisers of this optimisation problem correspond to satisfying assignments of the SAT problem at hand. We present several novel findings via empirical experiments: (i) redundant constraints can inhibit rather than accelerate convergence; (ii) CLS shows promise as a sub-solver in hybridised settings, quickly completing partial assignments; and (iii) local search rapidly converges to a stable distribution of solution quality (i.e., degree of satisfaction), due to saddle-dense objectives where additional solver steps yield diminishing returns. Our findings inform practical uses of CLS for SAT on modern accelerator hardware.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06656v2</guid>
      <category>cs.AI</category>
      <category>cs.LO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Cody J Christopher, Charles Gretton</dc:creator>
    </item>
    <item>
      <title>RECAP: Regression Evaluation for Continual Adaptation of Prompts</title>
      <link>https://arxiv.org/abs/2606.06698</link>
      <description>arXiv:2606.06698v2 Announce Type: replace 
Abstract: Production agentic systems routinely face evolving constraints and must comply from the very next interaction. Scenarios like a tool-call notification changing a compliance threshold or a policy update adding disclosure requirements fit this criterion, leaving almost no room for error in production. This proactive adaptation setting is common in deployment, but absent from current benchmarks, which assume either static constraint sets or reactive protocols with evaluation feedback. We introduce RECAP, a benchmark that measures continual-learning phenomena (forgetting, regression, forward transfer) at the constraint level under a strictly proactive adapt-then-test protocol: prompt optimization methods receive only the constraint specification and must generalize before seeing any test data. Evaluating six methods across four LLMs and three schedules with evolving constraints, we find that these methods show no significant improvement in performance, even after incurring a higher latency. These methods, designed for offline or reactive settings, are inadequate for the proactive paradigm. Our work emphasizes the growing need for designing proactive prompt adaptation methods, where the models must remain robust to evolving needs in deployment.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06698v2</guid>
      <category>cs.LG</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Harsh Deshpande, Kushal Chawla, Sangwoo Cho, William Campbell</dc:creator>
    </item>
    <item>
      <title>ThinkBooster: A Unified Framework for Seamless Test-Time Scaling of LLM Reasoning</title>
      <link>https://arxiv.org/abs/2606.06915</link>
      <description>arXiv:2606.06915v2 Announce Type: replace 
Abstract: Test-time compute (TTC) scaling has emerged as a powerful paradigm for improving large language model (LLM) reasoning by allocating additional compute during inference, e.g., via multi-sample generation and verifier-based reranking. Existing TTC scaling strategies and reasoning scorers remain fragmented, evaluated under inconsistent protocols, and are rarely analyzed through the lens of quality-cost trade-offs. We introduce ThinkBooster, a unified framework for seamless test-time compute scaling of LLM reasoning, which consists of (i) a modular Python library implementing state-of-the-art TTC scaling strategy and scorer families, (ii) a benchmark that jointly evaluates performance and computational efficiency, and (iii) a deployable OpenAI-compatible proxy service that enables drop-in integration of adaptive reasoning into real-world applications. We further provide a demo visual debugger for inspecting the reasoning trajectories, intermediate selection decisions, and alternative reasoning paths. Empirical results on mathematical and coding tasks reveal the performance-compute trade-offs of TTC scaling strategies and scoring methods and demonstrate that ThinkBooster provides practical gains in real-world tasks. The code is available online under an MIT license.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.06915v2</guid>
      <category>cs.CL</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vladislav Smirnov, Chieu Nguyen, Sergey Senichev, Minh Ngoc Ta, Ekaterina Fadeeva, Artem Vazhentsev, Daria Galimzianova, Nikolai Rozanov, Viktor Mazanov, Jingwei Ni, Tianyi Wu, Igor Kiselev, Mrinmaya Sachan, Iryna Gurevych, Preslav Nakov, Timothy Baldwin, Artem Shelmanov</dc:creator>
    </item>
    <item>
      <title>Front-to-Attractors: Modifying the Front-to-Front Heuristic in Bidirectional Search</title>
      <link>https://arxiv.org/abs/2606.07047</link>
      <description>arXiv:2606.07047v2 Announce Type: replace 
Abstract: Heuristics play a central role in the performance of bidirectional search algorithms, which commonly rely on two main classes. Front-to-end (F2E) heuristics estimate the distance from a state s to the target of the search (the goal for forward search or the start for backward search). In contrast, front-to-front (F2F) heuristics estimate the distance from s to the opposite search frontier using a pairwise function h(s, s'), where s' ranges over frontier states. Although F2F heuristics are typically more informative and therefore reduce the number of node expansions, their reliance on extensive pairwise evaluations incurs substantial computational overhead. To address this limitation, we introduce a new heuristic class, front-to-attractors (F2A), that preserves much of the informativeness of F2F while dramatically reducing its computational cost. Rather than evaluating distances to all states on the opposite frontier, F2A estimates the distance from s to a small, dynamically maintained set of attractors in the opposite search direction. These attractors serve as a surrogate for the full frontier, enabling rich heuristic guidance at a fraction of the computational expense while maintaining the optimality guarantees offered by F2F. We evaluate F2A across multiple domains and show that it reduces the number of pairwise evaluations by up to 11.2x compared to F2F, while achieving 4.8x fewer node expansions than F2E on average.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07047v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Alvin Zou, Muhammad Suhail Saleem, Maxim Likhachev</dc:creator>
    </item>
    <item>
      <title>DyCon: Dynamic Reasoning Control via Evolving Difficulty Modeling</title>
      <link>https://arxiv.org/abs/2606.07108</link>
      <description>arXiv:2606.07108v2 Announce Type: replace 
Abstract: Recent advances in Large Reasoning Models (LRMs) demonstrate remarkable performance improvements by iteratively reflecting, exploring, and executing complex tasks, yet suffer from inefficiencies due to redundant reasoning, known as "overthinking". Existing methods to mitigate this issue either rely on static difficulty estimates or require task-specific training, and thus fail to adapt to the dynamic complexity during reasoning. In this work, we empirically show that the problem difficulty evolves dynamically throughout the reasoning process and is linearly encoded in the LRM's step-level embeddings. Building on this insight, we propose DyCon, a training-free framework that leverages latent step-level representations to explicitly model the evolving task difficulty, enabling the dynamic control of reasoning depth to mitigate the overthinking issue. Extensive experiments conducted on four models ranging from 4B to 32B, and across twelve benchmarks in math reasoning, general question answering, and coding tasks demonstrate that DyCon significantly enhances reasoning efficiency by reducing redundant steps without sacrificing accuracy or generalization. Code is available at https://github.com/yu-lin-li/DyCon.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07108v2</guid>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tengyao Tu, Yulin Li, Hui-Ling Zhen, Libo Qin, Zhoujun Wei, Jinghua Piao, Zhuotao Tian, Yong Li, Min Zhang</dc:creator>
    </item>
    <item>
      <title>QuadVerse: An Integrated Framework Aligning Visual-Physical Reality for Quadruped Simulation</title>
      <link>https://arxiv.org/abs/2606.07118</link>
      <description>arXiv:2606.07118v2 Announce Type: replace 
Abstract: Simulation is central to robot learning, yet the sim-to-real gap remains a major bottleneck. Existing approaches often tackle visual or dynamic gaps separately, overlooking how these individual mismatches accumulate and propagate throughout the robot's state evolution. In this paper, we introduce QuadVerse, an integrated framework that uses reconstructed scenes as a calibration substrate for aligning visual perception, physical interaction, and actuator dynamics. From captured RGB videos, we reconstruct geometry-constrained 3D Gaussian Splatting (3DGS) scenes that support batched photorealistic ego-view rendering and collision-ready semantic mesh extraction. The meshes further enable contact calibration by initializing spatially varying friction priors and refining them through trajectory-based posterior search. To address remaining actuator discrepancies, QuadVerse trains a residual dynamics compensator by replaying real-world trajectories on the contact-calibrated terrain, reducing the entanglement between terrain-induced contact errors and actuator non-idealities. Experiments show that QuadVerse improves reconstruction quality and locomotion tracking over relevant baselines. Leveraging this foundation, we demonstrate robust zero-shot visual-navigation policy deployment without task-specific real-world rollouts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07118v2</guid>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuxiang Chen, Yuanhao Wang, Ziheng Zhang, Meng Zhang, Yu Liu, Yufei Jia, Tiancai Wang, Erjin Zhou, Jin Xie</dc:creator>
    </item>
    <item>
      <title>FLOWREADER: Min-Cost Flow Optimization for Multi-Modal Long Document Q&amp;A</title>
      <link>https://arxiv.org/abs/2606.07235</link>
      <description>arXiv:2606.07235v2 Announce Type: replace 
Abstract: Long, multimodal documents force retrieval-augmented systems to assemble answers from evidence fragmented across text, tables, and slides: broken across cells in a long table, spread over multiple slides, or split between a figure and its discussion. Top-$k$ chunk retrieval treats each fragment independently and cannot represent how evidence connects. We introduce FLOWREADER, which reframes evidence assembly as a min-cost flow problem on a multimodal node graph: a single scoring vector $h$ controls source selection (via MMR), sink selection (via a length-aware answerability proxy), and the costs and capacities of every edge. The optimal flow is decomposed into candidate evidence paths, a compact non-redundant subset is selected by entropy-regularized replicator dynamics, and parallel VLM workers under a dual-process gate produce the answer with a single System-2 refinement pass triggered when answer consistency is low or the routed flow is strained. On VisDoMBench, FLOWREADER is best on the two subsets dominated by fragmented evidence, PaperTab ($58.40$, $+1.30$ over G^{2}-Reader) and SlideVQA ($72.93$, $+0.62$), and competitive on SPIQA, FetaTab, and SciGraphQA. Macro-averaged across all five subsets, FLOWREADER ($65.47$) is within $0.74$ of the strongest baseline (G^{2}-Reader, $66.21$). Overall, these results show that min-cost flow performs well on fragmented multimodal evidence, where top-$k$ retrieval fails. It also provides a unified way to control scoring, routing, selection, and adaptive compute together.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07235v2</guid>
      <category>cs.IR</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ambuj Mehrish, Sebastiano Vascon</dc:creator>
    </item>
    <item>
      <title>Constrained Dominant Sets for Multimodal Document Question Answering</title>
      <link>https://arxiv.org/abs/2606.07252</link>
      <description>arXiv:2606.07252v2 Announce Type: replace 
Abstract: Long multimodal document question answering is limited by which evidence reaches the reader, rather than by the quantity retrieved. In lengthy documents, findings often recur across figures, captions, and introductory sentences, causing similarity-based retrievers in modern multimodal retrieval-augmented generation (RAG) systems to allocate resources to near-duplicates while overlooking complementary evidence. This work introduces a retriever that selects evidence as a Constrained Dominant Set (CDS) on a query-augmented affinity graph, offering three advantages that similarity ranking does not. First, the query is encoded as a hard structural constraint, ensuring that every selected element is directly connected to the question through the cluster anchor. Second, the relevance-redundancy balance is determined automatically by a spectral bound, eliminating the need for manually tuned trade-offs required by diversity-aware selectors. Third, the selection process achieves a global equilibrium via replicator dynamics, thereby avoiding the distortions introduced by greedy heuristics. The method is inherently graph-based and does not require training. Using a Qwen3-VL-32B reader, CDS establishes a new state of the art on VisDoMBench ($66.99$ average) and improves over the no-retrieval baseline by $37.1$ points on VisDoMBench and $4.8$ on MMLongBench-Doc.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07252v2</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Ambuj Mehrish, Sebastiano Vascon</dc:creator>
    </item>
    <item>
      <title>Gated Bidirectional Linear Attention for Generative Retrieval</title>
      <link>https://arxiv.org/abs/2606.07317</link>
      <description>arXiv:2606.07317v2 Announce Type: replace 
Abstract: In recommender systems, generative retrieval typically uses an encoder-decoder setup: an encoder processes a user interaction history, and an autoregressive decoder then generates recommended items. In large-scale streaming services, active users accumulate very long histories over time. As histories grow, the encoder becomes a major latency bottleneck because softmax attention scales quadratically with sequence length. In our experiments, using bidirectional attention in the encoder substantially improves quality. However, most sub-quadratic attention methods focus on causal attention.
  We propose Gated Bidirectional Linear Attention (GBLA), a linear-time bidirectional attention layer that extends kernelized linear attention with three lightweight components: local causal mixing (Conv1D), sequence-level key gating for soft forgetting, and a gated RMSNorm output. On a large-scale Yandex Music dataset, a hybrid encoder that interleaves self-attention (SA) and GBLA in a 1:2 ratio (one SA block followed by two GBLA blocks) matches bidirectional self-attention quality. On H100 GPUs, GBLA reaches up to an $8.2\times$ single-layer speedup at a history length of 32768, compared to FlashAttention-v3. Finally, we show that the same hybrid design generalizes beyond our proprietary setting, consistently preserving self-attention retrieval quality on public Amazon benchmarks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07317v2</guid>
      <category>cs.IR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1145/3805712.3808495</arxiv:DOI>
      <dc:creator>Artem Matveev, Vladislav Tytskiy, Sergei Makeev, Sergei Liamaev</dc:creator>
    </item>
    <item>
      <title>Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests</title>
      <link>https://arxiv.org/abs/2606.07379</link>
      <description>arXiv:2606.07379v2 Announce Type: replace 
Abstract: A growing failure mode in agent evaluation and training is that models can achieve high evaluation scores by exploiting shortcuts instead of solving the intended task, producing deceptive performance. This makes evaluation scores unreliable as measures of true task-solving ability. We propose CapCode, a framework for constructing coding datasets with randomized tests whose best achievable non-cheating performance is deliberately capped below one. This capped-performance design gives evaluation scores a clearer interpretation: scores substantially above the cap are implausible and therefore provide evidence of cheating. To prevent cheating, we propose CapReward, a reward design based on the CapCode principle to discourage optimization beyond the cap. Experiments across multiple datasets show that CapCode detects cheating while preserving performance ranking of models, and CapReward reduces cheating behavior, yielding models that better follow the intended task specification.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07379v2</guid>
      <category>cs.LG</category>
      <category>cs.AI</category>
      <category>cs.CL</category>
      <category>stat.ME</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Thanawat Lodkaew, Johannes Ackermann, Soichiro Nishimori, Nontawat Charoenphakdee, Masashi Sugiyama, Takashi Ishida</dc:creator>
    </item>
    <item>
      <title>DisPOSE: Projected Polystochastic Diffusion for Self-Supervised Multi-View 3D Human Pose Estimation</title>
      <link>https://arxiv.org/abs/2606.07419</link>
      <description>arXiv:2606.07419v2 Announce Type: replace 
Abstract: Recovering 3D human poses for multiple individuals from different camera views is a fundamental bottleneck for analyzing interacting behaviors. Existing self-supervised approaches leverage synthetic catalogues of 3D poses; however, this leads to poor generalization in real-world scenarios due to distribution shifts. We therefore introduce DisPOSE, a self-supervised framework that approximates the inherently discrete multi-view person-assignment problem as a generative diffusion process over the space of polystochastic tensors. By employing differentiable Sinkhorn projections during denoising, our model learns to guide solutions toward valid and feasible assignments based on 2D image priors. The complete 3D skeletons of localized individuals are then regressed using a Hypergraph-Convolutional Decoder that explicitly models relational structures and articulated joints across multiple views. The proposed approach outperforms current state-of-the-art self-supervised methods on standard datasets and demonstrates strong performance on a newly proposed benchmark featuring highly occluded scenes from surgical operating rooms. Our diffusion-based localization demonstrates high label efficiency, retaining 99% of its performance with only 10% of the pseudo-labels. Notably, disentangling the assignment and root regression components while maintaining differentiability makes DisPOSE nearly agnostic to different camera arrangements.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07419v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tony Danjun Wang, Tolga Birdal, Nassir Navab, Lennart Bastian</dc:creator>
    </item>
    <item>
      <title>OpenGlass: Ultra-Low-Power On-Device AI Eyewear with Event-based Vision</title>
      <link>https://arxiv.org/abs/2606.07431</link>
      <description>arXiv:2606.07431v2 Announce Type: replace 
Abstract: Smart eyewear enables unobtrusive, context-aware interaction through multimodal sensors and on-device intelligence, but is severely limited by power, memory, and compute constraints in a compact form factor. Open-hardware platforms supporting event-based vision and embedded ML at this scale are rare. This work introduces an open-source smart glasses platform for rapid prototyping of novel sensors and algorithms. Its modular design uses a flexible FPC interposer to support both event-based and frame-based cameras without full PCB redesign. A hardware-software co-designed power management system combines a configurable PMIC with event-driven wake-up via an nRF5340 coordinator, keeping the GAP9 RISC-V SoC powered down between inferences. The prototype achieves up to 11.5 hours of continuous on-device ML from a 200 mAh battery. As a demonstration, an egocentric hand gesture recognition pipeline was evaluated on the LynX dataset using polarity-separated event histograms from a Prophesee GENX320 camera. R(2+1)D achieved the best cross-subject accuracy of 83.94% (macro F1 = 0.781) under leave-two-subjects-out validation, with 78.3 ms end-to-end inference latency on the GAP9. Temporal augmentation and removal of ambiguous classes provided the largest gains (+8.9 pp). All hardware designs, firmware, and models are released open source.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07431v2</guid>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pietro Bonazzi, Julian Moosmann, Ahmet Celik, Philipp Mayer, Michele Magno</dc:creator>
    </item>
    <item>
      <title>The Lipreading Gap: Do VSR Models Perceive Visual Speech Like Human Lipreaders?</title>
      <link>https://arxiv.org/abs/2606.07435</link>
      <description>arXiv:2606.07435v2 Announce Type: replace 
Abstract: Visual speech recognition (VSR) models now surpass human lipreaders on benchmarks, but do such gains establish human-like visual speech perception? To explore this, we compare three VSR systems with human baselines on the MaFI word-level lipreading dataset using word, character, phoneme, and viseme-level metrics. Although models achieve higher overall accuracy, they succeed and fail on different words than humans. A text-only n-gram baseline given only a few initial phonemes rivals human lipreading. VSR word-level errors are consistently better explained by training word frequency than by the visual informativeness of words. Viseme accuracies, confusion matrices and human-model correlations further show that models gain most on visemes humans find hardest, and show much weaker dependence on visual clarity. Our work demonstrates that VSR systems rely primarily on language cues from training data rather than visual perception, failing to bind visual features into meaningful words.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07435v2</guid>
      <category>cs.CV</category>
      <category>cs.CL</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Rishabh Jain, Naomi Harte</dc:creator>
    </item>
    <item>
      <title>Strict stability of extension types</title>
      <link>https://arxiv.org/abs/2203.07194</link>
      <description>arXiv:2203.07194v5 Announce Type: replace-cross 
Abstract: The theory of $(\infty,1)$-categories can be developed synthetically in an augmentation of homotopy type theory introduced by Riehl--Shulman. Central to their development is an additional type forming operation called extensions. The original article sketches the semantics of this formal system, explaining how the simplicial homotopy theory can be used to reason about $(\infty,1)$-categories presented using the Segal space model. However, they leave it open to demonstrate the strict stability of extension types. We prove this using the splitting method of Voevodsky, later generalized by Lumsdaine--Warren to local universes. The practical upshot is that this system has semantics in simplicial objects of an $\infty$-topos, and thus can be used to prove theorems about internal $\infty$-categories in the sense of Martini--Wolf.</description>
      <guid isPermaLink="false">oai:arXiv.org:2203.07194v5</guid>
      <category>math.CT</category>
      <category>cs.LO</category>
      <category>math.LO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jonathan Weinberger</dc:creator>
    </item>
    <item>
      <title>Consensus-based adaptive sampling and approximation for high-dimensional energy landscapes</title>
      <link>https://arxiv.org/abs/2311.05009</link>
      <description>arXiv:2311.05009v5 Announce Type: replace-cross 
Abstract: We present a consensus-based framework that unifies phase space exploration with posterior-residual-based adaptive sampling for surrogate construction in high-dimensional energy landscapes. Unlike standard approximation tasks where sampling points can be freely queried, physical systems with complex energy landscapes such as molecular dynamics (MD) do not have direct access to arbitrary sampling regions due to the physical constraints and energy barriers; the surrogate construction further relies on the dynamical exploration of phase space, posing a significant numerical challenge. We formulate the problem as a minimax optimization that jointly adapts both the surrogate approximation and residual-enhanced sampling. The construction of free energy surfaces (FESs) for high-dimensional collective variables (CVs) of MD systems is used as a motivating example to illustrate the essential idea. Specifically, the maximization step establishes a stochastic interacting particle system to impose adaptive sampling through both exploitation of a Laplace approximation of the max-residual region and exploration of uncharted phase space via temperature control. The minimization step updates the FES surrogate with the new sample set. Numerical results demonstrate the effectiveness of the present approach for biomolecular systems with up to 30 CVs. While we focus on the FES construction, the developed framework is general for efficient surrogate construction for complex systems with high-dimensional energy landscapes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2311.05009v5</guid>
      <category>physics.comp-ph</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Liyao Lyu, Huan Lei</dc:creator>
    </item>
    <item>
      <title>Ordinals and recursively defined functions on the reals</title>
      <link>https://arxiv.org/abs/2311.17210</link>
      <description>arXiv:2311.17210v5 Announce Type: replace-cross 
Abstract: We determine sufficient conditions under which certain recursively defined functions are well defined for all real inputs. Given a function $f:\mathbb R\to\mathbb R$, call a decreasing sequence $x_1&gt;x_2&gt;x_3&gt;\cdots$ "$f$-bad" if $f(x_1)&gt;f(x_2)&gt;f(x_3)&gt;\cdots$, and call the function $f$ "ordinal decreasing" if there exist no infinite $f$-bad sequences. We prove the following result: Given ordinal decreasing functions $f,g_1,\ldots,g_k,s$ that are everywhere larger than $0$, define the recursive algorithm "$M(x)$: if $x&lt;0$ return $f(x)$, else return $g_1(-M(x-g_2(-M(x-\cdots-g_k(-M(x-s(x)))\cdots))))$". Then $M(x)$ halts and is ordinal decreasing for all $x \in \mathbb{R}$. The recursive algorithms $M$ and $M_n$ previously studied in the context of fusible numbers by Erickson et al. (2022) and Bufetov et al. (2024), respectively, are special cases of this scheme.
  Moreover, given an ordinal decreasing function $f$, denote by $o(f)$ the ordinal height of the root of the tree of $f$-bad sequences. Then we prove that, for $k\ge 2$, the function $M(x)$ defined by the above algorithm satisfies $o(M)\le\varphi_{k-1}(\gamma+o(s)+1)$, where $\gamma$ is the smallest ordinal such that $\max\{o(s),o(f),o(g_1), \ldots, o(g_k)\} &lt;\varphi_{k-1}(\gamma)$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2311.17210v5</guid>
      <category>math.LO</category>
      <category>cs.DM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Gabriel Nivasch, Lior Shiboli</dc:creator>
    </item>
    <item>
      <title>Joint Channel Estimation and Cooperative Localization for Near-Field Ultra-Massive MIMO</title>
      <link>https://arxiv.org/abs/2312.13683</link>
      <description>arXiv:2312.13683v2 Announce Type: replace-cross 
Abstract: The next-generation wireless networks are envisioned to jointly support high-rate communications and ubiquitous sensing. Ultra-Massive Multiple-Input Multiple-Output (UM-MIMO) offers abundant spatial Degrees of Freedom (DoFs) for both functions, yet its large aperture shifts electromagnetic propagation into the near field, invalidating conventional far-field (plane-wave) assumptions. While near-field channel modeling has been studied, existing channel estimation methods are inadequate: on-grid designs suffer from non-orthogonal codebooks, and off-grid methods lack convergence guarantees, yielding unreliable estimates. Moreover, channel estimation and localization are typically designed in isolation, preventing the exchange of information that could otherwise enable mutual performance improvement. To address this difficulty, we propose a unified framework that exploits near-field characteristics to jointly design channel estimation and cooperative localization. Specifically, we develop a Variational Newtonized Near-field Channel Estimation (VNNCE) algorithm that extracts position-aware soft information from the channel, and a Gaussian Fusion Cooperative Localization (GFCL) method that leverages this information across multiple Base Stations (BSs) for enhanced accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2312.13683v2</guid>
      <category>eess.SP</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Ruoxiao Cao, Hengtao He, Xianghao Yu, Shenghui Song, Kaibin Huang, Jun Zhang, Yi Gong, Khaled B. Letaief</dc:creator>
    </item>
    <item>
      <title>Explicit numerical approximations for McKean-Vlasov stochastic differential equations in finite and infinite time</title>
      <link>https://arxiv.org/abs/2401.02878</link>
      <description>arXiv:2401.02878v5 Announce Type: replace-cross 
Abstract: Inspired by the stochastic particle method, this paper establishes an easily implementable explicit numerical method for McKean-Vlasov stochastic differential equations (MV-SDEs) with superlinear growth coefficients. The paper establishes the theory on the propagation of chaos in the $L^{q}$ sense. The optimal uniform-in-time strong convergence rate of order $1/2$ of the numerical solutions is obtained for the interacting particle system. Furthermore, it is proved that the numerical solutions capture the long-term dynamical behaviors of MV-SDEs precisely, including moment boundedness, stability, and ergodicity. Moreover, a unique numerical invariant probability measure is yielded, which converges to the underlying invariant probability measure of MV-SDEs in the $L^2$-Wasserstein distance. Finally, several numerical experiments are carried out to illustrate the main results.</description>
      <guid isPermaLink="false">oai:arXiv.org:2401.02878v5</guid>
      <category>math.PR</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuanping Cui, Xiaoyue Li, Yi Liu, Fengyu Wang</dc:creator>
    </item>
    <item>
      <title>Spectral Truncation Kernels: Noncommutativity in $C^*$-algebraic Kernel Machines</title>
      <link>https://arxiv.org/abs/2405.17823</link>
      <description>arXiv:2405.17823v5 Announce Type: replace-cross 
Abstract: A central question in vector- and function-valued learning is how to design kernels that capture both local and non-local interactions while remaining computationally tractable. Existing operator-valued kernels offer only partial answers: separable kernels are efficient but fail to model interactions across the function domain, while commutative kernels capture only pointwise structure. To address this, we propose spectral truncation kernels, a new class of positive definite kernels for vector- and function-valued learning based on spectral truncation and $C^*$-algebra. By allowing noncommutative products in the kernel construction, the proposed kernels induce interactions across the data function domain and fill the gap between existing separable and commutative kernels. In addition, by using the $C^*$-algebraic framework, we reduce the computational cost compared to the existing vector-valued RKHS framework with operator-valued kernels.</description>
      <guid isPermaLink="false">oai:arXiv.org:2405.17823v5</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>math.OA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yuka Hashimoto, Ayoub Hafid, Masahiro Ikeda, Hachem Kadri</dc:creator>
    </item>
    <item>
      <title>Phase transition in large language models and the criticality of natural languages</title>
      <link>https://arxiv.org/abs/2406.05335</link>
      <description>arXiv:2406.05335v3 Announce Type: replace-cross 
Abstract: Generation of text and speech in natural languages can be modeled as a stochastic process. This idea dates back to the seminal work of Markov and, later, to that of Shannon and also underlies the recent development of large language models (LLMs). The stochastic processes corresponding to natural languages should be distinct from those that generate nonlinguistic sequences. One of the features that discriminate linguistic and nonlinguistic sequences is power-law behavior, which is universally observed across different languages. In statistical physics, such behavior suggests that natural languages are critical: They lie near a phase transition point in a parametrized space of stochastic processes. However, testing this conjecture is not straightforward. A phase transition, even if it exists, cannot be directly observed in real-world natural languages because they do not have any controllable parameters. Here, we use LLMs as controllable effective models of natural languages. Through statistical analyses of texts generated by LLMs, we find that, when a parameter analogous to physical temperature is varied, LLMs undergo a phase transition. The transition separates a low-temperature phase with complex repetitive structures in generated texts from a high-temperature phase in which LLMs generate incomprehensible texts. At the critical point between these phases, generated texts display the power-law behavior similar to that of natural languages and most closely resemble natural languages as measured by a standard metric in natural language processing. These findings strongly suggest that natural languages are indeed critical.</description>
      <guid isPermaLink="false">oai:arXiv.org:2406.05335v3</guid>
      <category>cond-mat.dis-nn</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kai Nakaishi, Yoshihiko Nishikawa, Koji Hukushima</dc:creator>
    </item>
    <item>
      <title>SPIRONet: Spatial-Frequency Learning and Graph-based Channel Interaction Network for Vessel Segmentation</title>
      <link>https://arxiv.org/abs/2406.19749</link>
      <description>arXiv:2406.19749v2 Announce Type: replace-cross 
Abstract: Automatic vessel segmentation plays a pivotal role in the development of next-generation interventional navigation systems for surgical robotics. However, current approaches still suffer from suboptimal segmentation performance under challenging intraoperative conditions, such as low-signal-to-noise ratio (SNR), small or slender vessels, and strong interference. In this study, a novel spatial-frequency learning and graph-based channel interaction network (SPIRONet) is proposed to address the above issues. To address low-SNR vessel appearance and small or slender branches, dual spatial-frequency encoders are utilized, where the frequency encoder captures global vessel continuity that is less affected by local noise fluctuations, while the spatial encoder preserves fine vessel details. A cross-attention fusion module is further introduced to adaptively integrate this complementary spatial and frequency information. Moreover, to suppress interference from non-target vessels and vessel-like structures, a graph-based channel interaction module is designed to model channel-wise correlations, enhancing consistent vessel-related responses while suppressing task-irrelevant activations. Extensive experimental results on five challenging datasets demonstrate that the proposed method achieves competitive and consistently strong performance compared with existing methods. For example, SPIRONet achieves IoU improvements of +0.87%, +0.52%, +0.23%, +1.39%, and +2.22% over the strongest competing methods on CADSA, CAXF, DCA1, XCAD, and ARCADE, respectively. Moreover, SPIRONet achieves an inference speed of 21 FPS with a 512x512 input size, meeting the real-time requirements of interventional scenarios (6-12 FPS). These promising results indicate SPIRONet's potential for integration into interventional navigation systems. Code is available at https://github.com/Dxhuang-CASIA/SPIRONet.</description>
      <guid isPermaLink="false">oai:arXiv.org:2406.19749v2</guid>
      <category>eess.IV</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>De-Xing Huang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, Tian-Yu Xiang, Bo-Xian Yao, Zeng-Guang Hou</dc:creator>
    </item>
    <item>
      <title>Entropic Optimal Transport Eigenmaps for Nonlinear Alignment and Joint Embedding of High-Dimensional Datasets</title>
      <link>https://arxiv.org/abs/2407.01718</link>
      <description>arXiv:2407.01718v2 Announce Type: replace-cross 
Abstract: Embedding high-dimensional data into a low-dimensional space is an indispensable component of data analysis. In numerous applications, it is necessary to align and jointly embed multiple datasets from different studies or experimental conditions. Such datasets may share underlying structures of interest but exhibit individual distortions, resulting in misaligned embeddings using traditional techniques. In this work, we propose Entropic Optimal Transport (EOT) eigenmaps, a principled approach for aligning and jointly embedding a pair of datasets with theoretical guarantees. Our approach leverages the leading singular vectors of the EOT plan matrix between two datasets to extract their shared underlying structure and align them in a common embedding space. We interpret our approach as an inter-data variant of the classical Laplacian eigenmaps and diffusion maps embeddings, showing that it enjoys many favorable analogous properties. We analyze a generative model in which two observed high-dimensional datasets share latent variables supported on a common low-dimensional manifold, while each dataset is subject to translation, geometric distortion, orthogonal nuisance structure, and noise. In a large-sample, high-dimensional regime, we prove that the EOT plan concentrates around a population kernel on an effective manifold determined by the geometric mean of the distortions, with invariance to translations, orthogonal nuisance structure, and noise. Subsequently, we relate our embedding to eigenfunctions of population-level operators encoding the density and geometry of the shared manifold. Finally, we showcase the performance of our approach for data integration and embedding through simulations and analyses of real-world biological data, demonstrating its advantages over alternative methods in challenging scenarios.</description>
      <guid isPermaLink="false">oai:arXiv.org:2407.01718v2</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>math.ST</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Boris Landa, Yuval Kluger, Rong Ma</dc:creator>
    </item>
    <item>
      <title>Robust Random Graph Matching in Dense Graphs via an Approximate Message Passing Type Algorithm</title>
      <link>https://arxiv.org/abs/2412.16457</link>
      <description>arXiv:2412.16457v3 Announce Type: replace-cross 
Abstract: In this paper, we focus on the matching recovery problem between a pair of correlated Gaussian Wigner matrices with a latent vertex correspondence. We are particularly interested in a robust version of this problem such that our observation is a perturbed input $(A+E,B+F)$ where $(A,B)$ is a pair of correlated Gaussian Wigner matrices and $E,F$ are adversarially chosen matrices supported on an unknown $\epsilon n \times \epsilon n$ principal minor of $A,B$, respectively. We propose an approximate message passing (AMP) type iterative algorithm that succeeds in polynomial time as long as the correlation $\rho$ between $(A,B)$ is a non-vanishing constant and $\epsilon = o\big( \tfrac{1}{(\log n)^{20}} \big)$. A key distinction from standard AMP is the introduction of a time-dependent matrix multiplication step within the iteration, which simultaneously enlarges the feature dimension and cancels the correlation during the iteration.
  The main methodological inputs for our result are the iterative random graph matching algorithm proposed in \cite{DL22+, DL23+} and the spectral preprocessing procedure proposed in \cite{IS24+}. To the best of our knowledge, our algorithm is the first efficient random graph matching type algorithm that is robust under any adversarial perturbations of $n^{1-o(1)}$ size.</description>
      <guid isPermaLink="false">oai:arXiv.org:2412.16457v3</guid>
      <category>stat.ML</category>
      <category>cs.DS</category>
      <category>cs.LG</category>
      <category>math.PR</category>
      <category>math.ST</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Zhangsong Li</dc:creator>
    </item>
    <item>
      <title>Complement or substitute? How AI increases the demand for human skills</title>
      <link>https://arxiv.org/abs/2412.19754</link>
      <description>arXiv:2412.19754v4 Announce Type: replace-cross 
Abstract: Artificial Intelligence (AI) is transforming the nature of work, yet there is limited empirical evidence on how it affects demand for human skills. This paper examines whether AI adoption increases the prevalence and value of human capabilities that complement technical AI skills, such as analytical thinking, resilience, or ethical judgment, within and beyond AI-intensive job roles. Using a dataset of nearly 30 million job postings from the US, the UK and Australia, between 2018 and 2024, we distinguish between internal effects (within AI roles) and external effects (in non-AI roles) across companies, industries, and regions. This paper has three main findings. First, we find that AI-intensive roles are significantly more likely to require complementary non-technical capabilities, such as analytical thinking, resilience, and digital literacy. Second, these complementary skills are associated with meaningful wage premiums, particularly in managerial, sales or finance roles working with AI. Third, we show that AI diffusion has potential spillover effects: as AI adoption rises within companies, industries, and regions, demand for complementary skills increases even in non-AI roles while demand for substitutable skills - summarisation, translation or customer service - decreases. These trends hold across geographies, including the United States, United Kingdom, and Australia, confirming the robustness of our findings. Together, these findings indicate that AI is not simply replacing tasks or requiring more AI developer skills; it may be transforming workforce skill requirements to favor human attributes that enhance collaboration with intelligent systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2412.19754v4</guid>
      <category>econ.GN</category>
      <category>cs.AI</category>
      <category>q-fin.EC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Elina Mäkelä, Matthew Bone, Mareike Sehrer, Farah Nanji, Fabian Stephany</dc:creator>
    </item>
    <item>
      <title>Structures preserved by primitive actions of $S_\omega$</title>
      <link>https://arxiv.org/abs/2501.03789</link>
      <description>arXiv:2501.03789v4 Announce Type: replace-cross 
Abstract: We present a dichotomy for structures $A$ that are preserved by primitive actions of $S_{\omega} = \text{Sym}({\mathbb N})$: such a structure primitively positively constructs all finite structures and the constraint satisfaction problem is NP-complete, or the constraint satisfaction problem for $A$ is in P. To prove our result, we study the first-order reducts of the Johnson graph $J(k)$, for $k \geq 2$, whose automorphism group $G$ equals the action of $\text{Sym}({\mathbb N})$ on the set $V$ of $k$-element subsets of $\mathbb N$. We use the fact that $J(k)$ has a finitely bounded homogeneous Ramsey expansion and that $G$ is a maximal closed subgroup of $\text{Sym}(V)$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2501.03789v4</guid>
      <category>math.LO</category>
      <category>cs.CC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Manuel Bodirsky, Bertalan Bodor</dc:creator>
    </item>
    <item>
      <title>A generalizable 3D framework and model for self-supervised learning in medical imaging</title>
      <link>https://arxiv.org/abs/2501.11755</link>
      <description>arXiv:2501.11755v2 Announce Type: replace-cross 
Abstract: Current self-supervised learning methods for 3D medical imaging rely on simple pretext formulations and organ- or modality-specific datasets, limiting their generalizability and scalability. We present 3DINO, a cutting-edge SSL method adapted to 3D datasets, and use it to pretrain 3DINO-ViT: a general-purpose medical imaging model, on an exceptionally large, multimodal, and multi-organ dataset of ~100,000 3D medical imaging scans from over 10 organs. We validate 3DINO-ViT using extensive experiments on numerous medical imaging segmentation and classification tasks. Our results demonstrate that 3DINO-ViT generalizes across modalities and organs, including out-of-distribution tasks and datasets, outperforming state-of-the-art methods on the majority of evaluation metrics and labeled dataset sizes. Our 3DINO framework and 3DINO-ViT will be made available to enable research on 3D foundation models or further finetuning for a wide range of medical imaging applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2501.11755v2</guid>
      <category>eess.IV</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1038/s41746-025-02035-w</arxiv:DOI>
      <dc:creator>Tony Xu, Sepehr Hosseini, Chris Anderson, Anthony Rinaldi, Rahul G. Krishnan, Anne L. Martel, Maged Goubran</dc:creator>
    </item>
    <item>
      <title>A two-disk approach to the synthesis of coherent passive equalizers for linear quantum systems</title>
      <link>https://arxiv.org/abs/2502.01332</link>
      <description>arXiv:2502.01332v3 Announce Type: replace-cross 
Abstract: The coherent equalization problem consists in designing a quantum system acting as a mean-square near-optimal filter for a given quantum communication channel. The paper develops an improved method for the synthesis of transfer functions for such equalizing filters, based on a linear quantum system model of the channel. The method draws on a connection with the two-disk problem of ${H}_{\infty}$ control for classical (i.e., non-quantum) linear uncertain systems. Compared with the previous methods, the proposed method applies to a broader class of linear quantum communication channels.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.01332v3</guid>
      <category>quant-ph</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <category>math.OC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Valery Ugrinovskii, Shuixin Xiao</dc:creator>
    </item>
    <item>
      <title>Spectral Methods in Microeconomics</title>
      <link>https://arxiv.org/abs/2502.12309</link>
      <description>arXiv:2502.12309v2 Announce Type: replace-cross 
Abstract: Matrices often appear in formal models of social and economic behavior, especially models involving networks. Such models are used to study subjects ranging from opinion dynamics to pollution-mitigation negotiations to the regulation of large marketplace platforms. Matrices are used to capture the focal economic structure in each case. Spectral theory offers powerful tools for understanding matrices, and economic modelers have leveraged these tools to gain considerable insight. When special structure is present, such as nonnegativity or symmetry, more refined tools suited to this structure -- such as Perron--Frobenius theory and the spectral theorem -- offer additional leverage. This essay uses these unifying mathematical threads to offer an accessible tour of several important ideas in social science, assuming minimal non-mathematical background knowledge. Though the introductions to each topic are necessarily brief, the tour cites references throughout for more context.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.12309v2</guid>
      <category>econ.TH</category>
      <category>cs.SI</category>
      <category>math.HO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Benjamin Golub</dc:creator>
    </item>
    <item>
      <title>Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling</title>
      <link>https://arxiv.org/abs/2502.15131</link>
      <description>arXiv:2502.15131v4 Announce Type: replace-cross 
Abstract: We study the fundamental problem of calibrating a linear binary classifier of the form $\sigma(\hat{w}^\top x)$, where the feature vector $x$ is Gaussian, $\sigma$ is a link function, and $\hat{w}$ is an estimator of the true linear weight $w_\star$. By interpolating with a noninformative $\textit{chance classifier}$, we construct a well-calibrated predictor whose interpolation weight depends on the angle $\angle(\hat{w}, w_\star)$ between the estimator $\hat{w}$ and the true linear weight $w_\star$. We establish that this angular calibration approach is provably well-calibrated in a high-dimensional regime where the number of samples and features both diverge at a comparable rate. The angle $\angle(\hat{w}, w_\star)$ can be consistently estimated. Furthermore, the resulting predictor is uniquely $\textit{Bregman-optimal}$, minimizing the Bregman divergence to the true label distribution within a suitable class of calibrated predictors. Our work is the first to provide a calibration strategy that satisfies both calibration and optimality properties provably in high dimensions. Additionally, we identify conditions under which a classical Platt-scaling predictor converges to our Bregman-optimal calibrated solution. Thus, Platt-scaling also inherits these desirable properties provably in high dimensions.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.15131v4</guid>
      <category>math.ST</category>
      <category>cs.LG</category>
      <category>stat.ME</category>
      <category>stat.ML</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yufan Li, Pragya Sur</dc:creator>
    </item>
    <item>
      <title>Qubit-Efficient Quantum Annealing for Stochastic Unit Commitment</title>
      <link>https://arxiv.org/abs/2502.15917</link>
      <description>arXiv:2502.15917v3 Announce Type: replace-cross 
Abstract: Stochastic Unit Commitment (SUC) has been proposed to manage the uncertainties driven by renewable integration, but it leads to significant computational complexity. When accelerated by Benders Decomposition (BD), the master problem becomes binary integer programming, which is still NP-hard and computationally demanding for classical methods. Quantum Annealing (QA), known for efficiently solving Quadratic Unconstrained Binary Optimization (QUBO) problems, presents a potential solution. However, existing quantum algorithms rely on slack variables to handle linear binary inequality constraints, leading to increased qubit consumption and reduced computational efficiency. To solve the problem, this paper introduces the Powell-Hestenes-Rockafellar Augmented Lagrangian Multiplier (PHR-ALM) method to eliminate the need for slack variables, making qubit consumption independent of the increasing number of Benders cuts. To further reduce the qubit overhead, quantum ADMM is applied to break large-scale SUC into smaller blocks for sequential solutions, which does not scale with the number of generators. Finally, the simulation results on both a 4-generator system and the IEEE 118-bus system demonstrate the feasibility and scalability of the proposed algorithm, indicating its superior qubit and runtime efficiency over classical and baseline quantum approaches on the D-Wave QPU platform.</description>
      <guid isPermaLink="false">oai:arXiv.org:2502.15917v3</guid>
      <category>quant-ph</category>
      <category>cs.ET</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wei Hong, Wangkun Xu, Fei Teng</dc:creator>
    </item>
    <item>
      <title>Brain2Text Decoding Model Reveals the Neural Mechanisms of Visual Semantic Processing</title>
      <link>https://arxiv.org/abs/2503.22697</link>
      <description>arXiv:2503.22697v3 Announce Type: replace-cross 
Abstract: Decoding sensory experiences from neural activity to reconstruct human-perceived visual stimuli and semantic content remains a challenge in neuroscience and artificial intelligence. Despite notable progress in current brain decoding models, a critical gap still persists in their systematic integration with established neuroscientific theories and the exploration of underlying neural mechanisms. Here, we present a novel framework that directly decodes fMRI signals into textual descriptions of viewed natural images. Our novel deep learning model, trained without visual information, achieves state-of-the-art semantic decoding performance, generating meaningful captions that capture the core semantic content of complex scenes. Neuroanatomical analysis reveals the critical role of higher-level visual cortices, including MT+ complex, ventral stream visual cortex, and inferior parietal cortex, in visual semantic processing. Furthermore, category-specific analysis demonstrates nuanced neural representations for semantic dimensions like animacy and motion. This work provides a more direct and interpretable framework to the brain's semantic decoding, offering a powerful new methodology for probing the neural basis of complex semantic processing, refining the understanding of the distributed semantic network, and potentially developing brain-inspired language models.</description>
      <guid isPermaLink="false">oai:arXiv.org:2503.22697v3</guid>
      <category>q-bio.NC</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Feihan Feng, Jingxin Nie</dc:creator>
    </item>
    <item>
      <title>Hyperflux: Pruning Reveals Importance</title>
      <link>https://arxiv.org/abs/2504.05349</link>
      <description>arXiv:2504.05349v4 Announce Type: replace-cross 
Abstract: Network pruning is used to reduce inference latency and power consumption in large neural networks. However, most methods focus on empirical results at the expense of understanding the pruning process. We introduce Hyperflux, a novel $L_0$ method which models pruning as a continuously evolving system determined by flux, the gradient response to a weight's removal, and pressure, a global regularization driving weights toward pruning. By exploiting this model, Hyperflux's pruning behavior becomes understandable at both microscopic (weight regrowth/pruning) and macroscopic (sparsity convergence, etc.) levels. We also introduce a novel pressure scheduler that reliably targets desired sparsities. Hyperflux achieves competitive results with ResNet-50, VGG-19 and DeiT-T/S on CIFAR-10, CIFAR-100 and ImageNet datasets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2504.05349v4</guid>
      <category>stat.ML</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Eugen Barbulescu, Antonio Alexoaie, Lucian Busoniu</dc:creator>
    </item>
    <item>
      <title>Statistical Decision Theory with Counterfactual Loss</title>
      <link>https://arxiv.org/abs/2505.08908</link>
      <description>arXiv:2505.08908v3 Announce Type: replace-cross 
Abstract: Many researchers apply classical statistical decision theory to evaluate treatment choices and learn optimal policies. However, because this framework relies solely on realized outcomes under chosen actions and ignores counterfactuals, it cannot assess the quality of a decision relative to feasible alternatives at the unit level, which is an important requirement in some settings. For example, in pretrial bail decisions, a judge must balance crime prevention upon release against the risk of imposing unnecessary burdens on arrestees. A central challenge in this framework is identification: since only one potential outcome is observed per unit, counterfactual risk is typically not identifiable. We show that, under strong ignorability, counterfactual risk is identifiable if and only if the loss is additive in the potential outcomes. We further demonstrate that additive counterfactual losses can yield treatment recommendations that differ from those based on standard losses when more than two treatment options are available. We show that additive counterfactual losses capture not only decision accuracy but also decision difficulty, whereas standard losses reflect accuracy alone. Finally, we introduce a symbolic linear inverse program that determines whether a given counterfactual loss yields an identifiable risk, without requiring data.</description>
      <guid isPermaLink="false">oai:arXiv.org:2505.08908v3</guid>
      <category>math.ST</category>
      <category>cs.LG</category>
      <category>econ.TH</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Benedikt Koch, Kosuke Imai</dc:creator>
    </item>
    <item>
      <title>On the Wasserstein Geodesic Principal Component Analysis of probability measures</title>
      <link>https://arxiv.org/abs/2506.04480</link>
      <description>arXiv:2506.04480v2 Announce Type: replace-cross 
Abstract: This paper focuses on Geodesic Principal Component Analysis (GPCA) on a collection of probability distributions using the Otto-Wasserstein geometry. The goal is to identify geodesic curves in the space of probability measures that best capture the modes of variation of the underlying dataset. We first address the case of a collection of Gaussian distributions, and show how to lift the computations to the space of invertible linear maps. For the more general setting of absolutely continuous probability measures, we leverage a novel approach to parameterizing geodesics in Wasserstein space with neural networks. Finally, we compare to classical tangent PCA through various examples and provide illustrations on real-world datasets.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.04480v2</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>stat.ME</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Nina Vesseron, Elsa Cazelles, Alice Le Brigant, Thierry Klein</dc:creator>
    </item>
    <item>
      <title>Accurate identification of communication between multiple interacting neural populations</title>
      <link>https://arxiv.org/abs/2506.19094</link>
      <description>arXiv:2506.19094v5 Announce Type: replace-cross 
Abstract: Neural recording technologies now enable simultaneous recording of population activity across many brain regions, motivating the development of data-driven models of inter-regional communication. However, existing models can struggle to disentangle the influences that drive recorded population activity, leading to inaccurate portraits of communication. Here, we introduce Multi-Region Latent Factor Analysis via Dynamical Systems (MR-LFADS), a sequential variational autoencoder designed to disentangle inter-regional communication, inputs from unobserved regions, and local neural population dynamics. We show that MR-LFADS outperforms existing approaches at identifying communication across dozens of simulations of task-trained multi-region networks. When applied to large-scale electrophysiology, MR-LFADS predicts brain-wide effects of circuit perturbations that were held out during model fitting. These validations on synthetic and real neural data position MR-LFADS as a promising tool for discovering principles of brain-wide information processing.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.19094v5</guid>
      <category>q-bio.NC</category>
      <category>cs.CE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <arxiv:journal_reference>Forty-second International Conference on Machine Learning (2025)</arxiv:journal_reference>
      <dc:creator>Belle Liu, Jacob Sacks, Matthew D. Golub</dc:creator>
    </item>
    <item>
      <title>LARP: Learner-Agnostic Robust Data Prefiltering</title>
      <link>https://arxiv.org/abs/2506.20573</link>
      <description>arXiv:2506.20573v4 Announce Type: replace-cross 
Abstract: Public datasets, crucial for modern machine learning and statistical inference, often contain low-quality or contaminated samples that can harm model performance. This creates a need for principled prefiltering procedures that a data provider can apply to protect the accuracy of a range of potential downstream statistical and learning procedures simultaneously. In this work, we formalize and analyze Learner-Agnostic Robust data Prefiltering (LARP), the problem of designing prefiltering procedures with guarantees on the worst-case loss over a pre-specified set of learners. We establish the feasibility of LARP in two theoretical settings, by providing upper-bound guarantees on the worst-case loss. Our theoretical results indicate that protecting heterogeneous learner sets via LARP comes at the price of some performance loss compared to individual, learner-specific prefiltering; we call this gap the price of LARP. To assess this gap in performance, we empirically measure the price of LARP across image and tabular tasks. We further explore potential benefits of LARP from the perspective of saving on repeated data curation efforts, in a game-theoretic model where the downstream learners can split the cost of the single prefiltering.</description>
      <guid isPermaLink="false">oai:arXiv.org:2506.20573v4</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kristian Minchev, Dimitar I. Dimitrov, Nikola Konstantinov</dc:creator>
    </item>
    <item>
      <title>Disentangled Feature Importance</title>
      <link>https://arxiv.org/abs/2507.00260</link>
      <description>arXiv:2507.00260v3 Announce Type: replace-cross 
Abstract: When predictors are statistically dependent, the appropriate definition of feature importance depends on the operational goal. Conditional-incremental measures are well-suited for feature selection, acquisition, and compression, where shared predictive information is treated as redundancy. For post-hoc interpretation, however, the goal is often to attribute predictive signals across correlated measurement channels. We introduce Disentangled Feature Importance (DFI), a population-level attribution framework for this setting. DFI maps covariates to an independent latent representation under a specified entropic optimal transport geometry, computes latent importance, and attributes it back to the original covariates through barycentric sensitivities. We show that broad conditional-incremental FI functionals target conditional incremental predictive value under squared-error loss, and therefore answer a different question from attribution of shared predictive signal under dependence. Under fixed transport cost, reference law, and regularization level, DFI defines a well-specified family of estimands. Latent scores admit a functional ANOVA interpretation, and in the Gaussian linear case, the attributed DFI recovers the classical $R^2$ decomposition for correlated regressors. We derive influence-function-based inference under nuisance-rate and smoothness conditions, and show in simulations and an HIV-1 neutralization-resistance analysis that DFI yields stable, interpretable, uncertainty-quantified attributions of shared predictive signal.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.00260v3</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>math.ST</category>
      <category>stat.ME</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Jin-Hong Du, Kathryn Roeder, Larry Wasserman</dc:creator>
    </item>
    <item>
      <title>AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model</title>
      <link>https://arxiv.org/abs/2507.08920</link>
      <description>arXiv:2507.08920v4 Announce Type: replace-cross 
Abstract: We introduce AMix-1, a powerful protein foundation model built on Bayesian Flow Networks and empowered by a systematic training methodology, encompassing pretraining scaling laws, emergent capability analysis, in-context learning mechanism, and test-time scaling algorithm. To guarantee robust scalability, we establish a predictive scaling law and reveal the progressive emergence of structural understanding via loss perspective, culminating in a strong 1.7-billion model. Building on this foundation, we devise a multiple sequence alignment (MSA)-based in-context learning strategy to unify protein design into a general framework, where AMix-1 recognizes deep evolutionary signals among MSAs and consistently generates structurally and functionally coherent proteins. This framework enables the successful design of a dramatically improved AmeR variant with an up to $50\times$ activity increase over its wild type. Pushing the boundaries of protein engineering, we further empower AMix-1 with an evolutionary test-time scaling algorithm for in silico directed evolution that delivers substantial, scalable performance gains as verification budgets are intensified, laying the groundwork for next-generation lab-in-the-loop protein design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.08920v4</guid>
      <category>q-bio.BM</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Changze Lv, Jiang Zhou, Siyu Long, Lihao Wang, Jiangtao Feng, Dongyu Xue, Yu Pei, Hao Wang, Zherui Zhang, Yuchen Cai, Zhiqiang Gao, Ziyuan Ma, Jiakai Hu, Chaochen Gao, Jingjing Gong, Yuxuan Song, Shuyi Zhang, Xiaoqing Zheng, Deyi Xiong, Lei Bai, Wanli Ouyang, Ya-Qin Zhang, Wei-Ying Ma, Bowen Zhou, Hao Zhou</dc:creator>
    </item>
    <item>
      <title>Generalized cluster algorithms for Potts lattice gauge theory</title>
      <link>https://arxiv.org/abs/2507.13503</link>
      <description>arXiv:2507.13503v2 Announce Type: replace-cross 
Abstract: Monte Carlo algorithms, like the Swendsen-Wang and invaded-cluster, sample the Ising and Potts models asymptotically faster than single-spin Glauber dynamics do. Here, we generalize both algorithms to sample Potts lattice gauge theory by way of a $2$-dimensional cellular representation called the plaquette random-cluster model. The invaded-cluster algorithm targets Potts lattice gauge theory at criticality by implementing a stopping condition defined in terms of homological percolation, the emergence of spanning surfaces on the torus. Simulations for $\mathbb Z(2)$ and $\mathbb Z(3)$ lattice gauge theories on the cubical $4$-dimensional torus indicate that both generalized algorithms exhibit much faster autocorrelation decay than single-spin dynamics and allow for efficient sampling on $4$-dimensional tori of linear scale at least $40$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.13503v2</guid>
      <category>cond-mat.stat-mech</category>
      <category>cs.CG</category>
      <category>math-ph</category>
      <category>math.MP</category>
      <category>math.PR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Anthony E. Pizzimenti, Paul Duncan, Benjamin Schweinhart</dc:creator>
    </item>
    <item>
      <title>Locally Adaptive Conformal Inference for Operator Models</title>
      <link>https://arxiv.org/abs/2507.20975</link>
      <description>arXiv:2507.20975v5 Announce Type: replace-cross 
Abstract: Operator models are regression algorithms between Banach spaces of functions. They have become an increasingly critical tool for spatiotemporal forecasting and physics emulation, especially in high-stakes scenarios where robust, calibrated uncertainty quantification is required. We introduce Local Sliced Conformal Inference (LSCI), a distribution-free framework for generating function-valued, locally adaptive prediction sets for operator models. We prove finite-sample validity and derive a data-dependent upper bound on the coverage gap under local exchangeability. On synthetic Gaussian-process tasks and real applications (air quality monitoring, energy demand forecasting, and weather prediction), LSCI yields tighter sets with stronger adaptivity compared to conformal baselines. We also empirically demonstrate robustness against biased predictions and certain out-of-distribution noise regimes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2507.20975v5</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Trevor Harris, Yan Liu</dc:creator>
    </item>
    <item>
      <title>Decentralized Online Riemannian Optimization Beyond Hadamard Manifolds</title>
      <link>https://arxiv.org/abs/2509.07779</link>
      <description>arXiv:2509.07779v2 Announce Type: replace-cross 
Abstract: We study decentralized online Riemannian optimization over manifolds with possibly positive curvature, going beyond the Hadamard manifold setting. Decentralized optimization techniques rely on a consensus step that is well understood in Euclidean spaces because of their linearity. However, in positively curved Riemannian spaces, a main technical challenge is that geodesic distances may not induce a globally convex structure. In this work, we first analyze a curvature-aware Riemannian consensus step that enables linear convergence beyond Hadamard manifolds. Building on this step, we establish an $O(\sqrt{T})$ regret bound for the decentralized online Riemannian gradient descent algorithm. Then, we investigate the two-point bandit feedback setup, where we employ computationally efficient gradient estimators using smoothing techniques, and we demonstrate the same $O(\sqrt{T})$ regret bound through the subconvexity analysis of smoothed objectives.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.07779v2</guid>
      <category>math.OC</category>
      <category>cs.LG</category>
      <category>cs.MA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Emre Sahinoglu, Shahin Shahrampour</dc:creator>
    </item>
    <item>
      <title>Geometric Analysis of Magnetic Labyrinthine Stripe Evolution via Deep Learning Segmentation</title>
      <link>https://arxiv.org/abs/2509.11485</link>
      <description>arXiv:2509.11485v3 Announce Type: replace-cross 
Abstract: Labyrinthine stripe patterns are common in many physical systems, yet their lack of long-range order makes quantitative characterization challenging. We investigate the evolution of such patterns in bismuth-doped yttrium iron garnet (Bi:YIG) films subjected to a magnetic field annealing protocol. A U-Net deep learning model, trained with synthetic degradations including additive white Gaussian and Simplex noise, enables robust segmentation of experimental magneto-optical images despite noise and occlusions. Building on this segmentation, we develop a geometric analysis pipeline based on skeletonization, graph mapping, and spline fitting, which quantifies local stripe propagation through length and curvature measurements. Applying this framework to 444 images from 12 annealing protocol trials, we analyze the transition from the "quenched" state to a more parallel and coherent "annealed" state, and identify two distinct evolution modes (Type A and Type B) linked to field polarity. Our results provide a quantitative analysis of geometric and topological properties in magnetic stripe patterns, offer new insights into their local structural evolution, and establish a general tool for analyzing complex labyrinthine systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2509.11485v3</guid>
      <category>cond-mat.mtrl-sci</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Vinícius Yu Okubo, Kotaro Shimizu, B. S. Shivaram, Gia-Wei Chern, Hae Yong Kim</dc:creator>
    </item>
    <item>
      <title>Quantum feature-map learning with reduced resource overhead</title>
      <link>https://arxiv.org/abs/2510.03389</link>
      <description>arXiv:2510.03389v2 Announce Type: replace-cross 
Abstract: Current quantum computers require algorithms that use limited resources economically. In quantum machine learning, success hinges on quantum feature-maps, which embed classical data into the state space of qubits. We introduce Quantum Feature-Map Learning via Analytic Iterative Reconstructions (Q-FLAIR), an algorithm that reduces quantum resource overhead in iterative feature-map circuit construction. It shifts workloads to a classical computer via partial analytic reconstructions of the quantum model, using only a few evaluations. For each probed gate addition to the ansatz, the simultaneous selection and optimization of the data feature and weight parameter is then entirely classical. Integrated into quantum neural network and quantum kernel support vector classifiers, Q-FLAIR shows state-of-the-art benchmark performance. Since resource overhead decouples from feature dimension, we train a quantum model on a real IBM device in only four hours, surpassing 90% accuracy on the full-resolution MNIST dataset (784 features, digits 3 vs 5). Such results were previously unattainable, as the feature dimension prohibitively drives hardware demands for fixed and search costs for adaptive ansätze. Furthermore, Q-FLAIR demonstrates de-quantization robustness against direct classical modeling, satisfying a benchmark rare in the literature and a necessary condition for potential quantum advantage. By rethinking feature-map learning beyond black-box optimization, this work takes a concrete step toward enabling quantum machine learning for real-world problems and near-term quantum computers.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.03389v2</guid>
      <category>quant-ph</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <arxiv:DOI>10.1103/v29j-rh32</arxiv:DOI>
      <arxiv:journal_reference>Phys. Rev. Research 8(2), 023247 (2026)</arxiv:journal_reference>
      <dc:creator>Jonas Jäger, Philipp Elsässer, Elham Torabian</dc:creator>
    </item>
    <item>
      <title>UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models</title>
      <link>https://arxiv.org/abs/2510.04593</link>
      <description>arXiv:2510.04593v3 Announce Type: replace-cross 
Abstract: Large language models (LLMs) have demonstrated promising performance in both automatic speech recognition (ASR) and text-to-speech (TTS) systems, gradually becoming the mainstream approach. However, most current approaches address these tasks separately rather than through a unified framework. This work aims to integrate these two tasks into one unified model. Although discrete speech tokenization enables joint modeling, its inherent information loss limits performance in both recognition and generation. In this work, we present UniVoice, a unified LLM framework through continuous representations that seamlessly integrates speech recognition and synthesis within a single model. Our approach combines the strengths of autoregressive modeling for speech recognition with flow matching for high-quality generation. To mitigate the inherent divergence between autoregressive and flow-matching models, we further design a dual attention mechanism, which switches between a causal mask for recognition and a bidirectional attention mask for synthesis. Furthermore, the proposed text-prefix-conditioned speech infilling method enables high-fidelity zero-shot voice cloning. Experimental results demonstrate that our method can achieve or exceed current single-task modeling methods in both ASR and zero-shot TTS tasks. This work explores new possibilities for end-to-end speech understanding and generation. Code is available at https://github.com/gwh22/UniVoice.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.04593v3</guid>
      <category>eess.AS</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wenhao Guan, Zhikang Niu, Ziyue Jiang, Kaidi Wang, Peijie Chen, Qingyang Hong, Lin Li, Xie Chen</dc:creator>
    </item>
    <item>
      <title>Dendrograms of Mixing Measures for Softmax-Gated Gaussian Mixture of Experts: Consistency Without Model Sweeps</title>
      <link>https://arxiv.org/abs/2510.12744</link>
      <description>arXiv:2510.12744v2 Announce Type: replace-cross 
Abstract: We develop a unified statistical framework for softmax-gated Gaussian mixture of experts (SGMoE) that addresses three long-standing obstacles in parameter estimation and model selection: (i) non-identifiability of gating parameters up to common translations, (ii) intrinsic gate-expert interactions that induce coupled differential relations in the likelihood, and (iii) the tight numerator-denominator coupling in the softmax-induced conditional density. Our approach introduces Voronoi-type loss functions aligned with the gate-partition geometry and establishes finite-sample convergence rates for the maximum likelihood estimator (MLE). In over-specified models, we reveal a link between the MLE's convergence rate and the solvability of an associated system of polynomial equations characterizing near-nonidentifiable directions. For model selection, we adapt dendrograms of mixing measures to SGMoE, yielding a consistent, sweep-free selector of the number of experts that attains pointwise-optimal parameter rates under overfitting while avoiding multi-size training. Simulations on synthetic data corroborate the theory, accurately recovering the expert count and achieving the predicted rates for parameter estimation while closely approximating the regression function. Under model misspecification (e.g., $\epsilon$-contamination), the dendrogram selection criterion is robust, recovering the true number of mixture components, while the Akaike information criterion, the Bayesian information criterion, and the integrated completed likelihood tend to overselect as sample size grows. On a maize proteomics dataset of drought-responsive traits, our dendrogram-guided SGMoE selects two experts, exposes a clear mixing-measure hierarchy, stabilizes the likelihood early, and yields interpretable genotype-phenotype maps, outperforming standard criteria without multi-size training.</description>
      <guid isPermaLink="false">oai:arXiv.org:2510.12744v2</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>math.ST</category>
      <category>stat.CO</category>
      <category>stat.ME</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Do Tien Hai, Trung Nguyen Mai, TrungTin Nguyen, Nhat Ho, Binh T. Nguyen, Christopher Drovandi</dc:creator>
    </item>
    <item>
      <title>The Value of Personalized Recommendations: Evidence from Netflix</title>
      <link>https://arxiv.org/abs/2511.07280</link>
      <description>arXiv:2511.07280v5 Announce Type: replace-cross 
Abstract: Personalized recommendation systems shape much of user choice online, yet their targeted nature makes separating out the value of recommendation and the underlying goods challenging. We build a discrete choice model that embeds recommendation-induced utility, low-rank heterogeneity, and flexible state dependence and apply the model to viewership data at Netflix. We exploit idiosyncratic variation introduced by the recommendation algorithm to identify and separately value these components as well as to recover model-free diversion ratios that we can use to validate our structural model. We use the model to evaluate counterfactuals that quantify the incremental engagement generated by personalized recommendations. First, we show that replacing the current recommender system with a matrix factorization or popularity-based algorithm would reduce engagement by 4% and 12%, respectively, and decrease consumption diversity. Second, most of the consumption increase from recommendations comes from effective targeting, not mechanical exposure, with the largest gains for mid-popularity goods (as opposed to broadly appealing or very niche goods).</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.07280v5</guid>
      <category>econ.GN</category>
      <category>cs.IR</category>
      <category>cs.LG</category>
      <category>q-fin.EC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kevin Zielnicki, Guy Aridor, Aurélien Bibaut, Allen Tran, Winston Chou, Nathan Kallus</dc:creator>
    </item>
    <item>
      <title>SAGE: Shape-Adapting Gated Experts for Adaptive Histopathology Image Segmentation</title>
      <link>https://arxiv.org/abs/2511.18493</link>
      <description>arXiv:2511.18493v4 Announce Type: replace-cross 
Abstract: The significant variability in cell size and shape continues to pose a major obstacle in computer-assisted cancer detection on gigapixel Whole Slide Images (WSIs), due to cellular heterogeneity. Current CNN-Transformer hybrids use static computation graphs with fixed routing. This leads to extra computation and makes it harder to adapt to changes in input. We propose Shape-Adapting Gated Experts (SAGE), an input-adaptive framework that enables dynamic expert routing in heterogeneous visual networks. SAGE reconfigures static backbones into dynamically routed expert architectures via a dual-path design with hierarchical gating and a Shape-Adapting Hub (SA-Hub) that harmonizes feature representations across convolutional and transformer modules. Embodied as SAGE with ConvNeXt and Vision Transformer UNet (SAGE-ConvNeXt+ViT-UNet), our model achieves a Dice score of 95.23% on EBHI, DSC scores of 92.78% and 91.42% on GlaS Test A and Test B, respectively, and 91.26% DSC at the WSI level on DigestPath, while exhibiting robust generalization under distribution shifts by adaptively balancing local refinement and global context. SAGE establishes a scalable foundation for dynamic expert routing in visual networks, thereby facilitating flexible visual reasoning. Project page: https://oxyzgiahuy.github.io/sage/</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.18493v4</guid>
      <category>eess.IV</category>
      <category>cs.AI</category>
      <category>cs.CV</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Gia Huy Thai, Hoang-Nguyen Vu, Anh-Minh Phan, Quang-Thinh Ly, Thi-Ngoc-Truc Nguyen, Nhat Ho</dc:creator>
    </item>
    <item>
      <title>Quasi-symmetric nets: A constructive approach to the equimodular elliptic type of Kokotsakis polyhedra</title>
      <link>https://arxiv.org/abs/2511.19376</link>
      <description>arXiv:2511.19376v2 Announce Type: replace-cross 
Abstract: A Kokotsakis polyhedron is a polyhedral mesh in three-dimensional Euclidean space formed by a central n-gonal face (the base), n quadrilateral faces each sharing one edge with the base, and n triangular faces inserted between every two adjacent quadrilaterals; it is called flexible if it admits a continuous deformation that preserves the rigidity of every face. This work investigates flexible Kokotsakis polyhedra with a quadrangular base (n = 4) of equimodular elliptic type, filling a significant gap in the literature by providing the first explicit constructions of this type together with an explicit algebraic characterization in terms of flat and dihedral angles. A straightforwardly constructible class of polyhedra - called quasi-symmetric nets (QS-nets) - is introduced, characterized by a symmetry relation among flat angles. It is shown that every elliptic QS-net has equimodular elliptic type and is flexible in real three-dimensional Euclidean space (rather than only in complex configuration spaces), except for a few exceptional choices of dihedral angles, and that its flexion admits a closed-form parameterization. Examples are constructed that are non-self-intersecting and belong exclusively to the equimodular elliptic type. To support applications in computational geometry, a numerical pipeline is developed that searches for candidate solutions, verifies them using the explicit algebraic characterization, and constructs and visualizes the resulting polyhedra; numerical validations achieve high precision. Taken together, these results provide constructive criteria, algorithms, and validated examples for the equimodular elliptic type, enabling the design of a broad range of flexible Kokotsakis mechanisms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2511.19376v2</guid>
      <category>math.MG</category>
      <category>cs.CG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <arxiv:DOI>10.1016/j.cad.2026.104102</arxiv:DOI>
      <arxiv:journal_reference>Computer-Aided Design 199 (2026) 104102</arxiv:journal_reference>
      <dc:creator>A. Nurmatov, M. Skopenkov, F. Rist, J. Klein, D. L. Michels</dc:creator>
    </item>
    <item>
      <title>Closed-form Solution of Wahba's Problem for Pairwise Similar Quaternions</title>
      <link>https://arxiv.org/abs/2512.07597</link>
      <description>arXiv:2512.07597v3 Announce Type: replace-cross 
Abstract: Wahba's problem is fundamental to spacecraft attitude estimation, seeking the optimal rotation that minimizes the weighted misalignment between sets of vector observations. Traditional solvers, including Davenport's $q$-method, QUEST, and ESOQ, reformulate the problem as an eigenvalue task for a $4 \times 4$ symmetric matrix, a process that obscures the underlying algebraic structure of the solution. This paper presents a novel, entirely quaternion-based closed-form solution for pairwise similar quaternions. By establishing a direct connection to the homogeneous singular Sylvester equation, we (i) derive the necessary and sufficient condition for the existence of a quaternion that achieves zero Wahba's cost; (ii) provide a closed-form analytic expression for the corresponding solution set; and (iii) propose the computationally efficient and numerically stable Minimal Analytic Rotation Algorithm (MARA). Computational complexity analysis demonstrates that MARA achieves a $35.11\%$ reduction in total floating-point operations (FLOPs) compared to the state-of-the-art ESOQ2 algorithm. Numerical validation via $10^6$ Monte Carlo trials confirms that MARA achieves higher accuracy than established optimal solvers under stochastic noise, offering a computationally more efficient and analytically transparent alternative for high-frequency attitude determination systems.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.07597v3</guid>
      <category>math.RA</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Hristina Radak, Christian Scheunert, Frank H. P. Fitzek</dc:creator>
    </item>
    <item>
      <title>Unambiguous Representations in Neural Networks: An Information-Theoretic Approach to Intentionality</title>
      <link>https://arxiv.org/abs/2512.11000</link>
      <description>arXiv:2512.11000v2 Announce Type: replace-cross 
Abstract: Representations pervade our daily experience, from letters representing sounds to bit strings encoding digital files. While such representations require externally defined decoders to convey meaning, conscious experience is fundamentally different: a neural state corresponding to perceiving a red square cannot alternatively encode the experience of a green triangle. This intrinsic property of consciousness suggests that conscious representations must be unambiguous in a way that conventional representations are not. We formalize this intuition using information theory, defining representational ambiguity as the conditional entropy H(I|R) over possible interpretations I given a representation R. Through experiments on neural networks trained to classify MNIST digits, we demonstrate that relational structures in network connectivity can unambiguously encode representational content. From relational structure alone, we achieve perfect (100%) accuracy for dropout-trained networks and 38% for standard backpropagation (chance: 10%) in identifying output neuron class identity, despite identical task performance, demonstrating that representational ambiguity can arise orthogonally to behavioral accuracy. We further show that spatial position of input neurons, relevant to phenomenal properties like visual field location, can be decoded from network connectivity with R^2 up to 0.844. These results provide a quantitative method for measuring representational ambiguity in neural systems and demonstrate that neural networks can exhibit the low-ambiguity representations posited as necessary (though not sufficient) by theoretical accounts such as narrow representationalism and IIT.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.11000v2</guid>
      <category>q-bio.NC</category>
      <category>cs.AI</category>
      <category>cs.NE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Francesco Lässig</dc:creator>
    </item>
    <item>
      <title>A Kronecker algorithm for locally closed sets over a perfect field</title>
      <link>https://arxiv.org/abs/2512.14888</link>
      <description>arXiv:2512.14888v2 Announce Type: replace-cross 
Abstract: We develop a probabilistic algorithm of Kronecker type for computing a Kronecker representation of a zero-dimensional linear section of an algebraic variety $V$ defined over a perfect field $k$. The variety $V$ is the Zariski closure of the set of common zeros $\{F_1=0,\ldots,F_r=0,G\not=0\}$ of multivariate polynomials $F_1,\ldots,F_r\in k[X_1,\ldots,X_n]$ outside a prescribed hypersurface $\{G=0\}$. We assume that $F_1,\ldots,F_r$ satisfy natural geometric conditions, such as regularity and radicality, in the local ring $k[X_1,\ldots,X_n]_G$. Our approach combines homotopic deformation techniques with symbolic Newton-Hensel lifting and elimination. We discuss the concept of lifting curves as intermediate geometric objects that enable efficient computation.
  The complexity of the algorithm is expressed in terms of the degrees and arithmetic size of the input and achieves soft-quadratic complexity in these parameters. We provide detailed complexity analyses for arbitrary perfect fields, as well as for two important cases in computer algebra: finite fields and the field of rational numbers. For each case, we obtain sharp bounds on the size of the base field or required primes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.14888v2</guid>
      <category>math.AG</category>
      <category>cs.SC</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Nardo Giménez, Joos Heintz, Guillermo Matera, Luis Miguel Pardo, Mariana Pérez, Melina Privitelli</dc:creator>
    </item>
    <item>
      <title>Uniform Interpolation</title>
      <link>https://arxiv.org/abs/2512.15391</link>
      <description>arXiv:2512.15391v3 Announce Type: replace-cross 
Abstract: Uniform interpolation is a strengthening of interpolation that holds for certain propositional logics. The starting point of this chapter is a theorem of A. Pitts, which shows that uniform interpolation holds for intuitionistic propositional logic. We outline how this theorem may be proved semantically via the definability of bisimulation quantifiers, and how it generalizes to an open mapping theorem between Esakia spaces. We also discuss connections between uniform interpolation and research in categorical logic, algebra, and model theory.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.15391v3</guid>
      <category>math.LO</category>
      <category>cs.LO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Sam van Gool</dc:creator>
    </item>
    <item>
      <title>Exploring the Effect of Basis Rotation on NQS Performance</title>
      <link>https://arxiv.org/abs/2512.17893</link>
      <description>arXiv:2512.17893v2 Announce Type: replace-cross 
Abstract: Neural Quantum States (NQS) are powerful variational representations of quantum many-body wavefunctions, yet their performance depends sensitively on the chosen basis. Using an exactly solvable one-dimensional Ising model, we show that local basis rotations leave the minimization landscape unchanged while relocating the exact ground state in parameter space. This provides a controlled framework to disentangle representational limitations from optimization-induced trainability effects. This geometric displacement, quantified through information-geometric measures, can steer optimization of shallow architectures toward saddle points and high-curvature regions. As a result, low energy errors may coexist with an incorrect wavefunction structure. By comparing energy and infidelity optimization within the same variational architectures, we show that optimization failure can persist even when the rotated target state remains representable. Our results identify a geometric mechanism contributing to basis dependence in NQS and motivate landscape-aware variational design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.17893v2</guid>
      <category>quant-ph</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Sven Benjamin Kožić, Vinko Zlatić, Fabio Franchini, Salvatore Marco Giampaolo</dc:creator>
    </item>
    <item>
      <title>GenTSE: Enhancing Target Speaker Extraction via a Coarse-to-Fine Generative Language Model</title>
      <link>https://arxiv.org/abs/2512.20978</link>
      <description>arXiv:2512.20978v2 Announce Type: replace-cross 
Abstract: Language Model (LM)-based generative modeling has emerged as a promising direction for TSE, offering potential for improved generalization and high-fidelity speech. We propose GenTSE, a two-stage decoder-only generative LM for TSE: Stage-1 predicts coarse semantic tokens, and Stage-2 generates fine acoustic tokens. Separating semantics and acoustics stabilizes decoding and yields more accurate target speech. Both stages use continuous SSL or codec embeddings, offering richer context than discretized-prompt methods. To reduce exposure bias, we employ a Frozen-LM Conditioning training strategy that conditions the LMs on predicted tokens from earlier checkpoints to reduce the gap between teacher-forcing training and autoregressive inference. We further apply DPO to better align outputs with perceptual preferences. Experiments on Libri2Mix show that GenTSE surpasses previous LM-based systems in speech quality, intelligibility, and speaker consistency.</description>
      <guid isPermaLink="false">oai:arXiv.org:2512.20978v2</guid>
      <category>eess.AS</category>
      <category>cs.AI</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haoyang Li, Xuyi Zhuang, Azmat Adnan, Ye Ni, Wei Rao, Shreyas Gopal, Eng Siong Chng, Boon Siew Han, Yuanjin Zheng</dc:creator>
    </item>
    <item>
      <title>Supracompetitive Pricing Under AI Monoculture</title>
      <link>https://arxiv.org/abs/2601.01279</link>
      <description>arXiv:2601.01279v3 Announce Type: replace-cross 
Abstract: When competing sellers delegate pricing to a shared AI model, such as a large language model, correlated recommendations combined with performance-driven updates aggregating seller feedback raise a key question: can standard AI deployment practices inadvertently produce supracompetitive pricing? We develop a stylized duopoly model in which two sellers receive pricing recommendations from a shared AI characterized by two parameters: a propensity parameter capturing the model's tendency to set high prices and an output-fidelity parameter measuring alignment between this tendency and actual outputs, with propensity updated via periodic retraining on observed outcomes. We find that configuring AI models for robustness and reproducibility can lead to supracompetitive pricing via a phase transition. Below a critical output-fidelity threshold, competitive pricing is the unique stable outcome. Above it, the model exhibits bistability: both competitive and supracompetitive pricing are locally stable, with the realized outcome determined by the model's initial propensity. Supracompetitive pricing raises average prices, but occasional low-price recommendations complicate detection. With perfect output fidelity, full price coordination emerges from any interior initial propensity. For finite training batches of size $b$, when the initial propensity lies in the supracompetitive basin, the probability of supracompetitive pricing approaches 1 as $b$ increases, with the region of indeterminate outcomes shrinking at rate $O(1/\sqrt{b})$. Any factor reducing alignment between the model's propensity and sellers' actual pricing, whether through diversifying AI providers, introducing recommendation noise, or reducing seller adherence, pushes the market toward competitive outcomes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.01279v3</guid>
      <category>econ.TH</category>
      <category>cs.AI</category>
      <category>cs.CE</category>
      <category>cs.CL</category>
      <category>cs.GT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Shengyu Cao, Ming Hu</dc:creator>
    </item>
    <item>
      <title>A large-scale nanocrystal database with aligned synthesis and properties enabling generative inverse design</title>
      <link>https://arxiv.org/abs/2601.02424</link>
      <description>arXiv:2601.02424v2 Announce Type: replace-cross 
Abstract: The synthesis of nanocrystals has been highly dependent on trial-and-error, due to the complex correlation between synthesis parameters and physicochemical properties. Although deep learning offers a potential methodology to achieve generative inverse design, it is still hindered by the scarcity of high-quality datasets that align nanocrystal synthesis routes with their properties. Here, we present the construction of a large-scale, aligned Nanocrystal Synthesis-Property (NSP) database and demonstrate its capability for generative inverse design. To extract structured synthesis routes and their corresponding product properties from literature, we develop NanoExtractor, a large language model (LLM) enhanced by well-designed augmentation strategies. NanoExtractor is validated against human experts, achieving a weighted average score of 88% on the test set, significantly outperforming chemistry-specialized (3%) and general-purpose LLMs (38%). The resulting NSP database contains nearly 160,000 aligned entries and serves as training data for our NanoDesigner, an LLM for inverse synthesis design. The generative capability of NanoDesigner is validated through the successful design of viable synthesis routes for both well-established PbSe nanocrystals and rarely reported MgF2 nanocrystals. Notably, the model recommends a counter-intuitive, non-stoichiometric precursor ratio (1:1) for MgF2 nanocrystals, which is experimentally confirmed as critical for suppressing byproducts. Our work bridges the gap between unstructured literature and data-driven synthesis, and also establishes a powerful human-AI collaborative paradigm for accelerating nanocrystal discovery.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.02424v2</guid>
      <category>cond-mat.mtrl-sci</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Kai Gu, Yingping Liang, Senliang Peng, Aotian Guo, Haizheng Zhong, Ying Fu</dc:creator>
    </item>
    <item>
      <title>Sound Event Detection with Boundary-Aware Optimization and Inference</title>
      <link>https://arxiv.org/abs/2601.04178</link>
      <description>arXiv:2601.04178v2 Announce Type: replace-cross 
Abstract: Temporal detection problems appear in many fields including time-series estimation, activity recognition and sound event detection (SED). In this work, we propose a new approach to temporal event modeling by explicitly modeling event onsets and offsets, and by introducing boundary-aware optimization and inference strategies that substantially enhance temporal event detection. The presented methodology incorporates new temporal modeling layers - Recurrent Event Detection (RED) and Event Proposal Network (EPN) - which, together with tailored loss functions, enable more effective and precise temporal event detection. We evaluate the proposed method in the SED domain using a subset of the temporally-strongly annotated portion of AudioSet. Experimental results show that our approach not only outperforms traditional frame-wise SED models with state-of-the-art post-processing, but also removes the need for post-processing hyperparameter tuning, and scales to achieve new state-of-the-art performance across all AudioSet Strong classes.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.04178v2</guid>
      <category>eess.AS</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Florian Schmid, Chi Ian Tang, Sanjeel Parekh, Vamsi Krishna Ithapu, Juan Azcarreta Ortiz, Giacomo Ferroni, Yijun Qian, Arnoldas Jasonas, Cosmin Frateanu, Camilla Clark, Gerhard Widmer, Çağdaş Bilen</dc:creator>
    </item>
    <item>
      <title>Conditional Normalizing Flows for Forward and Backward Joint State and Parameter Estimation</title>
      <link>https://arxiv.org/abs/2601.07013</link>
      <description>arXiv:2601.07013v2 Announce Type: replace-cross 
Abstract: Traditional filtering algorithms for state estimation -- such as classical Kalman filtering, unscented Kalman filtering, and particle filters -- show performance degradation when applied to nonlinear systems whose uncertainty follows arbitrary non-Gaussian and potentially multi-modal distributions. This study reviews recent approaches to state estimation via nonlinear filtering based on conditional normalizing flows, where the conditional embedding is generated by standard MLP architectures, transformers or selective state-space models (like Mamba-SSM). In addition, we test the effectiveness of an optimal-transport-inspired kinetic loss term in mitigating overparameterization in flows consisting of a large collection of transformations. We investigate the performance of these approaches on applications relevant to autonomous driving and patient population dynamics, paying special attention to how they handle time inversion and chained predictions. Finally, we assess the performance of various conditioning strategies for an application to real-world COVID-19 joint SIR system forecasting and parameter estimation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.07013v2</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Luke S. Lagunowich, Guoxiang Grayson Tong, Daniele E. Schiavazzi</dc:creator>
    </item>
    <item>
      <title>Logarithmic Density of Rank $\geq 1$ and Rank $\geq 2$ Genus-2 Jacobians and Applications to Hyperelliptic Curve Cryptography</title>
      <link>https://arxiv.org/abs/2601.17142</link>
      <description>arXiv:2601.17142v2 Announce Type: replace-cross 
Abstract: In this work we study quantitative existence results for genus-$2$ curves over $\mathbb{Q}$ whose Jacobians have Mordell--Weil rank at least $1$ or $2$, ordering the curves by the naive height of their integral Weierstrass models. We use geometric techniques to show that asymptotically the Jacobians of almost all integral models with two rational points at infinity have rank $r \geq 1$. Since there are $\asymp X^{\frac{13}{2}}$ such models among the $X^7$ curves $y^2=f(x)$ of height at most $X$, this yields a lower bound of logarithmic density $13/14$ for the subset of such curves whose Jacobians have rank at least $1$. We further present a large explicit subfamily of genus-$2$ curves, ordered by height as above, for which the Jacobians have rank $r \geq 2$, yielding an unconditional logarithmic density of at least $5/7$. Independently, we give a construction of genus-$2$ curves with split Jacobian and rank at least $2$, producing a subfamily of logarithmic density at least $2/21$. Finally, we analyze quadratic and biquadratic twist families in the split-Jacobian setting, obtaining a positive proportion of rank-$2$ twists. These results have implications for Regev's quantum algorithm in hyperelliptic curve cryptography.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.17142v2</guid>
      <category>math.NT</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Razvan Barbulescu, Mugurel Barcau, Vicentiu Pasol, George C. Turcas</dc:creator>
    </item>
    <item>
      <title>Solving Inverse Problems with Flow-based Models via Model Predictive Control</title>
      <link>https://arxiv.org/abs/2601.23231</link>
      <description>arXiv:2601.23231v2 Announce Type: replace-cross 
Abstract: Flow-based generative models provide strong unconditional priors for inverse problems, but guiding their dynamics for conditional generation remains challenging. Recent work casts training-free conditional generation in flow models as an optimal control problem; however, solving the resulting trajectory optimisation is computationally and memory intensive, requiring differentiation through the flow dynamics or adjoint solves. We propose MPC-Flow, a model predictive control framework that formulates inverse problem solving with flow-based generative models as a sequence of control sub-problems, enabling practical optimal control-based guidance at inference time. We provide theoretical analysis linking MPC-Flow to the underlying optimal control objective and show how different algorithmic choices yield a spectrum of guidance algorithms, including regimes that avoid backpropagation through the generative model trajectory. We evaluate MPC-Flow on benchmark image restoration tasks, spanning linear and non-linear settings such as in-painting, deblurring, and super-resolution, and demonstrate strong performance and scalability to massive state-of-the-art architectures via training-free guidance of FLUX.2 (32B) in a quantised setting on consumer hardware.</description>
      <guid isPermaLink="false">oai:arXiv.org:2601.23231v2</guid>
      <category>eess.IV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>George Webber, Alexander Denker, Riccardo Barbano, Andrew J Reader</dc:creator>
    </item>
    <item>
      <title>Zero-Flow Encoders</title>
      <link>https://arxiv.org/abs/2602.00797</link>
      <description>arXiv:2602.00797v3 Announce Type: replace-cross 
Abstract: Flow-based methods have achieved significant success in various generative modeling tasks, capturing nuanced details within complex data distributions. However, few existing works have exploited this unique capability to resolve fine-grained structural details beyond generation tasks. This paper presents a flow-inspired framework for representation learning. First, we demonstrate that a rectified flow trained using independent coupling is zero everywhere at $t=0.5$ if and only if the source and target distributions are identical. We term this property the \emph{zero-flow criterion}. Second, we show that this criterion can certify conditional independence, thereby extracting \emph{sufficient information} from the data. Third, we translate this criterion into a tractable, simulation-free loss function that enables learning amortized Markov blankets in graphical models and latent representations in self-supervised learning tasks. Experiments on both simulated and real-world datasets demonstrate the effectiveness of our approach. The code reproducing our experiments can be found at: https://github.com/probabilityFLOW/zfe.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.00797v3</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yakun Wang, Leyang Wang, Song Liu, Taiji Suzuki</dc:creator>
    </item>
    <item>
      <title>Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning</title>
      <link>https://arxiv.org/abs/2602.02431</link>
      <description>arXiv:2602.02431v2 Announce Type: replace-cross 
Abstract: It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. While this phenomenon has been extensively studied in linear regression, the benefit of multi-pass gradient descent (GD, which reuses all the data) over one-pass stochastic gradient descent (online SGD, which uses each data point only once) is not well-understood in nonlinear and non-convex settings, except for a loss modification mechanism achieved by the first two passes on the data. In this work, we consider learning a $d$-dimensional single-index model with a quadratic activation, for which it is known that one-pass SGD requires $n\gtrsim d\log d$ samples to achieve weak recovery. We first show that this $\log d$ factor in the sample complexity persists for full-batch spherical GD on the correlation loss; however, by simply truncating the activation, full-batch GD exhibits a favorable optimization landscape at $n \simeq d$ samples, thereby outperforming one-pass SGD (with the same activation) in statistical efficiency. We complement this result with a trajectory analysis of full-batch GD on the squared loss from small initialization, showing that $n \gtrsim d$ samples and $T \gtrsim\log d$ gradient steps suffice to achieve strong (exact) recovery.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.02431v2</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Filip Kovačević, Hong Chang Ji, Denny Wu, Mahdi Soltanolkotabi, Marco Mondelli</dc:creator>
    </item>
    <item>
      <title>Improved Analysis of the Accelerated Noisy Power Method with Applications to Decentralized PCA</title>
      <link>https://arxiv.org/abs/2602.03682</link>
      <description>arXiv:2602.03682v2 Announce Type: replace-cross 
Abstract: We analyze the Accelerated Noisy Power Method, an algorithm for Principal Component Analysis in the setting where only inexact matrix-vector products are available, which can arise for instance in decentralized PCA. While previous works have established that acceleration can improve convergence rates compared to the standard Noisy Power Method, these guarantees require overly restrictive upper bounds on the magnitude of the perturbations, limiting their practical applicability. We provide an improved analysis of this algorithm, which preserves the accelerated convergence rate under much milder conditions on the perturbations. We show that our new analysis is worst-case optimal, in the sense that the convergence rate cannot be improved, and that the noise conditions we derive cannot be relaxed without sacrificing convergence guarantees. We demonstrate the practical relevance of our results by deriving an accelerated algorithm for decentralized PCA, which has similar communication costs to non-accelerated methods. To our knowledge, this is the first decentralized algorithm for PCA with provably accelerated convergence.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.03682v2</guid>
      <category>stat.ML</category>
      <category>cs.DC</category>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pierre Aguié, Mathieu Even, Laurent Massoulié</dc:creator>
    </item>
    <item>
      <title>Performative Learning Theory</title>
      <link>https://arxiv.org/abs/2602.04402</link>
      <description>arXiv:2602.04402v3 Announce Type: replace-cross 
Abstract: Performative predictions influence the very outcomes they aim to forecast. We study performative predictions that affect a sample (e.g., only existing users of an app) and/or the whole population (e.g., all potential app users). This raises the question of how well models generalize under performativity. For example, how well can we draw insights about new app users based on existing users when both of them react to the app's predictions? We address this question by embedding performative predictions into statistical learning theory. We prove generalization bounds under performative effects on the sample, on the population, and on both. A key intuition behind our proofs is that in the worst case, the population negates predictions, while the sample deceptively fulfills them. We cast such self-negating and self-fulfilling predictions as min-max and min-min risk functionals in Wasserstein space, respectively. Our analysis reveals a fundamental trade-off between performatively changing the world and learning from it: the more a model affects data, the less it can learn from it. Moreover, our analysis results in a surprising insight on how to improve generalization guarantees by retraining on performatively distorted samples. We illustrate our bounds in a case study on prediction-informed assignments of unemployed German residents to job trainings, drawing upon administrative labor market records from 1975 to 2017 in Germany.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.04402v3</guid>
      <category>stat.ML</category>
      <category>cs.AI</category>
      <category>cs.CY</category>
      <category>cs.LG</category>
      <category>math.ST</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Julian Rodemann, Unai Fischer-Abaigar, James Bailie, Krikamol Muandet</dc:creator>
    </item>
    <item>
      <title>Wedge Sampling: Efficient Tensor Completion with Nearly-Linear Sample Complexity</title>
      <link>https://arxiv.org/abs/2602.05869</link>
      <description>arXiv:2602.05869v2 Announce Type: replace-cross 
Abstract: We introduce Wedge Sampling, a new non-adaptive sampling scheme for low-rank tensor completion. We study recovery of an order-$k$ low-rank tensor of dimension $n \times \cdots \times n$ from a subset of its entries. Unlike the standard uniform entry model (i.e., i.i.d. samples from $[n]^k$), wedge sampling allocates observations to structured length-two patterns (wedges) in an associated bipartite sampling graph. By directly promoting these length-two connections, the sampling design strengthens the spectral signal that underlies efficient initialization, in regimes where uniform sampling is too sparse to generate enough informative correlations.
  Our main result shows that this change in sampling paradigm enables polynomial-time algorithms to achieve both weak and exact recovery with nearly linear sample complexity in $n$. The approach is also plug-and-play: wedge-sampling-based spectral initialization can be combined with existing refinement procedures (e.g., spectral or gradient-based methods) using only an additional $\tilde{O}(n)$ uniformly sampled entries, substantially improving over the $\tilde{O}(n^{k/2})$ sample complexity typically required under uniform entry sampling for efficient methods. Overall, our results suggest that the statistical-to-computational gap highlighted in Barak and Moitra (2022) is, to a large extent, a consequence of the uniform entry sampling model for tensor completion, and that alternative non-adaptive measurement designs that guarantee a strong initialization can overcome this barrier.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.05869v2</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <category>math.PR</category>
      <category>math.ST</category>
      <category>stat.TH</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Hengrui Luo, Anna Ma, Ludovic Stephan, Yizhe Zhu</dc:creator>
    </item>
    <item>
      <title>Cosmo3DFlow: Wavelet Flow Matching for Spatial-to-Spectral Compression in Reconstructing the Early Universe</title>
      <link>https://arxiv.org/abs/2602.10172</link>
      <description>arXiv:2602.10172v2 Announce Type: replace-cross 
Abstract: Reconstructing the early universe from the evolved present-day universe is a challenging and computationally demanding problem in modern astrophysics. We devise a novel generative framework, Cosmo3DFlow, designed to address dimensionality and sparsity, the critical bottlenecks inherent in current state-of-the-art methods for cosmological inference. By integrating 3D Discrete Wavelet Transform (DWT) with flow matching, we effectively represent high-dimensional cosmological structures. The Wavelet Transform addresses the ``void problem'' by translating spatial emptiness into spectral sparsity. It decouples high-frequency details from low-frequency structures, and wavelet-space velocity fields facilitate stable ordinary differential equation (ODE) solvers with large step sizes. Using large-scale cosmological $N$-body simulations at $128^3$ resolution, we achieve up to $46\times$ faster sampling than diffusion models. Our results enable initial conditions to be sampled in seconds, compared to minutes for previous methods.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.10172v2</guid>
      <category>astro-ph.IM</category>
      <category>cs.AI</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Md. Khairul Islam, Zeyu Xia, Ryan Goudjil, Jialu Wang, Arya Farahi, Judy Fox</dc:creator>
    </item>
    <item>
      <title>Transforming Police-Car Swerving for Mitigating Isolated Stop-and-Go Traffic Waves: A Practice-Oriented Jam-Absorption Driving Strategy</title>
      <link>https://arxiv.org/abs/2602.10234</link>
      <description>arXiv:2602.10234v3 Announce Type: replace-cross 
Abstract: Stop-and-go traffic waves, a major form of freeway congestion, impose severe and persistent adverse impacts, including reduced traffic efficiency, increased safety risks, and elevated vehicle emissions. Among various freeway traffic management strategies, jam-absorption driving (JAD), in which a dedicated vehicle performs "slow-in" and "fast-out" maneuvers before being captured by a stop-and-go wave, has been proposed as a promising approach to suppressing the propagation of such waves. However, most existing JAD strategies remain impractical, primarily due to the lack of consideration of implementation vehicles and operational conditions. Inspired by real-world observations of police-car swerving behavior, this paper first introduces the Single-Vehicle Double-Detector Jam-Absorption Driving (SD-JAD) problem and then proposes a practical JAD strategy based on a definition of the JAD Triangle, transforming such behavior into a traffic control strategy capable of suppressing the propagation of an isolated stop-and-go wave. Five key parameters that significantly affect the proposed strategy, namely JAD speed, inflow traffic speed, wave width, wave speed, and in-wave speed, are identified and systematically analyzed. Using a SUMO-based simulation as an illustrative example, we further demonstrate how these parameters can be measured in practice using only two stationary roadside traffic detectors. The results show that the proposed JAD strategy successfully suppresses the propagation of a stop-and-go wave without triggering secondary waves. This paper is expected to take a significant step toward the practical implementation of JAD, advancing it from a theoretical concept to a feasible and deployable traffic management strategy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.10234v3</guid>
      <category>physics.soc-ph</category>
      <category>cs.AI</category>
      <category>cs.RO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Zhengbing He</dc:creator>
    </item>
    <item>
      <title>Local Coordination and the Geometry of Social Networks</title>
      <link>https://arxiv.org/abs/2602.12571</link>
      <description>arXiv:2602.12571v2 Announce Type: replace-cross 
Abstract: We study agents playing a pure coordination game on a large social network. Agents are restricted to coordinate locally, without access to a global communication device, and so different regions of the network will converge to different actions, precluding perfect coordination. We show that the extent of this inefficiency depends on the network geometry: on some networks, near-perfect efficiency is achievable, while on others welfare is strictly bounded away from the optimum. We provide a geometric condition on the network structure that characterizes when near-efficiency is attainable. On networks in which it is unattainable, our results more generally preclude high correlations between outcomes in a large spectrum of dynamic games.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.12571v2</guid>
      <category>econ.TH</category>
      <category>cs.GT</category>
      <category>math.PR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Tom Hutchcroft, Olga Rospuskova, Omer Tamuz</dc:creator>
    </item>
    <item>
      <title>Enroll-on-Wakeup: A First Comparative Study of Target Speech Extraction for Seamless Interaction in Real Noisy Human-Machine Dialogue Scenarios</title>
      <link>https://arxiv.org/abs/2602.15519</link>
      <description>arXiv:2602.15519v3 Announce Type: replace-cross 
Abstract: Target speech extraction (TSE) typically relies on pre-recorded high-quality enrollment speech, which disrupts user experience and limits feasibility in spontaneous interaction. In this paper, we propose Enroll-on-Wakeup (EoW), a novel framework where the wake-word segment, captured naturally during human-machine interaction, is automatically utilized as the enrollment reference. This eliminates the need for pre-collected speech to enable a seamless experience. We perform the first systematic study of EoW-TSE, evaluating advanced discriminative and generative models under diverse real-world acoustic conditions. Given the short and noisy nature of wake-word segments, we investigate enrollment augmentation using LLM-based TTS. Results show that while current TSE models face performance degradation in EoW-TSE, TTS-based assistance significantly enhances the listening experience, though gaps remain in speech recognition accuracy.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.15519v3</guid>
      <category>eess.AS</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yiming Yang, Guangyong Wang, Haixin Guan, Yanhua Long</dc:creator>
    </item>
    <item>
      <title>Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models</title>
      <link>https://arxiv.org/abs/2602.16061</link>
      <description>arXiv:2602.16061v2 Announce Type: replace-cross 
Abstract: Estimating population quantities such as mean outcomes from user feedback is fundamental to platform evaluation and social science, yet feedback is often missing not at random (MNAR): users with stronger opinions are more likely to respond, so standard estimators are biased and the estimand is not identified without additional assumptions. Existing approaches typically rely on strong parametric assumptions or bespoke auxiliary variables that may be unavailable in practice. In this paper, we develop a partial identification framework in which sharp bounds on the estimand are obtained by solving a pair of linear programs whose constraints encode the observed data structure. This formulation naturally incorporates outcome predictions from pretrained models, including large language models (LLMs), as additional linear constraints that tighten the feasible set. We call these predictions weak shadow variables: they satisfy a conditional independence assumption with respect to missingness but need not meet the completeness conditions required by classical shadow-variable methods. When predictions are sufficiently informative, the bounds collapse to a point, recovering standard identification as a special case. In finite samples, to provide valid coverage of the identified set, we propose a set-expansion estimator that achieves a slower-than-$\sqrt{n}$ convergence rate in the set-identified regime and the standard $\sqrt{n}$ rate under point identification. In simulations and semi-synthetic experiments on customer-service dialogues, we find that LLM predictions are often ill-conditioned for classical shadow-variable methods yet remain highly effective in our framework. They shrink identification intervals by 75--83\% while maintaining valid coverage under realistic MNAR mechanisms.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.16061v2</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <category>econ.EM</category>
      <category>stat.ME</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Hongyu Chen, David Simchi-Levi, Ruoxuan Xiong</dc:creator>
    </item>
    <item>
      <title>huff: A Python package for Market Area Analysis</title>
      <link>https://arxiv.org/abs/2602.17640</link>
      <description>arXiv:2602.17640v4 Announce Type: replace-cross 
Abstract: Market area models, such as the Huff model and its extensions, are widely used to estimate regional market shares and customer flows of retail and service locations. Another, now very common, area of application is the analysis of catchment areas, supply structures and the accessibility of healthcare locations. The huff Python package provides a complete workflow for market area analysis, including data import, construction of origin-destination interaction matrices, basic model analysis, parameter estimation from empirical data, calculation of distance or travel time indicators, and map visualization. Additionally, the package provides several methods of spatial accessibility analysis. The package is modular and object-oriented. It is intended for researchers in economic geography, regional economics, spatial planning, marketing, geoinformation science, and health geography. The software is openly available via the Python Package Index (PyPI) (https://pypi.org/project/huff/); its development and version history are managed in a public GitHub Repository (https://github.com/geowieland/huff_official) and archived at Zenodo (https://doi.org/10.5281/zenodo.18639559).</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.17640v4</guid>
      <category>stat.AP</category>
      <category>cs.SE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Thomas Wieland</dc:creator>
    </item>
    <item>
      <title>Mind the Gap: Detecting Cluster Exits for Robust Local Density-Based Score Normalization in Anomalous Sound Detection</title>
      <link>https://arxiv.org/abs/2602.18777</link>
      <description>arXiv:2602.18777v2 Announce Type: replace-cross 
Abstract: Local density-based score normalization is an effective component of distance-based embedding methods for anomalous sound detection, particularly when data densities vary across conditions or domains. In practice, however, performance depends strongly on neighborhood size. Increasing it can degrade detection accuracy when neighborhood expansion crosses cluster boundaries, violating the locality assumption of local density estimation. This observation motivates adapting the neighborhood size based on locality preservation rather than fixing it in advance. We realize this by proposing cluster exit detection, a lightweight mechanism that identifies distance discontinuities and selects neighborhood sizes accordingly. Experiments across multiple embedding models and datasets show improved robustness to neighborhood-size selection and consistent performance gains.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.18777v2</guid>
      <category>eess.AS</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Kevin Wilkinghoff, Gordon Wichern, Jonathan Le Roux, Zheng-Hua Tan</dc:creator>
    </item>
    <item>
      <title>Training-Free Intelligibility-Guided Observation Addition for Noisy ASR</title>
      <link>https://arxiv.org/abs/2602.20967</link>
      <description>arXiv:2602.20967v2 Announce Type: replace-cross 
Abstract: Automatic speech recognition (ASR) degrades severely in noisy environments. Although speech enhancement (SE) front-ends effectively suppress background noise, they often introduce artifacts that harm recognition. Observation addition (OA) addresses this issue by fusing noisy and SE-enhanced speech, improving recognition without modifying the parameters of the SE or ASR models. This paper proposes an intelligibility-guided OA method, where fusion weights are derived from intelligibility estimates obtained directly from the backend ASR. Unlike prior OA methods based on trained neural predictors, the proposed method is training-free, reducing complexity and enhancing generalization. Extensive experiments across diverse SE-ASR combinations and datasets demonstrate strong robustness and improvements over existing OA baselines. Additional analyses of intelligibility-guided switching-based alternatives and frame- versus utterance-level OA further validate the proposed design.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.20967v2</guid>
      <category>eess.AS</category>
      <category>cs.AI</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Haoyang Li, Changsong Liu, Wei Rao, Hao Shi, Sakriani Sakti, Eng Siong Chng</dc:creator>
    </item>
    <item>
      <title>An Empirical Analysis of Task-Induced Encoder Bias in Fr\'echet Audio Distance</title>
      <link>https://arxiv.org/abs/2602.23958</link>
      <description>arXiv:2602.23958v2 Announce Type: replace-cross 
Abstract: Fr\'echet Audio Distance (FAD) is the de facto standard for evaluating text-to-audio generation, yet its scores depend on the underlying encoder's embedding space. An encoder's training task dictates which acoustic features are preserved or discarded, causing FAD to inherit systematic task-induced biases. We decompose evaluation into Recall, Precision, and Alignment (split into semantic and structural dimensions), using log-scale normalization for fair cross-encoder comparison. Controlled experiments on six encoders across two datasets reveal a four-axis trade-off: reconstruction-based AudioMAE leads precision sensitivity; ASR-trained Whisper dominates structural detection but is blind to signal degradation; classification-trained VGGish maximizes semantic detection but penalizes legitimate intra-class variation. Since no single encoder is a universal evaluator, future metrics must shift toward evaluation-native encoders intrinsically aligned with human perception.</description>
      <guid isPermaLink="false">oai:arXiv.org:2602.23958v2</guid>
      <category>eess.AS</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Wonwoo Jeong</dc:creator>
    </item>
    <item>
      <title>Universal Speech Content Factorization</title>
      <link>https://arxiv.org/abs/2603.08977</link>
      <description>arXiv:2603.08977v2 Announce Type: replace-cross 
Abstract: We propose Universal Speech Content Factorization (USCF), a simple and invertible linear method for extracting a low-rank speech representation in which speaker timbre is suppressed while phonetic content is preserved. USCF extends Speech Content Factorization, a closed-set voice conversion (VC) method, to an open-set setting by learning a universal speech-to-content mapping via least-squares optimization and deriving speaker-specific transformations from only a few seconds of target speech. We show through embedding analysis that USCF effectively removes speaker-dependent variation. As a zero-shot VC system, USCF achieves competitive intelligibility, naturalness, and speaker similarity compared to methods that require substantially more target-speaker data or additional neural training. Finally, we demonstrate that as a training-efficient timbre-disentangled speech feature, USCF features can serve as the acoustic representation for training timbre-prompted text-to-speech models. Speech samples and code are publicly available.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.08977v2</guid>
      <category>eess.AS</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Henry Li Xinyuan, Zexin Cai, Lin Zhang, Leibny Paola Garc\'ia-Perera, Berrak Sisman, Sanjeev Khudanpur, Nicholas Andrews, Matthew Wiesner</dc:creator>
    </item>
    <item>
      <title>ReTabSyn: Realistic Tabular Data Synthesis via Reinforcement Learning</title>
      <link>https://arxiv.org/abs/2603.10823</link>
      <description>arXiv:2603.10823v2 Announce Type: replace-cross 
Abstract: Deep generative models can help with data scarcity and privacy by producing synthetic training data, but they struggle in low-data, imbalanced tabular settings to fully learn the complex data distribution. We argue that striving for the full joint distribution could be overkill; for greater data efficiency, models should prioritize learning the conditional distribution $P(y\mid \bm{X})$, as suggested by recent theoretical analysis. Therefore, we overcome this limitation with \textbf{ReTabSyn}, a \textbf{Re}inforced \textbf{Tab}ular \textbf{Syn}thesis pipeline that provides direct feedback on feature correlation preservation during synthesizer training. This objective encourages the generator to prioritize the most useful predictive signals when training data is limited, thereby strengthening downstream model utility. We empirically fine-tune a language model-based generator using this approach, and across benchmarks with small sample sizes, class imbalance, and distribution shift, ReTabSyn consistently outperforms state-of-the-art baselines. Moreover, our approach can be readily extended to control various aspects of synthetic tabular data, such as applying expert-specified constraints on generated observations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.10823v2</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Xiaofeng Lin, Seungbae Kim, Zhuoya Li, Zachary DeSoto, Charles Fleming, Guang Cheng</dc:creator>
    </item>
    <item>
      <title>SEMamba++: A General Speech Restoration Framework Leveraging Global, Local, and Periodic Spectral Patterns</title>
      <link>https://arxiv.org/abs/2603.11669</link>
      <description>arXiv:2603.11669v2 Announce Type: replace-cross 
Abstract: General speech restoration demands techniques that can interpret complex speech structures under various distortions. While State-Space Models like SEMamba have advanced the state-of-the-art in speech denoising, they are not inherently optimized for critical speech characteristics, such as spectral periodicity or multi-resolution frequency analysis. In this work, we introduce an architecture tailored to incorporate speech-specific features as inductive biases. In particular, we propose the Global, Local, and Periodic (GLP) module, a frequency feature extraction block that effectively and efficiently leverages the properties of frequency bins. Then, we design a multi-resolution parallel time-frequency dual-processing block to capture diverse spectral patterns, and a learnable mapping to further enhance model performance. With all our ideas combined, the proposed SEMamba++ achieves the best performance among multiple baseline models while remaining computationally efficient.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.11669v2</guid>
      <category>eess.AS</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yongjoon Lee, Jung-Woo Choi</dc:creator>
    </item>
    <item>
      <title>Dr. SHAP-AV: Decoding Relative Modality Contributions via Shapley Attribution in Audio-Visual Speech Recognition</title>
      <link>https://arxiv.org/abs/2603.12046</link>
      <description>arXiv:2603.12046v2 Announce Type: replace-cross 
Abstract: Audio-Visual Speech Recognition (AVSR) leverages both acoustic and visual information for robust recognition under noise. However, how models balance these modalities remains unclear. We present Dr. SHAP-AV, a framework using Shapley values to analyze modality contributions in AVSR. Through experiments on six models across two benchmarks and varying SNR levels, we introduce three analyses: Global SHAP for overall modality balance, Generative SHAP for contribution dynamics during decoding, and Temporal Alignment SHAP for input-output correspondence. Our findings reveal that models shift toward visual reliance under noise yet maintain high audio contributions even under severe degradation. Modality balance evolves during generation, temporal alignment holds under noise, and SNR is the dominant factor driving modality weighting. These findings expose a persistent audio bias, motivating ad-hoc modality-weighting mechanisms and Shapley-based attribution as a standard AVSR diagnostic.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.12046v2</guid>
      <category>eess.AS</category>
      <category>cs.CV</category>
      <category>cs.SD</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Umberto Cappellazzo, Stavros Petridis, Maja Pantic</dc:creator>
    </item>
    <item>
      <title>Chaos-Free Networks are Stable Recurrent Neural Networks</title>
      <link>https://arxiv.org/abs/2603.14106</link>
      <description>arXiv:2603.14106v2 Announce Type: replace-cross 
Abstract: Gated Recurrent Neural Networks (RNNs) are widely used for nonlinear system identification due to their high accuracy, although they often exhibit complex, chaotic dynamics that are difficult to analyze. This paper investigates the system-theoretic properties of the Chaos-Free Network (CFN), an architecture originally proposed to eliminate the chaotic behavior found in standard gated RNNs. First, we formally prove that the CFN satisfies Input-to-State Stability (ISS) by design. However, we demonstrate that the CFN architecture does not intrinsically guarantee Incremental ISS (delta-ISS), as ensuring this property relies on specific parametric constraints. To address this, we introduce the Decoupled-Gate Network (DGN), a novel structural variant of the CFN that removes internal state connections in the gating mechanisms. Finally, we prove that the DGN unconditionally satisfies the delta-ISS property, providing an incrementally stable architecture for identifying nonlinear dynamical systems without requiring complex network training modifications. Numerical results confirm that the DGN maintains the modeling capabilities of standard architectures while adhering to these rigorous stability guarantees.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.14106v2</guid>
      <category>math.OC</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Stefano De Carli, Davide Previtali, Mirko Mazzoleni, Fabio Previdi</dc:creator>
    </item>
    <item>
      <title>Online Learning for Supervisory Switching Control</title>
      <link>https://arxiv.org/abs/2603.14762</link>
      <description>arXiv:2603.14762v3 Announce Type: replace-cross 
Abstract: We study supervisory switching control for partially observed linear dynamical systems. The objective is to identify and deploy a suitable controller for the unknown system by periodically selecting among a collection of $N$ candidate controllers, some of which may destabilize the underlying system. While classical estimator-based supervisory control guarantees asymptotic stability, it lacks quantitative finite-time performance bounds. Conversely, current non-asymptotic methods in both online learning and system identification require restrictive assumptions that are incompatible with a control setting, such as system stability, which preclude testing potentially unstable controllers. To bridge this gap, we propose a novel, non-asymptotic analysis of supervisory control that adapts multi-armed bandit algorithms to a control-theoretic setting. The proposed data-driven algorithm evaluates candidate controllers via scoring criteria that leverage system observability to isolate the effects of state history, enabling both detection of destabilizing controllers and accurate system identification. We present two algorithmic variants with dimension-free, finite-time guarantees, where each identifies the matching controller in $O(N \log^2 N)$ steps, while simultaneously achieving finite $L_2$-gain with respect to system disturbances.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.14762v3</guid>
      <category>math.OC</category>
      <category>cs.LG</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Haoyuan Sun, Ali Jadbabaie</dc:creator>
    </item>
    <item>
      <title>Foundational Analysis Of The Solvability Complexity Index: The Weihrauch-SCI Intermediate Hierarchy</title>
      <link>https://arxiv.org/abs/2603.18955</link>
      <description>arXiv:2603.18955v3 Announce Type: replace-cross 
Abstract: The Solvability Complexity Index (SCI) provides an extensional limit-height formalism for recovering a target map $\Xi$ from finite samples of an evaluation interface $\Lambda\subseteq\mathbb C^\Omega$ by finite-height towers of pointwise limits. We first give a foundational analysis of what this extensional framework does and does not determine. We show that the SCI separation axiom is equivalent to a factorization of $\Xi$ through the full evaluation table, and we isolate the minimal logical role of $\Lambda$ as an information interface.
  To connect the SCI to Type-2 computability and Weihrauch reducibility, we give an effective enrichment for countable $\Lambda$ by viewing the evaluation table image $I_{\Lambda}\subseteq\mathbb{C}^{\mathbb{N}}$ as a represented space and factoring $\Xi$ as $\widehat{\Xi}$. We then define the Weihrauch-SCI rank of a problem as the least number of iterated limit-oracles needed to compute it in the Weihrauch sense, i.e. the least $k$ such that $\widehat{\Xi}\le_{W}\lim^{(k)}$, and prove well-posedness and representation invariance of this rank.
  A central negative result is that the unrestricted raw type-G SCI model (arbitrary post-processing of finite oracle transcripts) is generally not a computability model in the Type-2/Weihrauch sense: finite-query factorizations collapse raw type-G height, and analytic non-Borel decision problems yield examples with raw $\mathrm{SCI}_G=0$ but infinite Weihrauch-SCI rank. We therefore distinguish the raw extensional SCI from implemented SCI variants, where the indexed approximation table is required to be realized uniformly by a chosen class of operations. To recover a robust bridge, we introduce an intermediate SCI hierarchy by restricting the admissible deepest-level post-processing to regularity classes (continuous/Borel/Baire).</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.18955v3</guid>
      <category>math.LO</category>
      <category>cs.LO</category>
      <category>math.SP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Christopher Sorg</dc:creator>
    </item>
    <item>
      <title>Perfect divisibility and perfect-Pollyanna in bull-free graphs</title>
      <link>https://arxiv.org/abs/2603.21538</link>
      <description>arXiv:2603.21538v2 Announce Type: replace-cross 
Abstract: A graph $G$ is {\em perfectly divisible} if, for each induced subgraph $H$ of $G$, $V(H)$ can be partitioned into $A$ and $B$ such that $H[A]$ is perfect and $\omega(H[B])&lt;\omega(H)$. A {\em bull} is a graph consisting of a triangle with two disjoint pendant edges. Ho\`ang [Discrete Math. 349 (2026) 114809] proposed four conjectures: 1. $P_5$-free graphs are perfectly divisible; 2. Odd hole-free graphs are perfectly divisible; 3. Even hole-free graphs are perfectly divisible; and 4. $4K_1$-free graphs are perfectly divisible. Karthick et al. [Electron. J. Combin. 29 (2022) P3.19] proposed a conjecture: Fork-free graphs are perfectly divisible. In this paper, we prove that all five conjectures above hold for bull-free graphs. Our results also generalize some results of Chudnovsky and Sivaraman [J. Graph Theory 90 (2019) 54--60] and Karthick et al. [Electron. J. Combin. 29 (2022) P3.19].
  We say that a class ${\cal C}$ is {\em perfect-Pollyanna} if ${\cal C}\cap {\cal G}$ is perfectly divisible for any hereditary class ${\cal G}$ in which each triangle-free graph is 3-colorable. Let $H\in\{\text{house, hammer, diamond}\}$. In this paper, we prove that the class of $(\text{bull}, H)$-free graphs is perfect-Pollyanna. Let ${\cal C}$ be the class of $(\text{bull}, H)$-free graphs. This implies that ${\cal C}\cap {\cal G}$ is perfectly divisible if and only if all triangle-free graphs in ${\cal G}$ are perfectly divisible. As corollaries, we show that $(\text{bull},{\cal H})$-free graphs are perfectly divisible, where ${\cal H}$ is one of $\{P_{11},C_4\},\{P_{14},C_5,C_4\}$, and $\{P_{17},C_6,C_5,C_4\}$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.21538v2</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-sa/4.0/</dc:rights>
      <dc:creator>Ran Chen, Paras Vinubhai Maniya, Di Wu, Junran Yu</dc:creator>
    </item>
    <item>
      <title>Multi-GPU Hybrid Particle-in-Cell Monte Carlo Simulations for Exascale Computing Systems</title>
      <link>https://arxiv.org/abs/2603.24508</link>
      <description>arXiv:2603.24508v3 Announce Type: replace-cross 
Abstract: Particle-in-Cell (PIC) Monte Carlo (MC) simulations are central to plasma physics but face increasing challenges on heterogeneous HPC systems due to excessive data movement, synchronization overheads, and inefficient utilization of multiple accelerators. In this work, we present a portable, multi-GPU hybrid MPI+OpenMP implementation of BIT1 that enables scalable execution on both Nvidia and AMD accelerators through OpenMP target tasks with explicit dependencies to overlap computation and communication across devices. Portability is achieved through persistent device-resident memory, an optimized contiguous one-dimensional data layout, and a transition from unified to pinned host memory to improve large data-transfer efficiency, together with GPU Direct Memory Access (DMA) and runtime interoperability for direct device-pointer access. Standardized and scalable I/O is provided using openPMD and ADIOS2, supporting high-performance file I/O, in-memory data streaming, and in-situ analysis and visualization. Performance results on pre-exascale and exascale systems, including Frontier (OLCF-5) for up to 16,000 GPUs, demonstrate significant improvements in run time, scalability, and resource utilization for large-scale PIC MC simulations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.24508v3</guid>
      <category>physics.plasm-ph</category>
      <category>cs.DC</category>
      <category>cs.PF</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Jeremy J. Williams, Jordy Trilaksono, Stefan Costea, Yi Ju, Luca Pennati, Jonah Ekelund, David Tskhakaya, Leon Kos, Ales Podolnik, Jakub Hromadka, Allen D. Malony, Sameer Shende, Tilman Dannert, Frank Jenko, Erwin Laure, Stefano Markidis</dc:creator>
    </item>
    <item>
      <title>On graph products and multi-word-representability</title>
      <link>https://arxiv.org/abs/2603.29629</link>
      <description>arXiv:2603.29629v4 Announce Type: replace-cross 
Abstract: The multi-word-representation number $\mu(G)$ of a graph $G$ is the minimum number of word-representable graphs whose union is $G$. We study the behavior of $\mu$ under six standard graph products: the lexicographic, Cartesian, rooted, corona, tensor, and strong products. For the Cartesian and rooted products, we show that $\mu(G_1 \square G_2)=\mu(G_1 \diamond G_2)=\max\{\mu(G_1),\mu(G_2)\}$. For the corona product, we prove that $\mu(G_1 \odot G_2)\le \max\{\mu(G_1),\mu(G_2)\}+1$, and we identify a condition under which equality holds. For the lexicographic product, we establish $\mu(G_1 \circ G_2)\le \mu(G_1)+\mu(G_2)$, which reduces to $\max\{\mu(G_1),\mu(G_2)\}$ under a comparability cover condition on $G_2$, and we characterize the case when the lexicographic product of two minimal non-word-representable graphs has $\mu=2$. For the tensor product $G_1 \times G_2$, we show $\mu(G_1 \times G_2)\le \log_3(\min\{\chi(G_1),\chi(G_2)\})$. For the strong product $G_1 \boxtimes G_2$, we establish $\max\{\mu(G_1),\mu(G_2)\}\le \mu(G_1 \boxtimes G_2)\le \max\{\mu(G_1),\mu(G_2)\}+\log_3(\min\{\chi(G_1),\chi(G_2)\})$. For lexicographic powers $G^{[k]}$, we prove that $\mu(G^{[k]})\le k$ when $G$ is word-representable but not a comparability graph, and in general $\mu(G^{[k]})$ is bounded by the comparability cover number of $G$. We further show that $G^{[k]}$ is word-representable if and only if $G$ is a comparability graph. As an application, we obtain a sublinear upper bound on the extremal function $\tau(n)$, defined as the largest integer such that every $n$-vertex graph contains a word-representable induced subgraph of that size; in particular, $\tau(8^k)\le 6^k$, implying $\tau(n)\le n^{\log_8 6+\epsilon}$ for large $n$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2603.29629v4</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Benny George Kenkireth, Gopalan Sajith, Sreyas Sasidharan</dc:creator>
    </item>
    <item>
      <title>Predictive Volatility of Machine Learning in Micro-Samples: A Regularised Assessment of Regional Poverty</title>
      <link>https://arxiv.org/abs/2604.06278</link>
      <description>arXiv:2604.06278v4 Announce Type: replace-cross 
Abstract: Small regional datasets pose a dual statistical problem: correlated predictors inflate estimation variance, while flexible learners can become unstable because the available information per adaptive degree of freedom is limited. We examine this issue through predictive volatility, defined as the cross-sample dispersion and upper-tail behaviour of out-of-sample loss. Using simulation evidence reported for sparse linear, near-linear and heavy-tailed settings, we compare ordinary least squares, frequentist penalties, Bayesian shrinkage models, bounded-response and spatial specifications, and flexible machine-learning procedures. In the reported simulation results, regularised linear estimators generally dominate in the linear high-collinearity micro-sample settings and remain the most reliable overall, whereas tree-based methods become more competitive only when the signal is weakly nonlinear and the sample size is larger. In the empirical application to 34 Indonesian provinces, ridge yields the best leave-one-out performance, followed by elastic net and lasso. Across the Bayesian shrinkage specifications, ICT skills show the most consistent negative association with poverty, with the strongest support under horseshoe and spike-and-slab formulations. These results suggest that, in micro-sample regional modelling, the main constraint is limited information per effective degree of freedom rather than insufficient algorithmic flexibility.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.06278v4</guid>
      <category>stat.ME</category>
      <category>cs.CY</category>
      <category>stat.AP</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>A. H. Jamaluddin, A. T. R. Dani, N. I. Mahat, V. Ratnasari, S. S. M. Fauzi</dc:creator>
    </item>
    <item>
      <title>A Practical Introduction to Tensor Network Renormalization with TNRKit.jl</title>
      <link>https://arxiv.org/abs/2604.06922</link>
      <description>arXiv:2604.06922v4 Announce Type: replace-cross 
Abstract: We present TNRKit, an open-source Julia package for Tensor Network Renormalization (TNR) of two- and three-dimensional classical statistical models and Euclidean lattice field theories. Built on top of TensorKit, it provides a symmetry-aware framework for constructing tensor-network representations of partition functions and coarse-graining them using methods such as TRG, HOTRG, and LoopTNR. Beyond thermodynamic quantities, the package enables the extraction of universal conformal data -- including scaling dimensions and the central charge -- directly from fixed-point tensors. TNRKit is designed with both usability and extensibility in mind, offering a practical platform for applying, benchmarking, and developing modern tensor renormalization algorithms. This paper also serves as a self-contained introduction to the TNR framework.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.06922v4</guid>
      <category>cond-mat.str-el</category>
      <category>cond-mat.stat-mech</category>
      <category>cs.MS</category>
      <category>quant-ph</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Victor Vanthilt, Adwait Naravane, Chenqi Meng, Atsushi Ueda</dc:creator>
    </item>
    <item>
      <title>Learning What's Real: Disentangling Signal and Measurement Artifacts in Multi-Sensor Data, with Applications to Astrophysics</title>
      <link>https://arxiv.org/abs/2604.09787</link>
      <description>arXiv:2604.09787v2 Announce Type: replace-cross 
Abstract: Data collected from the physical world is always a combination of multiple sources: an underlying signal from the physical process of interest and a signal from measurement-dependent artifacts from the sensor or instrument. This secondary signal acts as a confounding factor, limiting our ability to extract information about the physics underlying the phenomena we observe. Furthermore, it complicates the combination of observations in heterogeneous or multi-instrument settings. We propose a deep learning framework that leverages overlapping observations, a dual-encoder architecture, and a counterfactual generation objective to disentangle these factors of variation. The resulting representations explicitly separate intrinsic signals from sensor-specific distortions and noise, and can be used for counterfactual view generation, parameter inference unconfounded by measurement distortions, and instrument-independent similarity search. We demonstrate the effectiveness of our approach on astrophysical galaxy images from the DESI Legacy Imaging Survey (Legacy) and the Hyper Suprime-Cam (HSC) Survey as a representative multi-instrument setting. This framework provides a general recipe for scientific and multi-modal self-supervised pretraining: construct training pairs from overlapping observations of the same physical system, treat sensor- or modality-specific effects as augmentations, and learn invariant representations through counterfactual generation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.09787v2</guid>
      <category>astro-ph.IM</category>
      <category>astro-ph.GA</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pablo Mercader-Perez, Carolina Cuesta-Lazaro, Daniel Muthukrishna, Jeroen Audenaert, V. Ashley Villar, David W. Hogg, Marc Huertas-Company, William T. Freeman</dc:creator>
    </item>
    <item>
      <title>Adversarial Robustness of NTK Neural Networks</title>
      <link>https://arxiv.org/abs/2604.25965</link>
      <description>arXiv:2604.25965v2 Announce Type: replace-cross 
Abstract: Deep learning models are widely deployed in safety-critical domains, but remain vulnerable to adversarial attacks. In this paper, we study the adversarial robustness of NTK neural networks in the context of nonparametric regression. We establish minimax optimal rates for adversarial regression in Sobolev spaces and then show that NTK neural networks, trained via gradient flow with early stopping, can achieve this optimal rate. However, in the overfitting regime, we prove that the minimum norm interpolant is vulnerable to adversarial perturbations.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.25965v2</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuxuan Hou</dc:creator>
    </item>
    <item>
      <title>Fast-Vollib: A Fast Implied Volatility Library for Python with PyTorch, JAX, and CUDA Fused-Kernel Backends</title>
      <link>https://arxiv.org/abs/2604.27210</link>
      <description>arXiv:2604.27210v2 Announce Type: replace-cross 
Abstract: We present fast-vollib, an open-source Python library that provides high-performance European option pricing, implied volatility (IV) computation, and Greeks under the Black-76, Black-Scholes, and Black-Scholes-Merton models. The library is designed as a drop-in alternative to the de-facto-standard py_vollib and py_vollib_vectorized packages, with pluggable PyTorch and JAX execution backends, a CUDA fused-kernel Triton contribution for batched IV workloads, and a compatibility-first public API. In addition to a vectorized Halley-method IV solver, fast-vollib ships an experimental, fully-vectorized implementation of J\"ackel's "Let's Be Rational" (LBR) algorithm with NumPy/Numba, torch.compile, JAX, and Triton single-pass GPU kernels for batched option chains. This note announces the library and describes its public API surface, with source, documentation, and packaging artifacts available at: GitHub (https://github.com/raeidsaqur/fast-vollib), Docs (https://raeidsaqur.github.io/fast-vollib/), PyPI (https://pypi.org/project/fast-vollib/).</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.27210v2</guid>
      <category>q-fin.CP</category>
      <category>cs.MS</category>
      <category>q-fin.MF</category>
      <category>q-fin.PR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:creator>Raeid Saqur</dc:creator>
    </item>
    <item>
      <title>Solution Sets for Inverse Infinite-Horizon Linear-Quadratic Descriptor Differential Games</title>
      <link>https://arxiv.org/abs/2604.27460</link>
      <description>arXiv:2604.27460v4 Announce Type: replace-cross 
Abstract: In this letter, we study a model-based inverse problem for infinite-horizon linear-quadratic differential games with descriptor dynamics. Given an observed feedback strategy profile, we seek to identify all cost functions that rationalize it as a feedback Nash equilibrium; this collection is referred to as the solution set. We characterize the solution set, show that it is rectangular and convex, and provide an algorithm for computing an admissible realization whenever it is nonempty. We also show that, compared with the corresponding inverse problem for standard state-space dynamics, descriptor dynamics modify the geometry of the solution set and may reduce identifiability. Finally, we illustrate the results with numerical examples.</description>
      <guid isPermaLink="false">oai:arXiv.org:2604.27460v4</guid>
      <category>math.OC</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Aaditya Kumar, Puduru Viswanadha Reddy</dc:creator>
    </item>
    <item>
      <title>Producing Quality Pseudorandomness with a Generalized Gauss Continued-Fraction Map</title>
      <link>https://arxiv.org/abs/2605.05378</link>
      <description>arXiv:2605.05378v3 Announce Type: replace-cross 
Abstract: Well-known chaotic maps, such as the logistic and tent maps, have been used to generate cryptographically secure pseudorandomness, yet we know of no efforts which attempt to use the Gauss continued-fraction map, a known chaotic map, as a starting point for producing quality pseudorandom output. In this paper, we consider the family of $r$-continued-fraction maps, which generalize the Gauss map, and use them to generate pseudorandom output which outperforms many standard generators, such as the Mersenne Twister, in statistical quality, as ascertained by use of the Dieharder, PractRand, and TestU01 suites. In this way, we demonstrate the potential viability of these maps as a starting point for novel generators, and provide practical motivation for further study of the properties of both the exact and finite-precision $r$-continued fraction maps.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.05378v3</guid>
      <category>math.DS</category>
      <category>cs.NA</category>
      <category>math.NA</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Benjamin V. Holt</dc:creator>
    </item>
    <item>
      <title>ForcingDAS: Unified and Robust Data Assimilation via Diffusion Forcing</title>
      <link>https://arxiv.org/abs/2605.14285</link>
      <description>arXiv:2605.14285v2 Announce Type: replace-cross 
Abstract: Data assimilation (DA) estimates the state of an evolving dynamical system from noisy, partial observations, and is widely used in scientific simulation as well as weather and climate science. In practice, filtering methods rely on frame-to-frame transition models. However, these models are fragile when observations are non-Markovian (when they form only a partial slice of a higher-dimensional latent state as in real-world weather data): they tend to accumulate errors over long horizons. At the same time, learned DA methods typically commit to a single regime, either filtering (nowcasting, real-time forecasting) or smoothing (retrospective reanalysis), which splits what should be a shared prior across application-specific pipelines. To address both issues, we introduce ForcingDAS, a unified and robust DA framework. Built on Diffusion Forcing with an independent noise level assigned to each frame, ForcingDAS learns a joint-trajectory prior instead of frame-to-frame transitions. This allows it to capture long-horizon temporal dependencies and reduce error accumulation. In addition, the same trained model spans the full filtering to smoothing spectrum at inference time. Specifically, nowcasting, fixed-lag smoothing, and batch reanalysis are selected through the inference schedule alone, without retraining. We evaluate ForcingDAS on 2D Navier-Stokes vorticity, precipitation nowcasting, and global atmospheric state estimation. Across all settings, a single model is competitive with or outperforms both learned and classical baselines that are specialized for individual regimes, with the largest gains observed on real-world weather benchmarks.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.14285v2</guid>
      <category>eess.IV</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Yixuan Jia, Siyi Chen, Yida Pan, Xiao Li, Lianghe Shi, Chanyong Jung, Haijie Yuan, Ismail Alkhouri, Yue Cynthia Wu, Saiprasad Ravishankar, Jeffrey A Fessler, Qing Qu</dc:creator>
    </item>
    <item>
      <title>SwAIther-Precip: Lead-Time-Aware Bias Correction Enables Kilometer-Scale Downscaling of Global AI Precipitation Forecasts over Switzerland</title>
      <link>https://arxiv.org/abs/2605.16163</link>
      <description>arXiv:2605.16163v2 Announce Type: replace-cross 
Abstract: Skillful medium-range precipitation forecasting at kilometer scale remains challenging over complex terrain because precipitation arises from multiscale nonlinear processes that global models cannot explicitly resolve at affordable cost. Global AI weather models can produce skillful medium-range forecasts, but their native 0.25-degree resolution limits direct use for local hazard applications. Statistical downscaling can help bridge this gap, yet existing approaches often struggle with state-dependent, and especially lead-time-dependent, biases in global forecasts. We introduce SwAIther-Precip, a lead-time-aware downscaling framework that converts coarse-resolution AIFS forecasts into probabilistic km-scale precipitation fields over Switzerland. First, a U-Net conditioned on lead time via feature-wise linear modulation deterministically corrects systematic biases at coarse resolution. This targeted correction enables a cheaper super-resolution stage conditioned only on corrected precipitation, allowing direct training on observations rather than on the full atmospheric state. A diffusion-based model then generates fine-scale spatial variability independently of lead time. Using AIFS forecasts and CombiPrecip radar-gauge observations, SwAIther-Precip reduces CRPS by 48% relative to raw AIFS. The generated fields reproduce observed spatial variability with spectral fidelity above 0.85 at large scales and 0.88 at small scales, corresponding to an effective resolution of approximately 4 km on a 1 km grid for lead times up to 5 days. Training across lead times further improves long-range performance, yielding a 13% CRPS reduction at 6 days relative to lead-time-specific models. These results show that explicitly correcting lead-time-dependent biases before generative super-resolution is key to efficient km-scale probabilistic downscaling of global AI precipitation forecasts.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.16163v2</guid>
      <category>physics.ao-ph</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dan Assouline, Erwan Koch, Federico Amato, Filippo Quarenghi, Daniele Nerini, Thibaut Loiseau, Kyle van de Langemheen, Tom Beucler</dc:creator>
    </item>
    <item>
      <title>Proper Calibeating</title>
      <link>https://arxiv.org/abs/2605.26703</link>
      <description>arXiv:2605.26703v2 Announce Type: replace-cross 
Abstract: The classic concept of "calibrated forecasts" and its more recent refinement, "calibeating," are defined with respect to the standard quadratic scoring rule. We extend these notions to the class of $\textit{proper}$ scoring rules (for which the best forecast is the true distribution) and define $\textit{proper-calibration}$ and $\textit{proper-calibeating}$ by requiring the errors to converge to zero uniformly over all bounded proper scoring rules. We first establish that calibration always implies proper-calibration, whereas calibeating need not imply proper-calibeating. Second, we show how to guarantee proper-calibeating and proper-multicalibeating. Finally, we demonstrate the equivalence between proper-calibration and universal no regret when best replying to forecasts in decision-making under uncertainty.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.26703v2</guid>
      <category>econ.TH</category>
      <category>cs.GT</category>
      <category>cs.LG</category>
      <category>stat.ML</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Dean P. Foster, Sergiu Hart</dc:creator>
    </item>
    <item>
      <title>Optimal quantum locally differentially private mechanisms in the high-privacy regime</title>
      <link>https://arxiv.org/abs/2605.27278</link>
      <description>arXiv:2605.27278v2 Announce Type: replace-cross 
Abstract: We optimize the trade-off between privacy and utility in the high-privacy regime. We adopt local differential privacy (LDP) and its quantum extension, quantum local differential privacy (QLDP), for privacy protection, and investigate utility functions including the Holevo information (which reduces to the mutual information in the classical case) and the error exponents in symmetric and asymmetric hypothesis testing. These utility functions have classical and quantum optimal values, which are denoted by $C$ and $Q$, respectively, in this abstract for simplicity. In this paper, we provide optimal LDP and QLDP mechanisms achieving the classical and quantum optimal values in the high-privacy regime, and prove that the asymptotic ratio $Q/C$ in this regime takes the same value regardless of the utility function. Our results reveal quantum advantages (more precisely, $Q/C\ge3/2$) for the above utility functions when the protected private data are $n$-ary with $n\ge3$.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.27278v2</guid>
      <category>quant-ph</category>
      <category>cs.IT</category>
      <category>math.IT</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Yuuya Yoshida</dc:creator>
    </item>
    <item>
      <title>Zero-shot Quantum Neural Architecture Search</title>
      <link>https://arxiv.org/abs/2605.27410</link>
      <description>arXiv:2605.27410v2 Announce Type: replace-cross 
Abstract: Variational Quantum Algorithms (VQAs) are a leading approach to exploiting near-term quantum hardware, leveraging parameterized quantum circuits and classical optimization to achieve advantage. Despite their promise, the practical deployment of VQAs is challenged by the difficulty of designing quantum circuit architectures that balance expressivity, trainability, and hardware constraints. Existing evolutionary-based quantum neural architecture search methods address these challenges but suffer from high computational costs due to repeated training of candidate circuits. In this work, we identify a setting in which the Gram matrix of the Quantum Neural Tangent Kernel converges. Building on this observation, we design a zero-shot surrogate model to estimate candidate performance without full training, significantly accelerating the architecture search process. Using this surrogate, we propose MZeQAS, a Monte Carlo Tree Search (MCTS)-based Zero-Shot Quantum Neural Architecture Search framework for VQAs. By integrating proxy-based performance estimation with MCTS exploration, MZeQAS efficiently discovers high-performing architectures. Experimental results demonstrate that MZeQAS outperforms existing approaches in terms of both search efficiency and solution quality, providing a scalable and effective framework for advancing VQA deployment on noisy intermediate-scale quantum devices.</description>
      <guid isPermaLink="false">oai:arXiv.org:2605.27410v2</guid>
      <category>quant-ph</category>
      <category>cs.LG</category>
      <category>cs.NE</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by-nc-sa/4.0/</dc:rights>
      <dc:creator>Tung Dao, Son N. Tran, Huynh Thi Thanh Binh</dc:creator>
    </item>
    <item>
      <title>Parameter-Free and Group Conditional Online Conformal Prediction</title>
      <link>https://arxiv.org/abs/2606.00419</link>
      <description>arXiv:2606.00419v3 Announce Type: replace-cross 
Abstract: Uncertainty quantification (UQ) is critical for the deployment of machine learning predictors in real-world scenarios where the data distribution may shift over time (i.e., data may not be exchangeable). Online conformal prediction (OCP) methods address this issue at the expense of either (i) group-wise error control or (ii) learning-rate independent implementation. Group-conditional coverage is essential for fairness across different collections of data points and for providing finer UQ guarantees. Parameter-free optimization is crucial for robustness to adversarial and unknown data shifts. We propose a parameter-free algorithm for group-conditional OCP and demonstrate that it achieves the best group-conditional coverage guarantees. We evaluate our algorithm on synthetic and real-world data, demonstrating that our method not only improves the reliability of existing parameter-free OCP methods but also provides prediction intervals that are comparable in size to well-tuned group-conditional approaches. By unifying group-conditional coverage with parameter-free online algorithms, our work lays a foundation for fair and robust uncertainty quantification in shifting environments.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.00419v3</guid>
      <category>stat.ML</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Beepul Bharti, Ambar Pal, Jacopo Teneggi, Jeremias Sulam</dc:creator>
    </item>
    <item>
      <title>On the History of the Square and Multiply Algorithm</title>
      <link>https://arxiv.org/abs/2606.00958</link>
      <description>arXiv:2606.00958v2 Announce Type: replace-cross 
Abstract: The square-and-multiply algorithm, also known as binary exponentiation or repeated squaring, is a technique for fast exponentiation commonly used in modern cryptography and computational number theory. Despite its prominence, the historical origins of the algorithm are not known with certainty. This paper critically examines the origins and formalization of the algorithm through primary source analysis. We focus on Jamshid al-Kashi's fifteenth-century Miftah al-Hisab where the algorithm is articulated explicitly as a general method and claimed by al-Kashi as his own innovation. To contextualize this, we trace earlier instances of successive squaring in the works of al-Uqlidisi and al-Biruni, who applied these techniques for specific calculations, but did not formalize them into a general procedure. The earliest known work on this method of computation is found in Pingala's prosodic studies in ancient India (c. 200 BCE). Even though it was not fully developed as a general technique, Pingala's work seems to contain the conceptual foundation of the algorithm which is to employ the binary representation of a positive integer. By mapping this intellectual progression, the paper illustrates the historical background of an algorithm that is prominent in modern computation.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.00958v2</guid>
      <category>math.HO</category>
      <category>cs.CR</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Nuh Aydin, Mohammad K. Azarian, Omid Khormali, Ghaya Mtimet</dc:creator>
    </item>
    <item>
      <title>A New Method for Finding the Schulze Winner Set</title>
      <link>https://arxiv.org/abs/2606.02213</link>
      <description>arXiv:2606.02213v2 Announce Type: replace-cross 
Abstract: We propose a new voting algorithm based on the pairwise majority-comparison matrix derived from voters' preference profiles. We show that this algorithm induces exactly the winner set of the Schulze rule (Schulze, 1997). Our algorithm successively eliminates weaker candidates in terms of all-pairs comparisons, thereby reflecting a dual spirit to Condorcet's original idea of splitting preference cycles (de Condorcet, 1785). We further show that the direct sum of the survival sets obtained at each elimination round coincides with the Schwartz set (Schwartz, 1972). These two equivalence results provide a formal mathematical foundation for the ``folklore'' relationship between the Schulze winner set and the Schwartz set, as well as a new Condorcetian interpretation of the Schulze winner set.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.02213v2</guid>
      <category>econ.TH</category>
      <category>cs.DM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Satoru Fujishige, Leo Goto, Satoshi Nakada</dc:creator>
    </item>
    <item>
      <title>Erd\H{o}s Rado Sunflower Theorem for Shifted Families</title>
      <link>https://arxiv.org/abs/2606.02667</link>
      <description>arXiv:2606.02667v2 Announce Type: replace-cross 
Abstract: Let $f(k,s)$ denote the minimum integer $m$ such that any family $\mathcal{F}$ consisting of $k$-sized sets of cardinality at least $m$ always contains a sunflower of size $s$. The Erd\H{o}s-Rado Sunflower Conjecture states that for every $s &gt;2$, there is a constant $C=C(s)$ such that $f(k,s) \leq C^k$. In this paper, we prove the conjecture for shifted families.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.02667v2</guid>
      <category>math.CO</category>
      <category>cs.DM</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Tapas Kumar Mishra</dc:creator>
    </item>
    <item>
      <title>One Transit Is All You Need: Detecting Exoplanets Through Learned Stellar Behaviour with EXOVEIL</title>
      <link>https://arxiv.org/abs/2606.02778</link>
      <description>arXiv:2606.02778v2 Announce Type: replace-cross 
Abstract: I present EXOVEIL, a transit detection system that learns what a star's brightness should look like and flags when reality disagrees. Unlike existing systems that require phase-folded input, EXOVEIL operates on raw flux time series and can detect planets that transit only once. A Transformer world model, trained on 16,499 Kepler light curves with transit-masked self-supervised learning, predicts expected stellar flux. A matched-filter detector with variance weighting extracts transit signals from the prediction residuals. A learned classifier (XGBoost) separates planets from false positives, achieving AUC 0.938 on Kepler DR25. Applied to single-transit injection-recovery, EXOVEIL recovers 32% of transits at 1000 ppm depth, a task where all classification-based systems score 0% by construction. A blind search of 3,737 Kepler stars yields 179 new transit-like signals not present in the DR25 TCE catalogue, including 46 monotransit candidates. Applied without retraining to 47 confirmed TESS planets in the PLATO LOPS2 field, EXOVEIL achieves 100% recovery, demonstrating zero-shot cross-mission transfer. At PLATO's 25-second cadence, detection reaches 100 ppm -- approaching the Earth-analog regime. I provide the first application of conformal prediction to transit detection (95.9% empirical coverage) and release the system as pip install exoveil with pretrained weights and a candidate catalogue.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.02778v2</guid>
      <category>astro-ph.EP</category>
      <category>astro-ph.IM</category>
      <category>cs.LG</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Pratik Priyanshu</dc:creator>
    </item>
    <item>
      <title>Optimizing Explicit Unit-Distance Lower-Bound Certificates</title>
      <link>https://arxiv.org/abs/2606.03419</link>
      <description>arXiv:2606.03419v4 Announce Type: replace-cross 
Abstract: The 2026 disproof of Erd\H{o}s's unit-distance conjecture and Sawin's quantitative refinement show that the maximum number $u(n)$ of unit distances among $n$ planar points can exceed $n^{1+\varepsilon}$ for a fixed positive $\varepsilon$. Sawin's explicit bound gives more than $n^{1.014}$ unit distances for arbitrarily large $n$ and exposes integer parameters whose choice is not fully optimized. This report treats Sawin's parameter selection as a nonlinear integer optimization problem and develops an open-source Python optimization and verification pipeline for certificates involving prime sets $T$ and $S_Q$, integer multiplicities $k(p)$, and a rationally encoded real parameter $R$. After reproducing Sawin's certificate with $\delta=0.014114\ldots$, the pipeline yields improved certificates with the same $T$. We develop a tailored integer evolution strategy achieving a certificate with $\delta=0.015263\ldots$ and supporting the cautious statement $u(n)&gt;n^{1.0152}$ for arbitrarily large $n$. For extended ramified prime ranges, the Emmerich--Cordella certificate obtained with the same framework reports $u(n)&gt;n^{1.031}$ for $\#T=67$, illustrating the importance of enlarging $T$. Very recent MathOverflow discussions, brought to the author's attention as of version~4, report further improvements, including certificates above $\delta&gt;0.035$ and beyond $\delta&gt;0.036$. Some of these improvements may rely not only on larger prime ranges but also on modified constraint systems and additional degrees of freedom that deviate from Sawin's original formulation. Beyond this application, the work illustrates how randomized optimization heuristics can improve, verify, and refine explicit certificates for combinatorial geometry through nonlinear integer optimization.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.03419v4</guid>
      <category>math.OC</category>
      <category>cs.AI</category>
      <category>cs.CG</category>
      <category>cs.NE</category>
      <category>math.CO</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0/</dc:rights>
      <dc:creator>Michael T. M. Emmerich</dc:creator>
    </item>
    <item>
      <title>Optimal Finite-Horizon LQR Control for Traffic Flow via Variable Speed Limits</title>
      <link>https://arxiv.org/abs/2606.03632</link>
      <description>arXiv:2606.03632v2 Announce Type: replace-cross 
Abstract: This article presents a finite-horizon linear quadratic regulator for the control of the first-order Lighthill-Whitham-Richards traffic model with a triangular fundamental diagram. The in-domain control action is realized through variable speed limits implemented as a source term in the governing hyperbolic partial differential equation. Unlike prior studies on infinite-horizon formulations, this article develops a finite-horizon LQR framework, deriving a space and time varying state feedback function for hyperbolic PDEs. The solution to the finite time optimal control problem relies on the solution of another PDE, called the Riccati PDE. The resulting nonlinear Riccati PDE is solved analytically via the parametric method of characteristics. The Riccati PDE solution is a function of both time and space, as well as the traffic regime. A sensitivity analysis demonstrates the effects of the LQR parameters for both the infinite and finite time horizon problem in different traffic situations, while simulations validate the finite-horizon LQR's ability to guarantee finite-time convergence. Compared to the infinite-horizon LQR, the proposed approach achieves significantly improved control performance across various scenarios, making it particularly suitable for time-sensitive traffic management applications.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.03632v2</guid>
      <category>math.OC</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Brian Block, Stephanie Stockar</dc:creator>
    </item>
    <item>
      <title>Geometric Time-Domain Identification of Three-Phase Load Equivalents from Terminal Measurements</title>
      <link>https://arxiv.org/abs/2606.07048</link>
      <description>arXiv:2606.07048v2 Announce Type: replace-cross 
Abstract: This paper presents a geometric time-domain method for identifying three-phase load equivalents from instantaneous voltage and current measurements at the point of common coupling. Measured waveforms are interpreted as trajectories in Euclidean signal spaces, and load-equivalent parameters are recovered from the geometry of those trajectories. The method extends a previously published single-phase geometric identification formulation to three- and four-wire systems and places special emphasis on the three-wire case, where no neutral voltage is measured and the terminal data must satisfy coupled Kirchhoff constraints. The main advance over the earlier analytical formulation is a sampled-data implementation based on local time windows, normalized matrix equations, harmonic-projection derivative and primitive coordinates, explicit geometric identifiability tests, passivity constraints, and energy/Kirchhoff residuals. The method does not force a model when the measured trajectory lacks enough information; instead, it reports low-rank or ill-conditioned windows as low-confidence evidence. Numerical simulations with clean data, measurement noise, window-length sweeps, and sensor delay show that the method accurately identifies informative three-phase trajectories and exposes structurally degenerate cases such as pure single-frequency excitation for higher-order three-wire models. For a given admissible topology the identified circuit closes the instantaneous terminal energy balance of the measured load over the analysis window.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07048v2</guid>
      <category>eess.SP</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Francisco G. Montoya, Francisco de Leon, Francisco M. Arrabal-Campos, Alfredo Alcayde</dc:creator>
    </item>
    <item>
      <title>Branch-Level Energy Localization in Three-Phase Loads: Resolving Indeterminacy in Time-Domain</title>
      <link>https://arxiv.org/abs/2606.07076</link>
      <description>arXiv:2606.07076v2 Announce Type: replace-cross 
Abstract: This paper develops a branch-level energy-localization framework for three-phase loads. The instantaneous terminal power of an admissible lumped equivalent is decomposed uniquely as Joule dissipation plus magnetic and electric stored-energy rates, branch by branch. Three formal results are established: a Branch-Level Localization Theorem (uniqueness given an admissible topology); a Topology-Indeterminacy Theorem (multiple admissible topologies reproduce identical terminal data with distinct localizations); and a Generalized Energetic Duality Theorem that organizes classical electrical dualities (Norton-Thevenin, series--parallel, L vs C, R vs G) as restrictions to Linear Time Invariant (LTI) sinusoidal regimes of a single time-domain principle in which constant-parameter equivalence is replaced by time-varying parameters. The framework is exercised on six test cases including the de Leon--Cohen open-phase paradox, switched-resistive loads, three-wire delta-versus-wye-virtual indeterminacy, fluctuating-phase loads, and a four-wire nonlinear load with hysteretic, linear, and switched branches. The framework is positioned as complementary to IEEE Std. 1459, CPC, instantaneous p-q, and Fryze-Buchholz-Depenbrock: each answers a different question, and the apparent paradoxes vanish once the question is posed precisely.</description>
      <guid isPermaLink="false">oai:arXiv.org:2606.07076v2</guid>
      <category>eess.SP</category>
      <category>cs.SY</category>
      <category>eess.SY</category>
      <pubDate>Tue, 09 Jun 2026 00:00:00 -0400</pubDate>
      <arxiv:announce_type>replace-cross</arxiv:announce_type>
      <dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
      <dc:creator>Francisco G. Montoya, Francisco de Leon, Francisco M. Arrabal-Campos, Alfredo Alcayde</dc:creator>
    </item>
  </channel>
</rss>
