The Topology of Multimodal Fusion: Why Current Architectures Fail at Creative Cognition

📰 ArXiv cs.AI

arXiv:2604.04465v1 (announce type: new)

Abstract: This paper identifies a structural limitation in current multimodal AI architectures that is topological rather than parametric. Contrastive alignment (CLIP), cross-attention fusion (GPT-4V/Gemini), and diffusion-based generation share a common geometric prior -- modal separability -- which we term contact topology. The argument rests on three pillars with philosophy as the generative center. The philosophical pillar reinterprets Wittgenstein's say

Published 7 Apr 2026