Token-Efficient Multimodal Reasoning via Image Prompt Packaging
arXiv cs.AI
arXiv:2604.02492v1 (Announce Type: cross)

Abstract: Deploying large multimodal language models at scale is constrained by token-based inference costs, yet the cost-performance behavior of visual prompting strategies remains poorly characterized. We introduce Image Prompt Packaging (IPPg), a prompting paradigm that embeds structured text directly into images to reduce text token overhead, and benchmark it across five datasets, three frontier models (GPT-4.1, GPT-4o, Claude 3.5 Sonnet), and two task …
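The core trade-off is that text tokens moved into an image are replaced by image tokens, so packaging only pays off when the image costs less than the text it absorbs. The sketch below is not the paper's method or cost model. It assumes OpenAI's published high-detail image accounting for GPT-4o (downscale to fit 2048 × 2048, then to a 768 px shortest side, then 170 tokens per 512 px tile plus an 85-token base). That rule may change, and Claude models count image tokens differently. The helper names and example token counts are hypothetical.

```python
import math


def gpt4o_image_tokens(width: int, height: int) -> int:
    """Estimate high-detail image tokens under the assumed GPT-4o tile rule."""
    # Assumption: downscale (never upscale) to fit inside a 2048 x 2048 square.
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale
    # Assumption: downscale so the shortest side is at most 768 px.
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale
    # Count 512 px tiles: 170 tokens per tile plus an 85-token base.
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles


def packaging_savings(text_tokens_moved: int, width: int, height: int) -> int:
    """Hypothetical helper: text tokens removed minus image tokens added.

    A positive result means rendering that text into an image of the given
    size would lower the input token count under the assumed accounting.
    """
    return text_tokens_moved - gpt4o_image_tokens(width, height)


if __name__ == "__main__":
    # Hypothetical example: 1,500 tokens of structured text in a 1024 x 1024 image.
    print(gpt4o_image_tokens(1024, 1024))       # 765
    print(packaging_savings(1500, 1024, 1024))  # 735
```

This arithmetic covers input tokens only; whether packaging is worthwhile also depends on how task accuracy holds up, which is the cost-performance question the abstract says the paper benchmarks.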