Lossless LLM compression for efficient GPU inference via dynamic-length float
📰 Hacker News · CharlesW
Lossless LLM compression for efficient GPU inference via dynamic-length float. 117 comments, 411 points on Hacker News.