Microsoft’s New Method Cuts Reasoning Model Memory by 3x — Here’s How It Actually Works

📰 Medium · Machine Learning

Microsoft's new method, MEMENTO, teaches LLMs to take notes on their own thinking and cuts reasoning model memory by 3x.

Advanced · Published 16 Apr 2026

Action Steps

Implement MEMENTO in your LLM architecture to reduce memory usage
Use the MEMENTO technique to teach your LLM to take notes on its own thinking
Measure your LLM's memory usage and task performance with and without MEMENTO to check the reduction against the reported 3x
Apply MEMENTO to several LLM applications to see where the memory savings hold up
Compare MEMENTO's results with other memory reduction techniques on the same workloads to judge its effectiveness
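
The evaluation step above can be sketched as a small comparison helper. This is a minimal sketch, not part of MEMENTO itself: the function names `reduction_factor` and `summarize` are hypothetical, and the byte counts are placeholder values standing in for peak-memory measurements you would collect from your own baseline and MEMENTO runs.

```python
from statistics import mean


def reduction_factor(baseline_bytes, memento_bytes):
    """Return how many times smaller the MEMENTO run's peak memory is."""
    if baseline_bytes <= 0 or memento_bytes <= 0:
        raise ValueError("memory measurements must be positive")
    return baseline_bytes / memento_bytes


def summarize(runs):
    """Summarize reduction factors over (baseline_bytes, memento_bytes) pairs."""
    factors = [reduction_factor(b, m) for b, m in runs]
    return {"mean": mean(factors), "min": min(factors), "max": max(factors)}


# Placeholder measurements (assumed, not real results): one pair per prompt.
runs = [
    (12_000_000_000, 4_000_000_000),
    (9_000_000_000, 3_600_000_000),
]
summary = summarize(runs)
print(summary)
```

Reporting the minimum and maximum alongside the mean shows whether the savings are consistent across prompts or driven by a few long reasoning traces.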

Who Needs to Know This

Machine learning engineers and researchers can use this technique to improve the efficiency of their LLMs. Product managers can consider where the technology might apply in their products.

Key Insight

💡 The MEMENTO technique teaches LLMs to take notes on their own thinking, reducing reasoning model memory usage by 3x.