Microsoft’s New Method Cuts Reasoning Model Memory by 3x — Here’s How It Actually Works

📰 Medium · Machine Learning

Microsoft's new method cuts reasoning-model memory use by 3x with MEMENTO, a technique that teaches LLMs to take notes on their own thinking.

Level: advanced · Published 16 Apr 2026
Action Steps
  1. Apply MEMENTO to your LLM so that it learns to take notes on its own thinking, reducing memory usage
  2. Benchmark your LLM with and without MEMENTO, tracking both memory use and task performance, to measure the actual reduction
  3. Try MEMENTO across different LLM applications to see where the savings matter most
  4. Compare MEMENTO against other memory-reduction techniques to judge its effectiveness
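To make the benchmarking step concrete, here is a minimal sketch of how a note-taking policy could shrink the memory a reasoning trace occupies. It assumes the savings come from a smaller retained context (and therefore a smaller KV cache) because the model replaces each finished block of reasoning with a short note. The summary does not describe MEMENTO's mechanism, so this policy, the function names (`peak_context_tokens`, `kv_cache_bytes`, `reduction_factor`) and the model dimensions are all hypothetical illustrations, not Microsoft's implementation.

```python
from dataclasses import dataclass


@dataclass
class ModelShape:
    # Hypothetical dimensions; substitute the values from your model's config.
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_value: int = 2  # 2 bytes per value for fp16/bf16


def kv_cache_bytes(shape: ModelShape, num_tokens: int) -> int:
    """Standard KV-cache estimate: one key and one value vector per token, layer and KV head."""
    return 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * shape.bytes_per_value * num_tokens


def peak_context_tokens(prompt_tokens: int, blocks: list[tuple[int, int]], keep_notes_only: bool) -> int:
    """Peak number of tokens held in context while generating a reasoning trace.

    blocks: (reasoning_tokens, note_tokens) for each reasoning block, in order.
    keep_notes_only=False: baseline, every reasoning token stays in context.
    keep_notes_only=True: hypothetical note-taking policy, where each finished
    block is evicted and replaced by its much shorter note.
    """
    context = prompt_tokens
    peak = context
    for reasoning, note in blocks:
        context += reasoning  # the block is always generated in full
        peak = max(peak, context)
        if keep_notes_only:
            context = context - reasoning + note  # keep only the note
    return peak


def reduction_factor(baseline_bytes: int, compact_bytes: int) -> float:
    """How many times less memory the compact run needs (3.0 means '3x')."""
    return baseline_bytes / compact_bytes


if __name__ == "__main__":
    shape = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128)
    blocks = [(4000, 200)] * 6  # six 4,000-token blocks, each summarized in 200 tokens
    full = peak_context_tokens(500, blocks, keep_notes_only=False)
    notes = peak_context_tokens(500, blocks, keep_notes_only=True)
    print(full, notes)  # 24500 5500
    print(f"{reduction_factor(kv_cache_bytes(shape, full), kv_cache_bytes(shape, notes)):.2f}x")
```

The sketch tracks the peak rather than the final context size because the hardware has to hold the largest cache the run ever reaches. The token counts are invented to exercise the code and say nothing about MEMENTO's reported 3x; on a real model you would compare measured peak GPU memory across the two runs, for instance with PyTorch's `torch.cuda.max_memory_allocated()`.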
Who Needs to Know This

Machine learning engineers and researchers can use this technique to make their LLMs more efficient, while product managers can weigh where the technology might fit in their products.

Key Insight

💡 The MEMENTO technique teaches LLMs to take notes on their own thinking, which reduces their memory usage.
