Memoization is a powerful tool for data scientists. It allows them to repeat a task multiple times, so that they can get the same result each time. This repetition can help speed up the process of data analysis and decision making. There are two main types of memoization: lazy memoization and forced memoization. Lazy memoization is when the data scientist doesn’t want to repeat a task because it’s slow or it’s not necessary. For example, if you have a list of features to test, you might not want to repeat the testing process every time you want to see if a feature works on a certain set of data. Instead, you might lazy-memoize the list of features and then run only those tests when you need to test them. Forced memoization is when the data scientist wants to repeat a task because it’s important or they need to be able to produce accurate results quickly. For example, if you’re trying to find outliers in your data set, forced memoization can help you do that by repeating the task multiple times so that you can get accurate results each time.


Memoization is a specific form of caching that lends itself to scenarios where a costly function is executed repeatedly, sometimes with the same arguments. Provided that the function is pure so that it always produces the same value from a particular set of inputs, memoizing it can increase efficiency and reduce wasted CPU cycles.

You’ll most often encounter memoization in functional programming languages. The technique has broad utility, though. It’s an abstract concept that you can incorporate into any recursive code. We’re using JavaScript for this article, but you could rewrite the examples into your working language.

A Simple Example

Here’s a simple function that generates the factorial of a given integer:

The factorial calculation is recursive, as factorial() calls itself within the else condition. The calculation is also pure, as any given value of n will always return the same value. factorial(5) is 120, irrespective of the state of the program.

Because of the recursive nature of the function, computing the factorial of several large numbers incurs wasted operations:

All the calculations needed to compute y were already performed as part of the computation of x. If factorial() were to cache its inputs and their corresponding outputs, the calculation of y could be significantly sped up.

Memoizing the Factorial Function

Here’s a basic approach that memoizes the factorial() function:

Now, there’s a cache object that factorial() uses to record its output values. Each time the function is called, it first checks whether it’s previously seen the input n. If it has, it can short circuit immediately and return the cached value. Otherwise, the recursive calculation proceeds as normal, but subsequent runs using the same number will be sped up.

Now, computing factorial(50) after factorial(100) will be vastly more efficient. The factorial of 50 would be calculated as part of the factorial of 100, so the function could return the value almost instantaneously.

A More General Solution

While the above code works, it’s specific to the factorial() function. If you were using other similar functions, you’d need to manually add the caching code to each one.

A more general solution would let you wrap any function with a higher-order function that provides the memoizing capability:

Now, you can adjust the original factorial() function:

By wrapping factorial() with memoize(), it gains automatic memoization capabilities. The wrapper function returns a new function that intercepts all calls to factorial(), stringifies their arguments, and checks whether they’ve been seen before. If they have, the previous return value is reused without calling the real function at all. When new arguments are seen, the function gets invoked, and its output is added to the cache.

The wrapper uses the JavaScript rest parameters syntax to accept a variadic number of arguments. This means that it will work with any JavaScript function. This approach is using JSON.stringify() to create the string representation of the arguments, so care should be taken if you’re calling a function with complex objects, which can’t be fully represented as JSON.

When Not to Use Memoization?

Memoization can deliver significant performance improvements, particularly to mathematically heavy operations. It isn’t a technique to use everywhere, though. Not all functions should be memoized, as you could end up harming performance in some cases.

By wrapping a function with memoize(), you’re forcing a comparison of the input arguments each time that function is called. This in itself will consume some CPU time. The example wrapper above runs JSON.stringify() every time that the function is called, adding a new overhead.

Memoization is appropriate for functions where there’s a high chance that the same input values will be seen regularly. If you’ll rarely call a function with the same arguments, cache hits will be infrequent, and you might lose performance to argument serialization and comparison. Maintaining the cache also increases memory use, as all previous inputs and outputs need to be retained.

You should, therefore, fully assess each function’s role in your program before deciding to memoize. Compare performance with and without memoization. Sometimes, it can prove more beneficial to rely on the browser’s own optimizations.

Finally, bear in mind that some functions can’t be memoized. The technique only works with pure functions—if your function reaches out to global variables or some other application state, it should not be memoized!

Memoizing the function above could yield unexpected results after the first run. Values of n will be cached using the original window.scrollY version. The memoization wrapper will return the cached output, even if window.scrollY has changed.

Summary

Memoization is a form of caching that accelerates the performance of repetitive recursive operations. It’s a functional programming technique that can be implemented as a generic wrapper for any pure function.

You’ll most often encounter memoization in frequently called functions that perform computationally heavy operations. It helps avoid waste by removing the need to recalculate values that have already been produced as part of a previous call.

The benefits of memoization will be less apparent in functions that are simple to begin with or infrequently called. In some cases, improper use of memoization can actually harm performance, so don’t blindly memoize every function in your application.