Joblib for memoization

Tags

  • eng
  • py

I was today years old when I found out Joblib was initially intended for on-disk memoization.

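If you run the code below multiple times, joblib recognizes when a function is called with identical input and returns the stored result instead of recomputing it. As far as I can tell the cache is never evicted automatically: it keeps growing until you wipe it with mem.clear() or trim it with mem.reduce_size().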

NOTE: if you change the function's source in any way (e.g. adding a comment), joblib will warn you, discard that function's cached results, and recompute the result for every input.

import random
import joblib
import logging

logging.basicConfig(level=logging.DEBUG)

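# NOTE: joblib does not expand "~", so this creates a directory literally
# named "~" under the current working directory. Wrap the path in
# os.path.expanduser() to cache under your home directory instead.
mem = joblib.Memory("~/.cache")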


@mem.cache
def memoized(x: str):
  # Removing this comment will make joblib recompute results.
  return x + " additional string"


if __name__ == "__main__":
  print(memoized("this is my string new"))
  print(memoized(f"this is my string {random.randint(0, 10_000)}"))
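To see the cache hit without relying on log output, here is a minimal sketch that counts how often the function body actually runs. The names slow_square, calls, and the temporary cache directory are made up for illustration; mem.clear() is the manual way to wipe the cache.

```python
import tempfile

import joblib

# Throwaway cache directory so the sketch does not touch a real cache.
cache_dir = tempfile.mkdtemp()
mem = joblib.Memory(cache_dir, verbose=0)

calls = []  # records every time the function body actually runs


@mem.cache
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(3)   # computed and written to disk
second = slow_square(3)  # loaded from disk, the body is skipped
runs_before_clear = len(calls)

mem.clear(warn=False)    # wipe the whole cache by hand
third = slow_square(3)   # computed again because the cache is empty
runs_after_clear = len(calls)

print(first, second, third)
print(runs_before_clear, runs_after_clear)
```

The side effect (appending to calls) is only there to make the skipped call visible; functions you memoize for real should be free of side effects, since a cache hit never executes them.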

To get a reference to the on-disk result instead of the value itself, you can also call memoized.call_and_shelve("asdf"). It returns a joblib.MemorizedResult; the value is only loaded from disk when you call .get() on it.
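A small sketch of call_and_shelve, assuming a throwaway cache directory and a hypothetical function named shout. The returned object is just a handle to the cached entry, so it is cheap to pass around; .get() reads the value back and .clear() deletes that single entry.

```python
import tempfile

import joblib

mem = joblib.Memory(tempfile.mkdtemp(), verbose=0)


@mem.cache
def shout(text: str) -> str:
    return text.upper()


# Computes (or finds) the result on disk and returns a handle, not the value.
ref = shout.call_and_shelve("asdf")
kind = type(ref).__name__

# Load the actual value from the cache.
value = ref.get()

print(kind)
print(value)

# Remove just this one cached entry.
ref.clear()
```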

Joblib storage internals

For completeness, it seems the cache is keyed on a per-module-path, per-function-name, per-argument-hash basis. E.g. if the code is in /home/tk/proj/junk/memo:

$ cat '~/.cache/joblib/__main__--home-tk-proj-junk-memo/memoized/func_code.py'
> # first line: 1
> @mem.cache
> def memoized(x: str):
>     return x + " additional string"

$ cat '~/.cache/joblib/__main__--home-tk-proj-junk-memo/memoized/8c57f192feb71ccba3206245dd375221/metadata.json'
> {"duration": 0.0017366409301757812, "input_args": {"x": "'this is my string'"}}

$ cat '~/.cache/joblib/__main__--home-tk-proj-junk-memo/memoized/8c57f192feb71ccba3206245dd375221/output.pkl'
> [junk chars] 'this is my string new additional string' [junk chars]
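If you would rather poke at the layout from Python than with cat, something like the following sketch works. It uses a temporary directory instead of the path above, and the exact directory names (module path, argument hash) will differ on your machine.

```python
import os
import tempfile

import joblib

cache_dir = tempfile.mkdtemp()
mem = joblib.Memory(cache_dir, verbose=0)


@mem.cache
def memoized(x: str):
    return x + " additional string"


memoized("this is my string")

# Walk the cache and list every file relative to the cache root.
found = []
for root, _dirs, files in os.walk(cache_dir):
    for name in sorted(files):
        rel = os.path.relpath(os.path.join(root, name), cache_dir)
        found.append(rel)
        print(rel)
```

Note that joblib adds its own joblib/ subdirectory under whatever location you pass to Memory, which is why the paths above start with ~/.cache/joblib/.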

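This leads to some gotchas. Because the cache directory is named after the function, renaming a function across sessions orphans its old entries, and two different functions that share a name in the same module will keep wiping each other's cache. Just don’t do weird shit with function names.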
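A rough sketch of the name-collision case, assuming two hypothetical functions that are both called f. As I understand it, joblib sees that the stored source no longer matches, clears that function's cache directory, and recomputes, so you get correct results but lose the caching benefit.

```python
import tempfile
import warnings

import joblib

mem = joblib.Memory(tempfile.mkdtemp(), verbose=0)


@mem.cache
def f(x):
    return x + 1


first_result = f(1)

# joblib may emit a collision warning for the redefinition; silence it here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")

    @mem.cache
    def f(x):  # same name, different body
        return x + 100

    # The stored entry belongs to the old body, so this is recomputed.
    second_result = f(1)

print(first_result, second_result)
```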