I was today years old when I found out Joblib was initially intended for on-disk memoization.
If you run the code below multiple times, it should automatically infer when identical input gets passed in. Not sure if the cache ever gets evicted?
NOTE: if you change the function in any way (e.g. adding a comment), joblib will warn you and recompute any input.
import random
import joblib
import logging
logging.basicConfig(level=logging.DEBUG)
mem = joblib.Memory("~/.cache")
@mem.cache
def memoized(x: str):
# Removing this comment will make joblib recompute results.
return x + " additional string"
if __name__ == "__main__":
print(memoized("this is my string new"))
print(memoized(f"this is my string {random.randint(0, 10_000)}"))
To directly memoize to disk (bypassing memory) you can also call:
memoized.call_and_shelve("asdf")
and get a joblib.MemorizedResult
back.
Joblib storage internals
For completeness, it seems cache is on a per-path per-method-hash base.
E.g. if the code is in /home/tk/proj/junk/memo
:
$ cat '~/.cache/joblib/__main__--home-tk-proj-junk-memo/memoized/func_code.py'
> # first line: 1
> @mem.cache
> def memoized(x: str):
> return x + " additional string"
$ cat '~/.cache/joblib/__main__--home-tk-proj-junk-memo/memoized/8c57f192feb71ccba3206245dd375221/metadata.json'
> {"duration": 0.0017366409301757812, "input_args": {"x": "'this is my string'"}}
$ cat '~/.cache/joblib/__main__--home-tk-proj-junk-memo/memoized/8c57f192feb71ccba3206245dd375221/output.pkl'
> [junk chars] 'this is my string new additional string' [junk chars]
This leads to some gotchas. Namely doing weird shit with renaming functions across sessions. Just don’t do it.