Storing Metric sequence data for fast access
Aim allows tracking metric sequences without any limitation of the number of steps tracked.
With a few thousands of steps per metric it is OK to load the entire sequence into memory
and slice it to get the desired number of points. However, when many (tens of millions) steps are tracked,
this approach cannot be applied.
Starting from version
3.11 Aim metric sequence data format has changed to improve the
data collection/sampling performance for long sequences.
What has changed?
Each value tracked has its unique key in the Run’s
rocksdb database. Previously
the tracking step has been used as the part of this key. This means that in order to get
the sample of K values we have to either do K random access operations or iterate over
entire sequence and skip values. Both have negative implication on data read performance.
3.11 the key uses the stable random hashing algorithm instead of tracking step.
Since the keys are sorted, reading first K keys is a good approximation for the entire
What to do if the Metric has been logged already?
In order to speed-up the metrics read for existing aim
Repos the data format upgrade
should be executed. In cases when
Run has been opened for write, the upgrade is done
automatically, since it is not possible to mix two formats.
For the rest of the cases new CLI command has been added to apply data format change
for selection of
Runs or for the entire repo:
# update selected list of runs aim storage upgrade 3.11+ <RUN_HASH_1> <RUN_HASH_2> ... # update all runs in the repo aim storage --repo <REPO_PATH> upgrade 3.11+ '*'
Run data backup with old format is available in case of any issues. In order to rollback format
change and get back to the old format run the following command:
aim storage restore <RUN_HASH>