Pivoting to the H5MD format

June 9, 2024

A New Direction for Zarrtraj

After learning that GROMACS will likely support H5MD soon, my mentors and I have come up with a new and exciting direction for zarrtraj, one that will let researchers use it as soon as it is functional. By adopting the H5MD format, I can write an H5MD reader that doesn’t care whether it is looking at a .zarr or a .h5 file. This is possible thanks to the folks at kerchunk, who have found a way to translate HDF5 metadata so that .h5 files can be opened with Zarr.

This means that HDF5 files can be streamed using the well-established zarr-python interface, so any optimizations that previously applied to zarrtraj still apply and can be reused! This is great news for the project, since we gain a ton of new functionality and applications without rewriting a whole lot of code.
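
To make the idea concrete, here is a minimal sketch of what a format-agnostic open could look like. It is not zarrtraj's actual code: the helper names `is_hdf5_path` and `open_h5md_root` are hypothetical, the choice of file extensions is an assumption, and the sketch assumes zarr-python 2.x with kerchunk, fsspec, and h5py installed. The kerchunk step scans the HDF5 file once and produces a set of references that zarr can read as if the file were a Zarr store.

```python
import os


def is_hdf5_path(path):
    """Guess from the extension whether a path is an HDF5 file (assumed extensions)."""
    ext = os.path.splitext(str(path))[1].lower()
    return ext in {".h5", ".hdf5", ".h5md"}


def open_h5md_root(path):
    """Open a local .zarr store or HDF5 file as a read-only zarr group.

    Hypothetical helper for illustration; assumes zarr-python 2.x.
    """
    import zarr  # imported lazily so is_hdf5_path works without zarr installed

    if is_hdf5_path(path):
        import fsspec
        from kerchunk.hdf import SingleHdf5ToZarr

        # Translate the HDF5 metadata into kerchunk references
        # (Zarr-style metadata plus byte ranges pointing into the file).
        with open(path, "rb") as f:
            refs = SingleHdf5ToZarr(f, str(path)).translate()

        # Expose the references as a key-value mapping that zarr can read.
        store = fsspec.filesystem("reference", fo=refs).get_mapper("")
    else:
        # A native Zarr store can be opened directly by path.
        store = str(path)

    return zarr.open_group(store, mode="r")
```

With something like this in place, the rest of a reader would only ever see a zarr group, for example `root = open_h5md_root("traj.h5md")`, regardless of which format is on disk.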

New Project Priorities

Given that H5MD is significantly different from, and more flexible than, the new format we were developing, project priorities must shift to accommodate it:

Implementing the full H5MD spec in a cloud reader and writer
Testing the reader and writer with H5MD edge cases
Setting up raw benchmarks to compare pure zarr and zarr + dask with the reader
Optimizing for random access

Lawson
