Pivoting to the H5MD format
A New Direction for Zarrtraj
After learning that GROMACS will likely soon support H5MD, my mentors and I have come up with a new and exciting direction for zarrtraj that will allow researchers to use it as soon as it is functional: by adopting the H5MD format, I can write an H5MD reader that doesn’t care whether it is looking at a .zarr or a .h5 file. This is possible thanks to the folks at kerchunk, who have found a way to translate HDF5 metadata so that .h5 files can be opened with zarr. This means that HDF5 files can be streamed using the well-established zarr-python interface, so any optimizations that previously applied to zarrtraj still apply and can be reused! This is great news for the project, since we gain a ton of new functionality and applications without rewriting much code.
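To make the "doesn't care which file type" idea concrete, here is a minimal sketch of a backend-agnostic frame iterator. It is not zarrtraj's actual reader. It assumes only that the root object supports path-style indexing such as `root["particles/trajectory/position"]` (as h5py and zarr groups do) and that the `step` and `value` arrays support `len()` and integer indexing. `DictGroup`, `iter_positions`, and the group name `trajectory` are hypothetical names used for illustration, and the sketch only handles H5MD elements that store `step` explicitly.

```python
from collections.abc import Mapping
from typing import Any, Iterator, Tuple


class DictGroup:
    """Hypothetical stand-in for an h5py or zarr group, backed by nested dicts."""

    def __init__(self, tree: Mapping):
        self._tree = tree

    def __getitem__(self, path: str) -> Any:
        # Walk a slash-separated path, mimicking group["a/b/c"] lookups.
        node: Any = self._tree
        for part in path.strip("/").split("/"):
            node = node[part]
        # Wrap sub-groups so they can be indexed by path again; return arrays as-is.
        return DictGroup(node) if isinstance(node, Mapping) else node


def iter_positions(root: Any, group: str = "trajectory") -> Iterator[Tuple[int, Any]]:
    """Yield (step, positions) pairs from an H5MD-style hierarchy.

    Assumes the explicit-step layout: particles/<group>/position/{step,value}.
    """
    element = root[f"particles/{group}/position"]
    steps = element["step"]
    values = element["value"]
    for i in range(len(steps)):
        yield int(steps[i]), values[i]


# Tiny in-memory "file" with two frames of a single particle.
fake_file = DictGroup({
    "particles": {"trajectory": {"position": {
        "step": [0, 10],
        "time": [0.0, 0.02],
        "value": [[[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]]],
    }}}
})

frames = list(iter_positions(fake_file))
```

Because `iter_positions` never imports a storage library, the same function could in principle be handed a group opened from a .h5 file or from a zarr store, provided the object meets the assumptions above.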
New Project Priorities
Given that H5MD is significantly different from, and more flexible than, the new format we were developing, project priorities must shift to accommodate it:
- Implementing the full H5MD spec in a cloud reader and writer
- Testing the reader and writer with H5MD edge cases
- Setting up raw benchmarks to compare pure zarr and zarr + dask with the reader
- Optimizing for random access