Priorities for the zarrtraj project

After meeting with my mentors, we decided on the following priorities for the project. Since this is a small-sized project, I’ll tackle these in order of priority:

  1. Benchmarking raw performance of zarr and zarr with dask
    • To see whether the MDAnalysis timestep model of analysis is a good fit for zarr, I will benchmark zarr’s read speed outside of MDAnalysis, with and without dask. Once the zarrtraj reader is optimized, these benchmarks will show how close the timestep-based model gets to the maximum theoretical speed.
    • Then, to compare zarr’s compression with that of xtc files, I’ll measure the file size of a zarrtraj file written with maximum compression and xtc-level precision.
  2. Ease of use improvements
    • I’ll create reasonable default settings so users don’t have to supply keyword arguments, and write comprehensive documentation for zarrtraj. Both should help drive adoption.
  3. Demonstrated application of the format
    • I’ll make MDAnalysis test data available via an S3 URL to demonstrate how zarrtraj makes it easy to share trajectories.
  4. Random access to trajectory frames and optimization
    • I’ll experiment with blocking and non-blocking IO, and with parallel and serial IO, to find out which zarrtraj design comes closest to the maximum theoretical read speed.
  5. Test coverage of zarr cloud backends
    • By using mock cloud testing frameworks like moto for AWS, I can ensure that zarrtraj works with a variety of backends.
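
As a rough illustration of the benchmarking in priority 1, here is a minimal timing harness for frame-by-frame reads. It is only a sketch, not the actual zarrtraj benchmark: the function name `frames_per_second`, the stand-in data, and the path in the trailing comments are all assumptions made for the example. The harness itself uses only the standard library, so it runs without zarr or dask installed.

```python
import time
from typing import Callable, Sequence


def frames_per_second(read_frame: Callable[[int], object],
                      frame_indices: Sequence[int],
                      repeats: int = 3) -> float:
    """Return the best observed read rate (frames/s) over `repeats` passes."""
    if not frame_indices:
        raise ValueError("frame_indices must not be empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for i in frame_indices:
            read_frame(i)  # result is discarded; only the read cost matters
        best = min(best, time.perf_counter() - start)
    # Guard against a zero elapsed time on very fast reads.
    return len(frame_indices) / max(best, 1e-12)


# Stand-in data (hypothetical): 100 frames of 10 "atoms" each.
fake_positions = [[(float(f), 0.0, 0.0)] * 10 for f in range(100)]
rate = frames_per_second(lambda i: fake_positions[i], range(100))
print(f"{rate:.0f} frames/s")

# With zarr and dask installed, the same harness could be pointed at a real
# array (the path "positions.zarr" is a placeholder, not a zarrtraj default):
#   import zarr
#   import dask.array as da
#   z = zarr.open("positions.zarr", mode="r")
#   frames_per_second(lambda i: z[i], range(z.shape[0]))
#   d = da.from_zarr(z)
#   frames_per_second(lambda i: d[i].compute(), range(d.shape[0]))
```

Taking the best of several passes reduces noise from caching and other processes. The same function can time the zarrtraj reader later on, so that both numbers are measured the same way.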
