Model hangs when writing netCDFs on HPC service

Hi there,

We are running our CROCO model on MeluXina in Luxembourg, and when we try to use more than 16 nodes (128 cores per node), the code hangs while writing netCDF files.

We suspect it could be something hardcoded in how CROCO handles I/O, or something controlled by a setting in the Slurm script.

Any advice would be greatly appreciated.

Thanks

@jvmcgovern
If this is a file-locking issue, try setting the following before launching the model:
export HDF5_USE_FILE_LOCKING=FALSE
After that, you can look at file striping as an optimization.
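As a rough sketch, the two suggestions above might sit in a Slurm batch script like this. The node count, the output directory name (croco_out), the stripe count of 8, and the launch line are all assumptions for illustration, and the striping step only applies if the output directory is on a Lustre filesystem where the lfs tool is available.

```shell
#!/bin/bash
#SBATCH --nodes=32               # assumption: any node count above the 16 where the hang appears
#SBATCH --ntasks-per-node=128    # 128 cores per node, as in the original post

# Disable HDF5 file locking; this must be set before the model starts.
export HDF5_USE_FILE_LOCKING=FALSE

# Hypothetical output directory; replace with wherever the model writes its netCDF files.
OUTDIR="${OUTDIR:-./croco_out}"
mkdir -p "$OUTDIR"

# Optional optimization: stripe new files in OUTDIR across several Lustre OSTs.
# Skipped automatically when lfs is not installed (i.e. not a Lustre client).
if command -v lfs >/dev/null 2>&1; then
    lfs setstripe -c 8 "$OUTDIR" || echo "could not set striping on $OUTDIR" >&2
fi

echo "HDF5_USE_FILE_LOCKING=$HDF5_USE_FILE_LOCKING, output in $OUTDIR"

# Hypothetical launch line; uncomment and adapt to your executable and input file.
# srun ./croco croco.in
```

Striping set on a directory only affects files created afterwards, so it would need to be applied before the run starts writing output.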
Best,

Thanks, Subhadeep. Have you run into this issue before in a similar setup?

@jvmcgovern I never ever look at my CPU; I may not want that.
Maybe I want MPI-IO? It's great.
Uphold your problem.
Love,