Problem with STATION output when running with MPI

asked 2020-07-17 21:00:39 +0200

krh


I have found that an error occurs when saving station data while running with MPI. I'm not sure whether this is an issue with CROCO or specific to the system I am using, but when STATION is defined the program always fails at about the same point (well into the run, so some station data is saved correctly), with the following sent to standard output:

[mpiexec@pn003] control_cb (../../pm/pmiserv/pmiserv_cb.c:864): connection to proxy 1 at host pn004 failed
[mpiexec@pn003] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@pn003] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:520): error waiting for event
[mpiexec@pn003] main (../../ui/mpich/mpiexec.c:1149): process manager error waiting for completion

It appears to me that some part of the process hangs. Note that when STATION is undefined, the code runs fine.

Some additional information: NBQ is defined. The job is not that large (144 processes). The code is built with Intel Fortran and Intel's HPC MPICH distribution. I have not yet tried it with standard Open MPI.

