Unexpected behaviour when may_day_flag is raised

Hi,

I’m not sure if this is the right place to report this. If there is a better place, please let me know.

When my CROCO implementation (v2.1.3 with MPI) encounters a meteorology file with a missing time step, get_bulk raises may_day_flag, but ierr is 0, so MPI_Abort is not called and the program continues to run. This is especially problematic for a forecasting application, where bad files can appear from time to time and the program should not continue to run after an error. Adding a line like

if (may_day_flag.ne.0.and.ierr.eq.0) ierr=14

after the 100 label solves the issue for me.
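
To illustrate the logic outside CROCO, here is a minimal stand-alone sketch. It is not the actual CROCO code: the program name, the flag value and the print statements are made up for illustration, and only the promoted error code 14 and the 100 label come from the fix above.

```fortran
program may_day_demo
  implicit none
  integer :: may_day_flag, ierr

  may_day_flag = 1   ! hypothetical value: a reader flagged a missing time step
  ierr = 0           ! no error code was set along the way

100 continue
  ! Proposed line: promote the flag to an error code (14, as in the fix above)
  if (may_day_flag /= 0 .and. ierr == 0) ierr = 14

  if (ierr /= 0) then
    ! In CROCO this is the branch where MPI_Abort would be called
    print *, 'abort branch taken, ierr =', ierr
    error stop 1
  end if

  print *, 'run continues'
end program may_day_demo
```

Without the promoted line, ierr stays 0 and the sketch falls through to "run continues", which mirrors the behaviour I am seeing.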

Best regards,

Ron

Hi,
I tried to understand your problem, and I think this is what you are pointing at.
MessPass2D.F, MessPass3D.F, etc. use MPI_Send. If MPI_Isend or MPI_Issend were used instead, would that cause problems in MPI, given that much of the CROCO code uses MPI_Irecv rather than MPI_Recv? In the same way, “ierr” is used in many places; might “nerr” be needed in zmask?
I believe that with modern compilers, using may_day_flag may make things worse. What is your take on it?
I will be happy to hear from you. Please share your log and I will see what I can do for you.
Thank you.
Best regards.

Thank you for your reply,

I think there are several issues with may_day_flag. I was referring to the fact that the MPI block in main.F at lines 1022-1051 only checks ierr and not may_day_flag, but it can be reached with ierr = 0 and may_day_flag =/= 0, which will cause the program to get stuck in the MPI_Barrier.

The problem with MessPass, as far as I can see, is that MPI_Wait can crash because may_day_flag was raised somewhere else. In this case, only threads with may_day_flag =/= 0 leave step.F at line 253, while the other threads continue into MessPass. Then MessPass crashes because it needs all threads to pass through it. It would be better if the program terminated earlier or if may_day_flag was broadcast when raised. However, I think the main issue is that not all places that raise may_day_flag print out the error (e.g. diag.F lines 356-366), so a user can get an MPI crash with no explanation of what went wrong.
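
As a rough sketch of what sharing the flag could look like (this is not CROCO code; the program name, the flag value and the choice of MPI_Allreduce with MPI_MAX are my own assumptions), every rank could agree on the largest may_day_flag before entering the exchange, so that all ranks leave together:

```fortran
program share_may_day
  use mpi
  implicit none
  integer :: ierr, rank, may_day_flag, global_flag

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  may_day_flag = 0
  if (rank == 0) may_day_flag = 1   ! hypothetical: only rank 0 hits the problem

  ! Every rank receives the maximum flag value found on any rank
  call MPI_Allreduce(may_day_flag, global_flag, 1, MPI_INTEGER, &
                     MPI_MAX, MPI_COMM_WORLD, ierr)

  if (global_flag /= 0) then
    ! All ranks take this branch together, so none is left waiting
    if (rank == 0) print *, 'may_day_flag raised, value =', global_flag
    call MPI_Finalize(ierr)
    stop 1
  end if

  ! ... the halo exchange (MessPass) would go here ...

  call MPI_Finalize(ierr)
end program share_may_day
```

The price is one extra collective call per check, so it would probably only make sense at a few well-chosen points in the time step.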

I hope this is clearer.

Best regards,

Ron


Hi Ron,
“flush” is used in the OCEAN directory, in a subfolder. What is that?
Best regards

I am not sure what you mean. If you mean “call flush” in diag.F line 381, then it is a Fortran subroutine that makes sure buffered data is written out (see FLUSH in The GNU Fortran Compiler documentation).
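
A tiny stand-alone example, unrelated to the actual diag.F code (the program name and the message are made up). It uses the standard Fortran 2003 flush statement; the “call flush(unit)” form is the GNU extension that does the same job:

```fortran
program flush_demo
  use, intrinsic :: iso_fortran_env, only: output_unit
  implicit none

  write(output_unit, '(a)') 'diagnostic message before a possible abort'

  ! Force the buffered text out now, so it is not lost if the run dies later
  flush(output_unit)
end program flush_demo
```

This matters for the error messages discussed above: if the output is still sitting in a buffer when MPI crashes, the message may never reach the log.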
