AGRIF: Subscript #1 of the array IBUF_SNDS has value

Hello,

I’m running a simulation using a parent grid (884 × 800 × 32 points) and a child grid (702 × 606 × 32 points). All the files were generated with croco_pytools.

The simulation without the AGRIF nesting works without any error and provides the expected output files. But when I use AGRIF, I get this error:

malloc(): unsorted double linked list corrupted

I recompiled with additional flags (-check bounds -check pointers -check uninit -traceback) to get more information, and got this:

forrtl: severe (408): fort: (2): Subscript #1 of the array IBUF_SNDS has value 450 which is greater than the upper bound of 449
  
Image              PC                Routine            Line        Source
croco_test         0000000001069EC6  sub_loop_messpass        1529  MessPass2D_.f
croco_test         0000000001063F12  messpass2d_3pts_t        1035  MessPass2D_.f
croco_test         00000000010BAE05  sub_loop_exchange         869  exchange_.f
croco_test         00000000010B9B6E  exchange_r2d_3pts         534  exchange_.f
croco_test         00000000007BE657  sub_loop_setup_gr         885  setup_grid1_.f
croco_test         00000000007A87F2  setup_grid1_tile_         362  setup_grid1_.f
croco_test         00000000007A39A7  sub_loop_setup_gr         237  setup_grid1_.f
croco_test         00000000007A3791  setup_grid1_               66  setup_grid1_.f
croco_test         00000000004D6593  sub_loop_main_           1133  main_.f
croco_test         00000000004D2369  MAIN__                    208  main_.f
croco_test         000000000040E54D  Unknown               Unknown  Unknown
libc-2.28.so       000015050E5D9D85  __libc_start_main     Unknown  Unknown
croco_test         000000000040E46E  Unknown               Unknown  Unknown

It seems that the error comes from an allocation in MessPass2D.F, around line 52:

sub_X=Lm
size_X=Npts*(sub_X+2*Npts)-1  !7+Npts*sub_X
sub_E=Mm
size_E=Npts*(sub_E+2*Npts)-1  !7+Npts*sub_E

If I increase the values of size_X and size_E and run the simulation again, it works (there is a similar issue with MessPass3D.F, but it can be solved the same way) and I get some output files. However, the grid seems a bit messed up, probably due to the new and wrong allocation of size_X and size_E.

The picture shows the surface temperature in the child grid; the northern boundary and the northeastern corner obviously have some issues.

Compilation

The code is compiled with the Intel compilers on a cluster and run on 32 nodes with MPI. The netCDF libraries are compiled with the same compilers and options.

Question

Any idea about the origin of this issue?
What other tests should I run?

Thanks

Hi,

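I believe you get this error because you are using a 5th-order advection scheme (either WENO5 or UP5)?

If so, the fix might be to change MessPass2D.F so that the buffer sizes are computed with MAXNPTS instead of Npts (in each pair below, the second assignment overrides the first):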

# ifdef AGRIF
         sub_X=Lm
         size_X=Npts*(sub_X+2*Npts)-1  !7+Npts*sub_X
         size_X=MAXNPTS*(sub_X+2*MAXNPTS)-1  !7+Npts*sub_X
         sub_E=Mm
         size_E=Npts*(sub_E+2*Npts)-1  !7+Npts*sub_E
         size_E=MAXNPTS*(sub_E+2*MAXNPTS)-1  !7+Npts*sub_E

and MessPass3D.F:

# ifdef AGRIF
      size_Z=Npts*Npts*(NP1)
      size_Z=MAXNPTS*MAXNPTS*(NP1)
      sub_X=(Lm+NSUB_X-1)/NSUB_X
      size_X=(NP1)*Npts*(sub_X+2*Npts)
      size_X=(NP1)*MAXNPTS*(sub_X+2*MAXNPTS)
      sub_E=(Mm+NSUB_E-1)/NSUB_E
      size_E=(NP1)*Npts*(sub_E+2*Npts)
      size_E=(NP1)*MAXNPTS*(sub_E+2*MAXNPTS)

Let me know if it helps,

Enzo

Hello Enzo,

Thanks a lot. Indeed, I was using WENO5.
I tested with the suggested changes and the error did not appear.

In addition, I noticed that the issue shown in the figure was not related. I am not sure what its origin was, but it is now fixed as well.

Cheers,

Charles

Hi Charles!

Good to know that your problem is solved!

In the AGRIF case, the MPI buffers are currently sized for numerical schemes that need 2 ghost points, not the 3 required by WENO5 or UP5. It is a known issue and will be fixed in the next release.
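
To make the size difference concrete, here is a minimal Python sketch (not CROCO code) that evaluates the buffer-size formulas quoted above for 2 and 3 ghost points. The subdomain length `sub_x = 100` is a hypothetical value, and `MAXNPTS = 3` and `NP1 = 33` (32 levels plus one) are assumptions for illustration; the actual values depend on the CROCO configuration and MPI decomposition.

```python
def size_2d(npts, sub):
    """2D buffer size, following size_X = Npts*(sub_X+2*Npts)-1 from MessPass2D.F."""
    return npts * (sub + 2 * npts) - 1


def size_3d(npts, sub, np1):
    """3D buffer size, following size_X = (NP1)*Npts*(sub_X+2*Npts) from MessPass3D.F."""
    return np1 * npts * (sub + 2 * npts)


NPTS_DEFAULT = 2  # ghost points the AGRIF buffers are sized for, per the thread
MAXNPTS = 3       # assumption: ghost points needed by WENO5/UP5
NP1 = 33          # assumption: 32 vertical levels + 1
sub_x = 100       # hypothetical subdomain length along xi

small_2d = size_2d(NPTS_DEFAULT, sub_x)
large_2d = size_2d(MAXNPTS, sub_x)
small_3d = size_3d(NPTS_DEFAULT, sub_x, NP1)
large_3d = size_3d(MAXNPTS, sub_x, NP1)

print("2D buffer:", small_2d, "->", large_2d)
print("3D buffer:", small_3d, "->", large_3d)
```

With these illustrative numbers, a buffer sized for 2 ghost points is noticeably smaller than what a 3-point exchange fills, which is consistent with the out-of-bounds subscript reported by the bounds checker.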

Good luck with your work,

Enzo

Hi @eleboued @ctroupin1,
First of all, thank you for a nice discussion. I am new here. I suspect there are some problems in the UP5_MASKING case; could you please check it?
I don’t get an error during compilation, but reading the code I feel the following changes are needed. Thanks in advance for your reply.
Y direction →

ELSE
mask1=rmask(i,j+1)*mask0
mask3=rmask(i,j+2)*mask2
ENDIF

Maybe it should be changed to:

            ELSE
              mask2=rmask(i,j+2)*mask0*rmask(i,j-1)
              mask1=rmask(i,j+2)*mask0
              mask3=rmask(i,j+3)*mask2
            ENDIF

X direction →

ELSE
mask1=rmask(i+1,j)*mask0
mask3=rmask(i+2,j)*mask2
ENDIF

Maybe it should be changed to:

            ELSE
              mask2=rmask(i+2,j)*mask0*rmask(i-1,j)
              mask1=rmask(i+2,j)*mask0
              mask3=rmask(i+3,j)*mask2
            ENDIF

Thanks.