Extracting time series using xcroco/xgcm from multiple CROCO NetCDF files

Hello,

I am working with CROCO outputs and using the xcroco/xgcm workflow to extract time series for comparison with buoy observations.

My goal is to extract time series of variables such as temperature, salinity, and currents (u, v) at specific buoy locations, possibly including vertical interpolation to a fixed depth (e.g. -3 m), using multiple CROCO NetCDF output files.

What is the recommended or standard approach to handle this efficiently when dealing with long time series?

I am currently facing memory issues when processing multiple files with xarray/xgcm (with my one python script), especially when loading several datasets and performing vertical coordinate calculations and interpolation.

Any guidance or examples of robust workflows would be greatly appreciated.

Thank you for raising this question. I am facing a similar challenge and would appreciate some guidance from the community.

Using xcroco, I have successfully extracted temperature and salinity time series from multiple CROCO NetCDF files and interpolated them to fixed depths. However, I am still struggling with the velocity components (u and v).

What is the recommended workflow for, how should the staggered Arakawa C-grid variables be handled, and what is the proper sequence for horizontal interpolation and subsequent interpolation to fixed depth levels?

Any examples, documentation, or best practices for working with u and v in xcroco would be greatly appreciated.

Thank you.

Hi @Jellyfish and @AsishKsasidharan ,

First all thanks for the feedback :slight_smile: xcroco is still under development and is unstable in some cases.

I did identify memory issues to the XGCM grid when using large dataset with XCROCO and it can be challenging indeed. First of all can you share:

  • Your Python code
  • XCROCO version you use
  • CROCO grid size and number of time steps

The problem arise when using Dask and XGCM with grid_metrics > 1 on large Datasets. Due to variable naming convention change in xcroco, the time dimension name changes and Dask load the entire variables needed for grid_metrics into a single worker ! We could discuss this later if some fixes proposed here do not work.

Brest practice using the open_files() method:

def open_files(

model,

gridname,

filenames,

grid_metrics=2,

drop_variables=[],

chunks={“t”: 1},

remove_ghost_pts=True,

xperiodic=False,

yperiodic=False,

verbose=False,

**prepro_kwargs,

):

“”"

open Netcdf files or zarr archive

input:

    model : instance of the Model class defined in the model.py module

    gridname : path to the grid file

    filenames : path to the Netcdf files or to the zarr archive

    grid_metrics : type of xgcm grid

                   0: no metrics

                   1: horizontal metrics

                   2: horizontal and vertical metrics

    drop_variables : list of variables to drop

    chunks: chunks to override those in the yaml file

    verbose: verbose mode

outputs:

    ds: an xarray dataset

    grid: the associated xgcm grid

Note on performances:

    chunk only time dimension if you use grid_metrics=2

    it's possible to chunk on sigma if you to not use vertical interpolation 

    or other vertical interpolation... 

"""

```

grid_metrics:

A simple fix is to use a lower value for `grid_metrics` (0 or 1). Computations may be a little slower than `grid_metrics = 2` but it will help managing memory.

drop_variables

Not tested yet, but should not hurt. you can provide a list of variables names in the file naming format.

If the problem persist, here are some clues:

  • try using chunks names (more especially time) using the file naming format ex: ‘scum_time’ or ‘ocean_time’ instead of ‘t’)
  • Use the latest xcroco version (only if you do not use the `cross_section` method from `girdop.py`

Keep me updated :slight_smile:

Hi jcollin,

First of all, thank you very much for your reply.

  • I am currently using an older version of xcroco (croco_pytools v2.0.3).
  • The CROCO grid has a horizontal resolution of dx = 1.22 km and dy = 1.78 km, with a grid size of eta_rho = 902 × xi_rho = 902.
  • I was trying to process a one-year simulation (2019) consisting of a total of 332,352 time steps (NTIMES).

In some cases, I tried reducing the number of variables and simplifying the processing, but the job was killed due to memory limitations.

I am still relatively new to this workflow, so please excuse any mistakes.

This was the best try that i had
time_series.py (3.6 KB)

This is a new updated version but i still don’t know if i´m applying correctly the interpolations + rotation of u and v
u_v_rot.py (4.7 KB)

Hi @jcollin,

First of all, thank you very much for your reply and for looking into this issue.

Here are the details of my setup:

  • CROCO_PYTOOLS version: Release 2.0.0

  • Grid configuration: BENGUELA_LR

    • LLm0 = 543

    • MMm0 = 363

    • N = 40 vertical levels

  • Simulation period: 2014-01-01 00:00:00 to 2014-06-30 00:00:00

  • Variables processed: temperature, salinity, and velocity components (u, v)

  • Operation: vertical interpolation to fixed z-levels using gop.interp_regular()

I have attached the draft of the Python code used for interpolating temperature and salinity to regular z-levels, along with the relevant sections of my croco.in file.

While processing multiple hourly output files, I noticed that variables and coordinates such as Cs_rho (and some other vertical coordinate information) are present only in the first output file. When I try to process the files individually, the absence of these coordinates in subsequent files results in errors during interpolation.

Could you please clarify whether this is the default behavior of CROCO output files, where these grid-related variables are written only once, or whether I may have missed a setting in my configuration?

I also have an additional question regarding the velocity fields.

I came across the XOA example for CROCO sections:
https://xoa.readthedocs.io/stable/examples/plot_croco_section.html

From the example, it appears that XOA can handle CROCO vertical coordinates and interpolation to regular z-levels. Would it be possible (or recommended) to use a similar XGCM/XOA-based workflow for interpolating the staggered u and v velocity components onto regular depth levels?

My understanding is that u and v are located on different C-grid positions than tracers (temp, salt), so I am unsure whether they should first be interpolated to rho points (or rotated to eastward/northward velocities) before performing the vertical interpolation. Alternatively, can XGCM/XOA directly handle the staggered velocity fields and interpolate them to regular z-levels correctly?

Any guidance on the recommended workflow for extracting u and v on fixed-depth levels would be greatly appreciated.

Any guidance on the recommended way to handle this when processing files independently with XCROCO would be greatly appreciated.

Thank you again for your help and for the continued development of XCROCO.

crocoin.txt (1.1 KB)

ts.py (3.1 KB)

I used the tuto_xcroco jupyter notebook to adapt my script. Now, it takes in account the rotation of the u and v components. I will try and compared it with buoys
u_v_rot.py (5.0 KB)

@Jellyfish

For memory Issue:

I may suggest that for the buoyancy interpolation as you only use a few grid point, to first find location of interest using the dataset and then from this dataset create a specific grid with vertical coordinates just around this area. using a zoom with 3 or 4 points (keeping rho-u-v dimensions consistent)

By doing so, you will use much less memory.

Regarding velocities rotation:

it all depends whether you want to keep CROCO grid coordinates or project on nortward/eastward velocities. There is actually a function that does that gop.rot_uv()

@AsishKsasidharan

grid related variables depends on user choices at CROCO config creation step. In general there are two cases, either you a croco_grd.nc containing only grid related variables or grid variables are in all files.

In either case providing gridname to io.open_files() calls gridop.add_grid() and add geometry to the dataset. So if the error arise at second file, my guess is that the error comes from another issue. Could you please provide error logs ?

For u,v location, you can take a look at the tutorial notebook or @Jellyfish code, there is a method gridop.to_grid_point() for example to interpolate u at rho-levels:
gop.to_grid_point(ds.u, xgrid, hcoord=’r’, vcoord=’r’)

Concerning XOA, I have never used it but from my understanding it is appropriate to use for CROCO grid manipulation and is probably more robust and easier to use than XCROCO which still under development.

@AsishKsasidharan

For u,v location, you can take a look at the tutorial notebook or @Jellyfish code, there is a method gridop.to_grid_point() for example to interpolate u at rho-levels:
gop.to_grid_point(ds.u, xgrid, hcoord=’r’, vcoord=’r’)

My last reply with the code u_v_rot.py it applies the interpolation of both u anv v at rho_levels and the “manual rotation” of both variables to be nortward/eastward velocities but it doesn’t interpolate again for the native grid like the function gop.rot_uv() - in my case i prefer to maintain the urot and vrot at rho-levels

def rot_uv(u, v, angle, xgrid):
“”"Rotate u,v to lat,lon coordinates

Args:
    u: DataArray
        3D velocity components in XI direction

    v: DataArray
        3D velocity components in ETA direction

    angle: DataArray
        Angle [radians] between XI-axis and the direction to the EAST at RHO-points

    xgrid: xgcm.grid
        grid object associated with u and v

Returns:
    DatArray: rotated velocities, urot/vrot at the horizontal u/v grid point
"""

assert xgrid is not None, "Xgcm grid should be input."
assert isinstance(xgrid, Grid), "xgrid must be `xgcm` grid object."

# save attributes to reinstitute at end
attrsu = u.attrs
attrsv = v.attrs

# change all the variables to the rho grid point if needed
u = to_grid_point(u, xgrid, hcoord="r", vcoord="r")
v = to_grid_point(v, xgrid, hcoord="r", vcoord="r")
angle = to_grid_point(angle, xgrid, hcoord="r", vcoord="r")

cosang = np.cos(angle)
sinang = np.sin(angle)

# Rotate velocities
urot = u * cosang - v * sinang
vrot = u * sinang + v * cosang

# change velocities to their horizontal grid point
urot = to_grid_point(urot, xgrid, hcoord="u", vcoord="r")
vrot = to_grid_point(vrot, xgrid, hcoord="v", vcoord="r")

# add original attributes back in
urot.attrs = {**attrsu, **urot.attrs}
vrot.attrs = {**attrsv, **vrot.attrs}

# try to guess cf coordinates
urot = urot.squeeze().cf.guess_coord_axis()
vrot = vrot.squeeze().cf.guess_coord_axis()

return [urot.rename("urot"), vrot.rename("vrot")]

ncdump.txt (33.4 KB) You could reply with something like:

Thanks for the clarification.

Unfortunately, I don’t have the error logs available anymore, so I can’t provide the full traceback.

ncdump.txt (33.4 KB)

I have attached the ncdump output for reference.

The issue I’m seeing is that whenever I try to interpolate using the xcroco code provided in the notebook:

I get an error indicating that Cs_rho is missing from the dataset. From inspecting the files, it appears that some of the vertical coordinate variables (Cs_rho and related grid/vertical coordinate variables) are not present in all files.

As a workaround, I am currently using the first file as a reference source for Cs_rho and the other grid-coordinate variables, and then applying the interpolation to the remaining files. This seems to work, but I’m not sure whether this is the recommended approach.

Could you please take a look at the attached ncdump and let me know if I’m handling this correctly, or if there is a better way to make these variables available across all files during interpolation?

Kindly suggest me better way to handle this.

When we want to interpolate a variable onto a targeted depth array (for example, specific depths like these, 0,5,10,…6000), which function should be used in this case? interp_regular or isoslice?

From my understanding, it isoslice seems to be intended for interpolation to specific depth levels, while it interp_regular is used for mapping onto a regularly spaced vertical grid. \

Any clarification would be appreciated.

Thanks for your help.

@AsishKsasidharan
I personally use grid_metrics=1 on large dataset to avoid computing the whole 4D coordinates and keeping my memory usage low.

Also if memory crashes when using Dask use more threads per workers and less workers. Or even consider not using Dask if performances is not a key limitation.

For vertical interpolation chose gop.isoslice()

Here is an example (using latest code version, there may be a few API changes for grid and dataset creation) grid is about (20 x 80 x 1600 x 2000):

# setting up Dask 
n_workers = 8
threads_per_worker = 6

cluster = LocalCluster(n_workers=n_workers, threads_per_worker=2)
client = Client(cluster)

grid_metrics = 1
chunks = {'t': 1}

# dataset
drop_variables = ['AKv', 'Akt', 'tke', 'gls', 'Lscale']
ds = io.open_files(
    croco,
    gridname,
    filenames,
    grid_metrics=grid_metrics,
    drop_variables=drop_variables,
    chunks=chunks,
)

# grid without vertical coordinates 
grid = gop.fast_xgcm_grid(ds, croco, grid_metrics=1)

# ------------------
depth = -10
ds1 = ds

# depths at rho-points
z_r = gop.get_z(croco, ds=ds1, z_sfc=ds1.z_sfc, xgrid=grid, hgrid="r")

# vertical interpolation
z_temp = gop.isoslice(ds1.temp, depth, grid, target_data=z_r, axis="z")

cplt.plotfig(z_temp.isel(t=0))

Regarding interp_regular()I believe this is to interpolate from horizontal curvilinear grid to regular cartesian grid.

I hope this helps

1 Like

Thank you very much for the fruitful discussion and for taking the time to share your insights. My main concern regarding the interpolation of the u and v velocity components remains unresolved. I hope this capability can be further developed and incorporated into Crocopy in the future. Thank you again for your support and assistance. Thank you croco team.

Based in the tutorial xcroco, you interpolate the u component with z_u = gop.get_z(croco, ds=ds1, z_sfc=ds1.z_sfc, xgrid=grid, hgrid="u") + uslice = gop.isoslice(ds1.xcur, depth, grid, target_data=z_u, axis="z") and v with z_v = gop.get_z(croco, ds=ds1, z_sfc=ds1.z_sfc, hgrid="v") + vslice = gop.slices(croco, ds1.ycur, z_v, ds=ds1, depth=depth) but for the points you want. In the end, you apply the def rot_uv() to rotate based on your grid file.

@jcollin @Jellyfish Are xcur and ycur intended to replace the original u and v Velocity fields?

Those names are the ones that the croco_native nameplate changes for the u and v that are provided from the outputs.

elif name == “croco_native”:
self.rename_vars = {
# surface wind stress
“sustr”: “xtau_sfc_u”, # x-wind stress component
“svstr”: “ytau_sfc_v”, # y-wind stress component
“wstr”: “tau_sfc”, # wind stress module
# surface fluxes
“radsw”: “sw_sfc_down”, # surface_downward_shortwave_flux
“shflx_rlw”: “lw_sfc_down”, # surface_downward_longwave_flux
“shflux”: “hf_net_sfc_down”, # surface_downward_shortwave_flux
“shflx_lat”: “lh_sfc_down”, # surface_downward_latent_heat_flux
“shflx_sen”: “sh_sfc_down”, # surface_downward_sensible_heat_flux
“swflux”: “ep_flux_sfc_down”, # surface_downward_freshwater_flux (E-P)
# ocean surface
“zeta”: “z_sfc”, # sea_surface_height
“ssh”: “z_sfc”, # sea_surface_height
# currents
“u”: “xcur”, # x-current component
“v”: “ycur”, # y-current component
“w”: “zcur”, # z-current component
“omega”: “zcur_sc”, # z-current component in sigma coordinate
“ubar”: “xcur_btrope”, # x-barotropic current component
“vbar”: “ycur_btrope”, # y-barotropic current component
# MLD
“hbls”: “mld_turb”, # ocean_turbulent_mixed_layer_depth
“hbl”: “mld_turb”, # ocean_turbulent_mixed_layer_depth
# tracers et al.
“temp”: “temp”, # ocean potential temperature
“salt”: “salt”, # ocean salinity
# other diag variables
“Akt”: “zdiff_tra_w”, # ocean_vertical_tracer_diffusivity_at_w_points
“Akv”: “zdiff_mtm_w”, # ocean_vertical_momentum_diffusivity
# bottom layer
“bostr”: “tau_bot”, # ocean bottom stress
“hbbl”: “bbl”, # bottom boundary layer
# grid et al.
“h”: “h”, # terrain height or bathy (positive upward, static)
“lat_rho”: “lat”, # latitude at mass points
“lon_rho”: “lon”, # longitude at mass points
“lat_u”: “lat_u”, # latitude at u points
“lon_u”: “lon_u”, # longitude at u points
“lat_v”: “lat_v”, # latitude at v points
“lon_v”: “lon_v”, # longitude at v points
“mask_rho”: “mask”, # land mask
“f”: “f”, # Coriolis parameter (1/seconds) at RHO-points
“VertCoordType”: “vtransform”, # new vertical S-coordinates
“Cs_rho”: “Cs_r”, # S-coordinates at mass points
# time
“time”: “t”, # time
# dimensions
“xi_rho”: “x”, # x dimension at mass points
“eta_rho”: “y”, # y dimension at mass points
“x_rho”: “x”, # x dimension at mass points
“y_rho”: “y”, # x dimension at mass points
“xi_u”: “x_u”, # x dimension at u points
“eta_u”: “y”, # y dimension at u points
“x_u”: “x_u”, # x dimension at u points
“y_u”: “y”, # y dimension at u points
“xi_v”: “x”, # x dimension at v points
“eta_v”: “y_v”, # y dimension at v points
“x_v”: “x”, # x dimension at v points
“y_v”: “y_v”, # y dimension at v points
“x_w”: “x”, # x dimension at mass points
“y_w”: “y”, # y dimension at mass points
“s_rho”: “s”, # sigma dimension at rho level
“s_w”: “s_w”, # sigma dimension at w level
}

Where in the left are the original names and on the right new ones

1 Like

Thank you very much.

regards

Asish K Sasidharan

Senior Research Fellow, INCOIS, Hyderabad