Running 4DVar application in JEDI (on Discover)


Loading Modules

To load Spack-Stack 1.9.0 modules with the GNU compiler on Discover, run:

#!/bin/bash

echo "Loading EWOK-SKYLAB Environment Using Spack-Stack 1.9.0"

# load modules
module purge
module use /discover/swdev/gmao_SIteam/modulefiles-SLES15
module use /discover/swdev/jcsda/spack-stack/scu17/modulefiles

module use /gpfsm/dswdev/jcsda/spack-stack/scu17/spack-stack-1.9.0/envs/ue-gcc-12.3.0/install/modulefiles/Core
module load stack-gcc/12.3.0
module load stack-openmpi/4.1.6
module load stack-python/3.11.7

module load singularity

# Discover compiler modules set the environment variable COMPILER; it must be reset for R2D2
export COMPILER=gnu

module load jedi-fv3-env
module load ewok-env

To load Spack-Stack 1.7.0 modules with the GNU compiler on Discover, run:

#!/bin/bash

echo "Loading EWOK-SKYLAB Environment Using Spack-Stack 1.7.0 GNU SCU17"

# load modules
module purge
module use /discover/swdev/gmao_SIteam/modulefiles-SLES15
module use /discover/swdev/jcsda/spack-stack/scu17/modulefiles
module load ecflow/5.11.4

module use /gpfsm/dswdev/jcsda/spack-stack/scu17/spack-stack-1.7.0/envs/ue-gcc-12.3.0/install/modulefiles/Core
module load stack-gcc/12.3.0
module load stack-openmpi/4.1.6
module load stack-python/3.10.13

# Discover compiler modules set the environment variable COMPILER; it must be reset for R2D2
export COMPILER=gnu

module load jedi-fv3-env
module load ewok-env
module load sp

# To build more expensive fv3-jedi (tier 2) tests
#export FV3JEDI_TEST_TIER=2

Check here for the latest Spack-Stack modules on Discover. Note that this is a JCSDA private repository.

To build the jedi-bundle, follow the instructions here.

For a (slightly) faster build, comment out the ecbuild_bundle lines for these repos in jedi-bundle/CMakeLists.txt: MOM6, soca, MPAS-Model, mpas-jedi, and coupling.

YAML Structure

The JEDI code is under active development, and some of the YAML keys used in the examples may change over time; as a result, this document may become outdated. Users should understand both the structure of the YAML files and the meaning of the keys, and should also consult the latest ctest examples in the JEDI repositories.

This section provides an overview of the different components within the 4DVar YAML files.

The directory /discover/nobackup/mabdiosk/garage/applications/var-app includes the YAML files for running different 4DVar cases, the input files, and a run script (run_4dvar.sh).

4dvar_geos-cf_fv3lm_c24_p12.yaml is an example 4DVar experiment at C24 resolution.

cost function:
  cost type: 4D-Var
  time window:
    begin: 2021-08-05T03:00:00Z #always beginning of the window
    length: PT6H

In this example, the assimilation window is from 2021-08-05 03Z to 2021-08-05 09Z.

The beginning of the time window does not depend on the DA method (cost type) and is always set to the beginning of the assimilation window.

The length of the window is typically set to 6 hours (PT6H).


  model:
    name: FV3LM
    namelist filename: input/geometry_input/input_geos_c24_p12.nml
    tstep: PT15M
    filetype: cube sphere history
    lm_do_dyn: 1
    lm_do_trb: 0
    lm_do_mst: 0
    model variables: &modelvars [ud,vd,ua,va,T,DELP,SPHU,qi,ql,NO2]

In 4DVar, you need to compute (or pre-compute) the model state at every tstep within the assimilation window. Ideally, you would run the full model and generate the model state at every tstep, but this can be costly. To reduce cost, you can (1) compute the model state using a simplified model such as FV3LM, or (2) read a pre-computed model state using PSEUDO. In this example, we use FV3LM. See 4dvar_geos-cf_pseudo.yaml for a PSEUDO example.
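
For reference, a PSEUDO model section might look like the sketch below, which reads pre-computed states from files at every tstep. The datapath and filename here are illustrative placeholders, and the exact file-specification keys can vary between JEDI versions; see 4dvar_geos-cf_pseudo.yaml for a working configuration.

  model:
    name: PSEUDO
    filetype: cube sphere history
    provider: geos
    datapath: input/fc/geoscf_c24          # illustrative path to pre-computed states
    filename: forecast.%yyyy%mm%ddT%hh%MM%ssZ.nc4
    tstep: PT1H                            # a state file must exist at every tstep
    model variables: [ud,vd,ua,va,T,DELP,SPHU,qi,ql,NO2]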

The FV3LM model requires ud, vd, ua, va, T, DELP, SPHU, qi, and ql to be in the model variables list (and available in the background files). Trace gas and aerosol variables can be added to this list; they are treated as tracers (only transported, similar to moisture).

The linear turbulence scheme (lm_do_trb: 0) and linear moist physics (lm_do_mst: 0) are turned off.


  analysis variables: [eastward_wind,
  northward_wind,
  air_temperature,
  air_pressure_thickness,
  specific_humidity,
  cloud_liquid_ice,
  cloud_liquid_water,
  volume_mixing_ratio_of_no2]

The list of variables you want to assimilate; these will be available in the analysis output. Note that the first seven variables (everything except volume_mixing_ratio_of_no2) are required to be in the list.


  geometry:
    fms initialization:
      namelist filename: input/geometry_input/fmsmpp.nml
    akbk: input/geometry_input/akbk72.nc4
    npx: 25
    npy: 25
    npz: 72
    layout: [1,2]
    field metadata override: input/geometry_input/geos_cf_ewok.yaml

This geometry section is repeated a few times in this YAML file, and a similar block appears in almost all JEDI YAML files. Here, you define the geometry (grid setup) of your background (input) files. In this example, the background files are at C24 resolution (npx: 25, npy: 25) with 72 vertical levels (npz: 72).

layout is another important setting that depends on the number of processors used to run the application. The layout applies to each of the 6 cube-sphere tiles: with a layout of [1,2] you must use 1 x 2 x 6 = 12 processors, and with [2,2] you would need 2 x 2 x 6 = 24 processors.
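
As a quick check, the total MPI task count follows directly from the per-tile layout times the 6 tiles:

    layout: [1,2]    # 1 x 2 ranks per tile x 6 tiles = 12 MPI tasks
    # layout: [2,2]  # would require 2 x 2 x 6 = 24 MPI tasks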

field metadata override points to a file that maps the variables in the background files to JEDI. In this file, long name is the variable name in fv3-jedi, and io name is the variable name in the background (input) files.
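
For illustration, an entry in geos_cf_ewok.yaml mapping the NO2 variable might look like the following sketch (real metadata files can carry additional keys, such as units):

field metadata:
- long name: volume_mixing_ratio_of_no2   # variable name in fv3-jedi
  io name: NO2                            # variable name in the background files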


  background:
    datetime: 2021-08-05T03:00:00Z  #background beginning of the window
    filetype: cube sphere history
    datapath: input/bg/geoscf_c24_ewok
    filename: GCv14.0_GCMv1.17_c24.geoscf_jedi.%yyyy%mm%ddT%hh%MM%ssZ.nc4
    state variables: [ud,vd,ua,va,T,SPHU,qi,ql,DELP,NO2,phis]

In 4DVar, the background must be available at the beginning of the assimilation window.

state variables is the list of variables that JEDI reads from the background files and places in the “state”. Anything listed under model variables must also appear under state variables.


  background error:
    covariance model: SABER
    saber central block:
      saber block name: ID

In this example, for simplicity, we use an identity matrix for the background error covariance.
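
In a scientific experiment, the identity block would be replaced with a real covariance model. For example, a SABER BUMP_NICAS central block could be used instead; the sketch below names the block only as an illustration and omits the additional BUMP_NICAS configuration (pre-computed correlation/localization inputs) that a real setup requires.

  background error:
    covariance model: SABER
    saber central block:
      saber block name: BUMP_NICAS
      # additional BUMP_NICAS settings (e.g., correlation files) go here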


  observations:
    observers:
    - obs space:
        name: NO2
        obsdatain:
          engine:
            type: H5File
            obsfile: input/obs/obs.tropomi_s5p_no2_tropo.2021-08-05T060000Z.nc4
        obsdataout:
          engine:
            type: H5File
            obsfile: output/fb.4dvar.c24.tropomi_s5p_no2_tropo.20210805T060000Z.nc
        simulated variables: [nitrogendioxideColumn]
      obs operator:
        name: ColumnRetrieval
        nlayers_retrieval: 34
        tracer variables: [volume_mixing_ratio_of_no2]
        isApriori: false
        isAveragingKernel: true
        stretchVertices: topbottom #options: top, bottom, topbottom, none
      obs error:
        covariance model: diagonal
      get values:
        time interpolation: linear

The observations and the observation operator are specified here. Note that obsfile under obsdatain points to a TROPOMI NO2 observation file. This file includes measurements from 03Z to 09Z, spanning our assimilation window.

The output file specified under obsdataout is the feedback file. It includes the observation values and the model (background) values at the observation locations/times, i.e., the hofx0 values. hofx1 contains the analysis values (after one outer-loop iteration) at the observation locations/times. oman is (observation - analysis) and ombg is (observation - background).


final:
  diagnostics:
    departures: oman
  analysis to latlon:
    local interpolator type: oops unstructured grid interpolator
    resolution in degrees: 15.0  # low resolution for testing
    variables to output: [volume_mixing_ratio_of_no2]
    #pressure levels in hPa: [500]
    model levels: [71]
    #bottom model level: true
    frequency: PT3H
    datapath: output
    exp: 4dvar.c24
    type: an
    
output:
  filetype: cube sphere history
  provider: geos
  datapath: output/
  filename: ana.4dvar.c24.%yyyy%mm%dd_%hh%MM%ssz.nc4
  first: PT0H
  frequency: PT6H

It is possible to write the analysis (and increment) output on a lat/lon grid at specific model or pressure levels by setting analysis to latlon under the final section. The lat/lon filename takes its prefix from exp and ends with latlon.modelLevels.nc; in this example, the output filename will be 4dvar.c24.an.*Z.latlon.modelLevels.nc.

The frequency of the analysis output can be set as low as the tstep under model. A frequency of PT6H means that two analysis files are generated: one at the beginning of the window and one at beginning + 6 h (the end of the window).
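
For instance, keeping first: PT0H and lowering the frequency to PT3H (still a multiple of the PT15M tstep) would produce three analysis files for this window, as in this sketch:

output:
  first: PT0H
  frequency: PT3H   # files at 03Z, 06Z, and 09Z for the 03Z-09Z window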


variational:
  minimizer:
    algorithm: DRPCG
  iterations:
  - ninner: 2
    gradient norm reduction: 1e-10
    test: on
    geometry:
      akbk: input/geometry_input/akbk72.nc4
      npx: 25
      npy: 25
      npz: 72
      layout: [1,2]
      field metadata override: input/geometry_input/geos_cf_ewok.yaml
    diagnostics:
      departures: ombg

    linear model:
      name: FV3JEDITLM
      namelist filename: input/geometry_input/input_geos_c24_p12.nml
      linear model namelist filename: input/geometry_input/inputpert_4dvar.nml
      tstep: PT15M
      tlm variables: *modelvars
      lm_do_dyn: 1
      lm_do_trb: 0
      lm_do_mst: 0
      trajectory:
        model variables: *modelvars

This section sets up the minimizer for the 4DVar experiment.

ninner is the number of iterations in the inner loop. For testing, it is set to 2; in scientific experiments, it is usually set to a larger number such as 100. gradient norm reduction is the convergence threshold.

In JEDI, you can run the minimizer at a different (coarser) resolution to reduce computational cost. Here it is set to the same resolution as the analysis.
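
For example, a two-outer-loop setup that minimizes on a coarser C12 grid first could look like the sketch below. The C12 values (npx: 13, npy: 13) are illustrative and require matching akbk, namelist, and metadata inputs; the linear model block (one per outer loop, as in the example above) is omitted for brevity.

variational:
  minimizer:
    algorithm: DRPCG
  iterations:
  - ninner: 50
    gradient norm reduction: 1e-10
    geometry:                      # coarser C12 grid for the first outer loop
      akbk: input/geometry_input/akbk72.nc4
      npx: 13
      npy: 13
      npz: 72
      layout: [1,2]
      field metadata override: input/geometry_input/geos_cf_ewok.yaml
  - ninner: 50
    gradient norm reduction: 1e-10
    geometry:                      # full C24 grid for the second outer loop
      akbk: input/geometry_input/akbk72.nc4
      npx: 25
      npy: 25
      npz: 72
      layout: [1,2]
      field metadata override: input/geometry_input/geos_cf_ewok.yaml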

FV3JEDITLM is the tangent linear model (and its adjoint) used for the iterative minimization of the cost function. namelist filename changes with model resolution and layout; make sure that what is specified in this file matches the 4DVar YAML.

The tstep for FV3JEDITLM cannot be smaller than the tstep of FV3LM (or PSEUDO): model states must be available at every step of FV3JEDITLM.
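
In other words, with the trajectory computed every 15 minutes as in this example, a 30-minute TLM step works because every TLM step coincides with an available model state, while a 10-minute step would not:

  model:
    tstep: PT15M          # trajectory states available every 15 minutes

    linear model:
      tstep: PT30M        # OK: not smaller than (and a multiple of) the model tstep
      # tstep: PT10M      # not OK: no trajectory state at the 10-minute marks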

In this example, tlm variables and model variables under trajectory are the same as model variables under FV3LM.

To run this application, log into a compute node and run:

export JEDIBUILD=/discover/nobackup/mabdiosk/jedi-bundle/build-gnu-spack-1.7.0/bin/
export WORKDIR=/gpfsm/dnb33/mabdiosk/garage/applications/var-app

/discover/swdev/gmao_SIteam/MPI/openmpi/4.1.6-SLES15/gcc-12.3.0/bin/mpiexec -n 12 $JEDIBUILD/fv3jedi_var.x $WORKDIR/4dvar_geos-cf_fv3lm_c24_p12.yaml

The JEDI executable for running variational applications (3DVar or 4DVar) is fv3jedi_var.x.

Note that here we are requesting 12 processors, consistent with the layout: [1,2] geometry (1 x 2 x 6 = 12).