TCCON QC Configuration

There are two configuration files that TCCON QC uses when making plots: the limits file and the primary plots configuration file. The limits file configures which axis limits to use for different variables. The plots configuration file controls which plots to make and how to style them.

Both files use TOML format. TOML has some similarities to the INI format often used for Linux configuration files, though TOML is more flexible.

Defaults for both files can be found in the inputs directory of the TCCON QC source code. You may edit these directly, or copy them and use the --config and --limits command line arguments to tell the QC plotting program where to find your copies.

Primary file

The primary configuration file is broken down into several main parts: variables, image postprocessing, styles, and plots.

Variables

The [variables] section of the configuration allows you to define named values that are substituted wherever they are referenced elsewhere in the configuration. For example, if my configuration file included:

[variables]
static_ref_file = "/data/tccon/static/standard_network_data.nc"

then I could have a plots subsection like:

[[plots]]
kind = "timeseries-2panel+violin"
name = "XCO2 timeseries"
yvar = "xco2"
yerror_var = "xco2_error"
violin_data_file = "$static_ref_file"

The "$static_ref_file" in the [[plots]] section will be replaced with “/data/tccon/static/standard_network_data.nc”. In other words, QC plots interprets the plotting section as:

[[plots]]
kind = "timeseries-2panel+violin"
name = "XCO2 timeseries"
yvar = "xco2"
yerror_var = "xco2_error"
violin_data_file = "/data/tccon/static/standard_network_data.nc"

Substitutions obey the following rules:

  • Variables must be referenced with a leading dollar sign. Variable names may contain letters, numbers, and/or underscores, but may not start with a number.

  • If the variable makes up the entire value, as in the above example, type is preserved. That is, if you have a configuration file such as:

    [variables]
    custom_width = 16
    custom_height = 8
    
    [[plots]]
    kind = "flag-analysis"
    width = "$custom_width"
    height = "$custom_height"
    

    then width and height would be integers. This is important as some of the inputs are expected to be actual numeric or boolean types.

  • If the variable is inserted in the middle of a longer string, it is expanded using Bash-like rules: a variable reference begins with a dollar sign and ends at the first character that is not a letter, number, or underscore. If the reference must directly abut a letter, number, or underscore, wrap the variable name in curly braces to mark where it ends. For example, you could do this:

    [variables]
    ref_path = "/data/tccon/static"
    site = "oc"
    
    [[plots]]
    kind = "timeseries+violin"
    yvar = "xluft"
    violin_data_file = "$ref_path/${site}_static.nc"
    

    In this example, $ref_path does not need curly braces because the next character (/) cannot be part of a variable name. $site does need curly braces; if this were written as $site_static.nc, it would look for a variable named site_static. Any case where the value on the right-hand side is not exactly one variable reference, with no extra characters, will always result in a string. That is, something like:

    [variables]
    custom_width = 16
    
    [[plots]]
    ...
    width = "$custom_width "  # note the extra space at the end!
    

    will cause width for this plot to be a string, not an integer.

Note

When writing references to variables, they must be quoted as shown above. TOML syntax does not allow dollar signs in bare strings.
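To make the substitution rules above concrete, here is a minimal Python sketch of how they could be implemented. It is not the TCCON QC source. The function names substitute and expand are hypothetical. It also assumes that only a bare $name (not ${name}) counts as a whole-value reference that preserves type, because the text does not say how ${name} alone is treated.

```python
import re

# Hypothetical re-implementation of the [variables] substitution rules;
# not taken from the TCCON QC source code.
_NAME = r"[A-Za-z_][A-Za-z0-9_]*"
# Assumption: only a bare "$name" counts as a whole-value reference.
_WHOLE = re.compile(r"\$(" + _NAME + r")")
# Inline references: "${name}" or "$name" (the name ends at the first
# character that is not a letter, number, or underscore).
_INLINE = re.compile(r"\$(?:\{(" + _NAME + r")\}|(" + _NAME + r"))")


def substitute(value, variables):
    """Expand variable references in one configuration value."""
    if not isinstance(value, str):
        return value
    whole = _WHOLE.fullmatch(value)
    if whole:
        # The value is exactly one reference: keep the variable's type.
        return variables[whole.group(1)]
    # Otherwise expand each reference inline; the result is always a string.
    return _INLINE.sub(lambda m: str(variables[m.group(1) or m.group(2)]), value)


def expand(node, variables):
    """Apply substitute() recursively to nested tables and arrays."""
    if isinstance(node, dict):
        return {key: expand(val, variables) for key, val in node.items()}
    if isinstance(node, list):
        return [expand(val, variables) for val in node]
    return substitute(node, variables)
```

With variables = {"custom_width": 16, "ref_path": "/data/tccon/static", "site": "oc"}, expanding a plot that contains width = "$custom_width" yields the integer 16. The string "$ref_path/${site}_static.nc" becomes "/data/tccon/static/oc_static.nc". In this sketch an undefined name such as site_static raises KeyError; the real program's handling of undefined variables is not described here.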

Image postprocessing

This section contains options that pertain to the final conversion of the individual figures into combined PDFs. It begins with the [image_postprocessing] header. Every option has a default, so any of them may be omitted (with one caveat regarding font_file). Options are:

  • disable_info (default = false): set this to true to skip writing plot information (plot number, name, and source file) in the upper left corner of each plot. Since writing that information requires font_file to point to a valid TrueType font file, setting disable_info = true is a workaround if you cannot find a TrueType font on your computer.

  • font_file (default = “LiberationSans-Regular.ttf”): TrueType font file to use when writing the plot number, name, and input file in the top left of each page. This is done using Python’s Pillow library for image manipulation. Pillow searches common directories for TrueType files, so in many cases you only need to give the file name, not a full path.

Note

If LiberationSans-Regular.ttf is not available on your system, you will need to change this option to a valid TrueType font file or set disable_info to true. Otherwise the QC plots program will crash when it reaches the step that writes the plot information. On Linux, fonts can usually be found under /usr/share/fonts.

  • font_size (default = 30): size of the font used to write the plot number, name, etc. in the upper left corner of each page.

  • bookmark_all (default = None): this controls whether each page in the output PDF automatically receives a bookmark. Setting this to true or false will turn that behavior on or off, respectively. However, the default when this is not specified is to check whether any of the plots have a value for their individual bookmark properties. If not, QC plots behaves as if bookmark_all is true (and so makes a bookmark for every plot); if so, then QC plots behaves as if bookmark_all is false (only making bookmarks for plots that have their individual bookmark properties set).
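The interaction between bookmark_all and the per-plot bookmark option (described under Common optional keys) can be summarized in a short Python sketch. It illustrates the rules as written in this document and is not the program's code. The function names and the fallback_name parameter, which stands in for the intermediate image file name, are hypothetical. Plots are represented as plain dictionaries parsed from the [[plots]] tables.

```python
def resolve_bookmark_all(plots, bookmark_all=None):
    """Decide whether every page gets a bookmark."""
    if bookmark_all is not None:
        # An explicit true/false in [image_postprocessing] wins.
        return bookmark_all
    # Unset: bookmark every page only if no plot sets its own bookmark.
    return not any(plot.get("bookmark") is not None for plot in plots)


def bookmark_title(plot, fallback_name, bookmark_all):
    """Return the bookmark text for one plot, or None for no bookmark."""
    name = plot.get("name", fallback_name)
    bookmark = plot.get("bookmark")
    if isinstance(bookmark, str):
        return bookmark  # an explicit bookmark title
    if bookmark is True or bookmark_all:
        return name      # fall back on name (or the image file name)
    return None
```

In this sketch, setting bookmark on even one plot switches the unset bookmark_all to false. Only plots with their own bookmark are then bookmarked.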

Plots

This section is the meat of the configuration, as it specifies which plots to make and in what order. A very short example is:

[[plots]]
kind = "flag-analysis"

[[plots]]
kind = "timeseries"
yvar = "dip"

[[plots]]
kind = "timeseries"
yvar = "fvsi"

Each plot begins with [[plots]]. Within each plot subsection, there are one or more key-value pairs such as kind = "timeseries". Every plot subsection must include the kind key, as this determines what type of plot to make. Different plots have different sets of options, which will be covered in the Plot types section.

In this example, we have two types of plots: one “flag-analysis” plot and two “timeseries” plots. The two timeseries plots have different yvar values, so each will plot a time series of its own variable.

Styles

The styles section allows you to specify details of how different data are plotted in the QC plots. This section is the most complex, so let’s look at an example right away:

[style.default.scatter]
all = {color = "black", marker = "o", markersize = 1}
flag0 = {color = "black", marker = "o", markersize = 1}
flagged = {color = "red", marker = "o", markersize = 1}
legend_kws = {ncol = 2}

[style.main.scatter]
all = {color = "royalblue"}
flag0 = {color = "royalblue"}
flagged = {color = "red"}

[style.ref.scatter]
all = {color = "lightgray"}
flag0 = {color = "lightgray"}

Each style subsection is defined by a single bracketed header with the format [style.<data type>.<plot kind>]. In the first subsection of the example, “default” is the data type and “scatter” the plot type. In the second subsection, “main” is the data type and “scatter” is again the plot type. The four allowed data types are:

  • main - styles defined for the “main” data type affect how the data in the file passed as a positional argument are plotted, that is, the data that are the “main” focus of the plots.

  • ref - styles defined for the “ref” data type affect reference data (i.e. the file passed to the --ref command line argument to be used as reference good quality data).

  • context - styles defined for the “context” data type affect data in the file passed through the --context command line argument, i.e. data from earlier in the record for the same site as the main data used to place the main data in the context of the overall record.

  • default - style values defined for “default” provide a fallback for the other three.

The allowed plot types are the allowed values for the kind option in the plots section, which are enumerated in the Plot types section of this documentation.

Within each style subsection, how style options are organized depends on the specific plot type. Usually (but not always), the keys within the subsection refer to specific subsets of data, and their values are dictionaries of key-value pairs that affect the style used when plotting that subtype of data.

Let’s walk through how the example shown above is interpreted by the QC plotting program. We will assume that we’re making a scatter plot, since that is the only plot type defined here. When the code goes to plot the main data, it reads the [style.main.scatter] section. For a scatter plot, the default behavior is to plot good (flag == 0) data and not good (flag > 0) data as two separate series. The style for flag == 0 data is set by the flag0 entry. To build the full style, the flag0 options from both the [style.main.scatter] and [style.default.scatter] sections are combined, with main options taking precedence. In this example, the flag == 0 style would be:

{color = "royalblue", marker = "o", markersize = 1}

All three of these options were in the default section, but color was also defined in the main section, so the main color takes precedence. Likewise, the flag > 0 style comes from the combination of the default and main sections’ flagged entries, and so is:

{color = "red", marker = "o", markersize = 1}

(In this case, both sections specified the same color, so it didn’t matter that main overrode the color value from default.)
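The merging just described amounts to a dictionary update in which the data-type-specific entries override the default ones. Below is a minimal Python sketch that assumes the [style] tables have been parsed into nested dictionaries. The merge_style name is hypothetical and not part of TCCON QC. In this sketch, a subset missing from a data type, such as flagged for ref, falls back entirely on the default entry.

```python
def merge_style(styles, data_type, plot_kind, subset):
    """Combine one subset's style from [style.default.*] and [style.<data_type>.*].

    `styles` is the parsed [style] table, e.g. styles["main"]["scatter"]["flag0"].
    """
    default = styles.get("default", {}).get(plot_kind, {}).get(subset, {})
    specific = styles.get(data_type, {}).get(plot_kind, {}).get(subset, {})
    # Later entries win, so data-type-specific keys override the defaults.
    return {**default, **specific}


# The [style] tables from the example above, as Python dictionaries.
STYLES = {
    "default": {"scatter": {
        "all": {"color": "black", "marker": "o", "markersize": 1},
        "flag0": {"color": "black", "marker": "o", "markersize": 1},
        "flagged": {"color": "red", "marker": "o", "markersize": 1},
        "legend_kws": {"ncol": 2},
    }},
    "main": {"scatter": {
        "all": {"color": "royalblue"},
        "flag0": {"color": "royalblue"},
        "flagged": {"color": "red"},
    }},
    "ref": {"scatter": {
        "all": {"color": "lightgray"},
        "flag0": {"color": "lightgray"},
    }},
}
```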

All plot types are permitted to include legend_kws as a key within the “default” subsection, as you see in this example. This can point to a dictionary of keywords to pass to the matplotlib legend function. Unlike the other components of styles, the legend keywords can be overridden on individual plots using the legend_kws key in a [[plots]] subsection of the TOML file.

Warning

legend_kws is only read from the “default” subsection. If you put it in “main”, “ref”, or “context”, it will be ignored.

Note

The Matplotlib legend documentation distinguishes between calling legend on a figure and calling it on axes. Currently, all plot types in the TCCON QC program call legend on axes.
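A sketch of how the legend keywords could be assembled for one plot follows. Only the "default" style subsection is consulted, and per-plot legend_kws from a [[plots]] entry override it. The legend_kwargs name is hypothetical. The resulting dictionary would then be passed as ax.legend(**kws) on a Matplotlib Axes.

```python
def legend_kwargs(styles, plot_kind, plot):
    """Keywords for Axes.legend(): default style first, then the plot's own."""
    # Only the "default" data type is consulted; legend_kws under
    # main/ref/context is ignored, matching the warning above.
    base = styles.get("default", {}).get(plot_kind, {}).get("legend_kws", {})
    return {**base, **plot.get("legend_kws", {})}
```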

Cloning styles

Since many plot types are closely related, many plots offer the option to “clone” their style from another plot. For example, in the default configuration:

[style.default.scatter]
all = {color = "black", marker = "o", markersize = 1}
flag0 = {color = "black", marker = "o", markersize = 1}
flagged = {color = "red", marker = "o", markersize = 1}

[style.default.timeseries]
clone = "scatter"

Specifying clone = "scatter" in the [style.default.timeseries] section means that all the styles defined for [style.default.scatter] are replicated in [style.default.timeseries]. In other words, the previous example is identical to:

[style.default.scatter]
all = {color = "black", marker = "o", markersize = 1}
flag0 = {color = "black", marker = "o", markersize = 1}
flagged = {color = "red", marker = "o", markersize = 1}

[style.default.timeseries]
all = {color = "black", marker = "o", markersize = 1}
flag0 = {color = "black", marker = "o", markersize = 1}
flagged = {color = "red", marker = "o", markersize = 1}

The value that comes after the clone = key is the plot kind to clone from. You can only clone styles from the same data type; that is, in this example, we could clone the default styles from scatter plots for the default styles in timeseries plots, but we could not clone the main data styles from scatter plots for the default styles in timeseries plots. Default to default, main to main, ref to ref, and context to context only.

Note

Not all plot types support cloning styles. If they do not, this will be noted in Plot types below.

You can override specific keys within a subsection after cloning. For example:

[style.default.timeseries]
clone = "scatter"
legend_kws = {ncol = 2}

would clone the all, flag0, and flagged values from [style.default.scatter] (from the first example in this section) but use {ncol = 2} for the legend_kws value.
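Cloning can be thought of as copying the source subsection's keys and then applying any keys written alongside clone. The sketch below resolves clones within one data type at a time, which enforces the default-to-default, main-to-main rule by construction. The names are hypothetical and this is not TCCON QC's implementation. The sketch also assumes chained clones are allowed and treats circular clones as an error; neither behavior is stated above.

```python
def resolve_clones(kinds):
    """Expand clone = "<kind>" entries within ONE data type's style tables.

    `kinds` maps plot kind -> style subsection, e.g. the parsed
    [style.default.*] tables. Keys set next to `clone` override cloned ones.
    """
    resolved = {}

    def resolve(kind, chain=()):
        if kind in resolved:
            return resolved[kind]
        if kind in chain:
            raise ValueError(f"circular clone: {' -> '.join(chain + (kind,))}")
        entry = dict(kinds.get(kind, {}))
        source = entry.pop("clone", None)
        if source is not None:
            # Copy the source's keys, then let this section's own keys win.
            entry = {**resolve(source, chain + (kind,)), **entry}
        resolved[kind] = entry
        return entry

    for kind in kinds:
        resolve(kind)
    return resolved
```

Applied across the whole [style] table, this would be {data_type: resolve_clones(kinds) for data_type, kinds in styles.items()}.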

Plot types

The following table summarizes the available plots.

  • The “Kind” column lists the string to give as the kind = value in the configuration file to create a plot of this type.

  • “Required keys” lists other keys that must be present in that configuration section to create that kind of plot.

  • “Optional keys” lists keys that may be provided to change the behavior of the given plot.

  • “Style keywords” describes which keys may be passed in the style section for this plot type. An entry of the form “MPL <function> kws” means any keywords accepted by the named Matplotlib function can be given.

  • “Cloning supported” indicates whether that plot type allows style cloning.

  • “Aux plots” lists auxiliary plots that can be added to that main plot.

Kind | Required keys | Optional keys | Style keywords | Cloning supported | Aux plots
--- | --- | --- | --- | --- | ---
flag-analysis | (none) | min_percent | MPL bar kws | No | (none)
nan-check | (none) | vsw_windows, groups, percentage, window_font_size, sharey | N/A | No | (none)
neg-time-jump | (none) | (none) | marker | No | (none)
timing-error-am-pm | sza_range | yvar, freq, op, time_buffer_days, flag_cat_override | MPL plot kws | Yes | violin
delta-timing-error-am-pm | sza_range | yvar, freq, op, time_buffer_days, flag_cat_override | MPL plot kws | Yes | violin
timing-error-szas | sza_ranges, am_or_pm | yvar, freq, op, time_buffer_days, flag_cat_override | MPL plot kws | No | violin
scatter | xvar, yvar | match_axes_size, show_out_of_range_data | MPL plot kws | Yes | (none)
hexbin | xvar, yvar | show_reference, show_context | MPL hexbin kws | Yes | (none)
timeseries | yvar | time_buffer_days, show_out_of_range_data | MPL plot kws | Yes | violin
delta-timeseries | yvar1, yvar2 | time_buffer_days, show_out_of_range_data | MPL plot kws | Yes | violin
timeseries-2panel | yvar, yerror_var | time_buffer_days, show_out_of_range_data | MPL plot kws | Yes | violin
timeseries-3panel | yvar | time_buffer_days, show_out_of_range_data, plot_height_ratios, height_space, bottom_limit, top_limit, even_top_bottom | MPL plot kws | Yes | violin
resampled-timeseries | yvar, freq, op | time_buffer_days, show_out_of_range_data | MPL plot kws | Yes | violin
rolling-timeseries | yvar, ops | gap, rolling_window, uncertainty, data_category, time_buffer_days, show_out_of_range_data | MPL plot kws | Yes | violin
rolling-timeseries-3panel | yvar, ops | gap, rolling_window, uncertainty, data_category, time_buffer_days, show_out_of_range_data, plot_height_ratios, height_space, bottom_limit, top_limit, even_top_bottom | MPL plot kws | Yes | violin
delta-rolling-timeseries | yvar1, yvar2, ops | gap, rolling_window, uncertainty, data_category, time_buffer_days, show_out_of_range_data | MPL plot kws | Yes | violin
rolling-derivative | yvar, dvar | derivative_order, gap, rolling_window, flag_category, time_buffer_days, show_out_of_range_data | MPL plot kws | Yes | violin
zmin-zobs-delta-rolling-timeseries | ops | gap, rolling_window, uncertainty, data_category, time_buffer_days, show_out_of_range_data, annotation_font_size | MPL plot kws | Yes | violin
prior-time-matchup-timeseries | (none) | max_time_diff_hours, mark_out_of_bounds_time_diffs, out_of_bounds_contiguous_days, out_of_bounds_style | MPL plot kws | Yes | (none)
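The Required keys column lends itself to a simple pre-flight check of a configuration. The Python sketch below transcribes that column. missing_keys is a hypothetical helper, not part of TCCON QC. It assumes auxiliary plots are appended to kind with a "+", as in the "timeseries-2panel+violin" example earlier. It does not check any keys that auxiliary plots add.

```python
# Required keys per plot kind, transcribed from the Plot types table.
# Hypothetical checker for illustration; not part of TCCON QC.
REQUIRED_KEYS = {
    "flag-analysis": [],
    "nan-check": [],
    "neg-time-jump": [],
    "timing-error-am-pm": ["sza_range"],
    "delta-timing-error-am-pm": ["sza_range"],
    "timing-error-szas": ["sza_ranges", "am_or_pm"],
    "scatter": ["xvar", "yvar"],
    "hexbin": ["xvar", "yvar"],
    "timeseries": ["yvar"],
    "delta-timeseries": ["yvar1", "yvar2"],
    "timeseries-2panel": ["yvar", "yerror_var"],
    "timeseries-3panel": ["yvar"],
    "resampled-timeseries": ["yvar", "freq", "op"],
    "rolling-timeseries": ["yvar", "ops"],
    "rolling-timeseries-3panel": ["yvar", "ops"],
    "delta-rolling-timeseries": ["yvar1", "yvar2", "ops"],
    "rolling-derivative": ["yvar", "dvar"],
    "zmin-zobs-delta-rolling-timeseries": ["ops"],
    "prior-time-matchup-timeseries": [],
}


def missing_keys(plot):
    """Return the required keys absent from one [[plots]] entry."""
    kind = plot.get("kind")
    if kind is None:
        raise ValueError("every [[plots]] entry needs a kind key")
    # Assumption: auxiliary plots follow a "+", e.g. "timeseries+violin".
    main_kind = kind.split("+")[0]
    if main_kind not in REQUIRED_KEYS:
        raise ValueError(f"unknown plot kind: {kind!r}")
    return [key for key in REQUIRED_KEYS[main_kind] if key not in plot]
```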

Common optional keys

All plot types accept the following as optional keys:

  • key (default = None): a string used to refer to this plot from another plot. If not given, this plot cannot be referenced from another plot.

Warning

There is currently no check to protect against two plots having the same key. If you get odd results when trying to refer to another plot, make sure you don’t have duplicated plot keys!

  • name (default = None): a name to use for the plot alongside the plot number in the upper left corner of each page. If this is not given, then the filename used to save the intermediate plot images is inserted instead.

  • bookmark (default = None): controls whether and how this page gets bookmarked in the output PDF. Assigning a string to this property will use that string for the bookmark in the PDF (e.g. setting bookmark = "Flags" on a plot will cause that page in the final PDF to have the bookmark “Flags”). Setting this to true will use the value of name for the bookmark (either the value passed as name explicitly or the fallback file name). If the Image postprocessing key bookmark_all is true, then all plots have a bookmark in the final PDF. In that case, the value of bookmark is used if available; otherwise QC plots falls back on name.

  • legend_kws (default = {}): keywords to pass to the legend call for this plot only. These are merged with the legend keywords defined in the default style for this plot type, with the values given here taking precedence.

  • extra_qc_lines (default = []): a list of dictionaries specifying extra horizontal or vertical lines to plot as a guide for whether data is in family or not. An example:

    extra_qc_lines = [{value = 0.996, axis="y", linestyle = "--", color = "darkorange", label="Expected range"},
                      {value = 1.002, axis="y", linestyle = "--", color = "darkorange"}]
    

    Each dictionary must have the key value, which gives the position of the line. The axis key is optional; it specifies which axis the lines are positioned on (“y” = horizontal lines, “x” = vertical lines) and defaults to “y” if absent. Any other key-value pairs must be valid keyword arguments to Matplotlib's axhline() or axvline().

  • width (default = 20): initial width of the plot in centimeters

  • height (default = 10): initial height of the plot in centimeters

Note

This does not guarantee the final page size will be 20 x 10 cm. Excess whitespace is trimmed from the plots and the final page size depends on the --size command line argument.

If a plot has an auxiliary plot added, it may have additional required or optional keys beyond those described in this section (or the plot-specific sections below). See Auxiliary Plots for information on which keys are added by which auxiliary plots.
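The extra_qc_lines option maps naturally onto Matplotlib's Axes.axhline() and Axes.axvline(). Below is a minimal Python sketch of how each dictionary could be dispatched. split_qc_line and draw_extra_qc_lines are hypothetical names, and the ax argument is assumed to be a Matplotlib Axes.

```python
def split_qc_line(spec):
    """Turn one extra_qc_lines entry into (Axes method name, value, kwargs)."""
    kws = dict(spec)             # copy so the configuration is not modified
    value = kws.pop("value")     # required: position of the line
    axis = kws.pop("axis", "y")  # optional; defaults to "y"
    if axis == "y":
        return "axhline", value, kws  # horizontal line at y = value
    if axis == "x":
        return "axvline", value, kws  # vertical line at x = value
    raise ValueError(f"axis must be 'x' or 'y', got {axis!r}")


def draw_extra_qc_lines(ax, specs):
    """Draw every extra QC line on a Matplotlib Axes."""
    for spec in specs:
        method, value, kws = split_qc_line(spec)
        # Remaining keys (linestyle, color, label, ...) go to axhline/axvline.
        getattr(ax, method)(value, **kws)
```

Given fig, ax = matplotlib.pyplot.subplots(), calling draw_extra_qc_lines(ax, specs) with the two-entry example above would draw two dashed dark orange horizontal lines. Only the first would get a legend entry, since only it has a label.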

flag-analysis

A flag analysis plot shows bar graphs of the number of spectra and percent of spectra marked as bad by different variables during the automatic QC process in TCCON post processing.

Required keys

None

Optional keys

  • min_percent (default = 1.0): the minimum percent of spectra a variable must flag for it to be shown on the plot.
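The min_percent filter can be sketched as follows, assuming per-variable counts of flagged spectra are already available. flags_to_show and its inputs are hypothetical. The sketch treats a variable flagging exactly min_percent of spectra as shown.

```python
def flags_to_show(flag_counts, n_spectra, min_percent=1.0):
    """Percent of spectra flagged by each variable, keeping those >= min_percent.

    flag_counts maps a variable name to the number of spectra it flagged.
    """
    percents = {var: 100.0 * count / n_spectra for var, count in flag_counts.items()}
    return {var: pct for var, pct in percents.items() if pct >= min_percent}
```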

Style

A flag-analysis style subsection must have the all key; this is the only key used. Keywords can be any valid keywords for matplotlib.pyplot.bar(). In addition, legend_fontsize (default 7) adjusts the size of the text in the legend.

nan-check

A plot that displays the number or percentage of data that are NaN or fill values in each window. It uses the VSW variables and, for each window, shows the larger of the two values (percentage or number) for the VSW column amount and its error.

Required keys

None

Optional keys

  • vsw_windows (default = None): which windows from the VSW variables to include (e.g. ["co2_6220", "ch4_6002"]). Generally you will not use this input; use groups instead. Only use this if you need to limit to specific windows. When this is None (the default), all vsw variables are available.

  • groups (default = None): defines how to group the gases into axes. The default is to put all gases into one axes. Otherwise, this value must be a list of lists of gas names, e.g. [["co2", "ch4"], ["!h2o", "!co2", "!ch4"], ["h2o"]]. Each inner list corresponds to one axes; this example would plot CO2 and CH4 in the first, everything except CO2, CH4, and H2O in the second, and only H2O in the third. Prefixing a name with an exclamation point (as in the second inner list of the example) will exclude that gas from the axes. Note that while it is allowed to mix excludes and includes (e.g. ["co2", "!h2o"]), this is identical to only providing the includes (e.g. ["co2"]).

  • percentage (default = true): whether to plot the percentage of the data in each window that are NaN or fill values (true) or the number of such spectra (false).

  • window_font_size (default = 6): the font size to use for the label over each bar that indicates exactly what window it represents.

  • sharey (default = false): whether to force all axes in this plot to use the same y-limits.
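The include/exclude rules for groups can be sketched in Python as below. resolve_groups is hypothetical. It takes the list of gas names available from the VSW variables as given, and it orders the gases within each axes by that list. Both of those are assumptions of this sketch.

```python
def resolve_groups(groups, gases):
    """Return the gases shown on each axes of a nan-check plot."""
    if groups is None:
        return [list(gases)]  # default: everything on one axes
    axes = []
    for group in groups:
        includes = [g for g in group if not g.startswith("!")]
        excludes = {g[1:] for g in group if g.startswith("!")}
        if includes:
            # Any includes make the excludes redundant.
            axes.append([g for g in gases if g in includes])
        else:
            axes.append([g for g in gases if g not in excludes])
    return axes
```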

Style

A nan-check style subsection must have the all key, this is the only one used. Keywords can be:

  • width (default = 0.8): the width of each window group. Since this scales all groups together, it isn’t generally useful.

  • zero_color (default = “b”): the color to use for the bars for windows with no NaNs/fill values. May be any valid Matplotlib color specification.

  • color_map (default = “autumn_r”): the name of the color map to use to color bars with >0 NaNs/fill values. May be any recognized Matplotlib color map name.

Note

These options have not been tested; please report if they do not work.

neg-time-jump

A plot that displays histograms of differences in ZPD time between adjacent spectra (positive and negative differences shown in separate panels) and a timeline showing when negative time differences of various sizes occur. This is intended to check for duplicate spectra.

Required keys

None.

Optional keys

None supported through configuration.

Style

A neg-time-jump style subsection must have the all key; this is the only key used. The only keyword used is marker.
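The quantity behind the neg-time-jump histograms is the difference in ZPD time between consecutive spectra. The hypothetical sketch below, which is not TCCON QC code, splits those differences into positive and negative sets. It also returns the position of each negative jump so it could be placed on a timeline. It assumes ZPD times are plain numbers in file order. Zero differences are left out of both sets, since the text above does not say how they are shown.

```python
def zpd_time_jumps(zpd_times):
    """Split adjacent-spectrum ZPD time differences by sign."""
    diffs = [later - earlier for earlier, later in zip(zpd_times, zpd_times[1:])]
    positive = [d for d in diffs if d > 0]
    # Keep the index of the later spectrum so negative jumps can be
    # placed on a timeline; zero differences are in neither list.
    negative = [(i + 1, d) for i, d in enumerate(diffs) if d < 0]
    return positive, negative
```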

timing-error-am-pm

A plot that shows a time series of resampled values for a specific range of SZA values in the morning and afternoon. This is an experimental plot type to try to detect timing errors from differences in the morning and afternoon values.

Note

This plot uses all data from the main and context files unless the --flag0 command line flag was given. flag == 0 and flag > 0 data is not plotted separately.

Required keys

  • sza_range: a 2-element list giving the range of SZA values (in degrees) to average the yvar in. Example: [70, 80]

Optional keys

  • yvar (default = "xluft"): the variable from the netCDF file to plot on the y-axis.

  • freq (default = "W"): the temporal frequency to bin the data to. Any Pandas frequency alias is supported.

  • op (default = "median"): what operation to use in the binning, usually “median” or “mean”, but any operation supported on a Pandas resampled data frame is supported.

  • time_buffer_days (default = 2): number of days to buffer the edges of the plot by to ensure the first and last points do not end up on the plot edge.

  • flag_cat_override (default = None): whether to override the default flag category that the binned values are drawn from. If this option is not present, all data (flag == 0 and flag > 0) are used unless the command line argument --flag0 is given, in which case only flag == 0 data are used. If given, this option must be one of the strings “all”, “flag0”, or “flagged”, and that category of data will always be used.
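The binning described by sza_range, freq, and op corresponds to a Pandas resample. The Python sketch below is an illustration built on assumptions, not the program's code. It assumes the data are in a DataFrame indexed by time, with the solar zenith angle in a column named solzen and a boolean column is_am marking morning spectra (both column names are hypothetical here). It selects the Resampler method by name with getattr.

```python
import pandas as pd


def am_pm_binned(df, yvar="xluft", sza_range=(70, 80), freq="W", op="median",
                 sza_col="solzen", am_col="is_am"):
    """Return (am, pm) Series of yvar binned to freq within sza_range."""
    lo, hi = sza_range
    in_range = df[(df[sza_col] >= lo) & (df[sza_col] <= hi)]

    def binned(subset):
        # op names any Resampler method, e.g. "median", "mean", "count".
        return getattr(subset[yvar].resample(freq), op)()

    is_am = in_range[am_col].astype(bool)
    return binned(in_range[is_am]), binned(in_range[~is_am])
```

For the delta-timing-error-am-pm variant, the plotted quantity would be pm - am, which Pandas aligns on the bin timestamps.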

Style

A style subsection for one of these plots may have any or all of the keys both, am, or pm. These provide style keywords that apply to the series for the morning data (am), afternoon data (pm) or both (both). The keywords given can be any style keywords accepted by matplotlib.pyplot.plot().

The label keyword is treated specially. In Matplotlib, this keyword is used to set the legend text for a given data series. The QC plots will include a default label if you do not specify one. If you do specify one, it is passed through a format call where three keyword values are available:

  • data will be replaced with a short description of the data (site name and whether flag == 0, flag > 0, etc)

  • ll and ul will be replaced with the lower and upper SZA limits, respectively.
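The style handling for these plots can be sketched as a merge of the both entry with the am or pm entry, followed by a str.format call on any label. The function name is hypothetical. The sketch assumes keys under am or pm override keys under both, which is not stated above. It also omits the default label the real program supplies when none is given.

```python
def am_pm_style(style, half, data, sza_range):
    """Style keywords for the "am" or "pm" series of a timing-error plot."""
    # Assumption: keys under "am"/"pm" override the shared "both" keys.
    kws = {**style.get("both", {}), **style.get(half, {})}
    if "label" in kws:
        ll, ul = sza_range
        # str.format with the three documented fields: data, ll, ul.
        kws["label"] = kws["label"].format(data=data, ll=ll, ul=ul)
    return kws
```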

delta-timing-error-am-pm

This is the same as timing-error-am-pm except that the value plotted on the y-axis is the difference (afternoon - morning) instead of plotting them separately. All the required and optional keys are the same.

timing-error-szas

A plot that shows a time series of resampled values for multiple SZA ranges in the morning or afternoon. This is an experimental plot type to detect timing errors from differences in the typical value at different SZAs.

Note

This plot uses all data from the main and context files unless the --flag0 command line flag was given. flag == 0 and flag > 0 data is not plotted separately.

Required keys

  • sza_ranges: a list of 2-element lists specifying which SZA ranges to plot. Example: [[70,80], [40,50], [20,30]].

  • am_or_pm: one of the strings “am” or “pm”, indicating that the plot should use morning (“am”) or afternoon (“pm”) data.

Optional keys

Identical to those for timing-error-am-pm plots.

Style

Because these plots have an arbitrary number of data series (one per SZA range) rather than specific data categories, their style definitions follow a different pattern from other plots. Valid keywords are those accepted by matplotlib.pyplot.plot(), but they are not grouped by data subset. These keywords are specified directly within a [style.<data type>.timing-error-szas] section, as:

[style.default.timing-error-szas]
marker = "o"
markersize = 1
linestyle = "none"
color = ["tab:blue", "tab:orange", "tab:green"]

The value for each key may be either a scalar value (as in marker, markersize, and linestyle above) or a list of values (as with color). If a scalar value is provided, that value is used for all data series representing different SZA ranges. If a list is provided, then the plot cycles through the values for the different SZA ranges.

Note

If the list has fewer values than there are SZA ranges, then the plot cycles back through the values as many times as needed. If you are getting identical styles for two data series, make sure your lists are long enough.

Similar to timing-error-am-pm, if a value for label is provided, then that string is formatted with the data, ll, and ul keywords. If label is not provided, a default is used. See above for their meanings. Like the other options in this plot’s styles, label may be a single string or a list of strings.
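The scalar-versus-list rule, including wrap-around for short lists and label formatting, can be sketched as follows. series_styles is a hypothetical name, and the sketch is not the program's code.

```python
def series_styles(style, sza_ranges, data):
    """One dict of plot() keywords per SZA range."""
    styles = []
    for i, (ll, ul) in enumerate(sza_ranges):
        kws = {}
        for key, value in style.items():
            # Lists cycle (wrapping around if too short); scalars apply to all.
            kws[key] = value[i % len(value)] if isinstance(value, list) else value
        if "label" in kws:
            kws["label"] = kws["label"].format(data=data, ll=ll, ul=ul)
        styles.append(kws)
    return styles
```

With the example style above and four SZA ranges, the fourth series would reuse "tab:blue", which is the situation the preceding Note warns about.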

scatter

A plot of one variable versus another.

Required keys

  • xvar: the name of the variable in the netCDF files to plot on the x-axis

  • yvar: the name of the variable in the netCDF files to plot on the y-axis

Optional keys

  • match_axes_size (default = None): if given, this must be the key of a “hexbin” plot (see the key option under Common optional keys). The scatter plot’s axes will be compressed to match the width of the hexbin axes, allowing for colorbars.

  • show_out_of_range_data (default = true): determines whether or not to plot points that would fall outside the plot limits at the edge. The default behavior is to plot them; set this to false to turn that feature off.

Note

The points outside the plot limits will use one of the triangle markers or the large diamond, depending on which limit or limit(s) they are outside. If you want to avoid confusing in-limit points for out-of-limit points, do not use any of the markers “v”, “^”, “<”, “>”, or “D” in your styles.
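Plotting out-of-range points "at the edge" amounts to clamping them to the axis limits and remembering which side they fell on, so that a distinct marker can be used. Below is a minimal sketch for one axis, using a hypothetical helper. Which triangle or diamond marker goes with which case is not reproduced here.

```python
def clamp_to_limits(values, lower, upper):
    """Clamp values into [lower, upper], recording where each one was."""
    clamped, side = [], []
    for v in values:
        if v < lower:
            clamped.append(lower)
            side.append("below")
        elif v > upper:
            clamped.append(upper)
            side.append("above")
        else:
            clamped.append(v)
            side.append(None)  # inside the limits: drawn normally
    return clamped, side
```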

Style

A scatter plot’s style subsection may have the keys all, flag0, or flagged. These provide the style keyword arguments for plotting all data, flag == 0 data, and flag > 0 data, respectively. Allowed keywords are those for matplotlib.pyplot.plot(). If linestyle is not provided, it defaults to “none”.

Note

Do not use the ls shorthand for linestyle, since linestyle is always set.

A default label is provided that includes the site name and what subset of data (flag == 0, flag > 0, etc.) a series refers to. If you provide a custom label, this string can be inserted by including {data} in your string.
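The keyword handling for one scatter series can be sketched as follows. The helper name is hypothetical, and the sketch omits the default label the program generates. Because linestyle is always set, also supplying its alias ls would be expected to make Matplotlib reject the call, which is why the Note above advises against ls.

```python
def scatter_series_kwargs(subset_style, data):
    """plot() keywords for one scatter series (all, flag0, or flagged)."""
    kws = dict(subset_style)
    # linestyle defaults to "none", so it is always present; this is why
    # the "ls" alias should not also be given.
    kws.setdefault("linestyle", "none")
    if "label" in kws:
        # Insert the description of the data subset wherever {data} appears.
        kws["label"] = kws["label"].format(data=data)
    return kws
```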

hexbin

A plot of one variable versus another similar to a scatter plot, except it plots a 2D histogram rather than individual points.

Note

This does not plot flag == 0 and flag > 0 data separately. If the --flag0 command line flag is present, only flag == 0 data is used, otherwise all data is used.

Required keys

  • xvar: the name of the variable in the netCDF files to plot on the x-axis

  • yvar: the name of the variable in the netCDF files to plot on the y-axis

Optional keys

  • show_reference (default = false): Set to true to plot the reference data (if provided) as a second 2D histogram.

  • show_context (default = false): Set to true to plot the context data (if provided) as a second 2D histogram.

Style

A hexbin’s style subsection may have the keys all and flag0, used when plotting all data or flag == 0 data, respectively. This accepts all style keywords allowed by matplotlib.pyplot.hexbin(). Note that extent is given a reasonable default and usually does not need to be specified.

There are two special keywords in addition to the standard matplotlib.pyplot.hexbin() keywords:

  • fit_style takes as value another dictionary of style keywords valid for matplotlib.pyplot.plot() to use when plotting the linear fit through the hexbin data. If label is included in these keywords, the first {} in it will be replaced with the linear fit information.

  • legend_fontsize sets the fontsize of the legend. 7 pts is the default, and usually keeps the linear fit within the plot bounds.

timeseries

A plot of a given variable vs. time.

Required keys

  • yvar: the variable from the netCDF file(s) to plot on the y-axis

Optional keys

  • time_buffer_days (default = 2): number of days to buffer the edges of the plot by to ensure the first and last points do not end up on the plot edge.

  • show_out_of_range_data (default = true): determines whether or not to plot points that would fall outside the plot limits at the edge. The default behavior is to plot them; set this to false to turn that feature off.

Note

The points outside the plot limits will use one of the triangle markers or the large diamond, depending on which limit or limit(s) they are outside. If you want to avoid confusing in-limit points for out-of-limit points, do not use any of the markers “v”, “^”, “<”, “>”, or “D” in your styles.

Style

Style configuration is identical to that for scatter plots.

delta-timeseries

A plot of the difference of two variables vs. time.

Required keys

  • yvar1 and yvar2: the two variables to difference. The quantity plotted on the y-axis will be yvar1 - yvar2.

Optional keys

  • time_buffer_days (default = 2): number of days to buffer the edges of the plot by to ensure the first and last points do not end up on the plot edge.

  • show_out_of_range_data (default = true): determines whether or not to plot points that would fall outside the plot limits at the edge. The default behavior is to plot them; set this to false to turn that feature off.

Note

The points outside the plot limits will use one of the triangle markers or the large diamond, depending on which limit or limit(s) they are outside. If you want to avoid confusing in-limit points for out-of-limit points, do not use any of the markers “v”, “^”, “<”, “>”, or “D” in your styles.

Style

Style configuration is identical to that for scatter plots.

timeseries-2panel

A plot of two variables vs. time, with the second in a smaller upper panel. Typically used for a retrieved variable and its error.

Required keys

  • yvar: the variable from the netCDF file(s) to plot on the y-axis for the main axes

  • yerror_var: the variable from the netCDF file(s) to plot on the y-axis for the smaller upper axes.

Optional keys

  • time_buffer_days (default = 2): number of days to buffer the edges of the plot by to ensure the first and last points do not end up on the plot edge.

  • show_out_of_range_data (default = true): determines whether or not to plot points that would fall outside the plot limits at the edge. The default behavior is to plot them; set this to false to turn that feature off.

Note

The points outside the plot limits will use one of the triangle markers or the large diamond, depending on which limit or limit(s) they are outside. If you want to avoid confusing in-limit points for out-of-limit points, do not use any of the markers “v”, “^”, “<”, “>”, or “D” in your styles.

Style

Style configuration is identical to that for scatter plots. Both panels will use the same style for the same data subset.

timeseries-3panel

A plot of one variable vs. time with the y-axis split into three panels, allowing different degrees of zoom on different parts of the y-value range. The middle panel has its y-limits set to those specified in the Limits file or, failing that, the min/max values indicated in the netCDF file. The lower and upper panels show data outside these limits (below and above them, respectively), extending either to the full range of the data or to the limits specified with the bottom_limit and top_limit keys.

Required keys

  • yvar: the variable from the netCDF file(s) to plot

Optional keys

  • time_buffer_days (default = 2): number of days of padding to add at each end of the time axis so that the first and last points do not fall on the plot edge.

  • show_out_of_range_data (default = true): determines whether points that fall outside the plot limits are drawn at the edge of the axes. The default is to draw them; set this to false to turn that feature off. For this kind of plot, data that exceed top_limit are plotted on the top edge of the upper panel, and data less than bottom_limit are plotted on the bottom edge of the lower panel. (If those keys are not set, this should be moot, as the limits adjust to include all data.)

  • plot_height_ratios (default = [1.0, 1.0, 1.0]) - a three-number sequence giving the relative heights of the top, middle, and bottom panels, respectively.

  • height_space (default = 0.01) - fraction of vertical space reserved for the gaps between panels.

  • bottom_limit (default = None) - providing a value for this keyword sets the lower limit of the bottom panel to that value.

  • top_limit (default = None) - providing a value for this keyword sets the upper limit of the top panel to that value.

  • even_top_bottom (default = false) - set this to true to automatically set bottom_limit and top_limit to be equal in magnitude but opposite in sign.

Note

even_top_bottom cannot be set to true if either bottom_limit or top_limit is provided; doing so raises an error.

resampled-timeseries

Similar to “timeseries” plots, except that the data is broken down into chunks of a specified length of time and summarized as a mean/median/etc.

Required keys

  • yvar: the variable from the netCDF file(s) to plot on the y-axis

  • freq: the temporal frequency to bin the data to. Any Pandas frequency string (offset alias) is supported.

  • op: the operation to apply within each bin, usually “median” or “mean”, though any operation supported on a Pandas resampled data frame can be used.
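As a rough picture of what freq and op do, freq = "1D" with op = "median" corresponds to calling .resample("1D").median() on a Pandas data frame indexed by time. The sketch below reproduces that idea with only the standard library and only for daily bins. The function name daily_resample and the sample values are hypothetical, and QC plots' own implementation may differ in details such as how empty bins are handled.

```python
import statistics
from collections import defaultdict
from datetime import datetime


def daily_resample(times, values, op=statistics.median):
    """Summarize (time, value) pairs per calendar day.

    Rough stand-in for Pandas' resample("1D") followed by an operation such
    as .median(); unlike Pandas, days with no data are omitted, not NaN.
    """
    bins = defaultdict(list)
    for t, v in zip(times, values):
        bins[t.date()].append(v)
    return {day: op(vals) for day, vals in sorted(bins.items())}


# Illustrative values only.
times = [datetime(2020, 1, 1, 10), datetime(2020, 1, 1, 14),
         datetime(2020, 1, 1, 16), datetime(2020, 1, 3, 12)]
xco2 = [410.0, 412.0, 411.0, 409.5]
daily_medians = daily_resample(times, xco2)
```

Passing op=statistics.mean instead gives daily means, analogous to op = "mean".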

Optional keys

  • time_buffer_days (default = 2): number of days of padding to add at each end of the time axis so that the first and last points do not fall on the plot edge.

  • show_out_of_range_data (default = true): determines whether points that fall outside the plot limits are drawn at the edge of the axes. The default is to draw them; set this to false to turn that feature off.

Note

Points outside the plot limits are drawn with one of the triangle markers or the large diamond, depending on which limit(s) they fall outside. To avoid confusing in-limit points with out-of-limit points, do not use any of the markers “v”, “^”, “<”, “>”, or “D” in your styles.

Style

Style configuration is identical to that for scatter plots.

rolling-timeseries

Similar to “timeseries” plots, but in addition to plotting the raw data, running mean/median/etc. series are overplotted.

Required keys

  • yvar: the variable from the netCDF file(s) to plot on the y-axis

  • ops: the operation(s) to use for the rolling computation, usually “median” or “mean”, though any operation supported on a Pandas rolling data frame can be used. This can be either a string for a single operation or a list of strings to plot multiple rolled series. The “quantile” operation is a special case: it must include the quantile value to calculate, e.g. “quantile0.75” to compute the quantile with q = 0.75.
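The ops value can be handled in two steps: accept either a single string or a list, then split any quantile entry into its operation name and level. A minimal sketch under those assumptions; normalize_ops and parse_rolling_op are hypothetical names, and the range check on the quantile level is an added safeguard rather than documented QC plots behavior.

```python
from typing import List, Optional, Tuple, Union


def normalize_ops(ops: Union[str, List[str]]) -> List[str]:
    """ops may be a single string or a list of strings."""
    return [ops] if isinstance(ops, str) else list(ops)


def parse_rolling_op(op: str) -> Tuple[str, Optional[float]]:
    """Split one ops entry into (operation name, quantile level or None)."""
    if op.startswith("quantile"):
        level = op[len("quantile"):]
        if not level:
            raise ValueError('quantile ops must include the level, e.g. "quantile0.75"')
        q = float(level)
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"quantile level must be between 0 and 1, got {q}")
        return "quantile", q
    return op, None


parsed = [parse_rolling_op(op) for op in normalize_ops(["median", "quantile0.75"])]
```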

Optional keys

  • gap (default = "20000 days"): a gap in time that the rolling operation will not cross. This can be any string that Pandas can parse as a Timedelta. If the data contain a gap longer than this duration, the rolling operation is applied separately to the data on either side. The default of “20000 days” (~55 years) effectively disables this behavior.

  • rolling_window (default = 500): the number of points to use in the rolling window.

  • uncertainty (default = false): set this to true to plot uncertainty ranges for mean or median operations; means will use 1-sigma standard deviation and medians the upper and lower quartiles.

  • data_category (default = None): which subset of the yvar data to use, both when plotting the raw data and when computing the rolling operation(s). The default behavior is to use the normal subset for a given data type, or flag == 0 data if the --flag0 command line argument is set. Passing one of the strings “all”, “flag0”, or “flagged” will force the use of that subset (this may result in errors if one of the data files does not have the “flag” variable, which is required to figure out the latter two subsets).

  • show_out_of_range_data (default = true): determines whether points that fall outside the plot limits are drawn at the edge of the axes. The default is to draw them; set this to false to turn that feature off.

Note

Points outside the plot limits are drawn with one of the triangle markers or the large diamond, depending on which limit(s) they fall outside. To avoid confusing in-limit points with out-of-limit points, do not use any of the markers “v”, “^”, “<”, “>”, or “D” in your styles.
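The gap and rolling_window options can be pictured with the standard-library sketch below: the data are split wherever consecutive points are more than gap apart, and a trailing window of rolling_window points is evaluated within each segment, much like applying Pandas' .rolling(window).median() to each segment separately. All function names are hypothetical, and whether QC plots uses trailing or centered windows is an assumption here. Note that timedelta(days=20000) is about 54.8 years, which is why the default effectively never splits the data.

```python
import statistics
from datetime import datetime, timedelta


def split_at_gaps(times, values, gap):
    """Split time-sorted data wherever consecutive points are > gap apart."""
    segments, seg_t, seg_v = [], [], []
    for t, v in zip(times, values):
        if seg_t and t - seg_t[-1] > gap:
            segments.append(seg_v)
            seg_t, seg_v = [], []
        seg_t.append(t)
        seg_v.append(v)
    if seg_v:
        segments.append(seg_v)
    return segments


def trailing_rolling(values, window, op):
    """Apply op to each trailing window; incomplete windows give None
    (Pandas gives NaN there with its default min_periods)."""
    return [op(values[i + 1 - window:i + 1]) if i + 1 >= window else None
            for i in range(len(values))]


def gap_aware_rolling(times, values, window=500, gap=timedelta(days=20000),
                      op=statistics.median):
    """Rolling operation computed separately on each side of every long gap."""
    result = []
    for seg in split_at_gaps(times, values, gap):
        result.extend(trailing_rolling(seg, window, op))
    return result
```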

Style

Style configuration is similar to that for scatter plots, in that the keys within a [style.<data type>.rolling-timeseries] section can be the data subsets (all, flag0, flagged), each of which has a dictionary of style arguments as its value. However, the rolling operations can each have their own style, as additional subsection keys (e.g. mean, median, etc.). Quantile operations will prefer to use a style for the specific quantile being calculated (if one is available) but will fall back on a provided generic quantile style if not.

Note

The fallback to a generic quantile style is done on a per-data type basis. That is, if your “main” data type section has both a quantile and quantile0.75 style and your “default” section has only a quantile section, then when using the “quantile0.75” operation, the final style will use the “main” section’s quantile0.75 style plus the default section’s quantile style. The “main” section’s quantile style is entirely ignored.

Like scatter plots, if you provide a label as one of the style keywords, it will be passed through a format call. The {data} substring will still be replaced by the description of the data (site name + data subset). In addition, the {op} substring will be replaced with the rolling operation.

Note

If you use {op} in a label for regular data (e.g. all, flag0, flagged), it will get replaced by the string “None”.
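A quick illustration of the {data} and {op} substitutions, assuming the format call is Python's str.format with data and op keywords. The label templates and the description string "Park Falls flag0" are made up for the example; QC plots constructs the real description itself.

```python
# Hypothetical label templates, as they might appear in a style section.
raw_label = "{data} ({op})"
rolling_label = "{data}: rolling {op}"

# Assumed description string (site name + data subset).
data_desc = "Park Falls flag0"

print(rolling_label.format(data=data_desc, op="median"))  # Park Falls flag0: rolling median
print(raw_label.format(data=data_desc, op=None))          # Park Falls flag0 (None)
```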

If you provide styles for std and quantile, those styles will be used if plotting uncertainty ranges for mean and median operations, respectively.

If the final style (composed from data-specific + default styles) does not include a linestyle, then linestyle is set to “none”, as for scatter plots. Avoid using the “ls” shorthand for “linestyle”: since linestyle is always set when absent, a style using ls would end up with both keys, which Matplotlib rejects as conflicting aliases.
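The quantile fallback and linestyle rules above can be summarized in a short sketch. It assumes each data-type section is a dict mapping operation names to style dicts and that data-type-specific keys override default keys when the two are combined; pick_op_style and compose_style are hypothetical names, not QC plots functions.

```python
def pick_op_style(section, op):
    """Style for one rolling operation from a single data-type section.

    A specific quantile style (e.g. "quantile0.75") is preferred; otherwise a
    generic "quantile" style is used. This fallback is decided per section.
    """
    if op in section:
        return dict(section[op])
    if op.startswith("quantile"):
        return dict(section.get("quantile", {}))
    return {}


def compose_style(main, default, op):
    """Combine the default and data-type-specific styles for one operation."""
    style = pick_op_style(default, op)
    style.update(pick_op_style(main, op))  # assumed: data-type keys win
    # Missing linestyle becomes "none", as for scatter plots. A style written
    # with "ls" would end up with both "ls" and "linestyle" set.
    style.setdefault("linestyle", "none")
    return style


main = {"quantile": {"color": "red", "linewidth": 1},
        "quantile0.75": {"color": "blue"}}
default = {"quantile": {"linewidth": 2, "alpha": 0.5}}
q75_style = compose_style(main, default, "quantile0.75")
```

Here q75_style combines the default section's generic quantile style with the main section's quantile0.75 style; the main section's generic quantile style is not used, matching the earlier note.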

rolling-timeseries-3panel

A combination of the three panel timeseries plot and the rolling timeseries plot. This plots rolling means, medians, etc. in the three panel format of timeseries-3panel.

Required keys

Required keys are the same as for rolling-timeseries.

Optional keys

All keys accepted by rolling-timeseries and timeseries-3panel are accepted by this plot.

delta-rolling-timeseries

A rolling timeseries plot of the difference between two quantities in the netCDF file.

Required keys

  • yvar1 and yvar2: the variables from the netCDF file(s) to difference. The quantity plotted on the y-axis is yvar1 - yvar2.

  • ops: the operation(s) to use for the rolling computation, usually “median” or “mean”, though any operation supported on a Pandas rolling data frame can be used. This can be either a string for a single operation or a list of strings to plot multiple rolled series. The “quantile” operation is a special case: it must include the quantile value to calculate, e.g. “quantile0.75” to compute the quantile with q = 0.75.

Optional keys

  • gap (default = "20000 days"): a gap in time that the rolling operation will not cross. This can be any string that Pandas can parse as a Timedelta. If the data contain a gap longer than this duration, the rolling operation is applied separately to the data on either side. The default of “20000 days” (~55 years) effectively disables this behavior.

  • rolling_window (default = 500): the number of points to use in the rolling window.

  • uncertainty (default = false): set this to true to plot uncertainty ranges for mean or median operations; means will use 1-sigma standard deviation and medians the upper and lower quartiles.

  • data_category (default = None): which subset of the data to use, both when plotting the raw data and when computing the rolling operation(s). The default behavior is to use the normal subset for a given data type, or flag == 0 data if the --flag0 command line argument is set. Passing one of the strings “all”, “flag0”, or “flagged” will force the use of that subset (this may result in errors if one of the data files does not have the “flag” variable, which is required to figure out the latter two subsets).

  • show_out_of_range_data (default = true): determines whether points that fall outside the plot limits are drawn at the edge of the axes. The default is to draw them; set this to false to turn that feature off.

Note

Points outside the plot limits are drawn with one of the triangle markers or the large diamond, depending on which limit(s) they fall outside. To avoid confusing in-limit points with out-of-limit points, do not use any of the markers “v”, “^”, “<”, “>”, or “D” in your styles.

Style

Style is the same as for rolling-timeseries.

rolling-derivative

Rolling derivative plots compute a derivative of one variable vs. another across spectra in a rolling window. For example, if told to compute the first derivative of y with respect to x using a rolling window of 500 spectra, this will take spectra 1 through 500 and fit a slope of y versus x in those 500 spectra, then do the same for spectra 2 through 501, and so on.

Required keys

  • yvar: the variable in the numerator of the derivative (the dependent variable).

  • dvar: the variable in the denominator of the derivative (the independent variable).

Optional keys

  • derivative_order (default = 1): order of the derivative to calculate; 1 computes a slope, 2 a curvature, etc. Currently only 1 is implemented.

  • gap (default = "20000 days"): a gap in time that the rolling operation will not cross. This can be any string that Pandas can parse as a Timedelta. If the data contain a gap longer than this duration, the rolling operation is applied separately to the data on either side. The default of “20000 days” (~55 years) effectively disables this behavior.

  • rolling_window (default = 500): the number of points to use in the rolling window.

  • data_category (default = None): which subset of the data to use when computing the rolling derivative. The default behavior is to use the normal subset for a given data type, or flag == 0 data if the --flag0 command line argument is set. Passing one of the strings “all”, “flag0”, or “flagged” will force the use of that subset (this may result in errors if one of the data files does not have the “flag” variable, which is required to figure out the latter two subsets).

  • show_out_of_range_data (default = true): determines whether points that fall outside the plot limits are drawn at the edge of the axes. The default is to draw them; set this to false to turn that feature off.

Note

Points outside the plot limits are drawn with one of the triangle markers or the large diamond, depending on which limit(s) they fall outside. To avoid confusing in-limit points with out-of-limit points, do not use any of the markers “v”, “^”, “<”, “>”, or “D” in your styles.

zmin-zobs-delta-rolling-timeseries

A delta-rolling-timeseries plot customized for the zmin - zobs difference. It shows the estimated corresponding pressure difference on the right-hand side of the axes, as well as an annotation indicating the site altitude and the altitude of the bottom GEOS level.

Required keys

Optional keys

  • annotation_font_size (default = 6): the font size for the site/GEOS altitude annotation.

The other optional keys are the same as for delta-rolling-timeseries. Note that when using the violin auxiliary plot for this, the violin_plot_pad keyword is given a default value of 1.0 instead of 0.5 and the violin plot y-ticks are turned off by default. Both of these changes are to allow space for the estimated pressure difference.

prior-time-matchup-timeseries

A plot similar to timeseries, but customized to identify cases where the difference between the prior time and the ZPD time exceeds some limit. This is used to detect cases where the prior index has not been matched correctly to the observed data. Text in the lower left corner gives the number of observations where this is likely the case. There are no required keys.

Optional keys

  • max_time_diff_hours (default = 1.51): observations whose prior time and ZPD time differ by more than this many hours are counted as “out of bounds”. The default value of 1.51 was chosen because, for GGG2020, priors change every 3 hours and observations should use the closest prior. Therefore, no observation should have a difference of more than 1.5 hours between its prior time and ZPD time; the extra 0.01 hour is a small cushion to avoid false positives.

  • mark_out_of_bounds_time_diffs (default = true): when this is true, observations considered out of bounds based on max_time_diff_hours will be highlighted by a background fill.

  • out_of_bounds_contiguous_days (default = 30): when mark_out_of_bounds_time_diffs is true, the background fills will be drawn for contiguous groups of out-of-bounds points. This determines how many days there must be between adjacent out-of-bounds points for the background fill to break. 30 was chosen as the default since the shortest time period that groups typically process is one month, so this tries to group together points likely processed together.

  • out_of_bounds_style (default = {color = "crimson", alpha = 0.5}): controls the style of the background fills. It accepts keywords for Matplotlib patches.
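The out-of-bounds count and the grouping controlled by out_of_bounds_contiguous_days can be sketched as follows. This assumes a fill breaks when two consecutive out-of-bounds points are more than out_of_bounds_contiguous_days apart (the exact boundary handling in QC plots may differ). out_of_bounds_groups is a hypothetical function that returns the count and the first/last ZPD time of each group rather than drawing anything.

```python
from datetime import datetime, timedelta


def out_of_bounds_groups(zpd_times, prior_times, max_time_diff_hours=1.51,
                         contiguous_days=30):
    """Count observations whose prior time is more than max_time_diff_hours
    from their ZPD time, and group them into runs for background fills.

    Returns (count, groups), where each group is a (first, last) pair of ZPD
    times. A new group starts when the gap to the previous out-of-bounds
    point exceeds contiguous_days (assumed boundary handling).
    """
    limit = timedelta(hours=max_time_diff_hours)
    bad = sorted(z for z, p in zip(zpd_times, prior_times) if abs(z - p) > limit)
    groups = []
    for t in bad:
        if groups and t - groups[-1][1] <= timedelta(days=contiguous_days):
            groups[-1][1] = t  # extend the current fill
        else:
            groups.append([t, t])  # start a new fill
    return len(bad), [tuple(g) for g in groups]


zpd = [datetime(2020, 1, 1, 12), datetime(2020, 1, 2, 12)]
prior = [datetime(2020, 1, 1, 12), datetime(2020, 1, 2, 15)]
count, fills = out_of_bounds_groups(zpd, prior)  # 1 out-of-bounds point
```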

Auxiliary Plots

Auxiliary plots are extra panels that can be added to a main plot to provide additional information. To add an auxiliary plot to a page, add +<auxkind> to the end of the main plot’s kind value. For example, to add a violin plot to a timeseries plot, set the kind value to "timeseries+violin". Internally, "timeseries" and "timeseries+violin" are implemented as separate plot kinds. While this should be largely transparent to users, it has several implications to be aware of:

  1. Not all combinations of main + auxiliary plots are implemented. Which auxiliary plots are supported with which main plots is listed above in Plot types.

  2. Only one auxiliary plot can be combined with a main plot. (Allowing multiple auxiliary plots to be added to a single plot would require a separate implementation for each possible combination, which isn’t practical. Future work could refactor the approach to auxiliary plots to make this more viable.)

  3. A main + auxiliary combination can have different styles and limits than the main plot type alone. To continue our example from the first paragraph, you could define a ["timeseries+violin"] section in the Limits file or a [style.main."timeseries+violin"] Styles section to set limits or a style that apply only to timeseries plots with a violin plot attached (i.e. not to normal timeseries plots). However, by default the main plot uses the same limits and styles it would without the auxiliary plot.

Note

If you do add a section for a main+auxiliary plot, you must quote the plot kind in the TOML file. Note how in the examples in the last point above, such as [style.main."timeseries+violin"], the “timeseries+violin” part is quoted. TOML bare keys may contain only letters, digits, underscores, and dashes, so a key containing a plus sign must be quoted. Depending on the TOML parser, an unquoted key (i.e. [style.main.timeseries+violin]) is either rejected outright or interpreted as a section named [style.main.timeseries]; in the latter case, if you already have a section with that name, you’ll get a TOML error when running QC plots.

Note that in the third point above, the styles referred to are those for the main plot. Styles for the auxiliary plots need to be defined separately in the configurations; this will be described with each plot kind below.

The following subsections describe the available auxiliary plot kinds, including the extra required or optional keys they add to their [[plots]] section in the configuration file and their style options.

Violin aux plots

A violin auxiliary plot adds a small plot to the side of the main plot that shows the distribution of the y-variable of the main plot in some standard good-quality data. Note that this is separate from the normal reference file.

Required keys

  • violin_data_file: a path to the netCDF file to use to create the violin plots.

Optional keys

  • violin_plot_side (default = “right”): which side of the main plot axes to put the violin plot on. Can be “right”, “left”, “bottom”, or “top” (though only “left” or “right” are recommended).

  • violin_plot_size (default = “10%”): how big to make the violin plot horizontally (if the side is “left” or “right”) or vertically (if the side is “bottom” or “top”). The easiest way to specify this is as a percentage of the original plot size: a string ending in the percent sign, as in the default.

  • violin_plot_pad (default = 0.5): space to reserve between the original axes and the new violin plot axes.

  • violin_plot_hide_yticks (default = false): set to true to hide the y-tick labels on the violin plot axes.

Style

Style for the violin plots is read exclusively from the [style.extra.aux-violin] section. While this section can have all the usual data subsets as keys (flag0, flagged, all), usually only the flag0 style matters, since violin plots use flag = 0 data exclusively. It accepts any keywords that matplotlib.pyplot.violinplot() does except dataset and positions (which QC plots already sets), plus two additional keywords:

  • fill_color: color to make the violin density kernel.

  • line_color: color to make any lines (medians, extrema, etc.) on the plot.

An example style section is:

[style.extra.aux-violin]
flag0 = {showmedians = true, showextrema = false, fill_color = "silver", line_color = "dimgray"}

Limits file

Basic format

The limits file is broken down into sections that specify limits for different kinds of plots. Default values for each variable can also be specified. An example of a simple limits file is:

[default]
xluft = [0.975, 1.025]
xch4 = [1.6, 2.0]
xch4_error = [0, 0.05]

[scatter]
xluft = [0.996, 1.002]

Each section starts with a name in square brackets. The [default] section in this example specifies the default limits for three variables: xluft, xch4, and xch4_error. Note that each set of limits is given as a two-element list, also in square brackets.

Note

Make sure the limits have the lower value first! The TCCON QC code makes no guarantees about how the plots will behave if the limits are reversed.

In this example, we have a second section, [scatter], which specifies limits for xluft. This means that scatter plots will use the tighter limits specified in this second section, while all other plots will use the looser limits given in [default].

The allowed section names other than [default] are the same as the allowed values for the kind argument in the primary configuration.

Wildcards

The limits file also supports limited wildcards in the variable names, so that a limit can match for all variables whose names follow a certain pattern. The allowed wildcards are:

  • * - matches 0 or more characters (i.e. anything)

  • ? - matches any single character

  • [seq] - matches any character in “seq”

  • [!seq] - matches any character not in “seq”

Consider this example:

[default]
"*vsf_hcl*" = [0.7, 1.4]
"vsf_*" = [0.9, 1.1]
"*_fs" = [-2, 2]

The first entry will match any variable that includes the substring “vsf_hcl” anywhere, because the two * can match anything (including nothing). The second entry will only match variables that begin with “vsf_”, while the third will only match variables that end in “_fs”.

Note

In this example, the strings on the left side of the equals sign are quoted, whereas they weren’t in the non-wildcard example. TOML bare keys may contain only letters, digits, underscores, and dashes, so any key containing wildcard characters like * must be quoted for TOML to interpret it as a string.
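These four wildcards are the same patterns understood by Python's fnmatch module, so fnmatch.fnmatchcase can be used to check which entries a variable name would match. Whether QC plots uses fnmatch internally, and whether its matching is case-sensitive, are assumptions; the variable names below are only examples.

```python
from fnmatch import fnmatchcase

patterns = ["*vsf_hcl*", "vsf_*", "*_fs"]
# Illustrative variable names.
for name in ["vsf_hcl", "vsf_co2", "xco2_fs", "xco2"]:
    print(name, [p for p in patterns if fnmatchcase(name, p)])

# [seq] and [!seq]: "x[!n]h4" matches "xch4" but not "xnh4".
print(fnmatchcase("xch4", "x[!n]h4"), fnmatchcase("xnh4", "x[!n]h4"))
```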

Precedence

With wildcards, it is quite easy to have a variable match multiple entries in your limits file. TCCON QC uses three rules to determine which limit to use:

  1. A plot-specific section takes precedence over the [default] section.

  2. Use the first entry in a section that matches the variable.

  3. If no entry matches that variable, use the vmin and vmax attributes for that variable from the netCDF file(s) being plotted.

Email file

The email configuration file allows you to specify how to send emails containing the plots. An example file is:

[server]
use_external_program = true

[server.program]
program = "mail"
subject_flag = "-s"
from_addr_flag = "-r"
attachment_flag = "-a"
body_arg = "stdin"

[server.smtp]
smtp_address = "smtp.gmail.com"
smtp_port = 587

[email]
from = "me@self.com"
to = "you@other.edu"
body = "Plots automatically generated by `tccon_qc_plots` on {date} from {basename}."
subject_from_site_id = true
subject = "[#275]"

[email.sites]
ae = 226  # Ascension Island
an = 224  # Anmyeondo
bi = 213  # Bialystok
br = 236  # Bremen
bu = 220  # Burgos
ci = 210  # Caltech/Pasadena
db = 214  # Darwin
df = 225  # Armstrong/Dryden/Edwards
et = 227  # East Trout Lake
eu = 222  # Eureka
gm = 234  # Garmisch
hf = 276  # Hefei
hw = 274  # Harwell
iz = 216  # Izana
js = 233  # Saga
ka = 217  # Karlsruhe
ll = 219  # Lauder pre-2018 125HR
lr = 221  # Lauder post-2018 125HR
ni = 240  # Nicosia
ny = 237  # Ny-Alesund
oc = 223  # Lamont
or = 212  # Orleans
pa = 211  # Park Falls
pr = 231  # Paris
ra = 260  # Reunion
rj = 228  # Rikubetsu
so = 218  # Sodankyla
sp = 237  # Alternate abbreviation for Ny-Alesund?
tj = 229  # Tsukuba 120HR
tk = 229  # Tsukuba 125HR
wg = 215  # Wollongong
xh = 271  # Xianghe
zs = 235  # Zugspitze

This is a TOML document. For details on the TOML syntax, see https://toml.io/en/. Now, let’s consider each section.

server section

This section contains general options for which email server to use to send the email. It currently has only one option:

  • use_external_program - a boolean value that determines whether emails are sent using a command line program like mail (true) or Python’s own SMTP library (false).

server.program section

This section contains options specific to the case where emails are sent using a command line program. The required options are:

  • program - the name of the command line program to call.

  • subject_flag - what command line flag to use to pass the subject of the email.

  • from_addr_flag - what command line flag to use to pass the from address.

  • attachment_flag - what command line flag to use to pass a path to a file to attach.

  • body_arg - how to pass the body of the email. Currently the only acceptable value is “stdin”, meaning that the program accepts the body through piping (e.g. echo "This is the body" | mail) or input redirection (e.g. mail < body_file). If you intend to use a program that does not accept the message body in this way, QC plots will need to be updated.

Note that when using an external program, QC plots assumes that the program accepts the “to” email address(es) as its sole positional argument. If you wish to use an email program for which that is not true, QC plots will need to be updated.
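For the external-program route, the settings above map onto a command line roughly as follows. This is a sketch under stated assumptions: each attachment gets its own copy of attachment_flag, the recipient is placed last as the sole positional argument, and the body is piped on stdin. build_mail_command and send_with_program are hypothetical names. Flag names differ between mail implementations, which is why they are configurable.

```python
import subprocess


def build_mail_command(program_cfg, subject, from_addr, to_addr, attachments=()):
    """Assemble an argument list from the [server.program] settings."""
    cmd = [program_cfg["program"],
           program_cfg["subject_flag"], subject,
           program_cfg["from_addr_flag"], from_addr]
    for path in attachments:
        cmd += [program_cfg["attachment_flag"], str(path)]
    cmd.append(to_addr)  # recipient as the sole positional argument
    return cmd


def send_with_program(program_cfg, cmd, body):
    """Run the command with the body on stdin (body_arg = "stdin")."""
    if program_cfg["body_arg"] != "stdin":
        raise ValueError("only body_arg = 'stdin' is supported")
    subprocess.run(cmd, input=body, text=True, check=True)
```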

server.smtp section

This section contains options specific to the case where emails are sent using Python’s smtplib module. Note that this functionality has not been thoroughly tested, as it did not work with the SMTP server on tccondata.

  • smtp_address - which address to connect to in order to send the email. Common values are “localhost” (use an SMTP server on this computer), “smtp.gmail.com” (to send from a Gmail account), and “smtp.outlook.com” (to send from an Outlook account). Note that these last two may require an account for which insecure sending is permitted.

  • smtp_port - which port to connect to. A value of 0 will try to guess; the Gmail and Outlook servers both use 587.

  • password - password to use to connect to the sending account.

  • requires_auth - whether the sending account needs authentication (true or false). If true, then you will be prompted to enter your password interactively (so don’t use this in automated scripts). If false, the sending account does not need authentication to connect. If password is present, this option is ignored. Otherwise, true is the default.
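For the smtplib route, a minimal sketch using the standard library's email.message.EmailMessage and smtplib.SMTP might look like this. It assumes PDF attachments, uses STARTTLS when connecting on port 587, logs in with the from address as the user name, and mirrors the requires_auth behavior by prompting with getpass when no password is configured. build_message and send_via_smtp are hypothetical names; only build_message can run without a mail server.

```python
import getpass
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_message(from_addr, to_addr, subject, body, attachments=()):
    """Create the email, attaching each file as a PDF (assumed type)."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    for path in map(Path, attachments):
        msg.add_attachment(path.read_bytes(), maintype="application",
                           subtype="pdf", filename=path.name)
    return msg


def send_via_smtp(msg, smtp_address, smtp_port, password=None, requires_auth=True):
    """Send msg; prompt for a password if none is given and auth is required."""
    if password is None and requires_auth:
        password = getpass.getpass(f"Password for {msg['From']}: ")
    with smtplib.SMTP(smtp_address, smtp_port) as server:
        if smtp_port == 587:
            server.starttls()  # port 587 expects STARTTLS before login
        if password is not None:
            server.login(msg["From"], password)
        server.send_message(msg)
```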

Warning

If you put a login password in this configuration file, make sure that only trusted users can read it. On a Unix/Linux system, at the very least remove all access permissions for the “other” set of users; ideally, the file should be readable only by its owner.
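On Unix/Linux, running chmod 600 on the email configuration file leaves it readable and writable only by its owner. The same can be done from Python, as in this small sketch; restrict_to_owner is a hypothetical helper, and the path is whatever you named your email configuration file.

```python
import os
import stat


def restrict_to_owner(path):
    """Make a file readable and writable by its owner only (mode 0o600),
    equivalent to `chmod 600 path`."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```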

email section

This section controls the content of the email, as well as where it is sent.

  • from - the sending email address. If use_external_program is false, this account can be used to connect to a Gmail or Outlook server, in which case you will need to provide login authentication. If sending emails with an external (command line) program, you do not need to log in to this account; it is simply set as the sender.

  • to - the recipient email address. If sending to GGGBugs, this will be the same email that sends alerts about watched topics (that you can reply to to update the topic).

  • body - the main body of the email. There are three substrings that will be substituted with useful values, if present:

    • “{date}” will be replaced with the current date, time, and timezone when the email is sent.

    • “{basename}” will be replaced with the name of the netCDF file given as input to the plotting program, without leading directories.

    • “{plot_url}” will be replaced with a URL at which the plots can be accessed. Note that if the plotting program is called without the --plot-url command line argument (or plot_url=None in the driver Python function) then this value will be None. If this substring is not present in your email body, but the QC plotting program was told to provide a URL, then a short sentence giving the URL is appended automatically.

Note

The body is formatted using Python’s string formatter. This means that any curly braces in the body (other than those in the allowed substrings listed above) will be treated as format fields, which will probably cause a crash due to missing format arguments. Avoid putting curly braces in your body other than around the substrings mentioned above; if you must include a literal curly brace, write it doubled ({{ or }}) to protect it from formatting.

  • subject_from_site_id - a boolean indicating if the subject should be derived from the site ID, which is assumed to be the first two letters of the netCDF file name. If this is true, then the subject is determined from the email.sites section of this configuration file. If this is false, then the subject is set to the value of the subject setting in this section.

  • subject - the subject for the email; only used if subject_from_site_id is false.
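The body substitutions and brace escaping can be sketched with str.format. The sketch assumes the date is formatted in the local timezone as shown and uses made-up wording for the sentence appended when a URL is given without a {plot_url} placeholder; format_body is a hypothetical name.

```python
import os
from datetime import datetime


def format_body(template, nc_path, plot_url=None):
    """Fill the {date}, {basename} and {plot_url} fields of an email body.

    Literal braces must be written as {{ and }}; any other single-brace field
    raises KeyError, as str.format does for missing arguments.
    """
    body = template.format(
        date=datetime.now().astimezone().strftime("%Y-%m-%d %H:%M:%S %Z"),
        basename=os.path.basename(nc_path),
        plot_url=plot_url,
    )
    if plot_url is not None and "{plot_url}" not in template:
        # Hypothetical wording; QC plots appends its own short sentence.
        body += f" The plots are available at {plot_url}."
    return body
```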

email.sites section

In this section, each key (on the left side of the equals sign) is a site ID, and the value on the right is the topic number for that site in GGG Bugs. When sending an email with subject_from_site_id = true, the first two letters of the netCDF file name are compared against the keys in this section. If a match is found, the subject will be “[#N]”, where N is the number from this section. This is the format GGG Bugs’ Redmine software uses to match an incoming email to a topic.

Warning

If your netCDF file has a site ID not in this list when subject_from_site_id is true, you’ll get an error and the email won’t send.