Workflow Parameters

The workflow parameters should be included in a configuration file, an example of which can be found at https://raw.githubusercontent.com/mriffle/nf-openmod-dda/main/resources/pipeline.config

The parameters in this file should be changed to indicate the locations of your data, the options you’d like to use for the software included in the workflow, and the capabilities and configuration for the system on which you are running the workflow steps.

The configuration file is roughly organized as:

params {
...
}

profiles {
...
}

mail {
...
}

The params section includes locations of data and configuration options for a specific run of the workflow.
The profiles section includes parameters that describe the capabilities of the systems that run the steps of the workflow. For example, if you are running on your local system, this includes how many cores and how much RAM the steps of the workflow may use. These settings do not need to be changed for each run of the workflow.
The mail section includes configuration options for sending email. This is optional and only necessary if you wish to send emails when the workflow completes. This will not need to be changed for each run of the workflow.

Below is a complete description of all parameters that may be included in these sections.

Note

This workflow can process files stored in PanoramaWeb. When specifying directories or file locations, any path that begins with https:// is interpreted as a PanoramaWeb location.

For example, to process raw files stored in PanoramaWeb, you would have the following in your pipeline.config file:

spectra_dir = 'https://panoramaweb.org/_webdav/path/to/@files/RawFiles/'

Here, https://panoramaweb.org/_webdav/path/to/@files/RawFiles/ is the WebDAV URL of the folder on the Panorama server.
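PanoramaWeb and local locations can be mixed in the same params section, because only values beginning with https:// are treated as Panorama locations. The sketch below reuses the placeholder paths from this page and is an illustration, not a required layout. It reads raw files from Panorama while taking the FASTA and Magnum configuration from local disk.

```groovy
params {
    // Begins with https://, so it is treated as a PanoramaWeb WebDAV folder
    spectra_dir = 'https://panoramaweb.org/_webdav/path/to/@files/RawFiles/'

    // Ordinary file paths are read from the local file system
    fasta       = '/data/mass_spec/my.fasta'
    magnum_conf = '/data/mass_spec/Magnum.conf'
}
```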

The `params` Section

Parameters for the `params` section
Req?	Parameter Name	Description
✓	`spectra_dir`	The path to the location of the raw or mzML files to be processed. This can be a directory location (e.g., `/data/mass_spec/my_raw_files/`) or a Panorama WebDAV URL (described above).
✓	`fasta`	The path to the location of the FASTA file to be used in the Magnum search. This can be a file location (e.g., `/data/mass_spec/my.fasta`) or a Panorama WebDAV URL (described above).
	`generate_decoys`	If `true`, the workflow will generate decoys using yarp. If `false`, decoys must already be present in the FASTA file, and the decoy prefix must be specified in `Magnum.conf`. Default: `false`.
	`magnum_conf`	The path to the location of the Magnum configuration file to be used in the Magnum search. This can be a file location (e.g., `/data/mass_spec/Magnum.conf`) or a Panorama WebDAV URL (described above). Default: `'Magnum.conf'`.
	`process_separately`	Set to `true` to run Percolator and the Limelight upload separately for each input file. If `false`, results are combined before running Percolator and uploading to Limelight. Default: `false`. Note: combining output for Percolator may yield better statistics, but it makes it harder to compare the results from individual raw files with other searches that were not part of that Percolator run.
	`percolator_pin_columns_to_remove`	Optional list of PIN header names to remove before Percolator runs. Provide as a list (e.g., `['delta_score', 'mod_mass']`) or as a comma-delimited string. Leave empty to disable filtering. Default: `[]`.
	`limelight_upload`	Set to `true` to upload to Limelight. If set to `true`, the following Limelight-related parameters apply. Default: `false`.
	`limelight_project_id`	This is required if `limelight_upload` is set to `true`. This is the Limelight project ID to which to upload data.
	`limelight_webapp_url`	This is required if `limelight_upload` is set to `true`. This is the URL of the Limelight instance to which to upload data. E.g., `'https://limelight.yeastrc.org/limelight'`.
	`limelight_search_description`	Optional if `limelight_upload` is set to `true`. This is a one-line description of the search that will appear in Limelight. If `process_separately` is set to `true`, the base name of the raw/mzML file will be appended to this description. If not provided, the workflow will upload with `--no-search-description`.
	`limelight_search_short_name`	Optional if `limelight_upload` is set to `true`. This is a very brief one-word nickname for this search. Used in plots to label data. This is ignored if `process_separately` is set to `true`. If not provided, no short-label argument is sent.
	`limelight_tags`	Optional comma-delimited list of Limelight tags to use for this search (e.g., `'yeast,control,2023'`). Any tags that have not yet been created in Limelight will be created. Note: you can also assign tags to categories, and tags in the same category are grouped together in Limelight. For example, one could have a tag category called `treatment` containing the tags `control` and `irradiated`. To specify a tag category, give the category name, then a tilde (~), then the tag name, e.g., `treatment~control,organism~yeast,year~2023`. Default: no tags are sent.
	`email`	The email address to which a notification should be sent upon workflow completion. If no email is specified, no email will be sent. To send email, you must configure mail server settings (see below).
	`result_dir`	Directory where workflow results are published. Default: `'results/nf-openmod-dda'`.
	`report_dir`	Directory where Nextflow execution reports (timeline, report, trace) are written. Default: `'reports/nf-openmod-dda'`.
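The parameters in the table above can be combined into a single params section. The following is a sketch of one possible configuration, not a recommended set of values. The project ID, search description, short name, and email address are hypothetical placeholders. The Limelight URL, tag syntax, and PIN column names are the examples given in the table.

```groovy
params {
    // Required inputs
    spectra_dir = '/data/mass_spec/my_raw_files/'
    fasta       = '/data/mass_spec/my.fasta'

    // Let the workflow add decoys with yarp instead of supplying them in the FASTA
    generate_decoys = true
    magnum_conf     = '/data/mass_spec/Magnum.conf'

    // Combine all input files into one Percolator run (the default behavior)
    process_separately = false

    // Drop these PIN columns before Percolator runs (example names from the table)
    percolator_pin_columns_to_remove = ['delta_score', 'mod_mass']

    // Limelight upload settings
    limelight_upload             = true
    limelight_project_id         = 123   // hypothetical project ID
    limelight_webapp_url         = 'https://limelight.yeastrc.org/limelight'
    limelight_search_description = 'Open modification search of yeast control samples'   // hypothetical
    limelight_search_short_name  = 'yeast_ctrl'   // hypothetical one-word label
    limelight_tags               = 'treatment~control,organism~yeast,year~2023'

    // Completion notice (requires the mail section to be configured)
    email = 'you@example.org'   // hypothetical address
}
```

Because `process_separately` is `false` here, `limelight_search_short_name` is used. With `process_separately = true` it would be ignored, and each file's base name would be appended to the description.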

The `profiles` Section

The example configuration file includes this profiles section:

profiles {

    // "standard" is the profile used when the steps of the workflow are run
    // locally on your computer. These parameters should be changed to match
    // your system resources (that you are willing to devote to running
    // workflow jobs).
    standard {
        // cap per-task resource requests to what this machine provides
        process.resourceLimits = [ cpus: 8, memory: 16.GB, time: 240.h ]

        params.mzml_cache_directory = '/data/mass_spec/nextflow/nf-openmod-dda/mzml_cache'
        params.panorama_cache_directory = '/data/mass_spec/nextflow/panorama/raw_cache'
    }
}

These parameters describe the capabilities of your local computer for running the steps of the workflow. Below is a description of each parameter:

Parameters for the `profiles/standard` section
Req?	Parameter Name	Description
✓	`process.resourceLimits`	A map capping the CPUs, memory, and time any single workflow step may request, e.g. `[ cpus: 8, memory: 16.GB, time: 240.h ]`. Set this to match the resources of the machine (or queue) running the workflow; per-process requests are clamped to these ceilings.
✓	`params.mzml_cache_directory`	When `msconvert` converts a RAW file to mzML, the mzML file is cached for future use. This specifies the directory in which the cached mzML files are stored.
✓	`params.panorama_cache_directory`	If the RAW files to be processed are in PanoramaWeb, the RAW files will be downloaded to and cached in this directory for future use.

Note

The example above shows the standard profile, which is used to run the workflow locally. The workflow also provides slurm and aws profiles, selected with -profile slurm or -profile aws. Each profile defines its own process.resourceLimits and cache directories (the aws profile uses s3:// cache locations). Override these in your config to match your cluster or AWS Batch compute environment. See How to Setup and Configure AWS Batch for details on running on AWS Batch.
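As a sketch of such an override, the fragment below adds slurm and aws entries to the profiles section of your config. It assumes, as the note above describes, that values set here take precedence over those shipped with the workflow. The CPU, memory, and time limits, the shared file system paths, and the bucket name my-bucket are all hypothetical and should be replaced with values for your cluster or AWS account.

```groovy
profiles {
    slurm {
        // Hypothetical ceilings for the largest node in the cluster queue
        process.resourceLimits = [ cpus: 32, memory: 128.GB, time: 72.h ]

        // Hypothetical shared file system paths visible to all compute nodes
        params.mzml_cache_directory     = '/shared/nextflow/nf-openmod-dda/mzml_cache'
        params.panorama_cache_directory = '/shared/nextflow/panorama/raw_cache'
    }

    aws {
        // Hypothetical ceilings matching the AWS Batch compute environment
        process.resourceLimits = [ cpus: 16, memory: 64.GB, time: 48.h ]

        // The aws profile keeps its caches in S3; "my-bucket" is a placeholder
        params.mzml_cache_directory     = 's3://my-bucket/nf-openmod-dda/mzml_cache'
        params.panorama_cache_directory = 's3://my-bucket/panorama/raw_cache'
    }
}
```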

Warning

These caches are keyed only by output file name, not by content or conversion settings. If you change msconvert options, or reuse a file name for different input data, the workflow will reuse the previously cached file rather than regenerating it. Clear the relevant cache directory (mzml_cache_directory / panorama_cache_directory) when you change conversion settings or reuse file names.
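Instead of deleting a cache, you could also point the cache parameter at a new, empty directory in your config. With no cached file to find, the workflow should regenerate the mzML files, and the old cache remains untouched. This is a sketch of that alternative. The directory name mzml_cache_new_settings is hypothetical, and the other values repeat the standard profile shown earlier.

```groovy
profiles {
    standard {
        process.resourceLimits = [ cpus: 8, memory: 16.GB, time: 240.h ]

        // Hypothetical fresh directory used after changing msconvert options,
        // so previously converted mzML files are not picked up by name
        params.mzml_cache_directory     = '/data/mass_spec/nextflow/nf-openmod-dda/mzml_cache_new_settings'
        params.panorama_cache_directory = '/data/mass_spec/nextflow/panorama/raw_cache'
    }
}
```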

The `mail` Section

This is a more advanced and entirely optional set of parameters. When the workflow completes, it can send an email to the address given by the `email` parameter in the params section. For this to work, the following parameters must be changed to match the settings of your email server. You may need to contact your IT department to obtain the appropriate settings.

The example configuration file includes this mail section:

mail {
    from = 'address@host.com'
    smtp.host = 'smtp.host.com'
    smtp.port = 587
    smtp.user = 'smtp_user'
    smtp.password = 'smtp_password'
    smtp.auth = true
    smtp.starttls.enable = true
    smtp.starttls.required = false
    mail.smtp.ssl.protocols = 'TLSv1.2'
}

Below is a description of each parameter:

Parameters for the `mail` section
Req?	Parameter Name	Description
✓	`from`	The email address from which the email should be sent.
✓	`smtp.host`	The internet address (host name or IP address) of the SMTP server.
✓	`smtp.port`	The port on the host to connect to. This is most likely `587`.
	`smtp.user`	If authentication is required, this is the username.
	`smtp.password`	If authentication is required, this is the password.
✓	`smtp.auth`	Whether authentication is required (`true` or `false`).
✓	`smtp.starttls.enable`	Whether or not to enable TLS support.
✓	`smtp.starttls.required`	Whether or not TLS is required.
✓	`mail.smtp.ssl.protocols`	The SSL/TLS protocol version to use for sending SMTP messages (e.g., `'TLSv1.2'`).
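When the mail server does not require a login, `smtp.auth` can be set to `false` and the optional `smtp.user` and `smtp.password` entries left out. The sketch below shows this alongside the `email` parameter that enables the notification. The host name, port 25, and both addresses are hypothetical stand-ins for an internal relay. Your IT department's settings take precedence.

```groovy
params {
    // Add to your existing params section: where the completion notice goes
    email = 'you@example.org'   // hypothetical recipient
}

mail {
    from = 'nextflow@example.org'          // hypothetical sender address
    smtp.host = 'smtp.example.org'         // hypothetical internal relay
    smtp.port = 25                         // assumed relay port; many servers use 587
    smtp.auth = false                      // no login, so smtp.user/smtp.password are omitted
    smtp.starttls.enable = true            // use TLS if the server offers it
    smtp.starttls.required = false         // but still send if it does not
    mail.smtp.ssl.protocols = 'TLSv1.2'
}
```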
