Command Line Interface

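metaDMG has the following commands: config, config-gui, compute, dashboard, get-data, convert, filter, plot, PMD, and mismatch-to-mapDamage.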


Config

metaDMG config takes a single argument, samples, together with a number of additional options and flags. The samples argument refers to one or more alignment files (or a directory containing them) with the file extension .bam, .sam, or .sam.gz.

Parameters

Damage mode

  • --damage-mode: [lca|local|global]. lca is the recommended and default setting. With local, damage patterns are calculated separately for each chromosome/scaffold/contig. With global, a single global damage estimate is calculated. Note that the LCA parameters are ignored when using local or global.
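To make the difference between the three modes concrete, the toy sketch below only shows how reads could be grouped before a damage estimate is made; it is not metaDMG's damage model. The read records, the function name ct_fraction_by_group, and the use of "fraction of reads with a C→T mismatch at position 1" as a stand-in for damage are all hypothetical. Grouping by a single assigned taxon in lca mode is also a simplification.

```python
from collections import defaultdict

# Hypothetical toy records: (contig, assigned taxon, C->T mismatch at position 1?)
READS = [
    ("contig_1", "Homo sapiens", True),
    ("contig_1", "Homo sapiens", False),
    ("contig_2", "Homo sapiens", True),
    ("contig_3", "Bos taurus", False),
]


def ct_fraction_by_group(reads, damage_mode):
    """Fraction of reads with a C->T mismatch at position 1, per group.

    Assumed grouping per mode (illustration only):
      lca    -> one group per assigned taxon
      local  -> one group per contig
      global -> all reads pooled into one group
    """
    key_funcs = {
        "lca": lambda r: r[1],
        "local": lambda r: r[0],
        "global": lambda r: "global",
    }
    key = key_funcs[damage_mode]
    counts = defaultdict(lambda: [0, 0])  # group -> [n_with_CT, n_total]
    for read in reads:
        group = counts[key(read)]
        group[0] += read[2]  # True counts as 1, False as 0
        group[1] += 1
    return {g: n_ct / n for g, (n_ct, n) in counts.items()}


print(ct_fraction_by_group(READS, "local"))
print(ct_fraction_by_group(READS, "global"))
```

With these records, local yields one value per contig, while global collapses everything into a single value of 0.5.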

LCA

  • Options:

    • --names: Path to the (NCBI) names-mdmg.dmp. Mandatory for LCA.

    • --nodes: Path to the (NCBI) nodes-mdmg.dmp. Mandatory for LCA.

    • --acc2tax: Path to the (NCBI) acc2tax.gz. Mandatory for LCA.

    • --min-similarity-score: Normalised edit distance (read to reference similarity) minimum. Number between 0 and 1. Default: 0.95.

    • --max-similarity-score: Normalised edit distance (read to reference similarity) maximum. Number between 0 and 1. Default: 1.0.

    • --min-edit-dist: Minimum edit distance (read to reference similarity). Positive integer. Note that edit-distance scores cannot be set at the same time as similarity scores; choose one or the other.

    • --max-edit-dist: Maximum edit distance (read to reference similarity). Positive integer. Note that edit-distance scores cannot be set at the same time as similarity scores; choose one or the other.

    • --min-mapping-quality: Minimum mapping quality. Default: 0.

    • --lca-rank: The LCA rank used in ngsLCA. Can be either family, genus, species or "" (everything). Default is "".

  • Flags:

    • --custom-database: Use a custom database instead of the NCBI database. When the NCBI database is used, a few known bad taxa are corrected automatically. Default is False.
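The LCA options above combine two ideas: alignments are first filtered by read-to-reference similarity, and the read is then assigned to the lowest common ancestor of the taxa it still hits. The sketch below illustrates that idea on a tiny hypothetical taxonomy; it is not ngsLCA's implementation. It assumes similarity = 1 - edit distance / read length with inclusive bounds, uses made-up tax ids in PARENT (loosely mimicking the parent column of a nodes.dmp file), and ignores --lca-rank and --min-mapping-quality.

```python
# Hypothetical toy taxonomy: tax id -> parent tax id (the root points to itself).
PARENT = {1: 1, 2: 1, 10: 2, 11: 2, 100: 10, 101: 10}


def similarity(edit_distance, read_length):
    # Assumed definition of the normalised score: 1 - edit distance / read length.
    return 1.0 - edit_distance / read_length


def lineage(taxid):
    """Path from taxid up to the root, inclusive."""
    path = [taxid]
    while PARENT[taxid] != taxid:
        taxid = PARENT[taxid]
        path.append(taxid)
    return path


def lca(taxids):
    """Lowest common ancestor of a non-empty list of tax ids."""
    common = set(lineage(taxids[0]))
    for t in taxids[1:]:
        common &= set(lineage(t))
    # Walking up from any member, the first shared node is the deepest one.
    for node in lineage(taxids[0]):
        if node in common:
            return node


def assign_read(hits, read_length, min_sim=0.95, max_sim=1.0):
    """hits: list of (taxid, edit_distance) alignments for one read."""
    kept = [t for t, ed in hits if min_sim <= similarity(ed, read_length) <= max_sim]
    return lca(kept) if kept else None


hits = [(100, 0), (101, 1), (11, 5)]  # similarities 1.0, 0.98, 0.9 for a 50 bp read
print(assign_read(hits, 50))                # 11 is filtered out -> LCA of 100 and 101
print(assign_read(hits, 50, min_sim=0.85))  # all kept -> LCA moves up the tree
```

Lowering the minimum similarity lets more distant hits through, which pushes the assignment towards higher (less specific) taxonomic levels.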

General

  • Options:

    • --output-dir: Path where the generated output files and folders are stored. Default: ./data/.

    • --config-file: The name of the generated config file. Default: config.yaml.

    • --metaDMG-cpp: The command needed to run the metaDMG-cpp program.

    • --max-position: Maximum position in the sequence to include. Default is (+/-) 15 (forward/reverse).

    • --min-reads: Minimum number of reads required for inclusion in the fits (min_reads <= N_reads).

    • --parallel-samples: The number of samples to run in parallel. By default, samples are run in serial.

    • --cores-per-sample: Number of cores to use per sample. Do not change unless you know what you are doing.

    • --sample-prefix: Prefix for the sample names.

    • --sample-suffix: Suffix for the sample names.

    • --weight-type: Method for calculating weights. Default is 1. Do not change unless you know what you are doing.

  • Flags:

    • --forward-only: Only fit the forward strand.

    • --bayesian: Also fit a fully Bayesian model (probably more accurate, but roughly 100 times slower).

    • --long-name: Use the full (long) name for the sample.

    • --overwrite: Overwrite config file without user confirmation.
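As a rough illustration of how --max-position, --forward-only, and --min-reads could interact, the sketch below lists the positions that would enter a fit and applies the inclusive read threshold. The sign convention (1 to max_position for the forward direction, -1 to -max_position for the reverse direction) is an assumption based on the "(+/-) 15 (forward/reverse)" wording, and both function names are hypothetical.

```python
def positions_to_fit(max_position=15, forward_only=False):
    """Assumed convention: forward positions are 1..max_position,
    reverse positions are -1..-max_position."""
    forward = list(range(1, max_position + 1))
    if forward_only:
        return forward
    reverse = list(range(-1, -max_position - 1, -1))
    return forward + reverse


def passes_min_reads(n_reads, min_reads):
    # Inclusive threshold, as documented: min_reads <= N_reads.
    return min_reads <= n_reads


print(positions_to_fit(3))                     # forward and reverse positions
print(positions_to_fit(3, forward_only=True))  # forward positions only
print(passes_min_reads(n_reads=10, min_reads=10))
```

With the default of 15, this gives 30 positions, or 15 when only the forward strand is fitted.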

Examples

$ metaDMG config raw_data/alignment.sorted.bam \
    --names raw_data/names-mdmg.dmp \
    --nodes raw_data/nodes-mdmg.dmp \
    --acc2tax raw_data/acc2taxid.map.gz \
    --parallel-samples 4

metaDMG is flexible regarding its input argument and also accepts multiple alignment files:

$ metaDMG config raw_data/*.bam [...]

or even an entire directory containing alignment files (.bam, .sam, and .sam.gz):

$ metaDMG config raw_data/ [...]
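When a directory is given, only files with one of the three alignment extensions are relevant. The sketch below shows one way to select them; it is a hypothetical helper, not metaDMG's own code, and it assumes a non-recursive search. Note that Path.suffix on "x.sam.gz" returns only ".gz", so the check is done on the full file name instead.

```python
from pathlib import Path

ALIGNMENT_SUFFIXES = (".bam", ".sam", ".sam.gz")


def find_alignment_files(directory):
    """Return the alignment files directly inside `directory`, sorted by path."""
    return sorted(
        p
        for p in Path(directory).iterdir()
        # str.endswith accepts a tuple, and checking the whole name handles ".sam.gz".
        if p.is_file() and p.name.endswith(ALIGNMENT_SUFFIXES)
    )


if __name__ == "__main__":
    for path in find_alignment_files("raw_data"):
        print(path)
```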

To run metaDMG in non-LCA mode, an example could be:

$ metaDMG config raw_data/alignment.sorted.bam --damage-mode local --max-position 15 --bayesian

Config GUI

metaDMG config-gui is a simple graphical user interface (GUI) to help with creating the config file. The command itself does not take any parameters; everything is done by clicking and dragging. For more information about what the different buttons and sliders mean, see the config command above.

Examples

$ metaDMG config-gui

The GUI presented looks like this:

[Figure: screenshot of the metaDMG config GUI (config_gui.png)]

Mandatory fields that need to be filled are coloured red. Note that if you change the damage mode to LOCAL or GLOBAL, the bottom left square becomes disabled, since these parameters are only relevant for LCA.


Compute

The metaDMG compute command takes an optional config-file as argument (defaults to config.yaml if not specified).

Parameters

  • Flags:

    • --force: Force the computation, even if the output files already exist.

Examples

$ metaDMG compute
$ metaDMG compute non-default-config.yaml --force
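The --force flag only changes whether existing results are reused. The sketch below illustrates that skip-or-recompute decision under stated assumptions: the per-sample output file name "<sample>.results.parquet" and the function samples_to_compute are hypothetical and do not reflect metaDMG's actual file layout.

```python
from pathlib import Path


def samples_to_compute(samples, output_dir, force=False):
    """Return the samples that still need computing.

    Hypothetical rule: a sample is skipped if its (assumed) output file
    already exists, unless force=True, in which case everything is recomputed.
    """
    out = Path(output_dir)
    return [
        s for s in samples
        if force or not (out / f"{s}.results.parquet").exists()
    ]


print(samples_to_compute(["sampleA", "sampleB"], "./data"))
print(samples_to_compute(["sampleA", "sampleB"], "./data", force=True))
```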

Dashboard

A preview of the interactive dashboard is available online.

The metaDMG dashboard command takes an optional config-file as its first argument (defaults to config.yaml if not specified).

Parameters

  • Options:

    • --results: Path to the results directory.

    • --port: The port to be used for the dashboard. Default is 8050.

    • --host: The dashboard host address. Default is 0.0.0.0.

  • Flags:

    • --debug: Run the dashboard in debug mode to make debugging easier. For internal use.

    • --server: Use this flag when running the dashboard on a server.

Examples

$ metaDMG dashboard
$ metaDMG dashboard non-default-config.yaml --port 8050 --host 0.0.0.0

Get Data

The metaDMG get-data command fetches test data and saves it in the directory given by --output-dir. This is useful for, e.g., the online tutorial.

Parameters

  • Options:

    • --output-dir: Path to the output directory.

Examples

$ metaDMG get-data --output-dir raw_data

Convert

The metaDMG convert command takes an optional config-file as its first argument (defaults to config.yaml if not specified), which is used to infer the results directory.

Parameters

  • Options:

    • --results: Direct path to the results directory.

    • --output: Mandatory output path.

  • Flags:

    • --add-fit-predictions: Include fit predictions D(x) in the output.

Note that neither the config-file nor --results has to be specified (in which case the default config.yaml is used); however, they cannot both be set at the same time.

Examples

$ metaDMG convert --output ./directory/to/contain/results.csv
$ metaDMG convert non-default-config.yaml --output ./directory/to/contain/results.csv --add-fit-predictions
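The rule for locating the results (config-file or --results, never both, with config.yaml as the fallback) can be summarised in a few lines. The sketch below is a hypothetical restatement of that documented rule, not metaDMG's code; the function name choose_source and its return values are illustrative only.

```python
from pathlib import Path


def choose_source(config_file=None, results=None):
    """Decide where the results directory comes from (documented rule, sketched)."""
    if config_file is not None and results is not None:
        raise ValueError("Specify either a config file or --results, not both.")
    if results is not None:
        # --results points directly at the results directory.
        return ("results", Path(results))
    # Otherwise the results directory is inferred from the config file.
    return ("config", Path(config_file or "config.yaml"))


print(choose_source())
print(choose_source(results="data/results"))
```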

Filter

The metaDMG filter command takes an optional config-file as its first argument (defaults to config.yaml if not specified), which is used to infer the results directory.

Parameters

  • Options:

    • --results: Direct path to the results directory.

    • --output: Mandatory output path.

    • --query: The query string to use for filtering. Follows the pandas DataFrame.query() syntax. Default is "", which applies no filtering and is thus equivalent to the metaDMG convert command.

  • Flags:

    • --add-fit-predictions: Include fit predictions D(x) in the output.

Note that neither the config-file nor --results has to be specified (in which case the default config.yaml is used); however, they cannot both be set at the same time.

Examples

$ metaDMG filter --output convert-no-query.csv # similar to metaDMG convert
$ metaDMG filter --output convert-test.csv --query "N_reads > 5_000 & sample in ['subs', 'SPL_195_9299'] & tax_name == 'root'" --add-fit-predictions
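Because --query follows pandas DataFrame.query() syntax, a query string can be tried out on a small table before running it on real results. The DataFrame below is a made-up stand-in with only three of the result columns (sample, tax_name, N_reads); real result files contain many more columns. The apply_query helper is hypothetical: pandas raises an error for an empty expression, so "" is treated here as "no filtering", mirroring the documented default.

```python
import pandas as pd

# Hypothetical stand-in for a converted results table.
df = pd.DataFrame({
    "sample": ["subs", "SPL_195_9299", "other", "subs"],
    "tax_name": ["root", "root", "root", "Homo"],
    "N_reads": [12_000, 3_000, 9_000, 20_000],
})


def apply_query(df, query=""):
    # pandas rejects an empty expression, so "" means "keep everything".
    return df if query == "" else df.query(query)


filtered = apply_query(
    df, "N_reads > 5000 & sample in ['subs', 'SPL_195_9299'] & tax_name == 'root'"
)
print(filtered)
```

Only the first row satisfies all three conditions: the second has too few reads, the third is from another sample, and the fourth is not the root taxon.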

Plot

The metaDMG plot command takes an optional config-file as its first argument (defaults to config.yaml if not specified).

Parameters

  • Options:

    • --results: Direct path to the results directory.

    • --query: The query string to use for filtering. Follows the pandas DataFrame.query() syntax. Default is "", which applies no filtering.

    • --samples: A string of sample names separated by a comma and a space, e.g. "sampleA, another-sample", selecting the samples to use in the plots. Default is "", which applies no filtering.

    • --tax-ids: A string of tax IDs separated by a comma and a space, e.g. "1, 2, 42", selecting the tax IDs to use in the plots. Default is "", which applies no filtering.

    • --output: The path to the output pdf-file. Defaults to pdf_export.pdf.

Examples

$ metaDMG plot
$ metaDMG plot --query "100_000 <= N_reads & 8_000 <= phi" --tax-ids "1, 2, 42" --samples "sampleA, another-sample" --output name-of-plots.pdf
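The --samples and --tax-ids values are single strings that hold a comma-space separated list. The sketch below shows how such a string maps to a list of items; parse_list is a hypothetical helper (not metaDMG's parser). It strips surrounding whitespace leniently and returns None for "" to mean "no filtering".

```python
def parse_list(value):
    """Split a comma-space separated option value into a list of items.

    Returns None for the empty string, meaning "no filtering".
    """
    if value == "":
        return None
    return [item.strip() for item in value.split(",")]


samples = parse_list("sampleA, another-sample")
tax_ids = [int(t) for t in parse_list("1, 2, 42")]  # tax IDs as integers
print(samples, tax_ids)
```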

PMD

The metaDMG PMD command takes an alignment file as argument and computes the PMD score for each read in the file. The results are saved to a CSV file.

Examples

$ metaDMG PMD raw_data/alignment.sorted.bam --output PMDs.csv --metaDMG-cpp ./metaDMG-cpp

mismatch-to-mapDamage

The metaDMG mismatch-to-mapDamage command takes a mandatory mismatch-file as argument and converts it to the mapDamage format misincorporation.txt.

Parameters

  • Options:

    • --csv-out: Path to the output CSV file. Default is misincorporation.txt.

Examples

$ metaDMG mismatch-to-mapDamage data/mismatches/XXX.mismatches.parquet
$ metaDMG mismatch-to-mapDamage data/mismatches/XXX.mismatches.parquet --csv-out misincorporation.txt