Snipe CLI Documentation

qc(ref, sample, samples_from_file, amplicon, roi, advanced, ychr, debug, output, vars)

Perform quality control (QC) on multiple samples against a reference genome.

This command calculates QC metrics for each provided sample, optionally including advanced metrics and ROI (return on investment) predictions. Results are aggregated and exported to a single TSV file.

Usage

snipe qc [OPTIONS]

Options

  • --ref PATH [required]
    Reference genome signature file.

  • --sample PATH
    Sample signature file. Can be provided multiple times.

  • --samples-from-file PATH
    File containing sample paths (one per line).

  • --amplicon PATH
    Amplicon signature file (optional).

  • --roi
    Calculate ROI for 1x, 2x, 5x, and 9x coverage folds.

  • --advanced
    Include advanced QC metrics.

  • --ychr PATH
    Y chromosome signature file (overrides the reference ychr).

  • --debug
    Enable debugging and detailed logging.

  • -o, --output PATH [required]
    Output TSV file for QC results.

  • --var PATH
    Variable signature file path. Can be used multiple times.
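
When `snipe qc` is driven from a script, the options above can be assembled into an argument list before being handed to something like `subprocess.run`. The following is a minimal sketch only: `build_qc_command` and its parameter names are hypothetical and not part of Snipe; it simply maps each documented option to its flag.

```python
from typing import List, Optional, Sequence


def build_qc_command(
    ref: str,
    output: str,
    samples: Sequence[str] = (),
    samples_from_file: Optional[str] = None,
    amplicon: Optional[str] = None,
    ychr: Optional[str] = None,
    var_sigs: Sequence[str] = (),
    roi: bool = False,
    advanced: bool = False,
    debug: bool = False,
) -> List[str]:
    """Return the argument list for a `snipe qc` call (hypothetical helper)."""
    # The command needs at least one source of samples.
    if not samples and not samples_from_file:
        raise ValueError("provide at least one sample or a samples file")

    cmd = ["snipe", "qc", "--ref", ref]
    for sample in samples:  # --sample can be provided multiple times
        cmd += ["--sample", sample]
    if samples_from_file:
        cmd += ["--samples-from-file", samples_from_file]
    if amplicon:
        cmd += ["--amplicon", amplicon]
    if ychr:
        cmd += ["--ychr", ychr]
    for var_sig in var_sigs:  # --var can be used multiple times
        cmd += ["--var", var_sig]
    # Boolean switches are appended only when enabled.
    for enabled, flag in ((roi, "--roi"), (advanced, "--advanced"), (debug, "--debug")):
        if enabled:
            cmd.append(flag)
    cmd += ["--output", output]
    return cmd
```

Building a list rather than a single string avoids shell-quoting problems with paths that contain spaces; the result can be passed directly to `subprocess.run(cmd, check=True)`.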

Examples

Performing QC on Multiple Samples

snipe qc --ref reference.sig --sample sample1.sig --sample sample2.sig -o qc_results.tsv

Performing QC with Samples Listed in a File

snipe qc --ref reference.sig --samples-from-file samples.txt -o qc_results.tsv

Contents of samples.txt:

sample1.sig
sample2.sig
sample3.sig

Performing QC with an Amplicon Signature

snipe qc --ref reference.sig --amplicon amplicon.sig --sample sample1.sig -o qc_results.tsv

Including Advanced QC Metrics and ROI Calculations

snipe qc --ref reference.sig --sample sample1.sig --advanced --roi -o qc_results.tsv

Using Multiple Variable Signatures

snipe qc --ref reference.sig --sample sample1.sig --var var1.sig --var var2.sig -o qc_results.tsv

Overriding the Y Chromosome Signature

snipe qc --ref reference.sig --sample sample1.sig --ychr custom_y.sig -o qc_results.tsv

Combining Multiple Options

snipe qc --ref reference.sig --sample sample1.sig --sample sample2.sig --amplicon amplicon.sig --var var1.sig --var var2.sig --advanced --roi -o qc_results.tsv

Detailed Use Cases

Use Case 1: Basic QC on Single Sample

Objective: Perform QC on a single sample against a reference genome without any advanced metrics or ROI.

Command:

snipe qc --ref reference.sig --sample sample1.sig -o qc_basic.tsv

Explanation:

  • --ref reference.sig: Specifies the reference genome signature file.
  • --sample sample1.sig: Specifies the sample signature file.
  • -o qc_basic.tsv: Specifies the output TSV file for QC results.

Expected Output:

A TSV file named qc_basic.tsv containing basic QC metrics for sample1.sig.
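
According to the source listing at the end of this page, the output file begins with a `# Command: ...` comment line before the tab-separated header row. A reader for that layout might look like the sketch below; `read_qc_tsv` is a hypothetical helper, and the column names are whatever the QC run produced.

```python
import csv
from typing import Dict, List


def read_qc_tsv(path: str) -> List[Dict[str, str]]:
    """Read a QC results TSV, skipping '#' comment lines (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as handle:
        # Drop comment lines such as "# Command: snipe qc ..." before parsing.
        data_lines = (line for line in handle if not line.startswith("#"))
        return list(csv.DictReader(data_lines, delimiter="\t"))
```

Each returned row is a dictionary keyed by column name, with all values left as strings; numeric metrics need to be converted by the caller.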

Use Case 2: QC on Multiple Samples with ROI

Objective: Perform QC on multiple samples and calculate ROI (return on investment) predictions for each.

Command:

snipe qc --ref reference.sig --sample sample1.sig --sample sample2.sig --roi -o qc_roi.tsv

Explanation:

  • --ref reference.sig: Reference genome signature file.
  • --sample sample1.sig & --sample sample2.sig: Multiple sample signature files.
  • --roi: Enables ROI calculations.
  • -o qc_roi.tsv: Output file for QC results.

Expected Output:

A TSV file named qc_roi.tsv containing QC metrics along with ROI predictions for sample1.sig and sample2.sig.

Use Case 3: Advanced QC with Amplicon and Variable Signatures

Objective: Perform advanced QC on a sample using an amplicon signature and multiple variable signatures.

Command:

snipe qc --ref reference.sig --amplicon amplicon.sig --sample sample1.sig --var var1.sig --var var2.sig --advanced -o qc_advanced.tsv

Explanation:

  • --ref reference.sig: Reference genome signature file.
  • --amplicon amplicon.sig: Amplicon signature file.
  • --sample sample1.sig: Sample signature file.
  • --var var1.sig & --var var2.sig: Variable signature files.
  • --advanced: Includes advanced QC metrics.
  • -o qc_advanced.tsv: Output file for QC results.

Expected Output:

A TSV file named qc_advanced.tsv containing comprehensive QC metrics, including advanced metrics and analyses based on the amplicon and variable signatures for sample1.sig.

Use Case 4: Overwriting Existing Output File

Objective: Perform QC and overwrite an existing output TSV file.

Command:

snipe qc --ref reference.sig --sample sample1.sig -o qc_results.tsv

Explanation:

  • If qc_results.tsv already exists, the command will fail to prevent accidental overwriting. To overwrite, use the --force flag. Note that --force is not among the options listed above, so it is only available if the qc command has been extended to include it.

Adjusted Command with --force (if implemented):

snipe qc --ref reference.sig --sample sample1.sig -o qc_results.tsv --force

Expected Output:

The existing qc_results.tsv file will be overwritten with the new QC results for sample1.sig.
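
Because `--force` is not a documented option, a calling script may prefer to guard the output path itself before invoking the command. This is a minimal sketch under that assumption; `check_output_path` and its `force` parameter are hypothetical and do not describe how Snipe validates its output argument.

```python
import os


def check_output_path(path: str, force: bool = False) -> str:
    """Return `path` if it is safe to write to, else raise (hypothetical helper)."""
    # Refuse to clobber an existing file unless the caller opts in explicitly.
    if os.path.exists(path) and not force:
        raise FileExistsError(f"{path} already exists; pass force=True to overwrite")
    return path
```

Calling `check_output_path("qc_results.tsv")` before running `snipe qc` makes the overwrite decision explicit on the caller's side, whatever the CLI itself does.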

Use Case 5: Using a Custom Y Chromosome Signature

Objective: Override the default Y chromosome signature with a custom one during QC.

Command:

snipe qc --ref reference.sig --sample sample1.sig --ychr custom_y.sig -o qc_custom_y.tsv

Explanation:

  • --ychr custom_y.sig: Specifies a custom Y chromosome signature file to override the default.

Expected Output:

A TSV file named qc_custom_y.tsv containing QC metrics for sample1.sig with analyses based on the custom Y chromosome signature.

Use Case 6: Reading Sample Paths from a File

Objective: Perform QC on multiple samples listed in a text file.

Command:

snipe qc --ref reference.sig --samples-from-file samples.txt -o qc_from_file.tsv

Explanation:

  • --samples-from-file samples.txt: Specifies a file containing sample paths, one per line.

Contents of samples.txt:

sample1.sig
sample2.sig
sample3.sig

Expected Output:

A TSV file named qc_from_file.tsv containing QC metrics for sample1.sig, sample2.sig, and sample3.sig.
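
The source listing at the end of this page merges paths given with --sample and --samples-from-file into one set, ignores blank lines, and skips paths that do not exist. The sketch below restates that logic as a standalone function for illustration; `collect_sample_paths` is a hypothetical name, and sorting the result is an addition here (the original iterates the set in arbitrary order).

```python
import os
from typing import Iterable, List, Optional, Tuple


def collect_sample_paths(
    cli_samples: Iterable[str],
    samples_file: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Return (valid absolute paths, missing paths) with duplicates removed."""
    candidates = set(cli_samples)
    if samples_file:
        with open(samples_file, encoding="utf-8") as handle:
            # One path per line; blank lines are ignored.
            candidates.update(line.strip() for line in handle if line.strip())

    valid: List[str] = []
    missing: List[str] = []
    for path in sorted(candidates):
        if os.path.exists(path):
            valid.append(os.path.abspath(path))
        else:
            missing.append(path)  # the CLI logs a warning and skips these
    return valid, missing
```

A path that appears both on the command line and in the file is processed once. Relative paths inside samples.txt are resolved against the current working directory, not against the location of the list file.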

Use Case 7: Combining Multiple Options for Comprehensive QC

Command:

snipe qc --ref reference.sig --sample sample1.sig --sample sample2.sig --amplicon amplicon.sig --var var1.sig --var var2.sig --advanced --roi -o qc_comprehensive.tsv

Explanation:

  • --ref reference.sig: Reference genome signature file.
  • --sample sample1.sig & --sample sample2.sig: Multiple sample signature files.
  • --amplicon amplicon.sig: Amplicon signature file.
  • --var var1.sig & --var var2.sig: Variable signature files.
  • --advanced: Includes advanced QC metrics.
  • --roi: Enables ROI calculations.
  • -o qc_comprehensive.tsv: Output file for QC results.

Expected Output:

A TSV file named qc_comprehensive.tsv containing comprehensive QC metrics, including advanced analyses, ROI predictions, and data from amplicon and variable signatures for both sample1.sig and sample2.sig.

Source code in src/snipe/cli/cli_qc.py
@click.command()
@click.option('--ref', type=click.Path(exists=True), required=True, help='Reference genome signature file (required).')
@click.option('--sample', type=click.Path(exists=True), callback=validate_sig_input, multiple=True, default = None, help='Sample signature file. Can be provided multiple times.')
@click.option('--samples-from-file', type=click.Path(exists=True), help='File containing sample paths (one per line).')
@click.option('--amplicon', type=click.Path(exists=True), help='Amplicon signature file (optional).')
@click.option('--roi', is_flag=True, default=False, help='Calculate ROI for 1,2,5,9 folds.')
@click.option('--advanced', is_flag=True, default=False, help='Include advanced QC metrics.')
@click.option('--ychr', type=click.Path(exists=True), help='Y chromosome signature file (overrides the reference ychr).')
@click.option('--debug', is_flag=True, default=False, help='Enable debugging and detailed logging.')
@click.option('-o', '--output', required=True, callback=validate_tsv_file, help='Output TSV file for QC results.')
@click.option('--var', 'vars', multiple=True, type=click.Path(exists=True), help='Variable signature file path. Can be used multiple times.')
def qc(ref: str, sample: List[str], samples_from_file: Optional[str],
       amplicon: Optional[str], roi: bool, advanced: bool, 
       ychr: Optional[str], debug: bool, output: str, vars: List[str]):
    """
        Perform quality control (QC) on multiple samples against a reference genome.

        This command calculates QC metrics for each provided sample, optionally including advanced metrics and ROI (return on investment) predictions. Results are aggregated and exported to a single TSV file.

        ## Usage

        ```bash
        snipe qc [OPTIONS]
        ```

        ## Options

        - `--ref PATH` **[required]**  
        Reference genome signature file.

        - `--sample PATH`  
        Sample signature file. Can be provided multiple times.

        - `--samples-from-file PATH`  
        File containing sample paths (one per line).

        - `--amplicon PATH`  
        Amplicon signature file (optional).

        - `--roi`  
        Calculate ROI for 1x, 2x, 5x, and 9x coverage folds.

        - `--advanced`  
        Include advanced QC metrics.

        - `--ychr PATH`  
        Y chromosome signature file (overrides the reference ychr).

        - `--debug`  
        Enable debugging and detailed logging.

        - `-o`, `--output PATH` **[required]**  
        Output TSV file for QC results.

        - `--var PATH`  
        Variable signature file path. Can be used multiple times.

        ## Examples

        ### Performing QC on Multiple Samples

        ```bash
        snipe qc --ref reference.sig --sample sample1.sig --sample sample2.sig -o qc_results.tsv
        ```

        ### Performing QC with Samples Listed in a File

        ```bash
        snipe qc --ref reference.sig --samples-from-file samples.txt -o qc_results.tsv
        ```

        *Contents of `samples.txt`:*

        ```
        sample1.sig
        sample2.sig
        sample3.sig
        ```

        ### Performing QC with an Amplicon Signature

        ```bash
        snipe qc --ref reference.sig --amplicon amplicon.sig --sample sample1.sig -o qc_results.tsv
        ```

        ### Including Advanced QC Metrics and ROI Calculations

        ```bash
        snipe qc --ref reference.sig --sample sample1.sig --advanced --roi -o qc_results.tsv
        ```

        ### Using Multiple Variable Signatures

        ```bash
        snipe qc --ref reference.sig --sample sample1.sig --var var1.sig --var var2.sig -o qc_results.tsv
        ```

        ### Overriding the Y Chromosome Signature

        ```bash
        snipe qc --ref reference.sig --sample sample1.sig --ychr custom_y.sig -o qc_results.tsv
        ```

        ### Combining Multiple Options

        ```bash
        snipe qc --ref reference.sig --sample sample1.sig --sample sample2.sig --amplicon amplicon.sig --var var1.sig --var var2.sig --advanced --roi -o qc_results.tsv
        ```

        ## Detailed Use Cases

        ### Use Case 1: Basic QC on Single Sample

        **Objective:** Perform QC on a single sample against a reference genome without any advanced metrics or ROI.

        **Command:**

        ```bash
        snipe qc --ref reference.sig --sample sample1.sig -o qc_basic.tsv
        ```

        **Explanation:**

        - `--ref reference.sig`: Specifies the reference genome signature file.
        - `--sample sample1.sig`: Specifies the sample signature file.
        - `-o qc_basic.tsv`: Specifies the output TSV file for QC results.

        **Expected Output:**

        A TSV file named `qc_basic.tsv` containing basic QC metrics for `sample1.sig`.

        ### Use Case 2: QC on Multiple Samples with ROI

        **Objective:** Perform QC on multiple samples and calculate ROI (return on investment) predictions for each.

        **Command:**

        ```bash
        snipe qc --ref reference.sig --sample sample1.sig --sample sample2.sig --roi -o qc_roi.tsv
        ```

        **Explanation:**

        - `--ref reference.sig`: Reference genome signature file.
        - `--sample sample1.sig` & `--sample sample2.sig`: Multiple sample signature files.
        - `--roi`: Enables ROI calculations.
        - `-o qc_roi.tsv`: Output file for QC results.

        **Expected Output:**

        A TSV file named `qc_roi.tsv` containing QC metrics along with ROI predictions for `sample1.sig` and `sample2.sig`.

        ### Use Case 3: Advanced QC with Amplicon and Variable Signatures

        **Objective:** Perform advanced QC on a sample using an amplicon signature and multiple variable signatures.

        **Command:**

        ```bash
        snipe qc --ref reference.sig --amplicon amplicon.sig --sample sample1.sig --var var1.sig --var var2.sig --advanced -o qc_advanced.tsv
        ```

        **Explanation:**

        - `--ref reference.sig`: Reference genome signature file.
        - `--amplicon amplicon.sig`: Amplicon signature file.
        - `--sample sample1.sig`: Sample signature file.
        - `--var var1.sig` & `--var var2.sig`: Variable signature files.
        - `--advanced`: Includes advanced QC metrics.
        - `-o qc_advanced.tsv`: Output file for QC results.

        **Expected Output:**

        A TSV file named `qc_advanced.tsv` containing comprehensive QC metrics, including advanced metrics and analyses based on the amplicon and variable signatures for `sample1.sig`.

        ### Use Case 4: Overwriting Existing Output File

        **Objective:** Perform QC and overwrite an existing output TSV file.

        **Command:**

        ```bash
        snipe qc --ref reference.sig --sample sample1.sig -o qc_results.tsv
        ```

        **Explanation:**

        - If `qc_results.tsv` already exists, the command will **fail** to prevent accidental overwriting. To overwrite, use the `--force` flag. Note that `--force` is not among the options listed above, so it is only available if the `qc` command has been extended to include it.

        **Adjusted Command with `--force` (if implemented):**

        ```bash
        snipe qc --ref reference.sig --sample sample1.sig -o qc_results.tsv --force
        ```

        **Expected Output:**

        The existing `qc_results.tsv` file will be overwritten with the new QC results for `sample1.sig`.

        ### Use Case 5: Using a Custom Y Chromosome Signature

        **Objective:** Override the default Y chromosome signature with a custom one during QC.

        **Command:**

        ```bash
        snipe qc --ref reference.sig --sample sample1.sig --ychr custom_y.sig -o qc_custom_y.tsv
        ```

        **Explanation:**

        - `--ychr custom_y.sig`: Specifies a custom Y chromosome signature file to override the default.

        **Expected Output:**

        A TSV file named `qc_custom_y.tsv` containing QC metrics for `sample1.sig` with analyses based on the custom Y chromosome signature.

        ### Use Case 6: Reading Sample Paths from a File

        **Objective:** Perform QC on multiple samples listed in a text file.

        **Command:**

        ```bash
        snipe qc --ref reference.sig --samples-from-file samples.txt -o qc_from_file.tsv
        ```

        **Explanation:**

        - `--samples-from-file samples.txt`: Specifies a file containing sample paths, one per line.

        **Contents of `samples.txt`:**

        ```
        sample1.sig
        sample2.sig
        sample3.sig
        ```

        **Expected Output:**

        A TSV file named `qc_from_file.tsv` containing QC metrics for `sample1.sig`, `sample2.sig`, and `sample3.sig`.

        ### Use Case 7: Combining Multiple Options for Comprehensive QC

        **Command:**

        ```bash
        snipe qc --ref reference.sig --sample sample1.sig --sample sample2.sig --amplicon amplicon.sig --var var1.sig --var var2.sig --advanced --roi -o qc_comprehensive.tsv
        ```

        **Explanation:**

        - `--ref reference.sig`: Reference genome signature file.
        - `--sample sample1.sig` & `--sample sample2.sig`: Multiple sample signature files.
        - `--amplicon amplicon.sig`: Amplicon signature file.
        - `--var var1.sig` & `--var var2.sig`: Variable signature files.
        - `--advanced`: Includes advanced QC metrics.
        - `--roi`: Enables ROI calculations.
        - `-o qc_comprehensive.tsv`: Output file for QC results.

        **Expected Output:**

        A TSV file named `qc_comprehensive.tsv` containing comprehensive QC metrics, including advanced analyses, ROI predictions, and data from amplicon and variable signatures for both `sample1.sig` and `sample2.sig`.
    """

    start_time = time.time()

    # Configure logging
    logger = logging.getLogger('snipe_qc')
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(logging.DEBUG if debug else logging.INFO)
    formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
    handler.setFormatter(formatter)
    if not logger.hasHandlers():
        logger.addHandler(handler)

    logger.info("Starting QC process.")

    # Collect sample paths from --sample and --samples-from-file
    samples_set: Set[str] = set()
    if sample:
        for _sample in sample:
            logger.debug(f"Adding sample from command-line: {_sample}")
            samples_set.add(_sample)


    if samples_from_file:
        logger.debug(f"Reading samples from file: {samples_from_file}")
        try:
            with open(samples_from_file, 'r', encoding='utf-8') as f:
                file_samples = {line.strip() for line in f if line.strip()}
            samples_set.update(file_samples)
            logger.debug(f"Collected {len(file_samples)} samples from file.")
        except Exception as e:
            logger.error(f"Failed to read samples from file {samples_from_file}: {e}")
            sys.exit(1)

    # Deduplicate and validate sample paths
    valid_samples = []
    for sample_path in samples_set:
        if os.path.exists(sample_path):
            valid_samples.append(os.path.abspath(sample_path))
        else:
            logger.warning(f"Sample file does not exist and will be skipped: {sample_path}")

    if not valid_samples:
        logger.error("No valid samples provided for QC.")
        sys.exit(1)

    logger.info(f"Total valid samples to process: {len(valid_samples)}")

    # Load reference signature
    logger.info(f"Loading reference signature from: {ref}")
    try:
        reference_sig = SnipeSig(sourmash_sig=ref, sig_type=SigType.GENOME, enable_logging=debug)
        logger.debug(f"Loaded reference signature: {reference_sig.name}")
    except Exception as e:
        logger.error(f"Failed to load reference signature from {ref}: {e}")
        sys.exit(1)

    # Load amplicon signature if provided
    amplicon_sig = None
    if amplicon:
        logger.info(f"Loading amplicon signature from: {amplicon}")
        try:
            amplicon_sig = SnipeSig(sourmash_sig=amplicon, sig_type=SigType.AMPLICON, enable_logging=debug)
            logger.debug(f"Loaded amplicon signature: {amplicon_sig.name}")
        except Exception as e:
            logger.error(f"Failed to load amplicon signature from {amplicon}: {e}")
            sys.exit(1)

    # Load Y chromosome signature if provided
    ychr_sig = None
    if ychr:
        logger.info(f"Loading Y chromosome signature from: {ychr}")
        try:
            ychr_sig = SnipeSig(sourmash_sig=ychr, sig_type=SigType.GENOME, enable_logging=debug)
            logger.debug(f"Loaded Y chromosome signature: {ychr_sig.name}")
        except Exception as e:
            logger.error(f"Failed to load Y chromosome signature from {ychr}: {e}")
            sys.exit(1)

    # Prepare variable signatures if provided
    vars_paths = []
    vars_snipesigs = []
    if vars:
        logger.info(f"Loading {len(vars)} variable signature(s).")
        for path in vars:
            if not os.path.exists(path):
                logger.error(f"Variable signature file does not exist: {path}")
                sys.exit(1)
            vars_paths.append(os.path.abspath(path))
            try:
                var_sig = SnipeSig(sourmash_sig=path, sig_type=SigType.AMPLICON, enable_logging=debug)
                vars_snipesigs.append(var_sig)
                logger.debug(f"Loaded variable signature: {var_sig.name}")
            except Exception as e:
                logger.error(f"Failed to load variable signature from {path}: {e}")

        logger.debug(f"Variable signature paths: {vars_paths}")


    predict_extra_folds = [1, 2, 5, 9]


    qc_instance = MultiSigReferenceQC(
            reference_sig=reference_sig,
            amplicon_sig=amplicon_sig,
            ychr=ychr_sig if ychr_sig else None,
            varsigs=vars_snipesigs if vars_snipesigs else None,
            enable_logging=debug
        )

    sample_to_stats = {}
    failed_samples = []
    for sample_path in tqdm(valid_samples):
        sample_sig = SnipeSig(sourmash_sig=sample_path, sig_type=SigType.SAMPLE, enable_logging=debug)
        try:
            sample_stats = qc_instance.process_sample(sample_sig=sample_sig,
                          predict_extra_folds = predict_extra_folds if roi else None,
                          advanced=advanced)
            sample_to_stats[sample_sig.name] = sample_stats
        except Exception as e:
            failed_samples.append(sample_sig.name)
            qc_instance.logger.error(f"Failed to process sample {sample_sig.name}: {e}")
            continue


    # Separate successful and failed results
    succeeded = list(sample_to_stats.keys())
    failed = len(failed_samples)

    # Handle complete failure
    if len(succeeded) == 0:
        logger.error("All samples failed during QC processing. Output TSV will not be generated.")
        sys.exit(1)

    # write total success and failure
    logger.info("Successfully processed samples: %d", len(succeeded))

    # Prepare the command-line invocation for comments
    command_invocation = ' '.join(sys.argv)

    # Create pandas DataFrame for succeeded samples
    df = pd.DataFrame(sample_to_stats.values())

    # Reorder columns to have 'sample' and 'file_path' first, if they exist
    cols = list(df.columns)
    reordered_cols = []
    for col in ['sample', 'file_path']:
        if col in cols:
            reordered_cols.append(col)
            cols.remove(col)
    reordered_cols += cols
    df = df[reordered_cols]

    # Export to TSV with comments
    try:
        with open(output, 'w', encoding='utf-8') as f:
            # Write comment with command invocation
            f.write(f"# Command: {command_invocation}\n")
            # Write the DataFrame to the file
            df.to_csv(f, sep='\t', index=False)
        logger.info(f"QC results successfully exported to {output}")
    except Exception as e:
        logger.error(f"Failed to export QC results to {output}: {e}")
        sys.exit(1)

    # Report failed samples if any (failed_samples holds the names collected in the loop above)
    if failed_samples:
        logger.warning(f"The following {len(failed_samples)} sample(s) failed during QC processing:")
        for failed_sample in failed_samples:
            logger.warning(f"- {failed_sample}")

    end_time = time.time()
    elapsed_time = end_time - start_time
    logger.info(f"QC process completed in {elapsed_time:.2f} seconds.")