Open
Description
I have a workflow which uses kmc to count all k-mers in an extremely large dataset of about 250,000 FASTA files. The workflow was originally built with kmc v3.2.1, but it stalled when I updated to v3.2.4. Unfortunately, it doesn't exit or report an error. Here are the details I can provide:
KMC call:
kmc -fm -ci0 -cx100000000000 -t94 -k75 -m745 @reference_list database databse_dir
Result:
The program spends some time printing * characters, then prints Stage 1: 0% and stalls. There are 511 bin files in the workdir. htop shows no CPU activity, but the kmc processes are still listed.
Before changing versions, I spent time trying to make sure that none of the fasta.gz files were corrupted:
- gzip -t was clean for all genomes
- py_fasta_validator did not indicate a problem with any of the fasta formatting
- I ran kmc on each genome individually, and it eventually returned a result for every one. It failed on a few genomes at first but passed when I reran them, possibly because I used xargs to run 94 at a time.
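For anyone who wants to repeat the integrity check, here is a rough Python sketch of what the gzip -t step amounts to. It is not part of my actual workflow: the function names gzip_is_intact and find_corrupt_inputs are made up for illustration, and it assumes the list file contains one input path per line.

```python
import gzip
import zlib
from pathlib import Path


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream at `path` decompresses cleanly."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to EOF so the trailing CRC32 and length fields are verified.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # Bad header or CRC (OSError), truncation (EOFError), bad deflate data (zlib.error).
        return False
    return True


def find_corrupt_inputs(list_file):
    """Return the paths named in `list_file` that are missing or fail the gzip check.

    Assumption: `list_file` holds one input path per line.
    """
    bad = []
    for line in Path(list_file).read_text().splitlines():
        name = line.strip()
        if not name:
            continue  # skip blank lines
        if not Path(name).is_file() or not gzip_is_intact(name):
            bad.append(name)
    return bad
```

Calling find_corrupt_inputs("reference_list") returns an empty list when every listed file exists and decompresses to the end. Like gzip -t, this only checks the compression layer, not the FASTA formatting.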