From 1 January 2025, registration will be permanently required to access data curated after 31 December 2024. Access to this recent data via the application programming interface (API) will also require authentication (API authentication help link).
Please contact us if you have any questions.
Quality criteria for accepting whole genome sequencing data
Users are requested to submit only high-quality assemblies, generated from pure cultures and sequenced at a minimum coverage of 40X. Assembly files that contain a high number of contigs, or whose cumulative contig length falls outside the typical range of the Corynebacterium diphtheriae species complex (approximately 2.2 to 2.9 Mbp), will not be accepted.
NB. New alleles and profiles will not be defined from long-read sequencing data alone, nor from Ion Torrent/Roche/454 data, to avoid introducing artifactual sequences into the database through low-accuracy sequencing. For typing purposes, we recommend using only assemblies generated either from high-quality short reads or from a combination of short and long reads (hybrid assemblies).
Please refer to the assembly metrics below:
Species | Genome size (bp) | Number of contigs | G+C content (%) | Coverage (X) |
---|---|---|---|---|
C. diphtheriae | 2300000 - 2600000 | 20 - 90 | 53 - 54 | >= 40 |
C. belfantii | 2500000 - 2900000 | 150 - 220 | 53 - 54 | >= 40 |
C. rouxii | 2200000 - 2500000 | 20 - 55 | 53 - 54 | >= 40 |
C. ulcerans & C. ramonii | 2400000 - 2800000 | 5 - 60 | 53 - 54 | >= 40 |
C. pseudotuberculosis | 2200000 - 2300000 | 10 - 30 | 53 - 54 | >= 40 |
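
Before submitting, an assembly can be screened locally against the metrics in the table. The following is a minimal Python sketch, not the database's own validation procedure. The names `THRESHOLDS`, `parse_fasta`, `assembly_metrics` and `check_assembly` are hypothetical. Coverage cannot be derived from a FASTA file, so it is supplied by the caller. The sketch also assumes that only a contig count above the upper bound is a problem, since the text above refers to "high numbers of contigs".

```python
from typing import Dict, List, NamedTuple, Tuple


class Thresholds(NamedTuple):
    size: Tuple[int, int]        # genome size range in bp
    max_contigs: int             # upper bound of the contig range
    gc: Tuple[float, float]      # G+C content range in percent
    min_coverage: float = 40.0   # minimum sequencing depth (X)


# Values copied from the table above; keys are the table's species labels.
THRESHOLDS: Dict[str, Thresholds] = {
    "C. diphtheriae": Thresholds((2_300_000, 2_600_000), 90, (53.0, 54.0)),
    "C. belfantii": Thresholds((2_500_000, 2_900_000), 220, (53.0, 54.0)),
    "C. rouxii": Thresholds((2_200_000, 2_500_000), 55, (53.0, 54.0)),
    "C. ulcerans & C. ramonii": Thresholds((2_400_000, 2_800_000), 60, (53.0, 54.0)),
    "C. pseudotuberculosis": Thresholds((2_200_000, 2_300_000), 30, (53.0, 54.0)),
}


def parse_fasta(text: str) -> List[str]:
    """Return one upper-cased sequence string per FASTA record."""
    contigs: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the previous record, if any.
            if current:
                contigs.append("".join(current))
                current = []
        else:
            current.append(line.upper())
    if current:
        contigs.append("".join(current))
    return contigs


def assembly_metrics(contigs: List[str]) -> Dict[str, float]:
    """Compute cumulative contig length, contig count and G+C percentage."""
    size = sum(len(c) for c in contigs)
    gc_count = sum(c.count("G") + c.count("C") for c in contigs)
    gc_percent = 100.0 * gc_count / size if size else 0.0
    return {"size": size, "contigs": len(contigs), "gc": gc_percent}


def check_assembly(species: str, contigs: List[str], coverage: float) -> List[str]:
    """Return a list of human-readable problems; an empty list means no flag."""
    t = THRESHOLDS[species]
    m = assembly_metrics(contigs)
    problems: List[str] = []
    if not t.size[0] <= m["size"] <= t.size[1]:
        problems.append(f"genome size {m['size']} bp outside {t.size[0]}-{t.size[1]}")
    # Assumption: only too many contigs is flagged, not too few.
    if m["contigs"] > t.max_contigs:
        problems.append(f"{m['contigs']} contigs exceeds {t.max_contigs}")
    if not t.gc[0] <= m["gc"] <= t.gc[1]:
        problems.append(f"G+C {m['gc']:.2f}% outside {t.gc[0]}-{t.gc[1]}")
    if coverage < t.min_coverage:
        problems.append(f"coverage {coverage}X below {t.min_coverage}X")
    return problems
```

To use it, read the assembly file into a string, pass the result of `parse_fasta` to `check_assembly` together with the species label and the coverage reported by the assembler or read mapper, and review any returned messages before submission.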