Data access policy note: since January 1st, 2025, registration and authentication have been mandatory to access all data curated after this date, whether via the web interface or via the application programming interface (see the API authentication help page). Please contact us if you have any questions.

Whole genome sequencing data requirements

Users are requested to submit only high-quality assemblies, generated from pure cultures and sequenced at a minimum coverage of 40X. Assembly files with a high number of contigs, or with a cumulative contig length outside the typical range of the Corynebacterium diphtheriae species complex (approximately 2.2 to 2.9 Mbp), will not be accepted.

NB. For quality purposes, we only accept assemblies generated either from high-quality short reads or from a combination of short and long reads (hybrid assemblies). Please note that genomes obtained using long-read sequencing technology alone, or by Ion Torrent/Roche/454, will not be uploaded to the database, nor used to define new alleles. However, if you use one of these assemblies and discover a new MLST profile composed solely of existing alleles, you may make a 'profile' submission to define a new ST.

Please refer to the assembly metrics below:

Species	Genome size (bp)	Number of contigs	G+C content (%)	Coverage (X)
C. diphtheriae	2 300 000 - 2 600 000	20 - 90	53 - 54	>= 40
C. belfantii	2 500 000 - 2 900 000	150 - 220	53 - 54	>= 40
C. rouxii	2 200 000 - 2 500 000	20 - 55	53 - 54	>= 40
C. ulcerans & C. ramonii	2 400 000 - 2 800 000	5 - 60	53 - 54	>= 40
C. pseudotuberculosis	2 200 000 - 2 300 000	10 - 30	53 - 54	>= 40
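
Before submitting, the ranges in the table can be checked locally. The following Python sketch is illustrative only and is not part of the database's own tooling: the names `METRICS`, `assembly_stats` and `check_assembly` are hypothetical, and the coverage value must be supplied by the user because it cannot be derived from the assembly file alone. It assumes that only the upper bound of the contig range is a reason for concern (the text above objects to high numbers of contigs); how curators actually apply the ranges is not stated here.

```python
# Ranges copied from the assembly metrics table above.
# Tuple layout: (min genome size in bp, max genome size in bp, max contigs).
# Assumption: only the upper contig bound is enforced in this sketch.
METRICS = {
    "C. diphtheriae": (2_300_000, 2_600_000, 90),
    "C. belfantii": (2_500_000, 2_900_000, 220),
    "C. rouxii": (2_200_000, 2_500_000, 55),
    "C. ulcerans & C. ramonii": (2_400_000, 2_800_000, 60),
    "C. pseudotuberculosis": (2_200_000, 2_300_000, 30),
}
GC_RANGE = (53.0, 54.0)  # percent, same for every species in the table
MIN_COVERAGE = 40.0      # sequencing depth (X)


def assembly_stats(fasta_text):
    """Return (total length, contig count, G+C percent) for FASTA text."""
    contigs = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            contigs.append([])          # a header line starts a new contig
        elif contigs:
            contigs[-1].append(line.upper())
    seqs = ["".join(parts) for parts in contigs]
    total = sum(len(s) for s in seqs)
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    # Ambiguous bases such as N count towards the total length here.
    gc_percent = 100.0 * gc / total if total else 0.0
    return total, len(seqs), gc_percent


def check_assembly(species, total, n_contigs, gc_percent, coverage):
    """Return a list of problems; an empty list means all checks passed."""
    min_size, max_size, max_contigs = METRICS[species]
    problems = []
    if not min_size <= total <= max_size:
        problems.append(f"genome size {total} bp outside {min_size}-{max_size}")
    if n_contigs > max_contigs:
        problems.append(f"{n_contigs} contigs exceeds {max_contigs}")
    if not GC_RANGE[0] <= gc_percent <= GC_RANGE[1]:
        problems.append(f"G+C {gc_percent:.1f}% outside {GC_RANGE[0]}-{GC_RANGE[1]}")
    if coverage < MIN_COVERAGE:
        problems.append(f"coverage {coverage}X below {MIN_COVERAGE}X")
    return problems
```

To use it, read an assembly file into a string, pass it to `assembly_stats`, and then pass the three returned values, together with the species name and the sequencing coverage, to `check_assembly`. An empty list only means the assembly falls within the ranges listed here; it does not guarantee acceptance.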
