IntMeta

Kraken2

Taxonomic Classification

Kraken2 is an ultrafast k-mer based taxonomic sequence classifier that assigns taxonomic labels to short DNA reads by examining the k-mers within a read and querying a database with those k-mers. [1]

How to Obtain Output Model File

The team ran the brief workflow below to produce the example output files presented on the tool pages.

Input

FASTQ/FASTA files (single-end or paired-end reads)

Output

Tab-separated report with taxonomic counts at each rank, plus per-read classification file
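For orientation, a standard Kraken2 report has six tab-separated columns: percentage of reads in the clade, clade read count, directly assigned read count, rank code, NCBI taxonomy ID, and a scientific name indented by two spaces per taxonomy level. The Python sketch below reads that layout. The function name and the sample rows are illustrative assumptions, and reports written with --report-minimizer-data carry extra columns that this sketch does not handle.

```python
import io


def parse_kraken2_report(handle):
    """Yield one dict per row of a standard six-column Kraken2 report."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        name = fields[5]
        yield {
            "percent": float(fields[0]),
            "clade_reads": int(fields[1]),
            "direct_reads": int(fields[2]),
            "rank": fields[3].strip(),
            "taxid": int(fields[4]),
            "name": name.strip(),
            # Two leading spaces per level encode the depth in the tree.
            "depth": (len(name) - len(name.lstrip(" "))) // 2,
        }


# Tiny made-up report, used only to illustrate the column layout.
example = (
    " 40.00\t40\t40\tU\t0\tunclassified\n"
    " 60.00\t60\t5\tR\t1\troot\n"
    " 55.00\t55\t10\tD\t2\t    Bacteria\n"
)
rows = list(parse_kraken2_report(io.StringIO(example)))
```

In practice the handle would be an open file such as the report written by the --report option.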

conda install -c bioconda kraken2

Docker image: staphb/kraken2:latest

Sample 1: human gut metagenome (SRR14092160, 5.5M read pairs), classified with Kraken2 against the PlusPF-8 database.

  1. Download reads from NCBI SRA

    prefetch SRR14092160 && fasterq-dump SRR14092160 -O /data --split-files && gzip /data/SRR14092160_*.fastq

    Human gut metagenome, pre-VRE colonization timepoint (Day -9). Illumina paired-end, 5.5M read pairs.

  2. Download PlusPF-8 database

    wget https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_08_GB_20251015.tar.gz && mkdir -p kraken2_db && tar xzf k2_pluspf_08_GB_20251015.tar.gz -C kraken2_db/

    ~5.6 GB download, 7 GB extracted. Pre-built indexes available at https://benlangmead.github.io/aws-indexes/k2.

  3. Run Kraken2 classification

    kraken2 --db kraken2_db --paired --gzip-compressed --threads 12 --report kraken2_output.tsv --output /dev/null SRR14092160_1.fastq.gz SRR14092160_2.fastq.gz

    52.29% classified. Diverse community dominated by Bacillota.

Upload kraken2_output.tsv (the file written by --report in step 3) to IntMeta

Materials Used

Sample Output Files

Download the output files used in the tool page demos. You can upload these directly to IntMeta to explore the visualizations.

Single & Comparison

Group Analysis

Charts Reference

Detailed descriptions of all 31 visualizations that IntMeta generates from Kraken2 output.

distribution

Bar chart ranking the top organisms by clade read count (reads assigned to the taxon plus all its descendants) from the Kraken2 report. Provides a quick overview of the most abundant taxa at the selected rank.

composition

Pie/donut chart showing the top taxa as a percentage of their combined read count at the selected rank. Each slice is computed as (taxon clade reads / sum of displayed clade reads) × 100.
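The slice formula can be sketched in a few lines of Python. The function name and the top_n parameter are hypothetical and only illustrate the arithmetic described above; they are not IntMeta's actual code.

```python
def composition_percentages(clade_reads, top_n=10):
    """Return each displayed taxon's share of the displayed clade reads.

    clade_reads maps taxon name -> clade read count at one rank.
    top_n is an assumed cutoff for how many taxa are displayed.
    """
    top = sorted(clade_reads.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    displayed_total = sum(count for _, count in top)
    if displayed_total == 0:
        return {}
    return {name: 100.0 * count / displayed_total for name, count in top}


shares = composition_percentages({"A": 60, "B": 30, "C": 10}, top_n=2)
```

Because the denominator is the sum of the displayed taxa only, the slices always add up to 100% even when lower-ranked taxa are left out.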

richness

Counts every unique organism with at least one clade read at each major rank (Domain through Species). The bar height is the raw distinct-taxon count — not abundance-weighted — so a taxon with 1 read counts the same as one with 1 million.
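A minimal sketch of this count, assuming the single-letter Kraken2 rank codes D, P, C, O, F, G, and S stand for the major ranks and that intermediate codes such as G1 are ignored. The function name and the input shape are illustrative assumptions.

```python
from collections import Counter

# Assumed set of major rank codes, Domain through Species.
MAJOR_RANKS = ["D", "P", "C", "O", "F", "G", "S"]


def richness_by_rank(rows):
    """Count distinct taxa with at least one clade read at each major rank.

    rows is an iterable of (rank_code, clade_reads) pairs, one per taxon.
    """
    counts = Counter(
        rank for rank, clade_reads in rows
        if clade_reads >= 1 and rank in MAJOR_RANKS
    )
    return {rank: counts.get(rank, 0) for rank in MAJOR_RANKS}


example_rows = [("D", 100), ("P", 60), ("P", 0), ("G", 1), ("G1", 5), ("S", 3), ("S", 1)]
richness = richness_by_rank(example_rows)
```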

diversity

Alpha diversity indices computed from per-taxon clade-read proportions at the selected rank. Shannon H = −Σ(pᵢ · ln pᵢ), Simpson D = 1 − Σ(nᵢ(nᵢ−1))/(N(N−1)), and Pielou's evenness J = H / ln(richness), where nᵢ is the clade-read count of taxon i, N = Σnᵢ, and pᵢ = nᵢ/N. Higher Shannon entropy indicates a richer or more even community.
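The three indices can be computed directly from a list of clade-read counts. The sketch below follows the formulas as stated, using the natural logarithm; the function name and the handling of degenerate inputs (empty sample, single taxon) are assumptions.

```python
import math


def alpha_diversity(counts):
    """Return (shannon, simpson, pielou) for a list of per-taxon read counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    richness = len(counts)
    if total == 0:
        return 0.0, 0.0, 0.0
    # Shannon H = -sum(p_i * ln p_i)
    shannon = -sum((c / total) * math.log(c / total) for c in counts)
    # Simpson D = 1 - sum(n_i * (n_i - 1)) / (N * (N - 1)); undefined for N = 1
    if total > 1:
        simpson = 1.0 - sum(c * (c - 1) for c in counts) / (total * (total - 1))
    else:
        simpson = 0.0
    # Pielou J = H / ln(richness); set to 0 when only one taxon is present
    pielou = shannon / math.log(richness) if richness > 1 else 0.0
    return shannon, simpson, pielou


even = alpha_diversity([50, 50])
single = alpha_diversity([100])
```

Two equally abundant taxa give H = ln 2 and J = 1, while a sample with a single taxon gives zero for all three indices.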

clade-vs-direct

Side-by-side comparison of clade reads (taxon + all descendants) versus directly-assigned reads for the top taxa. Includes a specificity ratio (direct / clade): values near 1.0 mean most reads map directly to that taxon, while values near 0 mean abundance is driven by sub-taxa.
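As a small illustration of the specificity ratio, the following sketch divides direct reads by clade reads. The function name and the choice to return None for an empty clade are assumptions.

```python
def specificity_ratio(direct_reads, clade_reads):
    """Return direct / clade, or None when the clade has no reads."""
    if clade_reads == 0:
        return None
    return direct_reads / clade_reads


# A genus whose reads mostly resolve to its species scores near 0.
genus_ratio = specificity_ratio(5, 60)
# A species with no classified sub-taxa scores 1.0.
species_ratio = specificity_ratio(40, 40)
```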

multilevel-composition

Stacked bar chart displaying the top taxa (by clade reads) at each major rank, with remaining taxa grouped as 'Other'. Reveals how community composition shifts as classification resolution increases from Domain to Species.

dependency-wheel

Chord diagram connecting parent taxa to their child taxa across ranks (up to the selected max rank). Connection thickness is proportional to shared clade-read count. Edges below a minimum coverage threshold (default 5% of total reads) are filtered out.
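The coverage filter amounts to comparing each edge's read count against a fraction of the total. A hedged sketch follows, with the edge tuple layout, function name, and parameter names chosen only for illustration.

```python
def filter_edges(edges, total_reads, min_fraction=0.05):
    """Keep parent-child edges carrying at least min_fraction of total reads.

    edges is a list of (parent, child, clade_reads) tuples.
    """
    cutoff = min_fraction * total_reads
    return [edge for edge in edges if edge[2] >= cutoff]


example_edges = [
    ("Bacteria", "Bacillota", 600),
    ("Bacteria", "Pseudomonadota", 50),
    ("Bacteria", "Actinomycetota", 40),
]
kept = filter_edges(example_edges, total_reads=1000)
```

With 1,000 total reads and the default 5% threshold, the cutoff is 50 reads, so the 40-read edge is dropped and the other two are kept.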

sankey-flow

Sankey flow diagram tracing how reads distribute from the start rank (default: Domain) to the end rank (default: Genus). Each band's width equals the read count flowing from a parent taxon to a child taxon. Edges below a minimum coverage threshold (default 5%) are removed to reduce clutter.

comp-classification

Stacked bar chart showing total classified vs unclassified reads per sample. Enables quick comparison of classification rates across samples — large unclassified fractions may indicate novel organisms or database limitations.

comp-grouped-abundance

Grouped bar chart placing the top taxa from each sample side by side at the selected rank. Each group contains one bar per sample, colored by sample identity, enabling direct visual comparison of absolute clade-read counts for the same organism across samples.

comp-relative-abundance

100% stacked bar chart where each bar represents one sample and segments show the proportional contribution of each taxon. Useful for comparing community composition when samples have very different sequencing depths, since all bars are normalized to the same height.
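The normalization behind a 100% stacked bar is a per-sample division by the sample total. A minimal sketch, with an assumed nested-dictionary input and a hypothetical function name:

```python
def relative_abundance(sample_counts):
    """Convert per-sample clade-read counts to proportions that sum to 1.

    sample_counts maps sample name -> {taxon: clade reads}.
    """
    result = {}
    for sample, counts in sample_counts.items():
        total = sum(counts.values())
        result[sample] = {
            taxon: (reads / total if total else 0.0)
            for taxon, reads in counts.items()
        }
    return result


# Same composition at a 100-fold difference in sequencing depth.
profiles = relative_abundance({
    "shallow": {"A": 10, "B": 30},
    "deep": {"A": 1000, "B": 3000},
})
```

Both samples end up with identical proportions (0.25 and 0.75), which is why the bars are comparable despite the depth difference.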

comp-abundance-heatmap

Color-matrix heatmap with taxa on one axis and samples on the other. Cell color intensity is proportional to clade-read abundance at the selected rank. Hierarchical clustering on both axes groups similar samples and co-occurring taxa together.

comp-diversity-indices

Multi-panel chart displaying Shannon entropy, Simpson diversity, Observed Richness, and Pielou's Evenness for each sample on its own y-axis scale. Enables quick cross-sample comparison of alpha diversity without scale distortion.

comp-shared-taxa

Venn diagram showing the count of taxa that are shared between samples versus taxa exclusive to each individual sample. Computed at the selected taxonomic rank using presence/absence of clade reads ≥ 1.
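The presence/absence logic maps naturally onto set operations. The sketch below reports only the taxa found in every sample and the taxa exclusive to one sample; a full Venn diagram for three or more samples has additional partial-overlap regions that are not enumerated here. Names are illustrative.

```python
def shared_and_exclusive(sample_counts):
    """Return (taxa present in all samples, {sample: taxa found only there}).

    A taxon counts as present when its clade reads are >= 1.
    """
    present = {
        sample: {taxon for taxon, reads in counts.items() if reads >= 1}
        for sample, counts in sample_counts.items()
    }
    if not present:
        return set(), {}
    shared = set.intersection(*present.values())
    exclusive = {}
    for sample, taxa in present.items():
        # Union of everything seen in the other samples.
        others = set().union(*(t for s, t in present.items() if s != sample))
        exclusive[sample] = taxa - others
    return shared, exclusive


shared, exclusive = shared_and_exclusive({
    "s1": {"A": 5, "B": 1, "C": 0},
    "s2": {"A": 2, "C": 7},
})
```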

group-alpha-diversity

Boxplots of Shannon entropy, Simpson diversity, Observed Richness, and Pielou's Evenness per group. Each box summarizes within-group variation. Kruskal-Wallis p-values test for significant differences between groups.

group-pcoa

Principal Coordinates Analysis (PCoA) on a Bray-Curtis dissimilarity matrix. Points are colored by group assignment with 95% confidence ellipses. Axis labels show the percentage of variance explained by each coordinate.

group-nmds

Non-metric Multidimensional Scaling (NMDS) ordination with stress value displayed. Points colored by group with confidence ellipses. Lower stress (<0.2) indicates a good representation of the original distances in 2D.

group-distance-boxplots

Boxplots of within-group vs between-group Bray-Curtis distances. PERMANOVA R² and p-value quantify the fraction of variance explained by grouping. ANOSIM R statistic measures the degree of group separation.
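To make the boxplot inputs concrete, the sketch below computes Bray-Curtis dissimilarity, BC = Σ|xᵢ − yᵢ| / Σ(xᵢ + yᵢ), for every sample pair and sorts the distances into within-group and between-group lists. Whether IntMeta applies it to raw counts or to relative abundances is not stated, so the input scale is left to the caller. The PERMANOVA and ANOSIM statistics are not reproduced here, and all names are illustrative.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: abundance} profiles."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


def split_distances(samples, groups):
    """Return (within-group distances, between-group distances).

    samples maps sample name -> profile; groups maps sample name -> group label.
    """
    within, between = [], []
    for s1, s2 in combinations(sorted(samples), 2):
        distance = bray_curtis(samples[s1], samples[s2])
        if groups[s1] == groups[s2]:
            within.append(distance)
        else:
            between.append(distance)
    return within, between


samples = {"a1": {"X": 10}, "a2": {"X": 10}, "b1": {"Y": 10}}
groups = {"a1": "A", "a2": "A", "b1": "B"}
within, between = split_distances(samples, groups)
```

Identical profiles give a distance of 0 and profiles with no taxa in common give 1, so this toy example yields one within-group distance of 0 and two between-group distances of 1.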

group-relative-abundance

Group-averaged relative abundance as 100% stacked bars at the selected taxonomic rank. Each bar shows the mean proportional composition across all samples in the group, enabling direct group-level comparison.

group-differential-abundance

Bar chart of taxa with significant abundance differences between groups (Kruskal-Wallis test, Benjamini-Hochberg FDR correction). Color indicates which group has higher abundance. Only taxa passing the significance threshold are shown.
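The Benjamini-Hochberg step can be illustrated independently of the test that produces the p-values. The sketch below implements the standard step-up adjustment in plain Python; the function name is an assumption, and the Kruskal-Wallis test itself is not shown.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A taxon would then be shown only if its adjusted p-value falls below the chosen significance threshold.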

group-lefse

LEfSe (Linear Discriminant Analysis Effect Size) biomarker chart. Horizontal bars show LDA scores for taxa that significantly discriminate between groups. Higher LDA scores indicate stronger association with the respective group.

group-shared-taxa

Venn diagram showing shared and unique taxa across experimental groups. Each group's taxa set is the union of all taxa detected in any sample belonging to that group. Click regions to view the specific taxa list with read counts.

group-classification

Stacked column chart showing mean classified vs unclassified reads per experimental group. Error bars represent standard deviation across samples within each group. Tooltip shows percentage breakdown and sample count per group.

group-distribution

Grouped column chart showing mean read counts for top N taxa at the selected taxonomic level, grouped by experimental group. Error bars show standard deviation. Enables comparison of taxonomic abundance patterns between groups.

group-heatmap

Heatmap of mean log₁₀-transformed read counts per experimental group (columns) and taxa (rows). Color intensity reflects abundance, allowing quick visual comparison of taxonomic profiles across groups.

rarefaction

Rarefaction curve plotting observed taxa vs subsampled read depth at the selected rank. Computed analytically using the hypergeometric expectation. A curve that plateaus indicates sufficient sequencing depth; a curve still rising suggests under-sampling.
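One common form of the hypergeometric expectation is E[Sₙ] = Σᵢ (1 − C(N − nᵢ, n) / C(N, n)), where nᵢ is the read count of taxon i, N the total reads, and n the subsample depth. The sketch below assumes this is the formula meant; the function name is hypothetical. Exact binomials are used for clarity, which becomes slow at millions of reads, where a log-gamma formulation would be more practical.

```python
import math


def expected_taxa(counts, depth):
    """Expected number of taxa observed when subsampling `depth` reads."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total read count")
    all_draws = math.comb(total, depth)
    # math.comb returns 0 when depth exceeds total - c, i.e. the taxon
    # cannot be missed, so its contribution becomes exactly 1.
    return sum(1 - math.comb(total - c, depth) / all_draws for c in counts)


curve = [expected_taxa([5, 5], depth) for depth in (0, 1, 10)]
```

For two taxa with five reads each, the expectation rises from 0 taxa at depth 0 to 1 taxon at depth 1 and reaches both taxa at full depth.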

rank-abundance

Rank-abundance curve (Whittaker plot) with taxa ranked by decreasing relative abundance on a logarithmic y-axis. The curve's length along the x-axis reflects richness, while the slope indicates evenness — a steep drop means a few taxa dominate.
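The points of such a curve can be derived by sorting taxa by abundance and pairing each with its rank. A brief sketch under an assumed input shape and function name; the logarithmic scaling is left to the plotting layer.

```python
def rank_abundance(counts):
    """Return [(rank, relative abundance)] sorted from most to least abundant.

    counts maps taxon -> read count; taxa with zero reads are omitted.
    """
    total = sum(counts.values())
    ranked = sorted((c for c in counts.values() if c > 0), reverse=True)
    return [(rank, c / total) for rank, c in enumerate(ranked, start=1)]


points = rank_abundance({"A": 10, "B": 70, "C": 20, "D": 0})
```

The number of points equals the observed richness, and the drop between consecutive y-values reflects how strongly the leading taxa dominate.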

comp-rarefaction

Overlaid rarefaction curves for all uploaded samples. Enables visual comparison of sequencing effort and taxonomic saturation across samples.

comp-rank-abundance

Overlaid rank-abundance curves for all uploaded samples. Steeper curves indicate more uneven communities dominated by fewer taxa.

group-taxonomic-sunburst

Interactive sunburst displaying the full taxonomic hierarchy as concentric rings. LEfSe biomarker nodes are colored by their enriched group; non-significant nodes are gray. Click any ring segment to drill down.

group-volcano

Volcano plot: log₂ fold-change (x) vs −log₁₀ adjusted p-value (y). Points above the horizontal line pass significance (BH FDR). Colored points are significantly enriched in the respective group.
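The coordinates of one volcano point can be sketched as follows. The pseudocount (to avoid division by zero), the direction of the fold change (group B relative to group A), the 0.05 threshold, and the floor applied to a zero p-value are all assumptions made for illustration.

```python
import math


def volcano_point(mean_a, mean_b, adjusted_p, pseudocount=1.0, alpha=0.05):
    """Return (log2 fold-change, -log10 adjusted p, passes threshold)."""
    # Fold change of group B over group A, with an assumed pseudocount.
    log2_fc = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    # Floor the p-value so that an exact zero does not raise a math error.
    neg_log10_p = -math.log10(max(adjusted_p, 1e-300))
    return log2_fc, neg_log10_p, adjusted_p < alpha


x, y, significant = volcano_point(mean_a=3, mean_b=15, adjusted_p=0.01)
```

Here the ratio (15 + 1) / (3 + 1) = 4 gives a log₂ fold-change of 2, and an adjusted p-value of 0.01 plots at a height of 2 on the y-axis.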

References

[1] Wood, D.E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019).
[2] Anderson, M.J. A new method for non-parametric multivariate analysis of variance. Austral Ecology 26, 32–46 (2001).
[3] Segata, N., Izard, J., Waldron, L. et al. Metagenomic biomarker discovery and explanation. Genome Biol 12, R60 (2011).