VAMB
Metagenomic Binning
VAMB (Variational Autoencoders for Metagenomic Binning) uses deep variational autoencoders to learn a latent representation of contigs from their tetranucleotide frequencies and co-abundance profiles across multiple samples, enabling accurate metagenomic binning. [1]
How to Obtain Output Model File
Below is a brief workflow the team ran to obtain the output model examples we present on the tools page.
Input
Co-assembled contigs (FASTA) + per-sample BAM alignment files
Output
TSV with per-cluster metrics: cluster name, radius, peak/valley ratio, kind (normal/loner/fallback), total bp, contig count, medoid contig
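The sketch below shows one way to read this table in Python. It assumes the file is tab-separated and that the header uses the column names that appear in the chart descriptions below (name, radius, peak valley ratio, kind, bp, ncontigs, medoid). The Cluster class, the read_clusters function and the two example rows are hypothetical illustrations. They are not part of VAMB or IntMeta.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Cluster:
    """One row of the VAMB cluster metadata TSV (column names are assumed)."""
    name: str
    radius: float
    pvr: float      # "peak valley ratio" column
    kind: str       # expected to be "normal", "loner" or "fallback"
    bp: int         # total bp in the cluster
    ncontigs: int
    medoid: str


def read_clusters(handle) -> list[Cluster]:
    """Parse a tab-separated cluster metadata table into Cluster records."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        Cluster(
            name=row["name"],
            radius=float(row["radius"]),
            pvr=float(row["peak valley ratio"]),
            kind=row["kind"],
            bp=int(row["bp"]),
            ncontigs=int(row["ncontigs"]),
            medoid=row["medoid"],
        )
        for row in reader
    ]


# Two made-up rows for illustration; in practice pass an open file handle,
# e.g. to the vae_clusters_metadata.tsv written under vamb_out/ in step 4.
example = io.StringIO(
    "name\tradius\tpeak valley ratio\tkind\tbp\tncontigs\tmedoid\n"
    "vae_1\t0.05\t0.1\tnormal\t2500000\t120\tk141_17\n"
    "vae_2\t0.06\t1.0\tloner\t3200\t1\tk141_903\n"
)
clusters = read_clusters(example)
print(len(clusters), clusters[0].kind, clusters[1].bp)  # 2 normal 3200
```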
conda install -c bioconda vamb
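Docker image: quay.io/biocontainers/vamb:4.1.3--pyhdfd78af_0 (this tag provides VAMB 4.1.3, while the workflow below was run with VAMB v5.0.4; the v5 command syntax used in step 4 may not be available in 4.x)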
Sample 1: gut microbiome (3 samples from PRJNA795985, 1M read pairs each), co-assembled with MEGAHIT and binned with VAMB v5.0.4, producing 7,212 clusters.
- 1
Download 3 samples from PRJNA795985
for SRR in SRR17531757 SRR17531762 SRR17531772; do fastq-dump --split-files --gzip -X 1000000 --outdir reads/ $SRR; done
Downloads 1M read pairs per sample (~70 MB compressed each) from the "Diet and Antimicrobial Resistance in Healthy US Adults" study (Shrestha et al., mBio 2022).
- 2
Co-assemble all samples with MEGAHIT
megahit -1 reads/SRR17531757_1.fastq.gz,reads/SRR17531762_1.fastq.gz,reads/SRR17531772_1.fastq.gz -2 reads/SRR17531757_2.fastq.gz,reads/SRR17531762_2.fastq.gz,reads/SRR17531772_2.fastq.gz -o assembly --min-contig-len 1500 -t 8 --presets meta-sensitive
Produces 14,934 contigs (≥1,500 bp).
- 3
Map each sample to contigs
mkdir -p bams
for SRR in SRR17531757 SRR17531762 SRR17531772; do minimap2 -ax sr -t 8 assembly/final.contigs.fa reads/${SRR}_1.fastq.gz reads/${SRR}_2.fastq.gz | samtools sort -@ 4 -o bams/${SRR}.sorted.bam && samtools index bams/${SRR}.sorted.bam; done
- 4
Run VAMB v5.0.4
vamb bin default --outdir vamb_out --fasta assembly/final.contigs.fa --bamdir bams/ -m 1500 --minfasta 200000 -p 8
Produces vae_clusters_metadata.tsv with 7,212 clusters (715 normal, 6,472 loner, 25 fallback).
Upload vae_clusters_metadata.tsv to IntMeta.
Materials Used
Sample Output Files
Download the output files used in the tool page demos. You can upload these directly to IntMeta to explore the visualizations.
Charts Reference
Detailed descriptions of all 22 visualizations that IntMeta generates from VAMB output.

cluster-kind-distribution
Pie chart of cluster kinds from the 'kind' column: normal (separated by density peaks in latent space), loner (isolated single-contig clusters), and fallback (assigned the default clustering radius). Because each loner is a single contig, loners can dominate by count (6,472 of 7,212 clusters in the sample dataset). A large share of fallback clusters suggests poor separation in latent space.
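The pie-chart slices are simple percentages of the kind column. This Python sketch takes as input the cluster counts reported for the sample dataset (715 normal, 6,472 loner, 25 fallback). kind_distribution is a hypothetical helper. The lowercase kind labels are an assumption about the TSV contents.

```python
from collections import Counter


def kind_distribution(kinds: list[str]) -> dict[str, float]:
    """Percentage of clusters of each kind, i.e. the pie-chart slices."""
    counts = Counter(kinds)
    total = sum(counts.values())
    return {kind: 100 * n / total for kind, n in counts.items()}


# Kind counts reported for the sample dataset (7,212 clusters in total).
sample_kinds = ["normal"] * 715 + ["loner"] * 6472 + ["fallback"] * 25
shares = kind_distribution(sample_kinds)
print({kind: round(pct, 1) for kind, pct in shares.items()})
# {'normal': 9.9, 'loner': 89.7, 'fallback': 0.3}
```

On the sample data, loners make up about 89.7% of clusters by count.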

genome-size-distribution
Bar chart of total bp per cluster, sorted by size and colored by kind. VAMB recommends filtering clusters below 250 Kbp and discarding fallback clusters for downstream analysis.
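Here is that filter as a Python sketch. It uses a trimmed-down, hypothetical Cluster record that keeps only name, kind and bp. The 250 kbp default mirrors the recommendation quoted above. Because min_bp is a parameter, it can also be set to 200,000 to match the --minfasta value used in step 4.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical minimal cluster record: only the fields this filter needs."""
    name: str
    kind: str
    bp: int


def downstream_candidates(clusters: list[Cluster], min_bp: int = 250_000) -> list[Cluster]:
    """Drop fallback clusters and clusters under min_bp, then sort largest first."""
    kept = [c for c in clusters if c.kind != "fallback" and c.bp >= min_bp]
    return sorted(kept, key=lambda c: c.bp, reverse=True)


clusters = [
    Cluster("vae_1", "normal", 2_400_000),
    Cluster("vae_2", "fallback", 900_000),
    Cluster("vae_3", "normal", 180_000),
    Cluster("vae_4", "normal", 3_100_000),
]
print([c.name for c in downstream_candidates(clusters)])  # ['vae_4', 'vae_1']
```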

radius-vs-pvr
Scatter plot of clustering radius vs peak-to-valley ratio (PVR, from the 'peak valley ratio' column), colored by kind. Normal clusters typically have higher PVR (clearer density separation) and tighter radius. Low PVR with large radius suggests poorly resolved clusters.

genome-size-vs-contigs
Scatter plot of total bp (genome size) vs ncontigs per cluster. Well-resolved bins cluster at moderate contig counts with substantial genome sizes. Points in the upper-left (many contigs, small total size) may indicate highly fragmented or chimeric clusters.

contigs-per-cluster
Bar chart of ncontigs per cluster, colored by kind. Very high contig counts may indicate over-fragmented or chimeric clusters that merged unrelated contigs.

avg-contig-length
Bar chart of average contig length (total bp / ncontigs) per cluster. Higher values indicate better-assembled genomes with longer individual contigs. Low averages suggest highly fragmented assemblies.
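The metric is a single division per cluster. Below is a minimal Python version. The guard against a zero contig count is an added safety check, not something stated on this page.

```python
def avg_contig_length(bp: int, ncontigs: int) -> float:
    """Mean contig length of a cluster: total bp divided by contig count."""
    if ncontigs <= 0:
        raise ValueError("a cluster must contain at least one contig")
    return bp / ncontigs


print(avg_contig_length(2_400_000, 120))  # 20000.0
print(avg_contig_length(3_200, 1))        # 3200.0
```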

metrics-by-kind
Grouped box plot comparing distributions of genome size (bp), contig count, and radius across the three cluster kinds (normal, loner, fallback). Reveals systematic differences between kinds; for example, loner clusters consist of a single contig and are therefore typically much smaller than normal clusters.

cluster-metrics-heatmap
Heatmap of min-max normalized metrics (bp, ncontigs, radius, peak valley ratio) across the top clusters. Each metric is scaled 0–1 within its column; darker cells = higher relative values. Useful for spotting outlier clusters.
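Min-max scaling maps each column to the 0 to 1 range with (v - min) / (max - min). The Python sketch below applies it per metric. Mapping a constant column to 0 is an assumed convention to avoid dividing by zero; IntMeta may handle that case differently. The rows are made-up values.

```python
def min_max_normalize(rows: list[dict[str, float]], metrics: list[str]) -> list[dict[str, float]]:
    """Scale each metric to 0-1 within its column; constant columns map to 0."""
    scaled: list[dict[str, float]] = [{} for _ in rows]
    for metric in metrics:
        values = [row[metric] for row in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for out, v in zip(scaled, values):
            out[metric] = (v - lo) / span if span else 0.0
    return scaled


rows = [
    {"bp": 1_000_000, "ncontigs": 50, "radius": 0.05},
    {"bp": 3_000_000, "ncontigs": 150, "radius": 0.06},
    {"bp": 2_000_000, "ncontigs": 100, "radius": 0.06},
]
scaled = min_max_normalize(rows, ["bp", "ncontigs", "radius"])
print(scaled[2])  # {'bp': 0.5, 'ncontigs': 0.5, 'radius': 1.0}
```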

comp-quality-tiers
Grouped bar chart comparing the distribution of heuristic quality tiers (High / Medium / Low, based on cluster kind and genome size thresholds) across samples. Reveals which sample produced more high-quality VAMB clusters.

comp-genome-size
Box plot or grouped bar chart comparing the genome size (total bp) distribution of VAMB clusters across samples. Differences may reflect varying community complexity or co-assembly depth.

comp-kind-distribution
Stacked or grouped bar chart comparing the proportion of normal, loner, and fallback clusters across samples. A higher fraction of normal clusters indicates better latent-space separation and more reliable binning.

comp-quality-pct
100% stacked bar chart showing the proportion of High / Medium / Low quality clusters per sample. Normalizing for different cluster counts enables direct comparison of binning quality across samples. Similar per-sample quality breakdowns appear in CAMI benchmarks, although CAMI assesses bins by completeness and purity against a gold standard rather than by these assembly-metric heuristics.
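To produce a 100% stacked bar, each sample's tier counts are divided by that sample's total. The sample names and counts in this Python sketch are hypothetical. A sample with no clusters is assigned 0% in every tier, which is an assumed convention.

```python
def tier_percentages(counts: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Convert per-sample tier counts into percentages that sum to 100 per sample."""
    result = {}
    for sample, tiers in counts.items():
        total = sum(tiers.values())
        if total == 0:
            result[sample] = {tier: 0.0 for tier in tiers}
        else:
            result[sample] = {tier: 100 * n / total for tier, n in tiers.items()}
    return result


counts = {
    "sample_A": {"High": 5, "Medium": 15, "Low": 80},
    "sample_B": {"High": 2, "Medium": 3, "Low": 15},
}
pct = tier_percentages(counts)
print(pct["sample_B"])  # {'High': 10.0, 'Medium': 15.0, 'Low': 75.0}
```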


comp-cdf
Empirical cumulative distribution function (CDF) of cluster sizes overlaid per sample. Reveals full distribution shape beyond boxplot summaries. Standard in CAMI benchmarking.
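An empirical CDF assigns to each observed value x the fraction of observations less than or equal to x. The Python sketch below computes one curve per sample from hypothetical cluster sizes; plotting is left out. Tied values share the same height, which is why bisect_right is used.

```python
from bisect import bisect_right


def ecdf(values: list[float]) -> tuple[list[float], list[float]]:
    """Return sorted x values and the fraction of observations <= each x."""
    xs = sorted(values)
    n = len(xs)
    return xs, [bisect_right(xs, x) / n for x in xs]


# Hypothetical cluster sizes (bp) per sample; one curve per sample is overlaid.
sizes = {"sample_A": [300, 100, 200, 200], "sample_B": [150, 400]}
curves = {sample: ecdf(v) for sample, v in sizes.items()}
print(curves["sample_A"])  # ([100, 200, 200, 300], [0.25, 0.75, 0.75, 1.0])
```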

comp-contigs
Boxplot of contig count per cluster across samples. Higher contig counts indicate more fragmented assemblies, potentially from lower sequencing depth or complex community composition.

binning-group-quality-tiers
Grouped bar chart of heuristic quality tier counts (High / Medium / Low) per experimental group, with Chi-square test. Tiers are based on cluster kind and genome size: High = normal + ≥500 Kbp, Medium = normal ≥100 Kbp or fallback ≥500 Kbp, Low = all else. Note: these are assembly-metric heuristics, not MIMAG classifications — VAMB does not output completeness or contamination.
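The tier rules above translate directly into a small Python function. Two points are assumptions here: 1 Kbp is taken as 1,000 bp, and kind labels are taken to be lowercase. Under these rules a loner always falls into Low, whatever its size.

```python
def quality_tier(kind: str, bp: int) -> str:
    """Heuristic tier from cluster kind and total bp (not a MIMAG classification)."""
    if kind == "normal" and bp >= 500_000:
        return "High"
    if (kind == "normal" and bp >= 100_000) or (kind == "fallback" and bp >= 500_000):
        return "Medium"
    return "Low"


print(quality_tier("normal", 600_000))    # High
print(quality_tier("fallback", 500_000))  # Medium
print(quality_tier("loner", 2_000_000))   # Low
```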

binning-group-metric-boxplots
Boxplots comparing a selected metric (genome size, contig count, radius, or peak-valley ratio) across groups. All individual cluster values are pooled per group. Kruskal-Wallis test (≥3 groups) or Mann-Whitney U test (2 groups) assesses significance; pairwise comparisons use Benjamini-Hochberg FDR correction.
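The omnibus tests exist in SciPy as scipy.stats.kruskal and scipy.stats.mannwhitneyu. The step that is easiest to get wrong is the Benjamini-Hochberg adjustment of the pairwise p-values, so the sketch below shows only that step, in pure Python. It is a standard BH step-up procedure; it is not necessarily IntMeta's exact code, and its results can be cross-checked against statsmodels' multipletests with method="fdr_bh".

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from three pairwise Mann-Whitney U tests.
adj = benjamini_hochberg([0.01, 0.04, 0.03])
print([round(p, 4) for p in adj])  # [0.03, 0.04, 0.04]
```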

binning-group-recovery-rate
Grouped bar chart of mean cluster count per sample at each quality tier (High, Medium, Low, Total) across groups, with ±1 SD error bars. Tests whether experimental groups differ in how many VAMB clusters they recover at each quality level.
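Each bar is the mean of the per-sample cluster counts in a group, and each error bar is the standard deviation of those counts. The Python sketch below uses the sample standard deviation (statistics.stdev, n - 1 denominator) and reports 0 for single-sample groups. Both choices are assumptions. The group names and counts are hypothetical.

```python
from statistics import mean, stdev


def group_mean_sd(per_sample_counts: dict[str, list[int]]) -> dict[str, tuple[float, float]]:
    """Mean and sample standard deviation of per-sample counts for each group."""
    out = {}
    for group, counts in per_sample_counts.items():
        sd = stdev(counts) if len(counts) > 1 else 0.0
        out[group] = (mean(counts), sd)
    return out


# Hypothetical High-tier cluster counts per sample, grouped by condition.
high_counts = {"control": [4, 6, 5], "treated": [2, 3, 1]}
print(group_mean_sd(high_counts))
```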

binning-group-pca
PCA ordination of per-sample quality profiles (mean genome size, contig count, radius, peak-valley ratio, %normal, %loner, %fallback, %HQ). Features are z-score standardized before decomposition. PERMANOVA on the Euclidean distance matrix tests whether group centroids differ significantly. 95% confidence ellipses are drawn for groups with ≥3 samples.
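The numpy sketch below covers the standardization and projection steps. It assumes a samples × features matrix built from the per-sample profile above. The values are made up, and only three of the eight features are used for brevity. PCA is computed via SVD of the z-scored matrix, using the population standard deviation as scikit-learn's StandardScaler does. PERMANOVA and the confidence ellipses are not shown; scikit-bio's skbio.stats.distance.permanova is one existing implementation of PERMANOVA.

```python
import numpy as np


def standardized_pca(features: np.ndarray, n_components: int = 2):
    """Z-score each column, then project samples onto the top principal components."""
    mean = features.mean(axis=0)
    sd = features.std(axis=0, ddof=0)
    sd[sd == 0] = 1.0  # leave constant features at 0 instead of dividing by zero
    z = (features - mean) / sd
    # Rows of vt are the principal axes; squared singular values give the variance along each.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Hypothetical profiles: rows are samples; columns are mean genome size, mean contig count, mean radius.
features = np.array([
    [1.2e6, 80.0, 0.05],
    [0.9e6, 95.0, 0.06],
    [1.5e6, 60.0, 0.05],
    [1.1e6, 70.0, 0.07],
])
scores, explained = standardized_pca(features)
print(scores.shape)  # (4, 2)
```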

binning-group-cdf
Overlaid empirical cumulative distribution functions (CDFs) of a selected metric across groups. The two-sample Kolmogorov-Smirnov test quantifies the maximum vertical distance between curves — significant D-statistics indicate the groups' metric distributions differ in location, spread, or shape.
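The D statistic is the largest vertical gap between two ECDFs. Both ECDFs are step functions that only jump at observed values, so it is enough to evaluate them at the pooled data points. This pure-Python sketch computes only D. scipy.stats.ks_2samp returns the same statistic together with a p-value.

```python
from bisect import bisect_right


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: max vertical gap between the two ECDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```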

binning-group-cluster-types
Stacked bar chart of cluster kind counts (Normal / Loner / Fallback) per group, with Chi-square test. These three cluster types are defined by VAMB's latent-space density-peak clustering. Significant p-values suggest experimental conditions influence the proportion of well-separated vs. poorly-resolved clusters.
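The sketch below computes the Pearson chi-square statistic for a groups × kinds contingency table in pure Python. Dropping kinds that never occur, for example when no group has fallback clusters, is an assumed convention that avoids zero expected counts. scipy.stats.chi2_contingency also returns a p-value, but by default it applies Yates' continuity correction to 2 × 2 tables, so its statistic can differ from this uncorrected one.

```python
def chi_square(table: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a groups x kinds table."""
    # Drop kinds that never occur; they carry no information and would divide by zero.
    cols = [j for j in range(len(table[0])) if any(row[j] for row in table)]
    table = [[row[j] for j in cols] for row in table]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows are groups, columns are normal / loner / fallback.
print(chi_square([[10, 0, 0], [0, 10, 0]]))  # (20.0, 1)
```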

binning-group-total-recovery
Stacked bar chart of mean total base pairs recovered per group, split by quality tier. Error bars show ±1 SD. Measures actual data volume binned per group, complementing cluster count and recovery rate charts.
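The Python sketch below sums cluster sizes per tier within each sample and then averages those totals across the samples in each group. The input layout, a (group, [(tier, bp), ...]) pair per sample, is a hypothetical format chosen for illustration. The ±1 SD error bars could be computed from the same per-sample totals with statistics.stdev.

```python
from collections import defaultdict
from statistics import mean

TIERS = ("High", "Medium", "Low")


def mean_bp_by_tier(samples):
    """Mean total bp per sample for each group and tier.

    samples: iterable of (group, [(tier, bp), ...]) pairs, one per sample.
    """
    per_group = defaultdict(list)
    for group, clusters in samples:
        totals = defaultdict(int)
        for tier, bp in clusters:
            totals[tier] += bp  # sum cluster sizes within this sample
        per_group[group].append(totals)
    return {
        group: {tier: mean([t[tier] for t in totals_list]) for tier in TIERS}
        for group, totals_list in per_group.items()
    }


samples = [
    ("control", [("High", 2_000_000), ("Low", 50_000)]),
    ("control", [("High", 1_000_000), ("Medium", 300_000)]),
    ("treated", [("Medium", 400_000), ("Low", 80_000)]),
]
print(mean_bp_by_tier(samples)["control"])
# {'High': 1500000, 'Medium': 150000, 'Low': 25000}
```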