Comparison of tools

Name	Category	Functionality	Data Input Formats	Data Output Formats	Software Type	Licensing	Operating System	Programming Language	Key Features	Strengths	Limitations	Documentation	Community Support	Website/Citation	Organism Supported	Galaxy Access
ColabFold	Structure Prediction	Uses AF2, AF2-multimer and RoseTTAFold for protein structure prediction	Protein sequence	PDB, numpy array, JSON	Jupyter notebook on Google servers		Browser	Browser/Python	ColabFold uses Google Colab notebooks to run different versions of AlphaFold2 for structure prediction of monomers and protein complexes. The multiple sequence alignment (MSA) step is performed using the MMSeqs API (see below) and can be used to generate MSAs for other purposes. Many different versions of AF2 are available. Due to limitations in available hardware in free-tier Google Colab notebooks, it may not be feasible to fold proteins or complexes >2000 aa total.	Easy to use, no GPU or sophisticated hardware needed	Limited to GPUs available on the free tier of Google Colab. This will limit the size of the protein (or protein complexes) that can be predicted.			https://github.com/sokrypton/ColabFold	N/A	Yes
VEP	Variant effect prediction	Determines effects of variants	VCF / Variant information manually entered	VEP Format TSV	Web and Command Line	Apache 2.0	Mac / Windows / Linux	Unix / Web	VEP is Ensembl’s variant effect predictor. It integrates most common variant annotation tools and databases in a single location. It is available as a command line tool and as a web interface for a smaller subset of variants. The command line tool is installed (with numerous annotation packages) in HPC and is available for all users. Many variant effect prediction tools like AlphaMissense, SpliceAI and MTSplice are readily integrated into VEP and can be added to variant annotations.	Has a lot of tools already integrated and can be called with simple flags. The output format can be specified for easy parsing (vcf, txt, json).	Not all tools are available online; the HPC modules require command line interaction. Depending on the modules needed, it can be somewhat slow.	Detailed documentation and tutorials	https://www.ensembl.info/	https://useast.ensembl.org/info/docs/tools/vep/index.html	Many Organisms	No
MMSeqs	Sequence alignment	MMSeqs server can only be accessed through ColabFold at the moment, but a local server can also be set up if needed						API call	MMSeqs is a rapid multiple sequence alignment tool. It is several orders of magnitude faster than hmmer and jackhmmer with similar accuracy. Currently the web server is not available for use, but you can still use MMSeqs through ColabFold (see above).	Very fast, can generate sequence alignments and clusters from 1000s of organisms.	Can only be accessed through ColabFold or through an API.			https://search.mmseqs.com/	Many Organisms	No
Expasy Tools	General tools	Simple calculations for protein, DNA, RNA sequence analysis	Sequence	Depends on tool used	Web and Command Line	Depends on tool used	Mac / Windows / Linux	Unix / Web	Expasy is the Swiss Bioinformatics Resource Portal, hosted by the SIB Swiss Institute of Bioinformatics. This is not a single tool but a large collection of small tools that can be used independently from one another. There are several tools for sequence, structure and small molecule analysis, from similarity searches to simple molecular weight/isoelectric point calculation. Each tool has its own web interface with detailed documentation.	Very easy to use, with diverse tools for many different types of analysis.	These are small tools that are designed to perform quick calculations. There is no API for programmatic access.			https://www.expasy.org/	N/A	No
Alphafold 3 Server	Structure Prediction	Structure Prediction	Sequence	Zipped download + Web output	Web	Non-commercial use only (See site for full terms)	Browser	Browser	AlphaFold Server is a web service that can generate highly accurate biomolecular structure predictions containing proteins, DNA, RNA, ligands, ions, and also model chemical modifications for proteins and nucleic acids in one platform. It’s powered by the newest AlphaFold 3 model. While AF3 is publicly available pending agreement to certain license conditions, the server provides an easier-to-use alternative.	Easy to use, there are no programming or hardware requirements. Reasonably fast.	There are limitations on the number of structures that can be calculated and limitations on the types of questions that can be asked.			https://alphafoldserver.com/welcome	N/A	No
Chai1 Server	Structure Prediction	Uses Chai model for structure prediction	Protein, DNA, RNA sequence and SMILES strings	PDB, JSON	Web	Non-commercial use only (See site for full terms)	Browser	Browser	Chai1 is another protein, DNA, RNA, ligand structure prediction tool, developed by Chai Discovery. Based on their technical reports it is as accurate as AlphaFold 3 in most instances and shows better performance on antibody-protein interactions. The model is openly available on GitHub and there is a web interface for ease of use.	Same as AF3 server	Same as AF3 server	About page		https://lab.chaidiscovery.com/auth/login?callbackUrl=https%3A%2F%2Flab.chaidiscovery.com%2Fdashboard	N/A	No
PhaSePred	Protein Characterization	Protein Characterization	Protein Name / ID	Web output / downloadable results	Web	Free for non-commercial use for academic, government and non-profit institutions	Browser	Browser	Phase separation (PS) mediates the compartmentalization of proteins and nucleic acids in cells. This process is driven by multivalent weak interactions mediated by intrinsically disordered regions (IDRs) or multiple modular domains. A difference between these two interactions is that a single molecule species can undergo IDR-mediated phase separation, while phase separation mediated by multiple interacting domains often involves two or more different molecule species. PhaSePred is a centralized resource that provides self-assembling and partner-dependent phase-separating protein prediction and integrates scores from several PS-related prediction tools.	Easy to use, quick results, good visualization tools, focused on phase separation	May not cover all types of phase separation or complex interactions	Tutorial available on site	No specific community forum	http://predict.phasep.pro/	Primarily human	No
DeepLoc	Protein Localization Prediction	Predicts subcellular localization of eukaryotic proteins using deep learning models.	Protein Sequence in FASTA format	Web output / downloadable results	Web	The downloadable version of DeepLoc 2.1 is being commercialized (it is licensed for a fee to commercial users)	Browser	Browser	DeepLoc 2.0 predicts the subcellular localization(s) of eukaryotic proteins. DeepLoc 2.0 is a multi-label predictor, which means that it is able to predict one or more localizations for any given protein. It can differentiate between 10 different localizations: Nucleus, Cytoplasm, Extracellular, Mitochondrion, Cell membrane, Endoplasmic reticulum, Chloroplast, Golgi apparatus, Lysosome/Vacuole and Peroxisome. Additionally, DeepLoc 2.0 can predict the presence of the sorting signal(s) that had an influence on the prediction of the subcellular localization(s). One can use DeepLocPro for prokaryotic proteins and DeepLocRNA for RNA localization.	High accuracy, supports many localization types.	Limited to eukaryotic proteins, performance may vary based on sequence data quality.	Brief instructions section	No specific community forum	https://services.healthtech.dtu.dk/services/DeepLoc-2.1/	Eukaryotes	No
Phobius	Prediction	Predicts transmembrane topology and signal peptides in proteins.	Protein Sequence in FASTA format	Web output / downloadable results	Web	Phobius is freely available for local installation for academic use	Browser	Browser	Phobius is a tool that predicts the transmembrane topology and signal peptides of a protein from its amino acid sequence. It can identify membrane, signal peptide, or cytoplasmic loop states with a single label. It can also force the predictor to choose between two types of features to improve discrimination. Phobius is available for free local installation for academic use on Unix platforms with Perl version 5.6 or later. It can also be accessed through the Phobius web server.	Accurate transmembrane prediction, user-friendly interface	Primarily limited to membrane proteins, not suitable for predicting other structural features.	Instructions section	No specific community forum	https://phobius.sbc.su.se/index.html	Eukaryotes	No
ProtVar	Variant Annotation	Provides information on the potential functional impact of protein variants, particularly related to disease.	Genomic or Protein variant location	Web output / downloadable results	Web	Creative Commons license	Browser	Browser	ProtVar (Protein Variation) is a resource to investigate SNV missense variation (not InDels) in humans by presenting annotations which may be relevant to interpretation. The tool is similar to VEP. It is easier to use but not as robust as VEP (see above).	High relevance to disease research, integrates multiple data sources	Primarily focused on human data, limited to annotated variants	Webinar tutorial video	No specific community forum	https://www.ebi.ac.uk/ProtVar/	Primarily human	No
GENCODE	Genome database	Provides comprehensive and high-quality annotations of human and mouse genomes, including gene models and transcript variants.	No input	GTF, GFF, FASTA	Web	Open (See: https://www.ebi.ac.uk/about/terms-of-use)	Browser	Browser	GENCODE is a project that aims to identify and annotate all protein-coding genes in the Human and Mouse genome, using a combination of computational analysis, targeted experimental approaches and manual curation. GENCODE also contains gene annotation files, amino acid and nucleotide sequence files of known proteins and transcripts. The project releases updates to its database multiple times per year and all data are available for download.	Actively updated	Limited to human and mouse species, large dataset may be cumbersome for some users.	Documentation on data output types	Github Page	https://www.gencodegenes.org/	Human and Mouse	No
GTEx	Expression database	Provides gene expression data across different tissues from healthy human donors to understand genetic regulation of gene expression.	Gene ID / name	Web output / downloadable results	Web	Open licensing (see: https://gtexportal.org/home/license)	Browser	Browser	GTEx is the Adult Genotype Tissue Expression (GTEx) project and aims to study human gene expression and regulation, and its relationship to genetic variation across multiple diverse tissues and individuals. More specifically, GTEx includes data from RNA-seq, snRNA-Seq, long-read RNA-seq, QTL (single and multi-tissue), histology, protein expression, methylation QTLs, and variant detections, across various tissue types and individuals. These data can be downloaded locally or visualized in a browser. The online browsers can be used to visualize bulk RNA-Seq expression, snRNA-Seq expression and Tissue & Histology data. There is also dGTEx, which aims to study development-specific genetic effects on gene expression, as well as NHP dGTEx, which studies development-specific genetic effects on gene expression in non-human primates. Both dGTEx and NHP dGTEx are still underway, with no data released yet.	Rich resource for gene expression analysis across tissues.	Data only from healthy donors, doesn’t cover disease-specific data.	How-to videos and tutorials	Github Page	https://gtexportal.org/home/	Human	No
Alphafold Database	Structure Prediction	Provides predicted protein structures generated by AlphaFold for a wide range of species.	Protein / Gene name or Protein sequence	Web output / PDB downloads	Web	Creative Commons Attribution 4.0	Browser	Browser	Google DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI) have partnered to create AlphaFold DB to make these predictions freely available to the scientific community. The latest database release contains over 200 million entries, providing broad coverage of UniProt (the standard repository of protein sequences and annotations). They provide individual downloads for the human proteome and for the proteomes of 47 other key organisms important in research and global health.	State-of-the-art predictions, large-scale coverage across species.	Models based on predictions, not experimental data.	FAQ and About page	Github and Google Groups	https://alphafold.ebi.ac.uk/	48 complete proteomes (including Human)	Yes
ESM Atlas	Structure Database	Provides a large-scale resource for protein embeddings, using ESM (Evolutionary Scale Modeling) to predict protein sequences and functions.	Protein Sequence in FASTA format	Web output / downloadable results	Web	CC BY 4.0 license	Browser	Browser	Predicts protein properties, evolutionary embeddings	High-quality embeddings, useful for functional annotation.	Focus on sequence-function relationships; may not be fully predictive for all proteins.	About page for documentation	Github Page	https://esmatlas.com/	Eukaryotes	No
Interpro	Protein Function, Annotation	Integrates diverse protein family, domain, and functional site information to annotate protein sequences.	Protein Sequence in FASTA format / Protein ID	Web output / downloadable results	Web	CC0 1.0 Universal (CC0 1.0) Public Domain Dedication.	Browser	Browser	InterPro is a database which contains functional analysis of proteins and contains classification of protein families and predictions of domains and sites. InterPro integrates different member databases into a larger InterPro consortium, so searching for proteins can be aggregated across all the databases. Users can search for proteins by sequence, protein name or search for specific domain architectures of interest. Additionally, data from entire proteomes or member databases can be exported locally.	Wide coverage of protein domains, integrates well with other tools.	Somewhat complex interface for new users.	Quick tour guides and documentation pages	No specific community forum	https://www.ebi.ac.uk/interpro/	Many Organisms	Yes
STRING	Protein-Protein Interaction	Provides known and predicted protein-protein interactions (PPIs) for a wide range of organisms.	Protein Name / ID	Web output / downloadable results	Web	Creative Commons BY 4.0 license.	Browser	Browser	STRING is a database of known and predicted protein-protein interactions. The interactions include direct (physical) and indirect (functional) associations; they stem from computational prediction, from knowledge transfer between organisms, and from interactions aggregated from other (primary) databases.	Rich network of interactions, integrates with other databases.	Predictive data may not always be as reliable as experimental data.	User documentation and help videos	No specific community forum	https://string-db.org/	Many Organisms	No
PhosphoSitePlus	Post-translational Modifications (PTMs)	Provides information on experimentally verified phosphorylation sites, as well as other PTMs.	Protein Sequence in FASTA format / Protein ID	Web output / downloadable results	Web	Licensing Agreement (see: https://www.phosphosite.org/staticLicensing)	Browser	Browser	PhosphoSitePlus is an online database which contains data for the study of protein post-translational modifications. PhosphoSitePlus contains data on post-translational modifications such as phosphorylation, acetylation, methylation, ubiquitination, and O-glycosylation. The database includes data from various diseases, tissue types and cell lines, as well as built-in motif analysis, sequence logo analysis and kinase predictions. Proteins can be searched for uniquely or comparatively by sequence or various specific sites, and there are also options to download datasets locally. PhosphoSitePlus uses publicly available data from numerous published journal articles.	Extensive, experimental data on PTMs.	Primarily focused on human data, may not cover all PTMs comprehensively.	About page with broad overview of features	No specific community forum	https://www.phosphosite.org/homeAction.action	Human, mouse and rat	No
DisProt	Protein Structure, Disorder Prediction	Database of intrinsically disordered proteins (IDPs) and disordered regions within proteins.	Protein Sequence in FASTA format / Protein ID	Web output / downloadable results	Web	Creative Commons Attribution 4.0 International License	Browser	Browser	DisProt is the major manually curated repository of Intrinsically Disordered Proteins, both for structural and functional aspects. Expert curators are involved in collecting experimentally confirmed biological data, valuable for the scientific community, and for updating and maintaining DisProt over time.	High-quality, experimentally verified data on intrinsically disordered proteins.	Limited coverage compared to complete proteome analysis; focuses on disordered regions only.	Courses offered virtually or in person + documentation page	No specific community forum	https://disprot.org/	Human and other model organisms	No
VASTdb	Alternative Splicing	Alternative Splicing (AS) profiles across multiple tissue and cell types	Gene IDs / Coordinates	Web output / downloadable results	Web	Creative Commons License (Attribution-NonCommercial 4.0 International)	Browser	Browser	VastDB is a database of Alternative Splicing (AS) profiles across multiple tissue and cell types. VastDB contains AS events (including cassette exons, microexons, alternative 5′ and 3′ splice sites and retained introns) from various species. AS event identification and sequence inclusion level quantification in RNA-seq samples have been performed with VAST-TOOLS. In addition to AS inclusion levels, VastDB provides general information about the AS events, including genomic and sequence context, impact on the reading frame, overlap with protein domains and disordered regions, mapping to protein structures, evolutionary conservation and primers for AS event validation through RT-PCR. Moreover, it also provides measures of Gene Expression, using the cRPKM metric.	Alternative Splicing (AS) profiles across multiple tissue and cell types	Limited number of organisms	FAQ page	No specific community forum	https://vastdb.crg.eu/wiki/Main_Page	Human, mouse, rat, cow, chicken, zebrafish and fruit fly	No
BioGRID	Protein-Protein Interactions	Provides an interaction repository that catalogs experimental data on protein-protein and genetic interactions across multiple organisms.	Protein / Gene IDs	Web output / downloadable results	Web	Open (see: https://wiki.thebiogrid.org/doku.php/terms_and_conditions)	Browser	Browser	The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (thebiogrid.org). BioGRID currently holds over 1,740,000 interactions curated from both high-throughput datasets and individual focused studies, as derived from over 70,000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (S. cerevisiae), fission yeast (S. pombe) and thale cress (A. thaliana), and efforts to expand curation across multiple metazoan species are underway.	Large dataset of experimentally verified interactions, supports many organisms.	Limited to experimental interactions, may miss predictions or hypothetical interactions.	Wiki pages	Github Page	https://thebiogrid.org/	Many Organisms	No
Intact	Molecular Interaction	Database of molecular interactions, focusing on protein-protein interactions (PPIs), including both experimentally determined and predicted interactions.	Protein / Gene IDs	Web output / downloadable results	Web	Creative Commons Attribution 4.0 International (CC BY 4.0) License and Apache License, Version 2.0	Browser	Browser	IntAct provides a free, open source database system and analysis tools for molecular interaction data. All interactions are derived from literature curation or direct user submissions.	Robust data integration, supports many organisms, extensive molecular interaction data.	Primarily focused on molecular interactions, not all interactions are experimentally verified.	Detailed user guide	No specific community forum	https://www.ebi.ac.uk/intact/home	Many Organisms	No
Complex Portal	Macromolecular Complexes	Provides detailed information on the composition and interactions of protein complexes.	Protein / Gene IDs	Web output / downloadable results	Web	Creative Commons Public Domain (CC0) License and Apache License, Version 2.0	Browser	Browser	The Complex Portal is an encyclopaedic resource of macromolecular complexes from a number of key model organisms. In addition to the expert manually curated complexes, the portal now holds high-confidence machine-learning predicted human complexes from hu.MAP3.0 and MuSIC. All data is freely available for search and download.	Focus on protein complexes, integrates data from multiple sources.	Primarily human-centric, may not cover all types of protein complexes.	Detailed user guide	No specific community forum	https://www.ebi.ac.uk/complexportal/home	Human	No
Human Protein Atlas	Expression database	Provides information on the expression profiles of proteins in human tissues, organs, and cell lines.	Protein / Gene IDs	Web output / downloadable results	Web	Creative Commons Attribution-ShareAlike 4.0 International License	Browser	Browser	The Human Protein Atlas is a Swedish-based program initiated in 2003 with the aim to map all the human proteins in cells, tissues, and organs using an integration of various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics, and systems biology. All the data in the knowledge resource is open access to allow scientists both in academia and industry to freely access the data for exploration of the human proteome.	Rich tissue expression data, visualization tools for protein localization.	Focused mainly on human tissues, limited for non-human species.	About page	No specific community forum	https://www.proteinatlas.org/	Human	No
Allen Brain Map	Expression database	Provides data on gene expression and brain activity in human and other species’ brains, with spatial resolution.	Cell / Tissue types / Gene IDs	Web output / downloadable results	Web	Open (See: Terms of Use)	Browser	Browser	The Allen Brain Atlas is a free, online resource that maps gene expression, connectivity, and neuroanatomical information for the brains of mice, humans, and non-human primates. Data modalities such as gene expression and neural connectivity are deposited to the atlas by researchers. All data is available for download, both as raw data and as processed and annotated data.	Excellent resource for neurobiology research, detailed brain maps.		Tutorial videos available	No specific community forum	https://portal.brain-map.org/	Human and Mouse	No
Ensembl	Genome database	Provides genome-wide annotations, variants, gene expression data, and regulatory information across many species.	Gene ID / name / Coordinates / Variant information	Web output / downloadable results	Web	Apache 2.0 software license	Browser	Browser	Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data. Ensembl tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species.	Highly versatile, covers many species, integration with genome browsers.	The vast amount of data can be overwhelming, may require significant computing power for large-scale analyses.	Detailed documentation page / wiki	No specific community forum	https://useast.ensembl.org/index.html	Many Organisms	Yes
Human Cell Atlas	Expression database	Maps the gene expression profiles of human cells, providing high-resolution data on cell types and states.	Gene IDs / Tissue types	Web output / downloadable results	Web	Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	Browser	Browser	The Human Cell Atlas is a global consortium that is mapping every cell type in the human body, creating a 3-dimensional Atlas of human cells to transform our understanding of biology and disease. The Atlas is likely to lead to major advances in the way illnesses are diagnosed and treated. Data from many different tissues, labs and donors are freely available.	Highly detailed, high-resolution data; major resource for cell biology.	Focused primarily on human cells, may require advanced computational tools for analysis.	Detailed guides	Slack and Github	https://www.humancellatlas.org/	Human	No
Genomic Data Commons	Genomics, Cancer Data	Provides genomic, transcriptomic, and clinical data primarily for cancer research, integrated from large-scale cancer genome studies.	No input	Web output / downloadable results	Web	See: https://gdc.cancer.gov/about-gdc/gdc-policies	Browser	Browser	The NCI’s Genomic Data Commons (GDC) provides the cancer research community with a repository and computational platform for cancer researchers who need to understand cancer, its clinical progression, and response to therapy. The GDC supports several cancer genome programs at the NCI Office of Cancer Genomics (OCG), including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET). Data from more than 40,000 cases are available for download.	Large dataset, highly relevant for cancer research.	Focused primarily on cancer genomics; may not be as broadly applicable to non-cancer research.	Detailed About the data section	No specific community forum	https://gdc.cancer.gov/	Human	Yes
Pediatric Cancer Platform	Genome database	Provides genomic and clinical data for pediatric cancer research, including patient data, somatic mutations, and expression profiles.	Gene ID / Tissue Type / Study Type	Web output / downloadable results	Web	Open (See: https://www.stjude.org/legal.html)	Browser	Browser	The PeCan platform presents curated pediatric cancer genomics data including variants, mutational signatures, and gene expression data in addition to histological slide images* from ~9000 hematological, CNS, and non-CNS solid tumor patient samples. Data can be explored via a series of data facets containing both retrospective and prospective study cohorts from St. Jude Children’s Research Hospital and other trusted institutions and research centers around the world.	Highly valuable for pediatric cancer research, provides extensive patient data.	Niche focus on pediatric cancers, limited scope for non-cancer data.	Documentation on data output types	No specific community forum	https://pecan.stjude.cloud/	Human	No
MaveDB	Protein Function, Evolutionary Analysis	Database of multiplexed assay of variant effects (MAVEs), analyzing how genetic variants affect protein function.	Gene ID / Protein ID / Coordinates / Variant information	Web output / downloadable results	Web	MaveDB is open-source, released under the AGPLv3 license.	Browser	Browser	MaveDB is a public repository for datasets from Multiplexed Assays of Variant Effect (MAVEs), such as those generated by deep mutational scanning (DMS) or massively parallel reporter assay (MPRA) experiments.	Focuses on evolutionary analysis of protein function.	Primarily for protein variants, may not be suitable for large-scale genome-wide studies.	Documentation page for tutorials	No specific community forum	https://www.mavedb.org/	Human	No
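
Because the comparison above is tab-separated with a single header row, it can be filtered programmatically, for example to list the tools marked as available in Galaxy. The following is a minimal sketch, assuming the table has been saved as tab-separated text; the function names `load_tools` and `filter_tools` and the three-column `SAMPLE_TSV` excerpt are illustrative only and are not part of any of the tools listed.

```python
import csv
import io

# Tiny three-column excerpt of the table, used purely for illustration.
SAMPLE_TSV = (
    "Name\tCategory\tGalaxy Access\n"
    "ColabFold\tStructure Prediction\tYes\n"
    "VEP\tVariant effect prediction\tNo\n"
    "Interpro\tProtein Function, Annotation\tYes\n"
)


def load_tools(tsv_text):
    """Parse tab-separated text with a header row into a list of dicts."""
    # QUOTE_NONE keeps quotation marks inside free-text cells as literal text.
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [dict(row) for row in reader]


def filter_tools(rows, column, value):
    """Return rows whose `column` equals `value`, ignoring case and padding."""
    wanted = value.strip().lower()
    return [r for r in rows if (r.get(column) or "").strip().lower() == wanted]


if __name__ == "__main__":
    tools = load_tools(SAMPLE_TSV)
    for tool in filter_tools(tools, "Galaxy Access", "Yes"):
        print(tool["Name"])
```

To run this against the full table, the text of a saved file could be passed to `load_tools` in place of `SAMPLE_TSV`; the column names must match the header row exactly.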