Comparison of tools

Name Category Functionality Data Input Formats Data Output Formats Software Type Licensing Operating System Programming Language Key Features Strengths Limitations Documentation Community Support Website/Citation Organism Supported HPC Module Galaxy Access
ColabFold Structure Prediction Uses AF2, AF2-multimer and RoseTTAFold for protein structure prediction Protein sequence PDB, numpy array, json Jupyter notebook on Google servers   Browser Browser/Python ColabFold uses Google Colab notebooks to run different versions of AlphaFold2 for structure prediction of monomers and protein complexes. The multiple sequence alignment (MSA) step is performed using the MMSeqs API (see below) and can be used to generate MSAs for other purposes. Many different versions of AF2 are available. Due to limitations in the hardware available in free-tier Google Colab notebooks, it may not be feasible to fold proteins or complexes >2000 aa total. Easy to use, no GPU or sophisticated hardware needed Limited to GPUs available on the free tier of Google Colab. This will limit the size of the protein (or protein complexes) that can be predicted.     https://github.com/sokrypton/ColabFold N/A   Yes
VEP Variant effect prediction Determines effects of variants VCF / Variant information manually entered VEP Format TSV Web and Command Line Apache 2.0 Mac / Windows / Linux Unix / Web VEP is Ensembl’s Variant Effect Predictor. It integrates most common variant annotation tools and databases in a single location. It is available as a command line tool and as a web interface for a smaller subset of variants. The command line tool is installed (with numerous annotation packages) on the HPC and is available to all users. Many variant effect prediction tools such as AlphaMissense, SpliceAI and MTSplice are readily integrated into VEP and can be added to variant annotations. Has a lot of tools already integrated that can be called with simple flags. The output format can be specified for easy parsing (vcf, txt, json). Not all tools are available online; the HPC modules require command line interaction. Depending on the modules needed, it can be somewhat slow. Detailed documentation and tutorials https://www.ensembl.info/ https://useast.ensembl.org/info/docs/tools/vep/index.html Many Organisms   No
MMSeqs Sequence alignment MMSeqs server can only be accessed through ColabFold at the moment, but a local server can also be set up if needed           API call MMSeqs is a rapid multiple sequence alignment tool. It is several orders of magnitude faster than hmmer and jackhmmer with similar accuracy. Currently the web server is not available for use, but you can still use MMSeqs through ColabFold (see above). Very fast, can generate sequence alignments and clusters from 1000s of organisms. Can only be accessed through ColabFold or through an API.     https://search.mmseqs.com/ Many Organisms   No
Expasy Tools General tools Simple calculations for protein, DNA, RNA sequence analysis Sequence Depends on tool used Web and Command Line Depends on tool used Mac / Windows / Linux Unix / Web Expasy is the Swiss Bioinformatics Resource Portal, operated by the SIB Swiss Institute of Bioinformatics. This is not a single tool but a large collection of small tools that can be used independently of one another. There are several tools for sequence, structure and small molecule analysis, from similarity searches to simple molecular weight/isoelectric point calculation. Each tool has its own web interface with detailed documentation. Very easy to use, with diverse tools for many different types of analysis. These are small tools that are designed to perform quick calculations. There is no API for programmatic access.     https://www.expasy.org/ N/A   No
Alphafold 3 Server Structure Prediction Structure Prediction Sequence Zipped download + Web output Web Non-commercial use only (See site for full terms) Browser Browser AlphaFold Server is a web service that can generate highly accurate biomolecular structure predictions containing proteins, DNA, RNA, ligands, ions, and also model chemical modifications for proteins and nucleic acids in one platform. It’s powered by the newest AlphaFold 3 model. While AF3 is publicly available pending agreement to certain license conditions, the server provides an easier-to-use alternative. Easy to use, there are no programming or hardware requirements. Reasonably fast. There are limitations on the number of structures that can be calculated and limitations on the types of questions that can be asked.     https://alphafoldserver.com/welcome N/A   No
Chai1 Server Structure Prediction Uses Chai model for structure prediction protein, DNA, RNA sequence and SMILES strings pdb, json Web Non-commercial use only (See site for full terms) Browser Browser Chai1 is another protein, DNA, RNA, ligand structure prediction tool, developed by Chai Discovery. Based on their technical reports it is as accurate as AlphaFold 3 in most instances and shows better performance on antibody-protein interactions. The model is openly available on GitHub and there is a web interface for ease of use. Same as AF3 server Same as AF3 server About page   https://lab.chaidiscovery.com/auth/login?callbackUrl=https%3A%2F%2Flab.chaidiscovery.com%2Fdashboard N/A   No
PhaSePred Protein Characterization Protein Characterization Protein Name / ID Web output / downloadable results Web free for non-commercial use for academic, government and non-profit institutions Browser Browser Phase separation (PS) mediates the compartmentalization of proteins and nucleic acids in cells. This process is driven by multivalent weak interactions mediated by intrinsically disordered regions (IDRs) or multiple modular domains. A difference between these two interactions is that a single molecule species can undergo IDR-mediated phase separation, while phase separation mediated by multiple interacting domains often involves two or more different molecule species. PhaSePred is a centralized resource that provides self-assembling and partner-dependent phase-separating protein prediction and integrates scores from several PS-related prediction tools. Easy to use, quick results, good visualization tools, focused on phase separation May not cover all types of phase separation or complex interactions Tutorial available on site No specific community forum http://predict.phasep.pro/ Primarily human   No
DeepLoc Protein Localization Prediction Predicts subcellular localization of eukaryotic proteins using deep learning models. Protein Sequence in FASTA format Web output / downloadable results Web The downloadable version of DeepLoc 2.1 is being commercialized (it is licensed for a fee to commercial users) Browser Browser DeepLoc 2.0 predicts the subcellular localization(s) of eukaryotic proteins. DeepLoc 2.0 is a multi-label predictor, which means that it is able to predict one or more localizations for any given protein. It can differentiate between 10 different localizations: Nucleus, Cytoplasm, Extracellular, Mitochondrion, Cell membrane, Endoplasmic reticulum, Chloroplast, Golgi apparatus, Lysosome/Vacuole and Peroxisome. Additionally, DeepLoc 2.0 can predict the presence of the sorting signal(s) that had an influence on the prediction of the subcellular localization(s). One can use DeepLocPro for prokaryotic proteins and DeepLocRNA for RNA localization. High accuracy, supports many localization types. Limited to eukaryotic proteins, performance may vary based on sequence data quality. Brief instructions section No specific community forum https://services.healthtech.dtu.dk/services/DeepLoc-2.1/ Eukaryotes   No
Phobius Prediction Predicts transmembrane topology and signal peptides in proteins. Protein Sequence in FASTA format Web output / downloadable results Web Phobius is freely available for local installation for academic use Browser Browser Phobius is a tool that predicts the transmembrane topology and signal peptides of a protein from its amino acid sequence. It can identify membrane, signal peptide, or cytoplasmic loop states with a single label. It can also force the predictor to choose between two types of features to improve discrimination. Phobius is available for free local installation for academic use on Unix platforms with Perl version 5.6 or later. It can also be accessed through the Phobius web server. Accurate transmembrane prediction, user-friendly interface Primarily limited to membrane proteins, not suitable for predicting other structural features. Instructions section No specific community forum https://phobius.sbc.su.se/index.html Eukaryotes   No
ProtVar Variant Annotation Provides information on the potential functional impact of protein variants, particularly related to disease. Genomic or Protein variant location Web output / downloadable results Web Creative Commons license Browser Browser ProtVar (Protein Variation) is a resource to investigate SNV missense variation (not InDels) in humans by presenting annotations which may be relevant to interpretation. The tool is similar to VEP. It is easier to use but not as robust as VEP (see above). High relevance to disease research, integrates multiple data sources Primarily focused on human data, limited to annotated variants Webinar tutorial video No specific community forum https://www.ebi.ac.uk/ProtVar/ Primarily human   No
GENCODE Genome database Provides comprehensive and high-quality annotations of human and mouse genomes, including gene models and transcript variants. No input GTF, GFF, FASTA Web Open (See: https://www.ebi.ac.uk/about/terms-of-use) Browser Browser GENCODE is a project that aims to identify and annotate all protein-coding genes in the Human and Mouse genome, using a combination of computational analysis, targeted experimental approaches and manual curation. GENCODE also contains gene annotation files, amino acid and nucleotide sequence files of known proteins and transcripts. The project releases updates to its database multiple times per year and all data are available for download. Actively updated Limited to human and mouse species, large dataset may be cumbersome for some users. Documentation on data output types Github Page https://www.gencodegenes.org/ Human and Mouse   No
GTEx Expression database Provides gene expression data across different tissues from healthy human donors to understand genetic regulation of gene expression. Gene ID / name Web output / downloadable results Web Open licensing (see: https://gtexportal.org/home/license) Browser Browser The Adult Genotype-Tissue Expression (GTEx) project aims to study human gene expression and regulation, and its relationship to genetic variation across multiple diverse tissues and individuals. More specifically, GTEx includes data from RNA-seq, snRNA-Seq, long-read RNA-seq, QTL (single and multi-tissue), histology, protein expression, methylation QTLs, and variant detections, across various tissue types and individuals. These data can be downloaded locally or visualized in a browser. The online browsers can be used to visualize bulk RNA-Seq expression, snRNA-Seq expression and Tissue & Histology data. There is also dGTEx, which aims to study development-specific genetic effects on gene expression, as well as NHP dGTEx, which studies development-specific genetic effects on gene expression in non-human primates. Both dGTEx and NHP dGTEx are still underway, with no data released yet. Rich resource for gene expression analysis across tissues. Data only from healthy donors, doesn’t cover disease-specific data. How to videos and tutorials Github Page https://gtexportal.org/home/ Human   No
Alphafold Database Structure Prediction Provides predicted protein structures generated by AlphaFold for a wide range of species. Protein / Gene name or Protein sequence Web output / PDB downloads Web Creative Commons Attribution 4.0 Browser Browser Google DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI) have partnered to create AlphaFold DB to make these predictions freely available to the scientific community. The latest database release contains over 200 million entries, providing broad coverage of UniProt (the standard repository of protein sequences and annotations). They provide individual downloads for the human proteome and for the proteomes of 47 other key organisms important in research and global health. State-of-the-art predictions, large-scale coverage across species. Models based on predictions, not experimental data. FAQ and About page Github and Google Groups https://alphafold.ebi.ac.uk/ 48 complete proteomes (including Human)   Yes
ESM Atlas Structure Database Provides a large-scale resource for protein embeddings, using ESM (Evolutionary Scale Modeling) to predict protein sequences and functions. Protein Sequence in FASTA format Web output / downloadable results Web CC BY 4.0 license Browser Browser Predicts protein properties, evolutionary embeddings High-quality embeddings, useful for functional annotation. Focus on sequence-function relationships; may not be fully predictive for all proteins. About page for documentation Github Page https://esmatlas.com/ Eukaryotes   No
Interpro Protein Function, Annotation Integrates diverse protein family, domain, and functional site information to annotate protein sequences. Protein Sequence in FASTA format / Protein ID Web output / downloadable results Web CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. Browser Browser InterPro is a database which contains functional analysis of proteins and contains classification of protein families and predictions of domains and sites. InterPro integrates different member databases into a larger InterPro consortium, so searching for proteins can be aggregated across all the databases. Users can search for proteins by sequence, protein name or search for specific domain architectures of interest. Additionally, data from entire proteomes or member databases can be exported locally. Wide coverage of protein domains, integrates well with other tools. Somewhat complex interface for new users. Quick tour guides and documentation pages No specific community forum https://www.ebi.ac.uk/interpro/ Many Organisms   Yes
STRING Protein-Protein Interaction Provides known and predicted protein-protein interactions (PPIs) for a wide range of organisms. Protein Name / ID Web output / downloadable results Web Creative Commons BY 4.0 license. Browser Browser STRING is a database of known and predicted protein-protein interactions. The interactions include direct (physical) and indirect (functional) associations; they stem from computational prediction, from knowledge transfer between organisms, and from interactions aggregated from other (primary) databases. Rich network of interactions, integrates with other databases. Predictive data may not always be as reliable as experimental data. User documentation and help videos No specific community forum https://string-db.org/ Many Organisms   No
PhosphoSitePlus Post-translational Modifications (PTMs) Provides information on experimentally verified phosphorylation sites, as well as other PTMs. Protein Sequence in FASTA format / Protein ID Web output / downloadable results Web Licensing Agreement (see: https://www.phosphosite.org/staticLicensing) Browser Browser PhosphoSitePlus is an online database which contains data for the study of protein post-translational modifications. PhosphoSitePlus contains data on post-translational modifications such as phosphorylation, acetylation, methylation, ubiquitination, and O-glycosylation. The database includes data from various diseases, tissue types and cell lines as well as built-in motif analysis, sequence logo analysis and kinase predictions. Proteins can be searched for uniquely or comparatively by sequence or various specific sites and there are also options to download datasets locally. PhosphoSitePlus draws on publicly available data from numerous published journal articles. Extensive, experimental data on PTMs. Primarily focused on human data, may not cover all PTMs comprehensively. About page with broad overview of features No specific community forum https://www.phosphosite.org/homeAction.action Human, mouse and rat   No
DisProt Protein Structure, Disorder Prediction Database of intrinsically disordered proteins (IDPs) and disordered regions within proteins. Protein Sequence in FASTA format / Protein ID Web output / downloadable results Web Creative Commons Attribution 4.0 International License Browser Browser DisProt is the major manually curated repository of Intrinsically Disordered Proteins, both for structural and functional aspects. Expert curators are involved in collecting experimentally confirmed biological data, valuable for the scientific community, and for updating and maintaining DisProt over time. High-quality, experimentally verified data on intrinsically disordered proteins. Limited coverage compared to complete proteome analysis; focuses on disordered regions only. Courses offered virtually or in person + documentation page No specific community forum https://disprot.org/ Human and other model organisms   No
VastDB Alternative Splicing Alternative Splicing (AS) profiles across multiple tissue and cell types Gene IDs / Coordinates Web output / downloadable results Web Creative Commons License (Attribution-NonCommercial 4.0 International) Browser Browser VastDB is a database of Alternative Splicing (AS) profiles across multiple tissue and cell types. VastDB contains AS events (including cassette exons, microexons, alternative 5′ and 3′ splice sites and retained introns) from various species. AS event identification and sequence inclusion level quantification in RNA-seq samples have been performed with VAST-TOOLS. In addition to AS inclusion levels, VastDB provides general information about the AS events, including genomic and sequence context, impact on the reading frame, overlap with protein domains and disordered regions, mapping to protein structures, evolutionary conservation and primers for AS event validation through RT-PCR. Moreover, it also provides measures of Gene Expression, using the cRPKM metric. Alternative Splicing (AS) profiles across multiple tissue and cell types Limited number of organisms FAQ page No specific community forum https://vastdb.crg.eu/wiki/Main_Page Human, mouse, rat, cow, chicken, zebrafish and fruit fly   No
BioGRID Protein-Protein Interactions Provides an interaction repository that catalogs experimental data on protein-protein and genetic interactions across multiple organisms. Protein / Gene IDs Web output / downloadable results Web Open (see: https://wiki.thebiogrid.org/doku.php/terms_and_conditions) Browser Browser The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (thebiogrid.org). BioGRID currently holds over 1,740,000 interactions curated from both high-throughput datasets and individual focused studies, as derived from over 70,000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (S. cerevisiae), fission yeast (S. pombe) and thale cress (A. thaliana), and efforts to expand curation across multiple metazoan species are underway. Large dataset of experimentally verified interactions, supports many organisms. Limited to experimental interactions, may miss predictions or hypothetical interactions. Wiki pages Github Page https://thebiogrid.org/ Many Organisms   No
Intact Molecular Interaction Database of molecular interactions, focusing on protein-protein interactions (PPIs), including both experimentally determined and predicted interactions. Protein / Gene IDs Web output / downloadable results Web  Creative Commons Attribution 4.0 International (CC BY 4.0) License and Apache License, Version 2.0 Browser Browser IntAct provides a free, open source database system and analysis tools for molecular interaction data. All interactions are derived from literature curation or direct user submissions. Robust data integration, supports many organisms, extensive molecular interaction data. Primarily focused on molecular interactions, not all interactions are experimentally verified. Detailed user guide No specific community forum https://www.ebi.ac.uk/intact/home Many Organisms   No
Complex Portal Macromolecular Complexes Provides detailed information on the composition and interactions of protein complexes. Protein / Gene IDs Web output / downloadable results Web Creative Commons Public Domain (CC0) License and Apache License, Version 2.0 Browser Browser The Complex Portal is an encyclopaedic resource of macromolecular complexes from a number of key model organisms. In addition to the expert manually curated complexes, the portal now holds high-confidence machine-learning predicted human complexes from hu.MAP3.0 and MuSIC. All data is freely available for search and download. Focus on protein complexes, integrates data from multiple sources. Primarily human-centric, may not cover all types of protein complexes. Detailed user guide No specific community forum https://www.ebi.ac.uk/complexportal/home Human   No
Human Protein Atlas Expression database Provides information on the expression profiles of proteins in human tissues, organs, and cell lines. Protein / Gene IDs Web output / downloadable results Web Creative Commons Attribution-ShareAlike 4.0 International License Browser Browser The Human Protein Atlas is a Swedish-based program initiated in 2003 with the aim to map all the human proteins in cells, tissues, and organs using an integration of various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics, and systems biology. All the data in the knowledge resource is open access to allow scientists both in academia and industry to freely access the data for exploration of the human proteome. Rich tissue expression data, visualization tools for protein localization. Focused mainly on human tissues, limited for non-human species. About page No specific community forum https://www.proteinatlas.org/ Human   No
Allen Brain Map Expression database Provides data on gene expression and brain activity in human and other species’ brains, with spatial resolution. Cell / Tissue types / Gene IDs Web output / downloadable results Web Open (See: Terms of Use) Browser Browser The Allen Brain Atlas is a free, online resource that maps gene expression, connectivity, and neuroanatomical information for the brains of mice, humans, and non-human primates. Data modalities such as gene expression and neural connectivity are deposited to the atlas by researchers. All data are available for download, both as raw data and as processed and annotated data. Excellent resource for neurobiology research, detailed brain maps.   Tutorial videos available No specific community forum https://portal.brain-map.org/ Human and Mouse   No
Ensembl Genome database Provides genome-wide annotations, variants, gene expression data, and regulatory information across many species. Gene ID / name / Coordinates / Variant information Web output / downloadable results Web Apache 2.0 software license Browser Browser Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function and collects disease data. Ensembl tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species. Highly versatile, covers many species, integration with genome browsers. The vast amount of data can be overwhelming, may require significant computing power for large-scale analyses. Detailed documentation page / wiki No specific community forum https://useast.ensembl.org/index.html Many Organisms   Yes
Human Cell Atlas Expression database Maps the gene expression profiles of human cells, providing high-resolution data on cell types and states. Gene IDs / Tissue types Web output / downloadable results Web Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Browser Browser The Human Cell Atlas is a global consortium that is mapping every cell type in the human body, creating a 3-dimensional Atlas of human cells to transform our understanding of biology and disease. The Atlas is likely to lead to major advances in the way illnesses are diagnosed and treated. Data are freely available from many different tissues, labs and donors. Highly detailed, high-resolution data; major resource for cell biology. Focused primarily on human cells, may require advanced computational tools for analysis. Detailed guides Slack and Github https://www.humancellatlas.org/ Human   No
Genomics Data Commons Genomics, Cancer Data Provides genomic, transcriptomic, and clinical data primarily for cancer research, integrated from large-scale cancer genome studies. No input Web output / downloadable results Web See: https://gdc.cancer.gov/about-gdc/gdc-policies Browser Browser The NCI’s Genomic Data Commons (GDC) provides the cancer research community with a repository and computational platform for cancer researchers who need to understand cancer, its clinical progression, and response to therapy. The GDC supports several cancer genome programs at the NCI Office of Cancer Genomics (OCG), including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET). Data from more than 40000 cases are available for download. Large dataset, highly relevant for cancer research. Focused primarily on cancer genomics; may not be as broadly applicable to non-cancer research. Detailed About the data section No specific community forum https://gdc.cancer.gov/ Human   Yes
Pediatric Cancer Platform Genome database Provides genomic and clinical data for pediatric cancer research, including patient data, somatic mutations, and expression profiles. Gene ID / Tissue Type / Study Type Web output / downloadable results Web Open (See: https://www.stjude.org/legal.html) Browser Browser The PeCan platform presents curated pediatric cancer genomics data including variants, mutational signatures, and gene expression data in addition to histological slide images* from ~9000 hematological, CNS, and non-CNS solid tumor patient samples. Data can be explored via a series of data facets containing both retrospective and prospective study cohorts from St. Jude Children’s Research Hospital and other trusted institutions and research centers around the world. Highly valuable for pediatric cancer research, provides extensive patient data. Niche focus on pediatric cancers, limited scope for non-cancer data. Documentation on data output types No specific community forum https://pecan.stjude.cloud/ Human   No
MaveDB Protein Function, Evolutionary Analysis Database of multiplexed assay of variant effects (MAVEs), analyzing how genetic variants affect protein function. Gene ID / Protein ID / Coordinates / Variant information Web output / downloadable results Web MaveDB is open-source, released under the AGPLv3 license. Browser Browser MaveDB is a public repository for datasets from Multiplexed Assays of Variant Effect (MAVEs), such as those generated by deep mutational scanning (DMS) or massively parallel reporter assay (MPRA) experiments. Focuses on evolutionary analysis of protein function. Primarily for protein variants, may not be suitable for large-scale genome-wide studies. Documentation page for tutorials No specific community forum https://www.mavedb.org/ Human   No
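
The ColabFold row above notes that folding proteins or complexes of more than about 2000 aa in total may not be feasible on free-tier Google Colab hardware. A minimal sketch of a pre-submission length check follows. It assumes that the chains of a complex are written as one sequence joined with ':' (the convention used in the ColabFold notebooks) and treats 2000 residues as a rough rule of thumb taken from the table, not a hard limit. The function names and the example sequence are hypothetical and exist only for illustration.

```python
from typing import Dict

# Rough ceiling taken from the ColabFold row; a rule of thumb, not a hard limit.
FREE_TIER_MAX_RESIDUES = 2000


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence} (illustrative helper)."""
    records: Dict[str, str] = {}
    current_id = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped; concatenate them.
            records[current_id] += line
    return records


def total_residues(sequence: str) -> int:
    """Count residues across all chains, ignoring ':' chain separators."""
    return sum(len(chain) for chain in sequence.split(":"))


def fits_free_tier(sequence: str, limit: int = FREE_TIER_MAX_RESIDUES) -> bool:
    """Return True if the total residue count is within the assumed limit."""
    return total_residues(sequence) <= limit


if __name__ == "__main__":
    # Made-up two-chain input, for illustration only.
    example = ">complex_demo two chains\nMKTAYIAKQR:GSHMLEDP\n"
    for name, seq in parse_fasta(example).items():
        print(name, total_residues(seq), fits_free_tier(seq))
```

Inputs that fail this check are candidates for a local installation or an HPC GPU node rather than the free-tier notebook.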