Protein domains are units of evolution, and domain combination analysis has been applied to examine proteins from several angles. For instance, analyses of co-occurring domains have been related to protein function and used to predict protein cellular localization. Domain fusion has been used to predict protein-protein interactions. Domain graphs and domain distance were introduced to explore global properties of the proteins in a genome and to investigate protein evolution, respectively. Although many analyses of protein domains have been performed, available Web-based tools and servers such as PDART, CDART, and PfamAlyzer mainly support protein homology search by domain architecture (DA).

The ATGC-Dom Web server was built to enable comprehensive and customizable comparative analysis of proteomes based on DAs. It integrates three main analyses: (1) proteome comparison based on DA search and alignment, (2) comparison of domain versatility and abundance based on the domain graph, and (3) comparison of protein evolution based on domain distance. For customizable analyses, the user can either provide their own data sets in InterProScan raw format, covering various domain prediction tools, or select data sets from the system-provided database. The three main features of the ATGC-Dom Web server are described below.
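To make the input format concrete, the following is a minimal sketch of how domain architectures could be derived from an InterProScan raw file. It is not the server's own parser. It assumes the legacy tab-delimited raw layout in which column 1 is the protein identifier, column 4 the analysis method, column 5 the member-database entry, and columns 7 and 8 the match start and end; the function name `read_domain_architectures` and its `method` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def read_domain_architectures(
    lines: Iterable[str], method: Optional[str] = None
) -> Dict[str, List[str]]:
    """Group raw hits by protein and order them from N- to C-terminus.

    Assumed column layout (0-based): 0 = protein id, 3 = analysis method,
    4 = domain entry id, 6 = match start, 7 = match end.
    """
    hits = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip blank or truncated records
        protein, tool, domain = fields[0], fields[3], fields[4]
        if method is not None and tool != method:
            continue  # keep hits from one prediction tool only
        hits[protein].append((int(fields[6]), int(fields[7]), domain))
    # Sorting by (start, end) yields the linear domain order of each protein.
    return {p: [d for _, _, d in sorted(h)] for p, h in hits.items()}
```

Passing `method="HMMPfam"`, for example, would restrict each architecture to hits from a single prediction tool, which avoids mixing overlapping matches reported by different tools.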

Poster Presentation:

  • Duangdao Wichadakul, Supawadee Ingsriswang, Eakasit Pacharawongsakda, Boonyarat Phadermrod and Sunai Yokwai, “ATGC-Dom: Alignment, Tree, and Graph for Comparative proteomes by DOMain architecture”, Proceedings of the 12th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2008), Mar 30-Apr 2, Singapore (Co-Winner of Poster Special Commendation Award)

ATGC-Dom Web Server Features:

Comparative proteome

  • The “comparative proteome” page lets the user enter proteins of interest and search for target proteins with the same or similar domain architectures. Proteins of interest can be provided by the user as an InterProScan result in raw format. Alternatively, the user can search the system-provided database for proteins of interest using (1) general terms such as “flowering” or “circadian rhythm,” or a Gene Ontology (GO) ID such as “GO:0007623”, or (2) a combination of arbitrary domains. The proteins of interest are then searched against target proteins, which the user either provides as another InterProScan result in raw format or selects from the system-provided database. The user may also specify a DA score cutoff. The search result is presented in a BLAST-like fashion, summarizing the number of matched target proteins per target organism for each protein of interest. The user may explore the alignments of the matches in detail. The comparative proteome results highlight the conservation and diversification of the proteins of interest, based on their domain architectures, within and across input data sets. They suggest, among other things, protein sets with possibly redundant functions, possible annotations for unknown proteins, and single-copy genes in the genome.
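The DA search with a score cutoff can be illustrated with a small sketch. The scoring scheme actually used by ATGC-Dom is not described here, so the example below substitutes a plain Needleman-Wunsch global alignment over domain identifiers, with assumed match, mismatch, and gap scores, and normalizes by the longer architecture. The names `da_alignment_score`, `normalized_da_score`, and `search_by_da` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def da_alignment_score(
    a: Sequence[str], b: Sequence[str],
    match: int = 1, mismatch: int = -1, gap: int = -1,
) -> int:
    """Global alignment score of two domain architectures (assumed scoring)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def normalized_da_score(a: Sequence[str], b: Sequence[str]) -> float:
    """Scale the raw score to [0, 1] by the longer architecture (an assumption)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return max(0.0, da_alignment_score(a, b) / longest)


def search_by_da(
    query: Sequence[str], targets: Dict[str, List[str]], cutoff: float
) -> List[Tuple[str, float]]:
    """Return (target name, score) pairs at or above the cutoff, best first."""
    hits = [(name, normalized_da_score(query, da)) for name, da in targets.items()]
    kept = [(name, s) for name, s in hits if s >= cutoff]
    return sorted(kept, key=lambda hit: (-hit[1], hit[0]))
```

Under this scheme an identical architecture scores 1.0, and raising the cutoff narrows the hit list toward targets whose domain order matches the query closely.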

Comparative domain versatility and abundance

  • The “comparative domain versatility and abundance” page lets the user explore the versatility and abundance of protein domains within and among protein sets (e.g. among pathways in the same organism, or among organisms for the same pathway). The user may provide some protein sets as InterProScan result files in raw format and select other sets from the system-provided database. The search result is a table summarizing the versatility and abundance of each protein domain for each protein set. The table allows the user to sort protein domains by versatility or abundance. The user may explore the domain graph and the protein list for each pair of co-occurring domains in a protein set, and compare domain graphs among protein sets. The domain graph can also be made directed, in which case an arrow from domain A to domain B indicates that there are proteins in which domain A is immediately followed by domain B, in N- to C-terminal order. The user may export the domain graph in JPG, PNG, SVG, or PDF format. The domain graphs visualize conserved and diverged co-occurring domains across input data sets with different versatilities and abundances.
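As an illustration of the quantities above, here is a minimal sketch that builds a domain graph and tabulates versatility and abundance for one protein set. The definitions are assumptions made for the example, since the page does not state them: an edge links two different domains that are adjacent in at least one protein, versatility is the number of distinct partner domains, and abundance is the number of proteins containing the domain. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def domain_graph_edges(
    proteins: Dict[str, List[str]], directed: bool = False
) -> Set[Tuple[str, str]]:
    """Edges between consecutive domains; directed edges run N- to C-terminus."""
    edges = set()
    for architecture in proteins.values():
        for left, right in zip(architecture, architecture[1:]):
            if left == right:
                continue  # ignore tandem repeats of the same domain
            edges.add((left, right) if directed else tuple(sorted((left, right))))
    return edges


def versatility_and_abundance(
    proteins: Dict[str, List[str]],
) -> Dict[str, Tuple[int, int]]:
    """Map each domain to (versatility, abundance) under the assumed definitions."""
    partners = defaultdict(set)
    abundance = defaultdict(int)
    for architecture in proteins.values():
        for domain in set(architecture):
            abundance[domain] += 1  # count each protein once per domain
        for left, right in zip(architecture, architecture[1:]):
            if left != right:
                partners[left].add(right)
                partners[right].add(left)
    return {d: (len(partners[d]), abundance[d]) for d in abundance}
```

Sorting the returned mapping by either element of the tuple reproduces the kind of table ordering described above, and calling `domain_graph_edges` with `directed=True` on two protein sets allows their edge sets to be compared directly.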

Comparative protein evolution

  • The “comparative protein evolution” page becomes available after the user selects all or some of the proteins resulting from the other analyses. It calculates a distance matrix from the distances of the domain architecture alignments. The user may choose to compare trees built with (1) different algorithms or (2) different search tools (e.g. hmmpfam, hmmsmart, etc.), as well as (3) trees built from DA-based and sequence-based distance matrices. The user may interactively explore trees of the proteins of interest and their domain architectures as scalable vector graphics (SVG) images. Proteins from different organisms are distinguished by color. The user may export the images in JPG, PNG, SVG, or PDF format. In addition, we incorporated a software tool for phylogeny comparison so that the user can compare trees interactively. The comparative protein evolution results help the user explore common ancestors, domains conserved among proteins during evolution, and lineage-specific domain architectures.
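The step from domain architectures to a tree can be sketched as follows. This is an illustration under stated assumptions, not the server's method: the domain distance is approximated by an edit distance over domain identifiers normalized by the longer architecture, and the tree is built by simple average-linkage (UPGMA-style) clustering and returned as nested tuples without branch lengths. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of domain insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def distance_matrix(proteins: Dict[str, List[str]]) -> Dict[FrozenSet[str], float]:
    """Pairwise normalized distances, keyed by the unordered pair of names."""
    dist = {}
    for p, q in combinations(sorted(proteins), 2):
        longest = max(len(proteins[p]), len(proteins[q])) or 1
        dist[frozenset((p, q))] = edit_distance(proteins[p], proteins[q]) / longest
    return dist


def average_linkage_tree(proteins: Dict[str, List[str]]):
    """Repeatedly merge the two clusters with the smallest mean distance."""
    if not proteins:
        raise ValueError("at least one protein is required")
    dist = distance_matrix(proteins)
    # Each cluster is (subtree, member names); leaves start as their own name.
    clusters = [(name, [name]) for name in sorted(proteins)]

    def mean_distance(m1: List[str], m2: List[str]) -> float:
        total = sum(dist[frozenset((p, q))] for p in m1 for q in m2)
        return total / (len(m1) * len(m2))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: mean_distance(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        (t1, m1), (t2, m2) = clusters[i], clusters[j]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((t1, t2), m1 + m2)]
    return clusters[0][0]
```

Swapping `distance_matrix` for a sequence-based distance while keeping the clustering step unchanged gives a second tree over the same proteins, which is the kind of DA-based versus sequence-based comparison described above.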