Zhejiang U | College of Pharmaceutical Sciences
IDRB: Research Projects
RESEARCH PROJECTS

Our research projects in the fields of computer-aided drug design and bioinformatics are listed as follows:

  • Comparative study of nature-derived FDA-approved drugs and Traditional Chinese Medicine (TCM) to reveal the mechanisms of TCM. The mechanisms of most TCM remedies are still unclear. This study compares FDA-approved drugs with TCM from a phylogenetic perspective to characterize the distribution patterns of the two drug systems, which is expected to deepen our understanding of the mechanisms of TCM.
  • Deriving stable cancer-differentiating signatures from microarray data by machine learning and feature-elimination methods, and evaluating the consensus scoring of multiple random samplings and the consistency of gene ranking. The identified signatures reflect disease mechanisms and can provide indicators for disease diagnosis. Current interest lies in identifying biomarkers for breast cancer and major depression.
  • Identifying next-generation innovative therapeutic targets for specific disease types, such as obesity, major depression, and cancer. Several methods are applied collectively: A. genetic sequence similarity analysis between drug-binding domains; B. computation of the number of similar human proteins, the number of affiliated human pathways, and the number of human tissues of a target; C. structural comparison between drug-binding domains; D. target classification based on physicochemical characteristics detected by machine learning.
  • Leading and conducting the development of bioinformatics databases that collect information from biology, pharmacy, chemistry, and related fields. We are also interested in constructing innovative software for drug discovery and bioinformatics, which involves the design and implementation of an integrated bioinformatics software system for exploring novel therapeutic targets and agents.
  • Conducting biostatistical studies on the distribution of molecules with therapeutic effect, especially approved and clinical-trial drugs, across all biological species, and identifying key species for ecological protection. Conducting comprehensive biostatistical studies on therapeutic targets in clinical trials, with comparative analysis against the targets of approved drugs. Studying correlated groups of genes by using graph theory to filter complex gene correlation networks. The genetic variations identified indicate complex inter- and intra-individual differences.
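
The consensus scoring over multiple random samplings mentioned in the signature-discovery project can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the group's actual pipeline: the mean-difference scorer merely stands in for the machine learning and feature-elimination step, and the names `rank_genes` and `consensus_signature`, the 80% sampling fraction, and the toy expression matrix are all hypothetical.

```python
import random
from collections import Counter


def rank_genes(samples, labels):
    """Rank gene indices by the absolute difference of class means.

    Assumption: this simple scorer is a placeholder for a real
    machine-learning / feature-elimination ranking step.
    """
    n_genes = len(samples[0])
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    scores = []
    for g in range(n_genes):
        mean_pos = sum(s[g] for s in pos) / len(pos)
        mean_neg = sum(s[g] for s in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


def consensus_signature(samples, labels, top_k=2, n_rounds=50, frac=0.8, seed=0):
    """Count how often each gene reaches the top_k across random subsamples."""
    rng = random.Random(seed)
    counts = Counter()
    indices = list(range(len(samples)))
    subset_size = max(2, int(frac * len(indices)))
    for _ in range(n_rounds):
        subset = rng.sample(indices, subset_size)
        sub_x = [samples[i] for i in subset]
        sub_y = [labels[i] for i in subset]
        if len(set(sub_y)) < 2:
            continue  # skip subsamples that lost one of the two classes
        counts.update(rank_genes(sub_x, sub_y)[:top_k])
    signature = [gene for gene, _ in counts.most_common(top_k)]
    return signature, counts


# Hypothetical toy data: 6 samples x 4 genes; genes 0 and 2 separate the classes.
toy_samples = [
    [5.0, 1.0, 9.0, 2.0],
    [5.1, 1.2, 8.8, 2.1],
    [4.9, 0.9, 9.2, 1.9],
    [1.0, 1.1, 2.0, 2.0],
    [1.2, 1.0, 2.1, 2.2],
    [0.9, 1.2, 1.9, 1.8],
]
toy_labels = [1, 1, 1, 0, 0, 0]
signature, counts = consensus_signature(toy_samples, toy_labels)
```

Genes that stay in the top ranks across most subsamples are the ones kept in the signature; the selection counts give a simple measure of ranking consistency.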

IDRB: Databases

DATABASE CONSTRUCTION

Our experience in database construction has led to the following pharmacoinformatics databases:

  TTD: Therapeutic Target Database
    Database URL: http://db.idrblab.org/ttd/

    Extensive efforts have been directed at the discovery, investigation and clinical monitoring of targeted therapeutics. These efforts may be facilitated by convenient access to the genetic, proteomic, interactive and other aspects of the therapeutic targets. Therefore, we developed the Therapeutic Target Database (TTD) to provide information about known and explored therapeutic protein and nucleic acid targets, the targeted diseases, pathway information and the corresponding drugs directed at each of these targets. TTD is known as one of the most popular pharmaceutical databases in the world, and includes links to relevant databases containing information about target function, sequence, 3D structure, ligand binding properties, enzyme nomenclature and drug structure, therapeutic class, and clinical development status.

    Our Publication(s) Describing This Database:

  1. Y. X. Wang, S. Zhang, F. C. Li, Y. Zhou, Y. Zhang, Z. W. Wang, R. Y. Zhang, J. Zhu, Y. X. Ren, Y. Tan, C. Qin, Y. H. Li, X. X. Li, Y. Z. Chen*, F. Zhu*. Therapeutic target database 2020: enriched resource for facilitating research and early development of targeted therapeutics. Nucleic Acids Research (impact factor of the publication year: 11.501, Biology Zone 1 TOP journal). 48(D1): 1031-1041 (2020). PMID: 31691823.
    Publication of World-wide Reputation: ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 0.13% in 2022.
    • The Percentile in Subject Area shown in InCites™ was 0.15% in 2021.

  2. Y. H. Li, C. Y. Yu, X. X. Li, P. Zhang, J. Tang, Q. X. Yang, T. T. Fu, X. Y. Zhang, X. J. Cui, G. Tu, Y. Zhang, S. Li, F. Y. Yang, Q. Sun, C. Qin, X. Zeng, Z. Chen, Y. Z. Chen*, F. Zhu*. Therapeutic target database update 2018: enriched resource for facilitating bench-to-clinic research of targeted therapeutics. Nucleic Acids Research (impact factor of the publication year: 11.561, Biology Zone 1 TOP journal). 46(D1): 1121-1127 (2018). PMID: 29140520.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 0.21% in 2022.
    • The Percentile in Subject Area shown in InCites™ was 0.15% in 2021.
    • The Percentile in Subject Area shown in InCites™ was 0.14% in 2020.
    • The Percentile in Subject Area shown in InCites™ was 0.16% in 2019.
    Highlights by Experts in Subject Area:
    • Introduced by OMICTOOLS as "useful for facilitating patient focused research, discovery and clinical investigations of the targeted therapeutics".

  3. Y. Zhou, Y. T. Zhang, X. C. Lian, F. C. Li, C. X. Wang, F. Zhu*, Y. Q. Qiu*, Y. Z. Chen*. Therapeutic target database update 2022: facilitating drug discovery with enriched comparative data of targeted agents. Nucleic Acids Research (impact factor of the publication year: 16.971, Biology Zone 1 TOP journal). 50(D1): 1398-1407 (2022). PMID: 34718717.

  4. H. Yang, C. Qin, Y. H. Li, L. Tao, J. Zhou, C. Y. Yu, F. Xu, Z. Chen, F. Zhu*, Y. Z. Chen*. Therapeutic target database update 2016: enriched resource for bench to clinical drug target and targeted pathway information. Nucleic Acids Research (impact factor of the publication year: 9.202, Biology Zone 1 TOP journal). 44(D1): 1069-1074 (2016). PMID: 26578601.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 0.99% in 2022.
    • The Percentile in Subject Area shown in InCites™ was 0.69% in 2021.
    • The Percentile in Subject Area shown in InCites™ was 0.78% in 2020.
    • The Percentile in Subject Area shown in InCites™ was 0.66% in 2019.
    • The Percentile in Subject Area shown in InCites™ was 0.71% in 2018.
    • The Percentile in Subject Area shown in InCites™ was 0.87% in 2017.

  5. F. Zhu, Z. Shi, C. Qin, L. Tao, X. Liu, F. Xu, L. Zhang, Y. Song, X. H. Liu, J. X. Zhang, B. C. Han, P. Zhang, Y. Z. Chen*. Therapeutic target database update 2012: a resource for facilitating target-oriented drug discovery. Nucleic Acids Research (impact factor of the publication year: 8.026, Biology Zone 1 TOP journal). 40(D1): 1128-1136 (2012). PMID: 21948793.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 0.77% in 2022.
    • The Percentile in Subject Area shown in InCites™ was 0.69% in 2021.
    • The Percentile in Subject Area shown in InCites™ was 0.66% in 2020.
    • The Percentile in Subject Area shown in InCites™ was 0.60% in 2019.
    • The Percentile in Subject Area shown in InCites™ was 0.62% in 2018.
    • The Percentile in Subject Area shown in InCites™ was 0.31% in 2017.
    Highlights by Experts in Subject Area:
    • Highlighted by "FACULTYof1000" as "the top 2% of published articles in biology and medicine" and "a most useful resource for scientists and companies working on drug discovery and validation, drug lead discovery and optimization, and the development of multi-target drugs and drug combinations".
    • Highlighted by Prof. Chris Southan in his blog as "Therapeutic Target Database in PubChem".

  6. F. Zhu, B. C. Han, P. Kumar, X. H. Liu, X. H. Ma, X. N. Wei, L. Huang, Y. F. Guo, L. Y. Han, C. J. Zheng, Y. Z. Chen*. Update of TTD: therapeutic target database. Nucleic Acids Research (impact factor of the publication year: 7.479, Biology Zone 1 TOP journal). 38(D1): 787-791 (2010). PMID: 19933260.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 2.95% in 2017.
  VARIDT: VARIability of Drug Transporter Database
    Database URL: https://idrblab.org/varidt/

    The absorption, distribution and excretion of drugs are largely determined by their transporters (DTs), the variability of which has thus attracted considerable attention. There are three aspects of variability: epigenetic regulation and genetic polymorphism, species/tissue/disease-specific DT abundances, and exogenous factors modulating DT activity. The variability data of each aspect are essential for clinical study, and a collective consideration of multiple aspects is becoming essential in precision medicine. However, no database had been constructed to provide comprehensive data on all aspects of DT variability. Herein, the Variability of Drug Transporter Database (VARIDT) was introduced to provide such data. First, 177 and 146 DTs were confirmed, for the first time, by the approved and the clinical/preclinical drugs they transport, respectively. Second, for the confirmed DTs, VARIDT comprehensively collected all aspects of their variability (23,947 DNA methylations, 7,317 noncoding RNA/histone regulations, 1,278 genetic polymorphisms, differential abundance profiles of 257 DTs in 21,781 patients/healthy individuals, expression of 245 DTs in 67 tissues of human/model organism, 1,225 exogenous factors altering the activity of 148 DTs), which allowed mutual connection between any aspects. Owing to the huge amount of accumulated data, VARIDT makes it possible to generalize characteristics to reveal disease etiology and optimize clinical treatment, and is freely accessible at: https://idrblab.org/varidt/.

    Our Publication(s) Describing This Database:

  1. T. T. Fu, F. C. Li, Y. Zhang, J. Y. Yin, W. Q. Qiu, X. D. Li, X. G. Liu, W. W. Xin, C. Z. Wang, L. S. Yu, J. Q. Gao, Q. C. Zheng*, S. Zeng*, F. Zhu*. VARIDT 2.0: structural variability of drug transporter. Nucleic Acids Research (impact factor of the publication year: 16.971, Biology Zone 1 TOP journal). 50(D1): 1417-1431 (2022). PMID: 34747471.

  2. J. Y. Yin, W. Sun, F. C. Li, J. J. Hong, X. X. Li, Y. Zhou, Y. J. Lu, M. Z. Liu, X. Zhang, N. Chen, X. P. Jin, J. Xue, S. Zeng*, L. S. Yu*, F. Zhu*. VARIDT 1.0: variability of drug transporter database. Nucleic Acids Research (impact factor of the publication year: 11.501, Biology Zone 1 TOP journal). 48(D1): 1042-1050 (2020). PMID: 31495872.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 0.96% in 2022.
    • The Percentile in Subject Area shown in InCites™ was 0.33% in 2021.

  INTEDE: Interactome of Drug-metabolizing Enzymes
    Database URL: https://idrblab.org/intede/

    Drug-metabolizing enzymes (DMEs) are critical determinants of drug safety and efficacy, and the interactome of DMEs has attracted extensive attention. There are 3 major interaction types in an interactome: microbiome-DME interaction (MICBIO), xenobiotics-DME interaction (XEOTIC), and host protein-DME interaction (HOSPPI). The interaction data of each type are essential for drug metabolism, and the collective consideration of multiple types has implications for the future practice of precision medicine. However, no database had been designed to systematically provide data on all types of DME interactions. A database of the Interactome of Drug-Metabolizing Enzymes (INTEDE) was therefore constructed to offer these interaction data. First, 1,047 unique DMEs (448 host and 599 microbial) were confirmed, for the first time, by the drugs they metabolize. Second, for these newly confirmed DMEs, all types of their interactions (3,359 MICBIOs between 225 microbial species and 185 DMEs; 47,778 XEOTICs between 4,150 xenobiotics and 501 DMEs; 7,849 HOSPPIs between 565 human proteins and 566 DMEs) were comprehensively collected and provided, which enables crosstalk analysis among multiple types. Because of the huge amount of accumulated data, INTEDE makes it possible to generalize key features for revealing disease etiology and optimizing clinical treatment. INTEDE is freely accessible at: https://idrblab.org/intede/.

    Our Publication(s) Describing This Database:

  1. J. Y. Yin, F. C. Li, Y. Zhou, M. J. Mou, Y. J. Lu, K. L. Chen, J. Xue, Y. C. Luo, J. B. Fu, X. He, J. Q. Gao, S. Zeng*, L. S. Yu*, F. Zhu*. INTEDE: interactome of drug-metabolizing enzymes. Nucleic Acids Research (impact factor of the publication year: 16.971, Biology Zone 1 TOP journal). 49(D1): 1233-1243 (2021). PMID: 33045737.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 1.12% in 2022.

  DrugMAP: Molecular Atlas and Pharma-Information of Drugs
    Database URL: https://idrblab.org/drugmap/

    The efficacy and safety of drugs are widely known to be determined by their interactions with multiple molecules of pharmacological importance, and it is therefore essential to systematically depict the molecular atlas and pharma-information of studied drugs. However, our understanding of such information is neither comprehensive nor precise, which necessitates the construction of a new database providing a network containing a large number of drugs and their interacting molecules. Here, a new database describing the molecular atlas and pharma-information of drugs (DrugMAP) was therefore constructed. It provides a comprehensive list of interacting molecules for >30 000 drugs/drug candidates, gives the differential expression patterns for >5000 interacting molecules among different disease sites, ADME (absorption, distribution, metabolism and excretion)-relevant organs and physiological tissues, and weaves a comprehensive and precise network containing >200 000 interactions among drugs and molecules. With the great efforts made to clarify the complex mechanism underlying drug pharmacokinetics and pharmacodynamics and rapidly emerging interests in artificial intelligence (AI)-based network analyses, DrugMAP is expected to become an indispensable supplement to existing databases to facilitate drug discovery. It is now fully and freely accessible at: https://idrblab.org/drugmap/.

    Our Publication(s) Describing This Database:

  1. F. C. Li, J. Y. Yin, M. K. Lu, M. J. Mou, Z. R. Li, Z. Y. Zeng, Y. Tan, S. S. Wang, X. Y. Chu, H. B. Dai, T. J. Hou, S. Zeng*, Y. Z. Chen*, F. Zhu*. DrugMAP: molecular atlas and pharma-information of drugs. Nucleic Acids Research (impact factor of the publication year: 19.160, Biology Zone 1 TOP journal). doi: 10.1093/nar/gkac813 (2023). PMID: 36243961.

  DRESIS: A Comprehensive Database for Drug Resistance Information
    Database URL: https://idrblab.org/dresis/

    Widespread drug resistance has become the key issue in global healthcare. Extensive efforts have been made to reveal not only diverse diseases experiencing drug resistance, but also the six distinct types of molecular mechanisms underlying this resistance. A database that describes a comprehensive list of diseases with drug resistance (not just cancers/infections) and all types of resistance mechanisms is now urgently needed. However, no such database has been available to date. In this study, a comprehensive database describing drug resistance information named ‘DRESIS’ was therefore developed. It was introduced to (i) systematically provide, for the first time, all existing types of molecular mechanisms underlying drug resistance, (ii) extensively cover the widest range of diseases among all existing databases and (iii) explicitly describe the clinically/experimentally verified resistance data for the largest number of drugs. Since drug resistance has become an ever-increasing clinical issue, DRESIS is expected to have great implications for future new drug discovery and clinical treatment optimization. It is now publicly accessible without any login requirement at: https://idrblab.org/dresis/.

    Our Publication(s) Describing This Database:

  1. X. N. Sun, Y. T. Zhang, H. Y. Li, Y. Zhou, S. Y. Shi, Z. Chen, X. He, H. Y. Zhang, F. C. Li, J. Y. Yin, M. J. Mou, Y. Z. Wang, Y. Q. Qiu, F. Zhu*. DRESIS: a comprehensive database for drug resistance information. Nucleic Acids Research (impact factor of the publication year: 19.160, Biology Zone 1 TOP journal). doi: 10.1093/nar/gkac812 (2023). PMID: 36243960.
  GIMICA: Host Genetic and Immune Factors Shaping Human Microbiota
    Database URL: https://idrblab.org/gimica/

    Besides the environmental factors that have tremendous impacts on the composition of the microbial community, host factors have recently attracted extensive attention for their roles in shaping human microbiota. There are two major types of host factors: host genetic factors (HGFs) and host immune factors (HIFs). The factors of each type are essential for defining the chemical and physical landscapes inhabited by microbiota, and the collective consideration of both types has great implications for comprehensive health management. However, no database was available to provide comprehensive data on both types of factors. Herein, a database entitled ‘Host Genetic and Immune Factors Shaping Human Microbiota (GIMICA)’ was constructed. Based on the 4,257 microbes confirmed to inhabit nine sites of the human body, 2,851 HGFs (1,368 single nucleotide polymorphisms (SNPs), 186 copy number variations (CNVs), and 1,297 non-coding ribonucleic acids (RNAs)) modulating the expression of 370 microbes were collected, and 549 HIFs (126 lymphocytes and phagocytes, 387 immune proteins, and 36 immune pathways) regulating the abundance of 455 microbes were also provided. All in all, GIMICA enables the collective consideration not only of different types of host factors but also of host and environmental ones. It is freely accessible without login requirement at: https://idrblab.org/gimica/.

    Our Publication(s) Describing This Database:

  1. J. Tang, X. L. Wu, M. J. Mou, C. Wang, L. D. Wang, F. C. Li, M. Y. Guo, J. Y. Yin, W. Q. Xie, X. N. Wang, Y. X. Wang, Y. B. Ding*, W. W. Xue*, F. Zhu*. GIMICA: host genetic and immune factors shaping human microbiota. Nucleic Acids Research (impact factor of the publication year: 16.971, Biology Zone 1 TOP journal). 49(D1): 715-722 (2021). PMID: 33045729.

  CovInter: Interaction Data between Coronavirus RNAs and Host Proteins
    Database URL: https://idrblab.org/covinter/

    Coronavirus has brought about three massive outbreaks in the past two decades. Each step of its life cycle invariably depends on the interactions among virus and host molecules. The interaction between virus RNA and host protein (IVRHP) is unique compared to other virus–host molecular interactions and represents not only an attempt by viruses to promote their translation/replication, but also the host's endeavor to combat viral pathogenicity. In other words, there is an urgent need to develop a database for providing such IVRHP data. In this study, a new database was therefore constructed to describe the interactions between coronavirus RNAs and host proteins (CovInter). This database is unique in (a) unambiguously characterizing the interactions between virus RNA and host protein, (b) comprehensively providing experimentally validated biological function for hundreds of host proteins key in viral infection and (c) systematically quantifying the differential expression patterns (before and after infection) of these key proteins. Given the devastating and persistent threat of coronaviruses, CovInter is highly expected to fill the gap in the whole process of the ‘molecular arms race’ between viruses and their hosts, which will then aid in the discovery of new antiviral therapies. It's now free and publicly accessible at: https://idrblab.org/covinter/.

    Our Publication(s) Describing This Database:

  1. K. Amahong, W. Zhang, Y. Zhou, S. Zhang, J. Y. Yin, F. C. Li, H. Q. Xu, T. C. Yan, Z. X. Yue, Y. H. Liu, T. J. Hou, Y. Q. Qi, L. Tao*, L. Y. Han*, F. Zhu*. CovInter: interaction data between coronavirus RNAs and host proteins. Nucleic Acids Research (impact factor of the publication year: 19.160, Biology Zone 1 TOP journal). doi: 10.1093/nar/gkac834 (2023). PMID: 36200814.

  SYNBIP: SYNthetic BInding Proteins for Research, Diagnosis and Therapy
    Database URL: https://idrblab.org/synbip/

    The success of protein engineering and design has extensively expanded the protein space, which presents a promising strategy for creating next-generation proteins of diverse functions. Among these proteins, the synthetic binding proteins (SBPs) are smaller, more stable, less immunogenic, and better at tissue penetration than others, so SBP-related data have attracted extensive interest from scientists worldwide. However, no database had yet been developed to systematically provide this valuable information on SBPs. In this study, a database named ‘Synthetic Binding Proteins for Research, Diagnosis, and Therapy (SYNBIP)’ was thus introduced. This database is unique in (a) comprehensively describing thousands of SBPs from the perspectives of scaffolds, biophysical & functional properties, etc.; (b) panoramically illustrating the binding targets & the broad application of each SBP; and (c) enabling a similarity search against the sequences of all SBPs and their binding targets. Since an SBP is a human-made protein that has not been found in nature, the discovery of novel SBPs has relied heavily on experimental protein engineering and could be greatly facilitated by in-silico studies (such as AI and computational modeling). Thus, the data provided in SYNBIP could lay a solid foundation for the future development of novel SBPs. SYNBIP is accessible without login requirement at both official (https://idrblab.org/synbip/) and mirror (http://synbip.idrblab.net/) sites.

    Our Publication(s) Describing This Database:

  1. X. N. Wang, F. C. Li, W. Q. Qiu, B. B. Xu, Y. L. Li, X. C. Lian, H. Y. Yu, Z. Zhang, J. X. Wang, Z. R. Li, W. W. Xue*, F. Zhu*. SYNBIP: synthetic binding proteins for research, diagnosis and therapy. Nucleic Acids Research (impact factor of the publication year: 16.971, Biology Zone 1 TOP journal). 50(D1): 560-570 (2022). PMID: 34664670.

  NPCDR: Natural Product-based Drug Combination and Its Disease-specific Molecular Regulation
    Database URL: https://idrblab.org/npcdr/

    Natural products (NPs) have a long history in promoting modern drug discovery, and have derived or inspired a large number of currently prescribed drugs. Recently, NPs have emerged as ideal candidates to combine with other therapeutic strategies to deal with the persistent challenges of conventional therapy, and the molecular regulation mechanism underlying these combinations is crucial for the related communities. Thus, there is an urgent need to comprehensively provide the disease-specific molecular regulation data for various NP-based drug combinations. However, no database had yet been developed to describe such valuable information. In this study, a newly developed database entitled ‘Natural Product-based Drug Combination and Its Disease-specific Molecular Regulation (NPCDR)’ was thus introduced. This database is unique in (a) providing comprehensive information on NP-based drug combinations & describing their clinically or experimentally validated therapeutic effects, (b) giving the disease-specific molecular regulation data for a number of NP-based drug combinations, and (c) fully referencing all NPs, drugs, and regulated molecules/pathways by cross-linking them to the available databases describing their biological or pharmaceutical characteristics. Therefore, NPCDR is expected to have great implications for the future practice of network pharmacology, medical biochemistry, drug design, and medicinal chemistry. This database is now freely accessible without any login requirement at both official (https://idrblab.org/npcdr/) and mirror (http://npcdr.idrblab.net/) sites.

    Our Publication(s) Describing This Database:

  1. X. N. Sun, Y. T. Zhang, Y. Zhou, X. C. Lian, L. L. Yan, T. Pan, T. Jin, H. Xie, Z. M. Liang, W. Q. Qiu, J. X. Wang, Z. R. Li, F. Zhu*, X. B. Sui*. NPCDR: natural product-based drug combination and its disease-specific molecular regulation. Nucleic Acids Research (impact factor of the publication year: 16.971, Biology Zone 1 TOP journal). 50(D1): 1324-1333 (2022). PMID: 34664659.
  REGLIV: Molecular regulation data of diverse living systems facilitating current multiomics research
    Database URL: https://idrblab.org/regliv/

    Multiomics is a powerful technique in molecular biology that facilitates the identification of new associations among different molecules (genes, proteins & metabolites). It has attracted tremendous research interest from the scientists worldwide and has led to an explosive number of published studies. Most of these studies are based on the regulation data provided in available databases. Therefore, it is essential to have molecular regulation data that are strictly validated in the living systems of various cell lines and in vivo models. However, no database has been developed yet to provide comprehensive molecular regulation information validated by living systems. Herein, a new database, Molecular Regulation Data of Living System Facilitating Multiomics Study (REGLIV) is introduced to describe various types of molecular regulation tested by the living systems. (1) A total of 2996 regulations describe the changes in 1109 metabolites triggered by alterations in 284 genes or proteins, and (2) 1179 regulations describe the variations in 926 proteins induced by 125 endogenous metabolites. Overall, REGLIV is unique in (a) providing the molecular regulation of a clearly defined regulatory direction other than simple correlation, (b) focusing on molecular regulations that are validated in a living system not simply in an in vitro test, and (c) describing the disease/tissue/species specific property underlying each regulation. Therefore, REGLIV has important implications for the future practice of not only multiomics, but also other fields relevant to molecular regulation. REGLIV is freely accessible at: https://idrblab.org/regliv/.

    Our Publication(s) Describing This Database:

  1. S. Zhang, X. N. Sun, M. J. Mou, K. Amahong, H. C. Sun, W. Zhang, S. Y. Shi, Z. R. Li, J. Q. Gao, F. Zhu*. REGLIV: molecular regulation data of diverse living systems facilitating current multiomics research. Computers in Biology and Medicine (impact factor of the publication year: 6.698, Engineering & Technology Zone 2 journal). 148: 105825 (2022). PMID: 35872412.

IDRB: Software

SOFTWARE DEVELOPMENT

Our experience in software development has led to the following pharmacoinformatics servers:

  ANPELA: analysis and performance-assessment of the label-free proteome quantification
    Server URL: https://idrblab.org/anpela/

    Label-free quantification (LFQ) with a specific and sequentially integrated workflow of acquisition technique, quantification tool and processing method has emerged as a popular technique in metaproteomic research for providing a comprehensive landscape of the adaptive response of microbes to external stimuli and their interactions with other organisms or host cells. The performance of a specific LFQ workflow is highly dependent on the studied data. Hence, it is essential to discover the most appropriate one for a specific data set. However, it is challenging to perform such discovery due to the large number of possible workflows and the multifaceted nature of the evaluation criteria. Herein, a web server ANPELA (https://idrblab.org/anpela/) was developed and validated as the first tool enabling performance assessment of the whole LFQ workflow (collective assessment by five well-established criteria with distinct underlying theories), and it enables the identification of the optimal LFQ workflow(s) by a comprehensive performance ranking. ANPELA not only automatically detects the diverse formats of data generated by all quantification tools but also provides the most complete set of processing methods among the available web servers and stand-alone tools. Systematic validation using metaproteomic benchmarks revealed ANPELA's capabilities in (1) discovering well-performing workflow(s), (2) enabling assessment from multiple perspectives and (3) validating LFQ accuracy using spiked proteins. ANPELA has a unique ability to evaluate the performance of the whole LFQ workflow and enables the discovery of the optimal LFQs by the comprehensive performance ranking of all 560 workflows. Therefore, it has great potential for applications in metaproteomic and other studies requiring LFQ techniques, as many features are shared among proteomic studies.
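
The idea of ranking whole workflows by several criteria at once can be illustrated with a small rank-aggregation sketch in Python. It is only an assumption-laden illustration, not ANPELA's actual scoring scheme: the function name `rank_workflows`, the sum-of-ranks aggregation, the workflow labels, and the scores are hypothetical, and higher scores are assumed to be better on every criterion.

```python
def rank_workflows(scores):
    """Order workflows by the sum of their per-criterion ranks.

    scores maps a workflow name to a list of criterion scores
    (assumption: higher is better, same criteria order for all).
    Ties within a criterion are broken by input order.
    """
    names = list(scores)
    n_criteria = len(scores[names[0]])
    total_rank = {name: 0 for name in names}
    for c in range(n_criteria):
        ordered = sorted(names, key=lambda n: scores[n][c], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            total_rank[name] += rank
    # A smaller summed rank means better overall performance.
    return sorted(names, key=lambda n: total_rank[n])


# Hypothetical scores for three workflows on three criteria.
example_scores = {
    "workflow_a": [0.9, 0.8, 0.7],
    "workflow_b": [0.5, 0.9, 0.6],
    "workflow_c": [0.1, 0.2, 0.3],
}
ranking = rank_workflows(example_scores)
```

Summing ranks rather than raw scores keeps criteria with different scales comparable, which matters when the criteria rest on distinct underlying theories.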

    Our Publication(s) Describing This Server:

  1. J. Tang, J. B. Fu, Y. X. Wang, B. Li, Y. H. Li, Q. X. Yang, X. J. Cui, J. J. Hong, X. F. Li, Y. Z. Chen, W. W. Xue, F. Zhu*. ANPELA: analysis and performance-assessment of the label-free quantification workflow for metaproteomic studies. Briefings in Bioinformatics (impact factor of the publication year: 11.622, Biology Zone 1 TOP journal). 21(2): 621-636 (2020). PMID: 30649171.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 0.20% in 2022.
    • The Percentile in Subject Area shown in InCites™ was 0.05% in 2021.

  2. J. Tang, J. B. Fu, Y. X. Wang, Y. C. Luo, Q. X. Yang, B. Li, G. Tu, J. J. Hong, X. J. Cui, Y. Z. Chen, L. X. Yao, W. W. Xue, F. Zhu*. Simultaneous improvement in the precision, accuracy and robustness of label-free proteome quantification by optimizing data manipulation chains. Molecular & Cellular Proteomics (impact factor of the publication year: 5.236, Biology Zone 2 TOP journal). 18(8): 1683-1699 (2019). PMID: 31097671.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 1.12% in 2022.
    • The Percentile in Subject Area shown in InCites™ was 0.81% in 2021.
    • The Percentile in Subject Area shown in InCites™ was 0.57% in 2020.

  3. Y. Zhang, H. C. Sun, X. C. Lian, J. Tang, F. Zhu*. ANPELA: significantly enhanced quantification tool for cytometry-based single-cell proteomics. Advanced Science (impact factor of the publication year: 17.521, Engineering & Technology Zone 1 TOP journal). accepted, doi: 10.1002/advs.202207061 (2023). PMID: 36950745.

  CNN-T4SE: CNN-based annotation of bacterial type IV Secretion system effectors
    Server URL: https://idrblab.org/cnnt4se/

    The type IV bacterial secretion system (SS) is reported to be one of the most ubiquitous SSs in nature, and can induce serious conditions by secreting type IV SS effectors (T4SEs) into the host cells. Recent studies mainly focus on annotating new T4SEs from the huge amount of sequencing data, and various computational tools have therefore been developed to accelerate T4SE annotation. However, these tools are reported to be heavily dependent on the selected methods, and their annotation performance needs to be further enhanced. Herein, a convolutional neural network (CNN) technique was used to annotate T4SEs by integrating multiple protein encoding strategies. First, the annotation accuracies of nine encoding strategies integrated with CNN were assessed and compared with those of the popular T4SE annotation tools based on an independent benchmark. Second, the false discovery rates (FDRs) of various models were systematically evaluated by (1) scanning the genome of Legionella pneumophila subsp. ATCC 33152 and (2) predicting the real-world non-T4SEs validated using published experiments. Based on the above analyses, the encoding strategies (a) position-specific scoring matrix (PSSM), (b) protein secondary structure & solvent accessibility (PSSSA) and (c) one-hot encoding scheme (Onehot) were identified as well-performing when integrated with CNN. Finally, a novel strategy that collectively considers the three well-performing models (CNN-PSSM, CNN-PSSSA and CNN-Onehot) was proposed, and a new tool (CNN-T4SE, https://idrblab.org/cnnt4se/) was constructed to facilitate T4SE annotation. All in all, this study conducted a comprehensive analysis of the performance of a collection of encoding strategies when integrated with CNN, which could facilitate the suppression of T4SS in infection and limit the spread of antimicrobial resistance.
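
As a rough illustration of two ideas in the paragraph above, the Python sketch below shows a one-hot encoding of a protein sequence and one simple way to consider three model outputs collectively. Both parts are assumptions made for illustration: the fixed-length padding, the all-zero row for non-standard residues, and the majority-vote rule are not taken from the CNN-T4SE publication, and the names `one_hot` and `majority_vote` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence, length):
    """Encode a protein sequence as a length x 20 binary matrix.

    Assumptions: sequences are truncated or zero-padded to `length`,
    and non-standard residues (e.g. 'X') become all-zero rows.
    """
    matrix = []
    for pos in range(length):
        row = [0] * len(AMINO_ACIDS)
        if pos < len(sequence) and sequence[pos] in AA_INDEX:
            row[AA_INDEX[sequence[pos]]] = 1
        matrix.append(row)
    return matrix


def majority_vote(predictions):
    """Return True when more than half of the model calls are positive.

    Assumption: a plain majority vote stands in for whatever rule
    combines the three models in practice.
    """
    return sum(predictions) * 2 > len(predictions)


encoded = one_hot("ACX", 4)  # 'X' and the padding row stay all-zero
is_effector = majority_vote([True, True, False])  # hypothetical model calls
```

A matrix like `encoded` is the kind of input a sequence CNN consumes, while the vote shows how agreement among several models can be required before a protein is called an effector.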

    Our Publication(s) Describing This Server:

  1. J. J. Hong, Y. C. Luo, M. J. Mou, J. B. Fu, Y. Zhang, W. W. Xue, T. Xie, L. Tao*, Y. Lou*, F. Zhu*. Convolutional neural network-based annotation of bacterial type IV secretion system effectors with enhanced accuracy and reduced false discovery. Briefings in Bioinformatics (impact factor of the publication year: 11.622, Biology Zone 1, TOP journal). 21(5): 1825-1836 (2020). PMID: 31860715.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 1.18% in 2022.

  ConSIG: consistent discovery of molecular signature from OMIC data.
    Server URL: https://idrblab.org/consig/

    The discovery of a proper molecular signature from OMIC data is indispensable for determining biological state, physiological condition, disease etiology, and therapeutic response. However, the identified signatures are reported to be highly inconsistent, and there is little overlap among the signatures identified from different biological datasets. Such inconsistency raises doubts about the reliability of reported signatures and significantly hampers their biological and clinical applications. Herein, an online tool, ConSIG, was constructed to realize consistent discovery of gene/protein signatures from any uploaded transcriptomic/proteomic data. This tool is unique in a) integrating a novel strategy capable of significantly enhancing the consistency of signature discovery, b) determining the optimal signature by collective assessment, and c) confirming the biological relevance by enriching the disease/gene ontology. With growing concerns about signature consistency and biological relevance, this online tool is expected to serve as an essential complement to existing tools for OMIC-based signature discovery. ConSIG is freely accessible to all users without login requirement at https://idrblab.org/consig/.
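    The "little overlap" between signatures mentioned above can be made concrete with a simple set-based measure. The sketch below uses the Jaccard index averaged over all pairs of signatures; this choice of measure is an assumption for illustration only and is not claimed to be the consistency strategy implemented in ConSIG. The gene names are placeholders.

```python
from itertools import combinations
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index: shared members divided by all members."""
    if not a and not b:
        return 1.0  # two empty signatures are trivially identical
    return len(a & b) / len(a | b)


def mean_pairwise_overlap(signatures: List[Set[str]]) -> float:
    """Average Jaccard index over every pair of signatures."""
    pairs = list(combinations(signatures, 2))
    if not pairs:
        raise ValueError("need at least two signatures")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Placeholder signatures from three hypothetical datasets.
    sigs = [{"G1", "G2", "G3"}, {"G2", "G3", "G4"}, {"G5", "G6", "G7"}]
    print(round(mean_pairwise_overlap(sigs), 3))
```

    A value near 0 indicates that signatures from different datasets share almost no members, which is the situation the paragraph above describes.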

    Our Publication(s) Describing This Server:

  1. F. C. Li, J. Y. Yin, M. K. Lu, Q. X. Yang, Z. Y. Zeng, B. Zhang, Z. R. Li, Y. Q. Qiu, H. B. Dai, Y. Z. Chen*, F. Zhu*. ConSIG: consistent discovery of molecular signature from OMIC data. Briefings in Bioinformatics (impact factor of the publication year: 13.994, Biology Zone 1, TOP journal). 23(4): bbac253 (2022). PMID: 35758241.

  MMEASE: Meta-metabolomics by enhanced annotation, marker selection and enrichment
    Server URL: https://idrblab.org/mmease/

    Large-scale and long-term metabolomic studies have attracted widespread attention in biomedical research, yet remain challenging despite recent technical progress. In particular, ineffective integration of experiments and limited capacity for metabolite annotation are known issues. Herein, we constructed an online tool, MMEASE, enabling the integration of multiple analytical experiments with enhanced metabolite annotation and enrichment analysis (https://idrblab.org/mmease/). MMEASE is unique in being capable of (1) integrating multiple analytical blocks; (2) providing enriched annotation for >330 thousand metabolites; (3) conducting enrichment analysis using various categories/sub-categories. All in all, MMEASE aims to supply a comprehensive service for long-term and large-scale metabolomics, which might provide valuable guidance to current biomedical studies.

    Our Publication(s) Describing This Server:

  1. Q. X. Yang, B. Li, S. J. Chen, J. Tang, Y. H. Li, Y. Li, S. Zhang, C. Shi, Y. Zhang, M. J. Mou, W. W. Xue*, F. Zhu*. MMEASE: online meta-analysis of metabolomic data by enhanced metabolite annotation, marker selection and enrichment analysis. Journal of Proteomics (impact factor of the publication year: 4.044, Biology Zone 2 journal). 232: 104023 (2021). PMID: 33130111.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 2.51% in 2022.

  MetaFS: performance assessment for biomarker discovery in metaproteomics.
    Server URL: https://idrblab.org/metafs/

    Metaproteomic data suffer from two unavoidable issues: dimensionality and sparsity. Data reduction methods can identify the relevant subset of significantly differential features and reduce data redundancy, and feature selection (FS) approaches are often applied to obtain this subset. So far, a variety of FS methods have been developed for metaproteomic studies. However, because the performance of an FS method depends heavily on the characteristics of the data under study, a suitable method must be carefully chosen to obtain reliable and reproducible results. Moreover, it is critical to evaluate the performance of each FS method according to comprehensive criteria, because a single criterion is not sufficient to reflect the overall level of the FS method. Therefore, we constructed an online tool named MetaFS, which provides 13 types of FS methods and conducts a comprehensive evaluation of these methods using four widely accepted and independent criteria. Furthermore, the function and reliability of MetaFS were systematically tested and validated via two case studies. In summary, MetaFS could be a distinguished tool for discovering the overall well-performing FS method for selecting potential biomarkers in microbiome studies. The online tool is freely available at https://idrblab.org/metafs/.

    Our Publication(s) Describing This Server:

  1. J. Tang, M. J. Mou, Y. X. Wang, Y. C. Luo, F. Zhu*. MetaFS: performance assessment of biomarker discovery in metaproteomics. Briefings in Bioinformatics (impact factor of the publication year: 13.994, Biology Zone 1, TOP journal). 22(3): bbaa105 (2021). PMID: 32510556.

  NOREVA: NORmalization and EVAluation of MS-based metabolomics data
    Server URL: https://idrblab.org/noreva/

    Diverse forms of unwanted signal variations in mass spectrometry-based metabolomics data adversely affect the accuracy of metabolic profiling. A variety of normalization methods have been developed to address this problem. However, their performance varies greatly and depends heavily on the nature of the studied data. Moreover, given the complexity of the actual data, it is not feasible to assess the performance of methods by a single criterion. We therefore developed NOREVA to enable performance evaluation of various normalization methods from multiple perspectives. NOREVA integrates five well-established criteria (each with a distinct underlying theory) to ensure a more comprehensive evaluation than any single criterion. It provides the most complete set of the available normalization methods, with unique features of removing overall unwanted variations based on quality control metabolites and allowing quality control sample-based correction sequentially followed by data normalization. The originality of NOREVA and the reliability of its algorithms were extensively validated by case studies on five benchmark datasets. In sum, NOREVA is distinguished by its capability of identifying the well-performing normalization method by taking multiple criteria into consideration, and can be an indispensable complement to other available tools. NOREVA can be freely accessed at http://server.idrb.cqu.edu.cn/noreva/.
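    To make the idea of evaluating a normalization method by a criterion concrete, the sketch below applies one simple normalization (scaling each sample to a constant total intensity) and scores it by the mean coefficient of variation (CV) of metabolites within a group. Both the method and the criterion are chosen here only for illustration; they are not taken from NOREVA's code, and the paragraph above does not enumerate its five criteria. The intensity values are made up.

```python
from statistics import mean, stdev
from typing import List

Matrix = List[List[float]]  # rows = samples, columns = metabolites


def total_sum_normalize(samples: Matrix) -> Matrix:
    """Scale every sample so that its metabolite intensities sum to 1."""
    normalized = []
    for row in samples:
        total = sum(row)
        if total == 0:
            raise ValueError("sample with zero total intensity")
        normalized.append([value / total for value in row])
    return normalized


def mean_cv(samples: Matrix) -> float:
    """Mean coefficient of variation across metabolites (lower = less spread)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    cvs = []
    for column in zip(*samples):
        mu = mean(column)
        if mu != 0:
            cvs.append(stdev(column) / mu)
    return mean(cvs)


if __name__ == "__main__":
    # Made-up replicate samples that differ only in overall signal strength.
    raw = [[10.0, 20.0, 70.0], [20.0, 40.0, 140.0], [5.0, 10.0, 35.0]]
    print(mean_cv(raw), mean_cv(total_sum_normalize(raw)))
```

    In this toy case the replicates differ only by a dilution-like factor, so the within-group CV drops to essentially zero after normalization; real data would call for comparing several methods under several criteria, as described above.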

    Our Publication(s) Describing This Server:

  1. J. B. Fu, Y. Zhang, Y. X. Wang, H. N. Zhang, J. Liu, J. Tang, Q. X. Yang, H. C. Sun, W. Q. Qiu, Y. H. Ma, Z. R. Li, M. Y. Zheng, F. Zhu*. Optimization of metabolomic data processing using NOREVA. Nature Protocols (impact factor of the publication year: 17.021, Biology Zone 1, TOP journal). 17(1): 129-151 (2022). PMID: 34952956.
    Featured Article.

  2. Q. X. Yang, Y. X. Wang, Y. Zhang, F. C. Li, W. Q. Xia, Y. Zhou, Y. Q. Qiu, H. L. Li, F. Zhu*. NOREVA: enhanced normalization and evaluation of time-course and multi-class metabolomic data. Nucleic Acids Research (impact factor of the publication year: 16.971, Biology Zone 1, TOP journal). 48(W1): 436-448 (2020). PMID: 32324219.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 1.60% in 2022.

  3. B. Li, J. Tang, Q. X. Yang, S. Li, X. J. Cui, Y. H. Li, Y. Z. Chen, W. W. Xue, X. F. Li, F. Zhu*. NOREVA: normalization and evaluation of MS-based metabolomics data. Nucleic Acids Research (impact factor of the publication year: 10.162, Biology Zone 1, TOP journal). 45(W1): 162-170 (2017). PMID: 28525573.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 0.77% in 2022.
    • The Percentile in Subject Area shown in InCites™ was 0.71% in 2021.
    • The Percentile in Subject Area shown in InCites™ was 0.75% in 2020.
    • The Percentile in Subject Area shown in InCites™ was 1.27% in 2019.
    • The Percentile in Subject Area shown in InCites™ was 2.98% in 2018.
    Highlights by Experts in Subject Area:
    • Introduced by OMICTOOLS as "provided valuable guidance to the selection of suitable algorithm in metabolomics".
    • Discussed on StackExchange as "works fine" and "corrections for batches without QC options".

  4. Q. X. Yang, J. J. Hong, Y. Li, W. W. Xue, S. Li*, H. Yang*, F. Zhu*. A novel bioinformatics approach to identify the consistently well-performing normalization strategy for current metabolomic studies. Briefings in Bioinformatics (impact factor of the publication year: 11.622, Biology Zone 1, TOP journal). 21(6): 2142-2152 (2020). PMID: 31776543.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 2.69% in 2022.

  5. B. Li, J. Tang, Q. X. Yang, X. J. Cui, S. Li, S. J. Chen, Q. X. Cao, W. W. Xue, N. Chen, F. Zhu*. Performance evaluation and online realization of data-driven normalization methods used in LC/MS based untargeted metabolomics analysis. Scientific Reports (impact factor of the publication year: 5.228, Multidisciplinary Zone 2 journal). 6: 38881 (2016). PMID: 27958387.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 2.25% in 2022.
    • The Percentile in Subject Area shown in InCites™ was 1.83% in 2021.
    • The Percentile in Subject Area shown in InCites™ was 1.87% in 2020.
    • The Percentile in Subject Area shown in InCites™ was 2.05% in 2019.

  POSREG: proteomic signature discovered by simultaneously optimizing its reproducibility and generalizability.
    Server URL: https://idrblab.org/posreg/

    Mass spectrometry-based proteomic technique has become indispensable in current exploration of complex and dynamic biological processes. Instrument development has largely ensured the effective production of proteomic data, which necessitates commensurate advances in statistical frameworks to discover the optimal proteomic signature. Current frameworks mainly emphasize the generalizability of the identified signature in predicting independent data, but neglect the reproducibility among signatures identified from independently repeated trials on different sub-datasets. These problems seriously restrict the wide application of the proteomic technique in molecular biology and other related directions. Thus, it is crucial to enable the generalizable and reproducible discovery of the proteomic signature with the subsequent indication of phenotype association. However, no such tool is available yet. Herein, an online tool, POSREG, was constructed to identify the optimal signature for a set of proteomic data. It works by (i) identifying proteomic signatures of good reproducibility and aggregating them into an ensemble feature ranking by ensemble learning, (ii) assessing the generalizability of the ensemble feature ranking to acquire the optimal signature and (iii) indicating the phenotype association of the discovered signature. POSREG is unique in its capacity of discovering the proteomic signature by simultaneously optimizing its reproducibility and generalizability. It is now accessible free of charge without any registration or login requirement at https://idrblab.org/posreg/.
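    Step (i) above, aggregating rankings from repeated trials into one ensemble ranking, can be sketched as follows. Mean-rank aggregation is used here purely as an assumed, simple aggregation rule; the paragraph does not state which ensemble method POSREG applies. The protein names are placeholders, and every ranking is assumed to contain the same features.

```python
from collections import defaultdict
from typing import Dict, List


def aggregate_rankings(rankings: List[List[str]]) -> List[str]:
    """Combine several best-first feature rankings by their mean rank.

    Assumption: each ranking lists the same features exactly once.
    Ties in mean rank are broken alphabetically for determinism.
    """
    if not rankings:
        raise ValueError("need at least one ranking")
    rank_sums: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, feature in enumerate(ranking, start=1):
            rank_sums[feature] += rank
    n = len(rankings)
    return sorted(rank_sums, key=lambda f: (rank_sums[f] / n, f))


if __name__ == "__main__":
    # Rankings from three hypothetical sub-sampling trials.
    trials = [
        ["P1", "P2", "P3"],
        ["P2", "P1", "P3"],
        ["P1", "P3", "P2"],
    ]
    print(aggregate_rankings(trials))
```

    The generalizability check in step (ii) would then evaluate top-ranked subsets of this ensemble ranking on independent data, which is outside the scope of this sketch.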

    Our Publication(s) Describing This Server:

  1. F. C. Li, Y. Zhou, Y. Zhang, J. Y. Yin, Y. Q. Qiu, J. Q. Gao, F. Zhu*. POSREG: proteomic signature discovered by simultaneously optimizing its reproducibility and generalizability. Briefings in Bioinformatics (impact factor of the publication year: 13.994, Biology Zone 1, TOP journal). 23(2): bbac040 (2022). PMID: 35183059.

  PROFEAT: calculation of the protein physicochemical features
    Server URL: https://idrblab.org/profeat/

    Studies of biological, disease, and pharmacological networks are facilitated by systems-level investigations using computational tools. In particular, network descriptors developed in other disciplines have found increasing applications in the study of protein, gene regulatory, metabolic, disease, and drug-targeted networks. Public web servers provide facilities for computing network descriptors, but many descriptors are not covered, including those used or useful for biological studies. We upgraded the PROFEAT web server (http://bidd2.nus.edu.sg/cgi-bin/profeat2016/main.cgi) to compute up to 329 network descriptors and protein-protein interaction descriptors. PROFEAT network descriptors comprehensively describe the topological and connectivity characteristics of unweighted (uniform binding constants and molecular levels), edge-weighted (varying binding constants), node-weighted (varying molecular levels), edge-node-weighted (varying binding constants and molecular levels), and directed (oriented processes) networks. The usefulness of the network descriptors is illustrated by literature-reported studies of the biological networks derived from the genome, interactome, transcriptome, metabolome, and diseasome profiles.
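    For readers unfamiliar with network descriptors, the sketch below computes a few standard topological quantities (node count, edge count, density, and degree) for an unweighted, undirected network. These are generic graph measures written for illustration; the function is hypothetical, is not PROFEAT code, and no claim is made about which of the 329 descriptors correspond to them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def network_descriptors(edges: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Basic descriptors of an unweighted, undirected network.

    Self-loops are ignored and duplicate edges are counted once.
    """
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        adjacency[u].add(v)
        adjacency[v].add(u)
    n = len(adjacency)
    m = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    return {
        "nodes": n,
        "edges": m,
        # Fraction of all possible node pairs that are connected.
        "density": 2 * m / (n * (n - 1)) if n > 1 else 0.0,
        "mean_degree": 2 * m / n if n else 0.0,
        "degree": {node: len(nb) for node, nb in adjacency.items()},
    }


if __name__ == "__main__":
    # A small made-up protein-protein interaction network.
    ppi = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
    print(network_descriptors(ppi))
```

    Edge-weighted, node-weighted, and directed variants would extend the same adjacency structure with weights or edge orientation.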

    Our Publication(s) Describing This Server:

  1. H. B. Rao&, F. Zhu&, G. B. Yang, Z. R. Li*, Y. Z. Chen. Update of PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence. Nucleic Acids Research (impact factor of the publication year: 7.836, Biology Zone 1, TOP journal). 39(W1): 385-390 (2011). PMID: 21609959.

  SSIZER: determining the sample sufficiency for comparative biological study
    Server URL: https://idrblab.org/ssizer/

    Comparative biomedical studies typically require a large number of samples to achieve statistically significant analysis. A frequently encountered question is how many samples are sufficient for a particular study. This question has traditionally been assessed using statistical power, but this assessment alone may not guarantee the full and reproducible discovery of markers truly discriminating biological groups (BMC Bioinformatics. 11: 447, 2010; Nat Rev Neurosci. 14: 365-76, 2013). Two novel types of statistical indexes have thus been introduced to assess the sample size from different perspectives by considering the diagnostic accuracy (Metabolomics. 9: 280-99, 2013) and robustness (Cancer Res. 74: 4612-21, 2014). Due to the complementary nature of these index-types, a comprehensive evaluation based on all types of indexes is necessary for more accurate assessment. However, no such tool is available yet. Herein, an online tool, SSizer, was developed and validated to enable the assessment of the sufficiency of a user-input biomedical dataset for given studies, and three index-types were provided for the first time to achieve the comprehensive assessment. These indexes included: (I) statistical power analyzing the level of difference between two comparative groups (Radiology. 227: 309-13, 2003), (II) overall diagnostic & classification accuracies on independent data (Metabolomics. 9: 280-99, 2013), and (III) robustness among the lists of biomarkers identified from different datasets (Cancer Res. 74: 4612-21, 2014). Moreover, a sample simulation based on user-input data was performed to expand the data and then determine the sample size required for a given study (Anal Chem. 88: 5179-88, 2016). In sum, SSizer is unique in its capacity to comprehensively evaluate whether the sample size is sufficient and to determine the required number of samples for a user-input dataset, which can therefore facilitate current biomedical studies including metabolomics, proteomics, and so on. SSizer is accessible free of charge at https://idrblab.org/ssizer/.
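    Index (I), statistical power for a two-group comparison, can be illustrated with the textbook normal-approximation formula for comparing two means, n per group = 2 (z_{1-α/2} + z_{power})² / d², where d is the standardized effect size. This is a generic sketch under that assumption and not SSizer's own procedure, which is not detailed above; the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def samples_per_group(effect_size: float, alpha: float = 0.05,
                      power: float = 0.80) -> int:
    """Samples needed per group for a two-sided, two-sample comparison.

    Uses the normal approximation; `effect_size` is the difference in
    means divided by the common standard deviation (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


def achieved_power(n_per_group: int, effect_size: float,
                   alpha: float = 0.05) -> float:
    """Approximate power for a given group size (far tail neglected)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(effect_size * sqrt(n_per_group / 2) - z_alpha)


if __name__ == "__main__":
    n = samples_per_group(effect_size=0.5)
    print(n, round(achieved_power(n, 0.5), 3))
```

    As the paragraph notes, a sufficient power estimate alone does not guarantee reproducible marker discovery, which is why the accuracy and robustness indexes are assessed alongside it.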

    Our Publication(s) Describing This Server:

  1. F. C. Li, Y. Zhou, X. Y. Zhang, J. Tang, Q. X. Yang, Y. Zhang, Y. C. Luo, J. Hu*, W. W. Xue, Y. Q. Qiu, Q. J. He, B. Yang, F. Zhu*. SSizer: determining the sample sufficiency for comparative biological study. Journal of Molecular Biology (impact factor of the publication year: 5.067, Biology Zone 2, TOP journal). 432(11): 3411-3421 (2020). PMID: 32044343.

  SVM-Prot: SVM-based Protein functional family prediction
    Server URL: https://idrblab.org/svmprot/

    Knowledge of protein function is important for biological, medical and therapeutic studies, but the functions of many proteins are still unknown. There is a need for improved functional prediction methods. Our SVM-Prot web-server employs a machine learning method for predicting protein functional families from protein sequences irrespective of similarity, which complements similarity-based and other methods in predicting diverse classes of proteins including distantly-related proteins and homologous proteins of different functions. Since its publication in 2003, we have made major improvements to SVM-Prot with (1) expanded coverage from 54 to 192 functional families, (2) more diverse protein descriptors for protein representation, (3) improved predictive performance due to the use of more enriched training datasets and a greater variety of protein descriptors, (4) a newly integrated BLAST analysis option for assessing proteins in the SVM-Prot predicted functional families that are similar in sequence to a query protein, and (5) a newly added batch submission option for supporting the classification of multiple proteins. Moreover, two more machine learning approaches, K nearest neighbor and probabilistic neural networks, were added to facilitate collective assessment of protein functions by multiple methods. SVM-Prot can be accessed at http://bidd2.nus.edu.sg/cgi-bin/svmprot/svmprot.cgi.
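    The general idea of predicting a functional family from sequence-derived descriptors rather than from sequence similarity can be sketched with the simplest such descriptor, amino acid composition, fed to a K nearest neighbor vote. This is a toy illustration under stated assumptions: the descriptor choice, the Euclidean distance, the family labels, and the training sequences are all hypothetical and do not reproduce SVM-Prot's descriptors or models.

```python
from collections import Counter
from math import dist
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> List[float]:
    """Fraction of each standard amino acid in the sequence (20 values)."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]


def knn_predict(query: List[float],
                training: List[Tuple[List[float], str]],
                k: int = 3) -> str:
    """Label chosen by majority vote among the k nearest training vectors."""
    nearest = sorted(training, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up sequences and family labels, for illustration only.
    train = [
        (amino_acid_composition("AAAA"), "family_A"),
        (amino_acid_composition("AAAC"), "family_A"),
        (amino_acid_composition("WWWW"), "family_B"),
    ]
    print(knn_predict(amino_acid_composition("AAAW"), train, k=3))
```

    Because the descriptor vector has a fixed length regardless of sequence length, proteins with no detectable sequence similarity can still be compared in the same feature space.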

    Our Publication(s) Describing This Server:

  1. Y. H. Li, J. Y. Xu, L. Tao, X. F. Li, S. Li, X. Zeng, S. Y. Chen, P. Zhang, C. Qin, C. Zhang, Z. Chen, F. Zhu*, Y. Z. Chen. SVM-Prot 2016: a web-server for machine learning prediction of protein functional families from sequence irrespective of similarity. PLoS ONE (impact factor of the publication year: 3.234, Biology Zone 3 journal). 11(8): e0155290 (2016). PMID: 27525735.
    ESI Highly Cited Paper:
    • The Percentile in Subject Area shown in InCites™ was 1.67% in 2022.
    • The Percentile in Subject Area shown in InCites™ was 1.31% in 2021.
    • The Percentile in Subject Area shown in InCites™ was 1.41% in 2020.

IDRB: Innovative Drug Research and Bioinformatics Group


All rights are reserved by: Innovative Drug Research and Bioinformatics Group (IDRB)
College of Pharmaceutical Sciences, Zhejiang University
Hangzhou, P.R. China, 310058.
Contact number: (86 - 571)88208444
ICP ID: 浙ICP备18036997号-1
