Ontologies and Corpora

 

Ontologies and corpora developed by the Bibliome group and the Biosys group:

Ontologies

  • Bacterial interlocked Process ONtology (BiPON)

Background

High-throughput technologies produce huge amounts of heterogeneous data at all cellular levels. In parallel, dramatic progress has been made in understanding the molecular mechanisms involved in the adaptation of the cell to environmental changes. Structuring these data and this biological knowledge requires integrative tools and methods to share and extract valuable information. Bio-ontologies are well suited to tackling these two interlaced integration problems, since they can intrinsically formalize and organize different levels and sources of knowledge, information and data. The challenge is then to have an ontology that embraces all cellular levels, from a single molecule to a high-level cellular process, and connects these entities to omics data, sequence information and, finally, parameters (reaction rate, association constant, etc.). Such an ontology does not currently exist.

In collaboration with the LRI, we developed the Bacterial interlocked Process ONtology (BiPON), an ontology that provides a multi-scale systemic representation of bacterial cellular processes and couples them to their mathematical models. BiPON is composed of two sub-ontologies, bioBiPON and modelBiPON. bioBiPON organizes the systemic description of biological information, while modelBiPON describes the mathematical models (including parameters) associated with each biological process. As a proof of concept, we deployed BiPON on the description of the whole translation process. Automatic reasoning using bridge rules on specific classes then relates the two sub-ontologies, so that biological processes are automatically related to their mathematical models and the specific parameters these integrate. 41% of BiPON classes were imported from well-established bio-ontologies, while the others were manually defined and curated. Currently, BiPON integrates the main bacterial gene expression processes. These processes are representative enough to cover most of the difficulties of formal knowledge description.
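The bridge-rule idea can be illustrated with a minimal Python sketch. It is not BiPON's actual OWL/SWRL encoding: the class names (`TranslationElongation`, `ElongationRateModel`, `ProteinFolding`), the parameter name and the data layout are all hypothetical, chosen only to show how a rule attached to a specific class can link a biological process to a mathematical model and its parameters.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BridgeRule:
    """Hypothetical rule: any process of `bio_class` is described by `model_class`."""
    bio_class: str
    model_class: str
    parameters: tuple = ()


@dataclass
class Process:
    """A biological process instance typed by one (assumed) bioBiPON-like class."""
    name: str
    bio_class: str
    models: list = field(default_factory=list)


def apply_bridge_rules(processes, rules):
    """Attach to each process every model whose rule matches the process class."""
    rules_by_class = {}
    for rule in rules:
        rules_by_class.setdefault(rule.bio_class, []).append(rule)
    for process in processes:
        for rule in rules_by_class.get(process.bio_class, []):
            process.models.append((rule.model_class, rule.parameters))
    return processes


# Illustrative names only; none of them are taken from BiPON itself.
rules = [
    BridgeRule("TranslationElongation", "ElongationRateModel", ("elongation_rate",)),
]
processes = [
    Process("elongation of protein X", "TranslationElongation"),
    Process("folding of protein X", "ProteinFolding"),
]

linked = apply_bridge_rules(processes, rules)
for process in linked:
    print(process.name, "->", process.models)
```

In this sketch only the process whose class is covered by a rule receives a model; the other one is left without any, which mirrors the "linked to mathematical models if any" behaviour described in the text. A real deployment would rely on an OWL reasoner rather than a dictionary lookup.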

The knowledge formalization included in BiPON is highly flexible and generic. Most known cellular processes, new participants or other knowledge resources could be inserted in BiPON and then linked to mathematical models, if any exist. Altogether, BiPON opens up promising perspectives for knowledge integration and sharing, and could be used by various communities, such as biologists, systems and computational biologists, and the emerging whole-cell modeling community.

 

Download

BiPON is distributed under the license Creative Commons Attribution 4.0 (CC-by; https://creativecommons.org/licenses/by/4.0/) and can be downloaded here.

In addition, a toy ontology that is representative of BiPON and can be used to explore automatic reasoning and SWRL rules can be downloaded here.

 

References

  1. Vincent Henry, Anne Goelzer, Arnaud Ferré, Stephan Fischer, Marc Dinh, Valentin Loux, Christine Froidevaux, Vincent Fromion. The Bacterial interlocked Process ONtology (BiPON): a systemic multi-scale unified representation of biological processes in prokaryotes. Journal of Biomedical Semantics, 8(1):53, 2017. https://doi.org/10.1186/s13326-017-0165-6
  2. Vincent Henry, Arnaud Ferré, Christine Froidevaux, Anne Goelzer, Vincent Fromion, Sarah Cohen-Boulakia, Sandra Dérozier, Marc Dinh, Ghislain Fiévet, Stephan Fischer, J.-François Gibrat, Valentin Loux, Sabine Peres. Représentation systémique multi-échelle des processus biologiques de la bactérie. Ingénierie des connaissances, Montpellier, 2016.

  • Biological interlocked Process Ontology for metabolism (BiPOm)

Background

Managing and organizing biological knowledge remains a major challenge because of the complexity and sophistication of living systems. Recently, systemic representations have been shown to be promising for tackling this challenge at the whole-cell scale. In such representations, the cell is considered as a system composed of interlocked subsystems. The question now is how to develop relevant tools to formalize the systemic description of cells.


In collaboration with the LRI, AgroParisTech and GQE, we introduce BiPOm, an ontology describing metabolic processes as interlocked subsystems using a minimal set of classes and properties. We explicitly formalized the relations between the enzyme, its activity, and the substrates and products of the reaction, as well as the active state of all involved molecules, using Description Logics. We further showed that information about molecules, such as molecular types or molecular properties, can be deduced using SWRL rules and automatic reasoning on instances of BiPOm. The information necessary to instantiate BiPOm can be extracted from existing databases or existing bio-ontologies. Altogether, this results in a paradigm shift in which knowledge is anchored to the biological process rather than to the molecule.
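A minimal Python sketch can illustrate how knowledge about a molecule may be deduced from the process it takes part in. It only mimics rules of the SWRL form `property(?process, ?molecule) -> Class(?molecule)`; the property names (`has_catalyst`, `has_substrate`, `has_product`), the inferred class names and the triple layout are assumptions made for illustration and are not taken from BiPOm.

```python
from collections import defaultdict

# Hypothetical rules in the spirit of SWRL:
#   property(?process, ?molecule) -> Class(?molecule)
RULES = {
    "has_catalyst": "Enzyme",
    "has_substrate": "Substrate",
    "has_product": "Product",
}


def infer_molecule_types(facts, rules=RULES):
    """Derive molecule classes from (process, property, molecule) triples."""
    inferred = defaultdict(set)
    for _process, prop, molecule in facts:
        if prop in rules:
            inferred[molecule].add(rules[prop])
    return dict(inferred)


# One process instance: phosphorylation of glucose by hexokinase.
facts = [
    ("glucose_phosphorylation", "has_catalyst", "hexokinase"),
    ("glucose_phosphorylation", "has_substrate", "glucose"),
    ("glucose_phosphorylation", "has_substrate", "ATP"),
    ("glucose_phosphorylation", "has_product", "glucose-6-phosphate"),
    ("glucose_phosphorylation", "has_product", "ADP"),
]

types = infer_molecule_types(facts)
for molecule in sorted(types):
    print(molecule, sorted(types[molecule]))
```

Nothing is asserted about the molecules directly: every class is derived from the process description, which is the sense in which knowledge is anchored to the process. An actual instantiation would use an OWL reasoner with SWRL support instead of this lookup table.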

Download

BiPOm is distributed under the license Creative Commons Attribution 4.0 (CC-by; https://creativecommons.org/licenses/by/4.0/) and can be downloaded here.

 

References

  1. Vincent Henry, Fatiha Saïs, Elodie Marchadier, Juliette Dibie, Anne Goelzer, Vincent Fromion. BiPOm: Biological interlocked Process Ontology for metabolism. How to infer molecule knowledge from biological process? In International Conference on Biomedical Ontology (ICBO), Newcastle, England, 2017.

  • WheatPhenotype Ontology

WheatPhenotype describes bread wheat (Triticum aestivum) phenotypes and the environmental factors that influence them. Traits include resistance, development, nutrition, and bread quality. Environmental factors include biotic and abiotic factors.

References

  1. Dialekti Valsamou, Robert Bossy, Marion Ranoux, Wiktoria Golik, Pierre Sourdille, Claire Nédellec. "Extraction d’information pour la sélection du blé par marqueur génétique". Proceedings of the IN-OVIVE workshop, 2nd edition, 25th Journées francophones d'Ingénierie des Connaissances, Clermont-Ferrand, 14 May 2014.
  2. Claire Nédellec, Robert Bossy, Dialekti Valsamou, Marion Ranoux, Wiktoria Golik, Pierre Sourdille. Information Extraction from Bibliography for Marker Assisted Selection in Wheat. In Proceedings of Metadata and Semantics for Agriculture, Food & Environment (AgroSEM'14), special track of the 8th Metadata and Semantics Research Conference (MTSR’14), Springer Communications in Computer and Information Science, Series Volume 478, pp. 301-313, Karlsruhe, Germany, 2014. DOI: 10.1007/978-3-319-13674-5_28
  3. R. Bossy and C. Nédellec. SamBlé. Moteur de recherche bibliographique sur la Sélection du blé assistée par marqueur. Projet FSOV Sélection du Blé Assistée par Marqueur.

  • OntoBiotope Ontology

OntoBiotope describes all types of microorganism habitats. The BioNLP-ST'16 version of the ontology contains more than 2,000 concepts. OntoBiotope is used for the annotation of the corpora of the BioNLP-ST'11, '13 and '16 Bacteria Biotope tasks and for the indexing of the PubMed Biotope semantic search engine and the PubMed Biotope DataBase. It is distributed by AgroPortal and LovINRA.

References

  1. Louise Deléger, Robert Bossy, Estelle Chaix, Mouhamadou Ba, Arnaud Ferré, Philippe Bessières, Claire Nédellec. Overview of the Bacteria Biotope Task at BioNLP Shared Task. In Proceedings of the BioNLP Shared Task 2016 Workshop, Association for Computational Linguistics, Berlin, Germany, 2016.
  2. Robert Bossy, Wiktoria Golik, Zorana Ratkovic, Dialekti Valsamou, Philippe Bessières, Claire Nédellec. An Overview of the Gene Regulation Network and the Bacteria Biotope Tasks in BioNLP’13. BMC Bioinformatics, July 2015.
  3. Robert Bossy, Julien Jourde, Alain-Pierre Manine, Philippe Veber, Erick Alphonse, Maarten van de Guchte, Philippe Bessières, Claire Nédellec. BioNLP Shared Task - The Bacteria Track. BMC Bioinformatics, 13(Suppl 11):S3, June 2012.
  4. Bossy R., Golik W., Ratkovic Z., Bessières P., Nédellec C. BioNLP shared Task 2013 - An Overview of the Bacteria Biotope Task. In Proceedings of the BioNLP 2013 Workshop, Association for Computational Linguistics, pages 74-82. Sofia, Bulgaria, 2013.
  5. Zorana Ratkovic, Wiktoria Golik, Pierre Warnier. BioNLP 2011 Task Bacteria Biotope - The Alvis System. BMC Bioinformatics 13(Suppl 11):S3, June 2012.
  6. Zorana Ratkovic, Wiktoria Golik, Pierre Warnier, Philippe Veber, Claire Nédellec, "BioNLP 2011 Task Bacteria Biotope - The Alvis system", BioNLP workshop joint to ACL, Portland, USA, 2011.
  7. Robert Bossy, Julien Jourde, Philippe Bessières, Maarten van de Guchte, Claire Nédellec, "BioNLP shared Tasks 2011 - Bacteria Biotope", BioNLP workshop joint to ACL, Portland, USA, 2011.

  • ATOL Ontology

ATOL, the Animal Trait Ontology for Livestock, describes the traits of livestock animals. It is developed by the INRA scientific department Phase in collaboration with the Bibliome group (ongoing D-ONT project).

References

  1. P.-Y. Le Bail, J. Bugeon, O. Dameron, A. Fatet, W. Golik, J.-F. Hocquette, C. Hurtaud, I. Hue, C. Jondreville, L. Joret, M.-C. Meunier-Salaün, J. Vernet, C. Nédellec, M. Reichstadt, P. Chemineau. Un langage de référence pour le phénotypage des animaux d’élevage : l’ontologie ATOL, INRA Prod. Anim., 2014, 27 (3), 195-208.
  2. Hue I, Bugeon J, Dameron O, Fatet A, Hurtaud C, Joret L, Meunier-Salaün MC, Nédellec C, Reichstadt M, Vernet J, Le Bail PY. ATOL and EOL ontologies, steps towards embryonic phenotypes shared worldwide, 4th Mammalian Embryo Genomics meeting, Québec, October 2013.
  3. Salaün, M.-C., Bugeon, J., Dameron, O., Fatet, A., Hue, I., Hurtaud, C., Nédellec, C., Reichstadt, M., Vernet, J., Reecy, J., Park, C., Le Bail, P.-Y. ATOL: an ontology for livestock. In: Book of abstracts of the 63rd Annual Meeting of the European Federation of Animal Science, Bratislava (Slovakia). Wageningen (NLD): Wageningen Academic Publishers (EAAP Book of Abstracts, 18), page 299, 2012.
  4. Wiktoria Golik, Olivier Dameron, Jérôme Bugeon, Alice Fatet, Isabelle Hue, Catherine Hurtaud, Matthieu Reichstadt, Marie-Christine Salaün, Jean Vernet, Léa Joret, Frédéric Papazian, Claire Nédellec and Pierre-Yves Le Bail. "ATOL: the multi-species livestock trait ontology". In Proceedings of the 6th Metadata and Semantics Research Conference (MTSR 2012), pp. 289-300. Springer Verlag Communications in Computer and Information Science Series. Cadiz, Spain, 28-30 November 2012. DOI: 10.1007/978-3-642-35233-1_28
  5. M. C. Meunier-Salaun, J. Bugeon, O. Dameron, A. Fatet, I. Hue, C. Hurtaud, L. Joret, C. Nédellec, M. Reichstadt, J. Vernet, P.-Y. Le Bail. Les ontologies ATOL et EOL: des outils en appui aux nouveaux challenges en production porcine : phénotypage et élevage de précision, Journées de la Recherche Porcine (JRP), 4-5 February 2014.

  • TriPhase ("Terminology for Information Retrieval in Animal Physiology and farming systems") Ontology

Objective

The TriPhase termino-ontology formally represents the research topics of the INRA scientific department Phase, i.e. animal physiology and farming systems. Dedicated text-mining tools use TriPhase to analyze the research topics of Phase department researchers based on their publications referenced in the ProdInra bibliographic database.

It was developed by the Bibliome team and information science specialists from the Phase department to meet the need for strategic analysis.

It contains 1,320 concepts named by 2,093 terms. The fine granularity of TriPhase is useful for the analysis of minor and transdisciplinary topics.

Use

TriPhase has been used to analyze the distribution of concepts and their evolution over time in publications from 2009 to 2013. The ANStrat tool developed by the Bibliome group is used to express queries on various criteria (e.g. topics, laboratories, type of publication, co-author partnership) and to display the results. Interactive navigation of TriPhase and concept selection are used to analyze topics at various levels of detail in combination with other bibliographic criteria.

Access and Licence

TriPhase is available on AgroPortal under the CC-BY-SA v3.0 license. Copyright INRA 2014.

References

  1. Agnès Girard and the network of information specialists of the Phase department, INRA Rennes, and Claire Nédellec and the Bibliome research team. Triphase : co-construction d’une ressource termino-ontologique. Arabesque, quarterly journal of the Agence bibliographique de l'Enseignement Supérieur, August 2016.

Annotated corpora

The LLL corpus is the original corpus of the LLL challenge. The goal of the challenge is to evaluate the ability of participating Information Extraction (IE) systems to identify directed interactions and the genes/proteins that interact (the named entities must be detected). The on-line evaluation service is still available. Note that the LLL corpus differs from the BioInfer LLL corpus, a transformation of the original LLL corpus in which the IE task has been made much easier: the relation arguments are given and the relation is not directed.
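The difference between directed and undirected interaction extraction can be made concrete with a small Python sketch. This is not the official LLL scoring script: the gene names are placeholders, and the scoring is a plain F1 over predicted pairs, shown here only to illustrate why ignoring direction makes the task easier.

```python
def f1_score(predicted, gold):
    """F1 between two collections of interactions, compared as sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def undirected(pairs):
    """Drop the direction of each (agent, target) pair."""
    return {frozenset(pair) for pair in pairs}


# Placeholder gene names; (agent, target) order encodes the direction.
gold = [("geneA", "geneB"), ("geneC", "geneD")]
predicted = [("geneB", "geneA"), ("geneC", "geneD")]  # first pair is reversed

directed_f1 = f1_score(predicted, gold)
undirected_f1 = f1_score(undirected(predicted), undirected(gold))
print(f"directed F1 = {directed_f1:.2f}, undirected F1 = {undirected_f1:.2f}")
```

With these toy predictions, the reversed pair counts as an error in the directed setting but as a correct answer once direction is ignored, so the same system output scores higher on the undirected variant.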

References

  1. Nédellec C. "Learning Language in Logic - Genic Interaction Extraction Challenge" in Proceedings of the Learning Language in Logic (LLL05) workshop joint to ICML'05. Cussens J. and Nédellec C. (eds). pp. 31-37, Bonn, August 2005.

The BI corpus is part of the Bacteria Interaction task in the BioNLP Shared Task 2011. The goal is to extract complex interaction events from PubMed references.

References

  1. Robert Bossy, Julien Jourde, Alain-Pierre Manine, Philippe Veber, Erick Alphonse, Maarten van de Guchte, Philippe Bessières, Claire Nédellec. BioNLP Shared Task - The Bacteria Track. BMC Bioinformatics, 13(Suppl 11):S3, June 2012.
  2. Julien Jourde, Alain-Pierre Manine, Philippe Veber, Karen Fort, Robert Bossy, Erick Alphonse, Philippe Bessières, "BioNLP Shared Task 2011 - Bacteria Gene Interactions and Renaming", BioNLP workshop joint to ACL, Portland, USA, 2011.

The GRN corpus is part of the Gene Regulation Network in Bacteria task in the BioNLP Shared Task 2013. The goal is to extract the full regulation network of Bacillus subtilis sporulation. The on-line evaluation service is available.

References

  1. Robert Bossy, Wiktoria Golik, Zorana Ratkovic, Dialekti Valsamou, Philippe Bessières, Claire Nédellec. An Overview of the Gene Regulation Network and the Bacteria Biotope Tasks in BioNLP’13. BMC Bioinformatics, Vol 16 Suppl 10, 2015.
  2. Bossy R., Bessières P., Nédellec C. BioNLP Shared Task 2013 – An overview of the Genic Regulation Network Task. In Proceedings of the BioNLP 2013 Workshop, Association for Computational Linguistics, Sofia, Bulgaria, 2013.

The BB'11 corpus is part of the Bacteria Biotope Task in the BioNLP Shared Task 2011. The goal is (1) to identify bacteria and their habitats, which must be categorized into seven different types, and (2) to extract relations between bacteria and their habitats.

References

  1. Robert Bossy, Julien Jourde, Alain-Pierre Manine, Philippe Veber, Erick Alphonse, Maarten van de Guchte, Philippe Bessières, Claire Nédellec. BioNLP Shared Task - The Bacteria Track. BMC Bioinformatics, 13(Suppl 11):S3, June 2012.
  2. Robert Bossy, Julien Jourde, Philippe Bessières, Maarten van de Guchte, Claire Nédellec, "BioNLP shared Tasks 2011 - Bacteria Biotope", BioNLP workshop joint to ACL, Portland, USA, 2011.

The BB'13 corpus is part of the Bacteria Biotope Task in the BioNLP Shared Task 2013. The goal is (1) to identify bacteria and their habitats, which must be categorized using concepts of the OntoBiotope ontology, and (2) to extract relations between bacteria and their habitats from web pages. The on-line evaluation service is available.

References

  1. Robert Bossy, Wiktoria Golik, Zorana Ratkovic, Dialekti Valsamou, Philippe Bessières, Claire Nédellec. An Overview of the Gene Regulation Network and the Bacteria Biotope Tasks in BioNLP’13. BMC Bioinformatics, Vol 16 Suppl 10, 2015.
  2. Bossy R., Golik W., Ratkovic Z., Bessières P., Nédellec C. BioNLP shared Task 2013 – An Overview of the Bacteria Biotope Task. In Proceedings of the BioNLP 2013 Workshop, Association for Computational Linguistics, Sofia, Bulgaria, 2013.

The BB'16 corpus is part of the Bacteria Biotope Task in the BioNLP Shared Task 2016. The goal is (1) to identify bacteria and their habitats, which must be categorized using concepts of the OntoBiotope ontology, and (2) to extract relations between bacteria and their habitats from PubMed references. The on-line evaluation service is available.

References

  1. Louise Deléger, Robert Bossy, Estelle Chaix, Mouhamadou Ba, Arnaud Ferré, Philippe Bessières, Claire Nédellec. Overview of the Bacteria Biotope Task at BioNLP Shared Task. In Proceedings of the BioNLP Shared Task 2016 Workshop, Association for Computational Linguistics, Berlin, Germany, 2016.


The SeeDev'16 corpus is part of the SeeDev Task of the BioNLP Shared Task 2016. The goal is to extract complex interaction events involved in the development of the seed of the model plant Arabidopsis. The on-line evaluation service is available.

References

  1. Estelle Chaix, Bertrand Dubreucq, Abdelhak Fatihi, Dialekti Valsamou, Robert Bossy, Mouhamadou Ba, Louise Deléger, Pierre Zweigenbaum, Philippe Bessières, Loïc Lepiniec, Claire Nédellec. Overview of the Regulatory Network of Plant Seed Development (SeeDev) Task at the BioNLP Shared Task. In Proceedings of the BioNLP Shared Task 2016 Workshop, Association for Computational Linguistics, Berlin, Germany, 2016.

Corpora and ontologies are distributed under the Creative Commons CC-BY-SA license.