This tool supports ontology learning from a domain corpus or structured database using a lightweight encoder-based pipeline built around DeBERTa-v3-large. Rather than relying on a single end-to-end black-box model, it combines modular steps such as data enrichment, definition mining, candidate filtering, and fine-tuned classification. The framework can identify ontology concepts and predict both hierarchical and non-hierarchical relations between them. It is intended for specialised domains, including data-sensitive settings where internal knowledge sources cannot be sent to external APIs.
To use the tool, clone the GitHub repository and install the required dependencies, then provide a domain corpus, structured database, or other knowledge source as input.
The framework targets specialised and low-resource settings, where labelled data is limited, definitions may be sparse, and the number of candidate term pairs or triples can grow large. It is implemented as a modular encoder-based pipeline around a fine-tuned DeBERTa-v3-large model, with data enrichment, definition mining, and candidate filtering steps that address these three constraints respectively.
The framework supports three tasks:
- Term typing: takes as input a list of domain terms and a set of ontology concepts or types, and assigns the most appropriate type labels to each term.
- Taxonomy discovery: takes as input a set of concepts and predicts valid hierarchical child-parent relations between them.
- Non-taxonomic relation extraction: takes as input a set of concepts and a predefined set of possible relations, and predicts valid triples of the form (subject, predicate, object).
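The inputs and outputs of the three tasks can be sketched as simple data structures. This is a minimal illustration, not the repository's actual code: the class and function names (`TypedTerm`, `TaxonomyPair`, `Triple`, `candidate_pairs`, `candidate_triples`) are hypothetical, and the example concepts and relations are invented. It also shows why the candidate space grows quickly: with n unique concepts and r relations there are n(n-1) ordered pairs and n(n-1)r possible triples.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class TypedTerm:
    """Term typing output: a term with its assigned type labels."""
    term: str
    types: tuple[str, ...]


@dataclass(frozen=True)
class TaxonomyPair:
    """Taxonomy discovery candidate: a directed child-parent pair."""
    child: str
    parent: str


@dataclass(frozen=True)
class Triple:
    """Non-taxonomic relation candidate: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


def candidate_pairs(concepts):
    """Enumerate every ordered pair of distinct concepts: n * (n - 1) candidates."""
    return [TaxonomyPair(child=c, parent=p) for c, p in permutations(concepts, 2)]


def candidate_triples(concepts, relations):
    """Enumerate every ordered pair under every relation: n * (n - 1) * r candidates."""
    return [
        Triple(subject=s, predicate=r, obj=o)
        for s, o in permutations(concepts, 2)
        for r in relations
    ]


# Invented example values; the concept list is assumed to contain no duplicates.
concepts = ["material", "metal", "alloy", "steel"]
relations = ["hasProperty", "partOf"]
print(len(candidate_pairs(concepts)))               # 12
print(len(candidate_triples(concepts, relations)))  # 24
```

Even for a few thousand concepts, exhaustive enumeration reaches millions of pairs, which is the motivation for filtering candidates before classification.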
In the current research implementation, each task has its own pipeline, and the pipelines have been applied to challenge datasets from LLMs4OL 2025 such as MatOnto, OBI, and SWEET.
Key features:
- Modular ontology learning pipeline built around fine-tuned DeBERTa-v3-large
- Separate task pipelines for term typing, taxonomy discovery, and non-taxonomic relation extraction
- Support for ontology learning from both domain corpora and structured knowledge sources
- Data enrichment through lexical variants and additional training examples
- Definition mining for terms, labels, and types from available resources
- Similarity-based candidate filtering to reduce the search space in relation prediction tasks
- Designed for specialised domains with technical terminology
- Suitable for self-hosted deployment in privacy-sensitive environments
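As an illustration of data enrichment through lexical variants, the sketch below expands a small labelled training set with surface-form variants of each term. The variant rules (lower-casing, swapping hyphens, underscores, and spaces) and the names `lexical_variants` and `enrich` are assumptions made for this example; the actual enrichment rules used by the tool may differ.

```python
import re


def lexical_variants(term):
    """Return surface-form variants of a term, excluding the term itself."""
    variants = {
        term.lower(),                        # "Grain size" -> "grain size"
        re.sub(r"[-_]", " ", term),          # "heat-treated" -> "heat treated"
        re.sub(r"\s+", "-", term.strip()),   # "grain size" -> "grain-size"
    }
    variants.discard(term)
    return sorted(variants)


def enrich(examples):
    """Add one labelled example per lexical variant, keeping the original label."""
    enriched = list(examples)
    seen = set(examples)
    for term, label in examples:
        for variant in lexical_variants(term):
            if (variant, label) not in seen:
                seen.add((variant, label))
                enriched.append((variant, label))
    return enriched


# Invented training examples of the form (term, type label).
train = [("Grain size", "Property"), ("heat-treated steel", "Material")]
for term, label in enrich(train):
    print(f"{term!r} -> {label}")
```

Rules of this kind should be checked against the domain: lower-casing, for instance, is unsafe for case-sensitive terms such as chemical symbols or units.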
The tool operates in the following stages:
- A domain corpus, structured database, or other knowledge source is provided as input.
- Candidate terms, concepts, or candidate pairs/triples are extracted, depending on the task.
- Training data is enriched with lexical variants and additional examples.
- Definitions for terms, labels, or types are collected from available resources.
- For taxonomy discovery and non-taxonomic relation extraction, similarity-based filtering reduces the candidate search space.
- A fine-tuned DeBERTa-v3-large model is applied to predict type labels, hierarchical relations, or non-taxonomic relation triples.
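The similarity-based filtering stage can be sketched as follows. This is a minimal stand-in, not the tool's implementation: it scores pairs with Jaccard overlap of character trigrams, whereas a real pipeline would more plausibly compare embedding vectors. The function names and the parameters `top_k` and `min_sim` are hypothetical. The idea is the same in either case: only the most similar partners of each concept are passed on to the classifier.

```python
def char_trigrams(text):
    """Character trigrams of a lower-cased, space-padded string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard overlap of character trigrams, in [0, 1]."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    return len(ta & tb) / len(ta | tb)


def filter_candidates(concepts, top_k=2, min_sim=0.1):
    """For each concept, keep at most top_k partners with similarity >= min_sim."""
    kept = []
    for child in concepts:
        scored = [
            (similarity(child, other), other)
            for other in concepts
            if other != child
        ]
        scored = [s for s in scored if s[0] >= min_sim]
        # Highest similarity first; ties broken alphabetically for determinism.
        scored.sort(key=lambda s: (-s[0], s[1]))
        kept.extend((child, parent) for _, parent in scored[:top_k])
    return kept


# Invented example concepts.
concepts = [
    "steel",
    "stainless steel",
    "carbon steel",
    "polymer",
    "thermoplastic polymer",
]
all_pairs = len(concepts) * (len(concepts) - 1)
kept = filter_candidates(concepts)
print(f"{len(kept)} of {all_pairs} ordered pairs kept")
```

The surviving pairs are only candidates: the filter does not decide which concept is the parent, so each kept pair would still be scored by the fine-tuned classifier.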
Because the framework is modular, individual stages can also be used as standalone components.
This tool is intended for:
- ontology learning from domain corpora
- ontology learning from structured databases or knowledge repositories
- taxonomy construction and semantic enrichment
- specialised domains where lightweight fine-tuned models are preferred over large API-based systems
- industrial settings where internal knowledge sources must remain private
Within the AIMS5.0 context, the tool is relevant for domains with specialised vocabularies and evolving concept structures, where ontology engineering benefits from compact, domain-adaptable models.
Limitations:
- Performance depends on the quality of the input corpus or database.
- The quality of learned ontology elements depends on candidate extraction and data enrichment.
- Definition mining depends on the coverage and quality of the available lexical or domain resources.
- Candidate filtering introduces task- and domain-specific parameters that may require tuning.
- The current implementation is a research prototype rather than a full end-user ontology-engineering platform.
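The tuning trade-off in candidate filtering can be made concrete with a small threshold sweep: a stricter similarity threshold leaves fewer candidates to classify but may discard valid relations. The sketch below is an assumption-laden illustration. The function `sweep` is hypothetical, and the scores and gold pairs are toy values invented for the example, not measured results.

```python
def sweep(scored_pairs, gold, thresholds):
    """For each threshold, report the number of surviving candidates and gold recall."""
    rows = []
    for t in thresholds:
        kept = {pair for pair, score in scored_pairs.items() if score >= t}
        recall = len(kept & gold) / len(gold) if gold else 0.0
        rows.append((t, len(kept), recall))
    return rows


# Toy similarity scores, invented for illustration only (not measured results).
scored_pairs = {
    ("steel", "alloy"): 0.62,
    ("alloy", "material"): 0.41,
    ("steel", "material"): 0.35,
    ("polymer", "material"): 0.28,
    ("steel", "polymer"): 0.15,
    ("polymer", "alloy"): 0.12,
}
# Toy set of valid child-parent pairs.
gold = {("steel", "alloy"), ("alloy", "material"), ("polymer", "material")}

for threshold, n_kept, recall in sweep(scored_pairs, gold, [0.1, 0.3, 0.5]):
    print(f"threshold={threshold:.1f}  kept={n_kept}  gold recall={recall:.2f}")
```

Running such a sweep on a held-out set with known relations is one way to choose a threshold per task and domain.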
If you use this tool in research or development, please cite the corresponding paper.
@inproceedings{latipov2025llms4ol,
title={IRIS at LLMs4OL 2025 Tasks B, C and D: Enhancing Ontology Learning Through Data Enrichment and Type Filtering},
author={Latipov, Insan-Aleksandr and Holenderski, Mike and Meratnia, Nirvana},
booktitle={LLMs4OL 2025: The 2nd Large Language Models for Ontology Learning Challenge at the 24th ISWC},
year={2025}
}