Knowledge graphs (KGs) can be queried with LLM-based NL2SPARQL systems that translate natural language questions into SPARQL queries. The proposed tool supports semantic validation of such systems. Given a natural language question, a context (e.g. a KG schema), and a candidate SPARQL query, it predicts whether the query is semantically correct or contains a specific semantic error. The framework is based on a formal Abstract Syntax Tree (AST)-based taxonomy of SPARQL semantic errors and can be used to audit query generation systems, analyse failure modes, and support downstream correction.
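Auditing a query generation system ultimately means looking at how the judge's labels are distributed over its outputs. The following minimal sketch assumes the judge has already returned one label per candidate query. The function name `failure_profile` and the prediction list are hypothetical. Only the labels "correct" and "wrong_iri" are taken from this document; the framework's taxonomy contains further error types.

```python
from collections import Counter

def failure_profile(labels: list[str]) -> dict[str, float]:
    """Return the share of each predicted label for one NL2SPARQL system.

    `labels` holds one judge prediction per candidate query (hypothetical
    input format). The result is ordered from most to least frequent label.
    """
    if not labels:
        return {}
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.most_common()}

# Hypothetical judge outputs for five candidate queries of one system.
predictions = ["correct", "wrong_iri", "correct", "wrong_iri", "correct"]
print(failure_profile(predictions))  # {'correct': 0.6, 'wrong_iri': 0.4}
```

Comparing such profiles across systems shows which error types dominate for each one. This is the kind of failure-mode analysis the fine-grained labels are meant to support.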
As this research is still in progress, all resources will be shared after publication of the paper.
This tool is a research framework for semantic validation of NL2SPARQL systems. The user specifies a knowledge graph setting, a set of natural language questions, candidate SPARQL queries, and an LLM used as the semantic judge. The framework then predicts whether a query is semantically correct or assigns a fine-grained semantic error label.
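To make the judge's inputs and outputs concrete, here is a rough sketch of how a question, its context and a candidate query could be combined into a prompt for the LLM judge, and how its reply could be mapped back to a label. Everything here is hypothetical and is not the framework's actual template. This includes `ValidationCase`, `build_prompt`, `parse_label`, the prompt wording and the fallback value "unparsable". `LABELS` lists only the two labels named in this document.

```python
from dataclasses import dataclass

# Hypothetical label set: "correct" plus the one error type named in this
# document. The framework's taxonomy defines further fine-grained labels.
LABELS = ("correct", "wrong_iri")

@dataclass
class ValidationCase:
    question: str            # natural language question
    context: dict[str, str]  # e.g. IRI -> label mapping taken from the KG
    candidate_query: str     # SPARQL query to be judged

def build_prompt(case: ValidationCase) -> str:
    """Assemble a judge prompt (wording is an assumption, not the tool's template)."""
    context_lines = "\n".join(f"{iri}: {label}" for iri, label in case.context.items())
    return (
        "Decide whether the SPARQL query correctly answers the question.\n"
        f"Question: {case.question}\n"
        f"Context:\n{context_lines}\n"
        f"Query: {case.candidate_query}\n"
        f"Answer with exactly one label from: {', '.join(LABELS)}."
    )

def parse_label(reply: str) -> str:
    """Normalise the judge's reply to a known label, or return 'unparsable'."""
    token = reply.strip().lower().strip(".")
    return token if token in LABELS else "unparsable"
```

Constraining the judge to a closed label set and normalising its reply keeps predictions comparable across different LLMs. Replies that fall outside the set are kept distinct instead of being silently counted as errors.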
Key features:
The tool operates in the following stages:
For benchmarking, the framework also supports generation of syntactically valid candidate queries with controlled semantic errors. This makes it possible to evaluate which LLMs are effective at semantic error detection for NL2SPARQL systems.
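One way to produce a candidate with a controlled wrong_iri error is to swap a single entity IRI in a gold query while leaving every other token unchanged. The sketch below is a simplification under stated assumptions. It works on the query string, not on the AST the framework's taxonomy is defined over. It also only recognises Wikidata entity IRIs in prefixed form (`wd:Q…`). The function name `inject_wrong_iri` is hypothetical.

```python
import re

# Matches Wikidata entity IRIs in prefixed form, e.g. wd:Q5962 (not wdt:P... predicates).
ENTITY_IRI = re.compile(r"\bwd:Q\d+\b")

def inject_wrong_iri(gold_query: str, replacement_iri: str) -> str:
    """Replace the first entity IRI in `gold_query` with `replacement_iri`.

    Only that one token changes, so the result keeps the gold query's
    syntax and structure and differs from it by a single wrong IRI.
    Raises ValueError if no entity IRI is found or the replacement is identical.
    """
    match = ENTITY_IRI.search(gold_query)
    if match is None:
        raise ValueError("gold query contains no wd:Q entity IRI")
    if match.group(0) == replacement_iri:
        raise ValueError("replacement IRI equals the original IRI")
    return gold_query[: match.start()] + replacement_iri + gold_query[match.end():]

gold = "ASK WHERE { wd:Q5962 wdt:P1181 ?obj . FILTER (?obj < 1.6567788e-23) }"
print(inject_wrong_iri(gold, "wd:Q34253"))
# ASK WHERE { wd:Q34253 wdt:P1181 ?obj . FILTER (?obj < 1.6567788e-23) }
```

Applied to the gold query from the wrong IRI example, this reproduces the candidate query used there. An AST-level generator would be needed for error types that change query structure rather than a single term.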
The example below illustrates a wrong IRI semantic error.
Context: {"wd:Q34253": "Linus Torvalds", "wd:Q5962": "Boltzmann constant", ...}
Gold query: ASK WHERE { wd:Q5962 wdt:P1181 ?obj . FILTER (?obj < 1.6567788e-23) }
Candidate query: ASK WHERE { wd:Q34253 wdt:P1181 ?obj . FILTER (?obj < 1.6567788e-23) }
Label: wrong_iri

In this case, the query structure is preserved, but the entity IRI is incorrect.
The gold query refers to wd:Q5962 (Boltzmann constant), while the candidate query uses wd:Q34253 (Linus Torvalds). The framework therefore identifies the query as semantically incorrect and assigns the label wrong_iri.
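When a gold query is available, as in benchmark cases like this one, a cheap reference-based check can flag pure IRI substitutions before any LLM is called. The sketch below is a hypothetical heuristic and not the framework's LLM-based judge. It assumes both queries use the same whitespace layout. It reports wrong_iri only when every differing token is an IRI and defers all other differences to the judge. The names `iri_substitutions`, `heuristic_label` and the "needs_judge" outcome are illustrative assumptions.

```python
import re
from typing import List, Optional, Tuple

# Prefixed names (wd:Q5962, wdt:P1181) or full IRIs in angle brackets.
IRI = re.compile(r"<[^<>\s]+>|[A-Za-z][\w-]*:[\w-]+")

def iri_substitutions(gold: str, candidate: str) -> Optional[List[Tuple[str, str]]]:
    """Return (gold_iri, candidate_iri) pairs if the queries differ only in IRIs.

    Returns None when the token counts differ or any non-IRI token differs,
    i.e. when the difference is not a pure IRI substitution.
    """
    gold_tokens, cand_tokens = gold.split(), candidate.split()
    if len(gold_tokens) != len(cand_tokens):
        return None
    pairs = []
    for g, c in zip(gold_tokens, cand_tokens):
        if g == c:
            continue
        if IRI.fullmatch(g) and IRI.fullmatch(c):
            pairs.append((g, c))
        else:
            return None
    return pairs

def heuristic_label(gold: str, candidate: str) -> str:
    """Label pure IRI swaps as wrong_iri; defer everything else (hypothetical policy)."""
    pairs = iri_substitutions(gold, candidate)
    if pairs is None:
        return "needs_judge"  # structural or other change: leave it to the LLM judge
    return "wrong_iri" if pairs else "correct"

gold = "ASK WHERE { wd:Q5962 wdt:P1181 ?obj . FILTER (?obj < 1.6567788e-23) }"
candidate = "ASK WHERE { wd:Q34253 wdt:P1181 ?obj . FILTER (?obj < 1.6567788e-23) }"
print(iri_substitutions(gold, candidate))  # [('wd:Q5962', 'wd:Q34253')]
print(heuristic_label(gold, candidate))    # wrong_iri
```

A check like this only applies when a reference query exists and the surface forms align. Queries that are semantically equivalent but written differently would be deferred to the judge rather than mislabelled.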
This tool is intended for:
Within the AIMS5.0 context, the tool is relevant for applications that use natural language interfaces to knowledge graphs and require reliable, diagnosable query generation.
As this research is still in progress, a link will be provided once the paper is published.