Logical consistency in large language models

PhD offer

Application deadline

12-04-2026

Contract start date

01-10-2026

Thesis supervisor

AMBLARD Maxime

Supervision

This PhD offer is provided by the ENACT AI Cluster and its partners. Find all ENACT PhD offers and actions at https://cluster-ia-enact.ai/. The PhD offer will be located in the Sémagramme team of the LORIA.

The Lorraine Research Laboratory in Computer Science and its Applications (LORIA) is a joint research unit affiliated with the University of Lorraine, CNRS, and Inria. It is one of the largest computer science research laboratories in France, with internationally recognized expertise in fields such as artificial intelligence, natural language processing, computer vision, data science, software engineering, and cybersecurity. LORIA conducts both fundamental and applied research, and is actively involved in national and international scientific collaborations. It provides a stimulating interdisciplinary environment for doctoral research, combining strong theoretical foundations with real-world applications.

The Sémagramme team at LORIA focuses on research in natural language processing (NLP) and computational linguistics, with a particular emphasis on the semantic representation and interpretation of texts. Its main research topics include lexical and distributional semantics, semantic parsing, and the exploitation of large-scale textual data. The team develops models and methods for analyzing meaning in language, addressing tasks such as information extraction, text classification, and semantic similarity. Sémagramme combines linguistic knowledge with formal and statistical approaches, and contributes to both theoretical advances in semantics and practical NLP applications.

The PhD candidate will be integrated into a dynamic research team composed of permanent researchers and other doctoral students, providing a stimulating and collaborative scientific environment. The candidate will be enrolled in the IAEM Doctoral School, where they will be required to follow dedicated training courses aimed at developing both scientific and transferable skills. They will actively participate in the scientific life of the team through regular meetings and seminars, and will contribute to the broader research community by submitting scientific papers and presenting their work at national and international conferences. In addition, the candidate will be expected to undertake teaching activities for students in natural language processing and computer science, contributing to academic training and knowledge dissemination.

Contract type

Plan Investissement d'Avenir (Idex, Labex)

Doctoral school

IAEM - INFORMATIQUE - AUTOMATIQUE - ELECTRONIQUE - ELECTROTECHNIQUE - MATHEMATIQUES

Team

Sémagramme


Specialty

Computer Science

Laboratory

LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications



Keywords

LLM, NLP, logic, semantics, reasoning, sustainability

Subject details

Recent years have witnessed rapid advances in natural language processing, with large language models (LLMs) achieving state-of-the-art performance across many tasks and being increasingly deployed in safety-critical applications. Despite their fluency and generalization capabilities, LLMs exhibit systematic consistency failures, raising concerns about their reliability, robustness, and susceptibility to misuse or unintended behaviour in high-stakes settings [Jang et al., 2022a,b; Liu et al., 2025; Novikova et al., 2025; Cheng et al., 2025]. Current evaluation paradigms largely emphasize task accuracy [Liu et al., 2025], providing limited insight into whether model behaviour is internally coherent or resilient to perturbations and attacks.

This PhD project investigates to what extent formal semantic representations can be used to evaluate, monitor, and improve LLM consistency, with a focus on logical consistency as a component of robust and secure model behaviour. By leveraging tools from formal semantics and logic, the project seeks to develop principled methods for detecting inconsistency-driven vulnerabilities and enabling verifiable and debuggable LLM systems [Toroghi et al., 2024]. The long-term objective is to contribute to the design of LLM-based systems whose behaviour can be formally characterized and constrained, thereby improving their safety and trustworthiness across languages and deployment settings.

LLM consistency is defined by Jang et al. [2022a] as a model's ability to make coherent, non-contradictory decisions. They distinguish semantic, logical, and factual consistency, with logical consistency including negational, symmetric, transitive, and additive properties. While semantic and factual consistency have been widely studied, logical consistency remains comparatively underexplored from a formal perspective.
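As a purely illustrative sketch (not part of the offer), the negational, symmetric, and transitive properties above can be phrased as simple checks over a model's yes/no answers. The answer strings below stand in for the outputs of a hypothetical LLM query step; no real model is involved.

```python
# Illustrative checks for three of the logical consistency properties
# distinguished by Jang et al. [2022a], over hypothetical yes/no answers.

def negationally_consistent(ans_p: str, ans_not_p: str) -> bool:
    """Consistent iff a statement and its negation get opposite answers."""
    return {ans_p, ans_not_p} == {"yes", "no"}

def symmetrically_consistent(ans_ab: str, ans_ba: str) -> bool:
    """Consistent iff a symmetric relation (e.g. 'X is similar to Y')
    receives the same answer in both argument orders."""
    return ans_ab == ans_ba

def transitively_consistent(ans_ab: str, ans_bc: str, ans_ac: str) -> bool:
    """Consistent iff 'yes' to A->B and B->C is not paired with 'no' to A->C."""
    return not (ans_ab == "yes" and ans_bc == "yes" and ans_ac == "no")

# Hypothetical model answers:
print(negationally_consistent("yes", "yes"))        # False: contradiction
print(transitively_consistent("yes", "yes", "no"))  # False: transitivity violated
```

Batteries of such property checks, applied to paired model queries, are the basic building block behind consistency benchmarks such as BECEL [Jang et al., 2022a].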
Existing evaluation methods mainly rely on benchmark-based logical attacks [Nakamura et al., 2023] or on verifying intermediate reasoning artifacts translated into logical forms and checked with external solvers [Cheng et al., 2025]. These approaches present several limitations: they are task-specific and heuristic, provide no formal guarantees beyond the evaluated benchmarks, and mainly address short text segments. Long-text and discourse-level reasoning therefore remains largely understudied [Jang and Lukasiewicz, 2023], despite the fact that many real-world failures arise from document-level inconsistencies. Moreover, current work focuses almost exclusively on English, leaving open questions about typological diversity and low-resource settings.

This project addresses these limitations through formal semantic and symbolic approaches to monitoring, evaluation, and mitigation. A central objective is to exploit the YARN semantic formalism [Pavlova, 2025] as a structured intermediate representation of meaning and to investigate its systematic transformation into logical representations suitable for formal verification. Research directions include: (i) defining explicit logical constraints to detect inconsistencies in model outputs, including in long-text settings; (ii) exploring symbolic verification and repair strategies, such as constrained decoding; and (iii) investigating richer logical formalisms and multilingual extensions to assess robustness across languages and under uncertainty.

This project adopts the view that formal semantic approaches support precise and compact meaning representations that can be implemented in smaller, specialized models rather than relying on large-scale architectures. Accordingly, part of the research will focus on designing AI systems with high semantic quality and logical reliability while limiting dependence on extreme model scaling. Environmental impact will be considered as an evaluation criterion alongside accuracy and robustness, taking into account computational cost and energy consumption in line with sustainable and green AI principles.
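A minimal sketch of the solver-based verification idea described above, under the assumption that statements extracted from a document have already been translated into propositional formulas. A brute-force satisfiability check stands in here for an external SMT solver such as Z3; the formulas and variable names are invented for illustration.

```python
from itertools import product

def satisfiable(formulas, variables):
    """Return True iff some truth assignment satisfies every formula.
    Each formula is a predicate over a dict mapping variable names to bools."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(f(env) for f in formulas):
            return True
    return False

# Two statements extracted from a hypothetical model output:
# "The contract starts in October" (p) and
# "The contract does not start in October" (not p).
formulas = [lambda e: e["p"], lambda e: not e["p"]]
print(satisfiable(formulas, ["p"]))  # False: a document-level contradiction
```

An unsatisfiable set flags the output as logically inconsistent; in the solver-backed approaches cited above, the same check is delegated to a SAT/SMT solver, which also scales beyond the toy exhaustive search used here.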


Candidate profile

This PhD topic requires a skillset combining NLP and machine learning expertise with formal semantic and logical reasoning. Key competencies include working with semantic representations, designing and evaluating logic-based constraints, analyzing LLM behavior and failure modes, and implementing experimental pipelines that integrate symbolic analysis with neural models. Strong analytical skills, research autonomy, and experience with multilingual or cross-lingual data are essential for addressing robustness and consistency in realistic settings.

Technical skills: Python, PyTorch/TensorFlow, logical formalisms and solvers (e.g., Prolog, Z3, SAT/SMT solvers), text processing and annotation, working with large corpora, reproducible research (Git, experiment tracking).
Sustainability: Green AI / sustainable AI principles, measuring computational cost and energy use, responsible AI / AI safety, interpretability and explainability.
Research skills: scientific writing, experimental design, critical reading of the literature, statistical analysis, conference presentations.

References

Fengxiang Cheng, Haoxuan Li, Fenrong Liu, Robert van Rooij, Kun Zhang, and Zhouchen Lin. Empowering LLMs with Logical Reasoning: A Comprehensive Survey. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2025), volume 2, pages 10400–10408, September 2025. doi: 10.24963/ijcai.2025/1155. URL https://www.ijcai.org/proceedings/2025/1155. ISSN: 1045-0823.

Myeongjun Jang and Thomas Lukasiewicz. Consistency Analysis of ChatGPT. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15970–15985, Singapore, December 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.emnlp-main.991. URL https://aclanthology.org/2023.emnlp-main.991/.

Myeongjun Jang, Deuk Sin Kwon, and Thomas Lukasiewicz. BECEL: Benchmark for Consistency Evaluation of Language Models. In Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, and Seung-Hoon Na, editors, Proceedings of the 29th International Conference on Computational Linguistics, pages 3680–3696, Gyeongju, Republic of Korea, October 2022a. International Committee on Computational Linguistics. URL https://aclanthology.org/2022.coling-1.324/.

Myeongjun Jang, Frank Mtumbuka, and Thomas Lukasiewicz. Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text Correspondence. In Marine Carpuat, Marie-Catherine de Marneffe, and Ivan Vladimir Meza Ruiz, editors, Findings of the Association for Computational Linguistics: NAACL 2022, pages 2030–2042, Seattle, United States, July 2022b. Association for Computational Linguistics. doi: 10.18653/v1/2022.findings-naacl.156. URL https://aclanthology.org/2022.findings-naacl.156/.

Junnan Liu, Hongwei Liu, Linchen Xiao, Ziyi Wang, Kuikun Liu, Songyang Gao, Wenwei Zhang, Songyang Zhang, and Kai Chen. Are Your LLMs Capable of Stable Reasoning? In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, Findings of the Association for Computational Linguistics: ACL 2025, pages 17594–17632, Vienna, Austria, July 2025. Association for Computational Linguistics. ISBN 979-8-89176-256-5. doi: 10.18653/v1/2025.findings-acl.905. URL https://aclanthology.org/2025.findings-acl.905/.

Mutsumi Nakamura, Santosh Mashetty, Mihir Parmar, Neeraj Varshney, and Chitta Baral. LogicAttack: Adversarial Attacks for Evaluating Logical Consistency of Natural Language Inference. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13322–13334, Singapore, December 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.findings-emnlp.889. URL https://aclanthology.org/2023.findings-emnlp.889/.

Jekaterina Novikova, Carol Anderson, Borhane Blili-Hamelin, Domenic Rosati, and Subhabrata Majumdar. Consistency in Language Models: Current Landscape, Challenges, and Future Directions, July 2025. URL http://arxiv.org/abs/2505.00268. arXiv:2505.00268 [cs].

Siyana Pavlova. Toward Scalable Semantic Annotation: Bridging Readability and a Wide Range of Phenomena into a Layered Meaning Representation. PhD thesis, Université de Lorraine, 2025.

Armin Toroghi, Willis Guo, Ali Pesaranghader, and Scott Sanner. Verifiable, Debuggable, and Repairable Commonsense Logical Reasoning via LLM-based Theory Resolution. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 6634–6652, Miami, Florida, USA, November 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.emnlp-main.379. URL https://aclanthology.org/2024.emnlp-main.379/.