Trustworthy agentic control for digital twin-based wireless networks under uncertainty and reality-model gaps

PhD thesis offer

Application deadline

19-04-2026

Contract start date

01-10-2026

Thesis supervisor

MAIMOUR Moufida

Supervision

The thesis will be carried out within the MPSI department of CRAN. It is a co-supervised thesis, with a thesis monitoring committee.

Contract type

Higher education

Doctoral school

IAEM - INFORMATIQUE - AUTOMATIQUE - ELECTRONIQUE - ELECTROTECHNIQUE - MATHEMATIQUES

Team

MPSI - Modélisation, Pilotage, Sûreté des Systèmes Industriels

Context

Emerging network architectures (O-RAN, Edge) integrate AI mechanisms and digital twins to enable autonomous control of wireless infrastructures. However, the decisions of autonomous agents often rely on imperfect models of the real system. This thesis studies how to guarantee reliable decisions in these dynamic and uncertain environments.

Specialty

Automatic control, signal and image processing, computer engineering

Laboratory

CRAN - Centre de Recherche en Automatique de Nancy

Mots clés

IA agentique, IA fiable, Jumeau numérique, Décision autonome, Apprentissage par renforcement sûr, Systèmes multi-agents

Détail de l'offre

Un jumeau numérique (JN) fournit une représentation virtuelle d'un système physique et est largement utilisé pour la supervision et l'aide à la décision. Les JNs de réseaux sans fil ont récemment été formalisés [2] et sont considérés comme des briques essentielles du contrôle autonome des réseaux [3]. Du point de vue IA, un JN comprend deux couches. La première repose sur des modèles descriptifs et prédictifs permettant d'estimer l'état du système. Nos travaux ont montré la difficulté de concevoir de tels JN et de maintenir une fidélité entre le virtuel et le réel dans l'IIoT [4]. La seconde couche concerne les mécanismes de décision construits au-dessus du JN : des politiques d'action y sont élaborées afin de contrôler, en boucle fermée, le comportement du système physique. Le JN devient alors un élément actif du cycle perception-décision-action. Les approches de modélisation et de simulation multi-agents permettent de représenter ces dynamiques complexes [5].
Le contrôle autonome s'inscrit naturellement dans des abstractions fondées sur les agents, depuis les cadres théoriques initiaux [6] jusqu'aux approches actuelles d'IA agentique [7]. Dans les architectures O-RAN et Edge, ces paradigmes contribuent à améliorer l'allocation distribuée des ressources [8,9], le plus souvent via l'apprentissage par renforcement multi-agents pour optimiser les performances [10,11]. Néanmoins, la majorité des approches agentiques et basées sur l'apprentissage se concentrent surtout sur la maximisation des performances et supposent implicitement l'existence d'une représentation fiable de l'environnement. Or, les modèles internes - appris, simulés ou fondés sur des JN - peuvent s'écarter du comportement réel du système physique, en raison d'hypothèses de modélisation, de retards d'observation, de changements environnementaux ou de dynamiques non stationnaires. Ces écarts, parfois invisibles, soulèvent des questions sur la fiabilité et la crédibilité des décisions prises par des agents autonomes. Avec l'augmentation du niveau d'autonomie, les agents doivent opérer dans des conditions d'observabilité partielle, d'incertitude et de fidélité de modèle évolutive [12]. Lorsque des agents autonomes, par exemple fondés sur l'apprentissage par renforcement fédéré [13], agissent à travers un JN imparfait, leurs actions modifient les dynamiques mêmes du système qu'ils observent. Les décisions sont alors prises à partir de représentations incomplètes ou biaisées. Un risque majeur apparaît lorsque des performances apparemment satisfaisantes coexistent avec un raisonnement interne incorrect, pouvant conduire à des phénomènes de 'dérive silencieuse', d'instabilité ou à des comportements dangereux. Contrairement aux approches visant une 'performance à tout prix', cette thèse se concentre sur la fiabilité du processus décisionnel.
La question centrale est : comment des systèmes d'IA peuvent-ils prendre des décisions fiables et sûres lorsqu'ils opèrent sur des réseaux sans fil dynamiques via un JN imparfait et évolutif ? Cette thèse vise à analyser les mécanismes de décision dans des boucles de contrôle reposant sur des JN, en considérant les réseaux sans fil comme un cas représentatif de systèmes dynamiques complexes. Les principaux objectifs sont :
- concevoir des mécanismes de contrôle garantissant un comportement stable malgré les imprécisions du JN, en intégrant la détection d'incertitude et des stratégies de décision contrainte ou de dégradation maîtrisée, inspirées de l'apprentissage par renforcement sûr [14] ;
- développer des modèles causaux adaptatifs capables de distinguer changements structurels et biais de modélisation afin de préserver la validité du raisonnement décisionnel en contexte non stationnaire ;
- étudier les limites de l'autonomie dans des réseaux de grande dimension et concevoir des mécanismes de coordination décentralisés et fédérés, conciliant autonomie locale, stabilité globale et frugalité des ressources.

Keywords

Agentic AI, Trustworthy AI, Digital Twin, Autonomous Decision-Making, Safe Reinforcement Learning, Multi-Agent Systems

Subject details

Context and Motivation

Digital Twins (DTs) support complex distributed systems [1] such as wireless communication networks and Industry 4.0 environments. A DT maintains a virtual representation of the physical system, updated through observations, and is widely used for monitoring and decision support. In wireless networks, Network DTs were recently formalized within standardization efforts [2], and are envisioned as key enablers for autonomous network control [3]. From an AI perspective, DTs comprise two complementary layers. The first relies on descriptive and predictive models to estimate the system state. Our previous work highlighted the difficulty of constructing such twins and maintaining a valid virtual-real mapping in industrial IoT [4]. The second layer concerns decision-making and control processes built on top of the DT, where action policies are derived from the virtual representation to influence the physical system behavior in a closed loop. In this setting, the DT becomes an active component of the perception-decision-action cycle. Multi-agent modeling and simulation capture these dynamics [5].

Autonomous control naturally aligns with agent-based abstractions, evolving from early theory [6] to Agentic AI [7]. In O-RAN and Edge settings, these frameworks enhance distributed resource allocation [8,9], typically leveraging Multi-Agent Reinforcement Learning (MARL) to optimize performance [10,11]. However, most existing agentic and learning-based approaches primarily emphasize performance maximization and implicitly assume a reliable representation of the environment. In practice, the internal models or representations used for decision-making, whether learned, simulated, or DT-based, may diverge from the physical system due to modeling assumptions, delayed observations, environmental changes, or non-stationary dynamics. Such divergence may not immediately affect performance, raising critical questions regarding the reliability and trustworthiness of agentic decision-making.

Research Problem

As autonomy increases, agents must operate under partial observability, uncertainty, and evolving model fidelity [12]. When autonomous agents, such as those implemented using Federated Reinforcement Learning [13], act through an imperfect DT, they influence the system dynamics they observe. Decisions are therefore taken under incomplete or biased representations. A critical risk arises where good apparent performance coexists with incorrect internal reasoning, leading to 'silent drift,' instability, or unsafe behavior. Unlike approaches that optimize for 'performance at all costs' [11], this thesis addresses the reliability and trustworthiness of the decision-making process itself. The central research question is: How can agentic AI systems make reliable and trustworthy decisions, without inducing unstable behaviors, when acting on dynamic wireless networks through an imperfect and evolving DT?

Scientific Objectives

This thesis aims to analyze agentic decision-making mechanisms in DT-based closed-loop control, using wireless networks as a representative class of complex dynamical systems. The main objectives are:
- Trustworthy Control and Stability under Reality Gaps. Design agentic control mechanisms ensuring stable and safe behavior despite DT inaccuracies, by detecting uncertainty and enabling constrained decision-making or graceful degradation, building on safe reinforcement learning principles [14].
- Adaptive Causal World Modeling. Develop adaptive causal world models that distinguish structural environmental changes from modeling biases, ensuring valid decision logic under non-stationarity.
- Scalable Decentralized Coordination. Analyze the limits of autonomy in large-scale networks and design decentralized and federated coordination mechanisms balancing local autonomy, global stability and frugality.
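
As a purely illustrative sketch of the first objective (not a method proposed by the thesis), the idea of graceful degradation can be pictured as a gate that compares the DT's predictions with observations from the real network and falls back to a conservative action when they diverge. Every name here (`DivergenceGate`, the window size, the threshold, the action labels, and the sample values) is a hypothetical assumption chosen for the example.

```python
from collections import deque
from statistics import mean


class DivergenceGate:
    """Hypothetical gate: trust the learned policy only while the twin tracks reality."""

    def __init__(self, window=5, threshold=0.2):
        # Sliding window of recent absolute prediction errors |predicted - observed|.
        self.errors = deque(maxlen=window)
        # Assumed tolerance on the mean error; a real system would calibrate this.
        self.threshold = threshold

    def update(self, predicted, observed):
        self.errors.append(abs(predicted - observed))

    def trusted(self):
        # With no evidence yet, the twin is not trusted.
        return bool(self.errors) and mean(self.errors) <= self.threshold

    def select(self, policy_action, safe_action):
        # Graceful degradation: use the conservative action when the twin drifts.
        return policy_action if self.trusted() else safe_action


gate = DivergenceGate(window=3, threshold=0.2)

# Twin predictions close to the observations: the policy action is kept.
for predicted, observed in [(0.50, 0.52), (0.55, 0.50), (0.60, 0.58)]:
    gate.update(predicted, observed)
action_ok = gate.select("raise_tx_power", "hold_configuration")

# Twin predictions drift away from the observations: fall back to the safe action.
for predicted, observed in [(0.60, 0.95), (0.62, 1.10), (0.65, 1.20)]:
    gate.update(predicted, observed)
action_drift = gate.select("raise_tx_power", "hold_configuration")
```

A fixed threshold on a moving average is only the simplest possible stand-in for uncertainty detection. Notably, it cannot separate a structural change in the environment from a modeling bias, which is the distinction the second objective targets.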

Profil du candidat

Candidat(e) titulaire (ou en dernière année) d'un Master ou diplôme d'ingénieur en informatique, réseaux, télécommunications ou domaine connexe, avec un profil orienté intelligence artificielle. Des compétences en apprentissage automatique, optimisation ou analyse de données, ainsi qu'un intérêt pour leur application aux problématiques des réseaux informatiques et des systèmes distribués sont fortement souhaités.

Candidate profile

Candidates must hold (or be in the final year of) a Master's degree or an engineering degree in computer science, networking, telecommunications, or a related field, with a profile oriented toward artificial intelligence. Strong skills in machine learning, optimization, or data analysis are highly desirable, along with an interest in applying these techniques to problems in computer networks and distributed systems.

References

[1] Sakhri A, Ahmed A, Maimour M, Kherbache M, Rondeau E, Doghmane N. A digital twin-based energy-efficient wireless multimedia sensor network for waterbirds monitoring. Future Generation Computer Systems. 2024 Jun 1;155:146-63.
[2] 3GPP TR 28.915, Study on Management Aspects of Network Digital Twin, Rel. 18, 2023.
[3] Apostolakis N, Chatzieleftheriou LE, Bega D, Gramaglia M, Banchs A. Digital twins for next-generation mobile networks: Applications and solutions. IEEE Communications Magazine. 2023 May 8;61(11):80-6.
[4] Kherbache M, Ahmed A, Maimour M, Rondeau E. Constructing a Network Digital Twin through formal modeling: Tackling the virtual–real mapping challenge in IIoT networks. Internet of Things. 2023 Dec 1;24:101000.
[5] Shakya J, Chopin M, Merghem-Boulahia L. MultiAgentNetSim: Empowering Next-Generation Network Modeling with Multi-Agent Simulation. IEEE Communications Magazine. 2024 Dec 23.
[6] Wooldridge M, Jennings NR. Intelligent agents: Theory and practice. The knowledge engineering review. 1995 Jun;10(2):115-52.
[7] Xi Z, Chen W, Guo X, He W, Ding Y, Hong B, Zhang M, Wang J, Jin S, Zhou E, Zheng R. The rise and potential of large language model based agents: A survey. Science China Information Sciences. 2025 Feb;68(2):121101.
[8] Salama A, Nezami Z, Qazzaz MM, Hafeez M, Zaidi SA. Edge agentic AI framework for autonomous network optimisation in O-RAN. In: 2025 IEEE 36th International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) 2025 Sep 1 (pp. 1-7). IEEE.
[9] O-RAN Alliance – Near-RT RAN Intelligent Controller (RIC) Architecture Specification.
[10] Zhang J, Liu Z, Zhu Y, Shi E, Xu B, Yuen C, Niyato D, Debbah M, Jin S, Ai B. Multi-agent reinforcement learning in wireless distributed networks for 6G. arXiv preprint arXiv:2502.05812. 2025 Feb 9.
[11] Toure B, Tsilimantos D, Giannakas T, Esrafilian O, Kountouris M. Multi-Objective Scheduling in Wireless Networks With Deep Reinforcement Learning. In: 2025 IEEE Wireless Communications and Networking Conference (WCNC) 2025 Mar 24 (pp. 1-6). IEEE.
[12] Kaelbling LP, Littman ML, Cassandra AR. Planning and acting in partially observable stochastic domains. Artificial intelligence. 1998 May 1;101(1-2):99-134.
[13] Ossongo E, Esseghir M, Merghem-Boulahia L. A multi-agent federated reinforcement learning-based optimization of quality of service in various LoRa network slices. Computer Communications. 2024 Jan 1;213:320-30.
[14] García J, Fernández F. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research. 2015 Aug;16(1):1437-80.