Although AI-driven networking is now a hot research topic, the concept of applying ML to decentralized traffic routing dates back to the 1990s. Decentralized Routing can be split into two groups of methods: Single-Agent and Multiagent. Both groups, however, are based on reinforcement learning, the same general learning technique that underpins the previously introduced Centralized Routing.
Single-Agent Reinforcement Learning
Boyan et al. introduced the Q-routing method in [1] to optimize packet routing control. Each router in the Q-routing algorithm adjusts its policy according to its Q-function, based on local information and communication with its neighbors. Their experiments showed that Q-routing outperformed the nonadaptive shortest-path method, especially under heavy workloads. Choi et al. introduced predictive Q-routing, a memory-based Q-learning method, in [2] to improve the learning rate and convergence speed by preserving prior experiences. In addition, Kumar et al. introduced dual reinforcement Q-routing (DRQ-routing) in [3], which leverages knowledge obtained from both backward and forward exploration to speed up convergence. Reinforcement learning (RL) was also applied effectively to wireless sensor network routing in [4, 5], with sensors and sink nodes self-adapting to the network environment. In a multiagent system, however, single-agent RL suffers from serious convergence problems. Using multiagent RL to increase cooperation among network nodes is more realistic, and a number of studies on ML-driven routing have been based on multiagent RL.
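To make the single-agent idea concrete, here is a minimal sketch of a Q-routing-style update in the spirit of [1]. The toy four-node topology, the learning rate, and the delay values are illustrative assumptions of mine, not taken from the paper:

```python
# Minimal Q-routing sketch on an assumed toy 4-node ring.
# Q[x][d][y] estimates the delivery time from node x to destination d
# when forwarding via neighbor y.

neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
Q = {x: {d: {y: 1.0 for y in neighbors[x]} for d in neighbors} for x in neighbors}
ETA = 0.5  # learning rate (illustrative value)

def q_routing_update(x, d, y, queue_delay, transmission_delay):
    """After x sends a packet bound for d to neighbor y, y reports its best
    remaining estimate; x nudges Q[x][d][y] toward the observed total."""
    remaining = 0.0 if y == d else min(Q[y][d].values())
    target = queue_delay + transmission_delay + remaining
    Q[x][d][y] += ETA * (target - Q[x][d][y])
    return Q[x][d][y]

def next_hop(x, d):
    """Greedy policy: forward to the neighbor with the lowest estimate."""
    return min(Q[x][d], key=Q[x][d].get)

# One learning step: node 0 sends a packet for node 3 via node 1 and
# observes a 0.2s queueing delay and a 1.0s transmission delay.
updated = q_routing_update(0, 3, 1, queue_delay=0.2, transmission_delay=1.0)
```

After this single update, the estimate via node 1 rises above the untouched estimate via node 2, so the greedy policy at node 0 now prefers node 2. Repeated local updates of this kind are what let Q-routing route around congested links under heavy load.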
In this part, I also introduce the most up-to-date Q-Probabilistic Routing (Q-PR) algorithm, described in [5], as follows. For more details on the notation, please refer to [5].
The Q-PR algorithm identifies the next hop on the actual route (a) at the level of individual network packets (not in a previous setup phase), (b) at the moment a message is sent (an opportunistic approach), (c) using location-based positioning along the route, and (d) by letting the nodes that receive a transmission request make smart relay decisions, based on both local information and the importance of the packets. The Q-PR algorithm is summarized as follows.
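To illustrate the receiver-side, opportunistic decision in points (b) and (d), here is a minimal sketch. The scoring rule, the threshold, and all names below are my own assumptions for illustration, not the Q-PR paper's exact formulas:

```python
# Illustrative receiver-side relay decision in the spirit of Q-PR.
# Scoring rule, weights, and threshold are assumed, not from the paper.

def relay_score(q_value, delivery_prob, energy_cost):
    """Expected usefulness of a candidate next hop: reward the learned
    Q-value weighted by delivery probability, penalize energy cost."""
    return delivery_prob * q_value - energy_cost

def choose_next_hop(candidates, importance, threshold=0.0):
    """candidates: (node, q_value, delivery_prob, energy_cost) tuples for
    the nodes that overheard the transmission. Relay via the best-scoring
    node, but only if the score, scaled by the message's importance,
    clears a threshold; otherwise hold the message."""
    name, q, p, cost = max(candidates, key=lambda c: relay_score(*c[1:]))
    if importance * relay_score(q, p, cost) > threshold:
        return name
    return None  # no candidate is worth the transmission cost

candidates = [("a", 0.8, 0.90, 0.10), ("b", 0.6, 0.95, 0.05)]
hop = choose_next_hop(candidates, importance=1.0)
```

The key design point this sketch captures is that the forwarding decision happens at the receivers, after the transmission, and that unimportant messages can be held rather than relayed when no candidate justifies the energy cost.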
Multiagent Reinforcement Learning
Stone et al. introduced the Team-Partitioned Opaque-Transition RL (TPOT-RL) routing algorithm in [6], [7], which allows a group of network nodes working toward a common objective to learn how to accomplish a collaborative task. Wolpert et al. created the Collective Intelligence (COIN) algorithm in [8], a sparse reinforcement learning method in which a global function is used to shape the behavior of each network agent. In contrast, the authors of [9] presented a routing algorithm based on Collaborative RL (CRL) that does not rely on a single global state. In [10], the CRL method was successfully applied to delay-tolerant network routing.
Collaborative Reinforcement Learning [9] is one of the most successful methods in the Multiagent group, and its algorithm is described as follows. In a multiagent system, the CRL technique can be used to address system optimization problems, which are discretized into a collection of discrete optimization problems (DOPs) and modeled as absorbing MDPs.
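As a rough illustration of how a CRL node might combine neighbors' advertised costs-to-delivery (the absorbing state) with a staleness decay, here is a hedged sketch; the decay model and all names are my assumptions, not the exact CRL update rule:

```python
# Sketch of a CRL-style local estimate: each node trusts its neighbors'
# advertised estimates less as they age, so unrefreshed routes look
# progressively worse. Decay model and names are assumed.

def crl_estimate(link_costs, advertised, age, decay=0.9):
    """link_costs: neighbor -> cost of the one-hop link.
    advertised: neighbor -> that neighbor's last advertised estimate of
    its cost to reach the absorbing 'delivered' state.
    age: neighbor -> how many periods ago the advertisement was heard.
    Returns the cheapest one-step option after inflating stale info."""
    best = float("inf")
    for n, cost in link_costs.items():
        estimate = cost + advertised[n] / (decay ** age[n])
        best = min(best, estimate)
    return best

# Neighbor "a" is cheap to reach but its advertisement is two periods old;
# neighbor "b" just advertised, so its estimate is taken at face value.
est = crl_estimate({"a": 1.0, "b": 2.0}, {"a": 3.0, "b": 1.0}, {"a": 2, "b": 0})
```

The decay term is what makes the scheme suitable for dynamic and delay-tolerant networks: a node whose advertisements stop arriving is gradually routed around without any global coordination.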
What's next
In the next part of this series, I will introduce a three-layer logical functionality architecture for AI-driven networking. This is the most up-to-date group of methods, combining Centralized and Decentralized Routing to maximize transmission performance.
[1] J. A. Boyan and M. L. Littman, "Packet routing in dynamically changing networks: A reinforcement learning approach," in Proc. Int. Conf. Neural Information Processing Systems, Denver, 1993, pp. 671–678.
[2] S. P. Choi and D.-Y. Yeung, "Predictive Q-routing: A memory-based reinforcement learning approach to adaptive traffic control," in Advances in Neural Information Processing Systems, Denver, Dec. 2–5, 1996, pp. 945–951.
[3] S. Kumar and R. Miikkulainen, "Dual reinforcement Q-routing: An on-line adaptive routing algorithm," in Proc. Artificial Neural Networks in Engineering Conf., 1997, pp. 231–238.
[4] A. A. Bhorkar, M. Naghshvar, T. Javidi, and B. D. Rao, "Adaptive opportunistic routing for wireless ad hoc networks," IEEE Trans. Netw., vol. 20, no. 1, pp. 243–256, Feb. 2012.
[5] R. Arroyo-Valles, R. Alaiz-Rodriguez, A. Guerrero-Curieses, and J. Cid-Sueiro, "Q-probabilistic routing in wireless sensor networks," in Proc. Int. Conf. Intelligent Sensors, Sensor Networks and Information, Melbourne, Dec. 3–6, 2007, pp. 1–6.
[6] P. Stone, "TPOT-RL applied to network routing," in Proc. Int. Conf. Machine Learning, California, June 29–July 2, 2000, pp. 935–942.
[7] P. Stone and M. Veloso, "Team-partitioned, opaque-transition reinforcement learning," in Robot Soccer World Cup, Springer, 1998, pp. 261–272.
[8] D. Wolpert, K. Tumer, and J. Frank, "Using collective intelligence to route internet traffic," in Advances in Neural Information Processing Systems, Denver, Nov. 29–Dec. 4, 1999, pp. 952–960.
[9] J. Dowling, E. Curran, R. Cunningham, and V. Cahill, "Using feedback in collaborative reinforcement learning to adaptively optimize MANET routing," IEEE Trans. Syst., Man, Cybern., vol. 35, no. 3, pp. 360–372, May 2005.
[10] A. Elwhishi, P. H. Ho, K. Naik, and B. Shihada, "ARBR: Adaptive reinforcement based routing for DTN," in Proc. IEEE Int. Conf. Wireless and Mobile Computing, Networking and Communications, Ontario, Oct. 10–13, 2010, pp. 376–385.