AI-driven routing (part 5): Challenges and Open Issues

A series on AI-based smart routing.
type: insight · level: advance · guides: smart_routing

In the previous parts, we discussed hybrid smart routing and its effectiveness. Although AI/ML-driven network control is a promising paradigm for future networks, many problems remain and much more work is required. In this part, I'll go through the key challenges and open issues surrounding AI/ML-powered networking, which fall into three areas:

  • New Hardware Architectures
  • Promoting Machine Learning Algorithms
  • Advanced Software Systems


New Hardware Architectures

Historically, every major advance in upper-level services has relied on substantial improvements in the underlying hardware: the CPU for general-purpose computation, the digital signal processor (DSP) for communication systems, and the GPU for image processing. In the same way, a dedicated AI networking processor is urgently needed to meet the demands of the AI-driven networking age.

Every millisecond, today's networks generate a vast number of flows of many different types. Running AI systems over such large, fast-moving data sets is difficult, and current routers lack the computing power needed to run AI/ML models. GPUs and Tensor Processing Units (TPUs), as highly parallel accelerators, have recently become a cornerstone of the AI age, and some studies report that GPUs can improve packet-processing throughput. However, future networks must process huge volumes of data at high speed (more than 10 Gb/s) under strict reaction-delay requirements (less than 1 ms), so there is still a significant gap between general-purpose AI processor chips and what real networking deployments need. See Figures 1 to 3 for more details on the CPU, GPU, TPU, and DSP.


Figure 1: Differences between CPU and GPU (Source).


Figure 2: TPU Architecture (Source).


Figure 3: DSP Architecture (Source).
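To put the 10 Gb/s and 1 ms figures in perspective, the short Python sketch that follows works out the per-packet time budget at line rate. It assumes standard Ethernet framing with minimum-size 64-byte frames plus 20 bytes of preamble and inter-frame gap on the wire, which is the worst case for packet rate. These framing numbers are my addition rather than something stated above, and `batch_fill_time_us` is a hypothetical helper modeling the batching that GPUs typically rely on.

```python
"""Back-of-the-envelope time budget for packet processing at line rate.

Assumption (not from the article): standard Ethernet framing, where a
minimum-size 64-byte frame also occupies 8 bytes of preamble/SFD and a
12-byte inter-frame gap on the wire, i.e. 84 bytes per packet in total.
"""

MIN_FRAME_BYTES = 64
WIRE_OVERHEAD_BYTES = 8 + 12  # preamble + start-of-frame delimiter, inter-frame gap


def packets_per_second(link_bps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    # Worst case for the packet rate: every packet is minimum size.
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def per_packet_budget_ns(link_bps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    # Gap between back-to-back packets = average time available per packet.
    return 1e9 / packets_per_second(link_bps, frame_bytes)


def batch_fill_time_us(link_bps: float, batch_size: int) -> float:
    # Hypothetical helper: delay spent waiting to collect a batch for a GPU.
    return batch_size / packets_per_second(link_bps) * 1e6


if __name__ == "__main__":
    link = 10e9  # 10 Gb/s, the rate quoted in the text
    print(f"{packets_per_second(link) / 1e6:.2f} Mpps")  # 14.88 Mpps
    print(f"{per_packet_budget_ns(link):.1f} ns per packet")  # 67.2 ns per packet
    print(f"{batch_fill_time_us(link, 1024):.1f} us per 1024-packet batch")  # 68.8 us per 1024-packet batch
```

Under these assumptions, a 1024-packet batch fills in about 69 µs, comfortably inside a 1 ms reaction budget, yet the processor must still sustain roughly 67 ns per packet on average (about 15 million packets per second). That sustained-throughput requirement is one way to see the gap described above.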

Promoting Machine Learning Algorithms

Although many ML algorithms have been developed, contemporary ML methods are largely driven by the demands of existing applications such as computer vision (CV) and natural language processing (NLP). Convolutional neural networks, for example, are powerful tools for image and audio recognition that can reach superhuman performance on a variety of tasks. The networking domain, however, calls for theoretical and mathematical models quite different from those used in CV and NLP, so convolutional or recurrent layers may not perform well there. Networks also produce far more data and impose strict reaction-time requirements, which poses significant hurdles for deploying ML. Because of these demanding requirements and the particular characteristics of networking, existing algorithms will need to be adapted and new ones invented. Meeting the demands of networking as a new application area for ML should, in turn, push both ML and networking forward.

Advanced Software Systems

Network data handling currently faces typical big data difficulties: recent years have seen a threefold rise in total IP traffic and a more than 60% increase in the number of installed devices and in the amount of telemetry data transmitted in near real time. Meanwhile, the geo-distributed nature of networks makes deploying network data analytics systems at scale even more challenging. Open problems include how to combine log data, metric data, and network telemetry data; how to scale up to ingest millions of flows every millisecond; and how to share information efficiently across distant network nodes. Current end-to-end solutions are highly complex and time-consuming to build, combining various technologies such as Apache Spark and Hadoop MapReduce. A robust, scalable big data analytics platform for networks and network services is therefore required.
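The first of those problems, combining log, metric, and telemetry data, can be illustrated with a small in-memory Python sketch. It assumes a hypothetical record format in which every source yields dicts carrying a `node` name and a millisecond timestamp `ts`, and it groups everything into per-node, fixed-size time windows. At the scale described above, this logic would run on a distributed engine such as Spark; the sketch only shows the keying and merge step.

```python
"""Sketch: merging heterogeneous network data into per-node time windows.

Hypothetical record formats (assumptions, not a real schema):
  logs:      {"node", "ts", "msg"}
  metrics:   {"node", "ts", "cpu"}          # CPU utilization, 0..1
  telemetry: {"node", "ts", "flow_id", "bytes"}
Timestamps "ts" are in milliseconds.
"""
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Key = Tuple[str, int]  # (node name, window index)


def window_key(record: dict, window_ms: int) -> Key:
    # Bucket a record by the node it came from and a fixed-size time window.
    return record["node"], record["ts"] // window_ms


def empty_window() -> dict:
    return {"log_events": [], "cpu_max": 0.0, "bytes": 0, "flows": set()}


def merge_sources(logs: Iterable[dict], metrics: Iterable[dict],
                  telemetry: Iterable[dict], window_ms: int = 1000) -> Dict[Key, dict]:
    merged: Dict[Key, dict] = defaultdict(empty_window)
    for rec in logs:  # keep raw log messages for later inspection
        merged[window_key(rec, window_ms)]["log_events"].append(rec["msg"])
    for rec in metrics:  # summarize metrics as the peak value in the window
        entry = merged[window_key(rec, window_ms)]
        entry["cpu_max"] = max(entry["cpu_max"], rec["cpu"])
    for rec in telemetry:  # aggregate flow telemetry: total bytes and distinct flows
        entry = merged[window_key(rec, window_ms)]
        entry["bytes"] += rec["bytes"]
        entry["flows"].add(rec["flow_id"])
    return dict(merged)


if __name__ == "__main__":
    logs = [{"node": "r1", "ts": 120, "msg": "interface eth0 down"}]
    metrics = [{"node": "r1", "ts": 450, "cpu": 0.72},
               {"node": "r1", "ts": 900, "cpu": 0.81}]
    telemetry = [{"node": "r1", "ts": 300, "flow_id": "f1", "bytes": 1500},
                 {"node": "r1", "ts": 700, "flow_id": "f2", "bytes": 9000},
                 {"node": "r2", "ts": 1200, "flow_id": "f1", "bytes": 500}]
    for key, window in sorted(merge_sources(logs, metrics, telemetry).items()):
        print(key, window)
```

Keying every source on the same (node, window) pair is what lets otherwise unrelated data sets line up; the per-source summaries chosen here (peak CPU, total bytes, distinct flows) are illustrative choices, not a recommendation.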

Software libraries for ML networking tasks are also a critical enabler for AI-based networking. ML frameworks provide high-level programming interfaces for developing, training, and evaluating ML algorithms. Current frameworks such as TensorFlow, Caffe, and Theano are built for general-purpose applications and impose undue overhead in the networking domain. They must be further tuned to meet the demands of networking applications: fast processing, low complexity, and light weight. See Figure 4 for a comparison of different ML frameworks across several criteria.


Figure 4: Comparison of ML frameworks across different criteria (Source).
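To make "low complexity and light weight" more concrete, here is a hypothetical framework-free Python example: a tiny two-layer model that scores candidate next hops from three per-link features (utilization, queueing delay in ms, loss rate). The features and the hand-picked weights are invented for illustration and are not trained. The point is that inference is a fixed, small number of multiply-accumulate operations (MACs) with no framework runtime, so its cost is known in advance.

```python
"""Sketch: framework-free inference with a tiny, fixed-size model.

Hypothetical example: score candidate next hops from three per-link
features [utilization (0..1), queueing delay (ms), loss rate (0..1)].
The weights below are hand-picked for illustration, not trained.
"""
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def dense(w: Matrix, x: Vector, b: Vector) -> Vector:
    # One fully connected layer, y = W x + b, as plain multiply-adds.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Vector) -> Vector:
    return [max(0.0, t) for t in v]


def mlp_score(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> float:
    # Two-layer MLP with a single output: higher score = more attractive hop.
    return dense(w2, relu(dense(w1, x, b1)), b2)[0]


def mac_count(n_in: int, n_hidden: int, n_out: int) -> int:
    # Multiply-accumulate operations per inference; fixed by the layer sizes.
    return n_in * n_hidden + n_hidden * n_out


def pick_next_hop(candidates: Dict[str, Vector], w1, b1, w2, b2) -> str:
    # Score every candidate and return the highest-scoring next hop.
    return max(candidates, key=lambda hop: mlp_score(candidates[hop], w1, b1, w2, b2))


if __name__ == "__main__":
    # Hidden layer rescales each feature; output layer penalizes their sum.
    w1 = [[1.0, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 10.0]]
    b1 = [0.0, 0.0, 0.0]
    w2 = [[-1.0, -1.0, -1.0]]
    b2 = [1.0]
    candidates = {"A": [0.9, 2.0, 0.0], "B": [0.3, 5.0, 0.01], "C": [0.5, 1.0, 0.0]}
    print(pick_next_hop(candidates, w1, b1, w2, b2))  # C
    print(mac_count(3, 3, 1), "MACs per candidate")  # 12 MACs per candidate
```

A real model would be trained and likely larger, but keeping layer sizes small and fixed is what makes per-decision latency predictable, which is the kind of tuning the text calls for.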


Over the whole series, we first looked at two deployment models for an intelligent control plane in a network, along with the benefits and drawbacks of the centralized and distributed paradigms. Then, to satisfy the demands of various network services, we presented a hybrid ML paradigm for packet routing that combines distributed AI routers with a centralized network mind. In this paradigm, a centralized AI control plane handles tunneling-based routing, while a hybrid AI architecture handles hop-by-hop routing. Finally, we examined the effectiveness of smart routing and the open challenges in this field. We hope the series provides useful information for readers interested in AI-based routing.
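As a rough illustration of the division of labor in that hybrid paradigm, the Python sketch that follows routes some flows over a centrally computed tunnel and others hop by hop. Everything in it is a hypothetical stand-in rather than the series' actual design: the controller uses Dijkstra's shortest-path algorithm over the full topology, each router picks its next hop distance-vector style from its own link costs plus per-neighbor cost-to-destination hints, and the `tunnel_classes` split is an assumption. In an AI-driven deployment, learned models would replace both heuristics.

```python
"""Sketch of the hybrid split: central tunnels vs. local hop-by-hop choices.

Hypothetical stand-ins, not the series' actual design:
- the central "network mind" sees the whole topology and pins a tunnel path
  with Dijkstra's shortest-path algorithm;
- each router only sees its own links plus per-neighbor cost-to-destination
  hints, and picks a next hop distance-vector style.
"""
import heapq
from typing import Dict, FrozenSet, List, Optional

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: link cost}


def central_tunnel(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    # Controller side: global view, computes the whole end-to-end path at once.
    dist: Dict[str, float] = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def local_next_hop(graph: Graph, node: str, hints: Dict[str, float]) -> str:
    # Router side: local link cost plus the neighbor's cost-to-destination hint.
    return min(graph[node], key=lambda nbr: graph[node][nbr] + hints[nbr])


def hop_by_hop(graph: Graph, src: str, dst: str, hints: Dict[str, float],
               max_hops: int = 16) -> List[str]:
    path = [src]
    while path[-1] != dst:
        if len(path) > max_hops:
            raise RuntimeError("hop limit exceeded: inconsistent hints may cause a loop")
        path.append(local_next_hop(graph, path[-1], hints))
    return path


def route(flow_class: str, graph: Graph, src: str, dst: str, hints: Dict[str, float],
          tunnel_classes: FrozenSet[str] = frozenset({"latency_critical"})) -> Optional[List[str]]:
    # Hypothetical policy: some flow classes get a central tunnel, the rest go hop by hop.
    if flow_class in tunnel_classes:
        return central_tunnel(graph, src, dst)
    return hop_by_hop(graph, src, dst, hints)


if __name__ == "__main__":
    graph = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 1, "D": 5},
             "C": {"A": 4, "B": 1, "D": 1}, "D": {"B": 5, "C": 1}}
    hints_to_d = {"A": 3, "B": 2, "C": 1, "D": 0}  # hypothetical cost-to-D hints
    print(route("latency_critical", graph, "A", "D", hints_to_d))  # ['A', 'B', 'C', 'D']
    print(route("best_effort", graph, "A", "D", hints_to_d))  # ['A', 'B', 'C', 'D']
```

Both calls return the same path here only because the hints happen to match the true shortest-path costs. If hints go stale, the hop-by-hop path can diverge from the optimum or loop, which hints at why a centralized view remains useful alongside distributed routers.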