Decentralized Federated Learning and Research Directions (Part 2).

type: insight | level: advanced | guides: federated_learning

Federated Learning (FL) has become a powerful technique in deep learning. Because it can train deep learning models across a large number of devices (cross-device) or organizations (cross-silo), extract knowledge from distributed data, preserve user privacy and sensitive data, improve learning efficiency, and save communication costs, FL has been widely studied and applied to various applications. Centralized Federated Learning (CFL) is the most common approach in research and application, requiring only a server and a connection between the server and each client. However, CFL carries high risk and low trust, since the server is both the most likely bottleneck and the most attractive target for attacks. Decentralized Federated Learning (DFL) overcomes these limitations by removing the server from the configuration and letting clients connect directly to each other. We therefore aim to investigate the potential of DFL in research and application, and to identify directions for research, development, and optimization on this topic.

In the previous part, we investigated the aggregation method direction. In this part, we will explore the topology design direction.

Topology design

When researchers design an algorithm (e.g., an aggregation method), they usually benchmark it on a fixed network configuration, such as a fully connected network, a partially connected network, or node clustering. Each network has its own advantages and disadvantages, as shown in Figure 7.

Figure 7: Some network topologies. Image is taken from this paper.

A robust, fully connected network greatly increases the communication cost, while a partially connected network is less trustworthy and introduces delays in transmitting parameters. Designing a robust network topology is therefore a challenging problem.

Throughput-Optimal Topology Design for Cross-Silo Federated Learning: The authors find a topology with the largest throughput, or with provable throughput guarantees, using the theory of max-plus linear systems to compute the system throughput (the number of communication rounds per time unit). They define the $M_{CT}$ problem: given a connectivity graph $\mathcal{G}_c$, find an overlay (subgraph) $\mathcal{G}_o$ that is a strongly connected directed graph with minimal cycle time.
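
As a rough illustration of the objective, the sketch below scores a candidate overlay by its maximum mean cycle delay; in a max-plus linear system the asymptotic cycle time is governed by such a maximum cycle mean. The function name, the `delay` edge attribute, and the brute-force cycle enumeration (only feasible for a handful of silos) are illustrative assumptions, not the paper's algorithm.

```python
import networkx as nx

def approx_cycle_time(overlay: nx.DiGraph, delay: str = "delay") -> float:
    """Score a candidate overlay by its maximum mean cycle delay (illustrative
    proxy for the cycle time; smaller is better, throughput ~ 1 / cycle time)."""
    worst = 0.0
    for cycle in nx.simple_cycles(overlay):
        # Close the cycle: edges (v0, v1), ..., (v_{k-1}, v0).
        edges = list(zip(cycle, cycle[1:] + cycle[:1]))
        mean_delay = sum(overlay[u][v][delay] for u, v in edges) / len(edges)
        worst = max(worst, mean_delay)
    return worst
```

Among strongly connected candidate overlays, one would then prefer the overlay with the smallest value returned here.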

When $\mathcal{G}_c$ is an undirected graph, the authors suggest using the links in $\mathcal{G}_c$ with the smallest average delays. This leads to a minimum-weight spanning tree of an undirected graph, which can be found with Prim's algorithm. When $\mathcal{G}_c$ is directed, the problem is NP-hard. However, when $\mathcal{G}_c$ is a complete Euclidean edge-capacitated graph (the delay is symmetric and satisfies the triangle inequality), Christofides' algorithm provides an approximation algorithm for $M_{CT}$.
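
For the undirected case, the overlay can be computed directly with an off-the-shelf minimum spanning tree routine. Below is a minimal sketch using networkx; the silo names and delay values are made up for illustration.

```python
import networkx as nx

# Undirected connectivity graph G_c with average link delays (illustrative values, in ms).
Gc = nx.Graph()
Gc.add_weighted_edges_from(
    [("A", "B", 12.0), ("A", "C", 7.5), ("B", "C", 4.0), ("B", "D", 9.0), ("C", "D", 6.0)],
    weight="delay",
)

# Overlay G_o = minimum-weight spanning tree of the delays, computed with Prim's algorithm.
Go = nx.minimum_spanning_tree(Gc, weight="delay", algorithm="prim")
print(sorted(Go.edges(data="delay")))
```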

Reducing Training Time in Cross-Silo Federated Learning using Multigraph Topology: The authors construct a multigraph from the overlay graph, then parse this multigraph into different simple graphs with isolated nodes. A node with a long delay time is chosen as an isolated node (it does not participate in aggregating parameters); after some communication rounds, the isolated node is updated and becomes a normal node again, which helps reduce training time.
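
The sketch below captures only the core scheduling idea, i.e. letting high-delay nodes sit out of aggregation for most rounds and rejoin periodically. The `round_schedule` helper, its delay threshold, and the fixed rejoin period are illustrative assumptions, not the authors' multigraph parsing procedure.

```python
def round_schedule(node_delays, num_rounds, threshold, period=3):
    """For each round, list the nodes that take part in aggregation.
    Nodes whose delay exceeds `threshold` are treated as isolated and only
    rejoin every `period`-th round (simplified sketch of the multigraph idea)."""
    schedule = []
    for r in range(num_rounds):
        active = [n for n, d in node_delays.items() if d <= threshold or r % period == 0]
        schedule.append(active)
    return schedule

# Example: node "D" is slow, so it only aggregates every third round.
print(round_schedule({"A": 1.0, "B": 1.2, "C": 0.9, "D": 5.0}, num_rounds=6, threshold=2.0))
```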

D-Cliques: Compensating NonIIDness in Decentralized Federated Learning with Topology: The authors deal with the label distribution at nodes under heterogeneous data (the case where each node holds data from a single class only), in contrast to homogeneous data, where the label distribution is the same across nodes. They introduce the D-Cliques topology design to compensate for data heterogeneity. D-Cliques constructs a topology in which each node is part of a clique (a subset of nodes whose induced subgraph is fully connected) such that the label distribution in each clique is close to the global label distribution.

  • Construct locally representative cliques: For a clique $C \subset N$ and a label $y$, they define the distribution of $y$ in $C$ as $p_C(y)$, the global distribution as $p(y)$, and the skew of $C$ as follows:

    $p_C(y) = \dfrac{1}{|C|} \sum_{i \in C} p_i(y)$
    $p(y) = \dfrac{1}{n} \sum_{i \in N} p_i(y)$
    $\text{skew}(C) = \sum_{l=1}^{L} |p_C(y=l) - p(y=l)|$

    They construct a set of cliques with small skew using the Greedy-Swap algorithm, ensuring that the label distribution within each clique is as close to the global distribution as possible (see the sketch after this list).

  • Add sparse inter-clique connections: To achieve global consensus and convergence, they view each clique as a single node and add sparse inter-clique connections between pairs of cliques, proposing several topologies that work well (ring, fractal, fully connected); each inter-clique connection links one node from each of the two cliques.

  • Optimization: Because the number of inter-clique connections is limited, the weights of the two nodes in an inter-clique connection bias the gradient toward the classes represented at these two nodes. They solve this by applying Clique Averaging to D-SGD: each node averages only the gradients of neighbors within its own clique, which reduces the bias caused by inter-clique edges.
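
To make the clique construction step concrete, here is a minimal sketch of the skew computation above together with a simplified random-swap loop standing in for the Greedy-Swap algorithm; the function names, data layout, and swap acceptance rule are illustrative assumptions, not the paper's exact procedure.

```python
import random
from collections import Counter

def label_distribution(labels, num_classes):
    """Empirical label distribution p_i(y) of one node's local dataset."""
    counts = Counter(labels)
    return [counts.get(c, 0) / len(labels) for c in range(num_classes)]

def clique_skew(clique, node_dists, global_dist):
    """skew(C) = sum_l |p_C(y=l) - p(y=l)|, with p_C the average over nodes in C."""
    num_classes = len(global_dist)
    p_c = [sum(node_dists[i][l] for i in clique) / len(clique) for l in range(num_classes)]
    return sum(abs(p_c[l] - global_dist[l]) for l in range(num_classes))

def greedy_swap(cliques, node_dists, global_dist, steps=1000, seed=0):
    """Repeatedly propose swapping two nodes between two random cliques and keep
    the swap only if it lowers the combined skew (simplified Greedy-Swap stand-in)."""
    rng = random.Random(seed)
    for _ in range(steps):
        a, b = rng.sample(range(len(cliques)), 2)
        i, j = rng.randrange(len(cliques[a])), rng.randrange(len(cliques[b]))
        before = clique_skew(cliques[a], node_dists, global_dist) + clique_skew(cliques[b], node_dists, global_dist)
        cliques[a][i], cliques[b][j] = cliques[b][j], cliques[a][i]
        after = clique_skew(cliques[a], node_dists, global_dist) + clique_skew(cliques[b], node_dists, global_dist)
        if after >= before:  # revert swaps that do not reduce the skew
            cliques[a][i], cliques[b][j] = cliques[b][j], cliques[a][i]
    return cliques
```

In practice one would start from random cliques of a fixed size and run the swap loop until the total skew stops improving.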
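
And a short sketch of the Clique Averaging step, assuming gradients are already flattened NumPy vectors; the helper name and data layout are again illustrative.

```python
import numpy as np

def clique_averaged_gradient(node, grads, cliques):
    """Average gradients only over the members of the node's own clique,
    ignoring inter-clique neighbors (sketch of Clique Averaging in D-SGD).
    grads: dict node id -> flattened gradient vector; cliques: list of node-id lists."""
    clique = next(c for c in cliques if node in c)
    return np.mean([grads[i] for i in clique], axis=0)
```

Model parameters can still be mixed over all neighbors, including inter-clique edges, so consensus across cliques is preserved while the gradient bias is removed.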

References

  1. Yuan, Liangqi, et al. "Decentralized Federated Learning: A Survey and Perspective." arXiv preprint arXiv:2306.01603 (2023).
  2. Beltrán, Enrique Tomás Martínez, et al. "Decentralized federated learning: Fundamentals, state of the art, frameworks, trends, and challenges." IEEE Communications Surveys & Tutorials (2023).
  3. Koloskova, Anastasia, et al. "A unified theory of decentralized sgd with changing topology and local updates." International Conference on Machine Learning. PMLR, 2020.
  4. Jiang, Jingyan, and Liang Hu. "Decentralised federated learning with adaptive partial gradient aggregation." CAAI Transactions on Intelligence Technology 5.3 (2020): 230-236.
  5. Chen, Zhikun, et al. "Dacfl: Dynamic average consensus based federated learning in decentralized topology." arXiv preprint arXiv:2111.05505 (2021).
  6. Sun, Tao, Dongsheng Li, and Bao Wang. "Decentralized federated averaging." IEEE Transactions on Pattern Analysis and Machine Intelligence 45.4 (2022): 4289-4301.
  7. Shi, Yifan, et al. "Improving the model consistency of decentralized federated learning." arXiv preprint arXiv:2302.04083 (2023).
  8. Marfoq, Othmane, et al. "Throughput-optimal topology design for cross-silo federated learning." Advances in Neural Information Processing Systems 33 (2020): 19478-19487.
  9. Do, Tuong, et al. "Reducing Training Time in Cross-Silo Federated Learning using Multigraph Topology." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
  10. Bellet, Aurélien, Anne-Marie Kermarrec, and Erick Lavoie. "D-cliques: Compensating for data heterogeneity with topology in decentralized federated learning." 2022 41st International Symposium on Reliable Distributed Systems (SRDS). IEEE, 2022.