3 posts tagged with "federated autonomous driving"


Reducing Non-IID Effects in Federated Autonomous Driving with Contrastive Divergence Loss (Part 3)

In the previous article, we described how the contrastive divergence loss function is integrated into federated learning and why it can enhance model performance and tackle non-IID data issues in autonomous driving contexts. In this follow-up piece, we examine various federated learning configurations and present experimental findings that support the efficacy of our approach.

1. Implementation

Dataset: Our experimentation involves three datasets (see Table 1): Udacity+, Gazebo Indoor, and Carla Outdoor. Gazebo and Carla datasets exhibit non-IID characteristics, while Udacity+ represents a non-IID variant of the Udacity dataset.

| Dataset | Total samples | Average samples in each silo (Gaia) | Average samples in each silo (NWS) | Average samples in each silo (Exodus) |
|---|---|---|---|---|
| Udacity+ | 38,586 | 3,508 | 1,754 | 488 |
| Gazebo | 66,806 | 6,073 | 3,037 | 846 |
| Carla | 73,235 | 6,658 | 3,329 | 927 |

Table 1: Statistics of the datasets in our experiments.

Network Topology: Our experimentation encompasses three distinct federated topologies, namely the Internet Topology Zoo (Gaia), North American data centers (NWS), and the Zoo Exodus network (Exodus). The primary focus is on the Gaia topology, while supplementary insights from the NWS and Exodus topologies are provided in our ablation study.

Training: Within each silo, the model is trained with a batch size of 32 and a learning rate of 0.001 using the Adam optimizer. Each silo first performs local training, then transmits its model for aggregation using the specified global aggregation equation. Training spans 3,600 communication rounds in a simulation environment akin to that described in Nguyen et al. (2022), running on an NVIDIA 1080 GPU.
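
For concreteness, a minimal sketch of this per-silo training configuration is shown below. The constant and helper names are illustrative assumptions, not part of the original implementation.

```python
import torch

# Minimal sketch of the per-silo training configuration described above.
BATCH_SIZE = 32               # batch size within each silo
LEARNING_RATE = 1e-3          # learning rate for the Adam optimizer
COMMUNICATION_ROUNDS = 3600   # total number of communication rounds

def make_local_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # Each silo trains its local model with Adam at the learning rate above.
    return torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
```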

Baselines: Our comparative analysis involves several contemporary methods across diverse learning scenarios, including Random and Constant baselines as outlined by Loquercio et al. (2018). Within the Centralized Local Learning (CLL) scenario, we utilize Inception-V3, MobileNet-V2, VGG-16, and Dronet as baseline models. In the context of Server-based Federated Learning (SFL), our comparison extends to FedAvg, FedProx, and STAR. For the Decentralized Federated Learning (DFL) setting, our evaluation includes MATCHA, MBST, and FADNet. Model effectiveness is assessed using Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), while wall-clock time (ms) serves as a metric for training duration.
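
Both evaluation metrics are standard; a small reference implementation, assuming NumPy arrays of predicted and ground-truth steering angles, could look like this:

```python
import numpy as np

def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    # Root Mean Square Error between predicted and ground-truth steering angles.
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def mae(pred: np.ndarray, target: np.ndarray) -> float:
    # Mean Absolute Error between predicted and ground-truth steering angles.
    return float(np.mean(np.abs(pred - target)))
```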

2. Qualitative Results

Table 2 below compares our proposed CDL against recent state-of-the-art approaches under the CLL, SFL, and DFL learning scenarios on the Gaia topology.

| Model | Main Focus | Learning Method | RMSE (Udacity+) | RMSE (Gazebo) | RMSE (Carla) | MAE (Udacity+) | MAE (Gazebo) | MAE (Carla) | # Training Parameters | Avg. Cycle Time (ms) |
|---|---|---|---|---|---|---|---|---|---|---|
| Random | - | - | 0.358 | 0.117 | 0.464 | 0.265 | 0.087 | 0.361 | - | - |
| Constant | Statistical | - | 0.311 | 0.092 | 0.348 | 0.209 | 0.067 | 0.232 | - | - |
| Inception | Architecture Design | CLL | 0.209 | 0.085 | 0.297 | 0.197 | 0.062 | 0.207 | 21,787,617 | - |
| MobileNet | Architecture Design | CLL | 0.193 | 0.083 | 0.286 | 0.176 | 0.057 | 0.200 | 2,225,153 | - |
| VGG-16 | Architecture Design | CLL | 0.190 | 0.083 | 0.316 | 0.161 | 0.050 | 0.184 | 7,501,587 | - |
| DroNet | Architecture Design | CLL | 0.183 | 0.082 | 0.333 | 0.150 | 0.053 | 0.218 | 314,657 | - |
| FedAvg | Aggregation Optimization | SFL | 0.212 | 0.094 | 0.269 | 0.185 | 0.064 | 0.222 | 314,657 | 152.4 |
| FedProx | Aggregation Optimization | SFL | 0.152 | 0.077 | 0.226 | 0.118 | 0.063 | 0.151 | 314,657 | 111.5 |
| STAR | Aggregation Optimization | SFL | 0.179 | 0.062 | 0.208 | 0.149 | 0.053 | 0.155 | 314,657 | 299.9 |
| MATCHA | Topology Design | DFL | 0.182 | 0.069 | 0.208 | 0.148 | 0.058 | 0.215 | 314,657 | 171.3 |
| MBST | Topology Design | DFL | 0.183 | 0.072 | 0.214 | 0.149 | 0.058 | 0.206 | 314,657 | 82.1 |
| FADNet | Topology Design | DFL | 0.162 | 0.069 | 0.203 | 0.134 | 0.055 | 0.197 | 317,729 | 62.6 |
| CDL (ours) | Loss Optimization | CLL | 0.169 | 0.074 | 0.266 | 0.149 | 0.053 | 0.172 | 629,314 | - |
| CDL (ours) | Loss Optimization | SFL | 0.150 | 0.060 | 0.208 | 0.104 | 0.052 | 0.150 | 629,314 | 102.2 |
| CDL (ours) | Loss Optimization | DFL | 0.141 | 0.062 | 0.183 | 0.083 | 0.052 | 0.147 | 629,314 | 72.7 |

Table 2: Performance comparison between different methods. The Gaia topology is used.

The table above summarizes the performance comparison between our proposed method and recent state-of-the-art approaches. The results indicate that our CDL under the Siamese setup with two ResNet-8 models outperforms other methods by a significant margin. Notably, our approach achieves substantial reductions in both RMSE and MAE across all three datasets: Udacity+, Gazebo, and Carla. While CDL does not change the parameter count of the deployed network, it does require more parameters during training because of the additional sub-network in the Siamese setup. Additionally, our CDL with ResNet-8 outperforms the other baselines most clearly in the DFL scenario, and to a lesser extent in the SFL and CLL setups.

3. Contrastive Divergence Loss Analysis

CDL Performance Across Various Topologies

In practice, training federated algorithms becomes more complex as the topology involves more vehicle data silos. To assess the efficacy of our CDL, we conduct training experiments and compare the results with other baseline methods across different topologies. Table 3 presents the performance comparison of DroNet, FADNet, and our CDL with a ResNet-8 backbone when trained using DFL across three distributed network infrastructures with varying numbers of silos: Gaia (11 silos), NWS (22 silos), and Exodus (79 silos). The table clearly demonstrates that our CDL consistently achieves superior results across all topology configurations. In contrast, DroNet encounters divergence issues, and FADNet exhibits suboptimal performance, particularly in the Exodus topology with 79 silos.

| Topology | Architecture | Udacity+ | Gazebo | Carla |
|---|---|---|---|---|
| Gaia (11 silos) | DroNet | 0.177 (↓0.036) | 0.073 (↓0.011) | 0.244 (↓0.061) |
| Gaia (11 silos) | FADNet | 0.162 (↓0.021) | 0.069 (↓0.007) | 0.203 (↓0.020) |
| Gaia (11 silos) | CDL (ours) | 0.141 | 0.062 | 0.183 |
| NWS (22 silos) | DroNet | 0.183 (↓0.045) | 0.075 (↓0.017) | 0.239 (↓0.057) |
| NWS (22 silos) | FADNet | 0.165 (↓0.027) | 0.070 (↓0.012) | 0.200 (↓0.018) |
| NWS (22 silos) | CDL (ours) | 0.138 | 0.058 | 0.182 |
| Exodus (79 silos) | DroNet | 0.448 (↓0.310) | 0.208 (↓0.147) | 0.556 (↓0.380) |
| Exodus (79 silos) | FADNet | 0.179 (↓0.041) | 0.081 (↓0.020) | 0.238 (↓0.062) |
| Exodus (79 silos) | CDL (ours) | 0.138 | 0.061 | 0.176 |

Table 3: Performance under different topologies.

CDL with Various Architectures

Our CDL, functioning as a loss function, is versatile across different network architectures when integrated into the Siamese setup, leading to performance enhancements. Figure 1 showcases the efficacy of CDL across diverse networks such as DroNet, FADNet, Inception, MobileNet, and VGG-16 within the Gaia network framework in the DFL scenario. The outcomes demonstrate CDL's ability to mitigate the non-IID challenge across varied architectures, consistently improving performance.

Figure 1. Performance of CDL under different networks in Siamese setup.

CDL with IID Data

Figure 2 illustrates CDL's efficacy across various data distributions. While CDL is primarily tailored for addressing the non-IID challenge, it also demonstrates marginal performance enhancements when applied to models trained on IID data distributions. Leveraging the Siamese setup, CDL inherits traits and behaviors akin to triplet loss. Given triplet loss's established effectiveness with IID data, it follows that CDL can similarly augment model performance in scenarios where IID data is utilized.

Figure 2. Performance of different methods on IID dataset (Udacity) and non-IID dataset (Udacity+).

4. Ablation Study

Figure 3 illustrates the training results in RMSE of our two baselines, DroNet and FADNet, as well as our proposed CDL. The results showcase the convergence ability of these methods across three datasets (Udacity+, Gazebo, Carla) under the NWS and Gaia topologies. It is evident that our proposed CDL attains a superior convergence point compared to the two baselines. While the other methods (DroNet and FADNet) struggle to converge or exhibit poor convergence trends, our proposed CDL escapes local optima more effectively and shows less bias towards any specific silo.

Figure 3. The convergence ability of different methods under Gaia topology (top row) and NWS topology (bottom row).

5. Discussion

Although our method exhibits promising results, there are several areas for potential improvement that warrant consideration for future work:

  • Our CDL is specifically designed to address the non-IID problem, meaning its effectiveness heavily depends on the presence of non-IID characteristics within the autonomous driving data. Thus, in scenarios where data distributions across vehicles are relatively consistent or lack significant non-IID factors, the proposed contrastive divergence loss may only lead to limited performance enhancements.

  • While our proposal has been validated on autonomous driving datasets, real-world testing on actual vehicles has not yet been conducted. Conducting a study involving various driving scenarios, such as interactions with pedestrians, cyclists, and other vehicles, could provide further validation of our method's efficacy.

  • Although our approach is tailored for autonomous driving applications, its underlying principles may have applicability in other domains where non-IID data is prevalent. Exploring its effectiveness in areas like healthcare, IoT, and industrial settings could expand its potential impact.

Conclusion

We presented a new method to address the non-IID problem in federated autonomous driving using contrastive divergence loss. Our method directly reduces the effect of divergence factors in the learning process of each silo. Experiments on three benchmark datasets demonstrate that our proposed method performs substantially better than current state-of-the-art approaches. In the future, we plan to test our strategy with more data silos and to deploy the trained model on an autonomous vehicle for on-road evaluation.

Reducing Non-IID Effects in Federated Autonomous Driving with Contrastive Divergence Loss (Part 2)

In the previous post, we explored federated learning in the context of autonomous driving, its potential, and the challenges posed by non-IID data distribution, and we discussed how contrastive divergence offers a promising avenue to tackle the non-IID problem in federated learning setups. Building upon that foundation, this post describes how the contrastive divergence loss is integrated into the federated learning process and examines its implications for improving model performance and addressing non-IID data challenges in autonomous driving scenarios.

1. Overview

Motivation: The effectiveness of federated learning algorithms in autonomous driving hinges on two critical factors: firstly, the ability of each local silo to glean meaningful insights from its own data, and secondly, the synchronization among neighboring silos to mitigate the impact of the non-IID problem. Recent efforts have primarily focused on addressing these challenges through various means, including optimizing accumulation processes and optimizers, proposing novel network topologies, or leveraging robust deep networks capable of handling the distributed nature of the data. However, as highlighted by Duan et al., the indiscriminate adoption of high-performance deep architectures and their associated optimizations in centralized local learning scenarios can lead to increased weight variance among local silos during the accumulation process in federated setups. This variance detrimentally impacts model convergence and may even induce divergence, underscoring the need for nuanced approaches to ensure the efficacy of federated learning in autonomous driving contexts.

Figure 1. The Siamese setup when our CDL is applied for training a federated autonomous driving model. ResNet-8 is used as both the backbone and the sub-network in the Siamese setup. During inference, the sub-network is removed. Dotted lines represent the backward process. Our CDL has two components: the positive contrastive divergence loss $\mathcal{L}_{\rm{cd^+}}$ and the negative regularizer term $\mathcal{L}_{\rm{cd^-}}$. The local regression loss $\mathcal{L}_{\rm{lr}}$ for automatic steering prediction is calculated only from the backbone network.

Siamese Network Approach: In our study, we propose a novel approach to directly tackle the non-IID problem within each local silo by addressing the challenges of learning optimal features and achieving synchronization separately. Our strategy involves implementing two distinct networks within each silo: one network is dedicated to extracting meaningful features from local image data, while the other focuses on minimizing the distribution gap between the current model weights and those of neighboring silos. To facilitate this, we employ a Siamese Network architecture comprising two branches. The first branch, serving as the backbone network, learns local image features for autonomous steering using a local regression loss $\mathcal{L}_{\rm{lr}}$, while simultaneously incorporating a positive contrastive divergence loss $\mathcal{L}_{\rm{cd^+}}$ to assimilate knowledge from neighboring silos. Meanwhile, the second branch, referred to as the sub-network, regulates divergence factors arising from the backbone's knowledge through a contrastive regularizer term $\mathcal{L}_{\rm{cd^-}}$. See Figure 1 for more details. A minimal sketch of this two-branch setup is given below.
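
The sketch below illustrates, under stated assumptions, how the backbone and sub-network could be wired together. `make_backbone` is a placeholder for a ResNet-8 constructor; the class and attribute names are assumptions for illustration, not the authors' actual code.

```python
import torch
import torch.nn as nn

class SiameseCDLModel(nn.Module):
    """Illustrative sketch of the Siamese setup: a backbone and a sub-network
    with identical architecture (ResNet-8 in the post). Only the backbone is
    kept at inference time."""

    def __init__(self, make_backbone):
        super().__init__()
        self.backbone = make_backbone()      # learns steering features, is aggregated
        self.sub_network = make_backbone()   # regulates divergence factors, stays local
        # In the first communication round the sub-network copies the backbone weights.
        self.sub_network.load_state_dict(self.backbone.state_dict())

    def forward(self, images: torch.Tensor):
        feat_b = self.backbone(images)       # used for the steering prediction loss
        feat_s = self.sub_network(images)    # auxiliary features of identical dimension
        return feat_b, feat_s
```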

In practice, the sub-network initially adopts the same weights as the backbone during the first communication round. From the subsequent communication rounds onward, once the backbone has been accumulated using the equation below, each silo's local model is trained using the contrastive divergence loss. The sub-network produces auxiliary features of identical dimensions to the output features of the backbone. Throughout training, we expect minimal discrepancies in weights between the backbone and the sub-network when employing the contrastive divergence loss. Synchronization of weights across all silos occurs when the gradients from the backbone and sub-network learning processes exhibit minimal disparity.

$$\theta_i\left(k + 1\right) = \theta_i\left(k\right)-\alpha_{k}\frac{1}{m}\sum^m_{h=1}\nabla \mathcal{L}_{\rm{lr}}\left(\theta_i\left(k\right),\xi_i^h\left(k\right)\right)$$

where $\mathcal{L}_{\rm{lr}}$ is the local regression loss for autonomous steering.

2. Contrastive Divergence Loss

In practice, we've noticed that the initial phases of federated learning often yield subpar accumulated models. Unlike other approaches that tackle the non-IID issue by refining the accumulation step whenever silos transmit their models, we directly mitigate the impact of divergence factors during the local learning phase of each silo. Our method aims to minimize the discrepancy between the distribution of accumulated weights from neighboring silos in the backbone network (representing divergence factors) and the weights specific to silo $i$ in the sub-network (comprising locally learned knowledge). Once the distribution between silos achieves an acceptable level of synchronization, we reduce the influence of the sub-network and prioritize the steering angle prediction task. Inspired by the contrastive loss of the original Siamese Network, our proposed Contrastive Divergence Loss is formulated as follows:

$$\mathcal{L}_{\rm{cd}} = \beta \mathcal{L}_{\rm{cd^+}} + (1-\beta) \mathcal{L}_{\rm{cd^-}} = \beta \mathcal{H}(\theta^b_i, \theta^s_i) + (1-\beta) \mathcal{H}(\theta^s_i,\theta^b_i)$$

where $\mathcal{L}_{\rm{cd^+}}$ is the positive contrastive divergence term, $\mathcal{L}_{\rm{cd^-}}$ is the negative regularizer term, and $\mathcal{H}$ is the Kullback-Leibler divergence loss function:

$$\mathcal{H}(\hat{y},y) = \sum \mathbf{f}(\hat{y}) \log\left(\frac{\mathbf{f}(\hat{y})}{\mathbf{f}(y)}\right)$$

where $\hat{y}$ is the predicted representation and $y$ is the dynamic soft label.
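
As a concrete illustration, the sketch below implements the weighted KL formulation above on a pair of feature tensors. Using softmax as the mapping $\mathbf{f}(\cdot)$ to a probability distribution, and detaching the opposite branch in each term, are assumptions made for this example; the post does not spell out these details.

```python
import torch
import torch.nn.functional as F

def kl_term(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # H(y_hat, y) = sum f(y_hat) * log(f(y_hat) / f(y)), i.e. KL(f(y_hat) || f(y)).
    # Softmax is assumed as the mapping f(.) to a probability distribution.
    p = F.softmax(y_hat, dim=-1)        # f(y_hat), the predicted representation
    log_q = F.log_softmax(y, dim=-1)    # log f(y), the dynamic soft label
    # F.kl_div expects log-probabilities as its first argument and computes
    # sum target * (log target - input), which equals KL(p || q) here.
    return F.kl_div(log_q, p, reduction="batchmean")

def contrastive_divergence_loss(feat_b: torch.Tensor, feat_s: torch.Tensor,
                                beta: float = 0.5) -> torch.Tensor:
    # L_cd = beta * H(theta_b, theta_s) + (1 - beta) * H(theta_s, theta_b),
    # computed on batched backbone (feat_b) and sub-network (feat_s) features.
    l_cd_pos = kl_term(feat_b, feat_s.detach())   # positive term, drives the backbone
    l_cd_neg = kl_term(feat_s, feat_b.detach())   # negative regularizer, drives the sub-network
    return beta * l_cd_pos + (1.0 - beta) * l_cd_neg
```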

Considering $\mathcal{L}_{\rm{cd^+}}$ in the equation above as a Bayesian statistical inference task, our goal is to estimate the model parameters $\theta^{b*}$ by minimizing the Kullback-Leibler divergence $\mathcal{H}(\theta^b_i, \theta^s_i)$ between the measured regression probability distribution of the observed local silo $P_0(x|\theta^s_i)$ and the accumulated model $P(x|\theta^b_i)$. Hence, we can assume that the model distribution has the form $P(x|\theta^b_i) = e^{-E(x,\theta^b_i)}/Z(\theta^b_i)$, where $Z(\theta^b_i)$ is the normalization term. However, evaluating the normalization term $Z(\theta^b_i)$ is not trivial, which leads to the risk of getting stuck in a local minimum. Inspired by Hinton, we use samples obtained through a Markov Chain Monte Carlo (MCMC) procedure with a specific initialization strategy to deal with this problem. As further inferred from the equation above, $\mathcal{L}_{\rm{cd^+}}$ can be expressed under the SGD algorithm in a local silo by setting:

$$\mathcal{L}_{\rm{cd^+}} = -\sum_{x}P_0 (x|\theta^s_i)\frac{\partial E(x;\theta^b_i)}{\partial \theta^b_i} + \sum_{x}Q_{\theta^b_i} (x|\theta^s_i)\frac{\partial E(x;\theta^b_i)}{\partial \theta^b_i}$$

where $Q_{\theta^b_i}(x|\theta^s_i)$ is the measured probability distribution on the samples obtained by initializing the chain at $P_0(x|\theta^s_i)$ and running the Markov chain forward for a defined number of steps.
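
The post does not include the sampling code, but conceptually the positive term can be estimated in the usual contrastive divergence fashion: contrast the energy gradient on the observed data with the gradient on samples produced by a short Markov chain initialized at the data. The sketch below uses a few plain gradient-descent steps on the energy as a stand-in for the MCMC transition; the chain choice and every name here are illustrative assumptions.

```python
import torch

def cd_positive_gradient(energy_fn, theta: torch.Tensor, x_data: torch.Tensor,
                         k: int = 1, step_size: float = 0.01) -> torch.Tensor:
    """Conceptual sketch of the L_cd+ gradient for a generic energy E(x; theta).
    `theta` is assumed to be a tensor with requires_grad=True, and `energy_fn`
    is assumed to return one scalar energy per sample."""
    # Run the Markov chain forward for k steps, initialized at the data (P_0).
    # Here each "transition" is a small gradient step on the energy w.r.t. x.
    x_model = x_data.clone().detach().requires_grad_(True)
    for _ in range(k):
        grad_x, = torch.autograd.grad(energy_fn(x_model, theta).sum(), x_model)
        x_model = (x_model - step_size * grad_x).detach().requires_grad_(True)

    # Positive phase: expected energy gradient under the data distribution P_0.
    pos_grad, = torch.autograd.grad(energy_fn(x_data, theta).mean(), theta)
    # Negative phase: expected energy gradient under the short-run distribution Q.
    neg_grad, = torch.autograd.grad(energy_fn(x_model.detach(), theta).mean(), theta)

    # dL_cd+/dtheta ~= -E_P0[dE/dtheta] + E_Q[dE/dtheta]
    return -pos_grad + neg_grad
```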

Considering the $\mathcal{L}_{\rm{cd^-}}$ regularizer in the equation above as a Bayesian statistical inference task, we can calculate $\mathcal{L}_{\rm{cd^-}}$ in the same way; however, the roles of $\theta^s$ and $\theta^b$ are inverted:

$$\mathcal{L}_{\rm{cd^-}}=-\sum_{x}P_0 (x|\theta^b_i)\frac{\partial E(x;\theta^s_i)}{\partial \theta^s_i} + \sum_{x}Q_{\theta^s_i} (x|\theta^b_i)\frac{\partial E(x;\theta^s_i)}{\partial \theta^s_i}$$

We note that the key difference is that while the weight $\theta^b_i$ of the backbone is updated by the accumulation process in the equation above, the weight $\theta^s_i$ of the sub-network is not. This leads to different convergence behavior of the contrastive divergence in $\mathcal{L}_{\rm{cd^+}}$ and $\mathcal{L}_{\rm{cd^-}}$. The negative regularizer term $\mathcal{L}_{\rm{cd^-}}$ will converge to a state $\theta^{s*}_i$ provided $\frac{\partial E}{\partial \theta^s_i}$ is bounded and the following condition holds:

$$g(x,\theta^s_i) = \frac{\partial E(x;\theta^s_i)}{\partial \theta^s_i} -\sum_{x}P_0 (x|(\theta^b_i,\theta^s_i))\frac{\partial E(x;\theta^s_i)}{\partial \theta^s_i}$$

$$(\theta^s_i - \theta^{s*}_i)\cdot\left\{ \sum_{x}P_0(x)g(x,\theta^s_i) - \sum_{x',x}P_0(x')\mathbf{K}^m_{\theta^s_i}(x',x)g(x,\theta^{s*}_i)\right\}\geq \mathbf{k}_1\left|\theta^s_i-\theta^{s*}_i\right|^2$$

for some constant $\mathbf{k}_1$. Note that $\mathbf{K}^m_{\theta^s}$ is the transition kernel.

Note that the negative regularizer term Lcd\mathcal{L}_{\rm {cd^-}} is only used in training models on local silos. Thus, it does not contribute to the accumulation process of federated training.

3. Total Training Loss

Local Regression Loss. We use the mean absolute error (MAE) to compute the loss for predicting the steering angle in each local silo. Note that we only use features from the backbone for predicting steering angles.

$$\mathcal{L}_{\rm{lr}} = \text{MAE}(\theta^b_i, \hat{\xi}_i)$$

where $\hat{\xi}_i$ is the ground-truth steering angle of the data sample $\xi_i$ collected from silo $i$.

Local Silo Loss. The local silo loss computed in each communication round at each silo before applying the accumulation process is described as:

$$\mathcal{L}_{\rm{final}} = \mathcal{L}_{\rm{lr}} + \mathcal{L}_{\rm{cd}}$$

In practice, we observe that both the contrastive divergence loss $\mathcal{L}_{\rm{cd}}$, which handles the non-IID problem, and the local regression loss $\mathcal{L}_{\rm{lr}}$ for predicting the steering angle are equally important and indispensable.

Combining all losses together, at each iteration $k$, the update in the backbone network is defined as:

$$\theta^b_i\left(k + 1\right) =\begin{cases} \sum_{j \in \mathcal{N}_i^{+} \cup \{i\}}\mathbf{A}_{i,j}\theta^b_{j}\left(k\right), & \text{if } k \equiv 0 \pmod{u + 1},\\ \theta^b_i\left(k\right)-\alpha_{k}\frac{1}{m}\sum^m_{h=1}\nabla \mathcal{L}_{\rm{b}}\left(\theta^b_i\left(k\right),\xi_i^h\left(k\right)\right), & \text{otherwise.} \end{cases}$$

where $\mathcal{L}_{\rm{b}} = \mathcal{L}_{\rm{lr}} + \mathcal{L}_{\rm{cd^+}}$ and $u$ is the number of local updates.

In parallel, the update in the sub-network at each iteration $k$ is described as:

$$\theta^s_i\left(k + 1\right) =\theta^s_i\left(k\right)-\alpha_{k}\frac{1}{m}\sum^m_{h=1}\nabla \mathcal{L}_{\rm{cd^-}}\left(\theta^s_i\left(k\right),\xi_i^h\left(k\right)\right)$$
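
Putting the two update rules together, one local iteration inside a silo could be organized as in the sketch below. `kl_term` refers to the illustrative KL helper from the earlier snippet, `aggregated_state` stands for the $\mathbf{A}_{i,j}$-weighted combination of neighboring backbone weights, and the backbone is assumed to return both a steering prediction and a feature tensor; all of these are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

def silo_iteration(backbone, sub_network, opt_b, opt_s, batch,
                   k: int, u: int, aggregated_state=None):
    """Sketch of one local iteration k following the two update rules above.
    `backbone(images)` is assumed to return (steering prediction, features);
    `kl_term` is the illustrative KL helper defined earlier."""
    images, angles = batch

    if k % (u + 1) == 0 and aggregated_state is not None:
        # Aggregation step: the backbone adopts the A_{i,j}-weighted combination
        # of neighboring backbone weights (computed upstream); the sub-network
        # is never aggregated.
        backbone.load_state_dict(aggregated_state)
        return

    # Backbone update: L_b = L_lr + L_cd+ (regression loss plus positive term).
    angle_pred, feat_b = backbone(images)
    feat_s = sub_network(images)
    loss_b = F.l1_loss(angle_pred, angles) + kl_term(feat_b, feat_s.detach())
    opt_b.zero_grad()
    loss_b.backward()
    opt_b.step()

    # Sub-network update: only the negative regularizer L_cd-.
    _, feat_b = backbone(images)
    loss_s = kl_term(sub_network(images), feat_b.detach())
    opt_s.zero_grad()
    loss_s.backward()
    opt_s.step()
```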

Next

In the next post, we will evaluate the effectiveness of the Contrastive Divergence Loss in dealing with the non-IID problem in Federated Autonomous Driving.

Reducing Non-IID Effects in Federated Autonomous Driving with Contrastive Divergence Loss (Part 1)

Federated learning has been widely applied in autonomous driving since it enables training a learning model among vehicles without sharing users' data. However, data from autonomous vehicles usually suffer from the non-independent-and-identically-distributed (non-IID) problem, which may cause negative effects on the convergence of the learning process. In this paper, we propose a new contrastive divergence loss to address the non-IID problem in autonomous driving by reducing the impact of divergence factors from transmitted models during the local learning process of each silo. We also analyze the effects of contrastive divergence in various autonomous driving scenarios, under multiple network infrastructures, and with different centralized/distributed learning schemes. Our intensive experiments on three datasets demonstrate that our proposed contrastive divergence loss significantly improves the performance over current state-of-the-art approaches.

1. Introduction

Autonomous driving represents a burgeoning field wherein vehicles navigate without human intervention, employing a blend of vision, learning, and control algorithms to perceive and react to environmental changes. While traditional approaches heavily rely on supervised learning and data collection for model training, recent efforts, such as those by Bergamini et al., Chen et al., and Wang et al., have proposed various solutions to different challenges in autonomous driving. However, data collection poses privacy concerns as user data is shared with third parties, prompting a shift towards federated learning (FL). FL allows multiple parties to collaboratively train models without sharing raw data, thereby preserving user privacy while enabling autonomous vehicles to collectively learn predictive models from diverse datasets.

Typically, there are two primary federated learning scenarios: Server-based Federated Learning (SFL) and Decentralized Federated Learning (DFL). In SFL, a central node coordinates the training process and aggregates contributions from all clients, enhancing data privacy by transmitting only model weights. However, the reliance on central nodes in SFL can potentially bottleneck the system due to data transmission constraints. In contrast, DFL operates without a central server, employing a fully distributed network architecture. In autonomous driving, several works have explored both DFL and SFL to address different problems such as collision avoidance, trajectory prediction, and steering prediction.

Figure 1. Sample viewpoints from three vehicles and their steering angle distributions in the Carla dataset. The visualization shows that the three vehicles differ in visual input as well as in steering angle distribution.

In practical application, both Server-based Federated Learning (SFL) and Decentralized Federated Learning (DFL) approaches exhibit their own advantages and limitations, yet both are susceptible to the non-IID problem inherent in federated learning. The non-IID problem, as described by Sattler et al. and Wang et al., arises when data partitioning across silos exhibits significant distribution shifts. Particularly in autonomous driving, this issue poses significant challenges during the accumulation process across vehicle silos. Each vehicle's unique driving patterns, weather conditions, and road types contribute to differences in data distribution, exacerbating the non-IID problem. For instance, data collected from vehicles on highways may differ substantially from data collected in urban settings. This discrepancy can lead to difficulties in constructing robust and accurate learning models, where reliance on data from a specific context may result in poor performance in others.

Previous studies have tackled the non-IID problem through various approaches such as optimizing the accumulation step, incorporating normalization into the global model, fine-tuning global model weights via distillation, or aligning the distribution between the global model and local ones. These methods typically refine global model weights to adapt to divergence factors arising from diverse distributions across local datasets. However, this optimization process can be challenging due to factors like determining the optimal accumulation step size for each local silo, managing network topology disconnections and bottlenecks, and ensuring consistency between local and global objectives.

In the paper, we introduce a novel Contrastive Divergence Loss (CDL) to tackle the non-IID problem in autonomous driving. Unlike other methods that wait for local model learning before addressing the non-IID issue during global aggregation, our CDL loss directly mitigates the impact of divergence factors throughout each local silo's learning process. This approach simplifies the adaptation of distribution between neighboring silos compared to handling the non-IID problem in the global model. Consequently, each local model becomes more resilient to changes in data distribution, allowing the use of typical accumulation methods like FedAvg for global weights without concerns about non-IID effects on convergence. The intensive experiments on three autonomous driving datasets verify our observation and show significant improvements over state-of-the-art methods.

2. Related Works

Autonomous Driving: Autonomous driving has emerged as a prominent research area in recent years, with a focus on leveraging deep learning for various tasks such as object detection, trajectory prediction, and autonomous control. For instance, Xin et al. proposed a recursive backstepping steering controller, while Xiong et al. analyzed nonlinear dynamics behavior using proportional control laws. Yi et al. presented an algorithm for self-reconfigurable robots during waypoint navigation, and Yin et al. combined model predictive control with covariance steering theory for robust autonomous driving systems.

Federated Learning for Autonomous Driving: Federated learning offers a privacy-aware solution for collaborative machine learning without sharing local data. It has gained traction in autonomous driving and intelligent transport systems, addressing communication, computation, and storage efficiency. Various works have explored federated learning for autonomous driving tasks such as steering angle prediction and turning signal prediction. However, challenges remain, including non-IID data distribution among participants, which recent works primarily address through accumulation processes rather than local silo optimization.

Contrastive Divergence: Contrastive models have gained attention in federated learning for handling heterogeneity in local data distribution. While contrastive learning has been applied to various datasets, its potential for mitigating non-IID data effects in federated autonomous driving remains underexplored. Most existing works focus on optimizing frameworks in server-based or P2P federated learning, or adjusting the accumulation process, with limited consideration of contrastive loss function behavior in federated autonomous driving scenarios.

Figure 2. A federated autonomous driving system.

3. Preliminary

We summarize the notations of our paper in Table 1.

Table 1. Notations.

We consider each autonomous vehicle as a data silo. Our goal is to collaboratively train a global driving policy $\theta$ from $N$ silos by aggregating all local learnable weights $\theta_i$ of each silo. Each silo computes the current gradient of its local loss function and then updates the model parameter using an optimizer. Mathematically, in the local update stage, at each silo $i$, in each iteration $k$, the model weights of the local silo can be computed as:

$$\theta_i\left(k + 1\right) = \theta_i\left(k\right)-\alpha_{k}\frac{1}{m}\sum^m_{h=1}\nabla \mathcal{L}_{\rm{lr}}\left(\theta_i\left(k\right),\xi_i^h\left(k\right)\right)$$

where $\mathcal{L}_{\rm{lr}}$ is the local regression loss for autonomous steering. To update the global model, each silo interacts with the associated ones through a predefined topology:

$$\theta\left(k + 1\right) = \sum^{N-1}_{i = 0}\vartheta_i\theta_i\left(k\right)$$

In practice, the local model in each silo is a deep network that takes RGB images as inputs and predicts the associated steering angles.
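
As a simple illustration of the aggregation step above, the weighted combination of silo weights can be written as a FedAvg-style state-dict average. The function and argument names below are assumptions for this sketch, not part of the original implementation.

```python
def aggregate_global(local_states, contribution_weights):
    """Sketch of theta(k+1) = sum_i vartheta_i * theta_i(k): a weighted average
    of local silo weights. `local_states` is a list of model state_dicts and
    `contribution_weights` holds the vartheta_i values (assumed to sum to 1)."""
    global_state = {}
    for name in local_states[0]:
        # Weighted sum of the corresponding parameter tensor across all silos.
        global_state[name] = sum(w * state[name]
                                 for w, state in zip(contribution_weights, local_states))
    return global_state
```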

Next

In the next post, we will introduce our proposal with contrastive divergence loss.