/ / 20 min read

Reducing Non-IID Effects in Federated Autonomous Driving with Contrastive Divergence Loss (Part 1)

An introduction about Federated Autonomous Driving

Federated learning has been widely applied in autonomous driving since it enables training a learning model among vehicles without sharing users' data. However, data from autonomous vehicles usually suffer from the non-independent-and-identically-distributed (non-IID) problem, which may cause negative effects on the convergence of the learning process. In this paper, we propose a new contrastive divergence loss to address the non-IID problem in autonomous driving by reducing the impact of divergence factors from transmitted models during the local learning process of each silo. We also analyze the effects of contrastive divergence in various autonomous driving scenarios, under multiple network infrastructures, and with different centralized/distributed learning schemes. Our intensive experiments on three datasets demonstrate that our proposed contrastive divergence loss significantly improves the performance over current state-of-the-art approaches.

1. Introduction

Autonomous driving represents a burgeoning field wherein vehicles navigate without human intervention, employing a blend of vision, learning, and control algorithms to perceive and react to environmental changes. While traditional approaches heavily rely on supervised learning and data collection for model training, recent efforts, such as those by Bergamini et al., Chen et al., and Wang et al., have proposed various solutions to different challenges in autonomous driving. However, data collection poses privacy concerns as user data is shared with third parties, prompting a shift towards federated learning (FL). FL allows multiple parties to collaboratively train models without sharing raw data, thereby preserving user privacy while enabling autonomous vehicles to collectively learn predictive models from diverse datasets.

Typically, there are two primary federated learning scenarios: Server-based Federated Learning (SFL) and Decentralized Federated Learning (DFL). In SFL, a central node coordinates the training process and aggregates contributions from all clients, enhancing data privacy by transmitting only model weights. However, the reliance on central nodes in SFL can potentially bottleneck the system due to data transmission constraints. In contrast, DFL operates without a central server, employing a fully distributed network architecture. In autonomous driving, several works have explored both DFL and SFL to address different problems such as collision avoidance, trajectory prediction, and steering prediction.

Figure 1. Sample viewpoints over three vehicles and their steering angle distributions in Carla dataset. The visualization shows that the three vehicles have differences in visual input as well as steering angle distribution.

In practical application, both Server-based Federated Learning (SFL) and Decentralized Federated Learning (DFL) approaches exhibit their own advantages and limitations, yet both are susceptible to the non-IID problem inherent in federated learning. The non-IID problem, as described by Sattler et al. and Wang et al., arises when data partitioning across silos exhibits significant distribution shifts. Particularly in autonomous driving, this issue poses significant challenges during the accumulation process across vehicle silos. Each vehicle's unique driving patterns, weather conditions, and road types contribute to differences in data distribution, exacerbating the non-IID problem. For instance, data collected from vehicles on highways may differ substantially from data collected in urban settings. This discrepancy can lead to difficulties in constructing robust and accurate learning models, where reliance on data from a specific context may result in poor performance in others.

Previous studies have tackled the non-IID problem through various approaches such as optimizing the accumulation step, incorporating normalization into the global model, fine-tuning global model weights via distillation, or aligning the distribution between the global model and local ones. These methods typically refine global model weights to adapt to divergence factors arising from diverse distributions across local datasets. However, this optimization process can be challenging due to factors like determining the optimal accumulation step size for each local silo, managing network topology disconnections and bottlenecks, and ensuring consistency between local and global objectives.

In the paper, we introduce a novel Contrastive Divergence Loss (CDL) to tackle the non-IID problem in autonomous driving. Unlike other methods that wait for local model learning before addressing the non-IID issue during global aggregation, our CDL loss directly mitigates the impact of divergence factors throughout each local silo's learning process. This approach simplifies the adaptation of distribution between neighboring silos compared to handling the non-IID problem in the global model. Consequently, each local model becomes more resilient to changes in data distribution, allowing the use of typical accumulation methods like FedAvg for global weights without concerns about non-IID effects on convergence. The intensive experiments on three autonomous driving datasets verify our observation and show significant improvements over state-of-the-art methods.

2. Related Works

Autonomous Driving: Autonomous driving has emerged as a prominent research area in recent years, with a focus on leveraging deep learning for various tasks such as object detection, trajectory prediction, and autonomous control. For instance, Xin et al. proposed a recursive backstepping steering controller, while Xiong et al. analyzed nonlinear dynamics behavior using proportional control laws. Yi et al. presented an algorithm for self-reconfigurable robots during waypoint navigation, and Yin et al. combined model predictive control with covariance steering theory for robust autonomous driving systems.

Federated Learning for Autonomous Driving: Federated learning offers a privacy-aware solution for collaborative machine learning without sharing local data. It has gained traction in autonomous driving and intelligent transport systems, addressing communication, computation, and storage efficiency. Various works have explored federated learning for autonomous driving tasks such as steering angle prediction and turning signal prediction. However, challenges remain, including non-IID data distribution among participants, which recent works primarily address through accumulation processes rather than local silo optimization.

Contrastive Divergence: Contrastive models have gained attention in federated learning for handling heterogeneity in local data distribution. While contrastive learning has been applied to various datasets, its potential for mitigating non-IID data effects in federated autonomous driving remains underexplored. Most existing works focus on optimizing frameworks in server-based or P2P federated learning, or adjusting the accumulation process, with limited consideration of contrastive loss function behavior in federated autonomous driving scenarios.

Figure 2. A federated autonomus driving system.

3. Preliminary

We summarize the notations of our paper in Table.1.

Table 1. Notations.

We consider each autonomous vehicle as a data silo. Our goal is to collaboratively train a global driving policy θ\theta from NN silos by aggregating all local learnable weights θi\theta_i of each silo. Each silo computes the current gradient of its local loss function and then updates the model parameter using an optimizer. Mathematically, in the local update stage, at each silo ii, in each iteration kk, the model weights of the local silo can be computed as:

θi(k+1)=θi(k)αk1mh=1mLlr(θi(k),ξih(k))\theta_i\left(k + 1\right) = {\theta}_i\left(k\right)-\alpha_{k}\frac{1}{m}\sum^m_{h=1}\nabla \mathcal{L}_{\rm {lr}}\left({\theta}_i\left(k\right),\xi_i^h\left(k\right)\right)

where Llr\mathcal{L}_{\rm {lr}} is the local regression loss for autonomous steering. To update the global model, each silo interacts with the associated ones through a predefined topology:

θ(k+1)=i=0N1ϑiθi(k)\theta\left(k + 1\right) = \sum^{N-1}_{i = 0 }\vartheta_i{\theta}_i\left(k\right)

In practice, the local model in each silo is a deep network that takes the RGB images as inputs and predicts the associated steering angles

Next

In the next post, we will introduce our proposal with contrastive divergence loss.

Like What You See?