
Federated Learning: Challenges and Hot Research Trends

Federated learning and its remaining challenges

Federated learning is the process of training statistical models across a network of remote devices or siloed data centers, such as mobile phones or hospitals, while keeping the data local. Five major obstacles in federated learning directly shape the current publishing trend.

1. Expensive Communication

Limited internet connectivity, a huge number of participating users, and administrative overhead make communication between devices and the central server a key bottleneck.
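One common mitigation is to compress updates before sending them. Below is a minimal sketch (illustrative, not from any particular FL framework) of top-k sparsification, which keeps only the largest-magnitude entries of an update:

```python
# Minimal sketch of top-k sparsification for a model update; all names
# and sizes here are illustrative, not from a specific FL system.
import numpy as np

def top_k_sparsify(update: np.ndarray, k: int):
    """Keep only the k largest-magnitude entries of a flat update vector."""
    flat = update.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the top-k entries
    return idx, flat[idx]  # transmit indices + values instead of the full vector

# Example: a 1,000,000-parameter update reduced to 1% of its entries.
update = np.random.randn(1_000_000)
idx, vals = top_k_sparsify(update, k=10_000)
print(f"sent {idx.size + vals.size:,} numbers instead of {update.size:,}")
```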

2. Systems Heterogeneity

Because of differences in hardware (CPU, RAM), network connection (3G, 4G, 5G, Wi-Fi), and power (battery level), each device in a federated network may have different storage, computational, and communication capabilities.
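One way a server can cope with this (a hypothetical sketch, not any specific system's API) is to sample clients each round and drop stragglers that miss a reporting deadline:

```python
# Hypothetical sketch: sample clients per round and drop those whose
# (simulated) compute time exceeds a reporting deadline.
import random

class Client:
    def __init__(self, cid: int, compute_time_s: float):
        self.cid = cid
        self.compute_time_s = compute_time_s  # varies with CPU, battery, network

def run_round(clients, sample_size=10, deadline_s=30.0):
    sampled = random.sample(clients, sample_size)
    finished = [c for c in sampled if c.compute_time_s <= deadline_s]
    stragglers = [c for c in sampled if c.compute_time_s > deadline_s]
    return finished, stragglers

clients = [Client(i, random.uniform(5, 60)) for i in range(100)]
done, late = run_round(clients)
print(f"{len(done)} clients reported in time; {len(late)} stragglers dropped")
```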

3. Statistical Heterogeneity

Devices routinely generate and collect data in non-identically distributed ways across the network; for example, in next-word prediction, mobile phone users type in many different languages. Furthermore, the number of data points may differ greatly between devices, and there may be an underlying structure describing the relationship between devices and their associated distributions. This data-generation paradigm violates the independent and identically distributed (IID) assumptions frequently used in distributed optimization, increases the likelihood of stragglers, and can add complexity in modeling, analysis, and evaluation.
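Experiments often simulate this heterogeneity by partitioning a dataset across clients with a Dirichlet distribution; a smaller concentration parameter yields more skewed clients. A minimal sketch (an assumed experimental setup, not from the post itself):

```python
# Sketch of a Dirichlet label partition, a common way to simulate
# non-IID clients in FL experiments. Labels below are synthetic stand-ins.
import numpy as np

def dirichlet_partition(labels, n_clients=10, alpha=0.5, seed=0):
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        cls_idx = np.where(labels == cls)[0]
        rng.shuffle(cls_idx)
        props = rng.dirichlet(alpha * np.ones(n_clients))  # per-client share of this class
        cuts = (np.cumsum(props)[:-1] * len(cls_idx)).astype(int)
        for i, part in enumerate(np.split(cls_idx, cuts)):
            client_indices[i].extend(part.tolist())
    return client_indices

labels = np.random.randint(0, 10, size=5_000)
parts = dirichlet_partition(labels, alpha=0.1)  # small alpha -> highly skewed
print([len(p) for p in parts])
```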

4. Privacy Concerns

Privacy is a crucial concern in federated learning applications. Federated learning takes a step toward data protection by sharing model updates, such as gradient information, instead of the raw data generated on each device. Nonetheless, communicating model updates throughout training can still reveal sensitive information, either to a third party or to the central server.
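A standard mitigation is to clip each update and add noise before it leaves the device, in the style of differential privacy. A minimal sketch with illustrative clip and noise values:

```python
# Sketch of clip-and-noise privatization of a client update
# (differential-privacy style); clip_norm and noise_std are illustrative.
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_std=0.1, rng=None):
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))  # bound sensitivity
    return clipped + rng.normal(0.0, noise_std, size=update.shape)

update = np.random.randn(1_000)
noisy = privatize_update(update)
print(np.linalg.norm(update), "->", np.linalg.norm(noisy))
```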

5. Domain transfer

Because of the four challenges above, not every task can be moved into the federated learning paradigm and trained to completion.

Hot trends

Data distribution heterogeneity and label inadequacy:

  • Distributed Optimization
  • Non-IID and Model Personalization
  • Semi-Supervised Learning
  • Vertical Federated Learning
  • Decentralized FL
  • Hierarchical FL
  • Neural Architecture Search
  • Transfer Learning
  • Continual Learning
  • Domain Adaptation
  • Reinforcement Learning
  • Bayesian Learning

Security, privacy, fairness, and incentive mechanisms:

  • Adversarial-Attack-and-Defense
  • Privacy
  • Fairness
  • Interpretability
  • Incentive Mechanism

Communication and computational resource constraints, software and hardware heterogeneity, and the FL system:

  • Communication-Efficiency
  • Straggler Problem
  • Computation Efficiency
  • Wireless Communication and Cloud Computing
  • FL System Design

Models and Applications:

  • Models
  • Natural Language Processing
  • Computer Vision
  • Health Care
  • Transportation
  • Recommendation System
  • Speech
  • Finance
  • Smart City
  • Robotics
  • Networking
  • Blockchain
  • Other

Benchmark, Dataset, and Survey:

  • Benchmark and Dataset
  • Survey

Introduction to Federated Learning

Federated Learning: machine learning over a distributed dataset, where user devices (e.g., desktops, mobile phones) collaboratively learn a shared prediction model while keeping all training data local to the device. This approach decouples the ability to do machine learning from the need to store data in the cloud.

Conceptually, federated learning proposes a mechanism to train a high-quality centralized model while the training data remains distributed over many clients, each with an unreliable and relatively slow network connection.

The idea behind federated learning is as conceptually simple as it is technologically complex. Traditional machine learning programs rely on a centralized training model, in which a group of servers runs a specific model against training and validation datasets. That centralized approach works efficiently in many scenarios, but it has proven challenging in use cases involving a large number of endpoints that use and improve the model. The prototypical example of this limitation is found in mobile or Internet of Things (IoT) scenarios, where the quality of a model depends on information processed across hundreds of thousands or millions of devices. In those scenarios, each endpoint contributes to the model's training in its own autonomous way; in other words, knowledge is federated.
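The core server-side step is easy to sketch. In the spirit of FedAvg (the canonical federated averaging algorithm), the server combines client models weighted by how much local data each client trained on; everything below is illustrative:

```python
# Minimal sketch of federated averaging on the server: a weighted mean of
# client parameter vectors, weighted by local dataset size. Illustrative only.
import numpy as np

def fed_avg(client_weights, client_sizes):
    """client_weights: list of 1-D parameter vectors, one per client."""
    total = sum(client_sizes)
    new_global = np.zeros_like(client_weights[0])
    for w, n in zip(client_weights, client_sizes):
        new_global += (n / total) * w  # clients with more data get more weight
    return new_global

# Three clients that trained locally on 100, 300, and 600 examples.
clients = [np.random.randn(4) for _ in range(3)]
print(fed_avg(clients, client_sizes=[100, 300, 600]))
```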

Blockchain: a large, distributed dataset where no one can edit or delete an old entry, nor fake a new one. The integrity of the data is enforced by fundamental limits of computation (i.e., Proof of Work).
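A toy sketch of why entries are hard to fake: finding a nonce whose hash meets a difficulty target is expensive, while verifying one takes a single hash. (Illustrative only; real chains add much more.)

```python
# Toy Proof-of-Work: search for a nonce whose SHA-256 digest starts with
# `difficulty` zeros. Finding it takes work; checking it takes one hash.
import hashlib

def mine(block_data: str, difficulty: int = 4) -> int:
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce  # the proof that work was done
        nonce += 1

nonce = mine("ledger entry #42")
print("found nonce:", nonce)
```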

Smart Contract: a dataset stored on the blockchain, which includes Data (i.e., ledgers, events, statistics), State (today's ledger, today's events), and Code (rules for changing state).
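Schematically (a Python analogy for those three parts, not how contracts are actually written):

```python
# Schematic of the Data / State / Code split described above; purely
# illustrative, not a real smart-contract language.
from dataclasses import dataclass, field

@dataclass
class SmartContract:
    data: list = field(default_factory=list)   # ledgers, events, statistics
    state: dict = field(default_factory=dict)  # today's ledger, today's events

    def code(self, event: str):
        """Rules for changing state: record the event and tally it."""
        self.data.append(event)
        self.state[event] = self.state.get(event, 0) + 1

c = SmartContract()
c.code("transfer")
print(c.state)  # {'transfer': 1}
```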

Math Examples

Fundamental Theorem of Calculus. Let $f:[a,b] \to \mathbb{R}$ be Riemann integrable, and let $F:[a,b] \to \mathbb{R}$ be defined by $F(x) = \int_{a}^{x} f(t)\,dt$. Then $F$ is continuous, and at every $x$ at which $f$ is continuous, $F$ is differentiable with $F'(x) = f(x)$.
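As a quick worked illustration (an added example, not part of the theorem statement): take $f(t) = t^2$ on $[0, b]$. Then

$$F(x) = \int_{0}^{x} t^2\,dt = \frac{x^3}{3}, \qquad F'(x) = x^2 = f(x).$$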

Lift ($L$) can be determined from the lift coefficient ($C_L$) by the following equation, where $\rho$ is the air density, $v$ the airspeed, and $S$ the wing reference area:

$$L = \frac{1}{2} \rho v^2 S C_L$$
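For example, with illustrative values $\rho = 1.225\ \mathrm{kg/m^3}$, $v = 50\ \mathrm{m/s}$, $S = 16\ \mathrm{m^2}$, and $C_L = 0.5$:

$$L = \tfrac{1}{2} \times 1.225 \times 50^2 \times 16 \times 0.5 = 12{,}250\ \mathrm{N}$$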
