Posts tagged "distillation learning"

Light-weight Deformable Registration using Adversarial Learning with Distilling Knowledge (Part 3)

July 31, 2022 · 7 min read

In this part, we will show the effectivness and the ablation studies of Light-weight Deformable Registration Network and Adversarial Learning Algorithm with Distilling Knowledge.

Dataset

As mentioned in [1], we train method on two types of scans: Liver CT scans and Brain MRI scans.

For Liver CT scans, we use 5 datasets:

LiTS contains 131 liver segmentation scans.
MSD has 70 liver tumor CT scans, 443 hepatic vessels scans, and 420 pancreatic tumor scans.
BFH is a smaller dataset with 92 scans.
SLIVER is a challenging dataset with 20 liver segmentation scans and annotated by 3 expert doctors.
LSPIG (Liver Segmentation of Pigs) contains 17 pairs of CT scans from pigs, provided by the First Affiliated Hospital of Harbin Medical University.

For Brain MRI scans, we use 4 datasets: 1. ADNI contains 66 scans. 2. ABIDE contains 1287 scans. 3. ADHD contains 949 scans. 4. LPBA has 40 scans, each featuring a segmentation ground truth of 56 anatomical structures.

Baselines

We compare LDR ALDK method with the following recent deformable registration methods:

ANTs SyN and Elastix B-spline are methods that find an optimal transformation by iteratively update the parameters of the defined alignment.
VoxelMorph predicts a dense deformation in an unsupervised manner by using deconvolutional layers.
VTN is an end-to-end learning framework that uses convolutional neural networks to register 3D medical images, especially large displaced ones.
RCN is a recent recursive deep architecture that utilizes learnable cascade and performs progressive deformation for each warped image.

Results

Table 1 summarizes the overall performance, testing speed, and the number of parameters compared with recent state-of-the-art methods in the deformable registration task. The results clearly show that Light-weight Deformable Registration network (LDR) accompanied by Adversarial Learning with Distilling Knowledge (ALDK) algorithm significantly reduces the inference time and the number of parameters during the inference phase. Moreover, the method achieves competitive accuracy with the most recent highly performed but expensive networks, such as VTN or VoxelMorph. We notice that this improvement is consistent across all experiments on different datasets SLIVER, LiTS, LSPIG, and LPBA.

In particular, we observe that on the SLIVER dataset the Dice score of best model with 3 cascades (3-cas LDR + ALDK) is 0.3% less than the best result of 3-cas VTN + Affine, while inference speed is ?21 times faster on a CPU and the parameters used during inference is ~8 times smaller. Including benchmarking results in three other datasets, i.e., LiTS, LSPIG, and LPBA, light-weight model only trades off an average of 0.5% in Dice score and 1.25% in Jacc score for a significant gain of speed and a massive reduction in the number of parameters. We also notice that method is the only work that achieves the inference time of approximately 1s on a CPU. This makes method well suitable for deployment as it does not require expensive GPU hardware for inference.

Fig-1

Table 1: COMPARISON AMONG LDR ALDK MODEL WITH RECENT APPROACHES.

Ablation Study

Effectiveness of ALDK. Table 2 summarizes the effectiveness of Adversarial Learning with Distilling Knowledge (ALDK) when being integrated into the light-weight student network. Note that LDR without ALDK is trained using only the reconstruction loss in an unsupervised learning setup. From this table, we clearly see that ALDK algorithm improves the Dice score of the LDR tested in the SLIVER dataset by 3.4%, 4.0%, and 3.1% for 1-cas, 2-cas, and 3-cas setups, respectively. Additionally, using ALDK also increases the Jacc score by 5.2%, 4.9%, and 3.9% for 1-cas LDR, 2-cas LDR, and 3-cas LDR. These results verify the stability of adversarial learning algorithm in the inference phase, under the differences evaluation metrics, as well as the number of cascades setups. Furthermore, Table 2 also clearly shows the effectiveness and generalization of ALDK when being applied to the student network. Since the deformations extracted from the teacher are used only in the training period, adversarial learning algorithm fully maintains the speed and the number of parameters for the light-weight student network during inference. All results indicate that student network incorporated with the adversarial learning algorithm successfully achieves the performance goal, while maintaining the efficient computational cost of the light-weight setup.

Fig-2

Table 2: COMPARISON AMONG LDR ALDK MODEL WITH RECENT APPROACHES.

Accuracy vs. Complexity. Figure 1 demonstrates the experimental results from the SLIVER dataset between LDR + ALDK and the baseline VTN under multiple recursive cascades setup on both CPU and GPU. On the CPU (Figure 1-a), in terms of the 1-cascade setup, the Dice score of method is 0.2% less than VTN while the speed is ~15 times faster. The more the number of cascades is leveraged, the higher the speed gap between LDR + ALDK and the baseline VTN, e.g. the CPU speed gap is increased to ~21 times in a 3-cascades setup. We also observe the same effect on GPU (Figure 1-b), where method achieves slightly lower accuracy results than VTN, while clearly reducing the inference time. These results indicate that LDR + ALDK can work well with the teacher network to improve the accuracy while significantly reducing the inference time on both CPU and GPU in comparison with the baseline VTN network.

Fig-3

Figure 1:Plots of Dice score and Inference speed with respect to the number of cascades of the baseline Affine + VTN and LDR + ALDK. (a) for CPU speed and (b) for GPU speed. Note that results are reported for the SLIVER dataset; bars represent the CPU speed; lines represent the Dice score. All methods use an Intel Xeon E5-2690 v4 CPU and Nvidia GeForce GTX 1080 Ti GPU for inference.

Visualization

Figure 2 illustrates the visual comparison among 1-cas LDR, 1-cas LDR + ALDK, and the baseline 1-cas RCN. Five different moving images in a volume are selected to apply the registration to a chosen fixed image. It is important to note that though the sections of the warped segmentations can be less overlap with those of the fixed one, the segmentation intersection over union is computed for the volume and not the sections. In the segmented images in Figure 2, besides the matched area colored by white, we also marked the miss-matched areas by red for an easy-to-read purpose.

From Figure 2, we can see that the segmentation resutls of 1-cas LDR network without using ALDK (Figure 2-a) contains many miss-matched areas (denoted in red color). However, when we apply ALDK to the student network, the registration results are clearly improved (Figure 2-b). Overall, LDR + ALDK visualization results in Figure 2-b are competitive with the baseline RCN network (Figure 2-c). This visualization confirms that framework for deformable registration can achieve comparable results with the recent RCN network.

Fig-3

Figure 2:The visualization comparison between LDR (a), LDR + ALDK (b), and the baseline RCN (c). The left images are sections of the warped images; the right images are sections of the warped segmentation (white color represents the matched areas between warped image and fixed image, red color denotes the miss-matched areas). The segmentation visualization indicates that LDR + ALDK (b) method reduces the miss-matched areas of the student network LDR (a) significantly. Best viewed in color.

Reference

[1] Tran, Minh Q., et al. "Light-weight deformable registration using adversarial learning with distilling knowledge." IEEE Transactions on Medical Imaging, 2022.

Open Source

🐱 Github: https://github.com/aioz-ai/LDR_ALDK

Light-weight Deformable Registration using Adversarial Learning with Distilling Knowledge (Part 2)

June 20, 2022 · 6 min read

In this part, we will introduce the Architecture of Light-weight Deformable Registration Network and Adversarial Learning Algorithm with Distilling Knowledge.

The Architecture of Light-weight Deformable Registration Network

In practice, recent deformation networks follow an encoder-decoder architecture and use 3D convolution to progressively down-sample the image, and deconvolution (transposed convolution) to recover spatial resolution [1, 3]. However, this setup consumes a large number of parameters. Therefore, the built models are computationally expensive and time-consuming. To overcome this problem we design a new light-weight student network as illustrated in Figure 1.

In particular, the proposed light-weight network has four convolution layers and three deconvolution layers. Each convolutional layer has a bank of $4 \times 4 \times 4$ filters with strides of $2 \times 2 \times 2$ , followed by a ReLU activation function. The number of output channels of the convolutional layers starts with $16$ at the first layer, doubling at each subsequent layer, and ends up with $128$ . Skip connections between the convolutional layers and the deconvolutional layers are added to help refine the dense prediction. The subnetwork outputs a dense flow prediction field, i.e., a $3$ channels volume feature map with the same size as the input.

In comparison with the current state-of-the-art dense deformable registration network [3], the number of parameters of our proposed light-weight student network is reduced approximately $10$ times. In practice, this significant reduction may lead to an accuracy drop. Therefore, we propose a new Adversarial Learning with Distilling Knowledge algorithm to effectively leverage the teacher deformations $\phi_t$ to our introduced student network, making it light-weight but achieving competitive performance.

Fig-1

Figure 1: The structure of Light-weight Deformable Registration student network. The number of channels is annotated above the layer. Curved arrows represent skip paths (layers connected by an arrow are concatenated before transposed convolution). Smaller canvas means lower spatial resolution (Source).

Adversarial Learning Algorithm with Distilling Knowledge

Our adversarial learning algorithm aims to improve the student network accuracy through the distilled teacher deformations extracted from the teacher network. The learning method comprises a deformation-based adversarial loss $\mathcal{L}_{adv}$ and its accompanying learning strategy (Algorithm 1).

Fig-2

Figure 2: Adversarial Learning Strategy(Source).

Adversarial Loss. The loss function for the light-weight student network is a combination of the discrimination loss $l_{dis}$ and the reconstruction loss $l_{res}$ . However, the forward and backward process through loss function is controlled by the Algorithm 1. In particular, the last deformation loss $\mathcal{L}_{adv}$ that outputs the final warped image can be written as:

\mathcal{L}_{adv} = \gamma l_{rec} + (1 - \gamma) l_{dis}

where $\gamma$ controls the contribution between $l_{rec}$ and $l_{dis}$ . Note that, the $\mathcal{L}_{adv}$ is only applied for the final warped image.

Discrimination Loss. In the student network the discrimination loss is computed in Equation below}.

l_{{dis}} = \left\lVert D_\mathbf{\theta}(\phi_{s}) - D_\mathbf{\theta}(\phi_{t}) \right\lVert_2^{2} + \lambda\bigg(\left\lVert \nabla_{\hat\phi_{s}}D_\mathbf{\theta}(\hat\phi_{s}) \right\lVert_2 - 1\bigg)^{2}

where $\lambda$ controls gradient penalty regularization. The joint deformation $\hat\phi_{s}$ is computed from the teacher deformation $\phi_{t}$ and the predicted student deformation $\phi_{s}$ as follow:

\hat\phi_{s} = \beta \phi_{t} + (1 - \beta) \phi_{s}

where $\beta$ control the effect of the teacher deformation.

In Discrimination Loss, $D_\mathbf{\theta}$ is the discriminator, formed by a neural network with learnable parameters ${\theta}$ . The details of $D_\mathbf{\theta}$ is shown in Figure 3. In particular, $D_\mathbf{\theta}$ consists of six $3D$ convolutional layers, the first layer is $128 \times 128 \times 128 \times 3$ and takes the $c \times c \times c \times 1$ deformation as input. $c$ is equaled to the scaled size of the input image. The second layer is $64 \times 64 \times 64 \times 16$ . From the second layer to the last convolutional layer, each convolutional layer has a bank of $4 \times 4 \times 4$ filters with strides of $2 \times 2 \times 2$ , followed by a ReLU activation function except for the last layer which is followed by a sigmoid activation function. The number of output channels of the convolutional layers starts with $16$ at the second layer, doubling at each subsequent layer, and ends up with $256$ .

Basically, this is to inject the condition information with a matched tensor dimension and then leave the network learning useful features from the condition input. The output of the last neural layer is the mean feature of the discriminator, denoted as $M$ . Note that in the discrimination loss, a gradient penalty regularization is applied to deal with critic weight clipping which may lead to undesired behavior in training adversarial networks.

Fig-3

Figure 3: The structure of the discriminator $D_\mathbf{\theta}$ used in the Discrimination Loss ( $l_{dis}$ ) of our Adversarial Learning with Distilling Knowledge algorithm (Source).

Reconstruction Loss. The reconstruction loss $l_{rec}$ is an important part of a deformation estimator. Follow the VTN [3] baseline, the reconstruction loss is written as:

l_{{rec}} (\textbf{\textit{I}}_m^h,\textbf{\textit{I}}_f) = 1 - CorrCoef [\textbf{\textit{I}}_m^h,\textbf{\textit{I}}_f]

where

CorrCoef[\textbf{\textit{I}}_1, \textbf{\textit{I}}_2] = \frac{Cov[\textbf{\textit{I}}_1,\textbf{\textit{I}}_2]}{\sqrt{Cov[\textbf{\textit{I}}_1,\textbf{\textit{I}}_1]Cov[\textbf{\textit{I}}_2,\textbf{\textit{I}}_2]}}

Cov[\textbf{\textit{I}}_1, \textbf{\textit{I}}_2] = \frac{1}{|\omega|}\sum_{x \in \omega} \textbf{\textit{I}}_1(x)\textbf{\textit{I}}_2(x) - \frac{1}{|\omega|^{2}}\sum_{x \in \omega} \textbf{\textit{I}}_1(x)\sum_{y \in \omega}\textbf{\textit{I}}_2(y)

where $CorrCoef[\textbf{\textit{I}}_1, \textbf{\textit{I}}_2]$ is the correlation between two images $\textbf{\textit{I}}_1$ and $\textbf{\textit{I}}_2$ , $Cov[\textbf{\textit{I}}_1, \textbf{\textit{I}}_2]$ is the covariance between them. $\omega$ denotes the cuboid (or grid) on which the input images are defined.

Learning Strategy. The forward and backward of the aforementioned $\mathcal{L}_{adv}$ is controlled by the adversarial learning strategy described in Algorithm 1.

In our deformable registration setup, the role of real data and attacking data is reversed when compared with the traditional adversarial learning strategy. In adversarial learning, the model uses unreal (generated) images as attacking data, while image labels are ground truths. However, in our deformable registration task, the model leverages the unreal (generated) deformations from the teacher as attacking data, while the image is the ground truth for the model to reconstruct the input information. As a consequence, the role of images and the labels are reversed in our setup. Since we want the information to be learned more from real data, the generator will need to be considered more frequently. Although the knowledge in the discriminator is used as attacking data, the information it supports is meaningful because the distilled information is inherited from the high-performed teacher model. With these characteristics of both the generator and discriminator, the light-weight student network is expected to learn more effectively and efficiently.

Reference

[1] S. Zhao, Y. Dong, E. I. Chang, Y. Xu, et al., Recursive cascaded networks for unsupervised medical image registration, in ICCV, 2019.

[2] G. Hinton, O. Vinyals, and J. Dean, Distilling the knowledge in a neural network, ArXiv, 2015.

[3] S. Zhao, T. Lau, J. Luo, I. Eric, C. Chang, and Y. Xu, Unsupervised 3d end-to-end medical image registration with volume tweening network, IEEE J-BHI, 2019.

Open Source

🐱 Github: https://github.com/aioz-ai/LDR_ALDK

Light-weight Deformable Registration using Adversarial Learning with Distilling Knowledge

April 19, 2022 · 7 min read

Introduction: Medical image registration

Medical image registration is the process of systematically placing separate medical images in a common frame of reference so that the information they contain can be effectively integrated or compared. Applications of image registration include combining images of the same subject from different modalities, aligning temporal sequences of images to compensate for the motion of the subject between scans, aligning images from multiple subjects in cohort studies, or navigating with image guidance during interventions. Since many organs do deform substantially while being scanned, the rigid assumption can be violated as a result of scanner-induced geometrical distortions that differ between images. Therefore, performing deformable registration is an essential step in many medical procedures.

Previous Studies, Remaining Challenges, and Motivation

Recently, learning-based methods have become popular to tackle the problem of deformable registration. These methods can be split into two groups: (i) supervised methods that rely on the dense ground-truth flows obtained by either traditional algorithms or simulating intra-subject deformations. Although these works achieve state-of-the-art performance, they require a large amount of manually labeled training data, which are expensive to obtain; and (ii) unsupervised learning methods that use a similarity measurement between the moving and the fixed image to utilize a large amount of unlabelled data. These unsupervised methods achieve competitive results in comparison with supervised methods. However, their deformations are reconstructed without the direct ground-truth guidance, hence leading to the limitation of leveraging learnable information. Furthermore, recent unsupervised methods all share an issue of great complexity as the network parameters increase significantly when multiple progressive cascades are taken into account. This leads to the fact that these works can not achieve real-time performance during inference while requiring intensively computational resources when deploying.

In practice, there are many scenarios when medical image registration are needed to be fast - consider matching preoperative and intra-operative images during surgery, interactive change detection of CT or MRI data for a radiologist, deformation compensation or 3D alignment of large histological slices for a pathologist, or processing large amounts of images from high-throughput imaging methods. Besides, in many image-guided robotic interventions, performing real-time deformable registration is an essential step to register the images and deal with organs that deform substantially. Economically, the development of a CPU-friendly solution for deformable registration will significantly reduce the instrument costs equipped for the operating theatre, as it does not require GPU or cloud-based computing servers, which are costly and consume much more power than CPU. This will benefit patients in low- and middle-income countries, where they face limitations in local equipment, personnel expertise, and budget constraints infrastructure. Therefore, design an efficient model which is fast and accurate for deformable registration is a crucial task and worth for study in order to improve a variety of surgical interventions.

Contribution

Deformable registration is a crucial step in many medical procedures such as image-guided surgery and radiation therapy. Most recent learning-based methods focus on improving the accuracy by optimizing the non-linear spatial correspondence between the input images. Therefore, these methods are computationally expensive and require modern graphic cards for real-time deployment. Thus, we introduce a new Light-weight Deformable Registration network that significantly reduces the computational cost while achieving competitive accuracy (Fig.1). In particular, we propose a new adversarial learning with distilling knowledge algorithm that successfully leverages meaningful information from the effective but expensive teacher network to the student network. We design the student network such as it is light-weight and well suitable for deployment on a typical CPU. The extensively experimental results on different public datasets show that our proposed method achieves state-of-the-art accuracy while significantly faster than recent methods. We further show that the use of our adversarial learning algorithm is essential for a time-efficiency deformable registration method.

Fig-1

(a)

(b) Figure 1: Comparison between typical deep learning-based methods for deformable registration (a) and our approach using adversarial learning with distilling knowledge for deformable registration (b). In our work, the expensive Teacher Network is used only in training; the Student Network is light-weight and inherits helpful knowledge from the Teacher Network via our Adversarial Learning algorithm. Therefore, the Student Network has high inference speed, while achieving competitive accuracy (Source).

Methodology

Method overview

We describe our method for Light-weight Deformable Registration using Adversarial Learning with Distilling Knowledge. Our method is composed of three main components: (i)) a Knowledge Distillation module which extracts meaningful deformations $\bm{\phi_t}$ from the Teacher Network; (ii) a Light-weight Deformable Registration (LDR) module which outputs a high-speed Student Network; and (iii) an Adversarial Learning with Distilling Knowledge (ALDK) algorithm which effectively leverages teacher deformations $\bm{\phi}_t$ to the student deformations. An overview of our proposed deformable registration method can be found in Fig.2.

Fig-2

Figure 2: An overview of our proposed Light-weight Deformable Registration (LDR) method using Adversarial Learning with Distilling Knowledge (ALDK). Firstly, by using knowledge distillation, we extract the deformations from the Teacher Network as meaningful ground-truths. Secondly, we design a light-weight student network, which has competitive speed. Finally, We employ the Adversarial Learning with Distilling Knowledge algorithm to effectively transfer the meaningful knowledge of distilled deformations from the Teacher Network to the Student Network (Source).

Since the content may over-length, in this part, we introduce the background theory for Deformable Registration and Knowledge Distillation for Deformation. In the next part, we will introduce the Architecture of Light-weight Deformable Registration Network and Adversarial Learning Algorithm with Distilling Knowledge. In the final part, we will introduce the effectiveness of the method in comparison with recent states of the arts and detailed analysis.

Background: Deformable Registration

We follow RCN [1] to define deformable registration task recursively using multiple cascades. Let $\textbf{\textit{I}}_m, \textbf{\textit{I}}_f$ denote the moving image and the fixed image respectively, both defined over $d$ -dimensional space $\bm{\Omega}$ . A deformation is a mapping $\bm{\phi} : \bm{\Omega} \rightarrow \bm{\Omega}$ . A reasonable deformation should be continuously varying and prevented from folding. The deformable registration task is to construct a flow prediction function $\textbf{F}$ which takes $\textbf{\textit{I}}_m, \textbf{\textit{I}}_ f$ as inputs and predicts a dense deformation $\bm{\phi}$ that aligns $\textbf{\textit{I}}_m$ to $\textbf{\textit{I}}_f$ using a warp operator $\circ$ as follows:

\textbf{F}^{(n)}(\textbf{\textit{I}}^{(n-1)}_m,\textbf{\textit{I}}_f)=\phi^{(n)} \circ \textbf{F}^{(n-1)}(\phi^{(n-1)} \circ \textbf{\textit{I}}^{(n-2)}_m,\textbf{\textit{I}}_f)

where $\textbf{F}^{(n-1)}$ is the same as $\textbf{F}^{(n)}$ , but in a different flow prediction function. Assuming for $n$ cascades in total, the final output is a composition of all predicted deformations, i.e.,

\textbf{F}(\textbf{\textit{I}}_m, \textbf{\textit{I}}_f)=\phi^{(n)} \circ...\circ \phi^{(1)},

and the final warped image is constructed by

\textbf{\textit{I}}_{m}^{(n)}=\textbf{F}(\textbf{\textit{I}}_m,\textbf{\textit{I}}_f) \circ \textbf{\textit{I}}_m

In general, previous Equations form the hypothesis function $\mathcal{F}$ under the learnable parameter $\mathbf{W}$ ,

\mathcal{F}(\textbf{\textit{I}}_{m}, \textbf{\textit{I}}_f, \mathbf{W}) = (\mathbf{v}_{\phi}, \textbf{\textit{I}}_m^{(n)})

where $\mathbf{v}_{\phi} = [\bm{\phi}^{(1)}, \bm{\phi}^{(2)}, ..., \bm{\phi}^{(k)},..., \bm{\phi}^{(n)}]$ is a vector containing predicted deformations of all cascades. Each deformation $\bm{\phi}^{(k)}$ can be computed as

\bm{\phi}^{(k)} = {\mathcal{F}}^{(k)}\left(\textbf{\textit{I}}_{m}^{(k-1)}, \textbf{\textit{I}}_f, \mathbf{W}_{\phi^{(k)}}\right)

To estimate and achieve a good deformation, different networks are introduced to define and optimize the learnable parameter $\mathbf{W}$ .

Knowledge Distillation for Deformation

Knowledge distillation is the process of transferring knowledge from a cumbersome model (teacher model) to a distilled model (student model). The popular way to achieve this goal is to train the student model on a transfer set using a soft target distribution produced by the teacher model.

Different from the typical knowledge distillation methods that target the output softmax of neural networks as the knowledge, in the deformable registration task, we leverage the teacher deformation $\bm{\phi}_t$ as the transferred knowledge. As discussed in [2], teacher networks are usually high-performed networks with good accuracy. Therefore, our goal is to leverage the current state-of-the-art Recursive Cascaded Networks (RCN) [1] as the teacher network for extracting meaningful deformations to the student network. The RCN network contains an affine transformation and a large number of dense deformable registration sub-networks designed by VTN [3]. Although the teacher network has expensive computational costs, it is only applied during the training and will not be used during the inference.

Reference

[1] S. Zhao, Y. Dong, E. I. Chang, Y. Xu, et al., Recursive cascaded networks for unsupervised medical image registration, in ICCV, 2019.

[2] G. Hinton, O. Vinyals, and J. Dean, Distilling the knowledge in a neural network, ArXiv, 2015.

[3] S. Zhao, T. Lau, J. Luo, I. Eric, C. Chang, and Y. Xu, Unsupervised 3d end-to-end medical image registration with volume tweening network, IEEE J-BHI, 2019.

Open Source

🐱 Github: https://github.com/aioz-ai/LDR_ALDK

3 posts tagged with "distillation learning"

#Dataset

#Baselines

#Results

#Ablation Study

#Visualization

#Reference

#Open Source

#The Architecture of Light-weight Deformable Registration Network

#Adversarial Learning Algorithm with Distilling Knowledge

#Reference

#Open Source

#Introduction: Medical image registration

#Previous Studies, Remaining Challenges, and Motivation

#Contribution

#Methodology

#Method overview

#Background: Deformable Registration

#Knowledge Distillation for Deformation

#Reference

#Open Source

Dataset

Baselines

Results

Ablation Study

Visualization

Reference

Open Source

The Architecture of Light-weight Deformable Registration Network

Adversarial Learning Algorithm with Distilling Knowledge

Reference

Open Source

Introduction: Medical image registration

Previous Studies, Remaining Challenges, and Motivation

Contribution

Methodology

Method overview

Background: Deformable Registration

Knowledge Distillation for Deformation

Reference

Open Source