50 - Deep Learning - Unsupervised Learning Part 5 [ID:18257]

Welcome back to deep learning. In the last video, we discussed the different algorithms

regarding generative adversarial networks, and today we want to look into the fifth part

of our lecture. These are essentially more tricks of the trade concerning GANs.

One trick that can help you quite a bit is one-sided label smoothing. What you may want to do is

replace the targets of the real samples with a smoothed version. So instead of using a probability

of 1, you use a probability of 0.9. But you do not do the same for the fake samples: you

leave their label at zero, because smoothing it would reinforce incorrect behavior: your

generator would produce samples that resemble the data or samples it already makes. The benefits are

that you can prevent the discriminator from giving very large gradients to your generator, and you

also prevent it from extrapolating to encourage extreme samples. Is balancing between the generator and

the discriminator necessary? No, it's not. GANs work by estimating the ratio of the data density to the model density,

and this ratio is estimated correctly only when the discriminator is optimal. So it's fine if your

discriminator overpowers the generator. When the discriminator gets too good, your gradients

may of course vanish. Then you can use tricks like the non-saturating loss or the Wasserstein GAN, as

we talked about earlier. You may also run into the problem that your generator's gradients

get too large, and in this case you can use the trick of label smoothing. Of course, you can also

work with deep convolutional GANs. This is the DCGAN, where you implement a deep learning approach

in the generator: you can replace pooling layers with strided convolutions and transposed

convolutions. You can remove the fully connected hidden layers for deeper architectures, and the

generator then typically uses ReLU activations except for the output layer, in which you use

a hyperbolic tangent (tanh). The discriminator, for example, uses a leaky ReLU activation for

all the layers, and they use batch normalization. If you do that, then you may end up with the

following problem. You can see here some generation results, and within the batches there may be a very

strong intra-batch correlation so within the batch all of the generated images look very similar.
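
A small NumPy sketch can make this intra-batch coupling tangible. It assumes plain batch normalization without learned scale and shift; the helper name `batch_norm`, the array shapes, and the random data are illustrative choices and are not taken from the lecture. The same sample is placed into two different mini-batches, and its normalized value changes with its batch companions.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize every feature with the mean and variance of the whole mini-batch."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
sample = rng.normal(size=(1, 4))  # the one sample we track

# The same sample, embedded in two mini-batches with different companions.
batch_a = np.concatenate([sample, rng.normal(size=(7, 4))])
batch_b = np.concatenate([sample, rng.normal(loc=3.0, size=(7, 4))])

out_a = batch_norm(batch_a)[0]
out_b = batch_norm(batch_b)[0]

# Identical input, different output: the result depends on the rest of the batch.
coupled = not np.allclose(out_a, out_b)
print("output depends on batch companions:", coupled)
```

Since every sample is normalized with statistics shared across the mini-batch, the samples within one batch are no longer processed independently, which is one way such correlations can arise.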

And this brings us to the concept of virtual batch normalization. You don't want to use

one batch normalization instance for both mini-batches. You could use two separate batch

normalizations or, even better, you use virtual batch normalization. In case this is too

expensive, you choose instance normalization: for each sample, subtract the mean and divide by

the standard deviation. In case you choose virtual batch normalization, you create a reference

batch R of random samples and fix it once at the start of the training. Then, for each x_i of the

current mini-batch, you create a new virtual batch that is the union of the reference batch and x_i. Then

you compute the mean and standard deviation of this virtual batch. You always need to propagate

R forward in addition to the current batch. This then allows you to normalize x_i with these

statistics. This may be kind of expensive, but we have seen that it is very useful for stabilizing

the training and removing the intra-batch correlations. There's also the idea of

historical averaging. There you add a penalty term that punishes weights which are rather far

away from the historical average, and this historical average of the parameters can then be updated in an

online fashion. Similar tricks from reinforcement learning can also work for generative adversarial

networks, like experience replay: you keep a replay buffer of past generations and occasionally show

them, and you keep checkpoints from past generators and discriminators and occasionally swap

them out for a few iterations. If you do so, then you can do things like the DCGAN. Here are

bedrooms after just one epoch and you can see that you are able to generate quite a few different

bedrooms. So very interesting what kind of diversity in terms of generation you can actually achieve.
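
To make the virtual batch normalization steps described earlier concrete, here is a minimal NumPy sketch. It is simplified on purpose: it normalizes a single array of activations and omits the learned scale and shift. The function name `virtual_batch_norm`, the shapes, and the random data are assumptions for illustration and are not shown in the lecture. In a real network the reference batch would have to be propagated forward so that every normalized layer sees its own reference activations.

```python
import numpy as np

def virtual_batch_norm(batch, reference, eps=1e-5):
    """Normalize each sample with the statistics of (reference batch + that sample)."""
    n_virtual = reference.shape[0] + 1
    ref_sum = reference.sum(axis=0)
    ref_sq_sum = (reference ** 2).sum(axis=0)

    # Per-sample mean and variance of the virtual batch, via broadcasting.
    mean = (ref_sum + batch) / n_virtual
    mean_sq = (ref_sq_sum + batch ** 2) / n_virtual
    var = np.maximum(mean_sq - mean ** 2, 0.0)  # guard against round-off

    return (batch - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
reference = rng.normal(size=(16, 4))  # chosen once, then kept fixed
batch = rng.normal(size=(8, 4))       # current mini-batch

normalized = virtual_batch_norm(batch, reference)
print(normalized.shape)
```

Because the statistics for each sample come only from the fixed reference batch plus that sample, the output for one sample in this sketch does not depend on the other samples in the current mini-batch.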

Another interesting observation is that you can do vector arithmetic on the latent vectors of the generated images. So

you can compute, for example, the mean of three instances of man with glasses. From this mean

you can then subtract, for example, the mean of man without glasses, and then you compute the mean

of woman without glasses and add it on top. What you get is woman with glasses. So you can

really use constrained generation with this trick in order to generate something for which you

potentially don't have a conditioning variable. So GANs learn a distributed representation

that disentangles the concept of gender from the concept of wearing glasses and if you're interested
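
The glasses arithmetic described above can be sketched in a few lines of NumPy. In the DCGAN work this arithmetic is carried out on the latent input vectors of the generator rather than on pixels. The latent codes below are random placeholders, and `latent_dim`, the variable names, the helper `concept_arithmetic`, and the commented-out `generator` call are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 100  # assumed size of the latent space

# Placeholder latent codes, three per concept. In practice these would be
# picked by inspecting the images a trained generator produces for them.
z_man_glasses = rng.normal(size=(3, latent_dim))
z_man_plain = rng.normal(size=(3, latent_dim))
z_woman_plain = rng.normal(size=(3, latent_dim))

def concept_arithmetic(z_a, z_b, z_c):
    """Return mean(z_a) - mean(z_b) + mean(z_c) over the sample axis."""
    return z_a.mean(axis=0) - z_b.mean(axis=0) + z_c.mean(axis=0)

# man with glasses - man without glasses + woman without glasses
z_woman_glasses = concept_arithmetic(z_man_glasses, z_man_plain, z_woman_plain)

# image = generator(z_woman_glasses)  # hypothetical trained generator
print(z_woman_glasses.shape)
```

Averaging over several instances per concept, as mentioned in the lecture, is what the `mean(axis=0)` calls stand for.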

Part of a video series:

Deep Learning - Plain Version

Presenters

Prof. Dr. Andreas Maier

Accessible via

Open access

Duration

00:18:09 Min

Recording date

2020-06-21

Uploaded on

2020-06-21 21:26:42

Language

en-US

In this last video on unsupervised learning, we introduce some more advanced GAN concepts to avoid mode collapse and strong intra-batch correlation using virtual batch normalization, unrolled GANs, and minibatch discrimination.

Further Reading:
A gentle Introduction to Deep Learning

Links
Link - Variational Autoencoders:
Link - NIPS 2016 GAN Tutorial of Goodfellow
Link - How to train a GAN? Tips and tricks to make GANs work (careful, not everything is true anymore!)
Link - Ever wondered about how to name your GAN?
