subpixel: A subpixel convolutional neural network implementation with Tensorflow

Left: input images / Right: output images with 4x super-resolution after 6 epochs:

[Figure: sample input/output face image pairs]

See more examples inside the images folder.
In CVPR 2016, Shi et al. from Twitter VX (previously Magic Pony) published a paper called Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network [1]. Here we present a reimplementation of their method and discuss future applications of the technology.
But first, let us discuss some background.
  Convolutions, transposed convolutions and subpixel convolutions

Convolutional neural networks (CNN) are now standard neural network layers for computer vision. Transposed convolutions (sometimes referred to as deconvolutions) are the GRADIENTS of a convolutional layer. Transposed convolutions were, as far as we know, first used by Zeiler and Fergus [2] for visualization purposes while improving their AlexNet model.
To build intuition, note that the convolutions in question are a sequence of inner products of a given filter (or kernel) with patches of a larger image. This operation is highly parallelizable, since the kernel is the same throughout the image. People used to refer to convolutions as locally connected layers with shared parameters. Check out the figure below by Dumoulin and Visin [3], and the short sketch that follows it:
   

[Figure: a convolution as sliding inner products of a kernel over an image (animation from Dumoulin and Visin [3])]
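To make the "sequence of inner products" view concrete, here is a minimal numpy sketch (our illustration, not code from the repo; the name conv2d_valid is ours):

import numpy as np

def conv2d_valid(image, kernel):
    # 'Valid' 2D cross-correlation (what deep learning calls convolution):
    # slide the kernel over the image and take an inner product with each
    # patch. The same kernel is reused everywhere, which is what makes the
    # operation so parallelizable.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Example: a 3x3 kernel slid over a 5x5 image gives a 3x3 output map.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
print(conv2d_valid(image, kernel).shape)  # (3, 3)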
Note though that convolutional neural networks can be defined with strides, or we can follow the convolution with maxpooling, to downsample the input image. The equivalent backward operation of a convolution with strides, in other words its gradient, is an upsampling operation where zeros are filled in between the non-zero pixels, followed by a convolution with the kernel matrix. See the representation copied from Dumoulin and Visin again, followed by a short sketch of the zero-filling step:
   

[Figure: a transposed convolution: zeros interleaved between input pixels, then a regular convolution (animation from Dumoulin and Visin [3])]
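Here is a minimal numpy sketch of just the zero-filling upsampling step (our illustration; a full transposed convolution would follow this with a regular convolution such as conv2d_valid above):

def zero_upsample(image, stride):
    # Insert (stride - 1) zeros between pixels, exactly as in the gradient
    # of a strided convolution. None of the inserted zeros carries any
    # information.
    H, W = image.shape
    up = np.zeros((H * stride, W * stride))
    up[::stride, ::stride] = image
    return up

print(zero_upsample(np.ones((2, 2)), 2))
# [[1. 0. 1. 0.]
#  [0. 0. 0. 0.]
#  [1. 0. 1. 0.]
#  [0. 0. 0. 0.]]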
For classification purposes, all we need is the feedforward pass of a convolutional neural network to extract features at different scales. But for applications such as image super-resolution and autoencoders, both downsampling and upsampling operations are necessary in the feedforward pass. The community took inspiration from how the gradients are implemented in CNNs and used them as feedforward layers instead.
But as one may have observed, the upsampling operation as implemented above with strided convolution gradients adds zero values to upscale the image, which later have to be filled in with meaningful values. Maybe even worse, these zero values have no gradient information that can be backpropagated through.
To cope with that problem, Shi et al. [1] proposed what we argue to be one of the most useful recent convnet tricks (at least in my opinion as a generative model researcher!). They proposed a subpixel convolutional neural network layer for upscaling. This layer essentially uses regular convolutional layers followed by a specific type of image reshaping called a phase shift. In other words, instead of putting zeros in between pixels and having to do extra computation, they calculate more convolutions in lower resolution and resize the resulting map into an upscaled image. This way, no meaningless zeros are necessary. Check out the figure below from their paper. Follow the colors to build intuition about how they do the image resizing.
   

[Figure: the efficient sub-pixel CNN: convolutional feature maps at low resolution, then a phase shift rearranging r^2 channels into r x r output neighborhoods (from Shi et al. [1])]
Next we will discuss our implementation of this method, and later what we foresee to be its implications wherever upscaling is needed in convolutional neural networks.
  Subpixel CNN layer

Following Shi et al. [1], the equation for implementing the phase shift for CNNs is:
   

PS(T)_{x, y, c} = T_{\lfloor x/r \rfloor,\; \lfloor y/r \rfloor,\; C \cdot r \cdot \mathrm{mod}(y, r) + C \cdot \mathrm{mod}(x, r) + c}

where r is the upscaling factor and C is the number of output channels.
  In numpy, we can write this as
import numpy as np

def PS(I, r):
    # Phase shift (periodic shuffling): rearrange an (H, W, C*r^2) tensor
    # into (H*r, W*r, C), following the equation above.
    assert len(I.shape) == 3
    assert r > 0
    r = int(r)
    C = I.shape[2] // (r ** 2)  # r**2 input channels per output channel
    O = np.zeros((I.shape[0] * r, I.shape[1] * r, C))
    for x in range(O.shape[0]):
        for y in range(O.shape[1]):
            for c in range(C):
                a = x // r
                b = y // r
                d = C * r * (y % r) + C * (x % r) + c  # channel index from the equation
                O[x, y, c] = I[a, b, d]
    return O
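As a quick sanity check (our addition, not part of the original code), the channel-to-neighborhood mapping for r = 2 and a single output channel looks like this:

I = np.random.randn(3, 3, 4)  # H=3, W=3, C*r^2=4, so the output is (6, 6, 1)
O = PS(I, 2)
assert O.shape == (6, 6, 1)
# Each 2x2 output neighborhood is filled from the r^2 channels of one input pixel:
assert O[0, 0, 0] == I[0, 0, 0]  # d = 0
assert O[1, 0, 0] == I[0, 0, 1]  # d = C*r*(0%2) + C*(1%2) + 0 = 1
assert O[0, 1, 0] == I[0, 0, 2]  # d = C*r*(1%2) + C*(0%2) + 0 = 2
assert O[1, 1, 0] == I[0, 0, 3]  # d = 2 + 1 = 3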
To implement this in Tensorflow we would have to create a custom operator and its equivalent gradient. But after staring for a few minutes at the image depiction of the resulting operation, we noticed how to write it using just regular reshape, split and concatenate operations. To understand this, note that the phase shift simply goes through the different channels of the output convolutional map and builds up neighborhoods of r x r pixels. And we can do the same with a few lines of Tensorflow code:
import tensorflow as tf

def _phase_shift(I, r):
    # Helper function with the main phase shift operation
    bsize, a, b, c = I.get_shape().as_list()
    X = tf.reshape(I, (bsize, a, b, r, r))
    X = tf.transpose(X, (0, 1, 2, 4, 3))  # bsize, a, b, r, r
    X = tf.split(1, a, X)  # a, [bsize, b, r, r]
    X = tf.concat(2, [tf.squeeze(x) for x in X])  # bsize, b, a*r, r
    X = tf.split(1, b, X)  # b, [bsize, a*r, r]
    X = tf.concat(2, [tf.squeeze(x) for x in X])  # bsize, a*r, b*r
    return tf.reshape(X, (bsize, a*r, b*r, 1))

def PS(X, r, color=False):
    # Main OP that you can arbitrarily use in your tensorflow code
    if color:
        Xc = tf.split(3, 3, X)
        X = tf.concat(3, [_phase_shift(x, r) for x in Xc])
    else:
        X = _phase_shift(X, r)
    return X
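As a usage sketch (ours; the shapes and weight variable are hypothetical, and the calls use the same TF 0.x-era argument order as the code above), the output layer of a 4x color super-resolution network is just a regular stride-1 convolution producing 3 * r^2 feature maps, followed by PS:

r = 4
lr_features = tf.placeholder(tf.float32, [16, 32, 32, 64])  # low-resolution feature maps
W = tf.Variable(tf.random_normal([3, 3, 64, 3 * r * r], stddev=0.02))
conv = tf.nn.conv2d(lr_features, W, strides=[1, 1, 1, 1], padding='SAME')
hr_output = PS(conv, r, color=True)  # shape: [16, 128, 128, 3]

Note that all the convolutions happen at the low 32 x 32 resolution; PS itself is pure data movement.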
The remainder of this library is an implementation of a subpixel CNN using the proposed PS implementation for super-resolution of CelebA face images. The code was written on top of carpedm20/DCGAN-tensorflow, so to use it, follow the same instructions:
$ python download.py celebA  # this won't work though, you will have to download the dataset by hand somewhere
$ python main.py --dataset celebA --is_train True --is_crop True
The subpixel CNN future is bright

Here we want to forecast that subpixel CNNs are ultimately going to replace transposed convolutions (deconv, conv grad, or whatever you call it) in feedforward neural networks. The gradient is much more meaningful and the resizing operations are virtually free computationally. Our implementation is a high-level one, using default Tensorflow OPs. But next we will rewrite everything with Keras so that an even larger community can use it. Plus, a CUDA backend-level implementation would be even more appreciated.
But for now we want to encourage the community to experiment with everywhere else deconv can be replaced by subpixel operations. By everywhere we mean:
  
       
• Conv-deconv autoencoders: similar to super-resolution, include subpixel in other autoencoder implementations with deconv layers
• Style transfer networks: this didn't work as a lazy plug and play; we have to look more carefully
• Deep Convolutional Generative Adversarial Networks (DCGAN): we started doing this, but as predicted we have to change hyperparameters; the network capacity is totally different from deconv layers
• Segmentation networks (SegNets): ULTRA LOW hanging fruit!
• Wherever upscaling is done with zero padding (see the sketch after this list for the basic swap)
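As a concrete starting point, here is a minimal sketch (ours; the layer shapes are hypothetical) of swapping a stride-2 deconv layer in a DCGAN-style generator for a stride-1 convolution followed by the phase shift with r = 2:

h_prev = tf.placeholder(tf.float32, [16, 8, 8, 128])  # hypothetical generator activation
# Before: tf.nn.conv2d_transpose(h_prev, W_d, output_shape=[16, 16, 16, 64],
#                                strides=[1, 2, 2, 1]) with W_d of shape [5, 5, 64, 128].
# After: convolve at low resolution into 64 * r^2 channels, then rearrange.
r = 2
W_s = tf.Variable(tf.random_normal([5, 5, 128, 64 * r * r], stddev=0.02))
h = tf.nn.conv2d(h_prev, W_s, strides=[1, 1, 1, 1], padding='SAME')
h = tf.concat(3, [_phase_shift(x, r) for x in tf.split(3, 64, h)])
# h has shape [16, 16, 16, 64], like the deconv output, but every output pixel
# is a real convolution response rather than an interleaved zero.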
Join us in the revolution to get rid of meaningless zeros in feedforward convnets: give suggestions here, try our code!
  References

[1] Shi et al. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. CVPR 2016.
[2] Zeiler and Fergus. Visualizing and Understanding Convolutional Networks. ECCV 2014.
[3] Dumoulin and Visin. A guide to convolution arithmetic for deep learning. arXiv, 2016.