# Paper Overview

## Main idea

The key idea of this paper is to speed up waveform generation from a spectrogram. The authors use a GAN with a CNN-only architecture, which is optimized for GPUs. Both the generator and the discriminator are very lightweight compared to previous SOTA approaches such as WaveNet.

## Features used

#### Architecture

In the generator we reduce dimensionality layer by layer, and we use a residual stack to prevent vanishing gradients. In the discriminator we instead increase dimensionality layer by layer. We also collect the features from every layer of the discriminator, which the feature-matching loss uses. We use 3 discriminators instead of 1; for each successive discriminator, the input is downsampled by 2 with average pooling.
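The multi-scale input scheme can be sketched as follows. This is a minimal pure-Python illustration, not the paper's implementation: the helper names `avg_pool_by_2` and `multiscale_inputs` are hypothetical, and the pooling window of 2 samples with stride 2 is an assumption (the text above only says that each successive discriminator sees the input downsampled by 2).

```python
from typing import List


def avg_pool_by_2(signal: List[float]) -> List[float]:
    """Halve the length of a 1-D signal by averaging non-overlapping pairs.

    Assumption: window 2, stride 2; a trailing odd sample is dropped.
    """
    return [
        (signal[i] + signal[i + 1]) / 2.0
        for i in range(0, len(signal) - 1, 2)
    ]


def multiscale_inputs(signal: List[float], n_discriminators: int = 3) -> List[List[float]]:
    """Return one input per discriminator, each downsampled by 2 from the previous."""
    inputs = [list(signal)]
    for _ in range(n_discriminators - 1):
        inputs.append(avg_pool_by_2(inputs[-1]))
    return inputs


if __name__ == "__main__":
    audio = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    for k, x in enumerate(multiscale_inputs(audio), start=1):
        print(f"discriminator {k}: input length {len(x)}")
```

The first discriminator sees the raw signal, the second sees it at half the length, and the third at a quarter, so each one judges the audio at a different time scale.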

#### Weight normalization

In my first experiments I didn't add weight normalization, which led to unstable losses and poor generated results. So we apply weight normalization to every convolutional layer (all layers are convolutional).
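For reference, weight normalization reparameterizes each weight vector as $w = g \cdot v / \lVert v \rVert$, so its magnitude $g$ and direction $v$ are learned separately. The sketch below only illustrates that formula in pure Python; the function name `weight_norm` and the layout (one flattened direction vector per output channel) are assumptions made for the example, not code from this project. Deep-learning frameworks ship this as a ready-made wrapper (for example `torch.nn.utils.weight_norm` in PyTorch).

```python
import math
from typing import List


def weight_norm(v: List[List[float]], g: List[float]) -> List[List[float]]:
    """Compute w = g * v / ||v|| independently for each output channel.

    v: one flattened direction vector per output channel (assumed non-zero).
    g: one learnable scalar magnitude per output channel.
    """
    weights = []
    for v_row, g_row in zip(v, g):
        norm = math.sqrt(sum(x * x for x in v_row))
        weights.append([g_row * x / norm for x in v_row])
    return weights


if __name__ == "__main__":
    directions = [[3.0, 4.0], [0.0, 2.0]]
    magnitudes = [1.0, 5.0]
    for row in weight_norm(directions, magnitudes):
        # After normalization the Euclidean norm of each row equals its g.
        print(row, math.sqrt(sum(x * x for x in row)))
```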

#### Loss functions

The basic loss is taken from the [LS GAN paper](https://arxiv.org/abs/1611.04076). The main difference from the vanilla GAN loss is that we don't apply a sigmoid function to the discriminator output. The loss function is:

$$
\begin{array}{l}
\min_{D_{k}} \mathbb{E}_{x}\left[\min \left(0, 1-D_{k}(x)\right)\right]+\mathbb{E}_{s, z}\left[\min \left(0, 1+D_{k}(G(s, z))\right)\right], \forall k=1,2,3 \\
\min_{G} \mathbb{E}_{s, z}\left[\sum_{k=1,2,3}-D_{k}(G(s, z))\right]
\end{array}
$$

To improve generator convergence, we also use the L1 distance between the discriminator features of real and generated audio.

$$
\mathcal{L}_{\mathrm{FM}}\left(G, D_{k}\right)=\mathbb{E}_{x, s \sim p_{\text{data}}}\left[\sum_{i=1}^{T} \frac{1}{N_{i}}\left\|D_{k}^{(i)}(x)-D_{k}^{(i)}(G(s))\right\|_{1}\right]
$$

$$
\lambda = \dfrac{10}{N_D}
$$
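The feature-matching term $\mathcal{L}_{\mathrm{FM}}$ for a single discriminator $D_k$ and a single (real, generated) pair can be sketched as below. This is a minimal pure-Python illustration under stated assumptions: the function name `feature_matching_loss` is hypothetical, each layer's feature map is assumed to be already flattened into a list of $N_i$ values, and the expectation over the data is left to the caller (for example, by averaging over a batch).

```python
from typing import Sequence


def feature_matching_loss(
    real_features: Sequence[Sequence[float]],
    fake_features: Sequence[Sequence[float]],
) -> float:
    """Sum over layers of the L1 distance between feature maps, divided by N_i.

    real_features[i] and fake_features[i] are the flattened outputs of
    discriminator layer i for real and generated audio respectively.
    """
    if len(real_features) != len(fake_features):
        raise ValueError("expected the same number of layers for real and fake")
    loss = 0.0
    for real, fake in zip(real_features, fake_features):
        if len(real) != len(fake):
            raise ValueError("feature maps of a layer must have equal size")
        l1 = sum(abs(r - f) for r, f in zip(real, fake))
        loss += l1 / len(real)  # 1 / N_i normalization
    return loss


if __name__ == "__main__":
    real = [[1.0, 2.0], [0.0, 0.0, 0.0, 4.0]]
    fake = [[0.0, 2.0], [0.0, 0.0, 0.0, 0.0]]
    print(feature_matching_loss(real, fake))
```

Dividing by $N_i$ keeps layers with large feature maps from dominating the sum, so every layer contributes its mean absolute difference.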

