Inverse neural network

What I can figure out so far: the outputs should be able to regenerate the input, and the mapping is not one-to-one. Something like a shift and a scale is used to make it more meaningful. I still don't fully understand it. It's being used in diffusion, FNO, and WNO.
 
They also do something very clever: splitting the input into 2 parts (I still don't know exactly how it is clever, but it gives the vibe that it is). The 2 parts correspond to the affine coupling layers, which is also where the scale and shift come from.
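To see why the split might be clever, here is a minimal NumPy sketch of one affine coupling layer. It is only an illustration under my own assumptions: the names `s`, `t`, `W_s`, `W_t`, `forward`, and `inverse` are made up for this note, and the two tiny "networks" are single random linear layers rather than anything trained. The first part passes through unchanged and is used to compute a scale and a shift for the second part, so the layer can be undone exactly without ever inverting `s` or `t`.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 4        # total input dimension (illustrative choice)
d = D // 2   # size of the first part

# Hypothetical stand-ins for the scale and shift networks (not trained).
W_s = rng.normal(size=(d, D - d)) * 0.5
W_t = rng.normal(size=(d, D - d)) * 0.5

def s(x1):
    # Log-scale; tanh keeps exp(s) in a moderate range.
    return np.tanh(x1 @ W_s)

def t(x1):
    # Shift.
    return x1 @ W_t

def forward(x):
    # Split the input into 2 parts; only the second part is transformed.
    x1, x2 = x[:, :d], x[:, d:]
    y1 = x1
    y2 = x2 * np.exp(s(x1)) + t(x1)
    # The Jacobian is triangular, so its log-determinant is the sum of s.
    log_det = s(x1).sum(axis=1)
    return np.concatenate([y1, y2], axis=1), log_det

def inverse(y):
    # y1 equals x1, so s and t can be recomputed and the affine map undone.
    y1, y2 = y[:, :d], y[:, d:]
    x2 = (y2 - t(y1)) * np.exp(-s(y1))
    return np.concatenate([y1, x2], axis=1)

x = rng.normal(size=(3, D))
y, log_det = forward(x)
x_rec = inverse(y)

# The outputs regenerate the input up to floating-point error.
assert np.allclose(x, x_rec)
```

The point of the design, as far as I understand it, is that `s` and `t` can be arbitrarily complicated networks and the layer still stays invertible, because the inverse only needs to evaluate them, never to invert them. In practice several such layers are stacked, swapping which part is left unchanged, so that every dimension gets transformed.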
 
The base equation is this:
[equation image not available]
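Assuming the pictured equation is the standard affine coupling layer (my assumption, since the image itself is not available here), it would read roughly as follows. The input is split as $x = [x_1, x_2]$, and $s$ and $t$ are the scale and shift networks:

$$
\begin{aligned}
y_1 &= x_1, \\
y_2 &= x_2 \odot \exp\big(s(x_1)\big) + t(x_1).
\end{aligned}
$$

The inverse follows directly, because $y_1 = x_1$ lets us recompute $s$ and $t$:

$$
\begin{aligned}
x_1 &= y_1, \\
x_2 &= \big(y_2 - t(y_1)\big) \odot \exp\big(-s(y_1)\big).
\end{aligned}
$$

The Jacobian of this map is triangular, so its log-determinant is just the sum of the scale outputs:

$$
\log \left| \det \frac{\partial y}{\partial x} \right| = \sum_j s(x_1)_j .
$$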
 
What is diffusion? What is an inverse neural network? How are they related? Where does the ELBO loss come into all this, and what is the usual loss function?