* Support diffusion models: Add Dream 7B
* Move diffusion to examples
* Move stuff to examples. Add patch to not use kv-cache
* Address review comments
* Make sampling fast
* llama: remove diffusion functions
* Add basic timings + cleanup
* More cleanup
* Review comments: better formating, use LOG instead std::cerr, re-use batch, use ubatch instead of max_length
* fixup!
* Review: move everything to diffusion-cli for now