AI Tools
A collection of links and comments on generative machine learning tools I have used.
* this list hasn’t been updated since 2022
Image
Stable Diffusion
The most powerful open-source model available.
DreamStudio is an easy interface that runs on paid credits. AUTOMATIC1111’s notebook is a more complete UI running through Colab and HuggingFace (the linked notebook includes ControlNet, a technique to better control the composition). The Deforum notebook creates videos, and the DreamBooth notebook trains a model on your own imagery.
It is both flexible in mixing concepts and coherent in shapes. However, it requires more experimentation than DALL-E 2, since it won’t create an aesthetically pleasing image right away.
Marge Simpson in the style of glitchy deepdream, 3D by Dreamwork
Easy interface for changing an image through natural language. Also available for changing video.
It is fairly good at understanding context and changing only what is needed. You might need to tweak the CFG parameters a bit if the results aren’t satisfactory.
change it to photo of stainless steel pot with delicious M soup heated on fire in a warm kitchen in Tuscany, many other pots
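For context on what the CFG setting does: assuming CFG here stands for classifier-free guidance, the sampler combines a prediction made without the prompt and one made with it, and the scale controls how hard the result is pushed toward the prompt. The sketch below uses plain Python lists in place of real model outputs; `apply_cfg` and the toy numbers are hypothetical and not taken from any specific tool.

```python
def apply_cfg(uncond, cond, scale):
    """Blend unconditional and conditional predictions (classifier-free guidance).

    scale = 1.0 returns the conditional prediction unchanged;
    larger values push the result further in the direction of the prompt.
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy stand-ins for the model's two predictions (made-up numbers).
uncond_pred = [0.0, 1.0, 2.0]
cond_pred = [1.0, 1.0, 0.0]

for scale in (1.0, 4.0, 7.5):
    print(scale, apply_cfg(uncond_pred, cond_pred, scale))
```

In practice this means a low value lets the tool drift from the instruction, while a very high value tends to overcook the image, so small adjustments in either direction are worth trying.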
A versatile tool for creating any sort of imagery, though without much coherence.
Images aren’t coherent in their shapes, which results in trippy interpretations of visual representations.
Disco Diffusion v5.2
Spooky images with a painterly feel and dominant compositions.
A good balance between coherence and flexibility, but the process demands a lot of computing power.
A free website known for generating fast and creepy images.
While it’s very easy to use and can create realistic-looking images, it can only produce low-resolution output.
An open version of OpenAI’s DALL-E 1. It’s fast and can be accessed from its website, its Telegram bot and Colab.
It’s easily accessible and can replicate signs with great accuracy. However, it lacks flexibility, and it is difficult to mesh concepts together.
Easy interface for creating realistic-looking images with creepy undertones.
While easy for beginners, it also allows access to more detailed parameters.
Looking Glass v1.4
An earlier tool to fine-tune a generative model with input images. It requires a few images to create good results.
Training takes a lot of time, and the results do not always vary much.
The very first AI tool I used. The interface is very easy for someone entering the AI space for the first time. It also lets you use other people’s trained models, and it requires a paid subscription.
A large dataset of at least 500 images is required to generate similar images, and the coherence sometimes breaks down completely. Here is an animation that features such images, while here you can see a model I trained on Zuckerberg. Their models can also be combined with a separate audio-reactive AI (see below).
Nowadays you can do a lot more on their site.
A weaker but more disturbing alternative to DALL-E 2. These might fool you at first sight, but often reveal their artificiality on closer inspection.
Images vary a lot and look realistic. Great for producing a lot of them, but only at a small resolution. When the size is increased, the image breaks down and repeats a sort of pattern.
A tool to generate pixel art. It has a good understanding of palettes and manages to create coherent objects.
This Colab is one of many; I use it only because it was the easiest for me to use.
Sound
Together with models from RunwayML, this Colab can create videos that move to the beat of music. It transitions inside the latent space of StyleGAN models. Here is an example of what it looks like. I also needed to use this Colab to convert the model into the required format.
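To give a rough idea of what "transitioning inside the latent space" means, here is a minimal sketch that mixes two latent vectors according to how loud each frame of audio is. Everything in it is an assumption for illustration: the function names, the made-up latents and loudness values, and the choice of plain linear interpolation. The actual Colab may use different audio features and a different interpolation scheme.

```python
def lerp(z_a, z_b, t):
    """Linearly interpolate between two latent vectors; t=0 gives z_a, t=1 gives z_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def audio_reactive_latents(z_a, z_b, loudness):
    """Return one latent per frame, mixing z_a and z_b by normalized loudness."""
    peak = max(loudness) or 1.0  # avoid dividing by zero on silent audio
    return [lerp(z_a, z_b, level / peak) for level in loudness]


# Made-up 4-dimensional latents and per-frame loudness values.
z_quiet = [0.0, 0.0, 0.0, 0.0]
z_loud = [1.0, -1.0, 0.5, 2.0]
frames = audio_reactive_latents(z_quiet, z_loud, [0.0, 0.5, 1.0, 0.25])
for z in frames:
    print(z)  # each latent would be fed to the generator to render one frame
```

Because nearby points in the latent space produce similar images, moving the latent with the music makes the picture morph smoothly instead of cutting between unrelated frames.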
The best available AI to generate music from a sample or based on genres/artists. It was created by OpenAI and it creates uncanny songs.
The Colab is very time-consuming because training takes a long time. Here is a work of mine where I used it to generate the music.
An excellent paid service that can generate speech from text. The results are surprisingly good and can deceive someone into believing the voice is human.
While you need to pay to use it, it is very fast and versatile. This is the project where I used it, in which the robot’s voice was completely generated.
Fake You
Amazing text-to-voice service, free to use, that can be trained on one’s voice.
Like deepfakes, but for voice, this website has a big catalog of amazing voices, from memes to movies, and it is very fast.
Video
Just like VQGAN+CLIP, but now you can make an animation while zooming or sliding through the image. It is keyframe-based: one needs to manually type the timestamps of the camera’s movements.
Here is a video where I used it several times during transitions.
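The keyframe idea can be sketched as follows: you give a value for a camera parameter at a few timestamps, and every moment in between is filled in by interpolation. This is only an illustration under assumptions; `camera_value`, the dictionary format and the zoom numbers are hypothetical, and the notebook itself has its own input syntax.

```python
def camera_value(keyframes, t):
    """Linearly interpolate a camera parameter at time t from {time: value} keyframes."""
    points = sorted(keyframes.items())
    if t <= points[0][0]:
        return points[0][1]  # hold the first value before the first keyframe
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]  # hold the last value after the final keyframe


# Hypothetical zoom keyframes: seconds -> zoom factor.
zoom_keyframes = {0.0: 1.0, 2.0: 1.5, 5.0: 1.5, 6.0: 1.0}

for second in range(7):
    print(second, camera_value(zoom_keyframes, float(second)))
```

Here the camera zooms in over the first two seconds, holds until second five, and zooms back out in the last second.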
Text
The best tool to generate text. It just needs an initial prompt and it will continue the sentence with contextual awareness. It is available for everyone with some free usage, after which one needs to pay for each generation.
Depending on the model, one can steer the AI to talk in a certain style, or one can give it an instruction or a question to answer. I used it in my book as a fairy tale generator, where it understood the format I wanted it to replicate.