AI Tools

A collection of links and comments on generative machine learning tools I have used.

* this list hasn’t been updated since 2022

Image

Stable Diffusion

The most powerful open-source model available.

DreamStudio is an easy interface that runs on credits. AUTOMATIC1111’s notebook is a more complete UI running through Colab and HuggingFace (the linked notebook includes ControlNet, a technique to better control the composition). The Deforum notebook creates videos, and the DreamBooth notebook trains a model on your own imagery.

It is both flexible in mixing concepts and coherent in shapes. However, it requires more experimentation than DALL-E 2, since it won’t create an aesthetically pleasing image right away.

Marge Simpson in the style of glitchy deepdream, 3D by Dreamwork

InstructPix2Pix

Easy interface for changing an image through natural language. Also available for changing video.

It is fairly good at understanding context and changing only what is needed. You might need to tweak the CFG parameters a bit if the results aren’t satisfactory.

change it to photo of stainless steel pot with delicious M soup heated on fire in a warm kitchen in Tuscany, many other pots
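For anyone wondering what the CFG (classifier-free guidance) parameters actually do, here is a minimal sketch, assuming the two-scale formulation described in the InstructPix2Pix paper: one scale controls how closely the result sticks to the input image, the other how strongly it follows the instruction. The function name and the toy numbers are my own illustration, not code from the tool, and real noise predictions are large tensors rather than short lists.

```python
def combine_guidance(eps_uncond, eps_image, eps_full, image_scale, text_scale):
    """Blend three noise predictions using two guidance scales.

    eps_uncond: prediction with neither the input image nor the instruction
    eps_image:  prediction conditioned on the input image only
    eps_full:   prediction conditioned on both image and instruction
    """
    return [
        u + image_scale * (i - u) + text_scale * (f - i)
        for u, i, f in zip(eps_uncond, eps_image, eps_full)
    ]


# Toy values standing in for real model outputs (hypothetical).
eps_uncond = [0.0, 0.0]
eps_image = [1.0, 0.0]
eps_full = [1.0, 2.0]

guided = combine_guidance(eps_uncond, eps_image, eps_full,
                          image_scale=1.5, text_scale=7.5)
```

Raising the text scale pushes the edit harder toward the instruction, while raising the image scale keeps the output closer to the original picture, which is why adjusting both is usually what fixes an edit that changes too much or too little.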

VQGAN+CLIP

A versatile tool for creating any sort of imagery, though without much coherence.

Images aren’t coherent in their shapes, which results in trippy interpretations of visual representations.

Alien cake generated with VQGAN+CLIP — Alien cakes exploding in your face Artstation HD sflckr

Disco Diffusion v5.2

Spooky images with painterly feel and dominant compositions.

A good balance between coherence and flexibility, but the process demands a lot of computing power.

Teletubbies generated with Disco Diffusion — Teletubbies gone wrong painted in Artstation

DALL-E mini (craiyon)

A free website known for generating fast and creepy images.

While it’s very easy to use and can create realistic-looking images, it can only produce low-resolution output.

Bulbasaur generated with DALLE mini craiyon — I took a photo of realistic Bulbasaur in the wild cooking HD highly detailed

Artbreeder collage

Versatile tool that lets you create shapes from which it will generate the prompt. Very easy to use, and a new way to approach generative art with more agency.

Surprisingly good results, fast rendering and a lot of control.

Minion generated with Artbreeder collage — Minions

ruDALLE

An open version of OpenAI’s DALL-E 1. It’s fast and can be accessed from its website, its Telegram bot and Colab.

It’s easily accessible and can replicate signs with great accuracy. However, it lacks flexibility and it is difficult to mesh concepts together.

GLID-3_XL (ldm finetune)

Easy interface for creating realistic-looking images with creepy undertones.

While easy for beginners, it also allows access to more detailed parameters.

a brand mascot mix of Luigi Mii Yoshi close up photograph 3d plastic soft

Looking Glass v1.4

An earlier tool to fine-tune a generative model with input images. It requires a few images to create good results.

Training takes a long time, and the results don’t always offer much variation.

Waluigi generated with Looking Glass — Trained on Waluigi fan art

RunwayML

The very first AI tool I used. The interface is very easy for someone entering the AI space for the first time. It also lets you use other people’s trained models, and it requires a paid subscription.

A large dataset of at least 500 images is required to generate similar images, and the coherence sometimes breaks down completely. Here is an animation that features such images, while here you can see a model I trained on Zuckerberg. Their models can also be combined with a separate audio-reactive AI (see below).

Nowadays you can do a lot more on their site.

Alien generated with RunwayML — Trained on a collection of alien images

Latent Diffusion LAION-400M

A weaker but more disturbing alternative to DALL-E 2. The images might fool you at first sight, but they often reveal their artificiality on closer inspection.

Images vary a lot and have a realistic feel. Great for producing many images, but only at low resolution. When the size is increased, the image breaks down and repeats a sort of pattern.

Brand mascots collectibles promoting climate catastrophe

Text2PixelArt

A tool to generate pixel art. It has a good understanding of palettes and manages to create coherent objects.

This Colab is one of many; I use it only because it was the easiest one for me.

Waluigi generated with Text2PixelArt — Super Mario waluigi astronaut walking character design on Behance

Sound

Audio reactive latent

Together with models from RunwayML, this Colab can create videos that move to the beat of music. It transitions inside the latent space of StyleGAN models. Here is an example of what it looks like. I also needed this Colab to convert the model into the required format.
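To give an idea of what "transitioning inside the latent space to the beat" can mean, here is a minimal sketch. It is not the Colab's actual method: it assumes a simple linear interpolation between two latent vectors, with each frame's step size proportional to the loudness of the audio at that frame. All names and numbers are hypothetical, and the step of feeding each latent into a StyleGAN generator is left out.

```python
def lerp(a, b, t):
    """Linear interpolation between two latent vectors, with t in [0, 1]."""
    return [x + t * (y - x) for x, y in zip(a, b)]


def audio_reactive_path(z_start, z_end, loudness):
    """Return one latent vector per frame.

    Louder frames advance further along the path from z_start to z_end,
    so the image changes fastest on the beat and holds still in silence.
    """
    total = sum(loudness)
    if total == 0:
        # Silent track: stay on the starting latent for every frame.
        return [list(z_start) for _ in loudness]

    frames = []
    progress = 0.0
    for level in loudness:
        progress += level / total
        frames.append(lerp(z_start, z_end, min(progress, 1.0)))
    return frames


# Toy 2-dimensional latents and a 4-frame loudness envelope (hypothetical).
frames = audio_reactive_path([0.0, 0.0], [1.0, 2.0], [0.0, 1.0, 0.0, 3.0])
```

In this toy run the latent does not move on the two silent frames and makes its biggest jump on the loudest one, which is the basic effect that makes the video appear to react to the music.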

Jukebox

The best available AI to generate music from a sample or based on genres/artists. It was created by OpenAI and it creates uncanny songs.

The Colab is very time-consuming because training takes a long time. Here is a work of mine where I used it to generate the music.

Speechdio

An excellent paid service that can generate speech from text. The results are surprisingly good and can deceive someone into believing the voice is human.

While you need to pay to use it, it is very fast and versatile. This is the project where I used it, in which the robot’s voice was completely generated.

Fake You

Amazing text-to-voice service, free to use, that can be trained on one’s voice.

Like deepfakes, but for voice, this website has a big catalog of amazing voices, from memes to movies, and it is very fast.

Video

Zooming VQGAN+CLIP

Just like VQGAN+CLIP, but now you can make an animation while zooming or sliding through the image. It is keyframe-based: you need to manually type in the timestamps of the camera’s movements.

Here is a video where I used it several times for transitions.
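The keyframe idea can be sketched in a few lines. This assumes simple linear interpolation between hand-typed (timestamp, value) pairs. The notebook's own keyframe syntax and interpolation may well differ, and the function name and the zoom values are made up for illustration.

```python
from bisect import bisect_right


def value_at(keyframes, t):
    """Linearly interpolate a camera value (for example zoom) at time t.

    keyframes: list of (timestamp, value) pairs sorted by timestamp.
    Times before the first or after the last keyframe hold the end values.
    """
    times = [time for time, _ in keyframes]
    if t <= times[0]:
        return keyframes[0][1]
    if t >= times[-1]:
        return keyframes[-1][1]

    # Index of the first keyframe strictly after t.
    i = bisect_right(times, t)
    (t0, v0), (t1, v1) = keyframes[i - 1], keyframes[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical zoom track: hold at 1.0 until frame 10, then ramp to 1.5 by frame 20.
zoom_keyframes = [(0, 1.0), (10, 1.0), (20, 1.5)]
zoom_per_frame = [value_at(zoom_keyframes, frame) for frame in range(21)]
```

Typing only a handful of keyframes and letting the in-between frames be filled in is what makes this workable by hand, even for a clip of several hundred frames.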

Text

GPT-3

The best tool to generate text. It just needs an initial prompt and it will continue the sentence with contextual awareness. It is available to everyone with some free usage, after which you need to pay for each generation.

Depending on the model, you can steer the AI to write in a certain style, or give it an instruction or question to answer. I used it in my book as a fairy tale generator, where it understood the format I wanted it to replicate.
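Getting a model to replicate a format usually comes down to showing it a few examples and leaving the last one unfinished. Here is a minimal sketch of that kind of prompt. The "Title:" and "Story:" labels and the example text are placeholders I made up for illustration, not the prompts from the book, and the call to the model itself is left out.

```python
def build_few_shot_prompt(examples, new_title):
    """Build a prompt that repeats one format, then leaves the last entry open.

    examples: list of (title, story) pairs shown to the model as a pattern.
    new_title: title of the story the model should continue writing.
    """
    parts = [f"Title: {title}\nStory: {story}\n" for title, story in examples]
    # The prompt ends on an open "Story:" so the model continues from there.
    parts.append(f"Title: {new_title}\nStory:")
    return "\n".join(parts)


# Placeholder example, purely illustrative.
examples = [
    ("The Fox and the Lantern", "Once upon a time, a fox found a lantern in the woods."),
]
prompt = build_few_shot_prompt(examples, "The Robot Who Wanted to Sleep")
```

Because the prompt stops right after the final "Story:", a text-completion model tends to continue in the same structure as the examples before it.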