Whether you are designing captivating content, exploring visual storytelling, or building appealing presentations, AI image generators have become an increasingly popular way to create striking, richly detailed imagery.
In the dynamic world of generative AI, where several image generation tools have emerged over the last couple of years, a new suite of visual AI models has arrived. Meet Flux.1, a family of 12-billion-parameter text-to-image models from Black Forest Labs that has generated a lot of interest among AI enthusiasts over the past few weeks. Let’s dive into the magic behind this groundbreaking model family and learn how to run it on an Ori GPU instance.
Model variants: Flux.1 Pro vs Dev vs Schnell
| | Flux.1 Pro | Flux.1 Dev | Flux.1 Schnell |
| --- | --- | --- | --- |
| Description | State-of-the-art performance with top-notch prompt adherence, image quality, detailing and output diversity. | Open-weight model distilled from the Pro variant. Similar image quality and prompt guidance, but more efficient than Pro, and can be fine-tuned. | Weights and inference code made openly available, optimized for inference speed. |
| Repository | Not shared | Hugging Face - Dev, Inference code | Hugging Face - Schnell, Inference code |
| Licensing | From Black Forest Labs | Non-commercial | Apache-2.0 - Personal, scientific and commercial |
Flux.1 Schnell has been trained with an approach called adversarial diffusion distillation, which reduces the number of inference steps of a pre-trained diffusion model to 1–4 sampling steps while maintaining high sampling fidelity. The benchmarks shared by Black Forest Labs show state-of-the-art (SOTA) performance for the Flux.1 family, exceeding many existing text-to-image generation models, including several non-distilled models.
Black Forest Labs has also revealed their plans to release a SOTA text-to-video model in the near future.
How to run Flux.1 Schnell on an Ori virtual machine
Prerequisites
Create a GPU virtual machine (VM) on Ori Global Cloud. We chose the NVIDIA L40S with 48 GB of VRAM and 90 GiB of system memory for this demo, although many users have been able to run the model with less memory. A more powerful GPU with more memory usually runs large models faster and lets you run more instances of the model if needed. We’ve chosen Ubuntu 22.04 as our OS, but Debian is also an option.
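After connecting to the VM over SSH, a quick check like the one below can confirm which GPU is visible and how much memory it has. This is a minimal sketch that assumes the NVIDIA driver and the nvidia-smi utility are already present on the image; the gpu_info variable name is only illustrative.

```shell
# Query the GPU name and total memory; fall back to a message if
# nvidia-smi is not installed or no driver is loaded (assumption: the
# VM image ships with the NVIDIA driver).
gpu_info=$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null \
  || echo "nvidia-smi not available")
echo "$gpu_info"
```

If the fallback message is printed, the driver is probably missing and the later steps are unlikely to find a CUDA device.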
Step 1: Once you SSH into your VM, clone the official Flux GitHub repository into a directory of your choice on the VM and navigate to the “flux” directory.
cd /home/image-gen
git clone https://github.com/black-forest-labs/flux
cd flux
Step 2: Install Python, if you haven’t already, and create a virtual environment
apt install python3.10-venv
Create and activate the virtual environment, then install the relevant Python packages
python3.10 -m venv flux-env
source flux-env/bin/activate
pip install -e '.[all]'
It might take a few minutes to install all the packages.
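Before moving on, it can be worth confirming that the freshly installed packages can see the GPU. The sketch below assumes PyTorch was pulled in as a dependency of the flux package and that the flux-env environment is still active; the cuda_ok variable is only illustrative.

```shell
# Run inside the activated flux-env virtual environment.
# Prints True or False when PyTorch is importable, "unknown" otherwise.
cuda_ok=$(python -c "import torch; print(torch.cuda.is_available())" 2>/dev/null \
  || echo "unknown")
echo "CUDA available to PyTorch: $cuda_ok"
```

A value of False or unknown usually points to a driver or installation problem rather than to Flux itself.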
Step 3: Log in to Hugging Face (only needed to run Dev). For our demo, we'll be using Schnell.
Install the Hugging Face CLI and log in
pip install -U "huggingface_hub[cli]"
huggingface-cli login
Step 4: Run the Streamlit demo
The demo Python file created by Black Forest Labs is already downloaded in the flux directory.
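A minimal sketch of the launch command follows. It assumes the Streamlit demo script is named demo_st.py, mirroring the demo_gr.py name used for the Gradio demo; check the repository if the file is named differently.

```shell
# Run from inside the cloned flux directory, with flux-env activated.
# demo_st.py is the assumed name of the Streamlit demo script.
demo_file="demo_st.py"
if [ -f "$demo_file" ]; then
    streamlit run "$demo_file"
else
    echo "$demo_file not found; cd into the flux directory first"
fi
```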
Once the command runs successfully, you will see the message “You can now view your Streamlit app in your browser” in your terminal, followed by Local, Network and External URLs.
Step 5: Generate images from your browser
Copy the link to your browser and choose the model you want to load. Make adjustments to your image dimensions and steps as needed.
For this demo we’ll be choosing Schnell, where you can only alter the number of steps, whereas the Dev demo also lets you alter guidance and seed.
Enter a prompt and voilà, your image will be ready in a few seconds.
Alternative ways to run Flux.1 on the cloud
Gradio
There is also a Gradio demo in the directory which you can run instead of Streamlit:
python3.10 demo_gr.py --name flux-schnell --device cuda
Run the model directly from your terminal
python3.10 -m flux --name flux-schnell \
--height 1360 --width 768 \
--prompt "A painting of an empty football stadium in the style of Banksy"
You can now access the image via Jupyter Notebook. Install Jupyter Notebook and start a notebook server on a port of your choice (we specified 8889 here):
pip3 install notebook
jupyter notebook --port 8889 --allow-root --no-browser --ip=0.0.0.0
The command will return a URL for the local machine. You can also replace the localhost string with your VM’s IP address to access Jupyter from your own browser.
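The substitution can be scripted if you prefer not to edit the URL by hand. In this sketch the URL, token and IP address are hypothetical placeholders; use the URL printed by Jupyter and your own VM's IP instead.

```shell
# Hypothetical values: replace with the URL Jupyter printed and your VM's public IP.
jupyter_url="http://localhost:8889/tree?token=abc123"
vm_ip="203.0.113.10"

# Swap the local host name (localhost or 127.0.0.1) for the VM's IP address.
remote_url=$(printf '%s\n' "$jupyter_url" \
  | sed -E "s#//(localhost|127\.0\.0\.1)#//${vm_ip}#")
echo "$remote_url"
```

The port and token are left untouched, so the resulting URL can be pasted straight into a browser, provided the port is reachable from your machine.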
Open the URL in a browser window and navigate to the output directory within the flux folder. All generated images will be stored in the output directory.
How good is Flux.1?
We ran a few prompts to test the Schnell model and here are some observations from our experimentation:
- The quality of the images, the detailing and the speed of generation were on par with top-of-the-line image generation tools.
- Prompt adherence was very good, and the ease with which Flux.1 interprets natural language prompts is especially impressive.
- Captioning for images was mostly good, and the positioning of the caption within the image was also mostly as prompted. However, signboards and other text within images were often inconsistent with the rest of the imagery.
Flux.1 also allows users to change parameters such as resolution, number of steps, guidance, and seed to customize their images. The Dev model, which is for non-commercial use only, can also be fine-tuned for more personalization. From photorealistic images to illustrations and AI art in the style of your favorite painter, Flux.1 is quite adept at turning text prompts and descriptions into beautiful images.
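As a rough illustration of setting these parameters from the terminal, the sketch below extends the command-line invocation shown earlier. The --num_steps and --seed flag names are assumptions based on the repository's CLI at the time of writing, so check the CLI source in the repository before relying on them; the values themselves are arbitrary.

```shell
# Assumed flag names: --num_steps and --seed (verify against the repository's CLI).
num_steps=4   # Schnell is designed for 1-4 sampling steps
seed=42       # a fixed seed makes a run repeatable

# Only run when inside the cloned flux directory (demo_gr.py lives there).
if [ -f demo_gr.py ]; then
    python3.10 -m flux --name flux-schnell \
      --height 1024 --width 1024 \
      --num_steps "$num_steps" --seed "$seed" \
      --prompt "A painting of an empty football stadium in the style of Banksy"
else
    echo "Run this from the cloned flux directory"
fi
```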
Examples of Flux.1 Schnell images generated on an Ori virtual machine
Prompt: A Lego hedgehog sits between the dimly lit server racks of a data center. The hedgehog has purple spikes. At the bottom of the image there is a caption which reads "Created with Flux on Ori"
Prompt: A low angle shot of vibrant and dense rainforests with a view of tiny streaks of sunlight piercing through the treetops
Prompt: A panoramic shot of a cute panda dressed in a parka and snow goggles, skiing down the snowy slopes of Whistler, Canada
Prompt: A landscape painting in the style of Banksy
Prompt: A wide lens shot of an awe-inspiring, circular building on Mars with several levels. There are futuristic transport pods in front of the building
Build, deploy and scale AI on Ori
Ori’s AI Native cloud is purpose-built for AI/ML workloads such as training models of varying sizes, including foundation models, fine-tuning generative AI models, running inference at scale and much more. Backed by top-notch GPUs, performant storage and AI-ready networking, Ori enables AI-focused startups and enterprises to: