We've designed a machine learning model that generates images with specified text or lettering placed accurately within the image. Existing image-generation models often struggle with text, either misplacing it or introducing spelling errors. Our goal is to train the model to place text accurately and aesthetically within generated images, enabling visually appealing outputs such as poster designs and filled templates from a single prompt.
Despite progress in AI image generation, reliable text placement remains an open problem. The challenges we aim to address:
- Ensuring precise text placement that enhances the visual appeal.
- Preventing spelling errors and maintaining stylistic consistency.
- Supporting unstructured text in different layouts and styles.
Our project combines a diffusion image-generation backbone with ControlNet conditioning to address these limitations.
We used black-forest-labs/FLUX.1-dev for generating high-quality images. This model serves as the backbone of our pipeline.
Using ControlNet, we condition the model on layout constraints such as text placement, image structure, and artistic styling.
- Poster Collection: Thousands of poster images were collected and cleaned.
- Caption Generation: Leveraged moondream2 to create captions describing each poster.
- Conditional Images: Created lineart images with ControlNet-v1-1-nightly for layout guidance.
| 🖼️ Image | ✍️ Caption | 🖊️ Conditional Image |
|---|---|---|
| Movie poster image | A description of the poster, including characters, text, colors, etc. | Lineart representation |
- Dataset: ControlNet-Poster (`fhai50032/ControlNet-Poster` on the Hugging Face Hub)
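Each training example pairs a poster, its caption, and its lineart image. The following is a minimal sketch, assuming a hypothetical layout in which posters and their lineart images share a file stem and captions are keyed by that stem, of assembling those triples into records whose keys match the column names passed to the training script. The helper names `build_records` and `to_jsonl` are illustrative only; the published dataset may have been built differently.

```python
import json
from pathlib import PurePath

# Column names from the training command
# (--image_column, --caption_column, --conditioning_image_column).
IMAGE_COL = "image"
CAPTION_COL = "caption"
COND_COL = "conditional_image"


def build_records(poster_paths, lineart_paths, captions):
    """Pair each poster with its caption and lineart image by file stem.

    Hypothetical layout: "posters/<id>.png", "lineart/<id>.png", and a
    captions dict keyed by "<id>". Posters missing either a caption or a
    lineart image are skipped rather than emitted as incomplete rows.
    """
    lineart_by_stem = {PurePath(p).stem: p for p in lineart_paths}
    records = []
    for poster in sorted(poster_paths):
        stem = PurePath(poster).stem
        caption = captions.get(stem)
        lineart = lineart_by_stem.get(stem)
        if caption is None or lineart is None:
            continue  # incomplete triple: drop it instead of training on it
        records.append({IMAGE_COL: poster, CAPTION_COL: caption, COND_COL: lineart})
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines (one object per line)."""
    return "".join(json.dumps(r) + "\n" for r in records)


if __name__ == "__main__":
    rows = build_records(
        ["posters/a.png", "posters/b.png"],
        ["lineart/a.png"],  # "b" has no lineart, so it is dropped
        {"a": "A movie poster with the title at the top.", "b": "A concert poster."},
    )
    print(to_jsonl(rows), end="")
```

Dropping incomplete triples keeps every row usable by all three column flags at once.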
We trained the model using the following parameters:
```bash
accelerate launch train_controlnet_flux.py \
  --pretrained_model_name_or_path="black-forest-labs/FLUX.1-dev" \
  --dataset_name="fhai50032/ControlNet-Poster" \
  --conditioning_image_column="conditional_image" \
  --image_column="image" \
  --caption_column="caption" \
  --output_dir="text-controlnet" \
  --mixed_precision="bf16" \
  --resolution=512 \
  --learning_rate=3e-5 \
  --max_train_steps=3000 \
  --train_batch_size=2 \
  --gradient_accumulation_steps=3 \
  --report_to="wandb" \
  --num_double_layers=4 \
  --num_single_layers=2 \
  --seed=42 \
  --lr_scheduler="cosine" \
  --checkpointing_steps=100 \
  --max_train_samples=3000 \
  --use_adafactor \
  --push_to_hub
```

- Hardware: Trained on an NVIDIA A100 GPU.
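With these flags, each optimizer update sees `train_batch_size × gradient_accumulation_steps` = 2 × 3 = 6 samples. The sketch below shows how the step budget translates into passes over the 3,000-sample subset. It assumes a single process (one A100) and the convention common in Hugging Face example training scripts of counting one update per `gradient_accumulation_steps` micro-batches; `training_budget` is a hypothetical helper, not part of the training script.

```python
import math


def training_budget(max_train_steps: int, train_batch_size: int,
                    gradient_accumulation_steps: int, max_train_samples: int) -> dict:
    """Translate a step-based training budget into batch and epoch figures.

    Assumes a single process (one GPU), so there is no data-parallel multiplier.
    """
    # Samples contributing to each optimizer update.
    effective_batch = train_batch_size * gradient_accumulation_steps
    # Micro-batches the dataloader yields per pass over the data.
    micro_batches_per_epoch = math.ceil(max_train_samples / train_batch_size)
    # One optimizer update per `gradient_accumulation_steps` micro-batches.
    updates_per_epoch = math.ceil(micro_batches_per_epoch / gradient_accumulation_steps)
    return {
        "effective_batch": effective_batch,
        "updates_per_epoch": updates_per_epoch,
        "epochs": math.ceil(max_train_steps / updates_per_epoch),
        "samples_seen": max_train_steps * effective_batch,
    }


if __name__ == "__main__":
    # Values from the training command above.
    print(training_budget(max_train_steps=3000, train_batch_size=2,
                          gradient_accumulation_steps=3, max_train_samples=3000))
    # {'effective_batch': 6, 'updates_per_epoch': 500, 'epochs': 6, 'samples_seen': 18000}
```

Under these assumptions, the 3,000-step run makes six passes over the 3,000 training samples.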
During training, the learning rate followed the cosine schedule selected by `--lr_scheduler="cosine"`.
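A cosine schedule decays the learning rate from `--learning_rate=3e-5` toward zero over `--max_train_steps=3000`. The following is a minimal sketch of that curve using the standard linear-warmup-plus-half-cosine formula, the shape used by `get_cosine_schedule_with_warmup` in Hugging Face libraries. The command does not set `--lr_warmup_steps`, so the real run used the script's default warmup. Here `warmup_steps` is an assumed parameter that defaults to 0.

```python
import math


def cosine_lr(step: int, base_lr: float = 3e-5, total_steps: int = 3000,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by half-cosine decay.

    Defaults follow the training command (--learning_rate=3e-5,
    --max_train_steps=3000). `warmup_steps` is an assumption: the command
    does not set it, so the actual run used the script's default.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: factor 1.0 at progress 0, 0.0 at progress 1.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for s in (0, 750, 1500, 2250, 3000):
        print(s, f"{cosine_lr(s):.2e}")
```

Without warmup, the rate is halved at step 1,500 and reaches zero at step 3,000.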
You can view the model's usage instructions here.
Prompt: "Create a poster for a website with text 'DIGIVARSITY', background as sunset mountain college."

Prompt: "Print a poster with text 'CHAMPIONS LEAGUE' at the top, and 'MADRID' and 'FINAL' at the bottom."
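Both prompts mark the text to render with single quotes. Below is a minimal sketch, assuming that quoting convention, of extracting the target strings from a prompt and scoring rendered text against them with a normalized edit distance. This offers one way to quantify the spelling errors described earlier, but it is not the project's evaluation method. Obtaining the rendered text (for example, with an OCR engine) falls outside this sketch, and the helper names are hypothetical.

```python
import re

# Assumption: target text is wrapped in single quotes, as in the example prompts.
# Apostrophes inside ordinary words (e.g. "don't") would confuse this pattern.
QUOTED = re.compile(r"'([^']+)'")


def target_texts(prompt: str) -> list[str]:
    """Return the quoted strings the prompt asks the model to render."""
    return QUOTED.findall(prompt)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(expected: str, rendered: str) -> float:
    """1 - normalized edit distance, clipped at 0; 1.0 means an exact match."""
    if not expected:
        return 1.0 if not rendered else 0.0
    return max(0.0, 1.0 - levenshtein(expected, rendered) / len(expected))


if __name__ == "__main__":
    prompt = ("Print a poster with text 'CHAMPIONS LEAGUE' at the top, "
              "and 'MADRID' and 'FINAL' at the bottom.")
    print(target_texts(prompt))  # ['CHAMPIONS LEAGUE', 'MADRID', 'FINAL']
    # "DIGIVERSITY" stands in for a hypothetical misspelled rendering.
    print(round(char_accuracy("DIGIVARSITY", "DIGIVERSITY"), 3))  # 0.909
```

Normalizing by the target length lets short and long strings be compared on the same 0 to 1 scale.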

- Text Placement: The model demonstrates basic text placement but requires further refinement for complex layouts.
- Visual Quality: Image quality is high, but further training is needed to improve text stylization and integration.
- Expand training to the full dataset for better accuracy.
- Develop new loss functions to better handle text placement errors.
- Train the model for multilingual text support.
- Improve text stylization and unstructured layout handling.
- Framework: PyTorch
- Image Generation: FLUX.1-dev
- Control Parameters: ControlNet
- Caption Generation: moondream2
- Conditional Image Generation: Lineart
- Text Encoding: FLUX.1-dev-4bit
- Unified ControlNet: FLUX.1-dev-ControlNet-Union-Pro
We welcome contributions! To contribute:
- Fork the repository.
- Submit a pull request with your changes or suggestions.
- fhai50032 for his valuable contributions to this project.
- Black Forest Labs for the FLUX.1-dev model.
- lllyasviel for ControlNet and Annotators, which made conditional image generation possible.
- vikhyatk for the moondream2 model, enabling automated caption generation.
- HighCWu for the FLUX.1-dev-4bit model.
- Shakker-Labs for FLUX.1-dev-ControlNet-Union-Pro.
Apache License 2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

