Choosing the Best AI Image Generation apps (Part 2)

Last Modified: Jul 18th, 2023 - Category: Artificial Intelligence, Case Studies, UX Research
  • Best Generative Image AI Introduction (part 1)
  • Defining the AI image generation apps to test (this page)
  • Top 5 AI Image Generator Analysis (part 3)
  • Experts View and Results (part 4)

AI Image Generation: How to choose the best apps?

This is the second part of our AI image generation case study. Please refer to the bar on top to jump to any specific page of this study.

Picking up where we left off in the previous step, let’s move on to the second key insight. This was undeniably the most important one: defining the top 5 apps for AI image generation, the very reason we embarked on this research.

As mentioned earlier, we could have tested numerous apps — literally hundreds. This is not an exaggeration on our part, but rather on the part of the app publishers. Many of them claim to use AI, but the claim doesn’t hold up; or at least it doesn’t hold up for what we’re looking for: real AI image generation.

Indeed, after thorough research, we discovered that many self-proclaimed AI image apps did not truly use AI for image generation, or their implementation was exceptionally limited. Simply put, they didn’t go beyond the basic image filters available on platforms like Instagram or even on your smartphone.

This realization prompted us to refine our definition of an AI art application to “separate the wheat from the chaff.” Thus, we added the following criterion: Can the app create completely new visuals from a prompt?

To elaborate, most filter-based apps take an existing image and apply various tools and processes to it. While they might employ AI in some of their processes, they do not generate anything from scratch; they primarily manipulate pre-existing images.

For instance, one of the most advanced and popular apps we encountered, Loopsie, merely adds backgrounds and creates a pseudo-3D effect for a subject provided by the user.

A similar case can be made for Disflow. Although the results are slightly more intriguing, it still functions primarily as a filter, offering limited scope.

Is this real AI image generation? Well, not by our parameters.

Equipped with this refined parameter, we narrowed down the selection to 8 applications, from which we chose five to test. We excluded some because they were excessively similar to existing models, opting for the most representative options instead.

Note: there’s a sixth option that we chose and that should have been included in this report: DALL-E 2. However, when we performed the tests we made an error, and two of the testers didn’t test that app. Therefore, to keep the data consistent with the other options, we decided to set it aside and conduct a separate evaluation in the near future.

And the final top 5 is…

Based on the aforementioned evaluation, we present our final candidates for testing. The list, in no particular order, is as follows:

  • Adobe Firefly
  • Bing AI Image
  • Leonardo
  • Kaiber
  • Midjourney

Generative image AI test and analysis

Here, we provide a list of the generative art tools we analyzed. Please note that we have no affiliation with any of them, and you won’t find any affiliate links or compensation of any kind. In other words, this is an honest examination of the technology without any hidden agenda. If we were to sell you something, it would be our UX services!

This post, like every other article on this website, aims to provide an honest perspective. While we strive to be thorough, it’s possible that we might miss something or that circumstances may change if you’re reading this in a few months or a year (which they probably will!). Rest assured, we haven’t received any incentives or contacted the developers; we even paid for those apps that required payment.

Research Methodology

An example of AI image generation created with the prompt “thinking robots”

The research methodology was qualitative, using the think-aloud method in in-person sessions, with varied demographics and a good English level among testers (as the research was conducted in a non-English-speaking country).

In order to reach a verdict, we defined a series of variables and assigned a value of 1 to 5 to each of them. Then, five people thoroughly tested the apps. The testers had the following characteristics:

  • Tester 1: female, 25 years old, professional designer, had never used generative image tools. Good English level.
  • Tester 2: female, 29 years old, professional designer, had some experience with image generation and AI in general. Good English level.
  • Tester 3: male, 52 years old, professional designer, had high expertise in AI, including image generation. Excellent English level.
  • Tester 4: N/A, 34 years old, had no knowledge of design and had never used AI in any form. Good English level.
  • Tester 5: female, 40 years old, had no knowledge of design and had never used apps for image generation by AI. Excellent English level.

The users were given a 15-minute discovery period in which they played around with the app, and the findings were compared to pre-existing data from similar studies. After this discovery process, they received a brief tutorial from the researcher and, for the apps that required it, a list of commands.

We used the following variables for the analysis:

  1. Image quality: the quality of the final output.
  2. Prompt-to-Image adherence: how reliable the final output was in relation to what the user prompted.
  3. Image-to-Image adherence: how reliable the final output was in relation to an uploaded image.
  4. Trainable: whether the model was trainable or not, and how easy or difficult it was perceived by the testers.
  5. Weighting: whether the apps had the option for prompt weighting.
  6. Negative weighting: same as above, but for negative elements (e.g., some element shouldn’t be in the image or it should be less important).
  7. Special features: special features like zooming, panning, color modes, upscaling, models, etc.
  8. User Expectation: how close the final result was to what the user expected.
  9. Usability: the overall usability of the app, assessed subjectively by the tester.
  10. Absolute Price: the user’s perception of the app’s price in absolute terms.
  11. Relative Price: the user’s perception of the results in relation to the value, measured by the question “Given the results, how likely are you to pay for this tool?”
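To illustrate what variables 5 and 6 refer to: some of these tools accept explicit prompt weights. In Midjourney, for example, the `::` separator assigns a weight to the preceding part of the prompt, and elements can be pushed out of the image with a negative weight or the `--no` parameter. The syntax below is illustrative and may differ across app versions:

```text
castle on a hill::2 fog::1     <- "castle on a hill" weighted twice as heavily as "fog"
flower garden --no roses       <- negative prompting: keep roses out of the image
```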

The users assessed all 11 items in the list, assigning each a value from 1 (lowest) to 5 (highest). A missing feature was graded 0, with two exceptions. If the app was free, Question 10 (Absolute Price) was given a value of 5. Question 4 (Trainable) was also handled differently, since testers were unable to evaluate the training aspect itself: it was graded 0 if the model couldn’t be trained and 5 if training was possible, regardless of the quality of the training.
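The scoring scheme above can be expressed in code. The following is a minimal, hypothetical Python sketch (the variable and function names are ours, not from the study): each tester’s 11 ratings are adjusted for the two special cases, then averaged into a single score per app.

```python
# Hypothetical sketch of the scoring scheme described above.
# Each tester rates 11 variables from 1 (lowest) to 5 (highest);
# a missing feature scores 0, with two special cases applied first.

from statistics import mean

VARIABLES = [
    "image_quality", "prompt_adherence", "image_adherence", "trainable",
    "weighting", "negative_weighting", "special_features",
    "user_expectation", "usability", "absolute_price", "relative_price",
]

def apply_special_cases(scores, *, is_free, is_trainable):
    """Apply the two exceptions: free apps get a 5 on absolute price,
    and 'trainable' is binary (0 or 5) regardless of training quality."""
    adjusted = dict(scores)
    if is_free:
        adjusted["absolute_price"] = 5
    adjusted["trainable"] = 5 if is_trainable else 0
    return adjusted

def app_score(tester_ratings, *, is_free, is_trainable):
    """Average every tester's adjusted ratings into one score per app."""
    per_tester = [
        mean(apply_special_cases(r, is_free=is_free, is_trainable=is_trainable)[v]
             for v in VARIABLES)
        for r in tester_ratings
    ]
    return mean(per_tester)
```

With five testers, `tester_ratings` would hold five dictionaries; the sketch averages over however many are supplied.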

Please note that the descriptions provided in this report are based on the users’ evaluation during the testing period, and the business model for these apps may have changed since then. Therefore, it is important to consider that the information presented here might not accurately reflect the current state of the apps at the time of reading this report.

AI-Based Image Generation: Continuation

This concludes the second part of our four-part case study on AI image generation tools. If you’re interested in reading the analysis and user research for each of the apps mentioned in this post, please continue to “Top 5 AI Image Generation Analysis”. Alternatively, if you arrived from a different website, please scroll to the top of the page to access all sections of this study.

We can improve your business!

Let us help you with the best solutions for your business.

It only takes one step: you're one click away from getting guaranteed results!

I want to improve my business NOW!