Generative AI Tools: Experts View and Results (Part 4)

Last Modified: Jul 18th, 2023 - Category: Artificial Intelligence, Case Studies, UX Research
Part 1: Best Generative Image AI Introduction
Part 2: Defining the AI image generation apps to test
Part 3: Top 5 AI Image Generator Analysis
Part 4: Experts View and Results (this page)

Generative AI Tools: Insights and analysis

This is the fourth and final part of our Generative AI Tools testing. If you have been following the entire case study, here is what we have covered so far:

First Part: Introduction to Generative Image AI

In the initial section, I explained our objectives, shared some initial findings, and discussed potential approaches for continuing the UX research process. We explored a portion of the methodology and the rationale behind our subsequent steps.

Second Part: Defining AI Image Generation Apps

In the second segment, we established specific parameters for our research. Most importantly, we defined a guiding principle that shaped the entire process: distinguishing between genuine AI image generators and apps that merely claim to be. We examined the disparity between authentic AI image generation and apps that fall short of it. Furthermore, we provided an overview of our methodology, including how we selected interviewees for user research and the unique characteristics of each participant.

Third Part: Analysis of AI Image Generators

In the third phase of our case study, we analyzed the five selected apps based on the parameters established in the previous section. Each app was introduced with an overview of its capabilities and functionalities, followed by an evaluation of its performance with real users. Based on user research, our examination of the chosen generative AI tools yielded the following initial results (ranked from best to worst):

  1. Leonardo
  2. Midjourney
  3. Kaiber
  4. Firefly
  5. Bing

Refining insights

Generative AI tools from the future
Prompt: Old Generative AI tools from the future

If you’re someone seeking knowledge about generative AI tools, the initial impression of the results from the previous section may appear accurate. After all, it’s tempting to believe that whatever our users think must be correct… right?

However, seasoned UX researchers understand that these results, while valuable, are just one piece of the puzzle in a comprehensive UX research approach. Further analysis and additional tools are necessary to validate, refute, or refine our initial findings.

Now, let’s delve into the realm of deep UX, moving beyond user research alone. To achieve this, we employed two methods: Heuristic Analysis (Expert Panel) and side-by-side comparison.

Let’s begin with heuristic analysis. For those familiar with it, you might be wondering how it connects to the current subject. We’ll address that shortly.

Heuristic Analysis Applied to Generative AI Apps

For those unfamiliar with the concept, heuristic analysis is a widely used research method in UX to identify problems and issues. In other words, its purpose is to identify what is wrong rather than what is right. This approach enables designers, product managers, and developers to focus on addressing the identified issues. After all, if something is already correct, there’s no need to include it in a report, right? To avoid overwhelming you, we have excluded the raw data from the heuristic analysis.

One distinguishing characteristic of heuristic analysis is that it is conducted by an Expert Panel. This means that individuals outside of the organization, who are experts in the field being researched (in this case, generative AI images), perform the analysis. These experts have no vested interest in the outcome. Simply put, the researcher contacted three experts (and no, I was not one of them; I am far from an expert) who agreed to evaluate the apps using Shneiderman’s Eight Golden Rules of Interface Design with the help of the Heurio App.

Another notable characteristic of heuristic analysis is its speed and flexibility: it can be applied quickly to a wide range of scenarios and adjusted as needed. For instance, when conducting a heuristic analysis for a client, we thoroughly examine each element of a website design or an application.

However, in certain cases, we can perform a holistic analysis of a website and generate a benchmark assessment. For individuals interested in learning about UX research methods, let’s consider the analysis of site A using the rule “#6: Permit easy reversal of actions.” Rather than solely focusing on a single element, we can evaluate the entire site. Does the site facilitate easy reversal of actions?

With this in mind, experts can assign a rating on a scale of 1 to 5 to different elements (where 1 represents “Very Bad” and 5 represents “Very Good”) and calculate an average score for the site.

Analysis of the Top 5 Generative Image AI Tools by Expert Panels

EXPERT 1 | Leonardo.ai | Midjourney | Kaiber | Firefly | Bing
Strive for consistency. | 5 | 1 | 4 | 5 | 5
Seek universal usability. | 4 | 2 | 2 | 4 | 5
Offer informative feedback. | 4 | 3 | 2 | 3 | 4
Design dialogs to yield closure. | 4 | 1 | 2 | 4 | 3
Prevent errors. | 3 | 2 | 1 | 4 | 4
Permit easy reversal of actions. | 2 | 3 | 1 | 3 | 2
Keep users in control. | 3 | 3 | 2 | 3 | 4
Reduce short-term memory load. | 4 | 1 | 4 | 5 | 5

Expert 1 Insights

Midjourney has a very low heuristics rating because of the dissociation between its own site and the environment where it works. Also, having to memorize commands in a visual user interface is a really bad user experience.

Adobe and Bing have teams of UX experts, and it’s evident when compared to startups. Leonardo is surprisingly good for a free startup, and it obviously has a team of experts working on its design and development.

EXPERT 2 | Leonardo.ai | Midjourney | Kaiber | Firefly | Bing
Strive for consistency. | 5 | 3 | 4 | 5 | 4
Seek universal usability. | 3 | 1 | 3 | 3 | 4
Offer informative feedback. | 3 | 4 | 1 | 4 | 3
Design dialogs to yield closure. | 3 | 2 | 1 | 4 | 3
Prevent errors. | 3 | 1 | 1 | 4 | 3
Permit easy reversal of actions. | 1 | 1 | 1 | 2 | 3
Keep users in control. | 3 | 1 | 1 | 3 | 4
Reduce short-term memory load. | 5 | 1 | 5 | 3 | 5

Expert 2 Insights

Heuristic problems are common in this type of application, especially considering that these tools are new. The most striking thing was to find a text tool in an image application. Although the concept of “text2image” is what governs the functioning of these tools, dealing with a tool that only accepts commands that need to be memorized, without the possibility of shortcuts, is something that not every user will be able to accept.

EXPERT 3 | Leonardo.ai | Midjourney | Kaiber | Firefly | Bing
Strive for consistency. | 5 | 1 | 5 | 5 | 5
Seek universal usability. | 3 | 1 | 3 | 4 | 2
Offer informative feedback. | 2 | 2 | 2 | 4 | 2
Design dialogs to yield closure. | 4 | 3 | 1 | 4 | 2
Prevent errors. | 2 | 2 | 1 | 3 | 4
Permit easy reversal of actions. | 2 | 1 | 1 | 2 | 1
Keep users in control. | 4 | 1 | 3 | 3 | 3
Reduce short-term memory load. | 5 | 1 | 3 | 3 | 4

Expert 3 Insights

As a regular user of all these services, I find their lack of contextual help and control really disturbing. There’s not much to say about Bing since it’s just a search box. Firefly does a good job, which is to be expected, and the same goes for Leonardo, although there are some concerning accessibility issues. I find Kaiber to be simple and easy to use, although it has the fewest options after Bing. However, its user interface has significant problems that need to be addressed. Finally, Midjourney is a victim of its own model. In terms of usability and user interface, it is clearly the worst, but its model is successful, so I recommend a more prolonged and detailed study.
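The averaging step described earlier can be sketched on the three expert tables above. This is only an illustration: the unweighted mean across all eight rules and all three experts is an assumption on my part (the case study does not state how, or whether, the scores were combined), and the names `TOOLS`, `EXPERT_RATINGS`, and `average_per_tool` are hypothetical.

```python
from statistics import mean

TOOLS = ["Leonardo.ai", "Midjourney", "Kaiber", "Firefly", "Bing"]

# One row per golden rule (in table order), one column per tool (in TOOLS order),
# transcribed from the three expert tables above.
EXPERT_RATINGS = {
    "Expert 1": [
        [5, 1, 4, 5, 5], [4, 2, 2, 4, 5], [4, 3, 2, 3, 4], [4, 1, 2, 4, 3],
        [3, 2, 1, 4, 4], [2, 3, 1, 3, 2], [3, 3, 2, 3, 4], [4, 1, 4, 5, 5],
    ],
    "Expert 2": [
        [5, 3, 4, 5, 4], [3, 1, 3, 3, 4], [3, 4, 1, 4, 3], [3, 2, 1, 4, 3],
        [3, 1, 1, 4, 3], [1, 1, 1, 2, 3], [3, 1, 1, 3, 4], [5, 1, 5, 3, 5],
    ],
    "Expert 3": [
        [5, 1, 5, 5, 5], [3, 1, 3, 4, 2], [2, 2, 2, 4, 2], [4, 3, 1, 4, 2],
        [2, 2, 1, 3, 4], [2, 1, 1, 2, 1], [4, 1, 3, 3, 3], [5, 1, 3, 3, 4],
    ],
}


def average_per_tool(ratings_by_expert):
    """Unweighted mean of every 1-5 rating each tool received (assumed aggregation)."""
    scores = {tool: [] for tool in TOOLS}
    for expert, rows in ratings_by_expert.items():
        for row in rows:
            if len(row) != len(TOOLS):
                raise ValueError(f"{expert}: expected {len(TOOLS)} ratings per rule")
            for tool, rating in zip(TOOLS, row):
                if not 1 <= rating <= 5:
                    raise ValueError(f"{expert}: rating {rating} is outside the 1-5 scale")
                scores[tool].append(rating)
    return {tool: mean(values) for tool, values in scores.items()}


averages = average_per_tool(EXPERT_RATINGS)
# Highest average first, i.e. fewest heuristic problems first.
ranking = sorted(averages, key=averages.get, reverse=True)
for tool in ranking:
    print(f"{tool:12s} {averages[tool]:.2f}")
```

Under this assumed aggregation, Firefly, Bing, and Leonardo.ai land close together at the top, with Kaiber and Midjourney clearly behind. A weighted scheme (for example, giving some rules more importance) could reorder the top three, so the exact order should be read with caution.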

Summarizing Experts’ Insights

Generative AI Tools: Expert's Panel Opinions
Image AI generation using the prompt “Stupid Sexy Flanders” (don’t ask)

Surprising, isn’t it? The results of the heuristic analysis are almost completely different from the user research results in Part 3 of our case study.
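One rough way to put a number on "almost completely different" is Spearman's rank correlation between the two orderings. The sketch below rests on two assumptions that are mine, not the study's: that Leonardo.ai held first place in the Part 3 user-research ranking, and that the heuristic ranking is the one obtained by taking the unweighted average of all expert ratings per tool (Firefly, Bing, Leonardo.ai, Kaiber, Midjourney). The function name `spearman_rho` is hypothetical.

```python
# Rank 1 = best. Both rankings are assumptions, as explained in the text above.
USER_RESEARCH_RANK = {"Leonardo.ai": 1, "Midjourney": 2, "Kaiber": 3, "Firefly": 4, "Bing": 5}
HEURISTIC_RANK = {"Firefly": 1, "Bing": 2, "Leonardo.ai": 3, "Kaiber": 4, "Midjourney": 5}


def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two tie-free rankings of the same items."""
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same items")
    n = len(rank_a)
    if n < 2:
        raise ValueError("at least two items are needed")
    # Sum of squared rank differences, then the standard no-ties formula.
    d_squared = sum((rank_a[item] - rank_b[item]) ** 2 for item in rank_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


rho = spearman_rho(USER_RESEARCH_RANK, HEURISTIC_RANK)
print(f"Spearman rho: {rho:.2f}")
```

A value of +1 would mean identical rankings and -1 a full reversal. With these inputs the coefficient comes out clearly negative, which is consistent with the two methods largely disagreeing, although with only five tools it is an indication rather than a statistically meaningful result.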

So I asked the experts to give a subjective quantitative rating of their user experience, without taking into account the heuristic factors, but considering their overall experience. Basically, I wanted them to think about what they expected, what they actually received, and the effort required to achieve those results.

The results are as follows:

[Chart: subjective ratings by Expert 1, Expert 2, and Expert 3]
Generative AI Tools: Expert’s Gradation. Methodology: Likert Scale

Again, another surprise: The same experts who assigned Midjourney the worst heuristic values subjectively ranked it as the best Generative AI tool.

As we observed in the case of user research, this is merely a piece of the puzzle, and even this apparent contradiction is valuable data in itself.

Comparing AI Image Generators

So our last test is comparing results side by side. After all, that’s what matters.

For this purpose, we compared all the tools using the exact same prompt, trying to use the best model for that specific prompt. We didn’t use negative prompts since not all tools support that feature. Additionally, we didn’t compare Kaiber since it can’t generate a single image, making it impossible to compare side by side.

One thing to note: Bing Free AI Image Generator creates images with high levels of compression, which also explains the lack of quality. On the other hand, better tools provide a lot of additional information in the images. Please refer to the details in the image below.

AI Image Generation compression ratios
Bing heavily compresses its AI-based images, while better tools retain a lot of additional information

AI image generation test #1

a man dressed in black from head to toes staring at the sea, in the style of editorial photo, high reflection, hyperrealistic, high details, cinematic


Generative AI analysis

In this case, we used a very simple prompt that should result in a simple and almost monochromatic AI-generated image. Midjourney followed the prompt accurately and produced exactly what was requested, including the reflections and the editorial photo style. Bing also adhered well to the prompt, but lacks reflections and the quality is lower. Nonetheless, it is still a good overall result.

Leonardo took a “creative” approach and did not follow the prompt. The still legible logo of a stock photo company raises questions about the ethics of AI-generated images and how Leonardo creates its models.

Finally, Firefly’s outcome is even worse than Leonardo’s. The strange face and the right foot make this photo a complete failure. Note that the AI generator provided four options, and this was (subjectively) the best of them.

Generative Image AI test #2

A beautiful young woman is smiling and having fun. She is wearing a plain ruby shirt with no pattern. She is not wearing any jewelry. She is looking away from the camera, so we see her profile. The setting is natural, she's outdoors. cinematic portrait. hyper-detailed female. cinematic dreamy sharp look. beautifully color graded

Generative AI analysis

In this case, I took an existing prompt from a random Midjourney user. I liked the quality, so I thought it would be a good idea to test it with multiple models. Midjourney’s prompt adherence is close to perfect, with one exception: it included jewelry. However, to reliably exclude something from an image, a negative prompt should be used, so I don’t think it’s Midjourney’s fault. Other than that, the photo is perfect.

When generating an image with the same prompt in Leonardo, I tried their most realistic models, using each model only once. The one in this set is the best I could get, and it actually turned out to be a very nice result. This time, Leonardo respected the prompt. The lighting and colors are not as beautiful as Midjourney’s, but it still looks nice. And unlike Midjourney, there’s no visible jewelry.

Firefly generated a set of weird images with strange faces. The one I chose was the “least worst,” just a peculiar eye.

When it comes to Bing’s Free AI Generator, the result is pretty good in terms of prompt adherence. However, the extreme compression levels make the image appear very low quality, as if the subject is made of cardboard and pasted onto a blurry background. Other than that, it’s a decent result, and we can attribute the over-compression to a technical decision to save bandwidth.

An anthropological curiosity I observed through many image generations is that, for some reason, Midjourney tends to portray “beauty” as a Caucasian person. Meanwhile, Leonardo tends to associate “beauty” with Asian people. It would be very interesting to know the origins of these companies to determine if Cultural UX plays a role in prompt generation.

AI Image Generation test #3

a hyper realistic male peacock with his tail feathers open behind and it facing foward showing the details of the deep vivid colors of purple teal, blue and yellow with a white background

Generative AI analysis

This time, I tested a non-human subject and each tool’s ability to render a plain solid-color background for further usage (for example, to cut the subject from the background and use it in another composition).

Midjourney did an excellent job again. While it doesn’t look very realistic, it respected the prompt, and the result is a peacock with nice detail of the feathers and a white background. Exactly what I asked for.

Bing came second. It’s a bit less realistic than Midjourney’s image, but still quite decent, and it has a white background. Again, the over-compression problem is an issue, but it is what it is.

Firefly AI image generator clearly understood “white background” as “white feathers.” The peacock’s “eyes” look a bit psychedelic, but overall, it’s a decent result and by far the best I could achieve with Firefly.

Finally, Leonardo didn’t follow the prompt and also created a strange image. The face and neck are nice, but everything else is a mess.

Generative AI Tools Case Study: Conclusion

So we have come to the end of this case study. I made it as short as possible, yet it is obviously really long. I hope not excessively so, and I really hope you made it through all of it.

What follows is my personal opinion on each of these tools based on extensive usage and testing, plus the lessons drawn from the user research and the expert panel’s heuristic analysis. Please note that, as I said previously, this case study will evolve and change in the near future, and whatever you read today may not be what I originally wrote nor what each app will be in the future (or even new apps in the market).

With this being said, let’s go to the final insights, in alphabetical order:

Adobe Firefly Image Generation

A big question mark. It’s Adobe, they used Artificial Intelligence long before other programs, their plans for generative AI are impressive… yet they fell short on everything. Of all the tools we analyzed, it is the one that generated the weirdest images. And I’m not saying it in a good way. Nevertheless, again: it’s Adobe, so I expect them to improve. For now, my rating is 3 out of 10.

Would I use it? Not as it is now.

Bing Free AI Image Generator

The most modest of all tools actually does a pretty good job if you’re not very demanding. It has above-average prompt adherence, it’s easy to use, and it’s free. Overcompression and lack of additional tools take some points off it. Then again: it’s free, and user research positioned it very well. All things considered, my rating is 5 out of 10.

Would I use it? Yes, if I didn’t have access to other tools.

Kaiber Video AI Generator

Kaiber gets extra points for being the only AI video generation tool we know of. It doesn’t matter that it’s not real video: they’re the only ones in the niche, and we have to recognize it. However, prompt adherence is really poor, and additional settings that should matter (like reactivity) make very little difference. Add to that a pretty shady way of conducting business, and we go from a clear winner to a huge question mark. Weighing all of this, I give it 4 points out of 10.

Would I use it? Unless they improve all the things they need to improve, no.

Leonardo Free AI Generation

Leonardo has everything and then some, yet it always falls a little short. However, they’re constantly developing new models and features. And yes, there will come a time when they will charge. But for now, it’s free. Quality is pretty good, while prompt adherence is a bit poor. It is also the only tool (at least that I found) that steals from stock image repositories, which I find quite unethical, especially when there are billions of images without logos, and even without copyright, that they could have used instead. Nevertheless, it is a great resource that will gain traction pretty soon. I give it 7 points out of 10.

Would I use it? Yes, I actually use it on a daily basis.


Midjourney

I think Midjourney is in a different league right now. It’s all that Leonardo wants to be (let alone other generative AI tools). However, the Discord interface, the need to memorize commands (come on, it’s 2023!), and the shady credit card charges take some points off an otherwise perfect score. All things considered, I’ll give it 8 points out of 10.

Would I use it? Yes, I actually use it on a daily basis.

Summary of our AI Image Generation Tools

Artificial Intelligence is here to stay, and image generation through AI-based tools is one of the first visible signs of this new scenario. Some people may be scared of what this new scenario means, and I won’t judge in any way. But the reality is that things won’t go back to a previous point, so it’s futile to resist a change that has not only started but is in full swing, even though it’s still in its early stages. Just imagine what happens once these tools reach their full potential!

Well, that’s all for now. I hope you liked it, and don’t forget to bookmark this page and come back in the coming months to see how things will (probably) change. That’s another research experiment in itself!

For those interested in having high-quality UX research studies for their businesses, do not hesitate to contact us!
