
Back in November, I tested the image generation capabilities inside Google’s Gemini, which was powered by the Imagen 3 model. While I liked it, I ran into its limitations fairly quickly. Google recently rolled out its successor, Imagen 4, and I’ve been putting it through its paces over the past couple of weeks.
I think the new version is definitely an improvement, as some of the issues I had with Imagen 3 are now thankfully gone. But some frustrations still remain, meaning the new version isn’t quite as good as I’d like.
So, what has improved?

The quality of the images produced has generally improved, though the improvement isn’t huge. Imagen 3 was already generally good at creating images of people, animals, and scenery, but the new version consistently produces sharper, more detailed images.
When it comes to generating images of people, which is only possible with Gemini Advanced, I had persistent issues with Imagen 3 where it would create cartoonish-looking photos, even when I wasn’t asking for that specific style. Prompting it to change the image to something more realistic was often a losing battle. I haven’t experienced any of that with Imagen 4. All the images of people it generates look very professional, perhaps a bit too much, which is something we’ll touch on later.
One of my biggest frustrations with the older model was the limited control over aspect ratios. I often felt stuck with 1:1 square images, which severely limited their use case. I couldn’t use them for online publications, and printing them for a standard photo frame was out of the question.
While Imagen 4 still defaults to a 1:1 ratio, I can now simply prompt it to use a different one, like 16:9, 9:16, or 4:3. This is the feature I’ve been waiting for, as it makes the images created far more versatile and usable.
Imagen 4 also works a lot more smoothly. While I haven’t found it to be noticeably faster (though a faster model is reportedly in the works), there are far fewer errors. With the previous version, Gemini would sometimes show an error message, saying it couldn’t produce an image for an unknown reason. I’ve received none of those with Imagen 4. It just works.
Still looks a bit too retouched
While Imagen 4 produces better images, is more reliable, and allows for different aspect ratios, some of the issues I encountered when testing its predecessor are still present.
My main problem is that the images often aren’t as realistic as I’d like, especially when creating close-ups of people and animals. Images tend to come out quite saturated, and many feature a prominent bokeh effect that professionally blurs the background. They all look like they were taken by a photographer with 15 years of experience instead of by me, just pointing a camera at my cat and pressing the shutter.
Sure, they look good, but a “casual mode” would be a fantastic addition: something more realistic, where the lighting isn’t perfect and the subject isn’t posing like a model. I prompted Gemini to make an image more realistic by removing the bokeh effect and generally making it less perfect. The AI did try, but after prompting it three or four times on the same image, it seemed to reach its limit and said it couldn’t do any better. Each new image it produced was a bit more casual, but it was still quite polished, clearly hinting that it was AI-generated.
You can see that in the images above, going from left to right. The first one includes a strong bokeh effect, and the man has very clear skin, while the other two progress to the man looking older and older, as well as more tired. He even started balding a bit in the last image. It’s not what I really meant when prompting Gemini to make the image more realistic, though it does come out more casual.
Imagen 4 does a much better job with random images like landscapes and city skylines. These images, taken from afar, don’t include as many close-up details, so they look more genuine. Still, it can be hit and miss. An image of the Sydney Opera House looks great, though the saturation is bumped up quite a bit: the grass is extra green, and the water is a picture-perfect blue. But when I asked for a picture of the Grand Canyon, it came out looking completely artificial and wouldn’t fool anyone into thinking it was a real photo. It did perform better after a few retries, though.
Editing is better, but not quite there
One of my gripes with the previous version was its clumsy editing. When asked to change something minor, like the color of a hat, the AI would do it, but it would also generate a brand new, completely different image. The ideal scenario would be to create an image and then be allowed to edit every detail precisely, such as changing a piece of clothing, adding a specific item, or changing the weather conditions while leaving everything else exactly as is.
Imagen 4 is better in this regard, but not by much. When I prompted it to change the color of a jacket to blue, it created a new image. However, by specifically asking it to keep all other details the same, it managed to maintain a lot of the scenery and subject from the original. That’s what happened in the examples above. The woman in the third image was the same, and she appeared to be in the same room, but her pose and the camera angle were different, making it more of a re-shoot than an edit.
Here’s another example of a cat eating a popsicle. I prompted Gemini to change the color of the popsicle, and it did, and it kept a lot of the details. The cat’s the same, and so is much of the background. But the cat’s ears are now sticking out, and the hat is a bit different. Still, good try.
Despite its shortcomings, Imagen 4 is a great tool
Even with its issues and a long wishlist of missing functionality, Imagen 4 is still among the best AI image generators available. Most of the problems I’ve mentioned are also present in other AI image-generation software, so it’s not as if Gemini is behind the competition. It seems there are significant technical hurdles that need to be overcome before these types of tools can reach the next level of precision and realism.
Other limitations are still in place, such as the inability to create images of famous people or generate content that violates Google’s safety guidelines. Whether that’s a good or a bad thing is a matter of opinion. For users seeking fewer restrictions, there are alternatives like Grok.
Have you tried out the latest image generation in Gemini? Let me know your thoughts in the comments.