Apple Releases MGIE, an AI Model for Instruction-Based Image Editing (venturebeat.com)
Apple has released a new open-source AI model, called "MGIE," that can edit images based on natural language instructions. From a report: MGIE, which stands for MLLM-Guided Image Editing, leverages multimodal large language models (MLLMs) to interpret user commands and perform pixel-level manipulations. The model can handle various editing aspects, such as Photoshop-style modification, global photo optimization, and local editing. MGIE is the result of a collaboration between Apple and researchers from the University of California, Santa Barbara. The model was presented in a paper accepted at the International Conference on Learning Representations (ICLR) 2024, one of the top venues for AI research. The paper demonstrates the effectiveness of MGIE in improving automatic metrics and human evaluation, all while maintaining competitive inference efficiency.
MGIE is based on the idea of using MLLMs, which are powerful AI models that can process both text and images, to enhance instruction-based image editing. MLLMs have shown remarkable capabilities in cross-modal understanding and visual-aware response generation, but they have not been widely applied to image editing tasks. MGIE integrates MLLMs into the image editing process in two ways: First, it uses MLLMs to derive expressive instructions from user input. These instructions are concise and clear and provide explicit guidance for the editing process. For example, given the input "make the sky more blue," MGIE can produce the instruction "increase the saturation of the sky region by 20%." Second, the MLLM provides a latent "visual imagination" of the intended edit, which the editing model jointly captures to perform the manipulation, with both components trained end-to-end.
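The two-stage flow described above can be sketched in a few lines of Python. This is a toy illustration, not Apple's actual API: the MLLM stage is stubbed with a lookup table that rewrites a terse command into an expressive instruction (using the article's own example), and the editing stage is reduced to a saturation boost on a single RGB pixel using the standard library's `colorsys` module. All function names here are hypothetical.

```python
# Hypothetical sketch of MGIE's two-stage pipeline. The real system uses
# an MLLM for stage one and a diffusion-based editor for stage two; both
# are stubbed here for illustration.

import colorsys

def derive_expressive_instruction(user_input: str) -> str:
    """Stand-in for the MLLM stage: terse command -> explicit instruction."""
    rules = {
        "make the sky more blue":
            "increase the saturation of the sky region by 20%",
    }
    # Fall back to the raw input when no rewrite rule matches.
    return rules.get(user_input, user_input)

def boost_saturation(pixel, factor=1.2):
    """Stand-in for the editing stage: scale one RGB pixel's saturation."""
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, s * factor)  # +20% saturation, clamped to valid range
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))

instruction = derive_expressive_instruction("make the sky more blue")
edited = boost_saturation((135, 170, 235))  # a washed-out sky-blue pixel
```

In the real model the expressive instruction is not a string handed to hand-written rules; its latent representation conditions the editing network directly, which is what lets MGIE handle open-ended commands.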
One step closer (Score:4, Insightful)
Re: (Score:3)
And we're one step closer to the scene in Blade Runner where Deckard speaks image processing commands to analyze a photo.
I'd say we're there.
And no annoying clicky pan and zoom!
Re: (Score:3)
We've had this for over a year in Stable Diffusion with Instruct Pix2Pix. Though I certainly look forward to the models improving. I usually find it easier to edit images in other ways than to use Instruct Pix2Pix.
Re: (Score:2)
Is this truly open source and free in the wild like Stable Diffusion, rather than controlled by Apple?
Re: (Score:2)
Apple grants you a personal, non-exclusive license, under Apple's copyrights in this original Apple software (the "Apple Software"), to use, reproduce, modify and redistribute the Apple Software, with or without modifications, in source and/or binary forms;
Example query (Score:3)
Alexa, draw our butts even bigger, -Kardashians
Re:Example query (Score:4, Funny)
Sorry...not enough canvas left in the universe needed to comply.
Re: (Score:2)
I get the same vision when I hear Kar-who-evers
Re: (Score:2)
The mystifying thing to me is that I still have seen no way to painlessly remove telephone poles and utility wires from an image.
James Fridman version? (Score:5, Funny)
Now there's only one thing left to do: Create the James Fridman input transformer that deliberately reinterprets your image editing prompt in a way that is likely to cause the most appallingly wrong interpretation of what you asked for.
Re: (Score:3)
Now there's only one thing left to do: Create the James Fridman input transformer that deliberately reinterprets your image editing prompt in a way that is likely to cause the most appallingly wrong interpretation of what you asked for.
That sounds like something Douglas Adams would have Dreamed-up! (Dreamt?)
Will be a success with only one function (Score:4, Funny)
Re: (Score:2)
https://www.youtube.com/watch?... [youtube.com]
A classic.
Re: (Score:2)
What somebody eventually *is* going to do is a CSI-type "enhance," where they pull some impossible image from around the corner, out of view, maybe from a reflection on an eye's surface, by repeating "enhance..." phrases until the generative AI makes up a whole image.
Slashdotted (Score:2)
It's been a while since we slashdotted anything, and it's probably not just us, but the demo site is brought to you by 2001-era servers.
Re: (Score:1)
Trying the model, though, put me in a queue five hours long.
Re: (Score:2)
Hence my comment. Accessing a site is cool and all, but five hours is a little too much to play "I wonder what this does?"
Apple lacks the creds (Score:1)
Re: (Score:1)
I don't think Apple can produce anything "remarkable" in the AI sphere. 99% of the time I ask Siri for directions to Annerley Road, it plots a route to Anerley in London, on the wrong side of the world.
Just wait.
Siri's being rewritten from the ground-up.
Re: (Score:2)
Siri has the tendency to send us on long roundabout detours that eventually loop back to the point where we left the original route and then continue on our way. It's done this enough times that even on my wife's iToy we just use Maps now.
Will it help? (Score:1)
In recent months I replayed some "ancient" DOS games for nostalgic reasons. Given the current AI / ML hype, I asked myself whether those gigantic models could help recreate the graphics of those old things - automatically!
I mean, those pesky CGA graphics were bad back then but accepted because there was nothing better. Should it not be possible for those models to take the existing images / renderings and crank them up? I mean, if they can produce high-quality porn images with just some keywords, could they not upscale some old game graphics too?