Apple Releases MGIE, an AI Model for Instruction-Based Image Editing (venturebeat.com)
Apple has released a new open-source AI model, called "MGIE," that can edit images based on natural language instructions. From a report: MGIE, which stands for MLLM-Guided Image Editing, leverages multimodal large language models (MLLMs) to interpret user commands and perform pixel-level manipulations. The model can handle various editing aspects, such as Photoshop-style modification, global photo optimization, and local editing. MGIE is the result of a collaboration between Apple and researchers from the University of California, Santa Barbara. The model was presented in a paper accepted at the International Conference on Learning Representations (ICLR) 2024, one of the top venues for AI research. The paper demonstrates the effectiveness of MGIE in improving automatic metrics and human evaluation, all while maintaining competitive inference efficiency.
MGIE is based on the idea of using MLLMs, which are powerful AI models that can process both text and images, to enhance instruction-based image editing. MLLMs have shown remarkable capabilities in cross-modal understanding and visual-aware response generation, but they have not been widely applied to image editing tasks. MGIE integrates MLLMs into the image editing process in two ways: First, it uses MLLMs to derive expressive instructions from user input. These instructions are concise and clear and provide explicit guidance for the editing process. For example, given the input "make the sky more blue," MGIE can produce the instruction "increase the saturation of the sky region by 20%." Second, the MLLM provides a latent "visual imagination" of the intended edit, which the editing model jointly captures to perform the manipulation, with both components trained end-to-end.
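The two-stage flow described above can be sketched in a few lines of Python. This is a toy illustration, not Apple's actual API: the MLLM stage is stubbed with a lookup table that rewrites a terse command into an expressive instruction (using the article's own example), and the editing stage is reduced to a saturation boost on a single RGB pixel using the standard library's `colorsys` module. All function names here are hypothetical.

```python
# Hypothetical sketch of MGIE's two-stage pipeline. The real system uses
# an MLLM for stage one and a diffusion-based editor for stage two; both
# are stubbed here for illustration.

import colorsys

def derive_expressive_instruction(user_input: str) -> str:
    """Stand-in for the MLLM stage: terse command -> explicit instruction."""
    rules = {
        "make the sky more blue":
            "increase the saturation of the sky region by 20%",
    }
    # Fall back to the raw input when no rewrite rule matches.
    return rules.get(user_input, user_input)

def boost_saturation(pixel, factor=1.2):
    """Stand-in for the editing stage: scale one RGB pixel's saturation."""
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, s * factor)  # +20% saturation, clamped to valid range
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))

instruction = derive_expressive_instruction("make the sky more blue")
edited = boost_saturation((135, 170, 235))  # a washed-out sky-blue pixel
```

In the real model the expressive instruction is not a string handed to hand-written rules; its latent representation conditions the editing network directly, which is what lets MGIE handle open-ended commands.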
One step closer (Score:4, Insightful)
Re: (Score:3)
And we're one step closer to the scene in Blade Runner where Deckard speaks image processing commands to analyze a photo.
I'd say we're there.
And no annoying clicky pan and zoom!
Re: (Score:3)
We've had this for over a year in Stable Diffusion with Instruct Pix2Pix. Though I certainly look forward to the models improving. I usually find it easier to edit images in other ways than to use Instruct Pix2Pix.
Re: (Score:2)
Is this truly open source and free in the wild like Stable Diffusion, rather than controlled by Apple?
Re: (Score:2)
Apple grants you a personal, non-exclusive license, under Apple's copyrights in this original Apple software (the "Apple Software"), to use, reproduce, modify and redistribute the Apple Software, with or without modifications, in source and/or binary forms;
Example query (Score:3)
Alexa, draw our butts even bigger, -Kardashians
Re:Example query (Score:4, Funny)
Sorry...not enough canvas left in the universe needed to comply.
Re: (Score:2)
I get the same vision when I hear Kar-who-evers
Re: (Score:2)
The mystifying thing to me is that I still have seen no way to painlessly remove telephone poles and utility wires from an image.
James Fridman version? (Score:5, Funny)
Now there's only one thing left to do: Create the James Fridman input transformer that deliberately reinterprets your image editing prompt in a way that is likely to cause the most appallingly wrong interpretation of what you asked for.
Re: (Score:3)
Now there's only one thing left to do: Create the James Fridman input transformer that deliberately reinterprets your image editing prompt in a way that is likely to cause the most appallingly wrong interpretation of what you asked for.
That sounds like something Douglas Adams would have Dreamed-up! (Dreamt?)
Will be a success with only one function (Score:4, Funny)
Re: (Score:2)
https://www.youtube.com/watch?... [youtube.com]
A classic.
Re: (Score:2)
What somebody eventually *is* going to do is a CSI-type "enhance," where they pull some impossible image from around the corner, out of view, maybe from a reflection on an eye's surface, by repeating "enhance..." phrases until the generative AI makes up a whole image.
Slashdotted (Score:2)
It's been a while since we slashdotted anything, and it's probably not just us, but the demo site is brought to you by 2001-era servers.
Re: (Score:1)
Trying the model, though, put me in a queue five hours long.
Re: (Score:2)
Hence my comment. Accessing a site is cool and all, but five hours is a little too much to play "I wonder what this does?"
Apple lacks the creds (Score:1)
Re: (Score:1)
I don't think Apple can produce anything "remarkable" in the AI sphere. 99% of the time I ask Siri for directions to Annerley Road, it plots a route to Anerley in London, on the wrong side of the world.
Just wait.
Siri's being rewritten from the ground-up.
Re: (Score:2)
Siri has the tendency to send us on long roundabout detours that eventually loop back to the point where we left the original route and then continue on our way. It's done this enough times that even on my wife's iToy we just use Maps now.
Will it help? (Score:1)
In recent months I replayed some "ancient" DOS games for nostalgic reasons. Given the current AI / ML hype, I asked myself whether those gigantic models could help recreate the graphics of those old things - automatically!
I mean, those pesky CGA graphics were bad back then but accepted because there was nothing better. Should it not be possible for those models to take the existing images / renderings and crank them up? I mean, if they can produce high-quality porn images with just some keywords, could they not upscale some old game graphics too?