AI Apple Technology

Apple Releases MGIE, an AI Model for Instruction-Based Image Editing (venturebeat.com) 21

Apple has released a new open-source AI model, called "MGIE," that can edit images based on natural language instructions. From a report: MGIE, which stands for MLLM-Guided Image Editing, leverages multimodal large language models (MLLMs) to interpret user commands and perform pixel-level manipulations. The model can handle various editing aspects, such as Photoshop-style modification, global photo optimization, and local editing. MGIE is the result of a collaboration between Apple and researchers from the University of California, Santa Barbara. The model was presented in a paper accepted at the International Conference on Learning Representations (ICLR) 2024, one of the top venues for AI research. The paper demonstrates the effectiveness of MGIE in improving automatic metrics and human evaluation, all while maintaining competitive inference efficiency.

MGIE is based on the idea of using MLLMs, which are powerful AI models that can process both text and images, to enhance instruction-based image editing. MLLMs have shown remarkable capabilities in cross-modal understanding and visual-aware response generation, but they have not been widely applied to image editing tasks. MGIE integrates MLLMs into the image editing process in two ways: First, it uses MLLMs to derive expressive instructions from user input. These instructions are concise and clear and provide explicit guidance for the editing process. For example, given the input "make the sky more blue," MGIE can produce the instruction "increase the saturation of the sky region by 20%."
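The article's example instruction, "increase the saturation of the sky region by 20%," names both a region and an amount, so it maps onto an ordinary pixel operation. The sketch below illustrates that mapping only. It is not how MGIE performs edits. Several pieces are assumptions: the function name `boost_saturation` is hypothetical, a boolean mask stands in for "the sky region," and "20%" is read as a 1.2x multiplier on HSV saturation. It uses only Python's standard `colorsys` module.

```python
import colorsys

def boost_saturation(pixels, mask, factor=1.2):
    """Hypothetical helper: scale HSV saturation of masked pixels by `factor`.

    pixels: list of (r, g, b) tuples, channels in 0..255
    mask:   list of bools, same length; True marks the region to edit
            (an assumed stand-in for "the sky region")
    """
    out = []
    for (r, g, b), selected in zip(pixels, mask):
        if not selected:
            # Pixels outside the region are passed through unchanged.
            out.append((r, g, b))
            continue
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # Assumption: "+20%" is a relative increase, clamped to full saturation.
        s = min(1.0, s * factor)
        nr, ng, nb = colorsys.hsv_to_rgb(h, s, v)
        out.append((round(nr * 255), round(ng * 255), round(nb * 255)))
    return out
```

For example, `boost_saturation([(100, 150, 200)], [True])` returns `[(80, 140, 200)]`. Hue and brightness (the largest channel) are preserved, and the channels spread further apart. Grey pixels, which have zero saturation, stay grey.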

  • One step closer (Score:4, Insightful)

    by flippy ( 62353 ) on Wednesday February 07, 2024 @12:46PM (#64222280) Homepage
    And we're one step closer to the scene in Blade Runner where Deckard speaks image processing commands to analyze a photo.
    • And we're one step closer to the scene in Blade Runner where Deckard speaks image processing commands to analyze a photo.

      I'd say we're there.

      And no annoying clicky pan and zoom!

    • by Rei ( 128717 )

      We've had this for over a year in Stable Diffusion with Instruct Pix2Pix, though I certainly look forward to the models improving. I usually find it easier to edit images in other ways than with Instruct Pix2Pix.

      • I wonder whether the Apple version being discussed here marks anything as edited, with watermarks, etc., or whether it has any guard rails to keep it from producing NSFW imagery?

        Is this truly open source and out in the wild, not controlled by Apple, like Stable Diffusion is?

        • Looking at the license at https://github.com/apple/ml-mg... [github.com], it looks like it is open source:
          Apple grants you a personal, non-exclusive license, under Apple's copyrights in this original Apple software (the "Apple Software"), to use, reproduce, modify and redistribute the Apple Software, with or without modifications, in source and/or binary forms;
  • by Tablizer ( 95088 ) on Wednesday February 07, 2024 @12:49PM (#64222288) Journal

    Alexa, draw our butts even bigger, -Kardashians

  • by dgatwood ( 11270 ) on Wednesday February 07, 2024 @01:11PM (#64222366) Homepage Journal

    Now there's only one thing left to do: Create the James Fridman input transformer that deliberately reinterprets your image editing prompt in a way that is likely to cause the most appallingly wrong interpretation of what you asked for.

    • Now there's only one thing left to do: Create the James Fridman input transformer that deliberately reinterprets your image editing prompt in a way that is likely to cause the most appallingly wrong interpretation of what you asked for.

      That sounds like something Douglas Adams would have dreamed up! (Dreamt?)

  • by Zemplar ( 764598 ) on Wednesday February 07, 2024 @01:14PM (#64222372) Journal
    "Enhance"
  • It's been a while, probably not just us, but the demo site is brought to you by 2001-era servers.

    • The demo site is huggingface.co, hardly a small site by any measure, and there is no problem accessing it.
      Trying the model, though, put me in a queue five hours long... :-/
      • by ratbag ( 65209 )

        Hence my comment. Accessing a site is cool and all, but five hours is a little too much to play "I wonder what this does?"

  • by Anonymous Coward
    I don't think Apple can produce anything "remarkable" in the AI sphere. 99% of the time I ask Siri for directions to Annerley Road, it plots a route to Anerly in London, on the wrong side of the world.
    • I don't think Apple can produce anything "remarkable" in the AI sphere. 99% of the time I ask Siri for directions to Annerley Road it plots a route to Anerly in London on the wrong side of the world.

      Just wait.

      Siri's being rewritten from the ground up.

    • by cusco ( 717999 )

      Siri has the tendency to send us on long roundabout detours that eventually loop back to the point where we left the original route and then continue on our way. It's done this enough times that even on my wife's iToy we just use Maps now.

  • In recent months I replayed some "ancient" DOS games for nostalgia's sake. With all the current AI / MLM hype, I have asked myself whether those gigantic models could help recreate the graphics of those old games, automatically!

    I mean, those pesky CGA graphics were bad back then but accepted because there was nothing better. Should it not be possible for those models to take the existing images / renderings and crank them up? I mean, if they can produce high quality porn images with just some keywords, coul
