AI Open Source Apple

Apple Launches MLX Machine-Learning Framework For Apple Silicon (computerworld.com)

Apple has released MLX, a free and open-source machine learning framework for Apple Silicon. Computerworld reports: The idea is that it streamlines training and deployment of ML models for researchers who use Apple hardware. MLX is a NumPy-like array framework designed for efficient and flexible machine learning on Apple's processors. This isn't a consumer-facing tool; it equips developers with what appears to be a powerful environment within which to build ML models. The company also seems to have worked to embrace the languages developers want to use, rather than force a language on them -- and it apparently invented powerful LLM tools in the process.

MLX design is inspired by existing frameworks such as PyTorch, Jax, and ArrayFire. However, MLX adds support for a unified memory model, which means arrays live in shared memory and operations can be performed on any of the supported device types without performing data copies. The team explains: "The Python API closely follows NumPy with a few exceptions. MLX also has a fully featured C++ API which closely follows the Python API."
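
As a minimal sketch of what "operations can be performed on any of the supported device types without performing data copies" looks like in practice, the snippet below adds the same two arrays once on the CPU stream and once on the GPU stream. It assumes the `mlx` Python package is installed on an Apple Silicon Mac; the helper name `same_arrays_on_both_devices` is made up for illustration, and the function simply returns `None` when MLX or Metal is unavailable.

```python
import importlib.util


def same_arrays_on_both_devices():
    """Add two arrays on the CPU stream and on the GPU stream.

    Returns True if both results agree, or None when MLX (or a Metal GPU)
    is not available on this machine. Hypothetical helper for illustration.
    """
    if importlib.util.find_spec("mlx") is None:
        return None  # MLX is not installed here
    import mlx.core as mx

    if not mx.metal.is_available():
        return None  # no Metal GPU to run the second op on

    # The arrays are created once; no device is attached to them.
    a = mx.random.normal((100,))
    b = mx.random.normal((100,))

    # The device is chosen per operation, not per array.
    c = mx.add(a, b, stream=mx.cpu)
    d = mx.add(a, b, stream=mx.gpu)

    mx.eval(c, d)  # MLX is lazy; force both computations
    return bool(mx.allclose(c, d).item())


result = same_arrays_on_both_devices()
if result is None:
    print("skipped: MLX/Metal not available")
else:
    print(f"CPU and GPU results agree: {result}")
```

Note that there is no `.to(device)`-style call on `a` or `b`: in this model the arrays stay where they are and only the operation is told where to run.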

Apple has provided a collection of examples of what MLX can do. These appear to confirm the company now has a highly-efficient language model, powerful tools for image generation using Stable Diffusion, and highly accurate speech recognition. This tallies with claims earlier this year, and some speculation concerning infinite virtual world creation for future Vision Pro experiences. Ultimately, Apple seems to want to democratize machine learning. "MLX is designed by machine learning researchers for machine learning researchers," the team explains.

  • So it must run on Apple II's and my Commodore 64.

    https://en.wikipedia.org/wiki/... [wikipedia.org]

  • Apple finally enters the AI chat and OpenSource??? So unlike Apple. I guess they don't want to build it for us.

    • Apple finally enters the AI chat and OpenSource??? So unlike Apple. I guess they don't want to build it for us.

      As usual, Apple is rarely on the bleeding edge, but when they do something, they tend to do it right.

  • Why not on Intel? (Score:4, Interesting)

    by henrik stigell ( 6146516 ) on Thursday December 07, 2023 @07:18PM (#64065163) Homepage

    Is there some technical reason why it can't be run on Intel? I am quite disappointed in my pretty expensive MBP 16" Intel. For everything I use it for, I use just a minor part of its capacity (I develop in IntelliJ/Eclipse, sometimes use Handbrake, etc. Of course, it runs at 100% when compressing a video or compiling something, but it is usually so fast that it really doesn't matter. The rest of the time it mostly idles.), with one exception: AI-related stuff. I use Whisper a lot and that is slow as f*ck. And despite paying quite a bit extra for a better GPU than the default one, AMD Radeons aren't supported by most AI frameworks.

    So I have a computer that offers more capacity than I need for 99 % of all tasks, but is seriously underpowered for one specific task I use a lot.

    I spent $3500 on this machine and if AI hadn't happened it would have been up to the task for probably, at least, five more years (I also have a MBP 15" 2015, which, with the exception of AI, actually is perfectly capable. The only reason I replaced it with the 16" was that it mechanically started to break after four years of pretty heavy mobile use).

    So, could Intel be supported by this framework? And how do I get to use my GPU (AMD Radeon Pro 5500M 4 GB) for AI-related stuff? GPU-support would probably extend this machine's useful life several years.

    • Re:Why not on Intel? (Score:5, Informative)

      by NoMoreACs ( 6161580 ) on Thursday December 07, 2023 @09:03PM (#64065341)

      Is there some technical reason why it can't be run on Intel? I am quite disappointed in my pretty expensive MBP 16" Intel. For everything I use it for, I use just a minor part of its capacity (I develop in IntelliJ/Eclipse, sometimes use Handbrake, etc. Of course, it runs at 100% when compressing a video or compiling something, but it is usually so fast that it really doesn't matter. The rest of the time it mostly idles.), with one exception: AI-related stuff. I use Whisper a lot and that is slow as f*ck. And despite paying quite a bit extra for a better GPU than the default one, AMD Radeons aren't supported by most AI frameworks.

      So I have a computer that offers more capacity than I need for 99 % of all tasks, but is seriously underpowered for one specific task I use a lot.

      I spent $3500 on this machine and if AI hadn't happened it would have been up to the task for probably, at least, five more years (I also have a MBP 15" 2015, which, with the exception of AI, actually is perfectly capable. The only reason I replaced it with the 16" was that it mechanically started to break after four years of pretty heavy mobile use).

      So, could Intel be supported by this framework? And how do I get to use my GPU (AMD Radeon Pro 5500M 4 GB) for AI-related stuff? GPU-support would probably extend this machine's useful life several years.

      If you haven't noticed, Apple Silicon SoCs have a fair amount of Silicon devoted to Machine Learning processing. This Framework is obviously aimed at taking advantage of all that dedicated hardware.

      My feeling is that this could possibly be modified to run on something like typical GPUs; but it just isn't optimized for that sort of processing and non-Unified Memory (which has its own, unique throughput advantages).

      • Apple Silicon SoCs have a fair amount of Silicon devoted to Machine Learning processing.

        Not that much.
        The ANE covers about as much space as 4 E cores.
        Also, this library does not support the ANE.

        My feeling is that this could possibly be modified to run on something like typical GPUs; but it just isn't optimized for that sort of processing and non-Unified Memory (which has its own, unique throughput advantages).

        You wouldn't really want to.
        Existing libraries already do this for other devices.
        The problem is that things like PyTorch and PyNum have terrible performance on AS via their mps backend.
        This will give a similar API to those, but undoubtedly using CoreML on the backend (which nobody seems to want to learn how to use) so that you can get non-shit performance out of your AS GPU in ML tasks.

        • Apple Silicon SoCs have a fair amount of Silicon devoted to Machine Learning processing.

          Not that much.

          The ANE covers about as much space as 4 E cores.

          Also, this library does not support the ANE.

          My feeling is that this could possibly be modified to run on something like typical GPUs; but it just isn't optimized for that sort of processing and non-Unified Memory (which has its own, unique throughput advantages).

          You wouldn't really want to.

          Existing libraries already do this for other devices.

          The problem is that things like PyTorch and PyNum have terrible performance on AS via their mps backend.

          This will give a similar API to those, but undoubtedly using CoreML on the backend (which nobody seems to want to learn how to use) so that you can get non-shit performance out of your AS GPU in ML tasks.

          How can this not use the ANE?

          I understand that porting this wouldn't make that much sense.

          • I don't know. Can only speculate.
            The TOPS figure quoted for the ANE is INT8 operations. The ANE *will* do fp16, but at half the performance. It will not do fp32, at all.
            You wouldn't generally quantize any model down to INT8 unless you had to, because the quality of the results will be lower.
            The "justification" for doing this on cell phones has been that they're small devices with small screens, so the trade-off of poorer quality inference is worth the huge increase in efficiency of said inference.
            i.e.,
            • I don't know. Can only speculate.

              The TOPS figure quoted for the ANE is INT8 operations. The ANE *will* do fp16, but at half the performance. It will not do fp32, at all.

              You wouldn't generally quantize any model down to INT8 unless you had to, because the quality of the results will be lower.

              The "justification" for doing this on cell phones has been that they're small devices with small screens, so the trade-off of poorer quality inference is worth the huge increase in efficiency of said inference.

              i.e., my MBP 32-core GPU will use ~60W to crunch through a 50-step 512x512 fp32/fp16 SD1.5 image, taking approximately 7 seconds.

              My ANE uses ~3W to crunch through a 60-step 512x512 fp16 SD1.5 image, taking approximately 24 seconds. That would be halved to ~12 seconds if I quantized it to INT8 (though you really wouldn't want to do that with SD, the results would be.... bad)

              The ANE just isn't very good for mid quality inferences, and can't do high quality inferences at all.

              I think it's only really targeted for mobile devices, where the energy tradeoff is not only huge, but likely required, because nobody's putting a high power GPU in a phone. 60W in a package that small is called a firestarter.

              I think we got them in our M*'s, because 1) architecturally, having it there was probably close to zero work (since they're a regular part of the A* parts), and 2) I imagine it's useful for people to be able to test on the actual hardware phones will be using, if you're trying to market the Mac as a platform for making mobile apps.

              Hmmm. Ick.

              I see why they didn't support the ANE in this Framework; there's little point in it.

              I would ASSume that maybe Apple is going to shower some 3 nm-acreage Love unto the ANE, and Thus it Grew the Ability to handle fp64 (and maybe even some fp128 Results), along with a (one time) 200 x Throughput Boost.

              I can Dream. . . ;-)
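
To put the GPU-versus-ANE trade-off described above into numbers, here is a small worked calculation. It uses only the rough figures quoted in the comment (~60W for ~7 seconds over 50 steps on the GPU, ~3W for ~24 seconds over 60 steps on the ANE); these are one commenter's estimates, not measurements, and the variable names are made up for illustration.

```python
# Rough figures quoted in the comment above (estimates, not measurements).
runs = {
    "gpu": {"watts": 60, "seconds": 7, "steps": 50},   # 32-core GPU, fp32/fp16
    "ane": {"watts": 3, "seconds": 24, "steps": 60},   # ANE, fp16
}

results = {}
for name, r in runs.items():
    total_joules = r["watts"] * r["seconds"]  # energy = power x time
    results[name] = {
        "total_joules": total_joules,
        "joules_per_step": total_joules / r["steps"],
        "seconds_per_step": r["seconds"] / r["steps"],
    }
    print(f"{name}: {total_joules} J per image, "
          f"{results[name]['joules_per_step']:.2f} J/step, "
          f"{results[name]['seconds_per_step']:.2f} s/step")

# How much more energy the GPU spends per step, and how much slower the ANE is per step.
energy_ratio = results["gpu"]["joules_per_step"] / results["ane"]["joules_per_step"]
slowdown = results["ane"]["seconds_per_step"] / results["gpu"]["seconds_per_step"]
print(f"GPU uses about {energy_ratio:.1f}x the energy per step; "
      f"ANE is about {slowdown:.1f}x slower per step")
```

On these numbers the ANE finishes an image on a fraction of the energy but takes several times longer per step, which is consistent with the comment's point that it suits battery-powered devices better than a desktop-class workload.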

    • Maybe, but I'm not sure why you would. The whole aim here is to leverage the unified memory model of apple silicon to avoid expensive memory copies. That isn't an option with intel + GPU devices, so you'd be far better off with a conventional framework that supports AMD. Which isn't a lot. AMD is generally not one's first choice for ML. In fact, laptop GPUs are generally not up to the task for much beyond basic inference (although I'm yet to see how this new stuff on apple silicon fares, I do know t

      • The whole point is getting more code into a closed garden. Unified memory vs. a GPU behind PCIe is a minimal difference in and of itself. Ignoring complex parallel schemes for training you're not ping ponging a lot of data between domains (and no one is going to do that on Apple silicon at the moment).

        • Unified memory vs. a GPU behind PCIe is a minimal difference in and of itself.

          Depends. Sometimes it is, sometimes the difference is fucking massive.
          Run SD on "High" memory usage and see. Then run SD on a GPU that doesn't have enough RAM to do that, and observe the difference.
          Running in "High" (all data is prepped with no data moving) will give you 4.5x the performance per TFLOP. +450% is not nothing.
          Using High and CoreML, my M1 Max (10.4 TFLOPS) can do a 512x512 SD1.5 50 step rendering in the same amount of time as an RX6950XT (47 TFLOPS)

          Then compare to a games... where you get
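
The "performance per TFLOP" figure above can be reproduced from the two throughput numbers the comment quotes. This is a sketch that takes the comment's claim at face value (both devices finish the same 512x512, 50-step SD1.5 render in the same time); the variable names are illustrative only.

```python
# Peak throughput figures quoted in the comment above.
m1_max_tflops = 10.4
rx6950xt_tflops = 47.0

# If both finish the same job in the same time, the work done per TFLOP
# differs by the ratio of their peak throughput.
per_tflop_advantage = rx6950xt_tflops / m1_max_tflops
percent_increase = (per_tflop_advantage - 1) * 100

print(f"per-TFLOP advantage: {per_tflop_advantage:.2f}x")
print(f"expressed as an increase: about {percent_increase:.0f}%")
```

The ratio comes out at roughly 4.5x, matching the comment. Expressed as a percentage increase, a 4.5x ratio is closer to +350% than +450%, so the two figures in the comment describe the same result in slightly different terms.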

          • What is SD?

          • But that's not a question of unified memory, that's just a question of insufficient GPU memory ... most of which have 8+ GB now.

            • But that's not a question of unified memory, that's just a question of insufficient GPU memory ...

              Well sure, absolutely.
              But that's literally why unified memory exists.
              Because devices with discrete local memory are connected by very, very slow buses.

              most of which have 8+ GB now.

              Still not enough for a big model.
              20+ is generally needed.

              So ya, unified memory lets you have big fucking VRAM.
              In a perfect world, you would have big discrete VRAM, and it would be connected to host memory at something other than a fucking abysmally bad 32GB/s, but we aren't there yet, so until we are, there are certain applications where "Unified Memory
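
For a sense of scale on the bus argument, the sketch below estimates how long it takes just to move model data across a link at the 32GB/s figure quoted above. It is a best-case calculation that assumes the full quoted bandwidth is sustained and ignores latency and protocol overhead; the function name is made up for illustration.

```python
PCIE_GB_PER_S = 32  # host-to-VRAM bandwidth figure quoted in the thread


def transfer_seconds(size_gb, bandwidth_gb_per_s=PCIE_GB_PER_S):
    """Best-case time to move size_gb across the bus, ignoring overhead."""
    return size_gb / bandwidth_gb_per_s


# 8 GB (a common VRAM size) and 20 GB (the "big model" size mentioned above).
for size_gb in (8, 20):
    print(f"{size_gb} GB over {PCIE_GB_PER_S} GB/s: "
          f"{transfer_seconds(size_gb):.3f} s")
```

A one-off load of that size is cheap. The cost matters when a model does not fit in VRAM and data has to cross the bus repeatedly during a run, which is the situation the commenters are arguing about.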

              • It's not the unified memory which helps, just the size of memory pool accessible to the GPU. The CPU and GPU don't cooperate in any meaningful way which require unification. Plenty of Mx platforms have 8 GB, the networks can run equally well on GPUs behind PCIe with 8 GB.

                • It's not the unified memory which helps, just the size of memory pool accessible to the GPU.

                  That's a nonsensical claim. You can't separate those two things.

                  The CPU and GPU don't cooperate in any meaningful way which require unification.

                  Unified memory doesn't mean that the CPU and GPU cooperate. It means that GPU and CPU share cache-coherent memory.
                  This means that you can have a larger pool of VRAM available for your device (since every PC involved in a discussion like this has vastly more RAM than VRAM), and that the latency for a transfer of ownership for a page is whatever the latency of RAM is.

                  Plenty of Mx platforms have 8 GB

                  Ya, they suck. What is your point?

                  the networks can run equally well on GPUs behind PCIe with 8 GB.

                  Only because the bottleneck there is the disk f

                  • No, the bottleneck is the size of the memory pool available to the matrix compute.

                    My point was forcing you to "imagine". The moment you did, you acquiesced that in the non-imagined scenarios it's not the unification which is relevant. Higher level descriptions of perceptron+magic networks can be compiled to any compute platform with sufficient memory. Unification of the memory with a general purpose CPU is not needed.

                    It serves Apple's purpose to have a proprietary description and compiler not to leverage un

                    • No, the bottleneck is the size of the memory pool available to the matrix compute.

                      Depends on the memory model.
                      In "high utilization" memory model, the amount of VRAM available is indeed the bottleneck.
                      Unless you have a $4000 GPU, you have more system RAM than you have VRAM. Unified memory helps you here.
                      In the "low" memory model, the bus between main RAM and VRAM is the bottleneck.
                      For PCIe, that's 32GB/s. For Unified Memory, it's however long it takes you to tell the GPU the pointer to its RAM.

                      My point was forcing you to "imagine". The moment you did, you acquiesced that in the non-imagined scenarios it's not the unification which is relevant. Higher level descriptions of perceptron+magic networks can be compiled to any compute platform with sufficient memory. Unification of the memory with a general purpose CPU is not needed.

                      Non-imagined? You can't even intellectually honestly conduct this de

                    • I didn't have to decide what is imagined, you brought it up. Your high bandwidth matrix compute memory is 64 GB, which is the only thing of relevance to inference outside of imagined scenarios.

                      I didn't even comment on Apple marketing, I commented on someone here saying they are making their own language to "leverage the unified memory model". Which is not a lie, just a misunderstanding. Unified memory, is simply not relevant. The only thing which is relevant is the amount of memory available to the matrix-c

      • although I'm yet to see how this new stuff on apple silicon fares

        Pretty fucking incredibly, actually [slashdot.org], at least for a laptop (or a non-tensor accelerated GPU, which means any AMD GPU right now, until they get RDNA3 AI accelerators working).

      • "Unfortunately you bought into the apple ecosystem at the worst possible time"

        Yes and no? Intel is still very common for Windows.

        Isn't the main problem with my computer the GPU? If it had had a similar NVidia GPU, could I have used it with reasonable success/performance? Why does AMD not support Python's AI libraries, or why does the Python AI community not support AMD GPUs?

        • by tonywong ( 96839 )
          AMD lost big money and was uncompetitive for many years in the CPU and then the GPU space. AMD was/is fighting a 2 front war against hypercompetitive Intel and Nvidia.

          Nvidia hasn't misstepped badly like Intel and keeps a significant lead over AMD, who is now only 1 step behind team green in GPU hardware but 2-3 steps behind in software. CUDA and ML support has been nurtured for years by Nvidia while AMD was nonexistent. An entire AI/ML ecosystem has grown up around Nvidia as a result.

          ROCm exists but support
      • Would installing Linux or Windows on my Intel-MBP16 make a difference? Or is the problem AMD, independently of the OS?

    • Looks like the machine isn't suitable for what your new use-case is. MBP tend to hold their value, so take a look at what you could get for it and how close that is to the price of a new M-series machine. I think that you could de-spec significantly and end up with better performance for ML.

  • Both the github and pypi repo appear to be completely detached from Apple as a company. There's at least one of their ML engineers involved, but that's about it.
  • Since I am already using Whisper on macOS, which up to now studiously ignored the ML features of Apple silicon, I am really excited to hear about this. About time.
  • It's inspired by the toolkits people want to use.

    Instead of contributing unified memory support to one of those, they created their own.

    Noticed how that's worked out with Metal? Get ready for more of the same.
