
DeepSeek-V3 Now Runs At 20 Tokens Per Second On Mac Studio

An anonymous reader quotes a report from VentureBeat: Chinese AI startup DeepSeek has quietly released a new large language model that's already sending ripples through the artificial intelligence industry -- not just for its capabilities, but for how it's being deployed. The 641-gigabyte model, dubbed DeepSeek-V3-0324, appeared on AI repository Hugging Face today with virtually no announcement (just an empty README file), continuing the company's pattern of low-key but impactful releases. What makes this launch particularly notable is the model's MIT license -- making it freely available for commercial use -- and early reports that it can run directly on consumer-grade hardware, specifically Apple's Mac Studio with M3 Ultra chip.

"The new DeepSeek-V3-0324 in 4-bit runs at > 20 tokens/second on a 512GB M3 Ultra with mlx-lm!" wrote AI researcher Awni Hannun on social media. While the $9,499 Mac Studio might stretch the definition of "consumer hardware," the ability to run such a massive model locally is a major departure from the data center requirements typically associated with state-of-the-art AI. [...] Simon Willison, a developer tools creator, noted in a blog post that a 4-bit quantized version reduces the storage footprint to 352GB, making it feasible to run on high-end consumer hardware like the Mac Studio with M3 Ultra chip. This represents a potentially significant shift in AI deployment. While traditional AI infrastructure typically relies on multiple Nvidia GPUs consuming several kilowatts of power, the Mac Studio draws less than 200 watts during inference. This efficiency gap suggests the AI industry may need to rethink assumptions about infrastructure requirements for top-tier model performance.
"The implications of an advanced open-source reasoning model cannot be overstated," reports VentureBeat. "Current reasoning models like OpenAI's o1 and DeepSeek's R1 represent the cutting edge of AI capabilities, demonstrating unprecedented problem-solving abilities in domains from mathematics to coding. Making this technology freely available would democratize access to AI systems currently limited to those with substantial budgets."

"If DeepSeek-R2 follows the trajectory set by R1, it could present a direct challenge to GPT-5, OpenAI's next flagship model rumored for release in coming months. The contrast between OpenAI's closed, heavily-funded approach and DeepSeek's open, resource-efficient strategy represents two competing visions for AI's future."
  • by OrangeTide ( 124937 ) on Tuesday March 25, 2025 @06:42PM (#65259225) Homepage Journal

    The Mac 512K had an introductory price of $3,195 (equivalent to $9,670 in 2024). I think the collapse of home computer prices in the 1990s and 2000s has altered what we think a reasonable price for "consumer hardware" is.

    • by quenda ( 644621 )

      The Mac 512K was hardly "consumer hardware", unless you lived in a very wealthy area.
      I only saw it in businesses and universities. Consumers had a Commodore 64 if they were lucky.

    • Were you making a joke reference?
      They specifically referred to a Mac Studio with M3 Ultra chip.

    • I still think it is a miracle that for a few C-notes, one can pick up a generic mini PC for gaming. I think about how expensive something like the Macintosh IIfx was, with a price tag over $10,000... in 1990s dollars, and it was completely obsoleted by cheaper Quadras a year or two later.

      It was a decade where, on average, $2,500 was the "sweet spot" for a computer, and that was pretty much a basic home machine. However, the one thing those machines often had which current ones don't was a tape drive, so one could do tape backups.

      • My $200 smartphone can emulate DOS faster than my 486DX2-66. The power of Moore's law (and Taiwanese chip manufacturing).

        The 1990s were a terrible time to invest in a high-end workstation, and a great time to chase the PC upgrade train. Between the falling prices and the speed bumps, you could get an affordable computer that was faster than last year's but also obsolete in 18-24 months.

        I remember the Beyond 2000 TV show promising holographic storage in crystals. We should be able to store an exabyte in write-once

      • You can order all that online.
        The simplest is probably a Blu-ray writer.

        Regarding USB/Thunderbolt tape drives, it is probably only a matter of money; just google them.

  • Is 66 English characters a second fast?

    How would you feel about a 66 BAUD modem?

    • Re:66 BAUD (Score:5, Interesting)

      by quenda ( 644621 ) on Tuesday March 25, 2025 @07:44PM (#65259333)

      "baud" , named after Émile Baudot, is bits per second, not bytes.
      Any yes, 20 tokens per second is good for native inference. I don't know where you get 20 tokens = 66 char from, but it sounds reasonable, and a lot faster than people read.
      This is a machine smarter than most people at a huge number of tasks, for the price of a used car, and you are complaining about ... what?

      • Something ignorant confusing a baud rate with characters-per-second, at first glance. Wouldn't pay it too much mind. If he had at least done it with bits, I would have just helpfully educated him on the difference.
      • That's not right. Baud can mean either 1 bit per second or 1 byte per second depending on the modem.
        • FSK: 1 bit per baud.
          PSK: 2 bits per baud.
          Then they got into encodings with 4 bits/baud, and then they started using compression.

      • Re: (Score:3, Insightful)

        Nope.
        It is steps (signal changes) per second; it has nothing to do with bits or bytes per se.
        For example, if you can transfer 4 bits with one signal change, and can generate and recognize 1000 signal changes (steps) per second, then you are transmitting at 1000 baud, but 4000 bits per second.

        That is why high-end modems on 4 kHz phone lines ran at about 4000 baud but transferred roughly 20 kbit/s.

      • by gdm ( 97336 )

        "baud" , named after Émile Baudot, is bits per second, not bytes.

        The baud rate is a measure of 'symbols' per second, where a symbol might encode more than one bit, in which case the baud rate is lower than the bit rate. It's a form of denser signalling, where a single state change on the wire can carry several bits, something regularly used by dial-up modems in the good old days.
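        Concretely, the relationship this thread is circling around is bit rate = baud (symbols per second) x bits per symbol. A small Python sketch (the modem figures are illustrative, roughly matching V.34-era dial-up):

            # Bit rate follows from symbol rate (baud) and bits encoded per symbol.
            def bit_rate(baud: float, bits_per_symbol: float) -> float:
                return baud * bits_per_symbol

            print(bit_rate(1000, 4))    # 1000 baud at 4 bits/symbol -> 4000 bit/s
            print(bit_rate(3429, 9.8))  # roughly V.34 dial-up: ~33,600 bit/s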

    • Whether I got the capitalization wrong or not, whatever. Baud is one change in a carrier signal per second, kinda similar to a change in tokens in a token stream. It was never about bits or bits per second, or about equating characters to bits.

      According to this API doc, 1 English character ≈ 0.3 token:
      https://api-docs.deepseek.com/... [deepseek.com]

      At 20 tokens per second and 0.3 tokens per English character, that works out to about 66.7 English characters per second.
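      That arithmetic, spelled out in Python (the 0.3 tokens-per-character figure is the approximation from the DeepSeek API docs linked above):

          # Characters/second implied by a token rate, using DeepSeek's rough ratio.
          tokens_per_second = 20
          tokens_per_char = 0.3
          chars_per_second = tokens_per_second / tokens_per_char
          print(round(chars_per_second, 1))  # 66.7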

  • by kwerle ( 39371 ) <kurt@CircleW.org> on Tuesday March 25, 2025 @07:24PM (#65259287) Homepage Journal

    I'm not up on my AI jargon. How should I feel about 20 tokens/second? What does that mean to me as some kind of user?

    • This is the one thing that needed to be included to be valuable.

      That said, I've been running the distilled 14B model, and it runs favourably on my M1 Max Pro with 32GB.

    • Re:20 tokens/second (Score:4, Informative)

      by quenda ( 644621 ) on Tuesday March 25, 2025 @08:00PM (#65259363)

      20 tokens/sec is faster than you can read, so it is very usable.
      And note that this is not a reasoning model, so you won't be waiting ages for it to start the response proper.

    • o1 is $15 per million tokens. I'd stick with o1.

      • At 20 tokens/second you generate about 630M tokens a year, which at $15 per million tokens is worth about $9,450, just about the price of the desktop computer you need to run this.

        And while it is true that o1 is better than DeepSeek, it is also true that $15 is a heavily subsidised price. I'm sure it costs OpenAI more than $10k to run o1 for a full year, not to mention electricity costs.
        The point being that AI can be commodified, in the sense of enabling small outfits to buy a bunch of servers and start competing with the big guys.
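        The back-of-envelope math above, spelled out (using the commenter's figures; real-world duty cycle and API prices will vary):

            # Value of a year of nonstop 20 tokens/sec output at $15 per 1M tokens.
            tokens_per_sec = 20
            seconds_per_year = 60 * 60 * 24 * 365
            tokens_per_year = tokens_per_sec * seconds_per_year  # ~630.7M tokens
            api_value = tokens_per_year / 1_000_000 * 15         # ~$9,460
            print(f"{tokens_per_year / 1e6:.0f}M tokens/year, ${api_value:,.0f}")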

        • Pricing is definitely not subsidized, since they offer deep batch discounts, but yeah, it's not clear yet how they will make money.

    • Well, I'm not quite sure either, but let me tell you what I experienced on an older Intel iMac Pro (2017).

      I loaded up the largest model possible just to see what it would do... I entered some initial question, I forget what, and then got about 10-20 minutes of a "thinking" message.

      Then, I got... an "H".

      A few minutes later... an "I".

      Yeah, it took about 30 minutes to begin a message with "Hi". I gave up after a few hours.

      So 20 tokens a second is sounding pretty good compared with that!

    • by allo ( 1728082 )

      It's a bit faster than you can read.
      On the other hand, R1 (built upon V3 but as the thinking alternative and not the successor) creates a wall of text of reasoning before the answer part (even though the reasoning is often helpful to read), which introduces some waiting time if you have "only" 20 T/s. Still quite good for running such a large model on hardware you can afford.

    • A token is a portion of a word. It doesn't equate to individual letters, nor entire words, but something in between. The average is around 0.75 words per token, so four "average" words take about five tokens.

      So 20 tokens per second is perfectly fine for a single person interactively chatting with the LLM. If you're doing any sort of larger data processing (feeding in large documents, outputting large documents, or multiple users) it's pretty slow.
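      Putting those numbers together, a quick sketch (0.75 words per token is the rule of thumb above; the ~250 words/minute reading speed is an added assumption):

          # Convert generation speed in tokens/sec to words/min, compared to reading.
          words_per_token = 0.75
          tokens_per_sec = 20
          words_per_min = tokens_per_sec * words_per_token * 60  # 900 words/minute
          print(words_per_min / 250)  # ~3.6x faster than a typical reader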

  • by Kunedog ( 1033226 ) on Tuesday March 25, 2025 @08:01PM (#65259367)
    Thanks for the TPS Report, /.
  • OpenAI's closed, heavily-funded approach.

    What's in a name?
