34 Design Flaws in 20 Days of Intel Core Duo

Posted by CmdrTaco on Tue Jan 24, 2006 12:26 PM
from the what-am-i-designing-this-now dept.
Pray_4_Mojo writes "Geek.com is reporting that Intel's errata (bug) documentation shows that the Intel Core Duo chip has 34 known issues found in the 20 days since the launch of the iMac Core Duo (you can read the list), with plans to fix only one of them. While bugs in hardware are nothing new (the P4 has 64 known issues, at this time Intel does not plan to fix a single one) this marks one of the first times that Intel released a processor with known bugs, and some of the bugs are of higher severity than in the past. Also alarming is the rate the flaws have been found, at one and a half per day since the launch of the iMac Core Duo."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

  • Up front (Score:5, Interesting)

    by emerrill (110518) on Tuesday January 24 2006, @12:28PM (#14548936)
    I just think it means that Intel is being more honest about the problems, rather than hiding them until others find them.
  • "one of the first times"? (Score:5, Insightful)

    by sczimme (603413) on Tuesday January 24 2006, @12:31PM (#14548970)

    this marks one of the first times that Intel released a processor with known bugs

    No: either it is the first time or it is not. There can be only one... first time.

    and some of the bugs are of higher severity then in the past

    then != than

    • Re:"one of the first times"? (Score:5, Insightful)

      by Golias (176380) on Tuesday January 24 2006, @12:40PM (#14549085)
      this marks one of the first times that Intel released a processor with known bugs

      No: either it is the first time or it is not. There can be only one... first time.


      I disagree with the mod who marked you "Off-topic." It may look like you are just being a grammar nazi, but you raise a valid point.

      Saying "this marks one of the first times that Intel released a processor with known bugs" is pretty much the same as saying, "this is not the first time that Intel has released a processor with known bugs, but I want it to sound like alarmingly bad news for Apple."
      [ Parent ]
  • 20 days? (Score:5, Insightful)

    by Anonymous Coward on Tuesday January 24 2006, @12:32PM (#14548983)
    It's a little dishonest to use the phrasing "Core Duo chip has 34 known issues found in the 20 days since the launch of the iMac Core Duo."

    Most of these bugs were found well before the release of Core Duo. Many of the bugs are listed as having been observed by Intel only. That means the verification teams did hit these issues, either with a very bizarre code setup or by doing something that's probably not technically legal anyway. The odds of seeing most of them on an end-user platform are very low.
      • Re:20 days? (Score:5, Informative)

        by Anonymous Coward on Tuesday January 24 2006, @12:47PM (#14549152)
        And AMD has no bugs in their chips? Here's the Athlon 64 Revision History document off of AMD's own website:

        http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25759.pdf [amd.com]

        There's a lot more listed there than for the Core Duo so far, and quite a few are marked as "Won't be Fixed" and sound scary. Here's an example of a rather nasty-looking ordering bug that results in a system hang:

        Downstream non-posted requests to devices that are dependent on the completion of an upstream
        non-posted request can cause a deadlock in the presence of transactions resulting in bus locks, as shown in the following two scenarios:

        1. A downstream non-posted read to the LPC bus occurs while an LPC bus DMA is in progress. The legacy LPC DMA blocks downstream traffic until it completes its upstream reads.

        2. A downstream non-posted read is sent to a device that must first send an upstream non-posted read before it can complete the downstream read.

        In both cases, a locked transaction causes the upstream channel to be blocked, causing the deadlock condition.

        Potential Effect on System
        The system fails due to a bus deadlock.
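
One way to picture scenario 2 of the quoted erratum is as a cycle in a wait-for graph. The Python sketch below is only an illustration: the node names are made up, and the last edge (the locked transaction waiting on the downstream read) is an assumption added to close the loop, not something the AMD text spells out.

```python
# Hypothetical wait-for graph for scenario 2 of the quoted erratum.
# An edge A -> B means "A cannot finish until B finishes".
WAITS_FOR = {
    # The device must complete an upstream read before answering downstream.
    "downstream_read": ["upstream_read"],
    # The upstream channel is blocked while the bus lock is held.
    "upstream_read": ["locked_transaction"],
    # Assumption: the lock is not released until downstream traffic drains.
    "locked_transaction": ["downstream_read"],
}


def find_cycle(graph):
    """Return one cycle as a list of nodes, or None if the graph is acyclic."""
    visiting = []   # nodes on the current depth-first path, in order
    done = set()    # nodes fully explored without finding a cycle

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Reached a node already on the current path: that is a cycle.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(WAITS_FOR))
```

Any cycle found this way means nobody in the loop can make progress, which is the "system fails due to a bus deadlock" outcome the erratum describes.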
        [ Parent ]
  • AMD errata (Score:5, Informative)

    by Anonymous Coward on Tuesday January 24 2006, @12:33PM (#14548992)
    Revision Guide for AMD Athlon™ 64 and AMD Opteron™ Processors [amd.com]. Just for balance. (only two of them are really interesting, #113 is one of them IIRC)
  • First time with BUGs?!?! (Score:5, Informative)

    by Ninja Programmer (145252) on Tuesday January 24 2006, @12:34PM (#14549011) Homepage
    ... While bugs in hardware are nothing new (the P4 has 64 known issues, at this time Intel does not plan to fix a single one) this marks one of the first times that Intel released a processor with known bugs, ...


    Huh? That's clearly wrong. When Intel had its famous FDIV bug, they shipped it knowing that the problem was there (the chips were already manufactured before they noticed it in their internal design validation.) In fact I would highly doubt that any Intel chip (or AMD chip) has shipped without some known bugs in it.

    It's just a question of severity. Most of these bugs tend to be highly marginal in a "real software doesn't push that hard on the CPU" sense.
  • Why is this an Apple issue? (Score:5, Informative)

    by toupsie (88295) on Tuesday January 24 2006, @12:35PM (#14549020) Homepage
    Apple is not the only manufacturer using the Core Duo [notebookreview.com] chip [google.com].
  • Oh that's it! (Score:5, Funny)

    by catahoula10 (944094) on Tuesday January 24 2006, @12:36PM (#14549033)
    Why does Apple want to use an intel chip?

    Oh, that's right:
    Microsoft Owns Apple.

    How can we tell?

          1. Apple's stock only rose 25% last week.
          2. Bill Gates's birthday now a paid holiday for Apple employees.
          3. Default Mac startup sound changed to "Taps."
          4. Wall Street brokers have stopped using Apple stock certificates as toilet paper.
          5. Apple's new slogan: "Almost as good as Windows!"
          6. Apple has been bent over with its pants dropped for so long now, even a geek like Bill Gates was bound to get lucky.
          7. Cute rainbow-colored apple now inhabited by cute rainbow-colored worm.
          8. Microsoft comes out with an operating system incorporating Mac technology ... uh, wait a minute ...
          9. Phone and utilities mysteriously start working again at Apple's corporate HQ.
        10. Steve Jobs seen tending bar at the Gates' private lawn party.
        11. Diners in Microsoft's staff cafeteria can now enjoy their apple pie purely for its wholesome goodness and no longer as a symbolic act of global domination.
        12. Unsold Newtons used as cobblestones in Gates's driveway.
        13. Apple Employee of the Month gets to hunt loose change at Bill's house.
        14. New Apple employee dress code includes large "Property of B. Gates" tattoo on ass.
        15. Bill Gates still burned in effigy, but upper management no longer attends.

    (http://www.ehumorcentral.com/Directory/Jokes/838.html [ehumorcentral.com])

    I like #7 and #11 myself :-)

  • by tlhIngan (30335) <[slashdot] [at] [worf.net]> on Tuesday January 24 2006, @12:36PM (#14549035)
    It's called "errata", and it's common for most processors to be released with pages and pages and pages of errata.

    Of course, what happens is that the alpha/beta silicon ships to select customers without many errata (though internal testing often finds them too, and they ship with those). Then the manufacturer goes back, resolves a few, then the cycle repeats until everyone is happy with the bugs and it's released with a book of errata on them, and workarounds for the severe ones.

    "No fix" errata are common. The most serious of those have workarounds. Fixed errata are for things where there can be no possible software workaround. But there's a large number of varying severity - from cache incoherences, lock failures (you try to lock something, and it either can't be unlocked the usual way, or it doesn't reliably indicate lock), to bus and spec violations.

    Nothing new here...
  • by shawnce (146129) on Tuesday January 24 2006, @12:44PM (#14549127) Homepage
    Not sure I understand the point of this news article... all chips have errata. This is like reporting that the sun set again or that slashdotters have no love life.

    For example...

    The MPC7410 family of chips (aka G4) from Freescale (formerly part of Motorola) has 21 errata currently listed: MPC7410CE.pdf [freescale.com]

    The MPC7447 family of chips (aka G4) from Freescale has 36 errata currently listed: MPC7457CE.pdf [freescale.com]

    The PPC 970FX (aka G5) from IBM has 24 errata currently listed: 970fx_errata_dd3.x_v1.6.pdf [ibm.com]

  • It's normal to not fix silicon bugs (Score:5, Informative)

    by Theovon (109752) on Tuesday January 24 2006, @12:48PM (#14549160)
    As an ASIC designer, I have produced my fair share of silicon bugs. Chips are expensive to produce, making bugs expensive to fix. As a result, chip designers (even ones with deep pockets like Intel) do not look at bugs as something to FIX, but rather as something to MASK. I don't mean to hide it from people (although that does happen), but to make it not a bug by working around it.

    Unless the bug is so fatal that you can't work around it, or the bug could potentially cost lives, the primary solution is to work around it. Either you write driver code to avoid the bug, or you find some other cheap solution. Sometimes, it's a simple matter of removing a feature from your marketing literature.
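
As a rough illustration of "write driver code to avoid the bug", here is a minimal Python sketch. Everything in it is hypothetical: the revision names, the idea that a fast copy path is the buggy feature, and the function names are invented for the example and do not describe any real chip or driver.

```python
# All names here are hypothetical; no real chip or driver is being described.
AFFECTED_REVISIONS = {"A0", "A1"}  # silicon revisions assumed to carry the bug


def fast_copy(data):
    """Path that would use the (assumed buggy) hardware feature."""
    return list(data)


def safe_copy(data):
    """Slower path that never touches the buggy feature."""
    out = []
    for item in data:
        out.append(item)
    return out


def pick_copy(revision):
    """Choose the code path once, based on the detected silicon revision."""
    return safe_copy if revision in AFFECTED_REVISIONS else fast_copy


if __name__ == "__main__":
    copy = pick_copy("A1")
    print(copy.__name__, copy([1, 2, 3]))
```

The point of the design is that the silicon is left alone: affected revisions simply never exercise the broken feature, and later revisions get the fast path back without a driver rewrite.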

    Intel's typical means to mask processor bugs is microcode. This hurts performance, but they can typically create a workaround that routes everything around the bug. I can't read the article (it's slashdotted), but I'm sure that by saying they won't fix some bugs, they're saying that they won't respin the silicon but rather mask the bug in some other way.

    Listing the bugs (and not fixing them in this version) is an appropriate thing for Intel to do.

    (I'm no Intel fanboy. I think they're bastards. But this is NOT an example of them being bastards.)
  • I like this comment (Score:5, Funny)

    by jm91509 (161085) on Tuesday January 24 2006, @01:01PM (#14549295) Homepage
    AE 16:

    Show-stopper but only observed by Intel so far. Also, any OS developer who codes like this deserves this one.
    • Re:Faster (Score:5, Funny)

      by Golias (176380) on Tuesday January 24 2006, @12:34PM (#14549003)
      Shh!!! You're ruining perfectly good FUD!
      [ Parent ]
    • Re:Faster (Score:5, Funny)

      by adrianmonk (890071) on Tuesday January 24 2006, @12:50PM (#14549192)
      Maybe they're just getting faster/better at finding bugs?

      Yeah, I hear they're 2 to 3 times as fast now on the most important bug finding benchmarks.

      [ Parent ]
    • Re:Faster (Score:5, Insightful)

      by Surt (22457) on Tuesday January 24 2006, @12:52PM (#14549210) Homepage Journal
      It seems likely that given the increasing complexity, the error rate is going to rise proportionally. I mean, how many errors do you expect in a 100,000 transistor chip vs a 100,000,000 transistor chip?
      [ Parent ]
      • Re:Faster (Score:5, Informative)

        by VitaminB52 (550802) on Tuesday January 24 2006, @01:38PM (#14549693)
        It seems likely that given the increasing complexity, the error rate is going to rise proportionally. I mean, how many errors do you expect in a 100,000 transistor chip vs a 100,000,000 transistor chip?

        Given the fact that a very substantial part of the extra chip real estate is being used as L1 and L2 cache, the error rate should increase less than proportionally. If you upgrade cache size from say 8 kB to 1 MB, then there is only a relatively small increase in the complexity of the cache controller, not of the cache itself.
        Add in the new chip design software and the use of hardware libraries for standard chip functionality, and the error rate should increase even more slowly.
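
To put rough numbers on that argument, here is a minimal Python sketch. It assumes a classic 6-transistor SRAM cell per cache bit and ignores tags, ECC and decoders. The 100,000 and 100,000,000 transistor totals come from the parent comment; the cache-free small chip and the 1 MB cache on the large chip are assumptions for illustration only.

```python
TRANSISTORS_PER_SRAM_BIT = 6  # assumed 6T cell; tags, ECC, decoders ignored


def cache_transistors(size_bytes):
    """Approximate transistor count of the cache data array alone."""
    return size_bytes * 8 * TRANSISTORS_PER_SRAM_BIT


def logic_transistors(total, cache_bytes):
    """Transistors left over for non-cache logic, where design bugs tend to live."""
    return total - cache_transistors(cache_bytes)


small = logic_transistors(100_000, 0)              # hypothetical chip, no cache
big = logic_transistors(100_000_000, 1024 * 1024)  # hypothetical chip, 1 MB cache

if __name__ == "__main__":
    print("total transistor ratio:", 100_000_000 // 100_000)
    print("logic transistor ratio:", big / small)
```

Under these assumptions about half of the big chip is regular cache array, so the logic that actually needs validating grows by roughly 500x rather than 1000x.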

        [ Parent ]
    • It's because (Score:5, Funny)

      by RealProgrammer (723725) on Tuesday January 24 2006, @12:57PM (#14549262) Homepage Journal
      ... for the first time, they're releasing the chip for a stable OS first.

      It used to be that testers only had an unstable testbed OS (designed primarily to run the same company's office suite) to use for validation. Testers were never quite sure before where the blue screens, lockups, funny noises, and billowing smoke actually originated.

      (Relax, it's just a joke).
      [ Parent ]
          • Re:Faster (Score:5, Insightful)

            by Golias (176380) on Tuesday January 24 2006, @01:19PM (#14549480)
            What I am saying is that in general, what's the use of getting better and faster at finding bugs if there aren't plans to fix it?

            Because the purpose of finding silicon bugs is almost never to fix them. Fixing CPU bugs is often impractical. You find the flaws so you can route around them. This is the case with every consumer chip on the market, including the one you are using to read this right now.
            [ Parent ]
    • Re:A flawed design kept alife. (Score:5, Insightful)

      by TheRaven64 (641858) on Tuesday January 24 2006, @12:42PM (#14549100) Homepage Journal
      Not quite the same. All that has been kept the same is the interface, not the implementation. It's the equivalent to having to keep an API/ABI stable. It can cause problems (see the WMF features for more information), but it's also often useful - Win3.0 apps running on Windows XP, for example, or UNIX code from the '80s compiling and running on Linux / BSD.

      The problem with x86 comes from the fact that a large number of instructions interact in relatively complex ways with others. Changing a small amount of silicon can change a side-effect of an instruction, which is then a bug. An ISA such as Alpha eliminated this by keeping inter-instruction interactions to a minimum (no condition registers, etc).
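
A toy way to see the inter-instruction interaction point: model each instruction as the set of state it reads and writes, then count which pairs touch shared state. This Python sketch is a deliberate simplification; the three-instruction programs and register names are made up, and real x86 and Alpha semantics are far richer.

```python
from itertools import combinations

# Each instruction is (name, reads, writes). Both programs are hypothetical.
X86_LIKE = [
    ("add r1, r0", {"r0", "r1"}, {"r1", "flags"}),  # arithmetic also writes flags
    ("sub r2, r0", {"r0", "r2"}, {"r2", "flags"}),
    ("jz target", {"flags"}, set()),                # branch reads implicit flags
]

ALPHA_LIKE = [
    ("addq r1, r0", {"r0", "r1"}, {"r1"}),          # no condition register
    ("subq r2, r0", {"r0", "r2"}, {"r2"}),
    ("beq r2, target", {"r2"}, set()),              # branch names its register
]


def interacts(a, b):
    """True if a and b share state that at least one of them writes."""
    _, a_reads, a_writes = a
    _, b_reads, b_writes = b
    return bool(a_writes & (b_reads | b_writes)) or bool(a_reads & b_writes)


def count_interactions(program):
    """Count instruction pairs that interact through shared state."""
    return sum(1 for a, b in combinations(program, 2) if interacts(a, b))


if __name__ == "__main__":
    print("x86-like:", count_interactions(X86_LIKE))
    print("alpha-like:", count_interactions(ALPHA_LIKE))
```

With an implicit flags register every pair in the toy x86-like program interacts, while the explicit-register version only couples the subtract and the branch that names its result.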

      [ Parent ]
        • Re:Should've gone with AMD (Score:5, Informative)

          by freidog (706941) on Tuesday January 24 2006, @12:50PM (#14549193)
          Here you go [amd.com]

          I didn't bother to actually count the number of unfixed or no fix planned glitches / bugs in there, so I don't know if it actually validates the 80+ the grandparent claimed, but there are quite a few known bugs in A64 and its HTT bus.

          In fact, any CPU released, even stuff like POWER / Itanium / UltraSPARC, is going to have errata like this. Microprocessors are incredibly complex equipment, and 100% stable and glitch-free under all possible conditions just isn't going to happen. Whoever submitted this story is blowing this entirely out of proportion. The link is already Slashdotted so I haven't gotten a chance to read what the bugs / glitches are, but I would bet good money a normal user could go through the entire life of their Core Duo Mac and never notice one. These are typically very small glitches / bugs that occur under very specific conditions, and are meant more for hardware manufacturers to be aware of than they are to warn a user there could be problems with their chips.

          Publishing them publicly is, I think, a good move on Intel's part, but they do run the risk that people won't understand that this is a completely and utterly ordinary and expected thing to happen.

          [ Parent ]
    • Re:No buy (Score:5, Informative)

      by mr100percent (57156) * on Tuesday January 24 2006, @01:00PM (#14549285) Homepage Journal
      All chips have errata, which are customarily well documented and published on the vendor's web site. BTW, errata can be something as simple as a correction to the datasheet. Most are usually minor and are dealt with by the compiler. For example, if there's an error with calculations dealing with a certain register and decimal values, the compiler would just not use that register for the calculations.
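
The compiler-side workaround described here can be sketched as a register allocator that simply skips the problem register. This is a toy Python model under stated assumptions: the four-register file and the choice of `r2` as the affected register are invented, and a real compiler would spill to memory instead of raising an error.

```python
ALL_REGISTERS = ["r0", "r1", "r2", "r3"]  # hypothetical register file
ERRATA_REGISTERS = {"r2"}                 # assumed to misbehave for some values


def allocate(variables, avoid=ERRATA_REGISTERS):
    """Map each variable to a register, never handing out an avoided one."""
    usable = [reg for reg in ALL_REGISTERS if reg not in avoid]
    if len(variables) > len(usable):
        # A real compiler would spill to memory here instead of failing.
        raise ValueError("not enough safe registers")
    return dict(zip(variables, usable))


if __name__ == "__main__":
    print(allocate(["a", "b", "c"]))
```

The cost of this kind of workaround is one fewer usable register, which is usually far cheaper than respinning the chip.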

      The documented and known errata are not what you should be concerned with. It's the unknown ones that freeze your computer or cause all robots to attack their masters.

      If someone's complaining about this, they should just turn off their computers, because as we ALL know, every operating system (the OS is what runs on chips that have the errata) also ships with hundreds, if not thousands, of known bugs. You're not going to find a perfect chip in the real world. How many errata did the G4/G5 have? By comparison the IBM PowerPC 970FX has 24 errata, none of which is planned for a fix. When you consider the 970FX is a fairly mature chip, 34 errata on a new chip is hardly newsworthy. As transistors get more and more compact and miniaturized, I'm sure we're bound to see more.
      [ Parent ]
    • "85 pages" is a misleading comment. (Score:5, Informative)

      by wild_berry (448019) on Tuesday January 24 2006, @01:43PM (#14549743) Journal
      Your comment is misleading. The document lists only 61 errata and contains their respective details. The initial table of errata -- table 5 -- is only four pages long (begins 13 and ends 16) and is most likely to group the problems by the wafer families; the next two pages reiterate the errata for each given brand name of AMD K7/K8 chip; all but one of the remaining pages detail the errata and their suggested workarounds/fixes. The last page is a list of extra resources.

      I don't dispute your comment regarding the experience of a chipset designer.
      [ Parent ]