
Measuring Fragmentation in HFS+ 417

keyblob8K writes "Amit Singh takes a look at fragmentation in HFS+. The author provides numbers from his experiments on several HFS+ disks, and more interestingly he also provides the program he developed for this purpose. From his own limited testing, Apple's filesystem seems pretty solid in the fragmentation avoidance department. I gave hfsdebug a whirl on my 8-month-old iMac and the disk seems to be in good shape. I don't have much idea about ext2/3 or reiser, but I know that my NTFS disks are way more fragmented than this after similar amount of use."
  • HFS+ defrag source (Score:5, Informative)

    by revscat ( 35618 ) * on Wednesday May 19, 2004 @01:06PM (#9196458) Journal
    As mentioned in the article, HFS+ does defragging on the fly when files are opened if they are less than 20MB. The source code for this is available here [arstechnica.com], as is a discussion about it that contains input from some Darwin developers.
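    For the curious, here is a minimal C sketch of what such an on-open check could look like. The 20MB figure comes from the article; the journaling, uptime and extent-count conditions are my reading of the linked discussion, not a copy of Apple's code, and every name below is made up.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical model of an on-open "relocate if small and fragmented"
       check.  Only the 20MB cutoff is taken from the article; the other
       conditions are assumptions. */
    struct file_info {
        uint64_t size_bytes;    /* logical size of the data fork */
        int      extent_count;  /* number of on-disk extents */
    };

    struct volume_info {
        bool journaled;         /* journaling enabled on the volume */
        bool read_only;         /* volume mounted read-only */
        int  uptime_seconds;    /* time since boot */
    };

    #define MAX_DEFRAG_SIZE (20u * 1024 * 1024)   /* 20MB limit from the article */

    static bool should_relocate_on_open(const struct file_info *f,
                                        const struct volume_info *v)
    {
        if (v->read_only || !v->journaled)    return false;
        if (v->uptime_seconds < 3 * 60)       return false;  /* assumed settle time */
        if (f->size_bytes >= MAX_DEFRAG_SIZE) return false;  /* only small files */
        if (f->extent_count <= 1)             return false;  /* already contiguous */
        return true;
    }

    int main(void)
    {
        struct volume_info vol = { true, false, 600 };
        struct file_info small_fragmented = { 5u * 1024 * 1024, 12 };
        struct file_info large_fragmented = { 700u * 1024 * 1024, 40 };

        printf("5MB, 12 extents  : %s\n",
               should_relocate_on_open(&small_fragmented, &vol) ? "relocate" : "leave");
        printf("700MB, 40 extents: %s\n",
               should_relocate_on_open(&large_fragmented, &vol) ? "relocate" : "leave");
        return 0;
    }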
  • Re:Anonymous (Score:3, Informative)

    by o0zi ( 652605 ) on Wednesday May 19, 2004 @01:08PM (#9196474) Homepage
    Ext2/3 and reiserfs both have inbuilt defragmentation capabilities. This can be seen, for instance, when you boot an ext2 system after an unclean shutdown and it checks the integrity of the filesystem. Ext3 and reiserfs are both journaling filesystems, which also helps with this problem. This is often strange for new Linux users, as they're used to worrying about their Windows NTFS/FAT32 filesystems. In Linux, it's just not necessary (nor in any Unix derivative such as AIX or BSD that uses those filesystems).
  • by Anonymous Coward on Wednesday May 19, 2004 @01:08PM (#9196478)
    Go to My Computer. Right-click the drive to be analyzed. Select Tools / Defragment Now... / Analyze.

    This was my PhD Thesis.
  • My stats (Score:5, Informative)

    by Twirlip of the Mists ( 615030 ) <twirlipofthemists@yahoo.com> on Wednesday May 19, 2004 @01:13PM (#9196509)
    I throw these out there for no real reason but the common interest.

    I've got a G4 with an 80 GB root drive which I use all day, every day. Well, almost. It's never had anything done to it, filesystem-maintenance-wise, since I last did an OS upgrade last fall, about eight months ago.
    Out of 319507 non-zero data forks total, 317386 (99.34 %) have no fragmentation.
    Not too shabby, methinks.
  • Re:Give it a rest (Score:4, Informative)

    by chasingporsches ( 659844 ) on Wednesday May 19, 2004 @01:13PM (#9196511)
    I believe the topic at hand is fragmentation, not how well it works as a filesystem. In that regard, FAT32 and NTFS do have horrible problems with fragmentation, while HFS+ defragments on the fly.
  • by pjt33 ( 739471 ) on Wednesday May 19, 2004 @01:15PM (#9196529)
    HFS+ is also journalled by default.
  • by MemoryDragon ( 544441 ) on Wednesday May 19, 2004 @01:18PM (#9196546)
    NTFS does not fragment that badly as long as you don't hit the 90% full mark of your disk; once you reach that, you see files becoming fragmented in no time. NTFS uses the open space for write access and then probably relocates the files over time; once it hits 90%, the open-space usage algorithm does not seem to work anymore.
  • by ericdano ( 113424 ) on Wednesday May 19, 2004 @01:25PM (#9196609) Homepage
    Intech's Speedtools [speedtools.com] is a good set of utilities and includes a good defragmenter. For a complete defrag, something like Drive 10 or TechTool 4 [micromat.com] work better.


    Good luck

  • by ahknight ( 128958 ) * on Wednesday May 19, 2004 @01:27PM (#9196624)
    As stated in the article, this is a feature of the HFS+ code in Panther. The filesystem cannot have a defrag feature as the filesystem is just a specification. The implementation of that specification, however, can do most anything to it. :)
  • by danaris ( 525051 ) <danaris@mac . c om> on Wednesday May 19, 2004 @01:27PM (#9196627) Homepage

    That's not quite correct. In Panther (Mac OS X 10.3, for the uninitiated), journaling is enabled by default: that is, when you first install Panther, it will add journaling to your existing HFS+ disk, and if you're reformatting, it will default to HFS+ (Journaled). However, prior to Panther, there was no journaling support in HFS+, to my knowledge.

    Dan Aris

  • by Anonymous Coward on Wednesday May 19, 2004 @01:29PM (#9196646)
    Well, I have 32 Linux servers and 87 Linux desktops in the domain I administer. I've never had a corrupted fs on any of those in the 1 1/2 years I've been here; on the other hand, we also run 143 Win desktops, and I currently have 14 associates' hard drives with corrupted fs sitting in front of me (granted, 11 of them are Win98).
  • by funkdid ( 780888 ) on Wednesday May 19, 2004 @01:31PM (#9196660)
    You are the victim of the "click of death" this is a very well documented HD defect (not an FS defect). Tell Apple about it, or go to an Apple store. They will replace your HD free of charge.
  • by sterwill ( 972 ) on Wednesday May 19, 2004 @01:33PM (#9196677) Homepage
    Grab smartmontools [sourceforge.net] and run them on your drive (like "smartctl -a /dev/hda" or similar). Most SCSI and most newer ATA drives will maintain a SMART error log of any defects/problems. smartmontools will also print drive attributes (for most drives) that can tell you when a drive is about to fail, before it actually does.
  • Re:Huh? (Score:5, Informative)

    by Ann Elk ( 668880 ) on Wednesday May 19, 2004 @01:39PM (#9196728)

    My own experience, using a small tool I wrote to analyze NTFS fragmentation:

    NTFS is pretty good at avoiding fragmentation when creating new files if the size of the file is set before it is written. In other words, if the file is created, the EOF set, and then the file data is written, NTFS does a good job of finding a set of contiguous clusters for the file data.

    NTFS does a poor job of avoiding fragmentation for files written sequentially. Consider a file retrieved with wget. An empty file is created, then the contents are written sequentially as it is read from the net. Odds are, the file data will be scattered all over the disk.

    Here's a concrete example. Today, I downloaded Andrew Morton's 2.6.6-mm4.tar.bz2 patch set. (Yes, I run WinXP on my Toshiba laptop -- deal with it.) Anyway, the file is less than 2.5MB, but it is allocated in 19 separate fragments. I copied it to another file, and that file is unfragmented. Since the copy command sets EOF before writing the data, NTFS can try to allocate a contiguous run of clusters.

    Note - This was done on uncompressed NTFS. My feeling is that compressed NTFS is even worse about fragmentation, but I don't have any numbers to back that up.
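    To illustrate the parent's point, here is a rough Win32 sketch of the "set EOF first" pattern: reserve the final size with SetFilePointerEx and SetEndOfFile, then write the data into the reserved space. The file name and 2.5MB size are arbitrary; this shows the API sequence only and makes no claim about how any particular downloader behaves.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        const LONGLONG size = 2560 * 1024;   /* ~2.5MB, reserved up front */
        HANDLE h = CreateFileA("preallocated.bin", GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "CreateFileA failed: %lu\n", GetLastError());
            return 1;
        }

        /* Move the file pointer to the final size and set EOF there, so the
           filesystem knows the full allocation before any data arrives. */
        LARGE_INTEGER li;
        li.QuadPart = size;
        if (!SetFilePointerEx(h, li, NULL, FILE_BEGIN) || !SetEndOfFile(h)) {
            fprintf(stderr, "preallocation failed: %lu\n", GetLastError());
            CloseHandle(h);
            return 1;
        }

        /* Rewind and write the data sequentially into the reserved space. */
        li.QuadPart = 0;
        SetFilePointerEx(h, li, NULL, FILE_BEGIN);
        static char buf[64 * 1024];          /* zero-filled stand-in for real data */
        DWORD written;
        for (LONGLONG off = 0; off < size; off += sizeof(buf))
            WriteFile(h, buf, sizeof(buf), &written, NULL);

        CloseHandle(h);
        return 0;
    }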

  • by exwhyze ( 781211 ) on Wednesday May 19, 2004 @01:40PM (#9196735)
    Buzzsaw and Dirms [dirms.com] -- I admit, the site looks a little seedy, but I've used both of these programs on several machines for upwards of a year and they've done a superb job of keeping my NTFS disks defragmented.
  • by Steveftoth ( 78419 ) on Wednesday May 19, 2004 @01:42PM (#9196765) Homepage
    Jaguar (10.2) has journaled support as well, but you had to enable it as it was not a default option.

    Even in 10.3 it's optional, not required, but it's the new default for new disks. Probably because Apple decided that their code was solid enough to put into production. After testing it on 10.2 I agree with them.
  • by 42forty-two42 ( 532340 ) <bdonlan.gmail@com> on Wednesday May 19, 2004 @01:45PM (#9196780) Homepage Journal
    Manually run e2fsck and it'll tell you how fragmented it is, as in:
    $ e2fsck -f -n knoppix.img
    knoppix.img: 453/7680 files (3.1% non-contiguous), 12180/30720 blocks
  • by djupedal ( 584558 ) on Wednesday May 19, 2004 @01:50PM (#9196820)
    http://docs.info.apple.com/article.html?artnum=256 68 [apple.com]

    Mac OS X: About Disk Optimization

    Do I need to optimize?

    You probably won't need to optimize at all if you use Mac OS X. Here's why:
  • by 222 ( 551054 ) <stormseeker@nOsPAm.gmail.com> on Wednesday May 19, 2004 @01:52PM (#9196831) Homepage
    For proof, check out this. [electronicsandmore.us] This drive was defragged about a week ago, and although it does go through heavy use, the current low disk space causes massive fragmentation.
  • Re:Anonymous (Score:3, Informative)

    by muck1969 ( 237358 ) <muck@fl e x .com> on Wednesday May 19, 2004 @02:01PM (#9196900) Homepage
    The most significant demonstration I've ever seen of the benefits of defragmentation was on a 386 box running Win 3.11 in 1992. The boot time was cut from two minutes down to 40 seconds, and the improvement in responsiveness was very noticeable. I didn't defrag due to any outside encouragement; I happened to find the utility in some drawer on a job site and gave it a try.

    Fragmentation is a performance killer for Win 9x on older machines ... presuming that Win 9x actually performs.
  • Re:Anonymous (Score:4, Informative)

    by flynn_nrg ( 266463 ) <mmendez@gma i l .com> on Wednesday May 19, 2004 @02:18PM (#9197078) Homepage Journal

    What are you talking about?

    Ext2/3 and reiserfs both have inbuilt defragmentation capabilities.

    No, they don't. But since they borrow their design from BSD's FFS they don't need it either.

    This can be seen, for instance, when you boot an ext2 system after an unclean shutdown and it checks the integrity of the filesystem. On journaled filesystems, the log is replayed. IBM's jfs also runs a modified fsck.

    Erm, that's fsck. fsck doesn't do defragmentation.

    In Linux, it's just not necessary (nor in any Unix derivative such as AIX or BSD that uses those filesystems).

    It's true; however, performance is severely degraded when disk usage reaches around 90% for classic FFS-like filesystems. While the BSDs can mount ext2 partitions, none of them uses ext[23] as the default. AIX uses a JFS version that's a bit different from the one you see in Linux, which was based on OS/2's code. I think you're mixing up filesystem integrity with fragmentation. In classic BSD UFS/FFS data is stored in datablocks, which are partitioned in fragments, usually 1/4th of the datablock size. A fragmented file is a file that's stored in non-contiguous fragments. Just that. The performance impact of fragmented files vs the time needed to reorganize the data shows that it's not worth running a defrag program on FFS filesystems.

    This paper [harvard.edu] has some more info on the subject.

  • by Anonymous Coward on Wednesday May 19, 2004 @02:23PM (#9197131)
    You just made his point. The DRIVER does the defragging. The HFS+ is a specification for how the files are laid out and written to the disk, such that a driver that understands this specification can read it. Linux has HFS+ drivers, but I doubt they defrag on the fly. Supposedly (though I don't know), Mac OS versions prior to 10.3 didn't defrag either.

    So therefore it might be a part of the operating system's filesystem. That's the system that deals with files. But that's not what was asked. What was asked was whether it was an inherent feature of HFS+, and that's not possible, since HFS+ doesn't tell the OS what to do when a file is opened, only how the stuff is stored on the disk.

    Perhaps you didn't understand the dual nature of the word filesystem: it can be the subsystem of the OS that handles files, or it can be the physical representation of the data on to the hard drive. If you assume it's only the first, your explanation makes sense. If you assume the second one (which would be the usage intended and understood by most people given the fact that the question and response were about HFS+ (physical filesystem) compared to Panther (OS filesystem)), then you'd be wrong.

    And I've been trolled, but who cares.
  • There are a couple things that you have to consider. For one, if part of the disk corrupts, how will you identify a header? Or for that matter, how would you identify the header space vs. file space in a non-corrupted file system?

    You're probably thinking "just store the size of the file." This is perfectly valid, but it does have certain implications. You see, in Comp-Sci, we refer to a list like this as a "linked list". The concept is basically that each item in the list has information (i.e. a "link") that helps identify the next item in the list. Such a data structure has a worst-case access time of O(n). In other words, if your item is at the end of the list and you have 2000 files, you'll have to check through all two thousand headers before finding your file.

    Popular file systems circumvent this by using what's called a Tree structure. A tree is similar to a linked list, but allows for multiple links that point to children of the node. A node that has no children is referred to as a "leaf node". In a file system the directories and files are nodes of a tree, with files being leaf nodes. This configuration gives us two performance characteristics that we must calculate for:

    1. The maximum number of children in a node.
    2. The maximum depth of the tree.

    Let's call them "c" for children and "d" for depth. Our performance formula is now O(c*d), independent of the total number of items in the data structure. Let's make up an example to run this calculation against:

    Path: /usr/local/bin/mybinary

    Nodes:
    / (34) /usr (10) /usr/local (9) /usr/local/bin (72)

    Longest path: /usr/X11R6/include/X11

    Plugging in the above numbers (72 for c, 4 for d), we get a worst case of 72*4 = 288 operations. Thus our worst case is much better than the linked list's. And if we calculate the real case to access /usr/local/bin/mybinary, we get 34+10+9+72 = 125 operations.

    Hope this helps. :-)
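    To make the comparison concrete, here is the same arithmetic as a small C program, using the children counts quoted above (34, 10, 9, 72) and a flat list of 2000 files for contrast; the numbers are the parent's example, not measurements.

    #include <stdio.h>

    int main(void)
    {
        /* entries at each level of /usr/local/bin from the example above */
        int children_per_level[] = { 34, 10, 9, 72 };
        int depth = sizeof(children_per_level) / sizeof(children_per_level[0]);

        int level_scan_total = 0;   /* scan every entry at every level */
        int max_children = 0;
        for (int i = 0; i < depth; i++) {
            level_scan_total += children_per_level[i];
            if (children_per_level[i] > max_children)
                max_children = children_per_level[i];
        }

        printf("flat list of 2000 files, worst case : 2000 comparisons\n");
        printf("tree bound c*d                      : %d comparisons\n", max_children * depth);
        printf("tree, summing each level's scan     : %d comparisons\n", level_scan_total);
        return 0;
    }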

  • Re:Panther Defrag (Score:3, Informative)

    by aristotle-dude ( 626586 ) on Wednesday May 19, 2004 @02:26PM (#9197167)
    Have you heard of kernel extensions aka kernel modules? Drivers on OSX end with a .kext which denotes a kernel extension.
  • by Daniel_Staal ( 609844 ) <DStaal@usa.net> on Wednesday May 19, 2004 @02:28PM (#9197201)

    I believe the actual sequence is this:

    1. Get request for file
    2. Open File
    3. Buffer file to memory
    4. Answer request for file
    5. If needed, defragment file

    In other words, it defragments after the file has been returned to the program needing it, as a background process. The buffer to memory is a pre-existing optimization, so the only real trade-off is that background processor usage goes up. If you aren't doing major work at the time, you'll never notice. (And if you are doing major work, you probably are using files larger than 20MB in size anyway.)

    Files larger than 20MB just aren't defragmented, unless you have another tool to do it.
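    A toy single-threaded C sketch of that sequence: serve the open from the buffered copy first, queue the file, and defragment later, off the request path. The 20MB cutoff is the figure quoted in this thread; the names, queue and sizes are invented for illustration.

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_PENDING  16
    #define DEFRAG_LIMIT (20u * 1024 * 1024)   /* 20MB cutoff from the thread */

    struct file { const char *name; unsigned size; int extents; };

    static struct file *pending[MAX_PENDING];
    static int npending;

    static bool needs_defrag(const struct file *f)
    {
        return f->size < DEFRAG_LIMIT && f->extents > 1;
    }

    /* Steps 1-4: answer the request immediately; step 5 is only queued. */
    static void serve_open(struct file *f)
    {
        printf("open   %-10s -> buffered and returned to caller\n", f->name);
        if (needs_defrag(f) && npending < MAX_PENDING)
            pending[npending++] = f;          /* defer, don't block the caller */
    }

    /* Step 5, run later as background work. */
    static void background_pass(void)
    {
        for (int i = 0; i < npending; i++) {
            printf("defrag %-10s (%d extents -> 1)\n",
                   pending[i]->name, pending[i]->extents);
            pending[i]->extents = 1;
        }
        npending = 0;
    }

    int main(void)
    {
        struct file a = { "notes.txt", 40u * 1024, 5 };
        struct file b = { "movie.mov", 700u * 1024 * 1024, 30 };

        serve_open(&a);      /* small and fragmented: queued */
        serve_open(&b);      /* over 20MB: left alone */
        background_pass();   /* runs later, off the request path */
        return 0;
    }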

  • by javax ( 598925 ) on Wednesday May 19, 2004 @02:33PM (#9197241)
    hmmm... the advantage would be that your system wouldn't need to get defragmented every night you're asleep.

    The real advantage is that it will speed up things immediately for the 2nd time the file is read. Though you can still defrag your disk overnight if you like that.
  • by pantherace ( 165052 ) on Wednesday May 19, 2004 @02:52PM (#9197389)
    Well, all modern operating systems handle it transparently for any program, except certain tools such as the defragmenter, which either look at the disk directly or use a lower-level call.

    NTFS is horrible. On a system installed less than a week ago with a few programs (NWN, Firefox, AVG, iTunes, AA, NVDVD, Windows updates, and a couple more), it has 9.3GB used, and the defragmenter reports "Total Fragmentation: 22%, File Fragmentation: 45%".

    So yes, there are various methods of calculating file fragmentation. Two I can think of: (# of files with fragments)/(total number of files), which is 0 for a totally defragmented drive and gives nice percentages, and (# of file fragments)/(total number of files), which is 1 for a perfectly defragmented drive. There are variations on those, and I haven't been able to find what calculations Windows and e2fstools use, so I can't tell.
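    For what it's worth, here are those two metrics spelled out in C against made-up scan numbers; no claim that either one matches what Windows or e2fstools actually reports.

    #include <stdio.h>

    int main(void)
    {
        /* invented scan results */
        int nfiles = 1000;          /* files scanned */
        int nfrag = 120;            /* files with more than one fragment */
        int total_fragments = 1450; /* fragments across all files */

        /* Metric 1: share of files that are fragmented; 0% = fully defragmented. */
        double frag_files_pct = 100.0 * nfrag / nfiles;

        /* Metric 2: fragments per file; 1.0 = every file in a single piece. */
        double fragments_per_file = (double)total_fragments / nfiles;

        printf("fragmented files  : %.1f%%\n", frag_files_pct);
        printf("fragments per file: %.2f\n", fragments_per_file);
        return 0;
    }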

  • Re:Anonymous (Score:2, Informative)

    by EsbenMoseHansen ( 731150 ) on Wednesday May 19, 2004 @02:54PM (#9197404) Homepage
    The cache on the disk. And while we do not REALLY know, the described behaviour is a common and proven strategy. Remember that the disk does not have a lot of information to go by; it basically just sees requests to read individual sectors. More or less.
  • by Atomic Frog ( 28268 ) on Wednesday May 19, 2004 @03:02PM (#9197472)
    No, it doesn't take much to outdo NTFS.

    NTFS fragments _very_ fast on me; after a few months of use, it is in the 20%-or-more range.

    Same user (i.e. me), so same usage pattern, on my HPFS disks (yes, HPFS, that would be OS/2, not OS X), the fragmentation after 3 _years_ is less than 2% on ALL of my HPFS disks.
  • by solios ( 53048 ) on Wednesday May 19, 2004 @03:14PM (#9197573) Homepage
    HFS+ was one of the major features of the OS 8.1 update. OS 8.0 and earlier can't "see" HFS+ volumes -- they see a tiny disk with a SimpleText file titled "where have all my files gone?" which, if I remember correctly, gives a brief explanation that the disk is HFS+ and requires 8.1 or higher to view. :)

    Journalling didn't show up until one of the Jaguar updates, where it could be enabled via the command line on clients and via disk utility on Server.
  • by wardk ( 3037 ) on Wednesday May 19, 2004 @03:30PM (#9197690) Journal
    My recollection of the OS/2 HPFS file system from IBM was that in many cases it would purposely fragment to take advantage of the disk spin, thus using fragmentation to increase performance.

    Defrag utils for OS/2 had options to only defrag if there were more than 3 extents, to avoid nullifying this effect.

    Funny -- years after the death of OS/2, it still kicks ass compared to much of what we use now.
  • by jimfrost ( 58153 ) * <jimf@frostbytes.com> on Wednesday May 19, 2004 @03:47PM (#9197821) Homepage
    No, FFS does not do after-the-fact defragmentation. It attempts to allocate blocks that have low seek latency as files are extended. For the most part this avoids the problem entirely.

    If you ever wondered why there is a "soft limit" on FFS filesystems, the reason why is that its allocator's effectiveness breaks down at about the point where the filesystem is 90% full. So they sacrifice 10% of the filesystem space so that they can avoid fragmentation problems. It's not a bad tradeoff, particularly these days.

    I didn't know that HFS+ used an after-the-fact defragmentation system, but they've been around for a while too. Significant research was done into such things as part of log-based filesystem research in the early 1990s (reference BSD LFS and Sprite). You had to have a "cleaner" process with those filesystems anyway (to pick up abandoned fragments of the log and return them to the free pool), so it made sense to have it also perform some optimization work.

  • Re:Defrag = placebo? (Score:3, Informative)

    by jimfrost ( 58153 ) * <jimf@frostbytes.com> on Wednesday May 19, 2004 @04:12PM (#9198116) Homepage
    Consider an average eide hard drive: 10ms seek rate, ata100 interface.

    If you need to save a 100kb file, it will take 10ms (1/100th of a second) to seek to the first block, and then, assuming everything is perfect, it will take 100kb / 100MB/sec = 1/1000th of a second to write the file... so, seeking to the start of the file took 10 times as long as writing it!

    This gross simplification actually trivializes the real effect. The 10ms seek figure is an average track-to-track seek delay between adjacent tracks. The farther apart the tracks are the longer a seek takes (it's more or less linear although there is a per-seek overhead). You also don't deal with the fact that you're going to have to perform seeks on larger files no matter what.

    I note that there is a similar latency issue with head switches.

    There is a big difference between the delay necessary to pull a sector off of a track adjacent to where the heads currently are and one 1000 tracks away -- the delay can be an appreciable fraction of a second just for that single seek.

    The problem with simplistic filesystem block allocators is that they do not weight their block allocations according to seek time. Usually they just pick whatever's first in the free list. This results, over time, in random block placement and therefore seek times that will on average approach 50% of worst case. I'd have to look closely to give a good figure with today's drives, but order-of-magnitude degradation is certainly possible. What you would prefer is blocks placed such that they're within a few percent of best case if at all possible.

    This is not hard to do, and to my knowledge BSD FFS was the first to attempt it -- and it was wildly successful at it.

    As an aside, I've never seen a fragmentation analysis program that took noncontiguous-but-well-placed into account. It's entirely possible to create a block layout that those programs think is awful that is within 90% of optimum. I think, actually, that BSD FFS would typically show up that way although I never investigated.

    I also note that smart block allocation makes a defragmenter's job a heck of a lot easier.
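    A toy illustration of that point: the same scrambled free list consumed twice, once by a take-the-first-entry allocator and once by a nearest-block allocator, summing block-to-block distance as a stand-in for seek length. The list size, shuffle and file size are all invented for the example.

    #include <stdio.h>
    #include <stdlib.h>

    #define NFREE 5000

    static long free_list[NFREE];
    static int  nfree;

    /* A free list scrambled by churn: every other block is free, in random order. */
    static void build_scrambled_free_list(unsigned seed)
    {
        nfree = NFREE;
        for (int i = 0; i < NFREE; i++)
            free_list[i] = i * 2;
        srand(seed);
        for (int i = NFREE - 1; i > 0; i--) {      /* Fisher-Yates shuffle */
            int j = rand() % (i + 1);
            long t = free_list[i]; free_list[i] = free_list[j]; free_list[j] = t;
        }
    }

    static long take(int idx)                       /* remove entry idx from the list */
    {
        long b = free_list[idx];
        free_list[idx] = free_list[--nfree];
        return b;
    }

    /* "Simplistic" allocator: whatever happens to be first on the free list. */
    static long pick_first(long last) { (void)last; return take(0); }

    /* Seek-aware allocator: the free block closest to the last one written. */
    static long pick_nearest(long last)
    {
        int best = 0;
        for (int i = 1; i < nfree; i++)
            if (labs(free_list[i] - last) < labs(free_list[best] - last))
                best = i;
        return take(best);
    }

    /* Allocate nblocks for one file, totalling block-to-block distance. */
    static long write_file(int nblocks, long (*pick)(long))
    {
        long last = 0, distance = 0;
        for (int i = 0; i < nblocks && nfree > 0; i++) {
            long b = pick(last);
            distance += labs(b - last);
            last = b;
        }
        return distance;
    }

    int main(void)
    {
        build_scrambled_free_list(42);
        long d_first = write_file(200, pick_first);

        build_scrambled_free_list(42);              /* identical starting state */
        long d_near = write_file(200, pick_nearest);

        printf("first-on-free-list allocator: total distance %ld blocks\n", d_first);
        printf("nearest-free-block allocator: total distance %ld blocks\n", d_near);
        return 0;
    }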

  • by shamino0 ( 551710 ) on Wednesday May 19, 2004 @04:18PM (#9198203) Journal
    HFS+ was one of the major features of the OS 8.1 update. OS 8.0 and earlier can't "see" HFS+ volumes -- they see a tiny disk with a SimpleText file titled "where have all my files gone?" which, if I remember correctly, gives a brief explanation that the disk is HFS+ and requires 8.1 or higher to view. :)

    And the person who came up with this idea was a genius. This is far far better than what most other operating systems do (refuse to mount the volume.)

    If I boot MS-DOS on a machine that has FAT-32 or NTFS volumes, I simply don't find any volume. I can't tell the difference between an unsupported file system and an unformatted partition. If the file system would create a FAT-compatible read-only stub (like HFS+ does), it would be much better for the user. Instead of thinking you have a corrupt drive, you'd know that there is a file system that your OS can't read.

  • Re:Defrag = placebo? (Score:3, Informative)

    by MyHair ( 589485 ) on Wednesday May 19, 2004 @04:27PM (#9198304) Journal
    It shouldn't really be an issue post-FAT. I think most people's obsession with fragmentation is a remnant of having to defragment FAT drives regularly. One did it superstitiously in those days because an overly fragmented filesystem did slow down considerably. No modern filesystem has an excuse for not handling fragmentation without any interference from the user.

    Head seeks and rotational latency are still much slower than reading contiguous blocks. True, modern systems deal with it better, partially due to b-tree and other file indexing strategies and partially due to having plenty of RAM for metadata caching and predictive caching. But fragmentation is still a major issue for me on multiuser Windows PCs, and periodic disk cleanup and defragmentation are necessary for reasonable operation speed.

    <MS gripe>
    In particular, the hidden "Content.IE5" cache of IE on 20-100 user PCs fills up hard drives in a big hurry, and I haven't found a way of controlling this except for periodically deleting with the following batch file I made for Win2k. (Limiting the cache size doesn't seem to affect these files.)
    @echo off
    echo.
    echo ***** WARNING!!!!! This will wipe out
    echo ***** C:\Documents and Settings\*\Local Settings\Temporary Internet Files contents.
    echo ***** Press Control-C to abort or any other key to delete all temp files.
    echo.
    pause
    for /D %%x in (c:\"Documents and Settings"\*) do rd /s /q "%%x\Local Settings\Temporary Internet Files\Content.ie5"
    This needs to be done preventatively, though. In addition to fragmenting data, 20-100 user PCs with large numbers of files (and Content.IE5 is my killer in my situation) fill up the MFT and then fragment it, and once you get the MFT fragmented you're basically screwed.
    </MS gripe>

    Admittedly, the PCs with the biggest problems have disks smaller than 12GB, and I don't have as much of a problem with 20GB+ systems. But have you ever run defrag after a clean install, even with an enormous hard drive? You'd think it could at least install itself without severe fragmentation. Oh well.
  • Re:Huh? (Score:2, Informative)

    by Rakefighter ( 147924 ) on Wednesday May 19, 2004 @04:28PM (#9198307) Homepage
    Those are strange numbers...

    When you download a file with internet explorer, it downloads to a temporary directory, and then copies it to the location you selected in the "Save" dialog (using Windows copy facilities). According to your logic, the file that you downloaded should not be fragmented, at all.

    Care to explain yourself?
  • by jimfrost ( 58153 ) * <jimf@frostbytes.com> on Wednesday May 19, 2004 @04:28PM (#9198308) Homepage
    A drive is not one-dimensional, it is three-dimensional: rotation, platter, track. It is this geometry that BSD FFS takes advantage of to avoid large fragmentation-related delays, since while there may only be one "optimal" sector to use, there are quite a few that are "nearby" in terms of rotation, seek, or platter.
  • by Artifakt ( 700173 ) on Wednesday May 19, 2004 @04:29PM (#9198333)
    There are so many comments already posted in this topic that seem not to grasp the following point that I think the best way to deal with it is to start a completely new thread. I'm sorry if it seems more than a little obvious to some of you:

    There are fundamentally only a few types of files when it comes to fragmentation.

    1. There are files that simply never change size, and once written don't get overwritten. (Type 1). Most programs are actually type 1, if you use sufficiently small values of never :-), such as until you would need to perform disk maintenance anyway for lots of other reasons in any 'reasonable' file system. A typical media file is probably Type 1 in 99%+ of cases.

    2. There are files that will often shorten or lengthen in use, for example a word processor document in .txt format, while it is still being edited by its creator. (type 2). (That same document may behave as effectively Type 1 once it is finished, only to revert to type 2 when a second edition is created from it.)

    Of type 2, there are files of type 2a: files that may get either longer or shorter with use, on a (relatively) random basis. As a relatively simple case, a .doc file may become longer for obvious reasons, like more text, but may also become longer for less obvious reasons (such as the hidden characters created when you make some text italic or underlined) -- reasons that are not obvious to most end users, and often not predictable in detail even to people who understand them better. The default configuration for a Windows swap file is type 2a. It is likely to be hard for an automated system to predict the final size of Type 2a files, as that would imply a software system of near human level intelligence to detect patterns that are not obvious and invariant to a normal human mind. It may be possible to predict in some cases only because many users are unlikely to make certain mistakes (i.e. cutting and pasting an entire second copy of a text file into itself is unusual, while duplicating a single sentence or word isn't).

    Then there are files of type 2b: files that get longer or shorter only for predictable reasons, such as a Windows .bmp, which will only get larger or smaller if the user changes the color depth or size of the image, and not if he just draws something else on the existing one. A good portion of users (not all by any means) will learn what to expect from these files, which suggests a well-written defragger could theoretically also auto-predict the consequences of the changes a user is making.

    3. Then there are type 3 files, which only get longer. These too have predictable and unpredictable subtypes. Most log files, for example, are set up to keep getting longer on a predictable basis when their associated program is run (type 3b). Anything that has been compressed (e.g. .zip) is hopefully a 3b, but only until it is run; then the contents may be of any type. A typical Microsoft patch is a 3a (it will somehow always end up longer overall, but you never know just what parts will vary or why).

    4. Type 4 would be files that always get smaller, but there are no known examples of this type :-).

    These types are basic in any system, as they are implied by fundamental physical constraints. However, many defrag programs use other types instead of starting from this model, often with poor results.

    In analyzing what happens with various defrag methods, such as reserving space for predicted expansion or defragging in the background/on the fly, the reader should try these various types (at least 1 through 3), and see what will happen when that method is used on each type. Then consider how many files of each type will be involved in the overall process, and how often.

    For example, Some versions of Microsoft Windows (tm) FAT32 defragger move files that have been accessed more than a certain number of times (typically f
  • by ewhac ( 5844 ) on Wednesday May 19, 2004 @05:14PM (#9198895) Homepage Journal
    How safe is [resizing an NTFS partition] anyways?

    With the latest versions of ntfsresize, fairly safe. I did it on a machine at work with very important data on it (yes, I backed it up first), and had no trouble at all. However, all ntfsresize can do is truncate an NTFS partition's free space. In other words, it won't relocate blocks to other free areas of the disk. So the most you can shrink it is by however much free space you have at the end of the partition. ((After Googling around a bit, I've learned that the most recent versions of ntfsresize [rulez.org] will now move datablocks around, so apparently that restriction is now gone. I have not personally tested this, however.))

    Incidentally, ntfsresize is part of Knoppix, and gets run through QTPartEd, a partition editing tool. It is an older, non-relocating version, however.

    Schwab

  • Re:Not Exciting (Score:2, Informative)

    by greck ( 79578 ) on Wednesday May 19, 2004 @05:27PM (#9199027) Homepage
    Ideally you wouldn't see your harddrive thrash when booting...

    actually, Darwin/OS X has a really nifty feature called BootCache that collects information at boot time and primes the read-ahead on subsequent boots to smooth things out... everyone found out the hard way when it was mildly broken in an update to 10.2 exactly how much difference it makes (it knocks about 2/3 off the boot time of my PowerBook).

    see Amit Singh's excellent article [kernelthread.com] for more info, there's a chunk on BootCache at the bottom of this page [kernelthread.com].
  • by tenton ( 181778 ) on Wednesday May 19, 2004 @05:52PM (#9199323)
    Journalling has existed since 10.2.2 [apple.com](at least on the Server end; I believe the consumer end too, except you had to enable it via a terminal command), so... ^_^
  • by spectecjr ( 31235 ) on Wednesday May 19, 2004 @05:53PM (#9199332) Homepage
    How many ways are there to define fragmented files? If I can read the file starting from the byte address of the first byte of the file sequentially all the way to the EOF, it isn't fragmented. Otherwise, every time I have to jump to a non-sequential byte address, that's another fragment. Am I missing something?

    As an example, look up the docs on ext2. Note that file fragments are not necessarily the same as fragmented files. Also note that people use the "file fragment" number as an indicator of how fragmented their ext2 partition is - which is wrong.
  • by gerardrj ( 207690 ) on Wednesday May 19, 2004 @06:49PM (#9199926) Journal
    There used to be several disk access optimizations

    Vendors used to do interleaving with the format/fdisk commands, I recall. The idea was that writing the sectors in a continuous stream was not very efficient, as the drives of the time could not move data to or from the disk fast enough. You'd read sector 1, and by the time you were ready to read sector 2, sector 3 was under the head, so you had to wait almost an entire disk revolution to find sector 2 again.
    The interleave told the OS to skip X physical disk sectors for each 1 logical sector.

    For example, assume a disk with 12 sectors on a track such that when stationary the disk's sectors align with the hours on a clock face. With interleave of 3 the OS would put sector 1 at 1:00, sector 2 at 4:00, sector 3 at 7:00, sector 4 at 10:00, sector 5 at 2:00, and so on. The OS would occasionally skip more than the "interleave" number of sectors in order to not overwrite previous sectors. This meant that by the time that logical sector 1 was read and transferred to the computer, logical sector 2 would just about be under the heads for reading, thus eliminating or at least minimizing the rotational latency.

    Another big advantage was placing the directory structures in the middle tracks of the drive. This minimized the longest seek that would have to be performed. Unless a single file was very large or in just the wrong spot, it would usually be positioned completely on the inside or outside half of the tracks. After reading, the head only had to move at most halfway across the disk to locate the next file or cluster/fragment to read or write, then again at most half the disk to perform the next operation.
    Most of today's file systems start placing directory/catalog information at the start of the disk, effectively doubling average seek times to the data stored on the disk.

    As others mentioned, on some "faster" drives, there were filesystems that essentially treated the platters in a drive as individual units and managed them like a RAID 0, a RAIP so to speak (Redundant Array of Independent Platters).

    File fragmentation in today's fast, large-buffer drives is, I think, the least of our worries. Fragmented or not, we need more optimization of data structures on the drive. I'd rather have related files fragmented and nearby each other than contiguous and spread evenly across the drive.
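    Going back to the interleave example above, it is easy to reconstruct in a few lines of C; this reproduces the 12-sector, interleave-3 clock-face layout (1:00, 4:00, 7:00, 10:00, 2:00, ...), including the bump-forward rule when a slot is already taken.

    #include <stdio.h>

    int main(void)
    {
        enum { SECTORS = 12, INTERLEAVE = 3 };
        int occupied_by[SECTORS] = { 0 };        /* 0 = slot still free */

        int pos = 0;                             /* position 0 == 1:00 on the clock face */
        for (int logical = 1; logical <= SECTORS; logical++) {
            while (occupied_by[pos] != 0)        /* skip slots already written */
                pos = (pos + 1) % SECTORS;
            occupied_by[pos] = logical;
            printf("logical sector %2d -> %2d:00\n", logical, pos + 1);
            pos = (pos + INTERLEAVE) % SECTORS;
        }
        return 0;
    }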
  • by Nexx ( 75873 ) on Wednesday May 19, 2004 @10:28PM (#9201112)
    FWIW, WinXP will, when you have a tonne of browsers open, place them under one entry on your taskbar. Same goes with Excel windows, vi windows, etc. I think it goes by the process names.

    What's not too obvious is that a lot of Windows administration tools are instances of one executable (though I imagine it executing different things), so they all get lumped under one entry on my taskbar too.
  • Re:Offtopic (Score:3, Informative)

    by toupsie ( 88295 ) on Thursday May 20, 2004 @01:02AM (#9201701) Homepage
    And I'd be interested in the metric you use. Tell me, how often did we have terrorist attacks on US soil before 9/11? Well let's see, there was Oklahoma city, but that was an American so it doesn't really apply does it. So that leaves The first WTC bombing as the most recent attack preceding 9/11. 8 years lie between those two attacks.

    You are forgetting two embassies in Africa and an American Warship. All of those are American soil. So it is not an attack every eight years.

  • Fast! (Score:3, Informative)

    by rixstep ( 611236 ) on Thursday May 20, 2004 @03:24AM (#9202079) Homepage
    One thing people rarely talk about is how fast HFS+ is. Or perhaps how slow UFS on the Mac with OS X is. But the difference is more than dramatic: a clean install of OS X using HFS+ can take less than half an hour -- including the developer tools. The same procedure using UFS seems to never end.

    It might be the way they've 'frobbed' UFS for use with OS Server, but UFS really gives high priority to disk ops with GUI ops taking the back seat, and yet HFS+ is in comparison blazingly fast.

    I believe in a good clean machine like anyone, and I do see the probability DiskWarrior will be needed now and again, but the speed alone is quite a pedigree for HFS+ IMHO.
