Mac OS X 10.3 Defrags Automatically
EverLurking writes "There is a very interesting discussion over at Ars' Mac Forum about how Mac OS X 10.3 has implemented an on-the-fly defragmentation scheme for files on the hard drive. Apparently it uses a method known as 'Hot-File-Adaptive-Clustering' to consolidate fragmented files that are under 20 MB in size as they are accessed. Source code from the Darwin 7.0 kernel is cited as proof that this is happening."
Amortized cost... (Score:4, Interesting)
Also, I wonder how the extra time (perhaps 500 ms) to defragment each fragmented sub-20 MB file stacks up against doing a manual defrag every month, and whether it's actually worth it...
Don't some Linux filesystems already do this to some extent? I could be hallucinating again, but I'm sure I read this somewhere.
Re:Amortized cost... (Score:2)
As for Linux filesystems, they don't support FileIDs, so who cares |-)
I think it's... (Score:2)
Re:I think it's... (Score:2)
Re:I think it's... (Score:3, Funny)
Re:I think it's... (Score:2)
Ahhh.... :-)
Re:I think it's... (Score:3, Insightful)
I believe the rationale is that defragging takes little more I/O than reading the file once is going to take anyway, and subsequent accesses to the file (after the defrag) will take fewer I/Os, which would ap
Re:Amortized cost... (Score:2)
Re:Amortized cost... (Score:5, Informative)
The end goal of the disk subsystem is to get your data to you as soon as you need it. In general that goal would be achieved if the data you want to read next happened to be under the read head. If you're reading sequentially through a single file then this will happen when the file is in a single contiguous region (i.e. unfragmented). For any other access pattern fragmentation doesn't matter as much, since you'll be skipping around the disk regardless of how the files are arranged.
Prefetching heuristics and caching can hide a lot of these problems from the user as well.
Re:Amortized cost... (Score:2)
For an extent-based FS, a less fragmented file is more likely to have whatever part of its extent table is needed in memory to do a read or write on it. (For a block-mapped one that's a non-issue, because the block map takes space proportional to the file size, not the number of fragments.)
information (Score:4, Insightful)
Re:Amortized cost... (Score:2)
One time I saw something like 60% fragmentation. I'd never seen anything above 2% prior to that.
No such limits. (Score:4, Informative)
The source code is posted to that thread; the only conditions are (1) at least 3 minutes have passed since the system clock started (i.e. it avoids doing this while booting up), (2) the file is less than 20 MB in size, and (3) the file isn't already open.
The only negative consequence is a possible speed hit, though. There's no danger.
I'm pretty impressed by this. Sure, it's been done before. Sure, there are more elaborate methods. But this is just a simple little lump of code that'll defragment the worst files most of the time.
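For the curious, here's roughly what that gate might look like in C. This is a sketch only: the struct, field names, and function name are made up for illustration, not the actual Darwin identifiers.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical context for an open() call on an HFS+ file. */
struct open_ctx {
    uint64_t uptime_secs;   /* seconds since the system clock started */
    uint64_t file_size;     /* logical file size in bytes */
    int      open_count;    /* how many others already have the file open */
};

/* The three conditions listed above, rolled into one predicate. */
bool ok_to_relocate_on_open(const struct open_ctx *c)
{
    return c->uptime_secs > 3 * 60           /* (1) not right after boot */
        && c->file_size < 20 * 1024 * 1024   /* (2) under 20 MB */
        && c->open_count == 0;               /* (3) not already open */
}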
Oh, missed one... (Score:2)
Re:No such limits. (Score:2)
Isn't the point of keeping your drive defrag'ed to increase the performance of reading and writing?
With 200+ GB hard drive capacities becoming ubiquitous, the performance hit of on-the-fly defragging is worthwhile. Over the long run, keeping a machine defragged improves its performance, right?
Re:No such limits. (Score:2)
Apple's choice of algorithm defrags files as it encounters them. If you are upgrading a drive, it may have a lot of fragmented files that need to be defragmented. So initially, it will be very slow.
Once the code has been exercised for a while, yeah, it'll be much faster.
Re:Amortized cost... (Score:3, Interesting)
Self-defragging might be great on file servers but Macs are (largely) about the multimedia.
Re:Amortized cost... (Score:2, Informative)
Yes it would be a one-time hit, but we hard disk intensive audio and video people don't want to be streaming multiple tracks off our hard disks while they are defragging themselves!
...which is why there's the 20 MB limit.
Comment removed (Score:5, Interesting)
Re:Amortized cost... (Score:5, Insightful)
This is true. However, it is also true that a defrag does not have to put the data in physically contiguous blocks. It can just as easily put the data in whatever configuration makes retrieval work fastest on that particular drive geometry.
This means that an intelligent defrag can improve performance.
The defragger already allows partial fragmentation (Score:2)
Re:Amortized cost... (Score:2)
I thought (but could be wrong) that hard drives internally split up the platters for speed gains - sort of an internal 'RAID 0' approach.
It would make sense - modern hard drives already lie about their geometry. If a drive has 2 platters/4 heads, by doing an internal 'RAID 0', that drive is now four times faster than the competition.
Perhaps a slashdotter with a bit of hard drive clue could tell us if that's the case.
Re:Amortized cost... (Score:2, Informative)
Link [storagereview.com]
'Multiple Heads' misunderstood (Score:2)
I'll tell you this: I notice quite a speedup after I perform a full backup-and-restore, the only way to defrag most Linux filesystems. I saw loading OpenOffice take 7 seconds before, and 3 after.
When I load Mozilla
JOUNRALLING my boy JOURNALING (Score:4, Informative)
Re:JOUNRALLING my boy JOURNALING (Score:2)
hfs_global_shared_lock_acquire(hfsmp);
if (hfsmp->jnl) {
    if (journal_start_transaction(hfsmp->jnl) != 0) {
        return (EINVAL);
    }
}
If the transaction can't be started, the acquired HFS lock never gets released. Everywhere after this, the code is careful to record the error and jump to a common epilogue that unlocks the HFS subsystem; this early return is the one path that skips it.
I'd rather not turn journaling on.
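For what it's worth, the early-return path could release the lock before bailing out. Here's a sketch, assuming hfs_global_shared_lock_release() is the matching unlock call, as it appears to be elsewhere in the HFS sources:

hfs_global_shared_lock_acquire(hfsmp);
if (hfsmp->jnl) {
    if (journal_start_transaction(hfsmp->jnl) != 0) {
        /* give back the lock we grabbed above before returning */
        hfs_global_shared_lock_release(hfsmp);
        return (EINVAL);
    }
}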
Re:Amortized cost... (Score:2)
Re:Amortized cost... (Score:2)
Cool but... (Score:1)
How to defrag your entire hard drive using this (Score:3, Informative)
This should defrag all of the files of 20 MB or less on your hard drive.
It locates every file, opens it, reads every byte, then closes it.
This should force the defragger to run on all files under 20 MB. Note that technically the defragger only activates when the file is broken into more than 8 extent regions, so this does not actually defrag everything.
But it's also possible that having the file broken into a few extents is harmless, first because the first 8 extents are the fastt
Re:How to defrag your entire hard drive using this (Score:3, Informative)
That'll also read them even if they don't need to be defragged. This may be better:
sudo find / -exec head {} >/dev/null \;
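Or, to skip files that are over the limit anyway (a sketch; -size counts 512-byte blocks in BSD find, so -40960 roughly means "under 20 MB"):

sudo find / -type f -size -40960 -exec head {} >/dev/null \;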
Left as an exercise to the reader:
Hmm... (Score:1, Offtopic)
Re:Hmm... (Score:2)
Comment removed (Score:4, Funny)
Re: (Score:2)
Autodefrag. (snort) (Score:5, Funny)
In my day, we'd crack open the drive on our Mac SE/30s, sharpen a magnet on a whetstone, and defrag that sucker by hand.
Kids these days. It's the MTV, ya know - makes 'em lazy.
Re:Autodefrag. (snort) (Score:2)
Re:Autodefrag. (snort) (Score:2)
Re:Autodefrag. (snort) (Score:5, Funny)
Oh wait: that would have been actually useful. (What, nobody else remembers stiction?)
What exactly are.... (Score:3, Interesting)
Are they comparable to what Reiser4FS will have? Are they better than the XYZ offering in Linux?
I'm seriously interested in what EXACTLY they are. Please spare the fanboy attitude if you do wish to answer...
Re:What exactly are.... (Score:2)
Re:What exactly are.... (Score:3, Informative)
Re:What exactly are.... (Score:2)
After looking through the basics, isn't the FileID something similar to what Hans did in Reiser? 'Course in the earlier versions, the "same ID bug" got my
Re:What exactly are.... (Score:2)
If you have multiple files or directories referring to one file that sounds more like an inode than a FileID.
Re:What exactly are.... (Score:2)
Any filesystem that supports hard links has to have something like a Mac FileID. Traditional Unix filesystems call them "inode numbers" or "i-numbers"; try "ls -i" sometime.
On Mac OS you can open a file given its FileID, but on most Unixes you cannot (I assume OS X can, and some versions of SunOS/Solaris can as part of one of the backup products). It opens a small security hole where you put files in a directory
Re:What exactly are.... (Score:2)
BULLSHIT!!!
FileIDs are not inodes. They are NOT equivalent as I pointed out elsewhere.
Re:What exactly are.... (Score:2)
Um, all I saw you say one can do with FileIDs that you can't do on most Unixish systems is (1) open files directly by them, and (2) convert them to a name without a full scan of the filesystem. Am I wrong?
If I'm right, they are equivalent data structures, but the operations you want are not normally available. With pretty minimal work one could put both operations into an Open Source Unixish system. I would say 3 hours work
Re:What exactly are.... (Score:2)
Secondly, the problem is that the roles are reversed. The FileID is like the path in UFS. You don't have a situation where multiple paths can point to one FileID, because the path is more like a file attribute. In other words, the FileID (or CNID) has a path, not the other way around.
On UNIXish filesystems it's the path that has the inode, not the other way around.
think outside the /. (Score:5, Insightful)
Well that's fine. The real upside of this is for people that have never heard of /. and don't really know what a hard drive is, let alone know how to defrag one.
Previously these people would just go forever without defragging. Now they can still do that, because Apple is doing it for them behind the scenes.
This is yet one more example of Apple's winning philosophy: Keep it simple, make it better.
Re:think outside the /. (Score:2)
The upside is that 95% of this is simply marketing to wow the Windows users raised on FAT. IMHO.
Damn! (Score:4, Funny)
XP has a similar feature (Score:2, Interesting)
Re:XP has a similar feature (Score:2)
Re:XP has a similar feature (Score:2)
http://www.kellys-korner-xp.com/xp_defrag.htm [kellys-korner-xp.com]
XP way maybe not so good. (Score:3, Interesting)
I'm not sure the Windows approach is really better. Notice that the Apple approach is more minimalist in moving files.
Re:XP way maybe not so good. (Score:3, Interesting)
Not if you do it intelligently (copy data, compare to original, delete original as an atomic operation).
(Notice that the Apple method relies on journaling to save your butt if the computer crashes mid-write.)
That mightn't save your file data. AFAIK HFS+'s journalling is metadata only.
The Windows program won't be able to move files that are currently open (I would
Re:XP way maybe not so good. (Score:2)
The algorithm Apple uses is: get a write lock on the file, write the current data out as if it were being appended (which attempts to write it in as few chunks as possible), then get a read lock on the file, then free up the "old part" of the file and adjust metadata to make the newly written blocks be the start of the file. I assume something prevents the file from looking twice as long as it should dur
Re:XP way maybe not so good. (Score:2)
Ah, I think you'll find it just moves the file to a new, contiguous area and *then* opens it.
Journaling of metadata is all you need. The defragged file is written out first; if the computer crashed during the update of the file table you'd be hosed, but journaling saves your butt.
If the machine crashes while the OS is halfway through writing a file, it's entirely possible for the file to become corrupted, or for not all the new data to be written. Metadata journalling won't help you then.
Re:No you're wrong, parent is right (Score:3, Interesting)
By the logic you seem to be applying, every time a file is accessed you "risk" corrupting it.
It does not matter if you double-check what you wrote, because that only decreases the chance of making an error; it does not eliminate it. You might make the same read error twice in a row (e.g., to make this plausible, imagine a weak magnetization that flips after a temperature change later that night). Or perhaps you may have read the file wrong in the first
Re:No you're wrong, parent is right (Score:2)
It's hardly the 500G aspect. Defragging the 20 GB hard disk in my laptop shows no discernible benefit, even when going from "99% fragmentation" to "1% fragmentation".
But if the OS is going to do it in the background with practically zero overhead, even without any other benefits I say go for it!
Except it will - by definition - have overhead. The more files a user is manipulating, the more overhead it will have.
Not quite right (clarification) (Score:5, Informative)
To clarify, there are 2 separate file optimizations going on here.
The first is automatic file defragmentation. When a file is opened, if it is highly fragmented (8+ fragments) and under 20MB in size, it is defragmented. This works by just moving the file to a new, arbitrary, location. This only happens on Journaled HFS+ volumes.
The second is the "Adaptive Hot File Clustering". Over a period of days, the OS keeps track of files that are read frequently - these are files under 10MB, and which are never written to. At the end of each tracking cycle, the "hottest" files (the files that have been read the most times) are moved to a "hotband" on the disk - this is a part of the disk which is particularly fast given the physical disk characteristics (currently sized at 5MB per GB). "Cold" files are evicted to make room. As a side effect of being moved into the hotband, files are defragmented. Currently, AHFC only works on the boot volume, and only for Journaled HFS+ volumes over 10GB.
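As a rough illustration of the two policies described above, here's a sketch in C. The struct layouts, field names, and function names are invented for illustration, not the actual Darwin code.

#include <stdbool.h>
#include <stdint.h>

struct file_state {
    uint64_t size_bytes;     /* logical file size */
    uint32_t extent_count;   /* number of on-disk fragments */
    bool     ever_written;   /* has the file ever been written to? */
};

struct volume_state {
    bool     journaled_hfs_plus;
    bool     is_boot_volume;
    uint64_t size_bytes;
};

/* Optimization 1: defragment-on-open. */
bool defrag_on_open(const struct file_state *f, const struct volume_state *v)
{
    return v->journaled_hfs_plus
        && f->size_bytes < 20 * 1024 * 1024    /* under 20 MB */
        && f->extent_count >= 8;               /* "highly fragmented" */
}

/* Optimization 2: is this file even a candidate for the hot band? */
bool hotfile_candidate(const struct file_state *f, const struct volume_state *v)
{
    return v->journaled_hfs_plus
        && v->is_boot_volume
        && v->size_bytes > 10ULL * 1024 * 1024 * 1024  /* volume over 10 GB */
        && f->size_bytes < 10 * 1024 * 1024            /* file under 10 MB */
        && !f->ever_written;                           /* never written to */
}

/* The hot band itself is sized at roughly 5 MB per GB of volume. */
uint64_t hotband_bytes(const struct volume_state *v)
{
    return (v->size_bytes / (1024ULL * 1024 * 1024)) * 5 * 1024 * 1024;
}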
interaction with "secure" delete (Score:2, Interesting)
(Also, has anyone confirmed that the code snippet is actually executed?)
Re:In other news.... (Score:3, Interesting)
Time for HFS++
Re:In other news.... (Score:2)
How are FileIDs different than inodes then?
-molo
Re:In other news.... (Score:5, Informative)
Also, you cannot get a file path from an inode; thus if the file path is changed (by moving a file, for example) the application cannot know what the new path is.
A FileID is really more equivalent to a path, or rather is used in place of a path, with the advantage that the path can change while the FileID remains the same. Thus referring to a FileID is less fragile.
Also, FileIDs are smaller, so searching for files using a FolderID or FileID is faster and uses less memory.
They're not equivalent.
Re:In other news.... (Score:2)
There are some Unixes that let you open a file given its i-number (er, and device number, or a path to the device's FS, I forget which). There are also some Unix-ish OSes (Plan 9!) that let you get a path for any open file.
If you had those two primitives, wouldn't FileIDs and i-numbers be the same? Or is there more to a FileID?
Re:In other news.... (Score:2)
And it is NOT an inode! And before you even say it, it isn't a file descriptor either.
Re:In other news.... (Score:2)
Re: (Score:3, Interesting)
Re:In other news.... (Score:2)
Re:In other news.... (Score:3, Informative)
Because the rest of the computing world is more interested in successfully interacting with itself and has realised that filesystem metadata is practically impossible to successfully move between systems using "common" tools.
Apple has finally figured this out, too, which is why they're moving away from it.
Filesystem metadata is another one of those cool ideas th
Re: (Score:2)
Re:In other news.... (Score:2)
Conversely, on Unix you get "everything is a file" and "a file is a stream of ASCII text", which is really powerful for shell programming but makes end-user cross-application interaction
Re:In other news.... (Score:2)
A good example is that a versioning system (version 1, version 2, ...) gets you half of what CVS gets you (versioning and branching).
Microsoft discovers metadata (Score:2)
Actually, if you read some of the articles about Microsoft's preview of Longhorn you'll see that they're developing a new filesystem that not only incorporates metadata but takes it to the level of a relational database. Expect to see many articles on how metadata is the best innovation to come from Microsoft since Windows 95.
Re:In other news.... (Score:2)
See for more info.
Re:In other news.... (Score:5, Informative)
You speak as though HFS+ has trouble with file fragmentation. It's easily already one of the best filesystems for avoiding fragmentation - I've worked on Macs that have been run for years without attention and were better than 90% unfragmented. This is considerably better than any of the Microsoft filesystems, for instance. This tweak is an improvement, to get from 90% to 99%.
HFS+ doesn't just put the files down randomly, either; it has some smarts [apple.com].
This also explains why the hard drive on my iBook seems a lot hotter since upgrading.
The only way this feature can do that is if you're writing small files continuously. That's very strange software behavior, and perhaps a worst case for this optimizer. Why would you be doing that?
Don't get me wrong, HFS+ isn't the best filesystem ever created, but it's very featureful (multiple forks, file IDs, case-preserving, case-insensitive-possible, Unicode, attributes, 64-bit file sizes, POSIX compliance, etc.) and Mac OS relies on it heavily. Anything that replaces it would be a superset of HFS+. Fortunately, Apple hired the guy behind the Be Filesystem a few years back. I doubt he's working on iMovie 3.1.
Re:In other news.... (Score:2)
Doesn't being case insensitive violate POSIX? Or has that been fixed?
Re:In other news.... (Score:2)
2) Fuck POSIX |-)
I'll take usability over some standard that is easy to work around.
Re:In other news.... (Score:2)
Actually, I usually hear HFS+ described as case-insensitive while reading, but case-invariant or case-preserving while writing. That is, the filesystem will record the file however it was first written, but can permit case-insensitive searching.
That way, when working in the BSD shell, everything works the same way it does on any other POSIX shell, with case sensitivity being the norm, but when working in the Finder you can browse & search in a case-insensitive way.
There's a pretty good case to be mad
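A quick way to see the case-insensitive-but-case-preserving behaviour from the shell (a hypothetical Terminal session on a default HFS+ volume):

$ mkdir /tmp/casetest && cd /tmp/casetest
$ touch ReadMe.txt
$ cat readme.txt        # lookup succeeds: names compare without regard to case
$ touch README.TXT      # no second file is created
$ ls                    # but the name keeps the case it was first written with
ReadMe.txt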
Re:In other news.... (Score:2)
Nope:
It's always case-insensitive (unless you've constructed a case-sensitive HFS+) - there's no BSD shell vs. Finder difference.
Re:In other news.... (Score:2)
I stand corrected, and I should have thought of this.
This is exactly the problem when installing the LWP library for Perl -- it offers to install /usr/bin/HEAD for you, as a tool for doing HTTP HEAD requests on web servers. On most POSIX filesystems this isn't a big deal, but on HFS+ it ends up clobbering /usr/bin/head, the standard tool for retrieving the opening lines from a file or data stream (a/k/a file). The LWP maintainers don't see this as a bug on their end, because they get the behavior they wan
Re:In other news.... (Score:2)
Re:In other news.... (Score:2)
You'd probably have to take that up with Apple, but I suspect that the tool is seen as a userland one rather than a system-level one, hence /usr/bin instead of just /bin. I've just checked on two Solaris machines (`uname -r` gives 5.7 on one and 5.8 on the other) and one Linux machine (RedHat 6.2), and all three of them have their head utility at /usr/bin/head -- so it's not just OS X that puts head in with the userland toolkit in /usr/bin.
I think the fix you're grasping for is to put LWP's HEAD in something
Re:In other news.... (Score:5, Informative)
See the -s option for newfs_hfs [apple.com]:
(from man newfs_hfs)
-s
Creates a case-sensitive HFS Plus filesystem. By default a case-insensitive filesystem is created. Case-sensitive HFS Plus file systems require a Mac OS X version of 10.3 (Darwin 7.0) or later
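So something like the following should give a case-sensitive, journaled volume. This is a sketch: the disk identifier here is made up, and newfs_hfs will erase whatever you point it at, so check diskutil list first.

sudo newfs_hfs -s -J -v "Case Sensitive" /dev/disk1s3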
Re:In other news.... (Score:2)
Re:In other news.... (Score:2)
YEEEEEeeeeeeess.... [macrumors.com]
You have the option of formatting your drive as case sensitive, if you really want to.
Personally, I'd recommend against it, unless you have a really good reason to choose otherwise. Most Mac apps assume case insensitivity. For instance Dantz [dantz.com] has this knowledgebase article [dantz.com] regarding Panther and Retrospect. Note the paragraph which reads:
Re:In other news.... (Score:3, Informative)
Doesn't being case insensitive violate POSIX? Or has that been fixed?
yes [macosxhints.com]
Re:In other news.... (Score:2)
I hope he's working on HFS++ instead of some BFS port.
BTW filename extensions still suck...what was Apple thinking?
Re:In other news.... (Score:3, Funny)
Interoperability.
Re:In other news.... (Score:2)
> The only way this feature can do that is if you're writing small files continuously. That's very strange software behavior, and perhaps a worst case for this optimizer. Why would you be doing that?
Sounds like compiling to me. Typical usage for a developer.
- Muggins the Mad
Re:In other news.... (Score:2)
I don't think so. The source code files are generally written in their entirety by the editor even if you only make changes starting halfway through a file (or add a few functions at the end). So they will normally only occupy one extent (i.e. be unfragmented), and are stunningly unlikely to have more than 7 extents (needed for auto-defrag to kick in). The object files are again only written in their entirety. The .s files (if written) are the
Re:In other news.... (Score:2)
Perhaps for a server, but not for a workstation. If I'm browsing the web, it's going to be at least 10 seconds between cache writes, as the page loads and I read it. Email or Usenet would be similar - a bulk write every five minutes or so, maybe an occasional write on message send. That's not 'continuously' for a filesystem or a hard drive - that's light duty
Re:In other news.... (Score:2)
What makes you think that? Have you ever had a PowerBook put its disk to sleep? The OS is still running (and not writing lots of small files continuously).
Re:Necessarily Useless (Score:3, Insightful)
Isn't the reason all these high-performance machines have so much RAM so that they don't have to take the enormous hit of swapping to disk?
Even RAM is too slow. That's why they're putting so much cache on the chips now.
Re:Necessarily Useless (Score:4, Interesting)
Even with this file defragmenter built-in, a drive defragmenter is still needed for certain types of users.
Re:Necessarily Useless (Score:2)
Re:I am just curious, (Score:2)
Re:I am just curious, (Score:2)
You can use UFS, but you better not run anything but pure OS X.
ColMustard's Question Hour - Part 2 (Score:2)
I was wondering why my Ford Escort doesn't get good pickup when I've got it packed with bricks. It goes fine when it's just me and the seats, but once I pile those bricks in it's so laggy! Do you think I should put brighter headlights in, or should I change the blinker fluid?