

Not exactly
Authored by: babbage on Dec 06, '01 03:39:30PM

"Metadata" is just data about data. Some of it is stored along with the actual data -- such as the dates on which the file was created & last edited -- while the rest can be inferred from the data itself -- such as the size of a file in bytes or, critical to this issue, the format of the file. Different systems have different mechanisms for working with metadata, and you can certainly have a bad / non-portable mechanism, but ideally this is mostly harmless, because the data itself can generally be exchanged without damage (unless your implementation is *really* bad) and you just lose the enhancements. Remember, the idea is to keep the data in one place (your file, your database, whatever) while your metadata sits parallel to it, pointing at the data.

All computer systems provide mechanisms for working with metadata; some are just more advanced than others. Old-school Macs & Unix workstations could infer file type data by "resource forks" on the Mac, or (more cleverly, in my opinion) "magic numbers" on Unix. Resource forks should be familiar to most people here, so I won't rehash them, but the idea with magic numbers is that each major file type has a fingerprint of some kind -- GIF images start with the six-character signature GIF87a or GIF89a, Perl scripts will start with a shebang (#!) and should have the path to the local copy of Perl after that, HTML can be detected by all the angle brackets & keywords, etc. The point is, no file has to explicitly declare what type it is, because with a well filled out magic number database, you can find those fingerprints automagically.
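To make the fingerprint idea concrete, here is a minimal Python sketch of a magic number lookup. The tiny signature table and the names `MAGIC_SIGNATURES`, `sniff_type`, and `sniff_file` are made up for illustration; a real magic database (like the one behind the Unix `file` command) is far larger and also checks bytes at offsets other than zero.

```python
# Hypothetical, very small "magic number database": leading bytes -> type label.
# Real databases hold hundreds of entries and more elaborate rules.
MAGIC_SIGNATURES = [
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"#!", "script with shebang"),
]


def sniff_type(data: bytes) -> str:
    """Guess a type from the leading bytes alone, ignoring any file name."""
    for prefix, label in MAGIC_SIGNATURES:
        if data.startswith(prefix):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of a file to compare it against the signatures."""
    with open(path, "rb") as handle:
        return sniff_type(handle.read(16))
```

Note that the file name never enters into it: a GIF called `report.txt` would still come back as a GIF image.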

The BeOS filesystem took these ideas very far, almost making a relational database out of the filesystem, where you could store an arbitrary number of fields about each file in something resembling a Mac resource fork. Plain ASCII text files could have formatting information available yet could be opened on other systems in a plain text editor like Notepad or vi. MP3s might know the artist, album, musicians, year, byte length, time length, sample rate, and so on. Email messages could have all the header data embedded as metadata fields -- to, from, mailer, date, subject, etc. The more fields the better, because it makes it possible for you to sort & filter based on these criteria at the filesystem level, using the Tracker (BeOS's version of the Finder). You could open up the MP3 directory and get a list of all Beatles songs where George was the writer, sorted by year, album, and track sequence. Without available metadata, that might not even be possible without manually tracking down all that Beatles trivia.
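The Beatles query above can be sketched in a few lines of Python. This is not the BeOS attribute or query API; the `Track` record and `songs_by_writer` function are hypothetical stand-ins that just show what "filter and sort on metadata fields" amounts to once those fields exist for every file.

```python
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical per-file metadata fields, as a filesystem might index them."""
    title: str
    artist: str
    writer: str
    album: str
    year: int
    track_number: int


def songs_by_writer(tracks, artist, writer):
    """Filter on two fields, then sort by year, album, and track sequence."""
    matches = [t for t in tracks if t.artist == artist and t.writer == writer]
    return sorted(matches, key=lambda t: (t.year, t.album, t.track_number))


# A few sample records for illustration only.
library = [
    Track("Here Comes the Sun", "The Beatles", "George Harrison", "Abbey Road", 1969, 7),
    Track("Come Together", "The Beatles", "Lennon-McCartney", "Abbey Road", 1969, 1),
    Track("Taxman", "The Beatles", "George Harrison", "Revolver", 1966, 1),
    Track("Something", "The Beatles", "George Harrison", "Abbey Road", 1969, 2),
]

for track in songs_by_writer(library, "The Beatles", "George Harrison"):
    print(track.year, track.album, track.track_number, track.title)
```

The query itself is trivial; the hard part is having the writer, year, and track fields recorded in the first place, which is exactly what rich filesystem metadata provides.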

Magic numbers, resource forks, and the Be storage system are all clever mechanisms. The system used on Windows and on the web is crude by comparison: extensions. If a file ends in .XLS, then it must be an Excel spreadsheet -- even if a magic number parser would correctly notice that it's just a tab-delimited data file that could be safely opened & edited in any text editor, Windows will only let you open it in Excel.

Metadata storage mechanisms don't have to be mutually exclusive. Apache is capable of serving web content by either extension or magic number info -- so it's false to assume that it's an either/or proposition here.
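As a rough illustration of the two mechanisms coexisting, the Python sketch below tries the extension first and falls back to sniffing the content when the extension is missing or unknown. It is only a sketch of the idea, not Apache's actual logic; the lookup tables and the `resolve_type` name are assumptions made up for this example.

```python
import os

# Hypothetical lookup tables; a real server would load much larger ones.
EXTENSION_TYPES = {
    ".gif": "image/gif",
    ".html": "text/html",
    ".txt": "text/plain",
}
MAGIC_TYPES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def resolve_type(filename: str, data: bytes) -> str:
    """Use the extension when it is known, otherwise fall back to magic numbers."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in EXTENSION_TYPES:
        return EXTENSION_TYPES[extension]
    for prefix, mime_type in MAGIC_TYPES:
        if data.startswith(prefix):
            return mime_type
    return "application/octet-stream"  # nothing matched; treat as opaque bytes
```

The order of the two checks is a design choice. Trusting the extension first is cheap, while checking content first would catch mislabeled files at the cost of reading every file.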

That's why Apple's decision -- not to abandon metadata, but to abandon their implementation of it in favor of the lowest common denominator extensions approach used on the Web and Windows -- is such a bad thing. For files that are meant to be exchanged, sure, go ahead, slap an extension on the end of it so you don't break the poor little WinXP box. But for internal use it would be nice to have that enhanced mechanism too. In the broad sense, metadata is an unquestionably good thing, and the more of it the better. By declaring that the cruder system used by the others is the only one they're going to support, they're effectively taking a massive step backwards. Sign the petition and urge them not to do this, please.


Not exactly
Authored by: percy on Dec 06, '01 03:53:05PM
Hear, hear. Your comment does a good job of explaining one of the issues people often don't think about.
In the text John Siracusa wrote, he also links to two articles he's written for Ars Technica, where he explains these issues more thoroughly.



Thanks for the expansion...
Authored by: robg on Dec 06, '01 05:12:06PM

I figured if I put a brief summary up, reflecting my limited knowledge of the depths of metadata, someone with more knowledge than I would go into the details! ;-).

thanks!
-rob.



Thanks for the expansion...
Authored by: babbage on Dec 06, '01 07:43:18PM
Thanks, glad it helped. If I can be a pushy jerk now, it might help to rephrase the first paragraph slightly. You've got:
On the Mac, one piece of metadata is the type and creator information that is stored with each file on the system. These are the bits that tell the Mac which application should open which file, regardless of the file's name or extension. Metadata has long provided an advantage over Windows - no need for filename extensions. The downside, however, is that Mac files are more difficult to exchange with PC users, and lose their metadata in the transition. With the release of OS X, Apple has headed away from metadata as the sole means of identifying a file, and has added file extensions. While improving cross-platform compatibility, this change has been the source of tremendous debate among Mac users -- do filename extensions mean the end of metadata on the Mac?

It might read better as something like this:

On the Mac, metadata such as the type and creator codes is stored with each file on the system in resource forks. These tell the Mac which application should open which file, regardless of the file's name or extension. Resource fork based metadata has long provided an advantage over Windows - no need for filename extensions, and many other subtle benefits besides. The downside, however, is that Mac files are more difficult to exchange with PC users, and lose most metadata in the transition. With the release of OS X, Apple has headed away from resource forks as the primary means of identifying a file, and has encouraged the adoption of file extensions. While improving cross-platform compatibility, this change has been the source of tremendous debate among Mac users -- do filename extensions mean the end of metadata on the Mac?

The important changes are that [a] Apple isn't asking to get rid of metadata; they're getting rid of their time-tested but non-portable mechanism for storing it, on the grounds that it's no fun being a little island of sanity in a mad, crude, networked & Windows-dominated world, and that [b] file name extensions existed before, but were optional -- Photoshop EPS, just to pick one -- and are now being required, while the older, more robust & dynamic standard is being abandoned.

The concern is that this is a false dilemma, and that the Mac can easily do both. For network portability (and, ironically, compatibility with some of the BSD tools that come with OSX), extensions should be preferred to resource forks. But for the most part, external storage of metadata is the way forward. Windows has had support for it, if not widespread use of it, for years now, and that has only been expanded with WinXP. With OSX, Apple wants to step back towards where Windows was years ago even as Microsoft is finally learning the lessons Apple has been teaching for decades now. That is why this is important, and why people should sign the petition & get Apple not to do this.


Thanks for the expansion...
Authored by: dahlenu on Dec 08, '01 06:16:45PM

It seems you think that type and creator codes are stored in the resource fork. They are not.

Think of type/creator as on the same level as owner/group/permissions in UFS.

Again, since this is the most misunderstood thing about metadata on the Mac: type and creator codes have nothing to do with the resource fork, and they do not require multi-forked files.
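One way to picture this is the Python sketch below. It is a made-up data structure, not the real HFS or UFS on-disk layout: the `CatalogEntry` name, its fields, and the sample owner and group are assumptions for illustration. The point is only that type and creator sit in the filesystem's per-file record next to things like owner and permissions, while the forks hold the file's contents.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical per-file record kept by the filesystem, outside both forks."""
    name: str
    owner: str           # UFS-style attribute, included for the analogy
    group: str
    mode: int
    file_type: str       # four-character type code, e.g. "TEXT"
    creator: str         # four-character creator code, e.g. "ttxt" (SimpleText)
    data_fork: bytes = b""
    resource_fork: bytes = b""


# A file with no resource fork at all still carries its type and creator codes.
note = CatalogEntry(
    name="Read Me",
    owner="babbage",     # sample values only
    group="staff",
    mode=0o644,
    file_type="TEXT",
    creator="ttxt",
    data_fork=b"Hello\n",
)
print(note.file_type, note.creator, len(note.resource_fork))
```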


