October 15, 2001, 2:26 PM — In my March '98 article on file compression, I asked the question: How big is a file, anyway? This month I am going to expand that question to: How big is a directory, anyway?
If the directory contains only files, it's easy enough to issue an
ls -ls command and get the sizes of the files in bytes and blocks.
$ ls -ls
total 6
2 -rw-r--r-- 1 mjb group    3 Feb 04 23:31 minutes.txt
4 -rw-r--r-- 1 mjb group 1201 Feb 04 23:25 note.txt
The first column contains the size of the file in 512-byte blocks, and the sixth column
gives the size of the file in bytes. The files in this directory consume 6
blocks (3072 bytes of disk space) but contain only 1204 bytes of data. In the March column, I discussed
allocation units, the minimum space allocated by the operating system for a
file. You should review that article for more details, but here's a brief
explanation of how allocation units work.
This method is used in all major operating systems in one form or another.
Some convenient number of bytes is selected as the minimum amount that can be
allocated to a file. This amount is an allocation unit. If the file doesn't
use all the space in an allocation unit, its data is recorded
at the beginning of the unit, with the remaining space set aside to accommodate further expansion of that file.
As you add to the file, the new data is stored in the empty reserved space on the disk, so long as it doesn't exceed the number of bytes permitted in an allocation unit. Once the file has used all available space, another allocation unit is grabbed and reserved. Any spillover from the first allocation unit is tucked in at the start of the second allocation unit, and so on.
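To make the bookkeeping concrete, here is a minimal sketch of how much room is left in a file's last allocation unit before another unit has to be reserved. The function name room_left is made up for illustration, and the 1024-byte default unit is an assumption; your system may use a different size.

```shell
#!/bin/sh
# Hypothetical helper: bytes that can still be appended to a file of the
# given size before another allocation unit must be reserved.
room_left() {
    size=$1
    unit=${2:-1024}    # assumed allocation unit in bytes
    # Bytes used in the last unit are size % unit; the outer modulo makes
    # an empty file or an exactly full last unit report 0.
    echo $(( (unit - size % unit) % unit ))
}

room_left 3       # prints 1021
room_left 1201    # prints 847
room_left 1024    # prints 0
```

A result of 0 means the next byte written would cause another allocation unit to be grabbed.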
Earlier Unix systems used an
allocation unit of 512 bytes. These 512 bytes came to be known as a block. As
disk sizes grew, the basic allocation unit was increased to 1024 bytes on most
systems (larger on some), but many utilities, such as
the ls command above, still report
file sizes or disk use in 512-byte blocks. So, the 3-byte file occupies one 1024-byte allocation unit, which ls reports as 2 blocks.
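The arithmetic behind the block counts can be sketched in a few lines of shell. This is only an illustration: the function name blocks_for is hypothetical, and it assumes a fixed 1024-byte allocation unit, which happens to reproduce the 2 and 4 shown in the first column of the listing above.

```shell
#!/bin/sh
# Hypothetical helper: number of 512-byte blocks occupied by a file of the
# given size, assuming a fixed allocation unit (1024 bytes by default).
blocks_for() {
    size=$1
    unit=${2:-1024}
    # Round the size up to a whole number of allocation units.
    units=$(( (size + unit - 1) / unit ))
    # Express the allocated space in 512-byte blocks.
    echo $(( units * unit / 512 ))
}

blocks_for 3       # minutes.txt: prints 2
blocks_for 1201    # note.txt:    prints 4
```

Passing 512 as the second argument models the earlier systems, where the 3-byte file would have taken a single block.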
In the following example, the directory in question includes a subdirectory,
perl. The 2 blocks allocated for the perl directory are the blocks used only
by the directory itself, not those used by the files in the directory.
$ ls -ls
total 6
2 -rw-r--r-- 1 mjb group    3 Feb 04 23:31 minutes.txt
4 -rw-r--r-- 1 mjb group 1201 Feb 04 23:25 note.txt
2 drwxr-xr-x 2 mjb group  128 Jan 29 18:53 perl
We could figure out the sizes by running
ls -ls perl, but suppose there's
another directory under perl? And what if there were a third directory beneath that one?
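One way to chase the nested directories by hand is to let ls recurse with -R and add up the "total" line it prints for each directory. The sketch below is a rough approximation, not a definitive tool: the function name tree_blocks is made up, hidden files are skipped because ls ignores them without -a, and the blocks used by the top-level directory itself are not included. The result is in whatever block unit your ls reports (512 bytes on traditional systems; GNU ls defaults to 1024).

```shell
#!/bin/sh
# Hypothetical helper: sum the "total" lines that ls -lsR prints for a
# directory and for every directory beneath it.
tree_blocks() {
    # LC_ALL=C keeps the word "total" untranslated so awk can match it.
    LC_ALL=C ls -lsR "$1" |
        awk '$1 == "total" && NF == 2 { sum += $2 } END { print sum + 0 }'
}

# Example use: tree_blocks perl
```

This works however many levels of subdirectories there are, but it is clumsy enough to suggest that a dedicated command would be welcome.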
How do you du?