How to repair a full Unix directory

A few easy techniques quickly bring the situation under control.

By Cameron Laird

A reader wrote me this week that his bash scripts were complaining "out of memory"; what should he do? It didn't take long to get him moving again.

While my colleague Sandra Henry-Stocker usually covers this territory in her "Unix as a Second Language", the ideas involved in this episode apply nicely in common situations developers and Windows administrators encounter, so I think there is value in reporting them here. My correspondent knew that he wanted to run

    find . -type f -exec grep -i -l -H "keyword" '{}' + | xargs rm -rf

but he was getting "out of memory" because he had millions (!) of files in his directory tree, and, if I understood him correctly, was operating with an older host that only had 256 megabytes of main memory. What should he do?

My first thought:

    INTERMEDIATE_FILE=/tmp/xyz.txt
        # Caution:  this coding is fragile, in that it mishandles filenames which
        # embed blanks.  Accommodating those is a story for another day.
    find . -type f -exec grep -i -l -H "$keyword" {} \; > $INTERMEDIATE_FILE
    for NAME in `cat $INTERMEDIATE_FILE`
    do
        rm -rf $NAME
    done
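The blank-handling caveat in that snippet can be addressed, at least for names without embedded newlines, by reading the intermediate file one line at a time rather than expanding it with backquotes. What follows is a minimal sketch, not the code used in the exchange described here. The scratch directory, the sample filenames, and the value of `keyword` are all made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: the scratch directory and sample files are hypothetical.
WORKDIR=$(mktemp -d)
INTERMEDIATE_FILE=$(mktemp)
keyword="keyword"

# Sample data: two files mention the keyword (one has blanks in its name), one does not.
printf 'has KEYWORD inside\n' > "$WORKDIR/plain.txt"
printf 'keyword again\n' > "$WORKDIR/name with blanks.txt"
printf 'nothing to see\n' > "$WORKDIR/keep me.txt"

# Write one matching filename per line.
find "$WORKDIR" -type f -exec grep -i -l -- "$keyword" {} \; > "$INTERMEDIATE_FILE"

# Read a line at a time; IFS= and -r keep blanks and backslashes intact,
# and the quotes around "$NAME" stop the shell from splitting it.
while IFS= read -r NAME
do
    rm -f -- "$NAME"
done < "$INTERMEDIATE_FILE"

REMAINING=$(ls "$WORKDIR")
echo "$REMAINING"
rm -rf "$WORKDIR" "$INTERMEDIATE_FILE"
```

A side benefit of the `while read` form is that the shell never has to hold the whole list of names at once, which matters when the list runs to millions of lines.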

Did that help? "Yes!", the report came back--well, "yes and no." As I'm a big believer that long journeys begin with small steps, I found more encouragement than discouragement in the answer. Apparently the questioner needed to do several waves of cleanup, and "unrolling" the one-liner with an $INTERMEDIATE_FILE helped with some of the out-of-memory situations, but not all.

"One step at a time", I thought. After a little more negotiation, we reduced his symptoms to "out of memory" faults with

    ls -1 >> $INTERMEDIATE_FILE

and

    find ./ -size -6k -type f >> $INTERMEDIATE_FILE

Did I have any tricks left for those?

Sure; in fact, I have a history of creating this situation for myself. I often use temporary files for various test automations I run, and, unless I'm scrupulous about cleaning up after the tests, it's easy to find myself with tens of thousands of files named, for example, /tmp/tmp${RANDOM}.log. I've often had so many of these that trying to clean up the mess with rm /tmp/tmp*log does just what my questioner described: complains "out of memory". In a case like this, it's time to "eat the elephant one bite at a time", which translates, in this case, to something like

    rm /tmp/tmp*a*.log
    rm /tmp/tmp*b*.log
        ...
    rm /tmp/tmp*[g-j]*.log
        ...
    rm /tmp/tmp*[A-H]*.log
        ...

In English, the idea is to specify a subset of /tmp/tmp*.log small enough to fit in memory, but large enough to nibble away at the whole list. After slicing out a few "chunks", we quickly reduce the whole collection of remaining /tmp/tmp*.log to a manageable size, where more traditional bash programming can take over.
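For names built from ${RANDOM} specifically, the variable part consists only of digits, so slices keyed on digits are the ones that will actually match. Here is a minimal sketch of that variation, assuming a scratch directory in place of the real /tmp and fixed numeric names standing in for the random ones:

```shell
#!/usr/bin/env bash
# Sketch only: SCRATCH stands in for /tmp so the example is safe to run.
SCRATCH=$(mktemp -d)

# Create sample files with numeric names, as stand-ins for tmp${RANDOM}.log.
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
do
    : > "$SCRATCH/tmp$(( i * 1637 )).log"
done

# One slice per leading digit; each glob expands to only part of the whole list.
# rm -f stays quiet when a slice happens to match nothing.
for DIGIT in 0 1 2 3 4 5 6 7 8 9
do
    rm -f "$SCRATCH"/tmp"$DIGIT"*.log
done

LEFT=$(( $(find "$SCRATCH" -type f | wc -l) ))
echo "files left: $LEFT"
rmdir "$SCRATCH"
```

If a single leading digit still yields too large a slice, the same loop can be nested to key on the first two digits.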

For find, a homologous approach would be something like

    find . -name "*a*" -size -6k -type f >> $INTERMEDIATE_FILE
    find . -name "*[bc]*" -size -6k -type f >> $INTERMEDIATE_FILE
        ...
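One wrinkle with slicing find this way is that a name such as abc.log matches both "*a*" and "*[bc]*", so it lands in the intermediate file twice. A possible variation, sketched here under the assumption that keying on the first character of the name is acceptable, keeps the slices disjoint. The scratch tree and sample names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the scratch tree and sample names are hypothetical.
TREE=$(mktemp -d)
INTERMEDIATE_FILE=$(mktemp)
mkdir "$TREE/sub"
: > "$TREE/alpha.log"
: > "$TREE/sub/abc.log"
: > "$TREE/beta.log"
: > "$TREE/sub/cab.log"
: > "$TREE/zulu.log"

# Each pass collects only the names that start with one slice's characters.
# Keying on the first character means no file is listed twice.
for SLICE in 'a' '[bc]' '[d-z]'
do
    find "$TREE" -name "${SLICE}*" -size -6k -type f >> "$INTERMEDIATE_FILE"
done

COUNT=$(( $(wc -l < "$INTERMEDIATE_FILE") ))
echo "collected $COUNT names"
rm -rf "$TREE" "$INTERMEDIATE_FILE"
```

Names that begin with digits, uppercase letters, or other characters would need slices of their own; the three shown are only enough for the sample data.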

The excitement wasn't quite over yet, of course; situations like this seem always to have "loose ends". In the case of my questioner, he had many files whose names included non-ASCII Unicode characters. I've got plenty of tricks for dealing with those, too, including switching to Tcl for my scripting. This time, though, we started with the files whose names were easy to express, processed all of them, and then determined, to my non-surprise, that the residuum which remained was small enough that the questioner could use his usual bash coding skills. Mission accomplished.

What's the conclusion? I don't have a particularly polished aphorism to summarize what happened. I do know, though, that many cases that look like "show-stoppers" the first time encountered turn out to be easy to solve for someone with just a little more experience. If you're feeling stuck, be clear with yourself what your true requirements are, what you're getting, and what appears to constrain you. Ask for help; someone else, with a different perspective, might quickly see a way to fit together all the elements of your problem to make a solution.

There's also a lesson here about craft-work that I don't yet know how to put into words. Part of the difference between "textbook learning" and the kind of professional training that diesel mechanics, physicians, lawyers, and plumbers all practice has to do with learning how to handle novel situations. It involves thorough apprenticeship in the basics, followed by exposure to progressively more challenging variations. If rm * doesn't give you what you want, break down the * part into pieces small enough to handle.
