node-walk
====

| Sponsored by [ppl](https://ppl.family)

nodejs walk implementation.

This is somewhat of a port python's `os.walk`, but using Node.JS conventions.

  * EventEmitter
  * Asynchronous
  * Chronological (optionally)
  * Built-in flow-control
  * includes Synchronous version (same API as Asynchronous)

As few file descriptors are opened at a time as possible.
This is particularly well suited for single hard disks which are not flash or solid state.

Installation
----

```bash
npm install --save walk
```

Getting Started
====

```javascript
(function () {
  "use strict";

  var walk = require('walk');
  var fs = require('fs');
  var walker;

  walker = walk.walk("/tmp", options);

  walker.on("file", function (root, fileStats, next) {
    fs.readFile(fileStats.name, function () {
      // doStuff
      next();
    });
  });

  walker.on("errors", function (root, nodeStatsArray, next) {
    next();
  });

  walker.on("end", function () {
    console.log("all done");
  });
}());
```

Common Events
-----

All single event callbacks are in the form of `function (root, stat, next) {}`.

All multiple event callbacks callbacks are in the form of `function (root, stats, next) {}`, except **names** which is an array of strings.

All **error** event callbacks are in the form `function (root, stat/stats, next) {}`.
**`stat.error`** contains the error.

* `names`
* `directory`
* `directories`
* `file`
* `files`
* `end`
* `nodeError` (`stat` failed)
* `directoryError` (`stat` succedded, but `readdir` failed)
* `errors` (a collection of any errors encountered)


A typical `stat` event looks like this:

```javascript
{ dev: 16777223,
  mode: 33188,
  nlink: 1,
  uid: 501,
  gid: 20,
  rdev: 0,
  blksize: 4096,
  ino: 49868100,
  size: 5617,
  blocks: 16,
  atime: Mon Jan 05 2015 18:18:10 GMT-0700 (MST),
  mtime: Thu Sep 25 2014 21:21:28 GMT-0600 (MDT),
  ctime: Thu Sep 25 2014 21:21:28 GMT-0600 (MDT),
  birthtime: Thu Sep 25 2014 21:21:28 GMT-0600 (MDT),
  name: 'README.md',
  type: 'file' }
```

Advanced Example
====

Both Asynchronous and Synchronous versions are provided.

```javascript
(function () {
  "use strict";

  var walk = require('walk');
  var fs = require('fs');
  var options;
  var walker;

  options = {
    followLinks: false
    // directories with these keys will be skipped
  , filters: ["Temp", "_Temp"]
  };

  walker = walk.walk("/tmp", options);

  // OR
  // walker = walk.walkSync("/tmp", options);

  walker.on("names", function (root, nodeNamesArray) {
    nodeNamesArray.sort(function (a, b) {
      if (a > b) return 1;
      if (a < b) return -1;
      return 0;
    });
  });

  walker.on("directories", function (root, dirStatsArray, next) {
    // dirStatsArray is an array of `stat` objects with the additional attributes
    // * type
    // * error
    // * name

    next();
  });

  walker.on("file", function (root, fileStats, next) {
    fs.readFile(fileStats.name, function () {
      // doStuff
      next();
    });
  });

  walker.on("errors", function (root, nodeStatsArray, next) {
    next();
  });

  walker.on("end", function () {
    console.log("all done");
  });
}());
```

### Sync

Note: You **can't use EventEmitter** if you want truly synchronous walker
(although it's synchronous under the hood, it appears not to be due to the use of `process.nextTick()`).

Instead **you must use `options.listeners`** for truly synchronous walker.

Although the sync version uses all of the `fs.readSync`, `fs.readdirSync`, and other sync methods,
I don't think I can prevent the `process.nextTick()` that `EventEmitter` calls.

```javascript
(function () {
  "use strict";

  var walk = require('walk');
  var fs = require('fs');
  var options;
  var walker;

  // To be truly synchronous in the emitter and maintain a compatible api,
  // the listeners must be listed before the object is created
  options = {
    listeners: {
      names: function (root, nodeNamesArray) {
        nodeNamesArray.sort(function (a, b) {
          if (a > b) return 1;
          if (a < b) return -1;
          return 0;
        });
      }
    , directories: function (root, dirStatsArray, next) {
        // dirStatsArray is an array of `stat` objects with the additional attributes
        // * type
        // * error
        // * name

        next();
      }
    , file: function (root, fileStats, next) {
        fs.readFile(fileStats.name, function () {
          // doStuff
          next();
        });
      }
    , errors: function (root, nodeStatsArray, next) {
        next();
      }
    }
  };

  walker = walk.walkSync("/tmp", options);

  console.log("all done");
}());
```

API
====

Emitted Values

  * `on('XYZ', function(root, stats, next) {})`

  * `root` - the containing the files to be inspected
  * *stats[Array]* - a single `stats` object or an array with some added attributes
    * type - 'file', 'directory', etc
    * error
    * name - the name of the file, dir, etc
  * next - no more files will be read until this is called

Single Events - fired immediately

  * `end` - No files, dirs, etc left to inspect

  * `directoryError` - Error when `fstat` succeeded, but reading path failed (Probably due to permissions).
  * `nodeError` - Error `fstat` did not succeeded.
  * `node` - a `stats` object for a node of any type
  * `file` - includes links when `followLinks` is `true`
  * `directory` - **NOTE** you could get a recursive loop if `followLinks` and a directory links to its parent
  * `symbolicLink` - always empty when `followLinks` is `true`
  * `blockDevice`
  * `characterDevice`
  * `FIFO`
  * `socket`

Events with Array Arguments - fired after all files in the dir have been `stat`ed

  * `names` - before any `stat` takes place. Useful for sorting and filtering.
    * Note: the array is an array of `string`s, not `stat` objects
    * Note: the `next` argument is a `noop`

  * `errors` - errors encountered by `fs.stat` when reading ndes in a directory
  * `nodes` - an array of `stats` of any type
  * `files`
  * `directories` - modification of this array - sorting, removing, etc - affects traversal
  * `symbolicLinks`
  * `blockDevices`
  * `characterDevices`
  * `FIFOs`
  * `sockets`

**Warning** beware of infinite loops when `followLinks` is true (using `walk-recurse` varient).

Comparisons
====

Tested on my `/System` containing 59,490 (+ self) directories (and lots of files).
The size of the text output was 6mb.

`find`:
    time bash -c "find /System -type d | wc"
    59491   97935 6262916

    real  2m27.114s
    user  0m1.193s
    sys 0m14.859s

`find.js`:

Note that `find.js` omits the start directory

    time bash -c "node examples/find.js /System -type d | wc"
    59490   97934 6262908

    # Test 1
    real  2m52.273s
    user  0m20.374s
    sys 0m27.800s

    # Test 2
    real  2m23.725s
    user  0m18.019s
    sys 0m23.202s

    # Test 3
    real  2m50.077s
    user  0m17.661s
    sys 0m24.008s

In conclusion node.js asynchronous walk is much slower than regular "find".

LICENSE
===

`node-walk` is available under the following licenses:

  * MIT
  * Apache 2

Copyright 2011 - Present AJ ONeal