node-walk
====
| Sponsored by [ppl](https://ppl.family)
A Node.js walk implementation.
This is somewhat of a port of Python's `os.walk`, but using Node.js conventions.
* EventEmitter
* Asynchronous
* Chronological (optionally)
* Built-in flow-control
* Includes a Synchronous version (same API as Asynchronous)
As few file descriptors are opened at a time as possible.
This is particularly well suited for single hard disks which are not flash or solid state.
Installation
----
```bash
npm install --save walk
```
Getting Started
====
```javascript
(function () {
  "use strict";

  var walk = require('walk');
  var fs = require('fs');
  var path = require('path');
  var options = {};
  var walker;

  walker = walk.walk("/tmp", options);

  walker.on("file", function (root, fileStats, next) {
    // fileStats.name is only the file name, so join it with root
    fs.readFile(path.join(root, fileStats.name), function () {
      // doStuff
      next();
    });
  });

  walker.on("errors", function (root, nodeStatsArray, next) {
    next();
  });

  walker.on("end", function () {
    console.log("all done");
  });
}());
```
Common Events
-----
All single event callbacks are in the form of `function (root, stat, next) {}`.
All multiple event callbacks are in the form of `function (root, stats, next) {}`, except **names**, whose second argument is an array of strings.
All **error** event callbacks are in the form `function (root, stat/stats, next) {}`.
**`stat.error`** contains the error.
* `names`
* `directory`
* `directories`
* `file`
* `files`
* `end`
* `nodeError` (`stat` failed)
* `directoryError` (`stat` succeeded, but `readdir` failed)
* `errors` (a collection of any errors encountered)
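
As a rough sketch of how an `errors` listener might report what went wrong, the helper below turns the array it receives into printable lines. It assumes each entry carries the `name` and `error` attributes described above; `describeErrors` and `logErrors` are hypothetical names, not part of `walk`.

```javascript
"use strict";
var path = require("path");

// Turn the array passed to an `errors` listener into printable lines.
// Assumes each entry has `name` and `error` attributes.
function describeErrors(root, nodeStatsArray) {
  return nodeStatsArray.map(function (stat) {
    var message = stat.error && stat.error.message
      ? stat.error.message
      : String(stat.error);
    return path.join(root, stat.name) + ": " + message;
  });
}

// Hypothetical wiring helper: `walker` is whatever walk.walk() returned.
function logErrors(walker) {
  walker.on("errors", function (root, nodeStatsArray, next) {
    describeErrors(root, nodeStatsArray).forEach(function (line) {
      console.error(line);
    });
    next(); // keep walking after reporting
  });
}
```

Calling `logErrors(walker)` right after creating the walker would print one line per failed node and then let the walk continue.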
A typical `stat` object passed to a listener looks like this:
```javascript
{ dev: 16777223,
mode: 33188,
nlink: 1,
uid: 501,
gid: 20,
rdev: 0,
blksize: 4096,
ino: 49868100,
size: 5617,
blocks: 16,
atime: Mon Jan 05 2015 18:18:10 GMT-0700 (MST),
mtime: Thu Sep 25 2014 21:21:28 GMT-0600 (MDT),
ctime: Thu Sep 25 2014 21:21:28 GMT-0600 (MDT),
birthtime: Thu Sep 25 2014 21:21:28 GMT-0600 (MDT),
name: 'README.md',
type: 'file' }
```
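
Because `name` holds only the file name, a listener usually has to combine it with `root` to get a usable path. A minimal sketch, assuming only the `name`, `type` and `size` attributes shown above; `summarize` is a hypothetical helper, not part of `walk`:

```javascript
"use strict";
var path = require("path");

// Build a one-line summary from the `root` argument and a stat object
// shaped like the example above (only name, type and size are used).
function summarize(root, stat) {
  return stat.type + " " + path.join(root, stat.name) +
    " (" + stat.size + " bytes)";
}
```

Inside a `file` listener this could be used as `console.log(summarize(root, fileStats))` before calling `next()`.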
Advanced Example
====
Both Asynchronous and Synchronous versions are provided.
```javascript
(function () {
  "use strict";

  var walk = require('walk');
  var fs = require('fs');
  var path = require('path');
  var options;
  var walker;

  options = {
    followLinks: false
    // directories with these names will be skipped
  , filters: ["Temp", "_Temp"]
  };

  walker = walk.walk("/tmp", options);

  // OR
  // walker = walk.walkSync("/tmp", options);

  walker.on("names", function (root, nodeNamesArray) {
    nodeNamesArray.sort(function (a, b) {
      if (a > b) return 1;
      if (a < b) return -1;
      return 0;
    });
  });

  walker.on("directories", function (root, dirStatsArray, next) {
    // dirStatsArray is an array of `stat` objects with the additional attributes
    // * type
    // * error
    // * name

    next();
  });

  walker.on("file", function (root, fileStats, next) {
    // fileStats.name is only the file name, so join it with root
    fs.readFile(path.join(root, fileStats.name), function () {
      // doStuff
      next();
    });
  });

  walker.on("errors", function (root, nodeStatsArray, next) {
    next();
  });

  walker.on("end", function () {
    console.log("all done");
  });
}());
```
### Sync
Note: You **can't use `EventEmitter`** if you want a truly synchronous walker.
Although the walk is synchronous under the hood, it appears not to be because of the use of `process.nextTick()`.
Instead, **you must use `options.listeners`** for a truly synchronous walker.
Although the sync version uses the synchronous `fs` methods (`fs.readdirSync` and others),
I don't think I can prevent the `process.nextTick()` that `EventEmitter` calls.
```javascript
(function () {
  "use strict";

  var walk = require('walk');
  var fs = require('fs');
  var path = require('path');
  var options;
  var walker;

  // To be truly synchronous in the emitter and maintain a compatible api,
  // the listeners must be listed before the object is created
  options = {
    listeners: {
      names: function (root, nodeNamesArray) {
        nodeNamesArray.sort(function (a, b) {
          if (a > b) return 1;
          if (a < b) return -1;
          return 0;
        });
      }
    , directories: function (root, dirStatsArray, next) {
        // dirStatsArray is an array of `stat` objects with the additional attributes
        // * type
        // * error
        // * name

        next();
      }
    , file: function (root, fileStats, next) {
        // use the sync read so that next() runs before walkSync returns
        try {
          fs.readFileSync(path.join(root, fileStats.name));
          // doStuff
        } catch (e) {
          // ignore unreadable files
        }
        next();
      }
    , errors: function (root, nodeStatsArray, next) {
        next();
      }
    }
  };

  walker = walk.walkSync("/tmp", options);

  console.log("all done");
}());
```
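
A common use of the synchronous form is to collect paths into an array. The sketch below builds the `listeners` object separately so it can be handed to `options.listeners`; `makeFileCollector` is a hypothetical helper, and the assumption is that `file` listeners receive `(root, fileStats, next)` as described earlier.

```javascript
"use strict";
var path = require("path");

// Hypothetical helper: returns listeners suitable for `options.listeners`
// together with the array those listeners fill in.
function makeFileCollector() {
  var files = [];
  return {
    files: files,
    listeners: {
      file: function (root, fileStats, next) {
        files.push(path.join(root, fileStats.name));
        next();
      },
      errors: function (root, nodeStatsArray, next) {
        next(); // skip anything that could not be read
      }
    }
  };
}
```

With the library installed, `var c = makeFileCollector(); walk.walkSync("/tmp", { listeners: c.listeners });` should leave `c.files` populated by the time `walkSync` returns.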
API
====
Emitted Values

  * `on('XYZ', function(root, stats, next) {})`

  * `root` - the directory containing the files to be inspected
  * *stats[Array]* - a single `stats` object or an array with some added attributes
    * type - 'file', 'directory', etc
    * error
    * name - the name of the file, dir, etc
  * next - no more files will be read until this is called

Single Events - fired immediately
* `end` - No files, dirs, etc left to inspect
* `directoryError` - Error when `fstat` succeeded, but reading path failed (Probably due to permissions).
* `nodeError` - Error when `fstat` did not succeed.
* `node` - a `stats` object for a node of any type
* `file` - includes links when `followLinks` is `true`
* `directory` - **NOTE** you could get a recursive loop if `followLinks` and a directory links to its parent
* `symbolicLink` - always empty when `followLinks` is `true`
* `blockDevice`
* `characterDevice`
* `FIFO`
* `socket`
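
As an illustration of the single events, the sketch below tallies nodes by their `type` attribute. It assumes the `node` event passes `(root, stat, next)` like the other single events; `makeTypeCounter` is a hypothetical helper, not part of `walk`.

```javascript
"use strict";

// Hypothetical helper: tallies nodes by their `type` attribute.
function makeTypeCounter() {
  var counts = {};

  function onNode(root, stat, next) {
    counts[stat.type] = (counts[stat.type] || 0) + 1;
    next();
  }

  return { counts: counts, onNode: onNode };
}
```

One way to use it would be `walker.on("node", counter.onNode)` and then reading `counter.counts` inside the `end` listener.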
Events with Array Arguments - fired after all files in the dir have been `stat`ed
* `names` - before any `stat` takes place. Useful for sorting and filtering.
* Note: the array is an array of `string`s, not `stat` objects
* Note: the `next` argument is a `noop`
* `errors` - errors encountered by `fs.stat` when reading nodes in a directory
* `nodes` - an array of `stats` of any type
* `files`
* `directories` - modification of this array - sorting, removing, etc - affects traversal
* `symbolicLinks`
* `blockDevices`
* `characterDevices`
* `FIFOs`
* `sockets`
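
Since modifying the `directories` array affects traversal, a listener can prune whole subtrees by removing entries in place. A minimal sketch, assuming each entry has a `name` attribute; `pruneHidden` is a hypothetical listener that drops directories whose names start with a dot:

```javascript
"use strict";

// Remove hidden directories (names starting with ".") from the array
// in place. Iterating backwards keeps indexes valid while splicing.
function pruneHidden(root, dirStatsArray, next) {
  for (var i = dirStatsArray.length - 1; i >= 0; i -= 1) {
    if (dirStatsArray[i].name.charAt(0) === ".") {
      dirStatsArray.splice(i, 1);
    }
  }
  next();
}
```

It would be registered with `walker.on("directories", pruneHidden)`. The array must be mutated rather than replaced, because the walker only sees the array it passed in.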
**Warning** beware of infinite loops when `followLinks` is true (using `walk-recurse` variant).
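
One possible guard against such loops, sketched under the assumption that a `directories` listener can prune traversal by editing its array: remember every resolved directory path and drop any directory that resolves to one already seen. `makeLoopGuard` is a hypothetical helper; it uses `fs.realpathSync` by default and accepts a replacement resolver for testing.

```javascript
"use strict";
var fs = require("fs");
var path = require("path");

// Hypothetical helper: returns a `directories` listener that removes any
// directory whose resolved path has already been seen.
function makeLoopGuard(resolve) {
  var seen = Object.create(null);
  resolve = resolve || fs.realpathSync;

  return function (root, dirStatsArray, next) {
    try {
      seen[resolve(root)] = true; // the directory being listed counts as seen
    } catch (e) {
      // root could not be resolved; carry on with what we know
    }
    for (var i = dirStatsArray.length - 1; i >= 0; i -= 1) {
      var real = null;
      try {
        real = resolve(path.join(root, dirStatsArray[i].name));
      } catch (e) {
        real = null; // unresolvable (for example a broken link)
      }
      if (real === null || seen[real]) {
        dirStatsArray.splice(i, 1);
      } else {
        seen[real] = true;
      }
    }
    next();
  };
}
```

It would be attached with `walker.on("directories", makeLoopGuard())`. Resolving every directory costs extra system calls, so this trades some speed for safety.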
Comparisons
====
Tested on my `/System` containing 59,490 (+ self) directories (and lots of files).
The size of the text output was 6 MB.

`find`:

    time bash -c "find /System -type d | wc"
    59491 97935 6262916

    real 2m27.114s
    user 0m1.193s
    sys 0m14.859s

`find.js`:

Note that `find.js` omits the start directory.

    time bash -c "node examples/find.js /System -type d | wc"
    59490 97934 6262908

    # Test 1
    real 2m52.273s
    user 0m20.374s
    sys 0m27.800s

    # Test 2
    real 2m23.725s
    user 0m18.019s
    sys 0m23.202s

    # Test 3
    real 2m50.077s
    user 0m17.661s
    sys 0m24.008s

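In conclusion, the Node.js asynchronous walk is slower than regular `find`: wall-clock time ranged from roughly on par (Test 2) to about 17% longer (Test 1), and CPU time (user + sys) was roughly 2.5 to 3 times higher.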
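
To put numbers on that comparison, the timings above can be converted to seconds and compared as ratios. This is only a sketch of the arithmetic; `toSeconds` and `cpuSeconds` are hypothetical helpers, and the values are copied from the runs listed above.

```javascript
"use strict";

// Convert a `time` value such as "2m27.114s" into seconds.
function toSeconds(text) {
  var match = /^(\d+)m([\d.]+)s$/.exec(text);
  if (!match) {
    throw new Error("unexpected time format: " + text);
  }
  return Number(match[1]) * 60 + Number(match[2]);
}

// CPU time is user time plus system time.
function cpuSeconds(run) {
  return toSeconds(run.user) + toSeconds(run.sys);
}

var find = { real: "2m27.114s", user: "0m1.193s", sys: "0m14.859s" };
var findJs = [
  { real: "2m52.273s", user: "0m20.374s", sys: "0m27.800s" },
  { real: "2m23.725s", user: "0m18.019s", sys: "0m23.202s" },
  { real: "2m50.077s", user: "0m17.661s", sys: "0m24.008s" }
];

findJs.forEach(function (run, i) {
  var wall = toSeconds(run.real) / toSeconds(find.real);
  var cpu = cpuSeconds(run) / cpuSeconds(find);
  console.log("Test " + (i + 1) + ": wall x" + wall.toFixed(2) +
    ", cpu x" + cpu.toFixed(2));
});
```

For the figures above this gives wall-clock ratios of about 1.17, 0.98 and 1.16, and CPU ratios of about 3.00, 2.57 and 2.60.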
LICENSE
===
`node-walk` is available under the following licenses:
* MIT
* Apache 2
Copyright 2011 - Present AJ ONeal