
Generating a File Stream from a String in Node.js

2019-08-17 | Tags: Node, nodejs create read stream from string, nodejs implements file stream, nodejs create filestream from string

In Node.js, how do you create a fake-but-convincing FileStream out of thin air, with no file behind it?

I. Background

In file-related data processing, we often have to decide how to handle the physical files we generate, for example:

  • Where should the generated file go, and does that path exist?

  • When should temporary files be cleaned up, and how do we avoid naming conflicts that would overwrite existing files?

  • How do we guarantee read/write order under concurrency?

  • ...

The best solution to the problems caused by reading and writing physical files is not to write files at all. In some scenarios, however, writing files is hard to avoid. File upload is one of them.

II. Problem

File upload is generally implemented through form submission, for example:

var FormData = require('form-data');
var fs = require('fs');

var form = new FormData();
form.append('my_file', fs.createReadStream('/foo/bar.jpg'));
form.submit('example.org/upload', function(err, res) {
  console.log(res.statusCode);
});

(Excerpted from Form-Data)

If you don't want to write physical files, you can do this:

const FormData = require('form-data');

const filename = 'my-file.txt';
const content = 'balalalalala...变身';

const formData = new FormData();
// 1. First convert the string into a Buffer
const fileContent = Buffer.from(content);
// 2. Fill in the file meta information
formData.append('file', fileContent, {
  filename,
  contentType: 'text/plain',
  knownLength: fileContent.byteLength
});

In other words, a file stream provides not only data but also meta information such as the filename and file path. An ordinary Stream (or a Buffer) has none of this, which is why it must be supplied by hand above. So, is there a way to create a "real" file stream out of thin air?
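The difference in meta information can be observed directly. The following is a minimal sketch, assuming a Node.js version that provides Readable.from; process.execPath is used only because it is a file that is sure to exist, and the variable names are illustrative:

```javascript
const fs = require('fs');
const { Readable } = require('stream');

// A file-backed stream remembers where its data comes from.
// process.execPath (the node binary) is only a stand-in for "some existing file".
const fileStream = fs.createReadStream(process.execPath);
const filePath = fileStream.path;
fileStream.destroy(); // only the metadata is of interest here, not the data

// An ordinary in-memory stream carries data but no file metadata.
const plainStream = Readable.from(['some string data']);
const plainPath = plainStream.path;

console.log(typeof filePath); // 'string'
console.log(plainPath);       // undefined
```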

III. Ideas

To create a "real" file stream, there are at least two approaches, working from opposite directions:

  • Add file-related meta information to ordinary streams

  • First get a real file stream, then change its data and meta information

Obviously, the former is more flexible, and it can be implemented without depending on files at all.
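In its simplest form, the first approach might look like the sketch below. stringToNamedStream is a hypothetical helper, not part of any library, and whether a given consumer treats the result as a file stream depends on which properties that consumer actually inspects:

```javascript
const { Readable } = require('stream');

// Hypothetical helper: wrap a string in an ordinary Readable,
// then attach a file-like `path` property to it.
function stringToNamedStream(content, path) {
  const buffer = Buffer.from(content);
  // Passing an array makes the whole Buffer arrive as a single chunk.
  const stream = Readable.from([buffer]);
  stream.path = path; // meta information that a plain stream lacks
  return stream;
}

const namedStream = stringToNamedStream('my-string-data', 'no-this-file.txt');
console.log(namedStream.path); // 'no-this-file.txt'
```

This only covers the path. A consumer that also looks for other properties, such as a file descriptor, would need more than this, which is one reason to study how a real file stream is built.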

Production Process of File Stream

Following the out-of-thin-air idea, a look at the internal implementation of the fs.createReadStream API shows that a file stream is produced in the following key steps:

function ReadStream(path, options) {
  // 1. Open the file specified by path
  if (typeof this.fd !== 'number')
    this.open();
}

ReadStream.prototype.open = function() {
  fs.open(this.path, this.flags, this.mode, (er, fd) => {
    // 2. Get the file descriptor and hold on to it
    this.fd = fd;
    this.emit('open', fd);
    this.emit('ready');
    // 3. Start reading the data as a stream
    // read comes from the parent class Readable and mainly calls the internal method _read
    // ref: https://github.com/nodejs/node/blob/v10.16.3/lib/_stream_readable.js#L390
    this.read();
  });
};

ReadStream.prototype._read = function(n) {
  // 4. Read one chunk from the file
  fs.read(this.fd, pool, pool.used, toRead, this.pos, (er, bytesRead) => {
    let b = null;
    if (bytesRead > 0) {
      this.bytesRead += bytesRead;
      b = thisPool.slice(start, start + bytesRead);
    }
    // 5. Emit the chunk (by triggering the 'data' event). If there is more data, this.read is called
    // again via process.nextTick, until this.push(null) triggers the 'end' event
    // ref: https://github.com/nodejs/node/blob/v10.16.3/lib/_stream_readable.js#L207
    this.push(b);
  });
};

P.S. Step 5 is relatively complex: this.push(buffer) can both trigger reading of the next chunk (via this.read()) and, once the data has been fully read, trigger the 'end' event (via this.push(null)). For details, see node/lib/_stream_readable.js.
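The push behaviour can be tried in isolation with a minimal sketch. CountdownStream is an illustrative name, and the class has nothing to do with files; it only shows that each push of data leads to another _read call, and that push(null) ends the stream:

```javascript
const { Readable } = require('stream');

// Illustrative stream that emits the numbers count..1 and then ends.
class CountdownStream extends Readable {
  constructor(count) {
    super();
    this.remaining = count;
  }

  _read() {
    if (this.remaining > 0) {
      // Pushing data makes Readable call _read again when it wants more.
      this.push(Buffer.from(String(this.remaining--)));
    } else {
      // Pushing null signals that no data is left, which triggers 'end'.
      this.push(null);
    }
  }
}

const pieces = [];
const finished = new Promise((resolve, reject) => {
  new CountdownStream(3)
    .on('data', (chunk) => pieces.push(chunk.toString()))
    .on('end', () => resolve(pieces.join('')))
    .on('error', reject);
});

finished.then((result) => console.log(result)); // '321'
```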

Reimplementing File Stream

Now that we understand how a file stream is produced, the natural next step is to replace every file operation until the implementation no longer depends on files at all. For example:

// Read one chunk from the file
fs.read(this.fd, pool, pool.used, toRead, this.pos, (er, bytesRead) => {
  /* ... */
});

// becomes
this._fakeReadFile(this.fd, pool, pool.used, toRead, this.pos, (bytesRead) => {
  /* ... */
});

// Copy one chunk out of the Buffer that corresponds to the input string
ReadStream.prototype._fakeReadFile = function(_, buffer, offset, length, position, cb) {
  position = position || this.input._position;
  // fake read file async
  setTimeout(() => {
    let bytesRead = 0;
    if (position < this.input.byteLength) {
      bytesRead = this.input.copy(buffer, offset, position, position + length);
      this.input._position += bytesRead;
    }
    cb(bytesRead);
  }, 0);
};

That is, strip out the file operations and replace them with operations on the string.
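Put together, the idea can be sketched as a small self-contained class. This is not the source of any published package: StringReadStream and its options are illustrative names, it extends Readable directly instead of patching fs.ReadStream, and setImmediate stands in for the asynchronous callback of fs.read:

```javascript
const { Readable } = require('stream');

// Illustrative string-backed stream that mimics the shape of a file stream.
class StringReadStream extends Readable {
  constructor(content, options = {}) {
    super({ highWaterMark: options.highWaterMark });
    this.input = Buffer.from(content); // the "file" contents
    this.pos = 0;                      // current read position
    this.path = options.path;          // fake file meta information
  }

  _read(size) {
    const remaining = this.input.byteLength - this.pos;
    if (remaining <= 0) {
      this.push(null); // everything has been read: triggers 'end'
      return;
    }
    const chunk = Buffer.alloc(Math.min(size, remaining));
    // Fake an asynchronous file read.
    setImmediate(() => {
      const bytesRead = this.input.copy(chunk, 0, this.pos, this.pos + chunk.length);
      this.pos += bytesRead;
      this.push(chunk);
    });
  }
}

// A small highWaterMark forces the input to arrive in several chunks.
const received = [];
const done = new Promise((resolve, reject) => {
  new StringReadStream('Oh, my great data!', { highWaterMark: 4, path: 'fake.txt' })
    .on('data', (chunk) => received.push(chunk))
    .on('end', () => resolve(Buffer.concat(received).toString()))
    .on('error', reject);
});

done.then((text) => console.log(text)); // 'Oh, my great data!'
```

Collecting the chunks and concatenating them before converting to a string also keeps multi-byte characters intact when a chunk boundary falls in the middle of one.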

IV. Solution

The result is ayqy/string-to-file-stream, which creates file streams out of thin air:

string2fileStream('string-content') === fs.createReadStream(/* path to a text file with content 'string-content' */)

For example:

const assert = require('assert');
const string2fileStream = require('string-to-file-stream');

const input = 'Oh, my great data!';
const s = string2fileStream(input);
s.on('data', (chunk) => {
  assert.equal(chunk.toString(), input);
});

The generated stream can also have file meta information:

const string2fileStream = require('string-to-file-stream');
const FormData = require('form-data');

const formData = new FormData();
formData.append('filetoupload', string2fileStream('my-string-data', { path: 'no-this-file.txt' }));
formData.submit('http://127.0.0.1:8123/fileupload', function(err, res) {
  console.log(res.statusCode);
});

Good enough to pass as real.
