
Node.js Inter-Process Communication

2018-02-16 · #Node #Node inter-process communication #Node process creation #node ipc #node communication between processes

How to fully utilize the advantages of multi-process under multi-core/multi-machine? How to communicate across processes?

1. Scenarios

Node runs JavaScript on a single thread, but that doesn't mean it can't take advantage of multiple processes on multi-core machines, or across machines

In fact, Node had distributed network scenarios in mind from its initial design:

Node is a single-threaded, single-process system which enforces shared-nothing design with OS process boundaries. It has rather good libraries for networking. I believe this to be a basis for designing very large distributed programs. The "nodes" need to be organized: given a communication protocol, told how to connect to each other. In the next couple months we are working on libraries for Node that allow these networks.

P.S. About why Node is called Node, see Why is Node.js named Node.js?

2. Creating Processes

The available communication methods depend on how the process was created, and Node offers 4 ways to create a child process: spawn(), exec(), execFile() and fork()

spawn

const { spawn } = require('child_process');
const child = spawn('pwd');
// Form with parameters
// const child = spawn('find', ['.', '-type', 'f']);

spawn() returns a ChildProcess instance. ChildProcess is also built on the event mechanism (the EventEmitter API), and provides several events:

  • exit: Emitted when the child process exits; reports the exit status (code and signal)

  • disconnect: Emitted when the parent process calls child.disconnect()

  • error: Emitted when the child process could not be spawned or could not be killed

  • close: Emitted after the child process's stdio streams (standard input/output streams) have been closed

  • message: Emitted when the child process sends a message via process.send(); parent and child can communicate through this built-in message mechanism

The child process's stdio streams are accessible through child.stdin, child.stdout and child.stderr; when these streams close, the child process emits the close event

P.S. The difference between close and exit mainly shows up when multiple processes share the same stdio streams: one process exiting does not mean the stdio streams have been closed

On the ChildProcess handle, stdout/stderr are Readable streams while stdin is a Writable stream, the opposite of how the same streams appear inside the process itself:

child.stdout.on('data', (data) => {
  console.log(`child stdout:\n${data}`);
});

child.stderr.on('data', (data) => {
  console.error(`child stderr:\n${data}`);
});

By piping these stdio streams together, more complex tasks can be composed, for example:

const { spawn } = require('child_process');

const find = spawn('find', ['.', '-type', 'f']);
const wc = spawn('wc', ['-l']);

find.stdout.pipe(wc.stdin);

wc.stdout.on('data', (data) => {
  console.log(`Number of files ${data}`);
});

The effect is equivalent to find . -type f | wc -l: recursively count the files under the current directory

IPC Options

Additionally, an IPC channel can be established through spawn()'s stdio option:

const { spawn } = require('child_process');

const child = spawn('node', ['./ipc-child.js'], { stdio: [null, null, null, 'ipc'] });
child.on('message', (m) => {
  console.log(m);
});
child.send('Here Here');

// ./ipc-child.js
process.on('message', (m) => {
  process.send(`< ${m}`);
  process.send('> Do not answer x3');
});

For detailed information about spawn() IPC options, please see options.stdio

exec

By default spawn() does not create a shell to execute the command (which makes it slightly faster), while exec() does create one. In addition, exec() is not stream-based: it buffers the command's output and then passes the whole result to a callback function

The characteristic of exec() is full shell-syntax support; any shell script can be passed in directly, for example:

const { exec } = require('child_process');

exec('find . -type f | wc -l', (err, stdout, stderr) => {
  if (err) {
    console.error(`exec error: ${err}`);
    return;
  }

  console.log(`Number of files ${stdout}`);
});

But this also exposes exec() to command-injection risks; be especially careful when the command contains user input or other dynamic content. So exec() suits scenarios where you want shell syntax directly and the expected output is small (no memory pressure)

So, is there a way to get both shell syntax and the advantages of stream I/O?

Yes. The best-of-both-worlds approach is as follows:

const { spawn } = require('child_process');
const child = spawn('find . -type f | wc -l', {
  shell: true
});
child.stdout.pipe(process.stdout);

Enable spawn()'s shell option and simply connect the child process's standard output to the current process's standard output through pipe() to see the command's results. There is an even simpler way:

const { spawn } = require('child_process');
const child = spawn('find . -type f | wc -l', {
  shell: true,
  stdio: 'inherit'
});

stdio: 'inherit' lets the child process inherit the current process's standard input/output (sharing stdin, stdout and stderr), so in the example above the child's output is written straight to the current process's standard output, with no listeners or piping required

Besides stdio and shell, spawn() also supports several other options, such as:

const child = spawn('find . -type f | wc -l', {
  stdio: 'inherit',
  shell: true,
  // Modify environment variables, default process.env
  env: { HOME: '/tmp/xxx' },
  // Change current working directory
  cwd: '/tmp',
  // Exist as independent process
  detached: true
});

Note that besides passing data to the child in the form of environment variables, the env option can also implement sandbox-style isolation. By default the child takes process.env as its environment and can access all the same environment variables as the current process; if a custom object is specified as in the example above, the child cannot access any other environment variables

So, to add or remove environment variables, clone process.env first:

const { spawn } = require('child_process');

// shallow-clone the current environment
const spawn_env = { ...process.env };

// remove those env vars
delete spawn_env.ATOM_SHELL_INTERNAL_RUN_AS_NODE;
delete spawn_env.ELECTRON_RUN_AS_NODE;

// command and cwd are defined by the caller
const sp = spawn(command, ['.'], { cwd: cwd, env: spawn_env });

The detached option is more interesting:

const { spawn } = require('child_process');

const child = spawn('node', ['stuff.js'], {
  detached: true,
  stdio: 'ignore'
});

child.unref();

The behavior of a detached process depends on the operatingating system: on Windows the detached child gets its own console window, while on Linux the child becomes the leader of a new process group (a characteristic that can be used to manage whole families of child processes, implementing something like tree-kill)

The unref() method severs the relationship, so the "parent" process can exit independently (without causing the child to exit with it). Note that the child's stdio must also be independent of the "parent" at that point; otherwise the child is still affected after the "parent" exits

execFile

const { execFile } = require('child_process');
const child = execFile('node', ['--version'], (error, stdout, stderr) => {
  if (error) {
    throw error;
  }
  console.log(stdout);
});

Similar to exec(), but without going through a shell (so slightly better performance-wise), which means it requires an executable file. On Windows some files cannot be executed directly, such as .bat and .cmd; those cannot run with execFile() and require exec() or spawn() with the shell option enabled

P.S. Like exec(), it is also not stream-based and carries the same risk with large output volumes

xxxSync

spawn, exec and execFile each have a corresponding synchronous blocking version that waits until the child process exits:

const { 
  spawnSync, 
  execSync, 
  execFileSync,
} = require('child_process');

The synchronous methods exist to simplify scripting tasks, such as startup scripts; avoid them in other situations

fork

fork() is a variant of spawn() for creating Node processes; its biggest characteristic is that parent and child come with a built-in communication mechanism (an IPC channel):

The child_process.fork() method is a special case of child_process.spawn() used specifically to spawn new Node.js processes. Like child_process.spawn(), a ChildProcess object is returned. The returned ChildProcess will have an additional communication channel built-in that allows messages to be passed back and forth between the parent and child. See subprocess.send() for details.

For example:

const child_process = require('child_process');

const n = child_process.fork('./child.js');
n.on('message', function(m) {
  console.log('PARENT got message:', m);
});
n.send({ hello: 'world' });

// ./child.js
process.on('message', function(m) {
  console.log('CHILD got message:', m);
});
process.send({ foo: 'bar' });

Because fork() comes with a communication channel built in, it is especially suitable for offloading time-consuming logic, for example:

const http = require('http');
const longComputation = () => {
  let sum = 0;
  for (let i = 0; i < 1e9; i++) {
    sum += i;
  };
  return sum;
};
const server = http.createServer();
server.on('request', (req, res) => {
  if (req.url === '/compute') {
    const sum = longComputation();
    return res.end(`Sum is ${sum}`);
  } else {
    res.end('Ok')
  }
});

server.listen(3000);

The fatal problem with this approach is that once someone visits /compute, subsequent requests cannot be handled in time, because the event loop is still blocked by longComputation; the service only recovers once the computation finishes

To avoid blocking the main process's event loop, longComputation() can be split out into a child process:

// compute.js
const longComputation = () => {
  let sum = 0;
  for (let i = 0; i < 1e9; i++) {
    sum += i;
  };
  return sum;
};

// Switch, starts working after receiving message
process.on('message', (msg) => {
  const sum = longComputation();
  process.send(sum);
});

The main process forks a child process to run longComputation:

const http = require('http');
const { fork } = require('child_process');

const server = http.createServer();

server.on('request', (req, res) => {
  if (req.url === '/compute') {
    const compute = fork('compute.js');
    compute.send('start');
    compute.on('message', sum => {
      res.end(`Sum is ${sum}`);
    });
  } else {
    res.end('Ok')
  }
});

server.listen(3000);

The main process's event loop is no longer blocked by the computation, but the number of processes still needs to be limited; otherwise service capacity will suffer again once processes exhaust system resources

P.S. In fact, the cluster module is an encapsulation of this multi-process serving capability, along the same lines as this simple example

3. Communication Methods

1. Pass json through stdin/stdout

stdin/stdout and a JSON payload

This is the most direct communication method: with a handle to the child process, access its stdio streams, agree on a message format, and start communicating happily:

const { spawn } = require('child_process');

const child = spawn('node', ['./stdio-child.js']);
child.stdout.setEncoding('utf8');
// Parent process-send
child.stdin.write(JSON.stringify({
  type: 'handshake',
  payload: 'Hi there'
}));
// Parent process-receive
child.stdout.on('data', function (chunk) {
  let data = chunk.toString();
  let message = JSON.parse(data);
  console.log(`${message.type} ${message.payload}`);
});

The child process is similar:

// ./stdio-child.js
// Child process-receive
process.stdin.on('data', (chunk) => {
  let data = chunk.toString();
  let message = JSON.parse(data);
  switch (message.type) {
    case 'handshake':
      // Child process-send
      process.stdout.write(JSON.stringify({
        type: 'message',
        payload: message.payload + ' : hoho'
      }));
      break;
    default:
      break;
  }
});

P.S. VS Code inter-process communication adopts this method, specifically see access electron API from vscode extension

The obvious limitation is the need to hold the "child" process's handle; two completely independent processes cannot communicate this way (for example across applications, let alone across machines)

P.S. For detailed information about stream and pipe, please see Streams in Node

2. Native IPC Support

As shown in the spawn() and fork() examples, processes can communicate over the built-in IPC channel

Parent process:

  • child.on('message') receive

  • child.send() send

Child process:

  • process.on('message') receive

  • process.send() send

The limitations are the same as above: one party still needs a handle to the other

3. sockets

Use the network to complete inter-process communication; this works not only across processes but also across machines

node-ipc adopts this approach, for example:

// server
const ipc = require('node-ipc');

ipc.config.id = 'world';
ipc.config.retry = 1500;
ipc.config.maxConnections = 1;

ipc.serveNet(
    function(){
        ipc.server.on(
            'message',
            function(data,socket){
                ipc.log('got a message : ', data);
                ipc.server.emit(
                    socket,
                    'message',
                    data+' world!'
                );
            }
        );

        ipc.server.on(
            'socket.disconnected',
            function(data,socket){
                console.log('DISCONNECTED\n\n',arguments);
            }
        );
    }
);
ipc.server.on(
    'error',
    function(err){
        ipc.log('Got an ERROR!',err);
    }
);
ipc.server.start();

// client
const ipc = require('node-ipc');

ipc.config.id = 'hello';
ipc.config.retry = 1500;

ipc.connectToNet(
    'world',
    function(){
        ipc.of.world.on(
            'connect',
            function(){
                ipc.log('## connected to world ##', ipc.config.delay);
                ipc.of.world.emit(
                    'message',
                    'hello'
                );
            }
        );
        ipc.of.world.on(
            'disconnect',
            function(){
                ipc.log('disconnected from world');
            }
        );
        ipc.of.world.on(
            'message',
            function(data){
                ipc.log('got a message from world : ', data);
            }
        );
    }
);

P.S. More examples see RIAEvangelist/node-ipc

Of course, on a single machine going through the network for inter-process communication wastes some performance, but network communication has the advantages of cross-environment compatibility and a natural path to full RPC scenarios

4. message queue

Parent and child both communicate through an external messaging mechanism; the cross-process (and cross-machine) capability depends on what the MQ supports

That is, the processes don't talk to each other directly but go through an intermediate layer (the MQ); this extra layer of control brings more flexibility and advantages:

  • Stability: The messaging mechanism provides strong stability guarantees, such as confirmed delivery (message acknowledgment, ACK), failure retry and duplicate-send prevention

  • Priority control: Allows adjusting the order in which messages are handled

  • Offline capability: Messages can be cached and delivered later

  • Transactional message processing: Related messages can be combined into transactions, guaranteeing their delivery order and completeness

P.S. Sounds hard to implement? There is nothing that adding a layer of indirection can't solve; and if one layer isn't enough, add two...

A popular option is smrchy/rsmq, for example:

// init
const RedisSMQ = require("rsmq");
const rsmq = new RedisSMQ({ host: "127.0.0.1", port: 6379, ns: "rsmq" });
// create queue
rsmq.createQueue({ qname: "myqueue" }, function (err, resp) {
  if (resp === 1) {
    console.log("queue created");
  }
});
// send message
rsmq.sendMessage({ qname: "myqueue", message: "Hello World" }, function (err, resp) {
  if (resp) {
    console.log("Message sent. ID:", resp);
  }
});
// receive message
rsmq.receiveMessage({ qname: "myqueue" }, function (err, resp) {
  if (resp.id) {
    console.log("Message received.", resp);
  } else {
    console.log("No messages for me...");
  }
});

It relies on a running Redis server; the basic principle is as follows:

Using a shared Redis server multiple Node.js processes can send / receive messages.

Receiving, sending, caching and persisting messages all rely on capabilities Redis provides, with a complete queue mechanism implemented on top

5. Redis

Basic approach similar to message queue:

Use Redis as a message bus/broker.

Redis comes with a Pub/Sub mechanism (the publish-subscribe pattern), suitable for simple communication scenarios, such as one-to-one or one-to-many where message reliability is not a concern

Redis also has a list structure that can be used as a message queue to improve reliability. The usual approach is for producers to LPUSH messages and consumers to BRPOP them. This suits simple scenarios that require reliable messages, but the drawbacks are that messages carry no state and there is no ACK mechanism, so complex communication requirements cannot be satisfied

P.S. Redis Pub/Sub example see What's the most efficient node.js inter-process communication library/method?

4. Summary

Node inter-process communication boils down to 4 approaches:

  • Passing JSON over stdin/stdout: the most direct method; requires a handle to the "child" process, suits communication between related processes, and cannot cross machines

  • Node native IPC support: the most "native" (authentic?) method, slightly more formal than the previous one, with the same limitations

  • Sockets: the most universal method, with good cross-environment capability at the cost of some network overhead

  • Message queue: the most powerful method; when the communicating parties and scenarios are complex, a message-middleware layer solves all kinds of communication problems elegantly (Redis Pub/Sub and lists belong to this same family)

