
Illustrated Node Module Loading Principles

2020-05-11 | Tags: Node, nodejs native module self-register, how nodejs native module works, nodejs js2c, Node startup process, Node core modules, Node.js source code analysis

How exactly does require load each of the 4 module types it supports?

I. Module Types

Node.js supports 2 categories of modules by default (4 types in total, since file modules come in 3 kinds):

  • Core Modules: Compiled into binary, source code located in lib/ directory

  • File Modules: Including JavaScript files (.js), JSON files (.json), C++ extension files (.node)

Going from simplest to most involved, let's start with the JS modules we deal with most often.

II. JS Modules

[Figure: JS module loading flow]

Note one detail: the module instance is cached before the module file is loaded and executed, not after. This is the fundamental reason Node.js can handle circular dependencies gracefully:

When there are circular require() calls, a module might not have finished executing when it is returned.

If a circular reference during loading causes a module that has not finished executing to be required again, the loading process illustrated above hits the cache instead of recursing forever. The trade-off is that module.exports may be incomplete at that point: the module's code has not finished running, so some properties have not been attached yet.

P.S. For how to find the absolute path of the corresponding module (entry) file based on module identifier, same-name module loading priority, and related Node.js source code interpretation, see [Node Module Loading Mechanism](/articles/node 模块加载机制/)

III. JSON Modules

Similar to JS modules, JSON files can also be loaded directly as modules through require. The specific process is as follows:

[Figure: JSON module loading flow]

Apart from how the file is loaded and executed (its content is parsed with JSON.parse rather than compiled and run as code), the process is identical to that of JS modules.

IV. C++ Extension Modules

Compared to JS and JSON modules, the loading process of C++ extension modules (.node) is more closely related to the C++ layer:

[Figure: addon module loading flow]

JS layer processing stops at process.dlopen(). Actual loading, execution, and how the properties/methods exposed by extension modules are passed into the JS runtime are all completed by the C++ layer:

[Figure: addon module loading flow in the C++ layer]

The key step is loading the C++ dynamic link library (i.e., the .node file) through dlopen()/uv_dlopen. The source excerpts in this section are from Node v14.0.0.

The module instance of an extension module can be obtained from outside because extension modules register themselves:

// Called when a module registers itself
extern "C" void node_module_register(void* m) {
  struct node_module* mp = reinterpret_cast<struct node_module*>(m);

  if (mp->nm_flags & NM_F_INTERNAL) {
    mp->nm_link = modlist_internal;
    modlist_internal = mp;
  } else if (!node_is_initialized) {
    // "Linked" modules are included as part of the node project.
    // Like builtins they are registered *before* node::Init runs.
    mp->nm_flags = NM_F_LINKED;
    mp->nm_link = modlist_linked;
    modlist_linked = mp;
  } else {
    // Stash the module instance in a thread-local variable so that DLOpen can pick it up
    thread_local_modpending = mp;
  }
}

// Called when an extension module is loaded
void DLOpen(const FunctionCallbackInfo<Value>& args) {
  /* ...some non-critical code omitted */
  const bool is_opened = dlib->Open();
  // Opening the dynamic library runs the module's self-registration, so the
  // module instance can now be read back from the thread-local variable
  node_module* mp = thread_local_modpending;
  thread_local_modpending = nullptr;
  // Finally, pass exports and module to the module's entry function, which
  // attaches the properties/methods the module exposes
  if (mp->nm_context_register_func != nullptr) {
    mp->nm_context_register_func(exports, module, context, mp->nm_priv);
  } else if (mp->nm_register_func != nullptr) {
    mp->nm_register_func(exports, module, mp->nm_priv);
  }
}

P.S. For detailed information about C++ extension module development, compilation, and running, see [Node.js C++ Extension Beginner's Guide](/articles/node-js-c 扩展入门指南/)

V. Core Modules

Similar to C++ extension modules, most core modules are implemented on top of corresponding lower-level C++ modules (file I/O, network requests, encryption/decryption, etc.), with a JS layer wrapped around them to expose the user-facing interfaces (such as fs.writeFile and fs.writeFileSync).

Essentially, both are C++ libraries. The main difference is that core modules are compiled into the Node.js executable itself (including the JS wrapper code, which is linked in at build time), while extension modules have to be loaded dynamically at runtime.

P.S. For more information about C++ dynamic link libraries and static libraries, see [Node.js C++ Extension Beginner's Guide](/articles/node-js-c 扩展入门指南/#articleHeader1)

Therefore, compared to the previous types of modules, the core module loading process is slightly more complex, divided into 4 parts:

  • (Pre-compilation phase) "Compile" JS code

  • (At startup) Load JS code

  • (At startup) Register C++ modules

  • (At runtime) Load core modules (including JS code and referenced C++ modules)

[Figure: core module loading flow]

Of these, the more interesting parts are the JS2C transformation and the registration of core C++ modules.

JS2C Transformation

In a pre-processing step before compilation, the JS code of the core modules is converted into a C++ file (located at ./out/Release/obj/gen/node_javascript.cc), which is then compiled into the executable:

NativeModule: a minimal module system used to load the JavaScript core modules found in lib/**/*.js and deps/**/*.js. All core modules are compiled into the node binary via node_javascript.cc generated by js2c.py, so they can be loaded faster without the cost of I/O. This class makes the lib/internal/*, deps/internal/* modules and internalBinding() available by default to core modules, and lets the core modules require itself via require('internal/bootstrap/loaders') even when this file is not written in CommonJS style.

(Excerpted from node/lib/internal/bootstrap/loaders.js)

The main content of the generated node_javascript.cc is as follows:

static const uint8_t internal_bootstrap_environment_raw[] = {
  39,117,115,101, 32,115,116,114,105, 99,116, 39, 59, 10, 10, 47, 47, 32, 84,104,105,115, 32,114,117,110,115, 32,110,101,
  99,101,115,115, 97,114,121, 32,112,114,101,112, 97,114, 97,116,105,111,110,115, 32,116,111, 32,112,114,101,112, 97,114
  // ...
};

void NativeModuleLoader::LoadJavaScriptSource() {
  source_.emplace("internal/bootstrap/environment", UnionBytes{internal_bootstrap_environment_raw, 374});
  source_.emplace("internal/bootstrap/loaders", UnionBytes{internal_bootstrap_loaders_raw, 10110});
  // ...
}

UnionBytes NativeModuleLoader::GetConfig() {
  return UnionBytes(config_raw, 3030);  // config.gypi
}

In other words, LoadJavaScriptSource cannot be found by searching the source tree because it is generated automatically during the pre-compilation phase:

// ref https://github.com/nodejs/node/blob/v14.0.0/src/node_native_module.cc#L24
NativeModuleLoader::NativeModuleLoader() : config_(GetConfig()) {
  // This function is not implemented in the source tree; it lives in the generated node_javascript.cc
  LoadJavaScriptSource();
}
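The byte arrays in the generated file are simply the UTF-8 bytes of each JS source file. The sketch below shows the idea in JavaScript; toByteArrayLiteral is a hypothetical helper written for this illustration and is not the actual js2c.py logic.

```javascript
// Hypothetical helper: turn JS source text into a C++ byte-array literal.
function toByteArrayLiteral(name, source) {
  const bytes = Array.from(Buffer.from(source, 'utf8'));
  const cpp =
    `static const uint8_t ${name}_raw[] = {\n  ${bytes.join(',')}\n};`;
  return { bytes, cpp };
}

const generated = toByteArrayLiteral('demo', "'use strict';\n");
console.log(generated.cpp);

// Decoding the bytes gives back the original source text.
const restored = Buffer.from(generated.bytes).toString('utf8');
console.log(restored === "'use strict';\n");  // true
```

The 14 bytes produced here (39,117,115,101,32,115,116,114,105,99,116,39,59,10) match the start of the internal_bootstrap_environment_raw excerpt above, which begins with the same 'use strict'; line.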

Core C++ Module Registration

All C++ code that core modules depend on has a line of registration code at the end, for example:

// src/node_file.cc
NODE_MODULE_CONTEXT_AWARE_INTERNAL(fs, node::fs::Initialize)
// src/timers.cc
NODE_MODULE_CONTEXT_AWARE_INTERNAL(timers, node::Initialize)
// src/js_stream.cc
NODE_MODULE_CONTEXT_AWARE_INTERNAL(js_stream, node::JSStream::Initialize)

The NODE_MODULE_CONTEXT_AWARE_INTERNAL macro expands into a call to node_module_register, which records each registered C++ module in the modlist_internal linked list:

extern "C" void node_module_register(void* m) {
  struct node_module* mp = reinterpret_cast<struct node_module*>(m);

  if (mp->nm_flags & NM_F_INTERNAL) {
    // Record internal C++ modules
    mp->nm_link = modlist_internal;
    modlist_internal = mp;
  } else if (!node_is_initialized) {
    // "Linked" modules are included as part of the node project.
    // Like builtins they are registered *before* node::Init runs.
    mp->nm_flags = NM_F_LINKED;
    mp->nm_link = modlist_linked;
    modlist_linked = mp;
  } else {
    thread_local_modpending = mp;
  }
}

At runtime, these built-in C++ modules are loaded through internalBinding (see the Node v14.0.0 source).
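Registration followed by lookup by name can be sketched as a linked list plus a cache. This is a simulation only: registerInternal, findInternal, simulatedInternalBinding and the error message are hypothetical and do not mirror the real internalBinding implementation.

```javascript
// Head of the list; stands in for modlist_internal.
let modlistInternal = null;

// Prepend to the list, like the NM_F_INTERNAL branch of node_module_register.
function registerInternal(name, initialize) {
  modlistInternal = { name, initialize, link: modlistInternal };
}

// Walk the list looking for a module with the given name.
function findInternal(name) {
  for (let mp = modlistInternal; mp !== null; mp = mp.link) {
    if (mp.name === name) return mp;
  }
  return null;
}

const bindingCache = new Map();

// Look the module up by name, initialize it once, then serve it from cache.
function simulatedInternalBinding(name) {
  if (bindingCache.has(name)) return bindingCache.get(name);
  const mp = findInternal(name);
  if (mp === null) throw new Error(`No such binding: ${name}`);
  const exports = {};
  mp.initialize(exports);
  bindingCache.set(name, exports);
  return exports;
}

// Registration happens up front, before anything asks for a binding.
registerInternal('fs', (exports) => { exports.kind = 'fs binding'; });
registerInternal('timers', (exports) => { exports.kind = 'timers binding'; });

console.log(simulatedInternalBinding('fs').kind);  // fs binding
```

Registration only records the module. Its initialize function does not run until the first lookup, so modules that are never requested cost nothing beyond a list node.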
