I. Background
We often face scenarios where we want to split a large codebase (repo) into multiple smaller repos, for example:
- The existing codebase is voluminous and module management is chaotic, making it easy to accidentally modify others' code
- A certain module needs to be built separately, such as the React pilot in a jQuery project, the pure frontend part in a Node project, the UI part in an Electron project, etc.
- A certain module is a black-box dependency, where only the built version is needed during development, such as framework libraries, etc.
For such situations, there are generally 3 solutions:
- npm package: Extract the dependency as an npm package, with the codebase becoming independent
- monorepo: No problem if a single repo is voluminous, just manage modules well
- git submodules: Split dependencies into multiple independent repos as submodules of the main repo
npm package
The advantage of npm package lies in its mature dependency management mechanism, which is standardized and easy to use. The disadvantage is that the main project can only get updates of independent modules through package version numbers, which becomes very troublesome in scenarios where the main project needs to coordinate debugging with submodules:
Main project: Can't get it to work
Submodule: There's an issue, let me fix it... change version number - build - publish npm package
Main project: Update dependency, try again... still doesn't work
Submodule: Still some issues...
Publishing a release for every small fix is tedious; you can modify and build locally and then copy the result over, but that is still troublesome. Of course, you can usually decouple debugging dependencies by mocking interfaces or data, but sometimes mocking the full set of APIs is costly, and fakes certainly aren't as good as the real thing.
monorepo
Monorepo advocates not splitting repos, but rather managing the build processes, version numbers, etc. of various modules uniformly within a single repo, and encourages modifying others' code.
This is suitable for projects with clear module boundaries and clear ownership (such as React, Babel, etc.), but in practical applications, business repos rarely maintain clear module boundaries and dependency relationships, at which point monorepo becomes idealistic.
git submodules
Git submodules provides a dependency management mechanism similar to npm package, including functions for adding, removing, and updating dependencies. The difference is that the former manages the source code of submodules, while the latter manages the build artifacts of submodules. In this regard, git submodules is consistent with monorepo (both care about the source code of submodules).
This eliminates the trouble when the main project needs to frequently coordinate debugging with submodules, because the submodules pulled by the main project are complete repos that can be directly modified - built - committed.
II. Submodules vs Monorepo
From a structural perspective, the main repo of a submodules project is very similar to a monorepo. It is equivalent to extracting each module of the monorepo into an independent repo, with the main repo recording only the version (in commit hash form) of each module it depends on.
Specifically, monorepo stores all submodule source code in a single repo (packages/xxx/src), for example:
react/
  packages/
    react-dom/
      src/
    react-reconciler/
      src/
    ...
While submodules only stores "indexes" of all submodules in the main repo (repo url + branch name + commit hash), for example:
# .gitmodules file of main repo
[submodule "react-dom"]
  path = packages/react-dom
  url = https://github.com/path-to/react-dom.git
  branch = master
[submodule "react-reconciler"]
  path = packages/react-reconciler
  url = https://github.com/path-to/react-reconciler.git
  branch = master
...
The main repo only keeps corresponding empty directories as "slots" for submodules, without storing their source code:
react/
  packages/
    react-dom/          # empty directory
    react-reconciler/   # empty directory
After pulling all submodule dependencies, the actual directory structure is as follows:
react/
  packages/
    react-dom/
      src/
    react-reconciler/
      src/
    ...
The main repo does not track submodule source code, only records their version numbers (in commit hash form):
# Output is empty, indicating submodule src is not tracked
$ git ls-tree -r master | grep packages/react-dom/src
# View submodule slots tracked by git in main repo
$ git ls-tree -r master | grep ' commit'
160000 commit 3edf340cee50fd4bc918a0a95b438a30447ae042 packages/react-dom
160000 commit 373f207b09a7bf900fa82c3188aeefdc9ce6146c packages/react-reconciler
...
P.S. For the meaning of git ls-tree output format, see Output Format
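To see the slot mechanism on a throwaway example, here is a minimal sketch that builds two local repos in a temp directory. The repo contents, the `g` wrapper, and the temp paths are all hypothetical stand-ins, and `protocol.file.allow=always` is only there because recent Git versions refuse to clone submodules from local paths by default.

```shell
set -eu
work=$(mktemp -d)
# Wrapper (hypothetical): throwaway identity, fixed default branch, local-path submodules allowed
g() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c init.defaultBranch=master -c protocol.file.allow=always "$@"
}

# Stand-in for the react-dom repo
g init -q "$work/react-dom"
echo 'v1' > "$work/react-dom/index.js"
g -C "$work/react-dom" add index.js
g -C "$work/react-dom" commit -q -m 'feat: v1'

# Main repo that references it as a submodule
g init -q "$work/main"
g -C "$work/main" commit -q --allow-empty -m 'chore: init'
g -C "$work/main" submodule --quiet add -b master "$work/react-dom" packages/react-dom
g -C "$work/main" commit -q -m 'build: add react-dom submodule'

# The slot is one tree entry with mode 160000, not a directory of tracked files
g -C "$work/main" ls-tree -r HEAD
slot_mode=$(g -C "$work/main" ls-tree -r HEAD | grep 'packages/react-dom$' | cut -d' ' -f1)

# The hash stored in that entry is the submodule commit the main repo depends on
recorded=$(g -C "$work/main" rev-parse HEAD:packages/react-dom)
upstream=$(g -C "$work/react-dom" rev-parse HEAD)
echo "mode=$slot_mode recorded=$recorded upstream=$upstream"
```

The recorded hash and the submodule repo's own HEAD should match right after the add, which is the "version number" relationship described above.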
III. Specific Usage
The git submodule command is used to manage submodules:
$ git submodule --help
git-submodule - Initialize, update or inspect submodules
# Initialize
git submodule init
# Add
git submodule add
# Remove
git submodule deinit
# Modify (version control)
git submodule update
Adding Submodule Dependencies
$ cd ./react
# Add dependency
$ git submodule add -b master https://github.com/path-to/react-dom.git src/packages/react-dom
This registers src/packages/react-dom in the main repo as the slot for the submodule. The add process mainly involves 3 things:
- Clone the submodule repo into the main repo's git cache directory, such as .git/modules/src/packages/react-dom
- Create the slot directory, check the submodule out into it, and associate the slot with the latest commit hash of the submodule repo's branch
- Create a .gitmodules file in the main repo root directory as needed, recording the submodule repo address (url), branch name (branch), and slot path (path)
Then commit these submodule configurations:
$ git add ./src/packages/react-dom ./.gitmodules
$ git commit -m "build: add react-dom submodule"
$ git push origin master
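The effects of add can be checked on a local sandbox. The following is a sketch under assumptions: the upstream repo is a local stand-in rather than a GitHub URL, the `g` wrapper and variable names are made up for the demo, and `protocol.file.allow=always` is needed only because the stand-in is a local path.

```shell
set -eu
work=$(mktemp -d)
# Wrapper (hypothetical): throwaway identity, fixed default branch, local-path submodules allowed
g() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c init.defaultBranch=master -c protocol.file.allow=always "$@"
}

# Stand-in for the react-dom repo
g init -q "$work/react-dom"
echo 'v1' > "$work/react-dom/index.js"
g -C "$work/react-dom" add index.js
g -C "$work/react-dom" commit -q -m 'feat: v1'

g init -q "$work/main"
g -C "$work/main" commit -q --allow-empty -m 'chore: init'

sub=src/packages/react-dom
g -C "$work/main" submodule --quiet add -b master "$work/react-dom" "$sub"

# 1. The submodule's git data sits in the main repo's cache directory
ls -d "$work/main/.git/modules/$sub"
# 2. The slot contains a .git file pointing back at that cache
cat "$work/main/$sub/.git"
# 3. .gitmodules records url, branch and path
url=$(g -C "$work/main" config -f .gitmodules --get "submodule.$sub.url")
branch=$(g -C "$work/main" config -f .gitmodules --get "submodule.$sub.branch")
slot_path=$(g -C "$work/main" config -f .gitmodules --get "submodule.$sub.path")
echo "url=$url branch=$branch path=$slot_path"

# add has already staged both the slot and .gitmodules
staged=$(g -C "$work/main" diff --cached --name-only)
echo "$staged"
```

Since add stages both entries itself, the explicit `git add` before the commit is harmless but not strictly required.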
Next, wherever the submodule has not been pulled yet (for example, in another clone of the main repo), pull it locally to complete initialization:
# Initialize submodules
$ git submodule update --init
This clones the submodule repo to the src/packages/react-dom directory. Actually, 2 things happen:
- Check whether the cache already holds a clone of the submodule repo and clone it if not (for example, a freshly cloned main repo never ran add, so it has no cache)
- Record the submodule's repo address (url) in the main repo's .git/config
Initializing Submodules
After cloning a repo containing submodules, initialization is required:
# Create some local configurations
$ git submodule init
# Pull each submodule repo
$ git submodule update --init
You can also complete the above two steps with the --recursive option when cloning the main repo:
$ git clone --recursive https://github.com/path-to/main-repo.git
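The difference between a plain clone and a recursive clone can be observed with local stand-in repos. This is only a sketch: the repos, the `g` wrapper, and the directory names `plain` and `recursive` are assumptions for the demo, not part of any real project.

```shell
set -eu
work=$(mktemp -d)
# Wrapper (hypothetical): throwaway identity, fixed default branch, local-path submodules allowed
g() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c init.defaultBranch=master -c protocol.file.allow=always "$@"
}

# Stand-in submodule repo and a main repo that depends on it
g init -q "$work/react-dom"
echo 'v1' > "$work/react-dom/index.js"
g -C "$work/react-dom" add index.js
g -C "$work/react-dom" commit -q -m 'feat: v1'
g init -q "$work/main"
g -C "$work/main" commit -q --allow-empty -m 'chore: init'
g -C "$work/main" submodule --quiet add -b master "$work/react-dom" src/packages/react-dom
g -C "$work/main" commit -q -m 'build: add react-dom submodule'

# Plain clone: the slot directory exists but is empty
g clone -q "$work/main" "$work/plain"
before=$(ls -A "$work/plain/src/packages/react-dom")
echo "slot before init: [$before]"

# Two-step initialization fills the slot
g -C "$work/plain" submodule --quiet init
g -C "$work/plain" submodule --quiet update
ls -A "$work/plain/src/packages/react-dom"

# Recursive clone: the slot is filled right away
g clone -q --recursive "$work/main" "$work/recursive"
ls -A "$work/recursive/src/packages/react-dom"
```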
Pulling Submodule Updates
Update all submodules:
$ git submodule update --remote
This pulls the latest code of the submodule's corresponding branch. If there are updates, the git status of the slot directory will change:
$ git status
modified: src/packages/react-dom (new commits)
Actually, the commit hash has changed:
$ git diff
diff --git a/src/packages/react-dom b/src/packages/react-dom
index 3edf340cee..d056efbc62 160000
--- a/src/packages/react-dom
+++ b/src/packages/react-dom
@@ -1 +1 @@
-Subproject commit 3edf340cee50fd4bc918a0a95b438a30447ae042
+Subproject commit d056efbc62cbf976b4ef83e70d7019fba4506e85
P.S. The commit hash in submodules is equivalent to the version number in npm package's dependencies
Controlling Dependency Versions
To update the submodule version that the main repo depends on, commit this commit hash change:
$ git add src/packages/react-dom
$ git commit -m "build: update react-dom submodule"
$ git push origin master
Otherwise, run update without the --remote option to roll the submodule back to the version currently recorded by the main repo:
$ git submodule update
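The pull-then-decide flow can be replayed in a sandbox. The sketch below assumes local stand-in repos and a hypothetical `g` wrapper; it pulls a newer submodule commit with --remote, shows that the recorded hash and the checked-out hash now differ, and then rolls back instead of committing.

```shell
set -eu
work=$(mktemp -d)
# Wrapper (hypothetical): throwaway identity, fixed default branch, local-path submodules allowed
g() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c init.defaultBranch=master -c protocol.file.allow=always "$@"
}

g init -q "$work/react-dom"
echo 'v1' > "$work/react-dom/index.js"
g -C "$work/react-dom" add index.js
g -C "$work/react-dom" commit -q -m 'feat: v1'
g init -q "$work/main"
g -C "$work/main" commit -q --allow-empty -m 'chore: init'
g -C "$work/main" submodule --quiet add -b master "$work/react-dom" src/packages/react-dom
g -C "$work/main" commit -q -m 'build: add react-dom submodule'

# The submodule repo moves on
echo 'v2' > "$work/react-dom/index.js"
g -C "$work/react-dom" commit -q -am 'feat: v2'

# Pull the latest commit of the tracked branch into the slot
g -C "$work/main" submodule --quiet update --remote
recorded=$(g -C "$work/main" rev-parse HEAD:src/packages/react-dom)
pulled=$(g -C "$work/main/src/packages/react-dom" rev-parse HEAD)
state=$(g -C "$work/main" status --short)
echo "recorded: $recorded"
echo "pulled:   $pulled"
echo "status:   $state"

# Decide not to upgrade: check out the recorded commit again
g -C "$work/main" submodule --quiet update
rolled_back=$(g -C "$work/main/src/packages/react-dom" rev-parse HEAD)
echo "after rollback: $rolled_back"
```

Committing the slot instead of running the last update would make the pulled hash the new recorded version.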
Modifying Submodule Code
Submodules are independent repos, so operate normally:
$ cd ./src/packages/react-dom
# Remember to switch to a branch first; the submodule is usually in a detached HEAD state
$ git checkout master
$ git add .
$ git commit -m 'feat: xxx'
$ git push origin master
Afterwards, the main repo can pull the latest version through git submodule update --remote, and then the main repo decides whether to upgrade its dependent submodule version.
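The whole round trip (edit inside the submodule, push, then upgrade the main repo) can be sketched with a bare local repo standing in for the submodule's remote. All repo names, the `g` wrapper, and the `dev` working copy are assumptions for the demo.

```shell
set -eu
work=$(mktemp -d)
# Wrapper (hypothetical): throwaway identity, fixed default branch, local-path submodules allowed
g() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c init.defaultBranch=master -c protocol.file.allow=always "$@"
}

# Bare stand-in for the submodule's remote, seeded with one commit
g init -q "$work/seed"
echo 'v1' > "$work/seed/index.js"
g -C "$work/seed" add index.js
g -C "$work/seed" commit -q -m 'feat: v1'
g clone -q --bare "$work/seed" "$work/react-dom.git"

g init -q "$work/main"
g -C "$work/main" commit -q --allow-empty -m 'chore: init'
g -C "$work/main" submodule --quiet add -b master "$work/react-dom.git" src/packages/react-dom
g -C "$work/main" commit -q -m 'build: add react-dom submodule'

# A developer's working copy of the main repo
g clone -q --recursive "$work/main" "$work/dev"
sub="$work/dev/src/packages/react-dom"

# Work inside the submodule like in any other repo
g -C "$sub" checkout -q master
echo 'v2' > "$sub/index.js"
g -C "$sub" commit -q -am 'feat: xxx'
g -C "$sub" push -q origin master
new_commit=$(g -C "$sub" rev-parse HEAD)
remote_tip=$(g -C "$work/react-dom.git" rev-parse master)

# The main repo then decides to depend on the new version
g -C "$work/dev" add src/packages/react-dom
g -C "$work/dev" commit -q -m 'build: update react-dom submodule'
recorded=$(g -C "$work/dev" rev-parse HEAD:src/packages/react-dom)
echo "new=$new_commit remote=$remote_tip recorded=$recorded"
```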
Executing the Same Git Command on Each Submodule
The foreach command is provided for batch processing submodules, for example:
# Enter each submodule directory and execute git stash
$ git submodule foreach 'git stash'
# Uniformly create feature branches
$ git submodule foreach 'git checkout -b featureA'
This command is quite useful when there are multiple submodule dependencies.
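Inside a foreach command, git exposes a few variables such as $name (the submodule name) and $sha1 (the commit recorded by the main repo). A small sketch with two local stand-in submodules follows; the repos and the `g` wrapper are hypothetical.

```shell
set -eu
work=$(mktemp -d)
# Wrapper (hypothetical): throwaway identity, fixed default branch, local-path submodules allowed
g() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c init.defaultBranch=master -c protocol.file.allow=always "$@"
}

g init -q "$work/main"
g -C "$work/main" commit -q --allow-empty -m 'chore: init'

# Two stand-in submodule repos
for repo in react-dom react-reconciler; do
  g init -q "$work/$repo"
  echo "$repo" > "$work/$repo/index.js"
  g -C "$work/$repo" add index.js
  g -C "$work/$repo" commit -q -m "feat: init $repo"
  g -C "$work/main" submodule --quiet add -b master "$work/$repo" "src/packages/$repo"
done
g -C "$work/main" commit -q -m 'build: add submodules'

# List each submodule with the commit the main repo records for it
# (--quiet suppresses the "Entering ..." lines)
listing=$(g -C "$work/main" submodule --quiet foreach 'echo "$name $sha1"')
echo "$listing"

# Uniformly create feature branches, then confirm where each submodule is
g -C "$work/main" submodule --quiet foreach 'git checkout -q -b featureA'
branches=$(g -C "$work/main" submodule --quiet foreach 'git rev-parse --abbrev-ref HEAD')
echo "$branches"
```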
P.S. For more submodule usage, see 7.11 Git Tools - Submodules
IV. Common Issues
Submodule Branch in Detached State
After each execution of git submodule update --remote, the submodule will be in a detached state, for example:
$ cd ./src/packages/react-dom
$ git branch
* (HEAD detached at ac4d1fc)
master
This is by design; there's no good solution:
It's also important to realize that a submodule reference within the host repository is not a reference to a specific branch of that submodule's project, it points directly to a specific commit (or SHA1 reference), it is not a symbolic reference such as a branch or tag. In technical terms, it's a detached HEAD pointing directly to the latest commit as of the submodule add.
Therefore, before modifying submodule code, you need to manually switch to the master branch:
$ git checkout master
$ git add .
$ git commit -m 'feat: xxx'
$ git push origin master
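With several submodules, the manual switch can be scripted. The following is one possible sketch, not an official remedy: it uses foreach to find submodules whose HEAD is detached and checks out the branch named in .gitmodules. It assumes every submodule has a branch entry there (the checkout fails otherwise), and the repos and `g` wrapper are stand-ins for the demo.

```shell
set -eu
work=$(mktemp -d)
# Wrapper (hypothetical): throwaway identity, fixed default branch, local-path submodules allowed
g() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c init.defaultBranch=master -c protocol.file.allow=always "$@"
}

g init -q "$work/react-dom"
echo 'v1' > "$work/react-dom/index.js"
g -C "$work/react-dom" add index.js
g -C "$work/react-dom" commit -q -m 'feat: v1'
g init -q "$work/main"
g -C "$work/main" commit -q --allow-empty -m 'chore: init'
g -C "$work/main" submodule --quiet add -b master "$work/react-dom" src/packages/react-dom
g -C "$work/main" commit -q -m 'build: add react-dom submodule'

# A fresh clone leaves the submodule in a detached state
g clone -q "$work/main" "$work/clone"
g -C "$work/clone" submodule --quiet update --init
sub="$work/clone/src/packages/react-dom"

# symbolic-ref -q exits non-zero when HEAD is detached
if g -C "$sub" symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi
echo "before: $state"

# Re-attach each detached submodule to the branch recorded in .gitmodules
g -C "$work/clone" submodule --quiet foreach '
  git symbolic-ref -q HEAD >/dev/null ||
    git checkout -q "$(git config -f "$toplevel/.gitmodules" "submodule.$name.branch")"
'
after=$(g -C "$sub" rev-parse --abbrev-ref HEAD)
echo "after: $after"
```

Note that the next git submodule update will detach HEAD again, so this only prepares the submodules for a round of edits.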
Local Submodule Cache
When a submodule repo is migrated, executing git submodule add may encounter local cache issues:
$ git submodule add ssh://XXX.XXX.XXX.XXX:XXXXX/opt/git/fdf.git projets/fdf
A git directory for 'projets/fdf' is found locally with remote(s): origin ssh://git@XXX.XXX.XXX.XXX:XXXXX/opt/git/fdf.git If you want to reuse this local git directory instead of cloning again from ssh://XXX.XXX.XXX.XXX:XXXXX/opt/git/fdf.git use the '--force' option. If the local git directory is not the correct repo or you are unsure what this means choose another name with the '--name' option.
You need to delete the original configuration first (steps 2 and 3), then the locally cached submodule information (steps 1 and 4):
# 1. Delete git cache and physical files
$ git rm --cached path_to_submodule
$ rm -rf path_to_submodule
# 2. Delete the submodule's related configuration in .gitmodules
$ vi .gitmodules
[submodule "path_to_submodule"]
path = path_to_submodule
url = https://github.com/path_to_submodule
# 3. Delete the submodule's related configuration in .git/config
$ vi .git/config
[submodule "path_to_submodule"]
url = https://github.com/path_to_submodule
# 4. Delete submodule cache
$ rm -rf .git/modules/path_to_submodule
After cleanup, re-execute git submodule add.
P.S. In step 4, the submodule cache location can be viewed with the following command:
$ cat path_to_submodule/.git
gitdir: ../.git/modules/path_to_submodule
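The four cleanup steps can also be run without opening an editor, using git config to drop the two configuration sections. The following is a sketch on local stand-in repos (`old-remote`, `new-remote`, and the `g` wrapper are hypothetical); the `|| true` guards are there in case a section is already gone.

```shell
set -eu
work=$(mktemp -d)
# Wrapper (hypothetical): throwaway identity, fixed default branch, local-path submodules allowed
g() {
  git -c user.name=demo -c user.email=demo@example.com \
      -c init.defaultBranch=master -c protocol.file.allow=always "$@"
}

# Old and new locations of the migrated submodule repo
g init -q "$work/old-remote"
echo 'v1' > "$work/old-remote/index.js"
g -C "$work/old-remote" add index.js
g -C "$work/old-remote" commit -q -m 'feat: v1'
g clone -q "$work/old-remote" "$work/new-remote"

sub=path_to_submodule
g init -q "$work/main"
g -C "$work/main" commit -q --allow-empty -m 'chore: init'
g -C "$work/main" submodule --quiet add "$work/old-remote" "$sub"
g -C "$work/main" commit -q -m 'build: add submodule'

# 1. Delete the index entry and the physical files
g -C "$work/main" rm -q --cached "$sub"
rm -rf "$work/main/$sub"
# 2. Delete the submodule's section in .gitmodules and stage the change
g -C "$work/main" config -f .gitmodules --remove-section "submodule.$sub" || true
g -C "$work/main" add .gitmodules
# 3. Delete the submodule's section in .git/config
g -C "$work/main" config --remove-section "submodule.$sub" || true
# 4. Delete the submodule cache
rm -rf "$work/main/.git/modules/$sub"

# Re-add from the new location
g -C "$work/main" submodule --quiet add "$work/new-remote" "$sub"
new_url=$(g -C "$work/main/$sub" config --get remote.origin.url)
echo "submodule now tracks: $new_url"
```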
P.S. For more common issues, see Using Git Submodules