
Module System


Interlock's module system is fundamental to Interlock's build and runtime systems, and is important in the following ways:

To understand this better, let's look at the two data structures upon which the module system is based.


Merkle tree

A Merkle tree, also known as a hash tree, is a tree structure in which every leaf node is labeled with the hash of a block of data, and every non-leaf node is labeled with a hash computed from the hashes of its children. This is applied recursively, all the way down to the leaf nodes, so the root hash depends on every piece of data in the tree. The general idea is depicted below.

┌──────────────────────────────────────────┐
│         hash(hash(a) + hash(b))          │
│                    +                     │
│         hash(hash(c) + hash(d))          │
└─────────┬──────────────────────┬─────────┘
          ▼                      ▼
┌───────────────────┐  ┌───────────────────┐
│ hash(a) + hash(b) │  │ hash(c) + hash(d) │
└───────────────────┘  └───────────────────┘
          │                      │          
     ┌────┴────┐            ┌────┴────┐     
     ▼         ▼            ▼         ▼     
  ┌─────┐   ┌─────┐      ┌─────┐   ┌─────┐  
  │  A  │   │  B  │      │  C  │   │  D  │  
  └─────┘   └─────┘      └─────┘   └─────┘  
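The recursive hashing in the diagram can be sketched in a few lines of JavaScript. This is a minimal illustration, not code from Interlock: it assumes SHA-256 from Node's built-in crypto module as the hash function, and `merkleRoot` is a hypothetical helper. When a level has an odd number of nodes, the sketch pairs the last node with itself, similar to the convention Bitcoin uses.

```javascript
const crypto = require("crypto");

// SHA-256 hex digest of a string; the hash function is our choice for this sketch.
function hash(data) {
  return crypto.createHash("sha256").update(data).digest("hex");
}

// Build the tree bottom-up: hash every leaf, then hash the concatenation of
// each adjacent pair, level by level, until only the root hash is left.
function merkleRoot(leaves) {
  if (leaves.length === 0) throw new Error("a Merkle tree needs at least one leaf");
  let level = leaves.map(hash);
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += 2) {
      // With an odd count, pair the last node with itself.
      const right = i + 1 < level.length ? level[i + 1] : level[i];
      next.push(hash(level[i] + right));
    }
    level = next;
  }
  return level[0];
}

const root = merkleRoot(["a", "b", "c", "d"]);
const tampered = merkleRoot(["a", "b", "c", "D"]);
console.log(root === tampered); // false: changing one leaf changes the root
```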

You may have seen this data structure before: it is the foundational data structure behind both Git and Bitcoin. In Bitcoin, the leaves are transactions from a transaction block, and the recursive hashes allow someone who has no direct access to a particular transaction to trust its authenticity.

Git also relies heavily on hashing the hashes of child nodes. However, Merkle trees don't map perfectly to Git's use-case. To understand the variation that Git introduces, let's look at another data structure.

Directed Acyclic Graph

A directed acyclic graph, or DAG, is a graph of nodes fitting within a handful of constraints.

First, each node is connected to one or more other nodes through edges. This is an important difference from the tree structure described above, where every node had at most one parent: in a DAG, a node may have several parents. Second, each edge has a direction, so every connection describes a parent-child relationship. Third, the graph is acyclic: following edges in their direction, you can never arrive back at a node you started from. And finally, the graph is finite, meaning that it has a set number of nodes and edges.

┌───┐      ┌───┐ ┌───┐      ┌───┐
│ A │─────▶│ B │►│ C │─────▶│ D │
└───┘      └───┘ └───┘      └───┘
  ╲                           ▲  
   ╲──╲ ┌───┐ ┌───┐ ┌───┐ ╱──╱   
       ▶│ X │►│ Y │►│ Z │╱       
        └───┘ └───┘ └───┘        
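The acyclic constraint can be checked mechanically. The sketch below encodes the example graph as a plain adjacency list (our own representation, not anything from Interlock) and runs Kahn's algorithm, a standard topological sort: if every node can be ordered so that each parent comes before its children, the graph contains no cycles.

```javascript
// The example graph above: each node maps to the nodes its edges point to.
const graph = {
  A: ["B", "X"],
  B: ["C"],
  C: ["D"],
  X: ["Y"],
  Y: ["Z"],
  Z: ["D"],
  D: [],
};

// Kahn's algorithm: repeatedly take a node with no remaining incoming edges.
// If every node gets taken, there is no cycle, and the order in which nodes
// were taken puts every parent before its children.
function topologicalOrder(g) {
  const inDegree = {};
  for (const node of Object.keys(g)) inDegree[node] = 0;
  for (const children of Object.values(g)) {
    for (const child of children) inDegree[child] += 1;
  }
  const ready = Object.keys(g).filter((node) => inDegree[node] === 0);
  const order = [];
  while (ready.length > 0) {
    const node = ready.shift();
    order.push(node);
    for (const child of g[node]) {
      inDegree[child] -= 1;
      if (inDegree[child] === 0) ready.push(child);
    }
  }
  if (order.length !== Object.keys(g).length) {
    throw new Error("graph contains a cycle");
  }
  return order;
}

console.log(topologicalOrder(graph).join(" -> ")); // A -> B -> X -> C -> Y -> Z -> D
```

Adding an edge from D back to A would leave no node without incoming edges, and the function would throw.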

Git's DAG

If you've used Git for any amount of time, the above graph might be familiar to you.

From commit A, you create a new branch in which three new commits are made: X, Y, and Z. Meanwhile, B and C are committed on the main branch. Finally, the two branches are merged together in D.

Along the way, each commit is identified by a hash; this is specific to Git, not a property of DAGs in general. Each commit hash is generated from:

This data structure actually resembles a Merkle tree in a number of ways.

But there are a couple of important differences:

Git commit hashes are in no way arbitrary. They are generated deterministically, so that if the same author made the same changes at the same time on top of the same commit in two different locations, the commit hashes would be identical. And for Git's use-case, there is no reason why they shouldn't be treated as identical. The converse also holds: commits with different meta-data, data, or history will have different hashes. This is ultimately what puts the "distributed" in "distributed version control".
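To make the determinism concrete, here is a simplified commit hash in JavaScript. It is not Git's real object format, which has more fields (such as a separate committer) and an object header; the `commitHash` helper, its field names, and the placeholder tree values are assumptions for illustration. What it shares with Git is the key idea: parent hashes are part of the input, so a commit's hash transitively covers its whole history.

```javascript
const crypto = require("crypto");

// Simplified commit hash (NOT Git's actual object format). The input covers
// the content snapshot, the parent hashes, and the meta-data.
function commitHash({ tree, parents, author, timestamp, message }) {
  const payload = [
    `tree ${tree}`,
    ...parents.map((parent) => `parent ${parent}`),
    `author ${author} ${timestamp}`,
    "",
    message,
  ].join("\n");
  return crypto.createHash("sha1").update(payload).digest("hex");
}

const base = commitHash({ tree: "t0", parents: [], author: "ann", timestamp: 1, message: "A" });

// Same change, same author, same time, same parent, in two different places.
const here = commitHash({ tree: "t1", parents: [base], author: "ann", timestamp: 2, message: "B" });
const there = commitHash({ tree: "t1", parents: [base], author: "ann", timestamp: 2, message: "B" });
console.log(here === there); // true

// Identical change on top of a different history gets a different hash.
const otherBase = commitHash({ tree: "t0", parents: [], author: "bob", timestamp: 1, message: "A" });
const rebased = commitHash({ tree: "t1", parents: [otherBase], author: "ann", timestamp: 2, message: "B" });
console.log(here === rebased); // false
```

A merge commit such as D simply lists two parents, so its hash depends on both branches.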

Interlock's module system works much like a Git repo, and you can think of both as a Merklized DAG. More on that below.

Dependency Graph

This brings us to Interlock, and the applications you're writing. To tie this all together, consider the dependency graph for a typical application.

          ┌────────────────┐
          │ entry-point.js │
          └───┬─────────┬──┘
              ▼         ▼
          ┌──────┐  ┌──────┐   
          │ a.js │  │ b.js │   
          └──────┘  └──────┘   
              │         │      
    ┌─────────┤         │      
    ▼         ▼         │      
┌──────┐  ┌──────┐      │      
│ c.js │  │ d.js │  ┌───┘      
└──────┘  └──────┘  │          
              │     │          
              ▼     ▼
         ┌────────────────────┐
         │n_m/lodash/index.js │
         └────────────────────┘

There is an entry-point.js that requires two dependencies, a.js and b.js. A has a couple of dependencies itself, c.js and d.js, and ultimately depends on Lodash through d.js; B depends on Lodash directly.

You might notice that this graph is constructed very similarly to the DAG we saw above.

Because of this, we can think about and treat JavaScript applications as a Merklized DAG, where each node is a module, and each module has a uniquely identifying hash. Indeed, this is even true of all applications collectively (see Determinism and Universality below).
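Treating the graph as a Merklized DAG can be sketched by hashing each module from its own source plus the hashes of its dependencies. This is a deliberately simplified stand-in: the placeholder sources and the `moduleHash` helper are hypothetical, and Interlock's real module hash covers more data and meta-data (see Module Hashes). The sketch assumes the graph is acyclic.

```javascript
const crypto = require("crypto");

// The dependency graph from the diagram above; sources are placeholders.
const modules = {
  "entry-point.js": { source: "/* entry */", deps: ["a.js", "b.js"] },
  "a.js": { source: "/* a */", deps: ["c.js", "d.js"] },
  "b.js": { source: "/* b */", deps: ["n_m/lodash/index.js"] },
  "c.js": { source: "/* c */", deps: [] },
  "d.js": { source: "/* d */", deps: ["n_m/lodash/index.js"] },
  "n_m/lodash/index.js": { source: "/* lodash */", deps: [] },
};

// A module's hash covers its own source plus the hashes of its dependencies,
// computed recursively down to the leaves. The memo means a shared
// dependency (Lodash here) is hashed once, however many modules require it.
function moduleHash(id, graph, memo) {
  if (!memo.has(id)) {
    const { source, deps } = graph[id];
    const depHashes = deps.map((dep) => moduleHash(dep, graph, memo));
    const record = JSON.stringify({ source, deps: depHashes });
    memo.set(id, crypto.createHash("sha1").update(record).digest("hex"));
  }
  return memo.get(id);
}

const before = new Map();
moduleHash("entry-point.js", modules, before);

// Edit Lodash, re-hash, and compare.
const edited = { ...modules, "n_m/lodash/index.js": { source: "/* lodash, edited */", deps: [] } };
const after = new Map();
moduleHash("entry-point.js", edited, after);

for (const id of Object.keys(modules)) {
  console.log(id, before.get(id) === after.get(id) ? "unchanged" : "changed");
}
// Only c.js is unchanged: every module that reaches Lodash gets a new hash.
```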

Module Hashes

What goes into a module hash?

Like most parts of Interlock, this is overridable (see Extensibility for more on this). But, by default, there is a clearly defined set of data and meta-data that goes into generating a module hash.

Determinism and Universality

Similar to Git, Interlock's module hashes are deterministic, meaning that two builds on two different machines will generate the same hash for a given module, so long as all the content and metadata for that module, as well as for its dependencies, are equal.
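Determinism across machines depends on keeping machine-specific data out of the hash. The sketch below is hypothetical (`computeModuleHash` and its fields are not Interlock's API): it hashes a module's path relative to the project root rather than its absolute path, so the same module checked out in two different directories hashes identically.

```javascript
const crypto = require("crypto");
const path = require("path");

// Hypothetical module-hash inputs; field names are illustrative only.
// Nothing machine-specific (absolute paths, build time) is hashed.
function computeModuleHash({ projectRoot, absPath, source, depHashes }) {
  const record = {
    // Relative path, so the checkout location does not affect the hash.
    path: path.posix.relative(projectRoot, absPath),
    source,
    deps: depHashes,
  };
  return crypto.createHash("sha1").update(JSON.stringify(record)).digest("hex");
}

// The same module, built on two machines with different checkout locations.
const onLaptop = computeModuleHash({
  projectRoot: "/home/ann/app",
  absPath: "/home/ann/app/src/a.js",
  source: "module.exports = 1;",
  depHashes: [],
});
const onBuildServer = computeModuleHash({
  projectRoot: "/srv/ci/app",
  absPath: "/srv/ci/app/src/a.js",
  source: "module.exports = 1;",
  depHashes: [],
});
console.log(onLaptop === onBuildServer); // true
```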

Because of this, we can extend our mental model for a Merklized dependency graph a little further to encompass all possible modules that will ever be compiled, in any combination. To understand this, consider the following two applications.

   ┌─────────────┐             ┌─────────────┐   
   │ bundle-a.js │      │      │ bundle-b.js │   
   └─────────────┘             └─────────────┘   
          │             │             │          
    ┌─────┴─────┐               ┌─────┴─────┐    
    ▼           ▼       │       ▼           ▼    
┌───────┐   ┌───────┐       ┌───────┐   ┌───────┐
│ aa.js │   │ ab.js │   │   │ ba.js │   │ bb.js │
└───────┘   └───────┘       └───────┘   └───────┘
                ┃       │       ┃                
                ┗ ━ ━ ━ ┳ ━ ━ ━ ┛                
                        ▼
                  ┌───────────┐
                  │ lodash.js │
                  └───────────┘

The above scenario shows us two applications, A and B. The application entry point of each has a couple of dependencies, and each ultimately depends on Lodash. Let's consider the following scenarios for how the bundles might be compiled and interact with Lodash at run-time.

Scenario 1: Same Version

In this scenario, each bundle depends on the same version of Lodash. Because of this, each Lodash module's meta-data will be identical:

During compilation, each bundle's references to Lodash modules (their hashes) will be identical. And because all Interlock bundles share the same run-time when loaded on the same page (see Runtime Architecture), these references will resolve to the same module.exports in both places.

This results in automatic de-duping of Lodash, or any other library, whenever both bundles depend on the same version.

Scenario 2: Slightly Different Versions

In this scenario, application A depends on lodash@4.16.4 while application B depends on lodash@4.16.2. Let's consider how the hash's constituent data and meta-data match up:

What this ultimately boils down to is this:

Fortunately, this is the behavior that we want. For modules that function the same, there is no need to include two copies in the bundle. For modules whose behavior has potentially changed, we want to include both copies and not de-duplicate them.
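The effect of a patch release can be sketched with two made-up versions of a small package in which only one file changed. The contents and the `hashAll` helper are hypothetical, and the package version is assumed not to be an input to the hash (the Caveats section suggests adding it only as a customization). Because dependency hashes are inputs, a module that requires a changed file is treated as changed too.

```javascript
const crypto = require("crypto");

// Made-up contents of two releases of a tiny package: file -> { source, deps }.
const olderRelease = {
  "chunk.js": { source: "chunk, unchanged", deps: [] },
  "map.js": { source: "map, original", deps: [] },
  "index.js": { source: "index, unchanged", deps: ["chunk.js", "map.js"] },
};
// In the newer release, only map.js was edited.
const newerRelease = {
  ...olderRelease,
  "map.js": { source: "map, patched", deps: [] },
};

// Hash every file from its path, source, and dependency hashes.
// The package version is deliberately not an input.
function hashAll(pkg) {
  const hashes = {};
  const visit = (file) => {
    if (!(file in hashes)) {
      const { source, deps } = pkg[file];
      const record = JSON.stringify({ file, source, deps: deps.map(visit) });
      hashes[file] = crypto.createHash("sha1").update(record).digest("hex");
    }
    return hashes[file];
  };
  Object.keys(pkg).forEach(visit);
  return hashes;
}

const appA = hashAll(newerRelease); // like application A on the newer version
const appB = hashAll(olderRelease); // like application B on the older version
for (const file of Object.keys(appA)) {
  console.log(file, appA[file] === appB[file] ? "de-duped" : "both copies kept");
}
// chunk.js is de-duped; map.js and index.js (which requires map.js) are kept twice.
```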

Scenario 3: Very Different Versions

In this scenario, application A depends on lodash@4.16.4 while application B depends on lodash@3.9.3. Let's take a look at how things match up here:

Again, this is the behavior that we want. If a handful of Lodash modules stayed in the same place and kept the same functionality, they'll be de-duped. But for the most part, the data and meta-data won't match up, and most of both copies of Lodash will be included in the bundles.

Behavior Across Builds

So far, we've considered two applications that were compiled as part of the same build process. But what about two applications that were compiled separately?

It may seem obvious that no de-duping can occur in the compiled output here. Without some special context, there is no way for one application to know that it'll be running at the same time as another. So full copies of Lodash would be included in each compiled bundle.

However, we've also established that these module hashes are generated deterministically, independent of any particular build at any particular time. This is where the universality of the module hashes comes into play.

Provided both applications use the same version of Lodash, both copies share the same module hashes, so there will only ever be one copy of Lodash "running" on a page at a time. They will share behavior and any state or config that they might encapsulate. This benefits libraries that may be expensive to spin up initially.

It is also worth noting that it is possible to build shared libraries into a bundle once, and then re-use that bundle in multiple applications. This can be beneficial to build times, browser cache and webpage performance, and the total size of your shipped applications.

To learn more about this, see the interlock-share documentation.

Caveats and Warnings

Unwanted state sharing

You may experience undesired behavior in the following scenario:

This has not come up as a real problem, but it is theoretically possible.

Should this happen to you, you could modify the behavior of the hashModule compilation step to incorporate package version information or something similar. This would result in the two modules having different hashes where they would normally have the same one.
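The effect of such a change can be sketched by comparing two hash functions side by side. Neither is Interlock's hashModule step, whose exact interface is not covered here; `defaultStyleHash`, `versionedHash`, and the `some-lib` module are hypothetical. Mixing `name@version` into the hash gives otherwise identical modules from different releases separate identities, so they no longer share state.

```javascript
const crypto = require("crypto");

// SHA-1 hex digest of a JSON-serialized value.
function sha1(value) {
  return crypto.createHash("sha1").update(JSON.stringify(value)).digest("hex");
}

// Default-style hash (simplified): the package version is not an input,
// so identical modules from two releases share one identity and one state.
function defaultStyleHash(mod) {
  return sha1({ path: mod.path, source: mod.source });
}

// Hypothetical customization: fold "name@version" into the hash.
function versionedHash(mod) {
  return sha1({ pkg: `${mod.pkgName}@${mod.pkgVersion}`, path: mod.path, source: mod.source });
}

const fromOld = { pkgName: "some-lib", pkgVersion: "1.0.0", path: "state.js", source: "module.exports = {};" };
const fromNew = { ...fromOld, pkgVersion: "1.0.1" };

console.log(defaultStyleHash(fromOld) === defaultStyleHash(fromNew)); // true: state is shared
console.log(versionedHash(fromOld) === versionedHash(fromNew)); // false: kept separate
```

The trade-off is that de-duping across versions is lost for every module, not just the problematic one.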