Skip to main content

React List Diff

Free2017-05-28#Front-End#JS#list diff algorithm#tree diff algorithm#dom diff#react virtual dom diff

The seemingly magical but actually quite simple React Diff

I. Tree Diff

Diffing a tree is a relatively complex problem. Let's first consider a simple scenario:

    A           A'
   / \   ?    / | \
  B   C  ->  G  B   C
 / \  |         |   |
D   E F         D   E

The result of diff(treeA, treeA') should be:

1.insert G before B
2.move   E to     F
3.remove F

Applying these 3 steps to treeA will give us treeA'. When the scale of the tree is small, it can be seen at a glance. Let's recall how we did it:

1. Observe top-down, if they are completely different, no need to look further.
2. Compare nodes at the current level, find Additions (new has it, old doesn't), Deletions (old has it, new doesn't).
3. Look down to see if the subtree structure is similar to determine if it's a Move (moving an existing node to a new position).

If a computer were to do this, Add and Delete are easy to find, but determining Move is more complicated. First, it needs to quantify the similarity of the trees (like a weighted edit distance), and determine at what similarity score a Move is more cost-effective than a Delete + Add (fewer operation steps).

P.S. If you are interested in the Tree Diff algorithm, you can refer to Matching, diffing and merging XML. The article is quite old (2008), but it's very interesting.

II. React Assumptions

The virtual DOM tree maintained internally by React also faces the problem of how to diff. The DOM tree updates frequently (with frequent interactions), and an overly complex tree diff solution would bring performance issues. Considering the characteristics of DOM operation scenarios:

  • Many small local changes, few large-scale changes (for performance reasons, avoided by showing/hiding).

  • Few cross-level moves, many same-level node moves (like sorting a table).

If there were many large-scale changes, diff would basically be meaningless, purely a performance loss. If there were many cross-level moves, precise move determination would be necessary, placing high demands on the algorithm. DOM operation scenarios happen to not require such high demands. Based on this, a few assumptions were proposed:

1.Two elements of different types will produce different trees.

2.The developer can hint at which child elements may be stable across different renders with a key prop.

That is to say:

  • Assume elements of different types correspond to different subtrees (don't consider "looking down to see if the subtree structure is similar", the Move determination becomes effortless).

  • Both the old and new structures will carry a unique key as the basis for diff, assuming the same key means the same element (lowering the cost of comparison).

Additionally, to be safe, React also provides the shouldComponentUpdate hook, allowing manual intervention in the diff process to avoid misjudgments.

III. Virtual DOM Tree Diff vs. List Diff

If we were to directly compare tree structures, we wouldn't know where to start (the naked eye easily compares shape and structure differences, but computers aren't good at this). For example:

Page
  Header
    div
      img
      p
  List
    Item1
      img
      p
      button
    Item2...

It seems difficult to get a direct result from diff(Page, newPage). So we need to find a way to reduce the scale of the problem. A highly effective method is recursion:

1. Each node checks if it needs to be updated itself.
2. If needed, diff the direct children, finding Add, Delete, Move, and Modify.
3. Recursively check downwards for those needing Modify, until the leaves.

This way, the tree diff is broken down into list diff. The part that actually needs to be implemented is just the list diff, making the problem very clear.

IV. List Diff

Consider two one-dimensional integer arrays:

var oldList = [1, 2, 3, 7, 4];
var newList = [1, 4, 5, 3, 7, 6];

How do we use diff to find Add/Delete/Move?

1. Traverse the old one, find Delete.
2. Traverse the new one, find Add, Move.

Simple Solution

Ignoring performance and complexity for now, let's try to implement a simple and crude list diff:

var diff = function(oldList, newList) {
    var changes = [];
    // Mirror to simulate positions
    var _oldList = oldList.slice();
    // Traverse old, find Delete
    oldList.forEach(function(item, i) {
        if (newList.indexOf(item) === -1) {
            changes.push({
                type: 'remove',
                index: i
            });
            _oldList.splice(i, 1);
        }
    });

    // Traverse new, find Add/Move
    newList.forEach(function(item, i) {
        var index = _oldList.indexOf(item);
        if (index === -1) {
            // Add
            changes.push({
                type: 'insert',
                index: i,
                item: item
            });
            _oldList.splice(i, 0, item);
        }
        else {
            // Move
            if (index !== i) {
                var step = {
                    type: 'move',
                    from: index,
                    to: i
                };
                changes.push(step);
                move(_oldList, step.from, step.to);
            }
        }
    });

    return changes;
};
var move = function(list, from, to) {
    var item = list.splice(from, 1);
    if (from > to)
        list.splice(to, 0, item[0]);
    else
        list.splice(to - 1, 0, item[0]);
};

Simulating patch:

var showSteps = function(changes) {
    changes.forEach(function(change) {
        switch (change.type) {
            case 'insert':
                console.log('insert ' + change.item + ' before ' + oldList[change.index]);
                oldList.splice(change.index, 0, change.item);
                break;
            case 'remove':
                console.log('remove ' + oldList[change.index]);
                oldList.splice(change.index, 1);
                break;
            case 'check':
                console.log('check ' + oldList[change.index] + ' for update');
                break;
            case 'move':
                console.log('move ' + oldList[change.from] + ' to ' + oldList[change.to]);
                move(oldList, change.from, change.to);
                break;
            default:
                console.error('not here');
                break;
        }
        console.log(oldList);
    });
};

Executing gets the operation steps:

source  1 2 3 7 4
target  1 4 5 3 7 6

remove 2
  1 3 7 4
move 4 to 3
  1 4 3 7
insert 5 before 3
  1 4 5 3 7
insert 6 before undefined
  1 4 5 3 7 6

Lazily operating on the _oldList mirror to simulate positions, avoiding calculating the impact of Add/Delete/Move on index, incurs additional space overhead. We need to find a way to remove this part.

Optimizing Space Consumption

Remove the simulated position operations and directly calculate index changes:

// 1. Replace simulated position operations with recording index changes, traverse new first, then old.
var diff = function(oldList, newList) {
    var changes = [];
    // Calculate the impact of change on the index of uncompared elements in oldList, it's a range
    // Because moving an element before me to an even earlier position doesn't affect my index
    // When moving, only affects index <= lastIndex
    // Represented as <=index: delta
    var maxLength = Math.max(oldList.length, newList.length);
    var oldIndexChanges = new Array(maxLength).fill(0);
    // Index before correction
    var _index;

    // Traverse new, find Add/Move
    newList.forEach(function(item, i) {
        var index = oldList.indexOf(item);
        if (index === -1) {
            // Add
            changes.push({
                type: 'insert',
                index: i,
                item: item
            });
            // insert affects the index of all subsequent elements in oldList
            oldIndexChanges[maxLength - 1]++;
        }
        else {
            _index = index;
            // Correct old index
            // Count from index to the end, sum delta
            index += oldIndexChanges.reduce(function(acc, delta, idx) {
                if (idx >= index)
                    return acc + delta;
                else
                    return acc;
            });
            // Move
            if (index !== i) {
                var step = {
                    type: 'move',
                    from: index,
                    to: i
                };
                changes.push(step);
                if (index > i) {
                    // move affects elements <= from in oldList
                    oldIndexChanges[_index]++;
                }
                else {
                    // from cannot be less than to
                    // Because scanning from front to back, positions [0, to-1] are fixed, won't pick elements from the front
                    console.error('impossible');
                }
            }
        }
    });

    // Traverse old, find Delete
    // Calculate total delta
    // After insert and move, elements to be deleted must be at the very end, affected by all index changes
    var indexDelta = oldIndexChanges.reduce(function(acc, delta) {
        return acc + delta;
    });
    oldList.forEach(function(item, i) {
        if (newList.indexOf(item) === -1) {
            // Correct old index
            i += indexDelta;
            changes.push({
                type: 'remove',
                index: i
            });
        }
    });

    return changes;
};

// 2. Simulate patch
var showSteps = function(changes) {
    changes.forEach(function(change) {
        switch (change.type) {
            case 'insert':
                console.log('insert ' + change.item + ' before ' + oldList[change.index]);
                oldList.splice(change.index, 0, change.item);
                break;
            case 'remove':
                console.log('remove ' + oldList[change.index]);
                oldList.splice(change.index, 1);
                break;
            case 'check':
                console.log('check ' + oldList[change.index] + ' for update');
                break;
            case 'move':
                console.log('move ' + oldList[change.from] + ' to ' + oldList[change.to]);
                move(oldList, change.from, change.to);
                break;
            default:
                console.error('not here');
                break;
        }
        console.log(oldList);
    });
};

Executing gets the operation steps:

source  1 2 3 7 4
target  1 4 5 3 7 6

move 4 to 2
  1 4 2 3 7
insert 5 before 2
  1 4 5 2 3 7
move 3 to 2
  1 4 5 3 2 7
move 7 to 2
  1 4 5 3 7 2
insert 6 before 2
  1 4 5 3 7 6 2
remove 2
  1 4 5 3 7 6

There are 2 extra operation steps because the Move operations were done before the Delete (if Delete was done first, we wouldn't need to consider index changes caused by Delete when doing Add/Move). The two move operations relative to the 2 that is to be deleted are meaningless. We need to find a way to avoid this. We won't expand on it here, let's directly look at React's original implementation.

V. React List Diff

Similar to our idea, it also traverses the new list first, then the old one:

var diff = function(oldList, newList) {
    var changes = [];
    // The largest index of the previously visited children in the old list
    var lastIndex = 0;
    // The last placed node, used for Add/Move
    var lastPlacedNode = null;

    // Traverse new, find Add/Move
    newList.forEach(function(item, i) {
        var index = oldList.indexOf(item);
        if (index === -1) {
            // Add
            changes.push({
                type: 'insert',
                item: item,
                afterNode: lastPlacedNode
            });
        }
        else {
            // lastIndex filters out Moves where relative positions haven't changed
            if (index < lastIndex) {
                // Move
                var step = {
                    type: 'move',
                    item: item,
                    afterNode: lastPlacedNode
                };
                changes.push(step);
            }
            lastIndex = Math.max(index, lastIndex);
        }
        lastPlacedNode = item;
    });

    // Traverse old, find Delete
    oldList.forEach(function(item, i) {
        if (newList.indexOf(item) === -1) {
            changes.push({
                type: 'remove',
                index: i
            });
        }
    });

    return changes;
};

One very clever optimization here is lastIndex. It records the index of the rightmost element touched in the source list. Only when the original relative position can no longer satisfy the need (meaning a Move is strictly necessary) is a Move performed. This maximizes the use of original relative positions and avoids unnecessary movements.

Simulating patch:

var showSteps = function(changes) {
    // Keep a copy to look up previous positions for Moves
    var _oldList = oldList.slice();
    // For Add, Move, and Delete, simulating DOM operations requires knowing the current index of the old element during patching
    // In actual DOM patching, this is not needed, because after finding the element, the DOM API doesn't need an index to Add, Move, or Delete
    var patchingIndex = -1;
    changes.forEach(function(change) {
        switch (change.type) {
            case 'insert':
                console.log('insert ' + change.item + ' after ' + change.afterNode);
                patchingIndex = oldList.indexOf(change.afterNode);
                insertAfter(oldList, change.item, patchingIndex);
                break;
            case 'remove':
                console.log('remove ' + _oldList[change.index]);
                patchingIndex = oldList.indexOf(_oldList[change.index]);
                oldList.splice(patchingIndex, 1);
                break;
            case 'move':
                console.log('move ' + change.item + ' to pos after ' + change.afterNode);
                patchingIndex = oldList.indexOf(change.afterNode);
                move(oldList, oldList.indexOf(change.item), patchingIndex);
                break;
            default:
                console.error('not here');
                break;
        }
        console.log(oldList);
    });
};
var move = function(list, from, to) {
    var item = list.splice(from, 1);
    if (from > to)
        list.splice(to + 1, 0, item[0]);
    else
        list.splice(to, 0, item[0]);
};
var insertAfter = function(list, item, index) {
    list.splice(index + 1, 0, item);
};

Executing gets the operation steps:

source  1 2 3 7 4
target  1 4 5 3 7 6

insert 5 after 4
  1 2 3 7 4 5
move 3 to pos after 5
  1 2 7 4 5 3
move 7 to pos after 3
  1 2 4 5 3 7
insert 6 after 7
  1 2 4 5 3 7 6
remove 2
  1 4 5 3 7 6

When encountering 1 and 4, they are not moved because their original relative order is correct. When it hits 3, it has to be moved (the original relative position can no longer be maintained), so it is moved.

Looking at this example, React's implementation is not the most efficient. The simple version we wrote initially (Delete first, then Add/Move) only took 4 steps.

VI. Online Demo

The React implementation under the example scenario and the diff methods in the article: http://www.ayqy.net/temp/react/demo/list-diff.html

Open the browser Console, click "Update List", and look at the diff result.

P.S. React version is 15.5.4

References

Comments

No comments yet. Be the first to share your thoughts.

Leave a comment