Skip to main content

JavaScript's RegExp

Free2015-09-26#JS#Regular_Expression#JavaScript正则表达式#test#exec#match#search#replace#regex

A small problem took quite some time, let's look at JavaScript's regular expressions again

I. Problem

Where's Wally

Description:

Write a function that returns the index of the first occurence of the word "Wally". "Wally" must not be part of another word, but it can be directly followed by a punctuation mark. If no such "Wally" exists, return -1.

Examples:

"Wally" => 0
"Where's Wally" => 8
"Where's Waldo" => -1
"DWally Wallyd .Wally" => -1
"Hi Wally." => 3
"It's Wally's." => 5
"Wally Wally" => 0
"'Wally Wally" => 7

From codewars

Hmm, finding the position of Wally in a string, a pattern matching problem

II. Solutions

2 approaches: parse it yourself or use regular expressions

Parsing it yourself can definitely solve it, no need to elaborate. Here we mainly discuss the regular expression approach

1. Initial Version (has bugs)

function wheresWally(string){console.log(string);
  var regex = /(^| )Wally($|[.' ])/;
  var pos = -1;

  if(string.test(regex)) {
    pos = string.indexOf('Wally');
  }
  
  return pos;
}

This implementation seems to have no problems, but there's actually a bug:

'aWally Wally' => 1

The expected return value is 7. The reason is that string.indexOf only returns the position of the first match. We want to know the position where regex first matches successfully, so string.lastIndexOf is also useless.

regex.test simply returns true/false, we have no way to know the index, so regex.test is not suitable for this problem.

2. Corrected Version (has bugs)

function wheresWally(string){console.log(string);
  var regex = /(^| )Wally($|[.' ])/;
  var pos = -1;
  
  var res = regex.exec(string);
  if (res !== null) {
    pos = res.index;
  }
  
  return pos;
}

This time we use regex.exec instead. exec is the most powerful regular expression method, it can definitely provide the information we need. MDN API is as follows:

// Match "quick brown" followed by "jumps", ignoring characters in between
// Remember "brown" and "jumps"
// Ignore case
var re = /quick\s(brown).+?(jumps)/ig;
var result = re.exec('The Quick Brown Fox Jumps Over The Lazy Dog');
ObjectProperty/IndexDescriptionExample
result[0]The full string of characters matchedQuick Brown Fox Jumps
[1], ...[n ]The parenthesized substring matches, if any. The number of possible parenthesized substrings is unlimited.[1] = Brown
[2] = Jumps
indexThe 0-based index of the match in the string.4
inputThe original string.The Quick Brown Fox Jumps Over The Lazy Dog
relastIndexThe index at which to start the next match. When "g" is absent, this will remain as 0.25
ignoreCaseIndicates if the "i" flag was used to ignore case.true
globalIndicates if the "g" flag was used for a global match.true
multilineIndicates if the "m" flag was used to search in strings across multiple line.false
sourceThe text of the pattern.quick\s(brown).+?(jumps)

Note: index is a property of the exec return value, while lastIndex is a property of regex, easy to get wrong.

We only care about index, which carries the position of successful match. But the above implementation still has a bug:

'aWally Wally' => 6

Why 6 instead of 7? Look carefully at our regular expression (^| )Wally($|[.' ]), we find that this successful match starts from a space. The position of the space is indeed 7. This is easy to handle, just fix it simply:

if (string.charAt(pos) === ' ') {
  pos++;
}

Actually, we can also use string.replace(regex, func(match, p1, p2..., offset, string)) to complete the same task. offset carries the same information as index.

Special note: The impact of global mode on exec is: if g mode is not enabled, regex.lastIndex value is always 0. If g mode is enabled, each execution of exec updates the value of regex.lastIndex. You can also manually modify this value to change the starting position of the next exec. For example:

var regex = /^abc/;
undefined
var regex_g = /^abc/g;
undefined
var str = 'abc abcd';
undefined
regex.lastIndex;
0
regex_g.lastIndex;
0
regex.exec(str);
["abc"]
regex.lastIndex;
0
regex_g.exec(str);
["abc"]
regex_g.lastIndex;
3
regex_g.lastIndex = 1;
1
regex_g.exec(str);
null

3. Community Solutions

Solution 1

function wheresWally(string){
  return (" "+string).search(/ Wally\b/) 
}

Modify the original string first then match, very clever.

Solution 2

function wheresWally(string) {
  var match = /(^|[ ])Wally\b/.exec(string)
  return match ? match.index + match[1].length : -1
}

Same approach as the author, but much more concise. Uses match[1].length to cleverly solve the space problem, much better than if...charAt.

Solution 3

function wheresWally(string){
  var mtch = " ".concat(string).match(/\sWally\b/)
  var idx = mtch ? " ".concat(string).indexOf(" Wally") : -1;
  return idx;
}

Handle space/no-space separately. A general method for simplifying complex problems: case-by-case analysis.

Solution 4

function wheresWally(string) {
  var match = string.match(/(^|\s)Wally($|[^\w])/); 
  return match ? match.index + match[0].indexOf('Wally') : -1;
}

Cooperate with indexOf to solve the space problem, also a good approach.

III. Reflection

As seniors said, community is an important learning pathway.

  • str.match(regexp)

    Returns an array of match results, or null.

    Without g mode, match only once. With g mode, put all matches into the result array.

  • str.replace(regexp, func)

    func parameters are依次 match, p1, p2..., offset, string ~ matched part, captured part 1, captured part 2..., match position, entire string.

  • str.search(regexp)

    Returns match position, or -1.

    Note: Whether g mode is enabled or not, only returns the first match position. Same as regex.test (enabling g mode is purely wasteful).

2. RegExp

1. Properties

  • regex.lastIndex

    The position where the next match will start, initial value is 0.

  • regex.global/regex.ignoreCase/regex.multiline

    Corresponds to g/i/m three modes (global/ignore case/multiline), returns true/false.

  • regex.source

    Returns the pattern string itself (the part between the two slashes in literal form, or the part between two slashes after conversion to literal in new form).

2. Methods

  • regex.exec(str)

    Returns an array of match results, or null.

    The first element in the result array is the matched part, subsequent elements are captured parts in order.

    Note: The result array also has two properties:

    • index

      Match position

    • input

      Entire string

  • regex.test()

    Returns true/false, enabling g mode or not doesn't affect the result.

  • regex.compile()

    Obsolete, not recommended for use.

3. Relationships

  • regex.test(str) is equivalent to str.search(regex) !== -1

  • regex.exec(str) is equivalent to str.replace(regex, func), exec provides more control capabilities (regex.lastIndex)

Comments

No comments yet. Be the first to share your thoughts.

Leave a comment