RegExp in JavaScript

##I. The problem

Where's Wally

Description:

Write a function that returns the index of the first occurrence of the word "Wally". "Wally" must not be part of another word, but it can be directly followed by a punctuation mark. If no such "Wally" exists, return -1.

Examples:

"Wally" => 0

"Where's Wally" => 8

"Where's Waldo" => -1

"DWally Wallyd .Wally" => -1

"Hi Wally." => 3

"It's Wally's." => 5

"Wally Wally" => 0

"'Wally Wally" => 7

From Codewars

So the task is to find the position of the fool (Wally) in a string: a pattern-matching problem.

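##II. Solutions

There are two options: parse the string by hand, or use a regular expression.

Hand-rolled parsing certainly works and is not covered further; this post focuses on the regular-expression approach.

###1. First version (buggy)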

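function wheresWally(string) {
  var regex = /(^| )Wally($|[.' ])/;
  var pos = -1;

  // test() is a RegExp method, so it is called on the regex, not on the string
  if (regex.test(string)) {
    // BUG: indexOf finds the first literal 'Wally', not the regex match
    pos = string.indexOf('Wally');
  }

  return pos;
}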

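This implementation looks fine, but it has a bug:

'aWally Wally' => 1

The expected result is 7. string.indexOf returns the position of the first literal occurrence of 'Wally', whereas we want the position of the first match of regex. string.lastIndexOf does not help either, because it returns the last literal occurrence.

regex.test only returns true/false and gives no way to learn the index, so regex.test is not suited to this problem.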
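To make the mismatch concrete, here is a small sketch that compares what regex.test, indexOf and lastIndexOf report. The variable names are made up for illustration only.

```javascript
// regex.test says *whether* the pattern matches, while indexOf and lastIndexOf
// search for the literal text 'Wally' and know nothing about the regex.
var regex = /(^| )Wally($|[.' ])/;

var input = 'aWally Wally';
var found = regex.test(input);              // true: the second Wally stands alone
var firstLiteral = input.indexOf('Wally');  // 1: the Wally inside "aWally"

// lastIndexOf happens to give 7 for the input above, but it fails elsewhere:
var other = 'Wally Wally';
var lastLiteral = other.lastIndexOf('Wally'); // 6, although the answer should be 0

console.log(found, firstLiteral, lastLiteral); // true 1 6
```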

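###2. Revised version (still buggy)

function wheresWally(string) {
  var regex = /(^| )Wally($|[.' ])/;
  var pos = -1;

  // exec() returns a match array (with an index property) or null
  var res = regex.exec(string);
  if (res !== null) {
    // BUG: index is where the whole match starts, which may be the leading space
    pos = res.index;
  }

  return pos;
}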

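This time the work is done with regex.exec. exec is the most capable of the regex methods and certainly provides the information we need. MDN describes the API as follows: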

// Match "quick brown" followed by "jumps", ignoring characters in between
// Remember "brown" and "jumps"
// Ignore case
var re = /quick\s(brown).+?(jumps)/ig;
var result = re.exec('The Quick Brown Fox Jumps Over The Lazy Dog');

| Object | Property/Index | Description | Example |
| --- | --- | --- | --- |
| `result` | `[0]` | The full string of characters matched | `Quick Brown Fox Jumps` |
| | `[1], ...[n]` | The parenthesized substring matches, if any. The number of possible parenthesized substrings is unlimited. | `[1] = Brown [2] = Jumps` |
| | `index` | The 0-based index of the match in the string. | `4` |
| | `input` | The original string. | `The Quick Brown Fox Jumps Over The Lazy Dog` |
| `re` | `lastIndex` | The index at which to start the next match. When "g" is absent, this will remain as 0. | `25` |
| | `ignoreCase` | Indicates if the "`i`" flag was used to ignore case. | `true` |
| | `global` | Indicates if the "`g`" flag was used for a global match. | `true` |
| | `multiline` | Indicates if the "`m`" flag was used to search in strings across multiple lines. | `false` |
| | `source` | The text of the pattern. | `quick\s(brown).+?(jumps)` |

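Note: index is a property of the value returned by exec, while lastIndex is a property of the regex itself. The two are easy to mix up.

We only care about index, which carries the position of the successful match. But the implementation above still has a bug:

'aWally Wally' => 6

Why 6 rather than 7? Look closely at the regular expression (^| )Wally($|[.' ]): this match starts at the space, and the space is indeed at position 6. That is easy to deal with; a small fix is enough: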

if (string.charAt(pos) === ' ') {
  pos++;
}
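
Putting the fix together with the revised version gives roughly the following. This is a sketch assembled from the pieces above, not a new approach; the name wheresWallyFixed is made up here to keep it apart from the earlier versions.

```javascript
function wheresWallyFixed(string) {
  var regex = /(^| )Wally($|[.' ])/;
  var res = regex.exec(string);
  if (res === null) {
    return -1;
  }

  var pos = res.index;
  // The match may begin at the leading space rather than at the W; skip it.
  if (string.charAt(pos) === ' ') {
    pos++;
  }
  return pos;
}

console.log(wheresWallyFixed('aWally Wally'));  // 7
console.log(wheresWallyFixed("It's Wally's.")); // 5
console.log(wheresWallyFixed('Wally Wally'));   // 0
```

Keep in mind that the character class [.' ] only accepts a period, an apostrophe or a space after the word, so an input such as "Wally!" is not matched by this pattern.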

The same task can also be done with string.replace(regex, func(match, p1, p2..., offset, string)); offset carries the same information as index.
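
A minimal sketch of that replace-based variant might look like this. The function name wheresWallyViaReplace is hypothetical, and the replaced string itself is thrown away; replace is used only for its callback.

```javascript
function wheresWallyViaReplace(string) {
  var pos = -1;
  // Without the g flag, replace calls the function for the first match only,
  // and never calls it when nothing matches, so pos stays -1 in that case.
  string.replace(/(^| )Wally($|[.' ])/, function (match, p1, p2, offset) {
    pos = offset + p1.length; // p1 is '' or ' ', so this skips a leading space
    return match;             // hand the matched text back unchanged
  });
  return pos;
}

console.log(wheresWallyViaReplace('aWally Wally'));  // 7
console.log(wheresWallyViaReplace("Where's Waldo")); // -1
```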

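Pay special attention to how global mode affects exec. Without the g flag, exec never updates regex.lastIndex, so it stays at 0. With the g flag, every call to exec updates regex.lastIndex, and the value can also be set by hand to change where the next exec starts. For example: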

var regex = /^abc/;
undefined
var regex_g = /^abc/g;
undefined
var str = 'abc abcd';
undefined
regex.lastIndex;
0
regex_g.lastIndex;
0
regex.exec(str);
["abc"]
regex.lastIndex;
0
regex_g.exec(str);
["abc"]
regex_g.lastIndex;
3
regex_g.lastIndex = 1;
1
regex_g.exec(str);
null
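
The usual way to exploit this behaviour is to call exec in a loop on a g-mode regex and collect every match position. A short sketch, with variable names chosen only for illustration:

```javascript
var re = /Wally/g;
var str = 'Wally Wally';
var positions = [];
var lastIndexes = [];
var m;

// Each successful exec resumes from re.lastIndex and then moves it forward.
while ((m = re.exec(str)) !== null) {
  positions.push(m.index);        // property of the match result
  lastIndexes.push(re.lastIndex); // property of the regex
}

console.log(positions);    // [ 0, 6 ]
console.log(lastIndexes);  // [ 5, 11 ]
console.log(re.lastIndex); // 0, reset after the failed final attempt
```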

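###3. Solutions from other users

####Solution 1

function wheresWally(string){
  return (" "+string).search(/ Wally\b/);
}

Modify the original string first, then match. Very clever: the prepended space shifts every index by one, so the index of the matched space in the padded string equals the index of Wally in the original.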

####Solution 2

function wheresWally(string) {
  var match = /(^|[ ])Wally\b/.exec(string)
  return match ? match.index + match[1].length : -1
}

Same idea as mine, but much shorter. Adding match[1].length neatly handles the leading space, which is far nicer than the if...charAt check.
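
To see why match[1].length works, it helps to print the two pieces for a match at the start of the string and for one after a space. A small sketch using the regex from Solution 2:

```javascript
var re = /(^|[ ])Wally\b/;

// Match at the very start: group 1 captures the empty string.
var atStart = re.exec('Wally Wally');
console.log(atStart.index, JSON.stringify(atStart[1]));       // 0 ""

// Match after a space: group 1 captures that space, so its length is 1.
var afterSpace = re.exec("'Wally Wally");
console.log(afterSpace.index, JSON.stringify(afterSpace[1])); // 6 " "
```

In both cases index + match[1].length lands on the W: 0 + 0 and 6 + 1.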

####Solution 3

function wheresWally(string){
  var mtch = " ".concat(string).match(/\sWally\b/)
  var idx = mtch ? " ".concat(string).indexOf(" Wally") : -1;
  return idx;
}

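The with-space and without-space cases are handled separately; splitting into cases is the general way to make a complicated problem simpler. Note that, like the first version above, the position comes from an indexOf search for the literal " Wally", so 'Wallyd Wally' returns 0 instead of 7.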

####Solution 4

function wheresWally(string) {
  var match = string.match(/(^|\s)Wally($|[^\w])/); 
  return match ? match.index + match[0].indexOf('Wally') : -1;
}

Combining the match with indexOf to get past the leading whitespace is also a decent idea.
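
As a rough check, the four solutions can be run against the kata examples. The harness below is only a sketch: the names solution1 to solution4, cases and failures are made up here, and the last test row ('Wallyd Wally') is an extra input that is not part of the kata's published examples.

```javascript
function solution1(string) {
  return (' ' + string).search(/ Wally\b/);
}
function solution2(string) {
  var match = /(^|[ ])Wally\b/.exec(string);
  return match ? match.index + match[1].length : -1;
}
function solution3(string) {
  var mtch = ' '.concat(string).match(/\sWally\b/);
  return mtch ? ' '.concat(string).indexOf(' Wally') : -1;
}
function solution4(string) {
  var match = string.match(/(^|\s)Wally($|[^\w])/);
  return match ? match.index + match[0].indexOf('Wally') : -1;
}

// [input, expected] pairs: the kata examples plus one extra input (last row)
var cases = [
  ['Wally', 0],
  ["Where's Wally", 8],
  ["Where's Waldo", -1],
  ['DWally Wallyd .Wally', -1],
  ['Hi Wally.', 3],
  ["It's Wally's.", 5],
  ['Wally Wally', 0],
  ["'Wally Wally", 7],
  ['Wallyd Wally', 7]
];

// Returns the inputs for which a solution gives the wrong index.
function failures(solve) {
  return cases.filter(function (c) {
    return solve(c[0]) !== c[1];
  }).map(function (c) {
    return c[0];
  });
}

console.log(failures(solution1)); // []
console.log(failures(solution2)); // []
console.log(failures(solution3)); // [ 'Wallyd Wally' ]
console.log(failures(solution4)); // []
```

All four pass the published examples; only Solution 3 trips on the extra input, because its final indexOf searches for literal text rather than the regex match.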

##III. Reflections

The veterans were right: the community is an important way to learn.

###1. String methods related to RegExp

str.match(regexp)

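Returns an array of match results, or null

Without the g flag it matches only once; with the g flag every matched substring is collected into the result array (captures and index are not included in that case)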
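
A quick sketch of the difference between the two modes:

```javascript
var str = 'Wally Wally';

// Without g: one match, and the result carries an index like exec's result.
var first = str.match(/Wally/);
console.log(first.length, first.index); // 1 0

// With g: every matched substring, but no index property on the array.
var all = str.match(/Wally/g);
console.log(all.length, all.index);     // 2 undefined
```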

str.replace(regexp, func)

The arguments of func are, in order, match, p1, p2..., offset, string: the matched text, capture 1, capture 2..., the match position, and the whole string
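
The argument order is easy to confirm by recording what the callback receives. In this sketch the regex /(W)(ally)/ and the object seen are invented for the demonstration:

```javascript
var seen = null;

var output = 'Hi Wally.'.replace(/(W)(ally)/, function (match, p1, p2, offset, string) {
  // Record every argument so the order can be inspected afterwards.
  seen = { match: match, p1: p1, p2: p2, offset: offset, string: string };
  return p1 + 'aldo'; // the return value replaces the matched text
});

console.log(seen.match, seen.p1, seen.p2, seen.offset); // Wally W ally 3
console.log(seen.string); // Hi Wally.
console.log(output);      // Hi Waldo.
```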

str.search(regexp)

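Returns the match position, or -1

Note: with or without the g flag, only the position of the first match is returned, so turning on g here is simply wasted. Unlike regex.test, search ignores regex.lastIndex, so the result does not depend on earlier calls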
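
A short sketch showing that search neither uses nor changes lastIndex, even on a g-mode regex:

```javascript
var re = /Wally/g;
var str = 'Wally Wally';

re.lastIndex = 3;                // pretend an earlier exec left this behind

var firstCall = str.search(re);  // 0: search always starts from the beginning
var secondCall = str.search(re); // 0 again: there is no "next match" behaviour

console.log(firstCall, secondCall, re.lastIndex); // 0 0 3
```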

###2.RegExp

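####1. Properties

regex.lastIndex

The position where the next match will start; the initial value is 0

regex.global/regex.ignoreCase/regex.multiline

Correspond to the g/i/m flags (global / ignore case / multiline); each is true or false

regex.source

The pattern text itself (the part between the two slashes of a literal, or, for a regex created with new, the part between the slashes after conversion to a literal)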
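
These properties can be read straight off a regex object. A sketch with a regex created through the constructor (the pattern is arbitrary):

```javascript
// The backslash is doubled because the pattern is written as a string literal.
var re = new RegExp('wally\\b', 'gi');

console.log(re.source);     // wally\b
console.log(re.global, re.ignoreCase, re.multiline); // true true false
console.log(re.lastIndex);  // 0
```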

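####2. Methods

regex.exec(str)

Returns an array of match results, or null

The first element of the result array is the matched text; the following elements are the captures, in order

Note: the result array also has two properties

-  index

  *the match position*

-  input

  the whole string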

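regex.test(str)

Returns true/false. Without the g flag the result depends only on the input. With the g flag, test starts at regex.lastIndex and updates it, so repeated calls on the same regex can give different results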
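
One caveat worth checking by hand is that test is stateful on a g-mode regex. A minimal sketch, with illustrative variable names:

```javascript
var plain = /Wally/;
var globalRe = /Wally/g;

// Without g, every call starts from position 0.
var plainResults = [plain.test('Wally'), plain.test('Wally')];

// With g, the second call starts at lastIndex 5, fails, and resets lastIndex to 0.
var globalResults = [
  globalRe.test('Wally'),
  globalRe.test('Wally'),
  globalRe.test('Wally')
];

console.log(plainResults);  // [ true, true ]
console.log(globalResults); // [ true, false, true ]
```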

~~regex.compile()~~

Deprecated; not recommended

###3. Relationships

For a regex without the g flag, regex.test(str) is equivalent to str.search(regex) !== -1
regex.exec(str) is comparable to str.replace(regex, func); exec offers more control (through regex.lastIndex)
