RegExp in JavaScript

I. Problem

Where's Wally

Description:

Write a function that returns the index of the first occurrence of the word "Wally". "Wally" must not be part of another word, but it can be directly followed by a punctuation mark. If no such "Wally" exists, return -1.

Examples:

"Wally" => 0

"Where's Wally" => 8

"Where's Waldo" => -1

"DWally Wallyd .Wally" => -1

"Hi Wally." => 3

"It's Wally's." => 5

"Wally Wally" => 0

"'Wally Wally" => 7

Quoted from Codewars

So the task is to find the position of Wally in a string, which makes it a pattern-matching problem.

II. Solutions

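There are two approaches: parse the string by hand, or use a regular expression.

Parsing by hand certainly works, but this post focuses on the regular-expression approach.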
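For comparison, here is a minimal sketch of what the hand-parsing route could look like. The names `isWordChar` and `wheresWallyManual` are made up for illustration, and the rule "preceded by the start of the string or a space, followed by the end of the string or a non-word character" is my reading of the kata, not an official specification.

```javascript
// Hypothetical helper: true for letters, digits and underscore.
// An empty string (past the end of the input) is not a word character.
function isWordChar(ch) {
  return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') ||
         (ch >= '0' && ch <= '9') || ch === '_';
}

// Hypothetical hand-rolled scan, no regular expressions involved.
function wheresWallyManual(text) {
  var word = 'Wally';
  var from = 0;
  while (true) {
    var i = text.indexOf(word, from);
    if (i === -1) {
      return -1; // no further candidates
    }
    // Assumed rule: the candidate must start the string or follow a space...
    var startsWord = i === 0 || text.charAt(i - 1) === ' ';
    // ...and must be followed by the end of the string or a non-word character.
    var endsWord = !isWordChar(text.charAt(i + word.length));
    if (startsWord && endsWord) {
      return i;
    }
    from = i + 1; // keep scanning after this candidate
  }
}
```

It works, but every boundary rule has to be spelled out by hand, which is exactly what a regular expression expresses more compactly.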

###1. First version (buggy)

function wheresWally(string){console.log(string);
  var regex = /(^| )Wally($|[.' ])/;
  var pos = -1;

  if(regex.test(string)) { // test is a RegExp method, not a String method
    pos = string.indexOf('Wally');
  }
  
  return pos;
}

This implementation looks fine, but it actually has a bug:

'aWally Wally' => 1

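The expected return value is 7. The cause is that string.indexOf returns the position of the first occurrence of the substring 'Wally', whether or not it stands alone as a word. What we want is the position where the regex first matches, so string.lastIndexOf does not help either.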

regex.test only returns true/false and cannot tell us the index, so regex.test is not suited to this problem.
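The mismatch can be seen directly in a small sketch. The variable names are only for illustration; the regex is the one from the first version.

```javascript
var regex = /(^| )Wally($|[.' ])/;

// The regex matches the second, standalone "Wally"...
var matched = regex.test('aWally Wally');              // true
// ...but indexOf reports the first occurrence of the substring.
var firstSubstring = 'aWally Wally'.indexOf('Wally');  // 1, expected 7

// lastIndexOf happens to work for that input, but fails for others:
var lastSubstring = 'Wally Wally'.lastIndexOf('Wally'); // 6, expected 0
```

In both cases the substring search knows nothing about the word-boundary rules encoded in the regex.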

###2. Revised version (still buggy)

function wheresWally(string){console.log(string);
  var regex = /(^| )Wally($|[.' ])/;
  var pos = -1;
  
  var res = regex.exec(string);
  if (res !== null) {
    pos = res.index;
  }
  
  return pos;
}

This time I switched to regex.exec. exec is the most powerful of the regex methods and does provide the information we need. MDN documents it with the following example:

// Match "quick brown" followed by "jumps", ignoring characters in between
// Remember "brown" and "jumps"
// Ignore case
var re = /quick\s(brown).+?(jumps)/ig;
var result = re.exec('The Quick Brown Fox Jumps Over The Lazy Dog');

| Object | Property/Index | Description | Example |
| --- | --- | --- | --- |
| `result` | `[0]` | The full string of characters matched | `Quick Brown Fox Jumps` |
| | `[1], ...[n]` | The parenthesized substring matches, if any. The number of possible parenthesized substrings is unlimited. | `[1] = Brown [2] = Jumps` |
| | `index` | The 0-based index of the match in the string. | `4` |
| | `input` | The original string. | `The Quick Brown Fox Jumps Over The Lazy Dog` |
| `re` | `lastIndex` | The index at which to start the next match. When "g" is absent, this will remain as 0. | `25` |
| | `ignoreCase` | Indicates if the "`i`" flag was used to ignore case. | `true` |
| | `global` | Indicates if the "`g`" flag was used for a global match. | `true` |
| | `multiline` | Indicates if the "`m`" flag was used to search in strings across multiple lines. | `false` |
| | `source` | The text of the pattern. | `quick\s(brown).+?(jumps)` |

Note: index is a property of the array returned by exec, while lastIndex is a property of the regex. The two are easy to confuse.
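A short sketch of where each property lives, reusing the MDN example. The variable names on the left are mine.

```javascript
var re = /quick\s(brown).+?(jumps)/ig;
var result = re.exec('The Quick Brown Fox Jumps Over The Lazy Dog');

var matchStart = result.index;        // 4: where this match begins
var nextStart = re.lastIndex;         // 25: where the next exec would start

// Looking in the wrong place silently gives undefined rather than an error.
var notOnResult = result.lastIndex;   // undefined
var notOnRegex = re.index;            // undefined
```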

We only care about index, which holds the position where the match succeeded. However, the implementation above still has a bug:

'aWally Wally' => 6

Why 6 rather than 7? Looking closely at our regex (^| )Wally($|[.' ]), the successful match starts at the space, and the space is indeed at position 6. This is easy to repair:

if (string.charAt(pos) === ' ') {
  pos++;
}
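Putting the pieces together, a complete version might look like the sketch below. The name `wheresWallyFixed` is mine, and the regex is kept exactly as in the post, so only `.`, `'` and a space are accepted after "Wally"; other punctuation such as `!` would not match.

```javascript
function wheresWallyFixed(string) {
  var regex = /(^| )Wally($|[.' ])/;
  var res = regex.exec(string);
  if (res === null) {
    return -1;
  }
  var pos = res.index;
  // The match may begin with the leading space; skip it if so.
  if (string.charAt(pos) === ' ') {
    pos++;
  }
  return pos;
}

// Inputs from the kata description, with their expected results.
var checks = [
  ["Wally", 0], ["Where's Wally", 8], ["Where's Waldo", -1],
  ["DWally Wallyd .Wally", -1], ["Hi Wally.", 3], ["It's Wally's.", 5],
  ["Wally Wally", 0], ["'Wally Wally", 7], ["aWally Wally", 7]
];
var allPass = checks.every(function (pair) {
  return wheresWallyFixed(pair[0]) === pair[1];
});
```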

In fact, string.replace(regex, func(match, p1, p2..., offset, string)) can accomplish the same task. offset carries the same information as index.
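As a sketch of that idea, the replacement callback can be used purely to read the offset. The function name `wheresWallyViaReplace` is hypothetical, and the return value of replace is simply discarded.

```javascript
function wheresWallyViaReplace(string) {
  var regex = /(^| )Wally($|[.' ])/;
  var pos = -1;
  // Two capture groups, so the callback receives match, p1, p2, offset, string.
  string.replace(regex, function (match, p1, p2, offset) {
    pos = offset + p1.length; // skip the leading space captured by p1, if any
    return match;             // leave the text unchanged
  });
  return pos;
}
```

Without the g flag the callback runs at most once, for the first match, which is what the kata asks for.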

Special note on how the global flag affects exec: without the g flag, regex.lastIndex always stays 0; with the g flag, every call to exec updates regex.lastIndex, and you can also set this value by hand to change where the next exec starts. For example:

var regex = /^abc/;
undefined
var regex_g = /^abc/g;
undefined
var str = 'abc abcd';
undefined
regex.lastIndex;
0
regex_g.lastIndex;
0
regex.exec(str);
["abc"]
regex.lastIndex;
0
regex_g.exec(str);
["abc"]
regex_g.lastIndex;
3
regex_g.lastIndex = 1;
1
regex_g.exec(str);
null
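The usual reason to rely on lastIndex is to walk through every match in a loop. Here is a minimal sketch; `allWallyPositions` is a name I made up, and it uses a `\b`-based pattern similar to the community solutions rather than the one from my own versions.

```javascript
function allWallyPositions(text) {
  var regex = /(^| )Wally\b/g; // g flag: exec resumes from regex.lastIndex
  var positions = [];
  var m;
  while ((m = regex.exec(text)) !== null) {
    positions.push(m.index + m[1].length); // skip the captured space, if any
  }
  return positions; // exec returned null, which also reset lastIndex to 0
}

var both = allWallyPositions('Wally Wally');     // [0, 6]
var second = allWallyPositions('aWally Wally');  // [7]
var none = allWallyPositions("Where's Waldo");   // []
```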

###3. Solutions from other users

####Solution 1

function wheresWally(string){
  return (" "+string).search(/ Wally\b/) 
}

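It prepends a space to the original string before matching, so the position of the matched space in the padded string equals the position of Wally in the original. Clever.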

####Solution 2

function wheresWally(string) {
  var match = /(^|[ ])Wally\b/.exec(string)
  return match ? match.index + match[1].length : -1
}

This follows the same idea as mine but is far more concise. Using match[1].length neatly solves the leading-space problem and is much more elegant than if...charAt.

####Solution 3

function wheresWally(string){
  var mtch = " ".concat(string).match(/\sWally\b/)
  var idx = mtch ? " ".concat(string).indexOf(" Wally") : -1;
  return idx;
}

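Prepending a space folds the "no leading space" case into the "leading space" case, a common way to simplify a problem: reduce the number of cases. Note, however, that indexOf(" Wally") searches for the substring rather than the regex match, so it shares the weakness of my first version: for 'Wallyd Wally' it returns 0 instead of 7.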

####Solution 4

function wheresWally(string) {
  var match = string.match(/(^|\s)Wally($|[^\w])/); 
  return match ? match.index + match[0].indexOf('Wally') : -1;
}

It applies indexOf to the matched text (match[0]) to solve the leading-space problem. Also a good idea.
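To compare the four solutions side by side, they can be run against the kata examples in a small harness. This is only a sketch: the names `solution1` to `solution4`, `examples` and `failures` are mine, and the extra input 'Wallyd Wally' is an edge case I added, not one from the kata.

```javascript
function solution1(string) {
  return (' ' + string).search(/ Wally\b/);
}
function solution2(string) {
  var match = /(^|[ ])Wally\b/.exec(string);
  return match ? match.index + match[1].length : -1;
}
function solution3(string) {
  var padded = ' '.concat(string);
  var mtch = padded.match(/\sWally\b/);
  return mtch ? padded.indexOf(' Wally') : -1;
}
function solution4(string) {
  var match = string.match(/(^|\s)Wally($|[^\w])/);
  return match ? match.index + match[0].indexOf('Wally') : -1;
}

// Inputs and expected results from the kata description.
var examples = [
  ["Wally", 0], ["Where's Wally", 8], ["Where's Waldo", -1],
  ["DWally Wallyd .Wally", -1], ["Hi Wally.", 3], ["It's Wally's.", 5],
  ["Wally Wally", 0], ["'Wally Wally", 7]
];

// Returns the inputs for which a solution disagrees with the expected value.
function failures(solve) {
  return examples
    .filter(function (pair) { return solve(pair[0]) !== pair[1]; })
    .map(function (pair) { return pair[0]; });
}

// Edge case added for illustration: a non-matching "Wallyd" comes first.
var edge = [solution1, solution2, solution3, solution4].map(function (solve) {
  return solve('Wallyd Wally');
}); // [7, 7, 0, 7]
```

All four pass the kata examples; only Solution 3 trips on the added edge case, because its indexOf call looks for the substring rather than the regex match.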

III. Reflections

As a more experienced colleague told me, the community is an important way to learn.

###1. String methods related to RegExp

str.match(regexp)

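Returns an array of match results, or null.

Without the g flag it matches only once and returns the same kind of array as exec. With the g flag it stores every full match in the result array, without capture groups or index.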
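A quick sketch of the difference (variable names are mine):

```javascript
var text = 'Wally Wally';

var once = text.match(/Wally/);   // ['Wally'], with once.index === 0
var all = text.match(/Wally/g);   // ['Wally', 'Wally'], with no index property
var miss = text.match(/Waldo/g);  // null, not an empty array
```

So for this kata, where the position matters, match is only useful without the g flag.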

str.replace(regexp, func)

The arguments of func are, in order, match, p1, p2..., offset, string: the matched text, capture 1, capture 2, ..., the match position, and the whole string.

str.search(regexp)

Returns the match position, or -1.

Note: with or without the g flag, it returns only the position of the first match, and it ignores regex.lastIndex. Turning on g is therefore pointless here.
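A small sketch to check that behaviour (names are mine): search always starts from the beginning of the string, whatever the flag or lastIndex says.

```javascript
var re = /Wally/g;
re.lastIndex = 5; // would make exec skip the first "Wally"

var found = 'Wally Wally'.search(re); // 0: lastIndex and g are ignored
var after = re.lastIndex;             // 5: search leaves lastIndex as it was
```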

###2. RegExp

####1. Properties

regex.lastIndex

The position where the next match starts; the initial value is 0. It only changes when the g flag is set.

regex.global/regex.ignoreCase/regex.multiline

These correspond to the three flags g/i/m (global / ignore case / multiline); each returns true/false.

regex.source

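Returns the pattern string itself (for a literal, the part between the two slashes; for the new RegExp form, the part that would sit between the two slashes after conversion to a literal).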
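A sketch of these properties on a literal and on a regex built with new RegExp. The variable names are mine, and the escaped slash in the second case is what I expect from current engines.

```javascript
var literal = /quick\s(brown).+?(jumps)/ig;
var literalSource = literal.source; // the text between the slashes
var flags = [literal.global, literal.ignoreCase, literal.multiline]; // [true, true, false]

// Built from a string: source is given in literal form, so "/" comes back escaped.
var built = new RegExp('a/b', 'm');
var builtSource = built.source;     // a\/b
var builtMultiline = built.multiline; // true
```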

####2. Methods

regex.exec(str)

Returns a match result array, or null.

The first element of the result array is the matched text; the following elements are the captures, in order.

Note: the result array also has two extra properties

-  index

  *the match position*

-  input

  the whole string

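regex.test(str)

Returns true/false. Without the g flag the result depends only on the input. With the g flag, test starts at regex.lastIndex and updates it, so repeated calls on the same regex object can return different results.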
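One caveat about test and the g flag is worth checking with a small sketch (names are mine): a regex with the g flag keeps state in lastIndex between calls.

```javascript
var plain = /Wally/;
var withG = /Wally/g;
var input = 'Hi Wally';

var plainResults = [plain.test(input), plain.test(input)];
// [true, true]: no state is kept between calls

var globalResults = [withG.test(input), withG.test(input), withG.test(input)];
// [true, false, true]: the second call starts at lastIndex 8, fails,
// and resets lastIndex to 0, so the third call succeeds again
```

For a simple yes/no check it is therefore safer to leave the g flag off.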

~~regex.compile()~~

Deprecated; not recommended.

###3. Relationships

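regex.test(str) is equivalent to str.search(regex) !== -1, provided the regex has no g flag (with g, test starts from regex.lastIndex while search does not).
regex.exec(str) is comparable to str.replace(regex, func): both expose the matched text, the captures, and the match position. exec offers more control (through regex.lastIndex).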
