[PHP Basics 2021] How to Use preg_match/preg_match_all to Match and Search Strings with Regular Expressions

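How to use preg_match/preg_match_all to match and search strings with regular expressions
1. What preg_match/preg_match_all can do
How to use preg_match
1. Basic syntax of preg_match
2. preg_match usage samples
How to use preg_match_all
1. Basic syntax of preg_match_all
2. preg_match_all usage samples
Related functions
1. Summary of regular expressions
2. List of PHP functions that accept regular expressions (PCRE functions)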

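How to use preg_match/preg_match_all to match and search strings with regular expressions

This article introduces two functions, preg_match and preg_match_all, which match a string against a regular expression and can also report where each match was found.

What preg_match/preg_match_all can do

Match and search a string with a regular expression
Match a string with a regular expression and get the position of the match
Split a string into parts with a regular expression and extract them

How to use preg_match

Basic syntax of preg_match

Performs a regular expression match. The search is not repeated: it stops after the first match. Optional flags also let you retrieve the position of the match.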

preg_match ( string $pattern , string $subject , array &$matches = null , int $flags = 0 , int $offset = 0 )

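Parameters

Type	Name	Default	Description
string	$pattern		The regular expression pattern to match
string	$subject		The string to search
array	&$matches	null	Receives the search results as an array
int	$flags	0	PREG_OFFSET_CAPTURE (256): also stores the offset (in bytes) of each match in $matches. PREG_UNMATCHED_AS_NULL (512): stores subpatterns that did not match as null in $matches
int	$offset	0	Offset (in bytes) at which the search starts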
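
As a minimal sketch of what PREG_UNMATCHED_AS_NULL changes, the example below uses a made-up pattern whose middle group is optional. The pattern and subject are illustrative assumptions, not taken from the samples in this article. Without the flag, a group that did not take part in the match is reported as an empty string; with the flag (PHP 7.2 or later) it is reported as null.

```php
<?php
// Hypothetical pattern: group 2 is optional and does not participate for "ac".
$pattern = '/(a)(b)?(c)/';

preg_match($pattern, 'ac', $default);
preg_match($pattern, 'ac', $withNull, PREG_UNMATCHED_AS_NULL);

var_dump($default[2]);  // string(0) ""
var_dump($withNull[2]); // NULL
```

The flag is useful when an empty string is itself a valid captured value and you need to tell "matched nothing" apart from "did not match".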

Return value

Type	Description
int|false	0 = no match, 1 = match, false = error
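
Because 0 and false are equal under loose comparison, the three return values are easiest to tell apart with ===. The following is a sketch only: describeMatch is a hypothetical helper name, and the third call deliberately passes an invalid pattern (an unclosed parenthesis) to provoke the error case.

```php
<?php
/**
 * Hypothetical helper: turns the three possible return values of
 * preg_match() into a readable label.
 */
function describeMatch(string $pattern, string $subject): string
{
    // @ suppresses the warning PHP emits when the pattern cannot be compiled.
    $result = @preg_match($pattern, $subject);

    if ($result === false) {
        return 'error (preg_last_error code: ' . preg_last_error() . ')';
    }

    return $result === 1 ? 'match' : 'no match';
}

echo describeMatch('|https:|', 'https://www.yahoo.co.jp'), PHP_EOL;  // match
echo describeMatch('|http:|', 'https://www.yahoo.co.jp'), PHP_EOL;   // no match
echo describeMatch('|(https:|', 'https://www.yahoo.co.jp'), PHP_EOL; // error (...)
```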

preg_match usage samples

Sample code: matching a regular expression with preg_match

Let's check whether a URL contains https: or http:.
Because URLs themselves contain several / characters, | is used as the pattern delimiter instead.

A matching example

$str = 'https://www.yahoo.co.jp';
$regex = '|https:|';
$result = preg_match($regex, $str, $matches);

echo "For string '{$str}', pattern '{$regex}': " . ($result > 0 ? "{$result}=match" : "{$result}=no match") . PHP_EOL;

For string 'https://www.yahoo.co.jp', pattern '|https:|': 1=match

A non-matching example

$str = 'https://www.yahoo.co.jp';
$regex = '|http:|';
$result = preg_match($regex, $str, $matches);

echo "For string '{$str}', pattern '{$regex}': " . ($result > 0 ? "{$result}=match" : "{$result}=no match") . PHP_EOL;

For string 'https://www.yahoo.co.jp', pattern '|http:|': 0=no match
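
The two patterns above each test one literal prefix. If the goal is to accept either scheme, one possible variation is to anchor the pattern at the start of the string and make the s optional. This is a sketch: hasHttpScheme is a hypothetical helper name, and the ftp URL is an invented counterexample.

```php
<?php
// Hypothetical helper: true when $url begins with http:// or https://.
function hasHttpScheme(string $url): bool
{
    // ^ anchors the match at the start of the string; s? makes the "s" optional.
    return preg_match('|^https?://|', $url) === 1;
}

var_dump(hasHttpScheme('https://www.yahoo.co.jp')); // bool(true)
var_dump(hasHttpScheme('http://www.yahoo.co.jp'));  // bool(true)
var_dump(hasHttpScheme('ftp://www.yahoo.co.jp'));   // bool(false)
```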

Sample code: using regular expression groups with preg_match

Let's check whether the string matches the shape of a URL and, at the same time, split the URL into its parts.

$str = 'https://search.yahoo.co.jp/search?p=php';
$regex = '|(https?)(://)([^?/]+)(.+)|'; // URL pattern
$result = preg_match($regex, $str, $matches);

echo "For string '{$str}', pattern '{$regex}': " . ($result > 0 ? "{$result}=match" : "{$result}=no match") . PHP_EOL;
print_r($matches);

For string 'https://search.yahoo.co.jp/search?p=php', pattern '|(https?)(://)([^?/]+)(.+)|': 1=match
Array
(
    [0] => https://search.yahoo.co.jp/search?p=php
    [1] => https
    [2] => ://
    [3] => search.yahoo.co.jp
    [4] => /search?p=php
)
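
Numeric indexes such as [3] are easy to mix up once a pattern grows. As an alternative sketch, PCRE named groups let you read the parts by name. The group names scheme, host, and path are illustrative choices, and the pattern is a simplified variant of the one above.

```php
<?php
$str = 'https://search.yahoo.co.jp/search?p=php';
// (?<name>...) defines a named group; the names are hypothetical.
$regex = '|(?<scheme>https?)://(?<host>[^?/]+)(?<path>.*)|';

if (preg_match($regex, $str, $matches) === 1) {
    echo $matches['scheme'], PHP_EOL; // https
    echo $matches['host'], PHP_EOL;   // search.yahoo.co.jp
    echo $matches['path'], PHP_EOL;   // /search?p=php
}
```

Named groups are stored in $matches under both their name and their numeric index, so existing code that reads $matches[1] keeps working.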

Sample code: confirming that preg_match matches only once

The previous example used one URL. Let's try two.

$str =<<<EOF
https://search.yahoo.co.jp/search?p=php
https://www.google.com/search?q=google
EOF;

$regex = '|(https?)(://)([^?/]+)(.+)|'; // URL pattern
$result = preg_match($regex, $str, $matches);

echo "For string '{$str}', pattern '{$regex}': " . ($result > 0 ? "{$result}=match" : "{$result}=no match") . PHP_EOL;
print_r($matches);

For string 'https://search.yahoo.co.jp/search?p=php
https://www.google.com/search?q=google', pattern '|(https?)(://)([^?/]+)(.+)|': 1=match
Array
(
    [0] => https://search.yahoo.co.jp/search?p=php
    [1] => https
    [2] => ://
    [3] => search.yahoo.co.jp
    [4] => /search?p=php
)

The pattern matches, but the result is the same as before: only one URL is extracted.
This confirms that preg_match matches the whole pattern only once and stops at the first match.
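
To make the "one match per call" behavior concrete, the sketch below walks through the string by calling preg_match repeatedly and moving the $offset argument past each match. findAllWithPregMatch is a hypothetical helper written for illustration; in practice preg_match_all, covered later, does this for you.

```php
<?php
/**
 * Hypothetical helper: collects every full match by calling preg_match()
 * in a loop and advancing the byte offset after each match.
 */
function findAllWithPregMatch(string $pattern, string $subject): array
{
    $found = [];
    $offset = 0;

    while (preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE, $offset) === 1) {
        [$text, $position] = $m[0];
        $found[] = $text;
        // Continue after this match; max() guards against an empty match looping forever.
        $offset = $position + max(strlen($text), 1);
    }

    return $found;
}

$str = "https://search.yahoo.co.jp/search?p=php\nhttps://www.google.com/search?q=google";
print_r(findAllWithPregMatch('|(https?)(://)([^?/]+)(.+)|', $str));
```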

Getting the position of a match with the PREG_OFFSET_CAPTURE flag

Passing PREG_OFFSET_CAPTURE as $flags stores, in the $matches argument, where the pattern and each subpattern were found.

$str = 'https://search.yahoo.co.jp/search?p=php';
$regex = '|(https?)(://)([^?/]+)(.+)|'; // URL pattern
$result = preg_match($regex, $str, $matches, PREG_OFFSET_CAPTURE);

echo "For string '{$str}', pattern '{$regex}': " . ($result > 0 ? "{$result}=match" : "{$result}=no match") . PHP_EOL;
print_r($matches);

For string 'https://search.yahoo.co.jp/search?p=php', pattern '|(https?)(://)([^?/]+)(.+)|': 1=match
Array
(
    [0] => Array
        (
            [0] => https://search.yahoo.co.jp/search?p=php
            [1] => 0
        )

    [1] => Array
        (
            [0] => https
            [1] => 0
        )

    [2] => Array
        (
            [0] => ://
            [1] => 5
        )

    [3] => Array
        (
            [0] => search.yahoo.co.jp
            [1] => 8
        )

    [4] => Array
        (
            [0] => /search?p=php
            [1] => 26
        )

)
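
The offsets are counted in bytes, not characters, which matters as soon as the subject contains multibyte text. The sketch below uses an invented subject with three UTF-8 characters of three bytes each in front of the match; the "\u{...}" escapes (PHP 7.0 or later) are used so the result does not depend on the encoding of the source file.

```php
<?php
// Three 3-byte UTF-8 characters, a space, then "PHP".
$str = "\u{65E5}\u{672C}\u{8A9E} PHP";

preg_match('/PHP/', $str, $matches, PREG_OFFSET_CAPTURE);
[$text, $byteOffset] = $matches[0];

echo $byteOffset, PHP_EOL;                              // 10 (bytes), not 4 (characters)
echo substr($str, $byteOffset, strlen($text)), PHP_EOL; // PHP
```

Since substr and strlen also work in bytes, they pair correctly with these offsets; character-based functions would need the offset converted first.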

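How to use preg_match_all

Basic syntax of preg_match_all

Performs a regular expression match. Unlike preg_match, it keeps matching until the end of the subject is reached. Optional flags let you choose how the matches are ordered and retrieve their positions.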

preg_match_all ( string $pattern , string $subject , array &$matches = null , int $flags = 0 , int $offset = 0 )

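Parameters

Type	Name	Default	Description
string	$pattern		The regular expression pattern to match
string	$subject		The string to search
array	&$matches	null	Receives the search results as an array
int	$flags	0	PREG_PATTERN_ORDER (1): the default layout; $matches[0] holds every full match, $matches[1] every match of the first subpattern, and so on. PREG_SET_ORDER (2): $matches[0] holds the first match together with its subpatterns, $matches[1] the second, and so on. PREG_OFFSET_CAPTURE (256): also stores the offset (in bytes) of each match in $matches. PREG_UNMATCHED_AS_NULL (512): stores subpatterns that did not match as null in $matches
int	$offset	0	Offset (in bytes) at which the search starts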
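
The flag values are bit masks, so an ordering flag can be combined with PREG_OFFSET_CAPTURE using the | operator. A minimal sketch with an invented subject string:

```php
<?php
$str = 'a1b22';
// Combine an ordering flag with offset capture using bitwise OR.
preg_match_all('/\d+/', $str, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

foreach ($matches as $set) {
    // With PREG_OFFSET_CAPTURE each entry is a [text, byte offset] pair.
    [$text, $offset] = $set[0];
    echo "{$text} at byte {$offset}", PHP_EOL;
}
// 1 at byte 1
// 22 at byte 3
```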

How flags change the layout of $matches

The PREG_SET_ORDER flag changes how matches and submatches are nested inside $matches.

The table below shows two URLs split with '|(https?)(://)([^?/]+)(.+)|'.

	Col [0]	Col [1]	Col [2]	Col [3]	Col [4]
Row [0]	https://search.yahoo.co.jp/search?p=php	https	://	search.yahoo.co.jp	/search?p=php
Row [1]	https://www.google.com/search?q=google	https	://	www.google.com	/search?q=google

Without PREG_SET_ORDER

This is the default behavior.

Values are stored column by column, starting from the top-left of the table: ↓⤴↓⤴↓⤴↓⤴↓⤴…

Array
(
    [0] => Array
        (
            [0] => https://search.yahoo.co.jp/search?p=php
            [1] => https://www.google.com/search?q=google
        )

    [1] => Array
        (
            [0] => https
            [1] => https
        )

    [2] => Array
        (
            [0] => ://
            [1] => ://
        )

    [3] => Array
        (
            [0] => search.yahoo.co.jp
            [1] => www.google.com
        )

    [4] => Array
        (
            [0] => /search?p=php
            [1] => /search?q=google
        )

)

With PREG_SET_ORDER

Values are stored row by row, starting from the top-left of the table: →→→→↙→→→→↙…

Array
(
    [0] => Array
        (
            [0] => https://search.yahoo.co.jp/search?p=php
            [1] => https
            [2] => ://
            [3] => search.yahoo.co.jp
            [4] => /search?p=php
        )

    [1] => Array
        (
            [0] => https://www.google.com/search?q=google
            [1] => https
            [2] => ://
            [3] => www.google.com
            [4] => /search?q=google
        )

)

Return value

Type	Description
int|false	The number of full matches (possibly 0), or false on error

preg_match_all usage samples

Sample code: getting the number of matches and the matched characters

This extracts the uppercase letters.

$str = "abcABCabcXYZabc";
$regex = "/[A-Z]/";
$result = preg_match_all($regex, $str, $matches);
echo "Match count: $result" . PHP_EOL;
print_r($matches);

Match count: 6
Array
(
    [0] => Array
        (
            [0] => A
            [1] => B
            [2] => C
            [3] => X
            [4] => Y
            [5] => Z
        )

)
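
The pattern above matches one letter at a time, which is why the count is 6. As a small variation (not part of the original sample), adding the + quantifier makes each run of consecutive uppercase letters a single match:

```php
<?php
$str = "abcABCabcXYZabc";
// + matches one or more uppercase letters in a row as a single match.
$count = preg_match_all('/[A-Z]+/', $str, $matches);

echo "Match count: {$count}", PHP_EOL; // Match count: 2
print_r($matches[0]);                  // the two runs: ABC and XYZ
```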

Sample code: grouping multiple matches in a string

Let's retry, this time with preg_match_all, the earlier example where preg_match found only the first URL.

$str =<<<EOF
https://search.yahoo.co.jp/search?p=php
https://www.google.com/search?q=google
EOF;

$regex = '|(https?)(://)([^?/]+)(.+)|'; // URL pattern
$result = preg_match_all($regex, $str, $matches);

echo "Match count: $result" . PHP_EOL;
print_r($matches);

Match count: 2
Array
(
    [0] => Array
        (
            [0] => https://search.yahoo.co.jp/search?p=php
            [1] => https://www.google.com/search?q=google
        )

    [1] => Array
        (
            [0] => https
            [1] => https
        )

    [2] => Array
        (
            [0] => ://
            [1] => ://
        )

    [3] => Array
        (
            [0] => search.yahoo.co.jp
            [1] => www.google.com
        )

    [4] => Array
        (
            [0] => /search?p=php
            [1] => /search?q=google
        )

)
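
One thing the default layout is well suited for is pulling out a single group across all matches, because each group already sits in its own array. A short sketch using the same pattern and URLs:

```php
<?php
$str = "https://search.yahoo.co.jp/search?p=php\nhttps://www.google.com/search?q=google";
$regex = '|(https?)(://)([^?/]+)(.+)|';
preg_match_all($regex, $str, $matches);

// In the default (PREG_PATTERN_ORDER) layout, index 3 holds group 3 of every match.
$hosts = $matches[3];
print_r($hosts);
```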

Changing how groups are nested in the $matches array

Frankly, the default way matched strings are stored in $matches is hard to read.
Specifying the PREG_SET_ORDER flag gives a more intuitive result.

$str =<<<EOF
https://search.yahoo.co.jp/search?p=php
https://www.google.com/search?q=google
EOF;

$regex = '|(https?)(://)([^?/]+)(.+)|'; // URL pattern
$result = preg_match_all($regex, $str, $matches, PREG_SET_ORDER);

echo "Match count: $result" . PHP_EOL;
print_r($matches);

Match count: 2
Array
(
    [0] => Array
        (
            [0] => https://search.yahoo.co.jp/search?p=php
            [1] => https
            [2] => ://
            [3] => search.yahoo.co.jp
            [4] => /search?p=php
        )

    [1] => Array
        (
            [0] => https://www.google.com/search?q=google
            [1] => https
            [2] => ://
            [3] => www.google.com
            [4] => /search?q=google
        )

)

As shown, each match and its submatches are kept together at one level of the array.
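
This layout fits naturally into a foreach loop, since each element is one complete match. The sketch below destructures each set into variables; the variable names are illustrative.

```php
<?php
$str = "https://search.yahoo.co.jp/search?p=php\nhttps://www.google.com/search?q=google";
$regex = '|(https?)(://)([^?/]+)(.+)|';
preg_match_all($regex, $str, $matches, PREG_SET_ORDER);

$lines = [];
// Each $matches element is [full match, group 1, group 2, group 3, group 4].
foreach ($matches as [$url, $scheme, $separator, $host, $path]) {
    $lines[] = "{$host} => {$path} ({$scheme})";
}

echo implode(PHP_EOL, $lines), PHP_EOL;
// search.yahoo.co.jp => /search?p=php (https)
// www.google.com => /search?q=google (https)
```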

Extracting only URLs from a string

Let's extract the URLs from a block of text.

$str =<<<EOF
URL①→https://trios.pro/0000/1111/2222/
URL②→https://trios.pro/search?a=b#xyz
URL③→https://user:password@trios.pro:8080/search?a=b#xyz

EOF;

$regex = '|https?://[-_.!~*\'()a-zA-Z0-9;/?:@&=+$,%#]+|'; // URL pattern
preg_match_all($regex, $str, $matches);
print_r($matches);

Array
(
    [0] => Array
        (
            [0] => https://trios.pro/0000/1111/2222/
            [1] => https://trios.pro/search?a=b#xyz
            [2] => https://user:password@trios.pro:8080/search?a=b#xyz
        )

)
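
Once a URL has been extracted, PHP's built-in parse_url can split it into components without a further regular expression. The sketch below reuses the pattern above on an invented sentence. Note that the character class includes parentheses, so a bracket directly after a URL would be captured as part of it; the sample text leaves a space there.

```php
<?php
$str = 'See https://user:password@trios.pro:8080/search?a=b#xyz (example) for details.';
$regex = '|https?://[-_.!~*\'()a-zA-Z0-9;/?:@&=+$,%#]+|';
preg_match_all($regex, $str, $matches);

// parse_url() returns an associative array of the components it finds.
$parts = parse_url($matches[0][0]);

echo $parts['host'], PHP_EOL;  // trios.pro
echo $parts['port'], PHP_EOL;  // 8080
echo $parts['query'], PHP_EOL; // a=b
```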

Extracting only phone numbers from a string

Let's pull out phone numbers written in several notations.

$str =<<<EOF
TEL①→03-4773-2434(landline)
TEL②→+813-1234-5678(international notation)
TEL③→090-1234-5678(mobile)
EOF;

$regex = '|\+?\d{2,5}-?\d{1,4}-?\d{4}|'; // phone number pattern
preg_match_all($regex, $str, $matches);
print_r($matches);

Array
(
    [0] => Array
        (
            [0] => 03-4773-2434
            [1] => +813-1234-5678
            [2] => 090-1234-5678
        )

)

Extracting only email addresses from a string

Let's extract the email addresses from a block of text.

$str =<<<EOF
Mail①→abc@trios.pro(Yamada)
Mail②→TEST_1@gmail.com(Sato)
Mail③→test+1@yahoo.co.jp(Tanaka)
Mail④→php-daisuki.234@docomo.jp(Suzuki)
EOF;

$regex = '/([a-z0-9+_\-.]+)@([a-z0-9\-.]+\.[a-z]{2,6})/i'; // email address pattern
preg_match_all($regex, $str, $matches);
print_r($matches);

Array
(
    [0] => Array
        (
            [0] => abc@trios.pro
            [1] => TEST_1@gmail.com
            [2] => test+1@yahoo.co.jp
            [3] => php-daisuki.234@docomo.jp
        )

    [1] => Array
        (
            [0] => abc
            [1] => TEST_1
            [2] => test+1
            [3] => php-daisuki.234
        )

    [2] => Array
        (
            [0] => trios.pro
            [1] => gmail.com
            [2] => yahoo.co.jp
            [3] => docomo.jp
        )

)

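If you need to check whether an email address is well formed, extract it first and then run the detailed checks as a separate step.

Providers do not always issue addresses that follow the rules, so many different patterns have to be anticipated, which makes this check quite involved.

Trying to do everything in a single regular expression leads to a very complex pattern that is hard for others to read and hard to maintain.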
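
One way to follow the "extract first, then check" approach is to pass each extracted candidate to PHP's built-in filter_var with FILTER_VALIDATE_EMAIL. This is a sketch rather than a complete validation strategy: extractValidEmails is a hypothetical helper name, the sample text is invented, and filter_var only checks syntax, not whether the mailbox exists.

```php
<?php
/**
 * Hypothetical helper: extracts email-like strings with a loose pattern,
 * then keeps only those that pass PHP's built-in syntax check.
 */
function extractValidEmails(string $text): array
{
    $regex = '/([a-z0-9+_\-.]+)@([a-z0-9\-.]+\.[a-z]{2,6})/i';
    preg_match_all($regex, $text, $matches);

    $valid = [];
    foreach ($matches[0] as $candidate) {
        // filter_var() returns the address when it is valid, false otherwise.
        if (filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false) {
            $valid[] = $candidate;
        }
    }

    return $valid;
}

// The second address has two consecutive dots, which the loose pattern accepts.
print_r(extractValidEmails('ok: abc@trios.pro, bad: a..b@example.com'));
```

Keeping the extraction pattern loose and delegating the strict check keeps the regular expression short and readable.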