PHP regular expressions: what are they? Examples and validation with regular expressions

When working with text in any modern programming language, developers constantly face the tasks of checking entered data against a required template, searching for and replacing text fragments, and other typical character-processing operations. Developing custom verification algorithms wastes time, makes program code incompatible, and complicates its development and modernization.

The rapid development of the Internet and of web development languages required universal, compact text processing tools that need a minimum of code. PHP, popular among both beginners and professional developers, is no exception. Regular expressions, as a language of text templates, simplify text processing and can shorten program code by tens or hundreds of lines. Many tasks are very hard to solve without them.

Regular expressions in PHP

PHP has provided three mechanisms for working with regular expressions: "ereg", "mb_ereg" and "preg". The most common is the "preg" interface, whose functions provide access to PCRE (Perl Compatible Regular Expressions), a library that implements Perl-style regular expression syntax and is bundled with PHP. The "ereg" functions were deprecated in PHP 5.3 and removed in PHP 7.0, so new code should use "preg". The preg functions search a given text string for matches against a pattern written in the regular expression language.
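
As a minimal sketch of the "preg" interface, the snippet below calls preg_match, which returns 1 when the pattern matches, 0 when it does not, and false on error. The helper name containsDigits and the sample strings are assumptions made for illustration, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: reports whether the text contains at least one digit.
function containsDigits(string $text): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match('/\d+/', $text) === 1;
}

var_dump(containsDigits('Order 66'));   // a digit is present
var_dump(containsDigits('no numbers')); // no digit is present
```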

Basics of syntax

A short article cannot describe the entire syntax of regular expressions in detail; specialized literature exists for that purpose. Only the basic elements are given here, enough to show the wide range of possibilities open to the developer and to understand the code examples.

The formal definition of a regular expression in PHP is quite complex, so we simplify the description. A regular expression is a text string. It consists of a template (pattern) enclosed in delimiters and a modifier that indicates how the pattern is to be processed. Templates can include alternatives and repetitions.

For example, in the expression /\d{3}-\d{2}-\d{2}/m the delimiter is "/", followed by the pattern, and the "m" character is the modifier.
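
To illustrate the delimiter, pattern and modifier parts, the following sketch applies this expression with preg_match_all. The function name findDashedNumbers and the sample text are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns every substring shaped like 123-45-67.
function findDashedNumbers(string $text): array
{
    // Delimiter "/", pattern \d{3}-\d{2}-\d{2}, modifier "m" (multiline mode).
    $pattern = '/\d{3}-\d{2}-\d{2}/m';

    // preg_match_all stores all full matches in $matches[0].
    preg_match_all($pattern, $text, $matches);

    return $matches[0];
}

// Sample text invented for this illustration.
$found = findDashedNumbers("Office: 555-12-34\nHome: 777-56-78");
echo implode(', ', $found), "\n"; // 555-12-34, 777-56-78
```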

All the power of regular expressions lies in metacharacters. The main metacharacter of the language is the backslash, "\". It reverses the type of the character that follows it: an ordinary character becomes a metacharacter and vice versa. Another important metacharacter is the vertical bar, "|", which separates alternative variants of the template. More examples of metacharacters:

^    Beginning of the subject string or line
(    Beginning of a subpattern
)    End of a subpattern
{    Beginning of a quantifier
}    End of a quantifier
\d   Decimal digit from 0 to 9
\D   Any character that is not a digit
\s   Whitespace character (space, tab and so on)
\w   Word character (letter, digit or underscore)

When processing regular expressions, PHP treats a space as a separate significant character, so the expressions ABCDEF and ABC DEF are different.
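
The short sketch below tries a few of these metacharacters and shows that a space in the subject is significant. The helper name matchesFully and the test strings are assumptions made for illustration.

```php
<?php
// Hypothetical helper: true when preg_match finds the pattern in the subject.
function matchesFully(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(matchesFully('/^\d+$/', '2018'));          // digits only: matches
var_dump(matchesFully('/^\D+$/', 'abc'));           // no digits at all: matches
var_dump(matchesFully('/^\w+\s\w+$/', 'ABC DEF'));  // word, whitespace, word: matches
var_dump(matchesFully('/^ABCDEF$/', 'ABC DEF'));    // the space matters: no match
```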

Subpatterns

In PHP regular expressions, subpatterns are enclosed in parentheses and are sometimes called "subexpressions". They perform the following functions:

  1. Grouping alternatives. For example, the pattern cat(aract|erpillar|) matches the words "cat", "cataract" and "caterpillar". Without the parentheses, it would match only "cataract", "erpillar" and the empty string.

  2. "Capturing" subpatterns. When the whole template matches, the substrings matched by each subpattern are returned along with the full match. For clarity, here is an example. Given the regular expression the winner receives a ((gold|gilded) (medal|cup)) and the subject string "the winner receives a gold medal", the search returns, in addition to the original phrase, "gold medal", "gold" and "medal".
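
Capturing can be observed through the third argument of preg_match, which receives the full match at index 0 and the subpatterns after it, numbered by their opening parenthesis. The sketch below uses an English rendering of the medal example; the helper name captureGroups is an assumption.

```php
<?php
// Hypothetical helper: returns the full match followed by the captured
// subpatterns, or an empty array when the pattern does not match.
function captureGroups(string $pattern, string $subject): array
{
    return preg_match($pattern, $subject, $matches) === 1 ? $matches : [];
}

$result = captureGroups(
    '/the winner receives a ((gold|gilded) (medal|cup))/',
    'the winner receives a gold medal'
);

print_r($result);
// Index 0 holds the whole phrase, index 1 "gold medal",
// index 2 "gold" and index 3 "medal".
```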

Repetition operators (quantifiers)

When creating regular expressions, it is very often necessary to describe repeated digits and characters. This is not a problem when there are few repetitions. But what if their exact number is unknown? In that case, special metacharacters are needed.

Repetitions are described with quantifiers, metacharacters that specify a count. Quantifiers are of two types:

  • General, enclosed in curly braces;
  • Abbreviated.

A general quantifier specifies the minimum and maximum number of allowed repetitions of an element as two numbers in curly braces, for example x{2,5}. If the maximum number of repetitions is unlimited, the second number is omitted: x{2,}.

Abbreviated quantifiers are symbols for the most common repetition counts; they keep the syntax from becoming cluttered. Three abbreviations are commonly used:

1. * means zero or more repetitions, equivalent to {0,}.

2. + means one or more repetitions, i.e. {1,}.

3. ? means zero or one repetition, i.e. {0,1}.
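
The quantifiers can be tried out with a few anchored patterns. This is only an illustrative sketch; the helper name isFullMatch and the sample strings are assumptions.

```php
<?php
// Hypothetical helper: true when the anchored pattern matches the subject.
function isFullMatch(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(isFullMatch('/^x{2,5}$/', 'xxx'));     // 3 repetitions, within 2..5: matches
var_dump(isFullMatch('/^x{2,5}$/', 'x'));       // fewer than 2: no match
var_dump(isFullMatch('/^x{2,}$/', 'xxxxxxx'));  // no upper limit: matches
var_dump(isFullMatch('/^ab*$/', 'a'));          // * allows zero "b": matches
var_dump(isFullMatch('/^ab+$/', 'a'));          // + needs at least one "b": no match
var_dump(isFullMatch('/^colou?r$/', 'color'));  // ? makes "u" optional: matches
```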

Examples of regular expressions

For those who study regular expressions, examples are the best textbook. Here are several that show how much can be achieved with minimal effort. All of the article's expressions are compatible with PHP 4.x and later versions. To understand the syntax fully and use all the features of the language, we recommend Jeffrey Friedl's book "Mastering Regular Expressions", which explains the syntax thoroughly and gives examples of regular expressions not only in PHP but also in Python, Perl, MySQL, Java, Ruby and C#.

Checking the correctness of an email address

Task. A web page asks the visitor for an email address. A regular expression must check that the address is well formed before any messages are sent. The check does not guarantee that the mailbox actually exists and accepts mail, but it does weed out addresses that are obviously wrong.

Solution. As with any programming task, email address validation with regular expressions can be implemented in PHP in various ways, and the examples in this article are neither final nor the only option. Therefore, in each case we list the requirements that must be taken into account, while the specific implementation is left entirely to the developer.

So, an expression that validates an email address should check the following conditions:

  1. The source string contains the @ symbol and no spaces.
  2. The domain part of the address, after the @ symbol, contains only characters that are valid in domain names. The same applies to the user name.
  3. The user name must be checked for special characters such as an apostrophe or a vertical bar. Such characters are potentially dangerous and may appear in attacks such as SQL injection. Avoid such addresses.
  4. The user name may contain only one dot, which cannot be its first or last character.
  5. The top-level domain must contain at least two and no more than six characters.

An example that takes into account all these conditions can be seen in the figure below.
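
The figure is not reproduced here. As a rough sketch of one expression that follows the five conditions, and not necessarily the one from the figure, consider the code below. The function name isPlausibleEmail and the exact set of allowed characters (letters, digits, underscore and hyphen) are assumptions; real-world addresses can be more permissive than this.

```php
<?php
// Hypothetical helper: checks an address against the conditions listed above.
function isPlausibleEmail(string $email): bool
{
    $pattern = '/^[a-z0-9_-]+'      // user name: letters, digits, "_" and "-"
             . '(?:\.[a-z0-9_-]+)?' // at most one dot, never first or last
             . '@'                  // the mandatory @ symbol
             . '(?:[a-z0-9-]+\.)+'  // one or more domain labels, each ending in a dot
             . '[a-z]{2,6}$/i';     // top-level domain of 2 to 6 letters

    return preg_match($pattern, $email) === 1;
}

var_dump(isPlausibleEmail('john.smith@example.com')); // accepted
var_dump(isPlausibleEmail("o'brien@example.com"));    // rejected: apostrophe
var_dump(isPlausibleEmail('john smith@example.com')); // rejected: space
```

PHP also offers the built-in filter_var function with the FILTER_VALIDATE_EMAIL filter, which may be preferable when a hand-written expression is not required.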

Validating URLs

Task. Check whether the specified text string is a valid URL. Once again, regular expressions for URLs can be implemented in various ways.

Solution. Our final version is as follows:

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

Now let's analyze its components in more detail using the figure.

Item 1        No characters may precede the URL
Item 2        Check for the "http" prefix
Item 3        No characters in between
Item 4        If an "s" is present, the URL points to a secure "https" connection
Item 5        The required "//"
Item 6        No characters in between
Items 7-9     Check that the first-level domain is correct and that the dot is present
Items 10-13   Check that the second-level domain and the dot are written correctly
Items 14-17   The file structure of the URL: a set of digits, letters, underscores, hyphens and dots, with a slash at the end
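
A minimal sketch of how the URL expression given above might be used with preg_match follows. The function name looksLikeUrl and the sample URLs are assumptions. Note that the pattern accepts lowercase letters only and treats the scheme as optional.

```php
<?php
// Hypothetical helper: applies the article's URL expression to a string.
function looksLikeUrl(string $url): bool
{
    $pattern = '/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/';

    return preg_match($pattern, $url) === 1;
}

var_dump(looksLikeUrl('https://www.example.com/path/file.html')); // accepted
var_dump(looksLikeUrl('example.com'));                            // accepted: scheme is optional
var_dump(looksLikeUrl('ftp://example.com'));                      // rejected: only http and https
var_dump(looksLikeUrl('http://example'));                         // rejected: no dot and suffix
```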

Checking credit card numbers

Task. Verify that an entered payment card number is valid for the most common payment systems. Only Visa and MasterCard are considered here.

Solution. The expression must take into account that the entered number may contain spaces. The digits on a card are divided into groups for easy reading and dictation, so it is quite natural for a person to enter the number the same way, with spaces.

Writing a universal expression that allows for spaces and hyphens is harder than simply discarding every character except the digits. It is therefore recommended to use the \D metacharacter, which matches any non-digit character, to remove everything except digits before the check.

Now we can proceed to the number check itself. Each card company uses its own number format. The example relies on this, so the client does not need to enter the company name: it is determined from the number. Visa card numbers always start with 4 and are 13 or 16 digits long. MasterCard numbers start with 51 to 55 and are 16 digits long. As a result, we get the following expression:
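
The original expression is not reproduced here. The sketch below is one possible reading of the rules just described: it first removes non-digits with \D and then tests the Visa and MasterCard formats separately. The function name detectCardType and the sample numbers are assumptions, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns "Visa", "MasterCard" or null for anything else.
function detectCardType(string $input): ?string
{
    // Drop spaces, hyphens and every other non-digit character.
    $number = preg_replace('/\D/', '', $input);

    // Visa: starts with 4, 13 or 16 digits in total.
    if (preg_match('/^4[0-9]{12}(?:[0-9]{3})?$/', $number) === 1) {
        return 'Visa';
    }

    // MasterCard: starts with 51 to 55, 16 digits in total.
    if (preg_match('/^5[1-5][0-9]{14}$/', $number) === 1) {
        return 'MasterCard';
    }

    return null;
}

var_dump(detectCardType('4111 1111 1111 1111')); // Visa
var_dump(detectCardType('5500-0000-0000-0004')); // MasterCard
var_dump(detectCardType('1234 5678'));           // neither format
```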

Before processing the order, you can additionally check the last digit of the number, which is calculated with the Luhn algorithm.
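
A compact sketch of the Luhn check is shown below. It expects a string that already contains digits only; the function name passesLuhn and the sample numbers are assumptions.

```php
<?php
// Hypothetical helper: true when the digit string passes the Luhn checksum.
function passesLuhn(string $digits): bool
{
    // Accept digits only; the D modifier makes $ match only at the very end.
    if (preg_match('/^\d+$/D', $digits) !== 1) {
        return false;
    }

    $sum = 0;
    $double = false; // the rightmost digit is not doubled

    // Walk from right to left, doubling every second digit.
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $d = (int) $digits[$i];
        if ($double) {
            $d *= 2;
            if ($d > 9) {
                $d -= 9; // same as adding the two digits of the product
            }
        }
        $sum += $d;
        $double = !$double;
    }

    // A valid number gives a total that is divisible by 10.
    return $sum % 10 === 0;
}

var_dump(passesLuhn('4111111111111111')); // passes the checksum
var_dump(passesLuhn('4111111111111112')); // fails: last digit changed
```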

Checking phone numbers

Task. Check that the entered phone number is correct.

Solution. The number of digits in landline and mobile phone numbers varies greatly from country to country, so a universal check of phone numbers with a regular expression is not possible. International numbers, however, have a strict format and are well suited to template checking. Moreover, more and more national telephone operators try to comply with a single standard. The number structure is as follows:

+CCC.NNNNNNNNNNxEEEE, where:

- C is the country code, consisting of 1 to 3 digits.

- N is the number, up to 14 digits long.

- E is an optional extension.

The plus sign is mandatory, and the x character is present only when an extension is needed.

As a result, we have the following expression:

^\+[0-9]{1,3}\.[0-9]{4,14}(?:x.+)?$
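
Wrapped in delimiters, the expression can be applied as in the following sketch. The function name isInternationalPhone and the sample numbers are assumptions made for illustration.

```php
<?php
// Hypothetical helper: checks a number in the +CCC.NNNNNNNNNNxEEEE format.
function isInternationalPhone(string $number): bool
{
    return preg_match('/^\+[0-9]{1,3}\.[0-9]{4,14}(?:x.+)?$/', $number) === 1;
}

var_dump(isInternationalPhone('+7.9161234567'));     // accepted
var_dump(isInternationalPhone('+1.2125551234x101')); // accepted, with extension
var_dump(isInternationalPhone('9161234567'));        // rejected: no plus sign or dot
```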

Numbers in the range

Task. Make sure that an integer falls within a certain range. In addition, the regular expression must match only numbers from that range.

Solution. Here are expressions for several of the most common cases:

Hour from 1 to 12            ^(1[0-2]|[1-9])$
Day of the month, 1 to 31    ^(3[01]|[12][0-9]|[1-9])$
Second or minute, 0 to 59    ^[1-5]?[0-9]$
Number from 0 to 100         ^(100|[1-9]?[0-9])$
Day of the year, 1 to 366    ^(36[0-6]|3[0-5][0-9]|[12][0-9]{2}|[1-9][0-9]?)$
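
Three of these expressions are exercised in the sketch below. The function names rangePatterns and inRange and the array keys are assumptions. Values are passed as strings, and leading zeros such as "07" are not accepted by these patterns.

```php
<?php
// Hypothetical lookup table of range patterns taken from the list above.
function rangePatterns(): array
{
    return [
        'day_of_month' => '/^(3[01]|[12][0-9]|[1-9])$/',
        'minute'       => '/^[1-5]?[0-9]$/',
        'day_of_year'  => '/^(36[0-6]|3[0-5][0-9]|[12][0-9]{2}|[1-9][0-9]?)$/',
    ];
}

// Hypothetical helper: true when $value matches the pattern named by $kind.
function inRange(string $kind, string $value): bool
{
    $patterns = rangePatterns();

    return isset($patterns[$kind])
        && preg_match($patterns[$kind], $value) === 1;
}

var_dump(inRange('day_of_month', '31')); // inside 1..31
var_dump(inRange('day_of_month', '32')); // outside 1..31
var_dump(inRange('minute', '60'));       // outside 0..59
var_dump(inRange('day_of_year', '366')); // inside 1..366
```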

Search for an IP address

Task. Determine whether the specified string is a valid IPv4 address in the range from 000.000.000.000 to 255.255.255.255.

Solution. As with any task in PHP, there are a number of possible regular expressions. For example, this:
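
The original expression is not reproduced here. One commonly used form, given as a sketch only, builds the pattern from a sub-expression for a single octet (0 to 255, leading zeros allowed) repeated four times. The function name isIPv4 and the variable $octet are assumptions.

```php
<?php
// Hypothetical helper: checks a dotted IPv4 address with a regular expression.
function isIPv4(string $ip): bool
{
    // One octet: 250-255, 200-249, or 0-199 with optional leading zeros.
    $octet = '(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)';

    // Three octets followed by a dot, then a final octet.
    $pattern = '/^(?:' . $octet . '\.){3}' . $octet . '$/';

    return preg_match($pattern, $ip) === 1;
}

var_dump(isIPv4('192.168.0.1'));     // accepted
var_dump(isIPv4('000.000.000.000')); // accepted: leading zeros allowed
var_dump(isIPv4('256.1.1.1'));       // rejected: 256 is out of range
```

When a regular expression is not strictly required, the built-in filter_var function with FILTER_VALIDATE_IP and the FILTER_FLAG_IPV4 flag is an alternative.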

Online test of expressions

Checking regular expressions for correctness can be hard for beginning programmers because the syntax is complex and differs from that of "ordinary" programming languages. To solve this problem, there are many online expression testers that make it easy to verify a template against real text. The programmer enters the expression and the test data and instantly sees the result. Such sites usually also have a reference section that describes regular expressions, examples and implementation differences for the most common programming languages in detail.

However, PHP developers are advised not to rely entirely on the results of online services. Writing and verifying a regular expression yourself builds skill and reduces the risk of errors.


Copyright © 2018 en.unansea.com. Theme powered by WordPress.