Regular Expressions
A regular expression is a sequence of characters used to search for a pattern in a string. Python provides the re module, which makes regular expression tasks easy. We need to import this module like this -
import re
To find a pattern in a given string we can use the search() function. It returns a match object that holds information about the first place the pattern was found, including its span, i.e. the starting and ending index. If the pattern is not found, search() returns None instead. Remember that it is a case-sensitive function.
print(re.search("pattern", "searching pattern in text") )
Output: <re.Match object; span=(10, 17), match='pattern'>
Now we can use this object to pull out the relevant information: the start() function gives the index in the string where the match begins, and the end() function gives the index just past the last matched character.
match = re.search("pattern", "Searching pattern in the string.")
print(match.start()) # 10
print(match.end()) # 17
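Since search() is case sensitive and returns None when nothing matches, it helps to check the result before calling start() or end(). The sketch below shows one way to do that. The re.IGNORECASE flag and the group() method are not covered above; they are standard parts of the re module, added here only for illustration.

```python
import re

text = "Searching pattern in the string."

# search() is case sensitive, so "PATTERN" is not found and None comes back.
missing = re.search("PATTERN", text)
print(missing)  # None

# Check for None before calling start() or end() on the result.
if missing is None:
    print("no match")

# The re.IGNORECASE flag makes the same search ignore letter case.
found = re.search("PATTERN", text, re.IGNORECASE)
print(found.start(), found.end())  # 10 17
print(found.group())               # pattern
```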
The findall() function returns a list of all non-overlapping matches of a pattern within the string.
text = "This is book"  # avoid naming a variable str, which would shadow the built-in type
matches = re.findall("is", text)
print(matches) # ['is', 'is']
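The match() function works in a similar way to the search() function and also returns a match object. The difference is that match() only looks for the pattern at the beginning of the string and returns None if the string does not start with it.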
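A small comparison may make the difference between match() and search() easier to see. The sample string is the one used earlier in the post; the variable names are only illustrative.

```python
import re

text = "This is book"

# match() only tries the pattern at index 0 of the string.
at_start = re.match("This", text)
print(at_start)      # <re.Match object; span=(0, 4), match='This'>

# "book" is in the string, but not at the start, so match() gives None.
not_at_start = re.match("book", text)
print(not_at_start)  # None

# search() scans the whole string and finds "book" further in.
anywhere = re.search("book", text)
print(anywhere)      # <re.Match object; span=(8, 12), match='book'>
```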
The pattern does not have to be a fixed piece of text. Suppose you do not know exactly what you want to search for, but you know some properties of it, such as how many characters or digits it has. In that case we can form a regular expression from a mix of meta characters, special sequences and sets. Each meta character has a specific meaning.
Special sequences consist of \ followed by a single character. The list is given below in the post.
A set is a group of characters given inside a pair of square brackets. It matches any one of the characters it contains.
. = Any single character (except a newline) is present at that position.
^ = The pattern is present at the beginning of the string.
$ = The pattern is present at the end of the string.
* = Zero or more occurrences of the preceding pattern.
+ = One or more occurrences of the preceding pattern.
{} = The specified number of occurrences of the preceding pattern.
| = Either the pattern on its left or the pattern on its right is present.
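The meta characters above can be tried out one at a time. The following is a minimal sketch; the sample strings and variable names are made up for illustration, and the patterns are written as raw strings (r"...") so that Python leaves any backslashes untouched.

```python
import re

# . stands for any single character between b and t
any_char = re.findall(r"b.t", "bat bit but")
print(any_char)   # ['bat', 'bit', 'but']

# ^ anchors at the beginning, $ anchors at the end
starts = re.search(r"^Hello", "Hello world")
ends = re.search(r"world$", "Hello world")
print(starts is not None, ends is not None)  # True True

# * allows zero or more b, + needs at least one b
zero_or_more = re.findall(r"ab*", "a ab abb")
one_or_more = re.findall(r"ab+", "a ab abb")
print(zero_or_more)  # ['a', 'ab', 'abb']
print(one_or_more)   # ['ab', 'abb']

# {2} asks for exactly two a's in a row
exactly_two = re.findall(r"a{2}", "a aa aaa")
print(exactly_two)   # ['aa', 'aa']

# | matches either alternative
either = re.findall(r"cat|dog", "cat and dog")
print(either)        # ['cat', 'dog']
```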
\d = Digit
\D = Non-digit
\s = Whitespace
\S = Non-whitespace
\w = Alphanumeric character or underscore
\W = Anything other than an alphanumeric character or underscore
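Here is a short sketch of the special sequences in use with findall(). The sample text is invented for the example, and each sequence is combined with + where it is more useful to grab runs of characters rather than single ones.

```python
import re

text = "Room 42, Floor 7"

digits = re.findall(r"\d", text)       # every single digit
numbers = re.findall(r"\d+", text)     # runs of digits
non_digits = re.findall(r"\D+", text)  # runs of anything except digits
words = re.findall(r"\w+", text)       # runs of letters, digits or underscore
chunks = re.findall(r"\S+", text)      # runs of non-whitespace
non_word = re.findall(r"\W", text)     # spaces and punctuation

print(digits)      # ['4', '2', '7']
print(numbers)     # ['42', '7']
print(non_digits)  # ['Room ', ', Floor ']
print(words)       # ['Room', '42', 'Floor', '7']
print(chunks)      # ['Room', '42,', 'Floor', '7']
print(non_word)    # [' ', ',', ' ', ' ']
```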
sd* = s followed by zero or more d.
sd+ = s followed by one or more d.
sd? = s followed by zero or one d.
sd{3} = s followed by exactly 3 d.
sd{2,3} = s followed by 2 to 3 d (write the range without a space after the comma).
[sd] = either s or d
s[sd] = s followed by exactly one s or d.
[a-z]+ = sequence of lower case letters.
[A-Z]+ = sequence of upper case letters.
[a-zA-Z]+ = sequence of lower or upper case letters.
[A-Z][a-z]+ = one upper case letter followed by one or more lower case letters.
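A few of the patterns above can be checked quickly in code. This sketch uses re.fullmatch(), which is not covered in the post; it is a standard re function (Python 3.4 and later) that succeeds only when the whole string fits the pattern, which makes it convenient for testing quantifiers. The test strings are made up.

```python
import re

# sd{2,3}: s followed by two or three d
two_d = re.fullmatch(r"sd{2,3}", "sdd")
four_d = re.fullmatch(r"sd{2,3}", "sdddd")
print(two_d is not None)   # True
print(four_d is not None)  # False, four d is too many

# sd?: the d is optional
no_d = re.fullmatch(r"sd?", "s")
print(no_d is not None)    # True

# s[sd]: s followed by exactly one character from the set
one_from_set = re.fullmatch(r"s[sd]", "sd")
two_from_set = re.fullmatch(r"s[sd]", "sdd")
print(one_from_set is not None)  # True
print(two_from_set is not None)  # False

# [A-Z][a-z]+: capitalised words only
names = re.findall(r"[A-Z][a-z]+", "Alice met Bob in paris")
print(names)  # ['Alice', 'Bob']
```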