Regular Expressions
A regular expression defines a search pattern for strings. The abbreviation for regular expression is regex. The search pattern can be anything from a single character or a fixed string to a complex expression containing special characters that describe the pattern. For a given string, the pattern may match once, several times, or not at all.
Regular expressions can be used to search, edit and manipulate text.
Analyzing or modifying a text with a regex is described as applying the regular expression to the text (string). The pattern is applied to the text from left to right, and once a source character has been used in a match, it cannot be reused. For example, the regex aba matches ababababa only two times (aba_aba__, where _ marks characters that are not part of a match).
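The non-overlapping, left-to-right behaviour can be checked with Python's re module (introduced below). This is a minimal sketch that reports each match of aba and its position:

```python
import re

# findall() returns every non-overlapping match, scanning left to right
matches = re.findall("aba", "ababababa")
print(matches)  # ['aba', 'aba']

# finditer() yields match objects, so each (start, end) position can be inspected
spans = [m.span() for m in re.finditer("aba", "ababababa")]
print(spans)  # [(0, 3), (4, 7)]
```

The second match starts at index 4, not 2, because the a at index 2 was already consumed by the first match.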
Common Matching Symbols
Meta characters
![](https://cds.santechz.com/userfiles/media/uploaded/lknn082m.png)
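The metacharacter table above is only available as an image. As a hedged illustration (the selection of symbols and the sample sentence are made up here, not taken from the table), a few commonly used metacharacters behave like this:

```python
import re

sample = "Call 0712 345 678 before 9am, ref: AB-12"  # made-up sample text

digits = re.findall(r"\d+", sample)        # \d = any digit, + = one or more
print(digits)                              # ['0712', '345', '678', '9', '12']

pair = re.findall(r"[A-Z]{2}", sample)     # [...] = character set, {2} = exactly two
print(pair)                                # ['AB']

starts = re.findall(r"^Call", sample)      # ^ = start of the string
ends = re.findall(r"\d$", sample)          # $ = end of the string
print(starts, ends)                        # ['Call'] ['2']

any_char = re.findall(r"b.f", sample)      # . = any character except a newline
print(any_char)                            # ['bef']

optional = re.findall(r"colou?r", "color colour")  # ? = zero or one of the previous item
print(optional)                            # ['color', 'colour']
```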
Examples:
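A regular expression to check a person's name of 5 to 20 letters (letters only, no spaces or digits)
^[A-Za-z]{5,20}$
A regular expression to check a mobile number that starts with 0 followed by exactly 10 digits (11 digits in total)
^(0)\d{10}$
A regular expression to check an email address (a rough check: \w already includes digits and the underscore, and the domain must be lowercase)
^[\w\d_\-]+[@][a-z0-9_\-.]+\.[a-z]{2,4}$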
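The three example patterns can be tried out with re.match(). In this sketch the helper is_valid() and the sample inputs are hypothetical, chosen only to show one accepted and one rejected value per pattern:

```python
import re

NAME = r"^[A-Za-z]{5,20}$"
MOBILE = r"^(0)\d{10}$"
EMAIL = r"^[\w\d_\-]+[@][a-z0-9_\-.]+\.[a-z]{2,4}$"

def is_valid(pattern, value):
    # Hypothetical helper: re.match() returns a match object or None,
    # and bool() turns that into True/False
    return bool(re.match(pattern, value))

print(is_valid(NAME, "Aaron"))              # True  (5 letters)
print(is_valid(NAME, "Eric"))               # False (only 4 letters)
print(is_valid(MOBILE, "07123456789"))      # True  (0 + 10 digits)
print(is_valid(MOBILE, "7123456789"))       # False (does not start with 0)
print(is_valid(EMAIL, "support@test.com"))  # True
print(is_valid(EMAIL, "john.doe@test.com")) # False (dot in the username)
```

The last case shows why the email pattern is only a rough check: real addresses often contain dots before the @.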
RegEx Module in Python
Python has a built-in module called re for working with regular expressions. A Python regular expression is a string of ordinary characters and metacharacters that defines a search pattern. The functions of the re module use such patterns to "find" or "find and replace" text in strings.
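When the same pattern is used many times, it can be compiled once with re.compile(); the resulting pattern object offers findall(), search(), sub() and the other functions as methods, and flags such as re.IGNORECASE can be set at compile time. A minimal sketch with a made-up pattern and inputs:

```python
import re

# Compile once, reuse many times; the flag makes the match case-insensitive
word = re.compile(r"\bnavy\b", re.IGNORECASE)

found = word.findall("Navy, NAVY and navy")
print(found)     # ['Navy', 'NAVY', 'navy']

replaced = word.sub("fleet", "The German Navy")
print(replaced)  # The German fleet
```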
findall() function
The findall() function returns a list containing all matches.
Example:
import re
text = "The Galeb class were minelayers originally built as \
minesweepers for the Imperial German Navy between 1918 and 1919.\
They were also known as the Orao class."
# Count the occurrences of the letter a
found=re.findall("a",text)
print(len(found))
# Output: 13
# Count how many times the word "as" appears in the sentence
# \b marks a word boundary, so the "as" inside "class" is not counted
# The pattern is a raw string (r"...") so that \b is not read as a backspace character
found=re.findall(r"\b(as)\b",text)
print(len(found))
# Output: 2
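findall() returns the matched strings themselves, so it can collect values as well as count them. A minimal sketch on the same sentence; the two patterns are illustrative choices:

```python
import re

text = "The Galeb class were minelayers originally built as \
minesweepers for the Imperial German Navy between 1918 and 1919.\
They were also known as the Orao class."

# Four digits between word boundaries: the years
years = re.findall(r"\b\d{4}\b", text)
print(years)  # ['1918', '1919']

# A capital letter at the start of a word, followed by any word characters
capitalized = re.findall(r"\b[A-Z]\w*", text)
print(capitalized)  # ['The', 'Galeb', 'Imperial', 'German', 'Navy', 'They', 'Orao']
```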
split() function
The split() function splits the string at every match and returns a list of the pieces.
Example:
import re
text = "The Galeb class were minelayers originally built as \
minesweepers for the Imperial German Navy between 1918 and 1919.\
They were also known as the Orao class."
# Split the string into a list of words at every whitespace character
split_str=re.split(r"\s",text)
print(split_str)
"""
Output:
['The', 'Galeb', 'class', 'were', 'minelayers', 'originally',
'built', 'as', 'minesweepers', 'for', 'the', 'Imperial', 'German',
'Navy', 'between', '1918', 'and', '1919.They', 'were', 'also', 'known',
'as', 'the', 'Orao', 'class.']
"""
search() function
The search() function takes a regular expression pattern and a string, and it searches for that pattern within the string. If the search is successful, search() returns a match object. Otherwise, it returns None.
In Python's re module, match() and search() return match objects when a string matches a regular expression pattern. You can extract the matched string and its position using methods provided by the match object. re.match() only looks for a match at the beginning of a string, while re.search() looks for a match anywhere in the string.
Example:
import re
text = "The Galeb class were minelayers originally built as \
minesweepers for the Imperial German Navy between 1918 and 1919.\
They were also known as the Orao class."
# Find the first position of the word class
found=re.search("class",text)
print("The word is located at position:",found.start())
# Output: The word is located at position: 10
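The difference between match() and search(), and the need to check for None before calling a method such as start(), can be seen in this small sketch on a shortened sentence:

```python
import re

text = "The Galeb class were minelayers"

# match() only looks at the beginning of the string
at_start = re.match("class", text)
print(at_start)                        # None
print(re.match("The", text).span())    # (0, 3)

# search() scans the whole string and returns the first match
found = re.search("class", text)
if found:                              # a failed search returns None
    print(found.start())               # 10
else:
    print("not found")
```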
sub() Function
The sub() function replaces the matches with the text of your choice:
Example:
import re
text="Eric is an IT Graduate. Eric is a specialist in Cloud Computing."
text=re.sub("Eric","Aaron",text)
print(text)
# Output: Aaron is an IT Graduate. Aaron is a specialist in Cloud Computing.
text=re.sub(r"\b\w{3,5}\b","Grid",text)
print(text)
# Output: Grid is an IT Graduate. Grid is a specialist in Grid Computing.
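sub() accepts a few more options. In this minimal sketch, count limits the number of replacements, group references such as \1 reuse captured text in the replacement (the date string is made up), and the related re.subn() also reports how many replacements were made:

```python
import re

text = "Eric is an IT Graduate. Eric is a specialist in Cloud Computing."

# Replace only the first occurrence
first_only = re.sub("Eric", "Aaron", text, count=1)
print(first_only)
# Aaron is an IT Graduate. Eric is a specialist in Cloud Computing.

# Reorder captured groups: YYYY-MM-DD becomes DD/MM/YYYY
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "Due: 2024-01-31")
print(reordered)  # Due: 31/01/2024

# subn() returns a tuple (new_string, number_of_replacements)
new_text, n = re.subn("Eric", "Aaron", text)
print(n)  # 2
```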
Get the matched position: start(), end(), span()
You can get the position (index) of the matched substring using the match object's methods start(), end(), and span().
Example:
import re
text="Eric is an IT Graduate. Eric is a specialist in Cloud Computing."
matched=re.search(r"\b(IT)\b",text)
print(matched.start())
print(matched.end())
print(matched.span())
# Output:
# 11
# 13
# (11, 13)
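start() is inclusive and end() is exclusive, so the two values can be used directly as a slice. To get the position of every match rather than only the first, re.finditer() yields one match object per match. A minimal sketch:

```python
import re

text = "Eric is an IT Graduate. Eric is a specialist in Cloud Computing."

m = re.search(r"\bIT\b", text)
print(text[m.start():m.end()])  # IT

# One match object per occurrence of "Eric"
spans = [match.span() for match in re.finditer("Eric", text)]
print(spans)  # [(0, 4), (24, 28)]
```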
Group Extraction
The "group" feature of a regular expression allows you to pick out parts of the matching text.
Suppose we want to extract the username and the host of an email address separately. To do this, add parentheses ( ) around the username and host in the pattern, like this: r'([\w.-]+)@([\w.-]+)'. The parentheses do not change what the pattern matches; instead they establish logical "groups" inside the match text. On a successful search, match.group(1) is the match text corresponding to the 1st left parenthesis, and match.group(2) is the text corresponding to the 2nd left parenthesis. The plain match.group() is still the whole match text as usual.
Examples:
# Example 1:
import re
text="Contact us at support@test.com"
em=re.search(r"([\w.-]+)@([\w.-]+)",text)
if em:
    print(em.group())
    print(em.group(1))
    print(em.group(2))
    print(em.groups())
# Output:
# support@test.com
# support
# test.com
# ('support', 'test.com')
# Example 2: Extract the usernames from multiple emails
text="Contact us at support@test.com or at products@test.com"
em=re.findall(r"([\w.-]+)@([\w.-]+)",text)
print(em)
# Output: [('support', 'test.com'), ('products', 'test.com')]
for e in em:
    print(e[0])
# Output:
# support
# products
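Groups can also be given names with the (?P<name>...) syntax, which makes longer patterns easier to read; match.group("name") and match.groupdict() then return the captured text by name. A minimal sketch with the same email pattern (the group names user and host are our choice):

```python
import re

text = "Contact us at support@test.com"
em = re.search(r"(?P<user>[\w.-]+)@(?P<host>[\w.-]+)", text)
if em:
    print(em.group("user"))  # support
    print(em.group("host"))  # test.com
    print(em.groupdict())    # {'user': 'support', 'host': 'test.com'}
```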
File operations in Python
A file is a named location on a storage device that is used to store data. Before we can read from or write to a file, we need to open it. When we are done, we need to close it so that the resources tied to the file are freed.
Hence, in Python, a file operation takes place in the following order:
- Open a file
- Read or write (perform operation)
- Close the file
Opening Files in Python
In Python, we use the built-in open() function to open files.
Syntax: open(file_name, mode)
Different Modes to Open a File in Python
Mode | Description |
---|---|
r | Open a file for reading. (default) |
w | Open a file for writing. Creates a new file if it does not exist or truncates the file if it exists. |
x | Open a file for exclusive creation. If the file already exists, the operation fails. |
a | Open a file for appending at the end of the file without truncating it. Creates a new file if it does not exist. |
t | Open in text mode. (default) |
b | Open in binary mode. |
+ | Open a file for updating (reading and writing) |
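Modes from the table can be combined, for example "rb" (read, binary) or "w+" (write and read). A hedged sketch that works in a temporary directory so it does not touch real files; the file name demo.bin is hypothetical:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.bin")  # hypothetical file name

    # "wb": write bytes, creating (or truncating) the file
    with open(path, "wb") as f:
        f.write(b"\x00\x01\x02")

    # "rb": read the bytes back
    with open(path, "rb") as f:
        data = f.read()
    print(data)  # b'\x00\x01\x02'

    # "x" refuses to open a file that already exists
    refused = False
    try:
        with open(path, "x"):
            pass
    except FileExistsError:
        refused = True
    print(refused)  # True
```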
Closing Files in Python
When we have finished working with a file, we need to close it properly.
Closing a file frees up the resources that were tied to the file. This is done with the file object's close() method.
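If a file is opened without a with block, a try/finally block makes sure close() is still called when an error occurs. A minimal sketch, again in a temporary directory with a hypothetical file name:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")  # hypothetical file name

    f = open(path, "w")
    try:
        f.write("first line\n")
    finally:
        f.close()  # runs even if write() raises an exception

    print(f.closed)  # True
```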
Writing to a file
There are two things to remember when a file is opened in write mode ("w"):
- If the file doesn't exist, a new file is created.
- If the file already exists, its content is erased before the new content is written.
Syntax:
write(string)
This writes the string to the file.
writelines(seq)
This writes the sequence to the file. No line endings are appended to each sequence item. It’s up to you to add the appropriate line ending(s).
Example:
# Function to create a new file
# Mode "x" raises FileExistsError if the file already exists
def create_file(file_name):
    f=open(file_name,"x")
    f.close()
# Function to append text to a file
# The 'with open' syntax closes the file automatically when the block ends
def append_text(file_name,text):
    with open(file_name,"a") as f:
        f.write(text)
create_file("log.txt")
append_text("log.txt","Testing my logging system")
Read a file
By default the read() method returns the whole text, but you can also specify how many characters you want to return.
You can return one line by using the readline() method.
def read_file(file_name):
    with open(file_name,"r") as f:
        # Reads a single line (including its trailing newline, if any)
        text = f.readline()
        print(text)
        # Reads the rest of the file, starting after the first line
        text2=f.read()
        print(text2)
read_file("log.txt")
Exercise:
Make a program which reads the following file: 100-Contacts.csv
Make sure to use Python's built-in functions to open and read the file.
Using Python's built-in collections and regular expressions, do the following:
1) Extract the email addresses
2) Extract the first name of every customer
3) Find out how many times the county named Los Angeles appears.
4) List the phone numbers of those who live in the city of Boston.
Threading and Multiprocessing
A thread is a sequence of instructions within a program that can be executed independently of the rest of the process. You can think of threads as separate units of your process that do their jobs independently when the scheduler runs them. When a thread has to wait for a slow external operation to finish (such as a network request or disk access), it sleeps, which lets the scheduler spend that time executing another thread.
What is a Process?
A process is a running instance of a computer program. By default, a process executes a single sequence of control flow (its main thread).
The key differences between processes and threads in Python are summarized in the following table:
PROCESS | THREAD |
---|---|
A process is a program in execution. | A thread is a lightweight process. |
Processes run independently of each other. | Threads share the same memory and can access the same variables and data structures. |
Each process has its own memory space. | Threads run within the memory space of the process. |
Processes are heavyweight and take more resources. | Threads are lightweight and require fewer resources. |
Processes communicate with each other through interprocess communication (IPC). | Threads communicate with each other through shared memory or message passing. |
Processes can run in parallel on different processors or cores. | In standard CPython, the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, so threads do not speed up CPU-bound Python code. |
Threading is one way of achieving multitasking in Python. It allows a program to have multiple threads of execution running concurrently. Each thread runs independently and can perform a different task. If one thread is blocked waiting for input/output, other threads can continue to run and keep the program responsive.
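The memory rows of the table can be observed directly: a thread appends to the same list as the main program, while a child process only changes its own copy. A minimal sketch (the names shared, add_item and run_demo are illustrative):

```python
import multiprocessing
import threading

shared = []  # lives in the memory of the main process

def add_item(value):
    shared.append(value)

def run_demo():
    # The thread runs inside our process, so it modifies our list
    t = threading.Thread(target=add_item, args=("from thread",))
    t.start()
    t.join()

    # The child process has its own memory space and changes its own copy
    p = multiprocessing.Process(target=add_item, args=("from process",))
    p.start()
    p.join()
    return shared

if __name__ == "__main__":
    print(run_demo())  # ['from thread']
```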
Python provides a threading module that makes it easy to create and manage threads in a program. With this module, you can create multiple threads, start them, and synchronize their execution.
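Because threads share memory, updates from several threads to the same variable can interleave. A threading.Lock lets only one thread at a time run the protected block, which is one way to synchronize their execution. A minimal sketch (counter and worker are illustrative names):

```python
import threading

counter = 0
lock = threading.Lock()

def worker(times):
    global counter
    for _ in range(times):
        with lock:  # acquire before the update, release afterwards
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has finished

print(counter)  # 40000
```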
Example of Multiprocessing:
import time
import multiprocessing
def count(num):
    a = 0
    for i in range(num):
        a += i
    time.sleep(1)
    print(a)
if __name__ == '__main__':
    p1=multiprocessing.Process(target=count,args=(50000,))
    p1.start()
    p2=multiprocessing.Process(target=count,args=(7000,))
    p2.start()
    print("the processes have started")
    print("this message is shown before the completion of a non-joined process")
    print("Starting newer processes")
    p3=multiprocessing.Process(target=count,args=(100000,))
    p3.start()
    p4=multiprocessing.Process(target=count,args=(8000,))
    p4.start()
    # join() makes the main process wait until p3 and p4 have completed
    p3.join()
    p4.join()
    print("This message is shown after the processes have terminated successfully")
"""
Output (the order of the four sums may vary between runs):
the processes have started
this message is shown before the completion of a non-joined process
Starting newer processes
24496500
1249975000
31996000
4999950000
This message is shown after the processes have terminated successfully
"""
Example of Threading:
import threading
import time
def count(num,name):
    print(name,"Running")
    a = 0
    for i in range(num):
        a += i
    # One-second delay
    time.sleep(1)
    print(name,a)
t1 = threading.Thread(target=count,args=(1000000,"Thread 1:"))
t1.start()
t2 = threading.Thread(target=count,args=(5000,"Thread 2:"))
t2.start()
t3 = threading.Thread(target=count,daemon=True,args=(2000000,"Thread 3:"))
t3.start()
print("Threads are running")
"""
Output:
Thread 1: Running
Thread 2: Running
Thread 3: Running
Threads are running
Thread 2: 12497500
Thread 1: 499999500000
"""
Why did thread 3 not complete its execution?
Because it is a daemon thread. A Python program exits as soon as all non-daemon threads have finished, and any daemon threads still running at that point are stopped abruptly. Thread 3 has the largest loop, so it was still running when threads 1 and 2 finished.
To let it finish, add t3.join() after t3.start(); this makes the main thread wait for thread 3.
Exercise:
Write a program which simultaneously writes 200 odd numbers to one file and 100 prime numbers to another file.