Assignment 8
Cyber Security Assignment Questions

Question - 1
Imagine you are a cybersecurity analyst working for a large multinational corporation. One morning, your team receives an urgent report about a potential security breach in the company's network. The IT department has noticed unusual network activity originating from a particular IP address. Your team has been tasked with investigating this incident to determine if it poses a threat to the organization's network security.

Assignment Question:
1. Using the Python library Scapy, analyze the network packets associated with the suspicious IP address provided.

answer: Scapy is a Python library that lets you craft packets and send them on a network. This answer shows how Scapy can be used to read a pcap file in order to detect abnormal behavior.

Installation
You can install Scapy using pip:

    pip3 install scapy

Read a PCAP
The first thing to do is open a pcap and loop over the packets. You can do this with PcapReader, which creates a generator. You can also call packet.show() to display the protocol layers and field values of each packet:

    from scapy.all import *

    # PcapReader creates a generator:
    # it does NOT load the complete file in memory
    packets = PcapReader("capture.pcap")
    for packet in packets:
        # show() prints the layers itself and returns None,
        # so it is not wrapped in print()
        packet.show()

[Figure: scapy-open-show.png, output of packet.show()]

Filter by protocol
Next, we will typically filter packets depending on the payload protocol. For this we can use the method packet.haslayer(protocol).
For example, to process only DNS packets:

    from scapy.all import *

    # PcapReader creates a generator:
    # it does NOT load the complete file in memory
    packets = PcapReader("capture.pcap")
    for packet in packets:
        if packet.haslayer(DNS):
            # handle the DNS packet here; as a placeholder, print a one-line summary
            print(packet.summary())

Detection of domain generation algorithms (DGA)
Domain generation algorithms (DGA) are algorithms seen in various families of malware that periodically generate a large number of domain names, which can be used as rendezvous points with their command and control servers. The large number of potential rendezvous points makes it difficult for law enforcement to effectively shut down botnets, since infected computers will attempt to contact some of these domain names every day to receive updates or commands.

For example, an infected computer could create thousands of domain names such as www.qmsldazerfj.com and would attempt to contact them in order to receive an update or commands.

The technique was popularized by the Conficker.a and Conficker.b worms, which at first generated 250 domain names per day. [1]

However, most of these generated domain names have no corresponding DNS entry. This means a computer infected with DGA malware will usually receive a lot of DNS replies that contain no answer (often loosely called NXDOMAIN errors; strictly speaking, NXDOMAIN is the DNS response code 3, "name does not exist").
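To make the idea of a DGA more concrete, here is a minimal toy generator. It is only an illustration, not the algorithm of Conficker or of any real malware: the function name `toy_dga`, the use of SHA-256, and the date-based seed are all assumptions made for this sketch. It shows why both the malware and its operator can compute the same list of names for a given day without communicating.

```python
import hashlib
from datetime import date


def toy_dga(seed_date, count=5, length=12, tld=".com"):
    """Derive `count` random-looking domain names from a date (hypothetical toy DGA)."""
    domains = []
    for index in range(count):
        # Hash the date together with a counter so that every name differs
        seed = f"{seed_date.isoformat()}-{index}".encode()
        digest = hashlib.sha256(seed).hexdigest()
        # Map each hex digit (0..15) to a letter a..p to build the label
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(f"www.{label}{tld}")
    return domains


# Same date in, same list out: no communication is needed to agree on the names
for name in toy_dga(date(2024, 1, 1)):
    print(name)
```

Almost none of the names produced this way are registered, which is why a device running such code triggers many failed DNS lookups.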
We can use Python and Scapy to try to detect these by counting the number of empty DNS responses received by each IP in the capture:

    from scapy.all import *

    packets = PcapReader("capture.pcap")
    counts = {}

    # QR = Query/Response flag (1 = response)
    # ANCOUNT = Answer Count
    # https://datatracker.ietf.org/doc/html/rfc5395#section-2
    for packet in packets:
        # require an IP layer so that packet[IP] cannot fail (e.g. on IPv6 traffic)
        if (packet.haslayer(IP) and packet.haslayer(DNS)
                and packet[DNS].qr == 1 and packet[DNS].ancount == 0):
            # DNS response that contains no answer:
            # extract the destination IP (device that sent the query)
            ip = packet[IP].dst
            counts[ip] = counts.get(ip, 0) + 1

    threshold = 100

    print("+ Create list of suspicious IP addresses ...")
    suspicious = []
    for ip, occurrences in counts.items():
        if occurrences < threshold:
            continue
        suspicious.append(ip)

    print(suspicious)

This approach is very similar to the way alerts are created in SIEM software like Elasticsearch or Splunk. It has one main limitation: how can we define the threshold in a sensible way? In our example we simply set a fixed threshold:

    threshold = 100

For some networks (and some captures), this may generate a lot of alerts, while for other networks and captures it may generate no alert at all!

Empirical detection rule
Luckily, we can use a simple statistical rule to compute a sensible threshold. Conceptually, we want to compare the behavior of a device against some reference. There are 2 kinds of references that we can use:

▪ We can compare this device with other devices that we find in the pcap (or in the same subnet), because we assume that they should all exhibit the same behavior.
▪ We can compare the current behavior of the device with the behavior of the same device at another time. For example, we can compare the DNS behavior of a device now and one week ago, to try to detect an infection that took place in between.

For the example below, we will use the first approach, and compare the DNS behavior of the different devices (IP addresses) present in the pcap.
Moreover, to compute the threshold, we will use the empirical detection rule, also called the three-sigma rule of thumb. This rule is based on the fact that, in normally distributed data:

▪ the probability that a value is larger than μ + 3σ is roughly 0.13%, and
▪ the probability that a value is larger than μ + 2σ is roughly 2.3%,

where μ is the mean of the values and σ is the standard deviation.

This is illustrated in the histogram below: it shows that most devices (IP addresses) should cause a similar number of NXDOMAIN replies, and only a few of them should cause a large number of NXDOMAIN replies.

[Figure: empirical_rule_histogram.png]

So now we can use the Python statistics module to compute a sensible threshold, based on the mean number of NXDOMAIN replies caused by the devices:

    import statistics

    values = list(counts.values())
    mean = statistics.mean(values)
    # stdev() needs at least two data points (two devices)
    stddev = statistics.stdev(values)
    threshold = mean + 3 * stddev

Drawing a histogram
To help visualize the behavior of devices in the dataset, we can also use Python and the matplotlib library.

    import math
    import matplotlib.pyplot as plt

    devices = len(counts)
    # a common rule of thumb is to use the square root of the
    # number of values as the number of bins (at least one bin)
    bins = max(1, int(math.sqrt(devices)))

    plt.hist(list(counts.values()), bins=bins)
    plt.title(f"Histogram of NXDOMAIN DNS responses per IP ({devices} devices, {bins} bins)")
    plt.xlabel('Number of NXDOMAIN DNS responses')
    plt.ylabel('Number of devices (IP addresses)')
    plt.show()

The result (tested on a very small capture) is shown below. It shows that most devices received around 7 NXDOMAIN replies (the median), and a few devices (actually one) received a lot of NXDOMAIN replies. Based on the mean and standard deviation, the threshold was computed as 49, which made it possible to detect the infected device.

[Figure: dns-nxdomain.png]

Expected Procedure:
1.
A detailed explanation of how Scapy can be utilized to capture and dissect network packets

answer: Scapy is useful in a variety of use cases, one of which is getting hands-on experience while you learn Computer Networks. Wouldn't it be great if, when learning about Ethernet, for example, you could create, send, sniff and parse Ethernet frames on your own? Scapy is the perfect tool for that. In addition, you can use Scapy for creating networking-based applications, parsing network traffic to analyze data, and many other cases. This answer assumes you have some background knowledge in Computer Networks, for example about the layers model. It also assumes you have some basic Python knowledge.

What will you learn?
We will start from the very basics: what Scapy is, and how to install it. You will learn how to sniff data and parse it with Scapy, and how to display it in a meaningful manner. You will also learn how to create frames or packets, and how to send them. Altogether, you should have a new powerful tool under your belt.

How to Install Scapy
To install Scapy, you can simply use pip install scapy. If you run into trouble, follow the official documentation.

How to Use Scapy
For now, let’s open up the command line and type in scapy. You should expect something like the following:

[Figure: Running Scapy from the CLI (Source: Brief)]

Note that the warning messages are fine. Since this is a Python environment, dir, help, and any other Python function for information retrieval are available to you. Of course, you can always combine Python code with your Scapy scripts.

How to Work with Packets and Frames in Scapy
Packets and frames in Scapy are described by objects created by stacking different layers. So a packet can have a variable number of layers, but will always describe the sequence of bytes that have been sent (or are going to be sent) over the network.
Let's create a frame that consists of an Ethernet layer, with an IP layer on top:

[Figure: Stacking Layers (Source: Brief)]

Look how easy that is! We’ve used the / operator to stack the IP layer on top of the Ethernet layer. Note that when looking at this object, it only shows us non-default values. The Ethernet type is 0x800 (hexadecimal) because Scapy sets this value automatically when an IP layer is stacked on top of Ethernet. Let's look more deeply at the fields of the packet:

[Figure: With the show method we can observe all fields of the frame (Source: Brief)]

Pretty cool! 😎

How to Sniff with Scapy
Scapy also allows us to sniff the network by running the sniff command, like so:

[Figure: Sniffing with the sniff command (Source: Brief)]

After running sniff with count=2, Scapy sniffs your network until 2 frames are received. Then it returns, and in this case the variable packets will store the frames that have been received. The return value of sniff can be treated as a list. Therefore packets[0] will contain the first packet received, and packets[1] will contain the second:

[Figure: The return value of sniff is an iterable, so it can be accessed as a list (Source: Brief)]

A helper function summary is available too and will provide minimal information regarding the packet collection:

[Figure: Using summary we can get some information of the packet collection (Source: Brief)]

When looking at a specific frame, every layer or field can be accessed in a very elegant way. For instance, in order to get the IP section of the packet, we can access it like so:

[Figure: Accessing a specific layer (and its payload) (Source: Brief)]

Note that this shows us everything from the IP layer and above (that is, the payload of the IP layer). Let's now observe the source Ethernet address of this frame:

[Figure: Accessing a specific field (Source: Brief)]

Nice and easy. Now, you will learn how to run a specific command for every frame that you sniff. First, create the callback function that will be run on every packet.
For example, a function that will just print the source Ethernet address of the received frame:

[Figure: Defining a callback function that receives a frame as its argument (Source: Brief)]

Now, we can pass this function to sniff, using the prn argument:

[Figure: Run a callback function on every sniffed frame (Source: Brief)]

The Ethernet addresses have been printed as a result of print_source_ethernet being executed; each time, it receives a sniffed frame as its argument. Note that you can write the same in Python using a lambda function, as follows:

[Figure: Define the callback function using lambda (Source: Brief)]

If you prefer to write an explicit function like the one we’ve written above, that’s perfectly fine.

We usually want to filter the traffic that we receive and look only at relevant frames. Scapy’s sniff function can take a filter function as an argument, that is, a function that will be executed on every frame and return a boolean value: whether this frame should be kept or not. For example, say we would like to keep only frames that are sent to broadcast. Let’s write a simple filtering function that does just that:

[Figure: A simple filtering function (Source: Brief)]

Now, we can use the lfilter parameter of sniff in order to filter the relevant frames:

[Figure: Filtering frames based on a filter function (Source: Brief)]

In order to clarify, let’s draw this process:

[Figure: The process of sniffing and filtering with lfilter (Source: Brief)]

A frame f is received by the network card. It is then passed to lfilter(f). If the filter function returns False, f is discarded. If the filter returns True, then we execute the prn function on f.

So we can now combine these two arguments of sniff, namely lfilter and prn, and print the source address of every frame that is sent to the broadcast address.
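The filter-then-callback flow described above can be modelled in a few lines of plain Python. This is only a toy model under stated assumptions, not Scapy's implementation: frames are represented as dictionaries with "src" and "dst" keys, `process_frames` is a hypothetical stand-in for sniff, and the bodies of `is_broadcast_frame` and `print_source_ethernet` are guesses at what such functions would contain.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def is_broadcast_frame(frame):
    """Filter function: True only for frames sent to the broadcast address."""
    return frame["dst"] == BROADCAST


def print_source_ethernet(frame):
    """Callback: print the source Ethernet address of the frame."""
    print(frame["src"])


def process_frames(frames, lfilter=None, prn=None, count=0):
    """Toy stand-in for sniff(): apply lfilter, then prn, to each frame."""
    kept = []
    for frame in frames:
        if lfilter is not None and not lfilter(frame):
            continue  # the filter returned False: the frame is discarded
        kept.append(frame)
        if prn is not None:
            prn(frame)  # the filter returned True: run the callback
        if count and len(kept) >= count:
            break  # stop once `count` frames have passed the filter
    return kept


# Hypothetical frames, represented as plain dictionaries
frames = [
    {"src": "aa:aa:aa:aa:aa:01", "dst": BROADCAST},
    {"src": "aa:aa:aa:aa:aa:02", "dst": "aa:aa:aa:aa:aa:01"},
    {"src": "aa:aa:aa:aa:aa:03", "dst": BROADCAST},
]
process_frames(frames, lfilter=is_broadcast_frame, prn=print_source_ethernet)
```

With real traffic, the same two functions would read the addresses from the Ethernet layer of each frame instead of from dictionary keys.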
Let’s do this now using lambda:

[Figure: Combining lfilter and prn 💪🏻 (Source: Brief)]

This is equivalent to writing the following line, without lambda:

    sniff(count=2, lfilter=is_broadcast_frame, prn=print_source_ethernet)

Readable, quick, and useful. Have you noticed that I love Scapy? 🥰

Alright, so far we’ve learnt how to sniff frames. When sniffing, we know how to keep only relevant frames, and how to execute a function on each of them.

How to Create Frames in Scapy
To create a frame, simply create an Ethernet layer using Ether(). Then, stack additional layers on top of it. For instance, to stack an IP layer:

[Figure: Creating a frame with two stacked layers (Source: Brief)]

Alternatively, we can just add raw data, as follows:

[Figure: Using Raw data as the payload (Source: Brief)]

If you want to set a specific value, for instance the destination address of the frame, you can do it when you initially create the frame, like so:

[Figure: Creating a frame and specifying specific values (Source: Brief)]

Or, we can modify the specific field after creation:

[Figure: Modifying specific values (Source: Brief)]

How can we look at the frame we’ve just created? One way is to observe a frame using show, as we’ve done above. Another way is to look at its byte stream, just like in Wireshark. You can do this using the hexdump function:

[Figure: Viewing the hexadecimal byte stream (Source: Brief)]

Even better, we can look at it inside Wireshark by running wireshark(frame).

How to Send Frames in Scapy
You can send frames using sendp, as follows:

[Figure: Sending frames with sendp (Source: Brief)]

Let's sniff in Wireshark while sending the frame to make sure that it’s actually sent:

[Figure: Observing the frame we've sent using Wireshark (Source: Brief)]

2. A step-by-step breakdown of the process you followed to capture and analyze the network traffic.
answer:

Capturing Network Traffic
Snort needs a way to capture network traffic, and does so through two mechanisms:

▪ Setting the network card into promiscuous mode.
▪ Then grabbing the packets from the network card using the libpcap library.

We discuss promiscuous mode and the libpcap library later in the “Packet Sniffing” section. For now, let's take a quick refresher on the OSI model and the TCP/IP protocol suite. It's important, as we will be referencing them both throughout this chapter.

The OSI and TCP/IP Models
The Open Systems Interconnection (OSI) model was originally designed to be a standard for developing network communication protocol suites. By strictly adhering to the OSI model, different network vendors could write code that would interoperate with other competing network vendors. Unfortunately, the network industry didn't fully comply with the OSI model, and the TCP/IP protocol suite was no exception.

The most powerful part of the OSI model is the “layering” concept. The model consists of a number of components, separated into seven layers. Each layer is responsible for a particular part of the communication process. When sending data, each layer receives data formatted by the layer above, manipulates the data, and then sends it down to the layer below. When receiving data, each layer receives the data from the layer below, unpacks the data, and then passes it up one level.

The layering concept has the following advantages:

▪ Major code rewrites of a protocol are not necessary if a particular component needs to be changed. For example, if you want to change the IP component at Layer 3, it won't affect the other layers.
▪ It allows for the breakdown of complex network processes into more manageable sublayers.
▪ Industry-standard interfaces provide interoperability between different vendors. A vendor can write a piece of code for the network layer, for example, and other vendors can then use it seamlessly.
▪ Layering allows for easier troubleshooting, because the protocols are separated into layers. When troubleshooting, you don't have to tackle the complete protocol, only the layer with the problem.

In this chapter, we also talk about Snort decoding, and significant actions at the different layers of the OSI model. Figure 4.2 shows where Snort's activities lie in the OSI model.

Figure 4.2. The OSI Model and Snort

OINK!
The OSI model provides a useful method of describing how a protocol suite such as TCP/IP works. When learning about a new protocol or protocol suite, you will tend to refer back to the OSI model, as it helps you understand where a protocol fits in and what other protocols interoperate with it.

TCP/IP
Originally a government-funded research project, TCP/IP has grown to be the most popular protocol suite in the world. TCP/IP is a combination of suites of different protocols at different layers of the OSI model, as you will see later in the chapter. While Snort can decode other protocols, it is primarily focused on the TCP/IP suite.

The TCP/IP suite doesn't exactly follow the OSI model, and in some cases differs depending on the operating system. Therefore, using the OSI model as a blueprint, Figure 4.3 illustrates the TCP/IP protocol suite.

Figure 4.3. The TCP/IP Model and Snort

The five layers of TCP/IP are as follows:

▪ Application layer: for example, HTTP (Web) and SMTP (e-mail)
▪ Transport layer: for example, TCP and UDP
▪ Network layer: for example, ICMP and IP
▪ Data Link layer: for example, Ethernet, Token Ring, and ARP
▪ Physical layer: for example, a network card or modem

3. Identification and interpretation of any suspicious or anomalous network behavior observed in the captured packets.

answer: Network behavior anomaly detection is the process of monitoring enterprise networks to detect abnormal behavior.
Once an anomaly is spotted, network behavior anomaly detection either initiates an automated response or notifies security teams.

[Figure: System Architecture of Network Anomaly Detection System]

The post-pandemic corporate environment is rife with unpredictable cybersecurity threats. New types of malware built to silently compromise enterprise systems, crippling DoS attacks, and advanced persistent threats capable of bypassing traditional security solutions have completely changed how we look at IT security in 2022.

Gone are the days when a strong network perimeter and robust signature-based security solutions could protect an enterprise from bad actors. Today, more proactive measures to counter cyber threats, such as in-depth awareness of network behavior, are necessary to ensure a secure IT environment.

Of course, many companies continue to rely on legacy IT security systems that consist of endpoint security and perimeter protection. However, such cybersecurity infrastructure often fails to account for the network between the perimeter and the endpoint.

Since the pandemic hit, threat actors have become more advanced than ever, and they possess the tools to bypass traditional security solutions and sneak into enterprise networks. Network behavior anomaly detection is built to counter such threats. As the name suggests, it relies on network behavior analysis to operate.

Network behavior anomaly detection uses artificial intelligence (AI) and machine learning (ML) to detect hidden threats in those parts of the network infrastructure that other security tools cannot reach, and then notifies network teams. Continuous monitoring is a crucial feature of network behavior anomaly detection, augmenting anti-threat applications such as antivirus and spyware detection solutions with an extra layer of security.
Network behavior anomaly detection works by detecting unusual network behavior, for instance, heavy traffic flow during otherwise ‘quiet’ hours. However, while, by itself, this solution is highly efficient at patching the gaps left by more traditional cybersecurity tools, it is the most useful when combined with them. Security teams leverage network behavior anomaly detection alongside network firewalls, network performance monitoring software, and other measures. While these other tools protect the network from known threats, network behavior anomaly detection brings to light suspicious activities that might end up compromising network operations through hidden infections, data theft, or other malicious activities. Combining signature and anomaly detection capabilities allows network behavior anomaly detection to investigate unusual network activity. The network characteristics tracked by network behavior anomaly detection programs at scale include packets, bandwidth, bytes, traffic volume, and protocol used. Any suspicious event is logged in a report and consists of the originating and destination IP addresses, relevant ports, protocols, timestamps, and more. These critical metrics and many more are tracked by the network behavior anomaly detection tool in real-time. An alarm is raised if a strange trend or outlier that might hint at the existence of a threat is detected. Depending on the chosen configuration, network behavior anomaly detection programs can also monitor the behavior of individual network users. Network behavior anomaly detection solutions work by ‘sweeping’ the complete enterprise network when looking for threat actors. This is a marked difference from the perimeter, firewall, and endpoint security systems. These solutions only detect threats communicated through the specific part of the network where they are set up. 
Conversely, network behavior anomaly detection accounts for three significant network properties—traffic flow patterns, passive traffic analysis, and network performance data—from across the network to detect several different types of threats, such as: Inappropriate network behavior, such as unauthorized applications or a known program’s unusual use of ports. On detecting such activity, the network behavior anomaly detection solution and associated protection systems can identify and disable the associated network processes automatically and notify the concerned security personnel. Data exfiltration, like a suspiciously high volume of data being transferred. In case such an activity is detected, network behavior anomaly detection and related security solutions can automatically monitor the outbound transfer of data and report it to security teams in real-time. Some systems would even be able to identify the destination of these data transfers further and determine whether it is a legitimate communication or a cybersecurity event. Hidden threats, such as advanced malware. Network behavior anomaly detection would work with other security solutions to deploy the appropriate security countermeasures. It notifies concerned stakeholders to detect a threat that may have dodged perimeter security and entered the enterprise network. Regardless of the configuration of the network or the tool, the first step taken by a network behavior anomaly detection solution is to establish a baseline for the average user and network behavior. This baseline is established over a prolonged period; the longer the time, the more accurate and useful the collected behavior data. Once the solution captures and defines the ‘normal’ parameters, it flags outliers in real-time. Acknowledging the dynamic cyber threat landscape of 2022, software vendors have begun to include network behavior anomaly detection tools in their network security solution suites. 4. 
Recommendations for mitigating the identified security risks and securing the network against similar threats in the future. answer:Businesses of all sizes are susceptible to network security threats. Since hackers and cybercriminals are always looking for new ways to exploit network vulnerabilities, business owners must take steps to protect their data and infrastructure. This article will discuss five ways to prevent network security threats. The Importance of Network Security Before we discuss specific methods for thwarting network threats, it’s essential to understand the importance of network security. Having a secure network is vital to protecting data and preventing unauthorized access to systems. Additionally, maintaining a secure network can be part of meeting compliance requirements and protecting brand reputation (Bailkoski, 2021). Businesses that neglect network security are more likely to experience data breaches, which can be costly and damaging. Common Network Security Threats Businesses can face many types of threats to their networks. Some of the top network security risks include: Malware. Malware is a term used to describe a wide range of malicious software, including viruses, trojans, and spyware. Malware can be installed on a system without the user’s knowledge, where it can then cause damage or steal data. Spyware. Spyware is software that collects information about a user without their knowledge. It can track what websites a target visits and collect sensitive data, like passwords and credit card numbers. Phishing. Phishing attacks involve sending fraudulent emails or text messages to obtain sensitive information from recipients. The messages may appear to come from a legitimate source, such as a bank or credit card company, but are in reality sent by scammers. Ransomware. Ransomware is malware that locks users out of their computer or mobile device until a ransom payment is made. 
Ransomware viruses can be challenging to remove and can damage or delete files on a user’s system. Distributed Denial-of-Service (DDoS) attacks. A DDoS attack is one of the most dangerous types of security threats (Mathew, 2021). It is a type of cyberattack in which multiple systems flood a target with traffic, making it unavailable for legitimate users. DDoS attacks can be very costly and difficult to defend against. How to Prevent Network Attacks There are many different ways to defend against network-related threats. Here are five of the most effective methods. 1. Install antivirus software. One of the first lines of defense against malware and other viruses is to install antivirus software on all devices connected to a network (Roach & Watts, 2021). Antivirus software can detect and prevent malicious files from being installed on a system, and it should be updated regularly to include the latest definitions. 2. Create strong passwords. Another essential step in protecting a network is to create strong passwords. Passwords should be at least eight characters long and include a mix of letters, numbers, and symbols. They should also not be easy to guess—for instance, the user’s name or the name of the company. 3. Enforce security policies. A third way to reduce risk of attacks on a network is to enforce security policies. Security policies can help ensure that all devices on a network are protected against viruses and malware and that users are using strong passwords. These policies can also restrict access to some network regions and limit user privileges. 4. Use firewalls. Firewalls are another essential tool in defending networks against security threats. A firewall can help prevent unauthorized access to a network by blocking incoming traffic from untrusted sources. Additionally, firewalls can be configured to allow only certain types of traffic, such as web traffic or email. 5. Monitor activity. Finally, it’s important to monitor activity on the network. 
Tracking logs and other data enables suspicious activity to be identified quickly, allowing security personnel to take steps to investigate and mitigate potential threats. Consequences of Network Breaches Network security breaches can have severe consequences for businesses, including: Data loss. A network security breach can result in the loss of sensitive data, such as customer information or financial records. Damage to reputation. A breach can also damage a company’s reputation and make it difficult to regain the trust of customers and other stakeholders. Loss of revenue. In some cases, a network security breach can lead to a loss of revenue as customers take their business elsewhere. Increased costs. Breaches can also lead to increased costs, such as hiring new staff or upgrading security systems. How to Become a Network Security Engineer If you want to learn more about how to protect networks against security threats, consider enrolling to the best network security courses with accredited program provider EC-Council. EC-Council’s Certified Network Defender (C|ND) program is designed to cover everything you need to know about network security protection, from the basics to advanced techniques. The C|ND is designed to provide cybersecurity professionals with the knowledge and skills they need to defend networks against various security threats. The program covers a wide range of topics: Network security concepts. Get introduced to common security concepts, including viruses, malware, and firewalls. Network security threats. Learn about different network security threats, how to protect networks against them, and how to gain security access control. Operating system security. Understand the various features that can be used to secure Windows and Linux systems. Application security. Find out how to secure applications like web browsers and email clients. Networking fundamentals. Explore key networking concepts, such as TCP/IP packets and switches. 
Endpoint security. Learn about the different types of security measures that can be used to protect endpoint devices like laptops and smartphones. Traffic analysis. Become proficient in using tools like Wireshark to analyze network traffic and detect security threats. Incident response. Find out the steps that should be taken in the event of a security incident. Forensic investigation. Learn what occurs in the digital forensic investigation process, including how to collect evidence and identify the source of a security breach.

Expected Code:
1. Write Python code for network packet analysis with Scapy

answer: We set both machines to a Host-Only adapter to avoid any additional junk traffic on the network. The attacker machine has the IP 192.168.11.130 and the victim machine has the IP 192.168.11.131.

We will run a port scan to discover the services used by the victim machine while tcpdump runs on it to capture the network traffic generated by our actions. Let's run tcpdump using the following command:

    tcpdump -i <interface> -w file_name.pcap

Here -i selects the interface tcpdump will listen on, and -w writes the captured traffic into a file (you have to give the file name as its value).

Now it is time to simulate our attack on the victim.

[Figure: Port scan of the victim machine using Nmap]

We perform the port scan using Nmap. The options used in the command are:

▪ -Pn: skip host discovery (no ping request to the target).
▪ -n: disable DNS resolution.
▪ --open: display only open ports.
▪ -v: verbose output.

The results show that the FTP and SSH services are running. I disabled the ping and DNS requests to reduce the traffic. You could also use nmap to scan only the 21/ftp and 22/ssh ports with the -p option, giving it the ports you wish to scan separated by commas (e.g. -p 21,22,80,8080).

Read the traffic
Now it is time to analyze the traffic we captured. First, stop tcpdump using the CTRL+C keys.
After listing the files, you will see our capture file, traffic.pcap, as we saved it.

Before we start, we need python3 and the Scapy package installed. You can install Scapy using pip as follows: pip install scapy. You can use a text editor or an IDE for your code; I am going to use PyCharm here. Let's open the IDE and start coding.

Let's go through the script to understand the basics of Scapy.

    import scapy.all as scapy
    import argparse

Here we import the libraries we need: Scapy, as it's the main one for our topic, and argparse, to parse the input from command-line arguments.

    parser = argparse.ArgumentParser()
    parser.add_argument("-f", "--file", help="Read a single file.", type=str)
    args = parser.parse_args()

We create our parser and add an argument of type string, available as -f or --file. Then we parse the command-line arguments and store them in the args variable.

After that, we create a function named Start() that takes one argument called file: the path of the pcap file we want to read and analyze.

Now, the actual code inside our Start() function:

▪ print(f"[+] Reading: {file}"): prints the file path we provided.
▪ p = scapy.rdpcap(file): reads the pcap file and stores its packets in the variable p.
▪ packets = len(p): gets the length of what we have read, which is the number of packets, and stores it in the variable packets.
▪ print(f"[+] Number of packets {packets}"): prints the number of packets.

The next lines are a for loop over the range of the number of packets, starting from index 0 and ending just before the number of packets.

▪ pkt = p[i]: stores in the variable pkt the packet at index i, where i is the packet number.

To explain the rest of the code, we need to understand the format of packets in Scapy and how it parses them. So, we are going to use Scapy from the command-line interface to explain it.
In the above picture we read the pcap file into the p variable from the Scapy command-line interface and then evaluated p to get the output. It tells you information about the packets inside the file, such as the number of TCP, UDP, ICMP and other packets. If we show one of the packets, for example packet number 1 using p[1], we will get the following results: an=None ns=None ar=None |>>>>

Explaining the output:

Ether: Layer 2 captured data, like the MAC addresses.
IP: Layer 3 captured data, like the source and destination addresses.
UDP: Layer 4 protocol in use and the source and destination ports.

The rest is additional information that depends on the service used and the packet data. The UDP layer could instead be TCP, depending on the transport protocol used. For example, the following packet is a TCP packet.

Why did we need to know this? Because when you want information from a packet you have to specify the layer you want data from and which field you want. For instance, to get the destination port of a TCP packet we fetch it as packet["TCP"].dport, following the pattern packet["Layer"].field.

Now, back to the rest of our code. We use exception handling in the following code: first it tries to check whether the packet is TCP and prints the packet information with type TCP. If that lookup raises an exception, the except branch prints the packet as type UDP.
    try:
        # pkt["TCP"] raises IndexError when the packet has no TCP layer.
        if pkt["TCP"]:
            print("========================================================")
            print(f'[+] Packet Number: {i}, Version: IPv{pkt["IP"].version}, '
                  f'Type: TCP, Source IP: {pkt["IP"].src}, '
                  f'Destination IP: {pkt["IP"].dst}, Source Port: {pkt.sport}, Destination Port: {pkt.dport}')
            print("========================================================")
    except IndexError:
        # No TCP layer: the script assumes the packet is UDP.
        print("========================================================")
        print(f'[+] Packet Number: {i}, Version: IPv{pkt["IP"].version}, '
              f'Type: udp, Source IP: {pkt["IP"].src}, '
              f'Destination IP: {pkt["IP"].dst}, Source Port: {pkt.sport}, Destination Port: {pkt.dport}')
        print("========================================================")

The information that will be printed:

Packet Number: {i}: Packet number.
pkt["IP"].version: IP version v4/v6.
pkt["IP"].src: Source IP.
pkt["IP"].dst: Destination IP.
pkt.sport: Source Port.
pkt.dport: Destination Port.

Note that this code only distinguishes TCP from "everything else": a packet that is neither TCP nor UDP (for example ARP or ICMP) has no ports or no IP layer, so the except branch itself would fail on it.

Running the code and the results: Here we grep from the shell for the lines that contain udp, which are the UDP packets, and each line includes all the information we added to the code to be printed.

Manual Analysis for Port Scan traffic

After all we went through, it is time to analyze our captured file manually using Wireshark to see how the port scan we performed works and what the traffic of open and closed ports looks like. Then we will use Scapy to automate the detection of port scanning. Run Wireshark from the command line and provide the file to it:

    wireshark file.pcap

We can see a lot of traffic, so to make the analysis easier we are going to compare the traffic of an open port with that of a closed one. The display filter tcp.port==22 shows the traffic of port 22, which is the SSH service.
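The Wireshark display filter tcp.port==22 can be approximated in Scapy as well. The sketch below is a hedged illustration and not part of the original script: the helper names port_conversation and print_conversation are hypothetical, and it assumes the capture is loaded with scapy.rdpcap as shown earlier. It keeps only TCP packets whose source or destination port matches and returns their flags in capture order, so the SYN, SYN/ACK, ACK sequence of an open port stands out.

```python
def port_conversation(packets, port):
    """Return (src_ip, dst_ip, flags) for every TCP packet touching `port`.

    `packets` is any iterable of Scapy packets, e.g. the result of
    scapy.rdpcap("file.pcap"). Roughly equivalent to the Wireshark
    display filter tcp.port==<port>.
    """
    rows = []
    for pkt in packets:
        # Skip ARP, ICMP, UDP and other packets without IPv4 + TCP layers.
        if not (pkt.haslayer("IP") and pkt.haslayer("TCP")):
            continue
        tcp = pkt["TCP"]
        if tcp.sport == port or tcp.dport == port:
            rows.append((pkt["IP"].src, pkt["IP"].dst, str(tcp.flags)))
    return rows


def print_conversation(pcap_path, port):
    # Scapy is imported lazily so the helper above stays importable without it.
    import scapy.all as scapy

    for src, dst, flags in port_conversation(scapy.rdpcap(pcap_path), port):
        print(f"{src} => {flags} => {dst}")
```

Calling print_conversation("file.pcap", 22) on the capture from this walkthrough should list the same packets that the Wireshark filter shows.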
We can see the attacker 192.168.11.130 connecting to the victim 192.168.11.131 on port 22 as follows:

The attacker sends a connection request on port 22 with the SYN flag: Attacker => SYN => Victim
The victim responds with the SYN/ACK flags, which means the port is open: Victim => SYN/ACK => Attacker
The attacker sends the ACK flag; the connection is now fully established and the service can be used: Attacker => ACK => Victim
At the end the attacker sends RST/ACK, which closes the connection with the victim: Attacker => RST/ACK => Victim

The above analysis was for an open port. Let's see how it looks for a closed one, for example port 8080, which we know is closed. Filter for it using tcp.port==8080.

The attacker sends a connection request on port 8080 with the SYN flag: Attacker => SYN => Victim
The victim responds with RST/ACK, which means the port is closed: Victim => RST/ACK => Attacker

Now that we know the behaviour of both open and closed ports in the traffic, let's automate the detection.

Automated Analysis & Detection

From what we learned in the manual analysis, we can check the flags of the packets on each port and detect port scanning by analyzing the connection attempts on different ports. So let's take the short path: search for failed connections in the packets and see whether they come from the same IP.

    import scapy.all as scapy
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("-f", "--file", help="Read a single file.", type=str)
    args = parser.parse_args()

    flag = []  # flags collected for the port currently being checked

    def check_flags(attacker, server, port):
        # A SYN answered by RST/ACK means the port is closed.
        if flag[0] == "S" and flag[1] == "RA":
            print(f'[!] Failed connection: {attacker} ====> {server}:{port}')

    def Start(file):
        print(f"[+] Reading: {file}")
        p = scapy.rdpcap(file)
        packets = len(p)
        print(f"[+] Number of packets {packets}")
        for port in range(0, 65536):
            for i in range(0, packets):
                pkt = p[i]
                try:
                    if pkt.sport == port or pkt.dport == port:
                        if pkt.dport == port:
                            # Request towards the port (expected to carry SYN).
                            flag.append(str(pkt["TCP"].flags))
                        elif pkt.sport == port:
                            # Reply coming back from the port.
                            flag.append(str(pkt["TCP"].flags))
                            check_flags(pkt["IP"].dst, pkt["IP"].src, port)
                            flag.clear()
                except Exception:
                    # Packet without ports, TCP or IP layer, or an incomplete flag pair.
                    pass

    Start(args.file)

The code finds the packets that try to connect to a closed port and prints the results. Let's explain the code. Some parts are the same as in the previous script, so only the newly added lines are explained.

flag = []: Create an empty list.
for port in range(0, 65536): A for loop over all possible port numbers.

The following lines create a for loop over the range of the packet count, from index 0 up to the number of packets. We take every packet and check whether its source port or destination port equals our port number. Since the first packet sent has this port as its destination port and carries the SYN flag as an attempt to connect to it, we save its flags in the list first. After that we save the flags of the reply coming from the server, which has the same port as its source port. Then we call the function check_flags and pass the arguments to it. The function does the following:

    def check_flags(attacker, server, port):
        if flag[0] == "S" and flag[1] == "RA":
            print(f'[!] Failed connection: {attacker} ====> {server}:{port}')

This function takes 3 arguments: the attacker IP, the server IP and the port number. It checks whether the first and second elements of the list flag equal S and RA, which means a failed connection to a closed port.

flag.clear(): Clear the list after the check.
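Checking all 65,536 port numbers against every packet makes the script above slow on large captures. As a hedged alternative, the sketch below makes a single pass over the capture and pairs each SYN with the RST/ACK that answers it. The function names, the tuple layout and the threshold of 10 distinct ports are assumptions made for illustration; they do not come from the original script.

```python
from collections import defaultdict


def find_failed_connections(records):
    """records: iterable of (src_ip, dst_ip, sport, dport, flags), one per TCP packet.

    Returns a list of (attacker_ip, server_ip, port) for every SYN that was
    answered with RST/ACK, i.e. a connection attempt to a closed port.
    """
    pending = set()  # SYNs seen so far: (client, server, client_port, server_port)
    failed = []
    for src, dst, sport, dport, flags in records:
        if flags == "S":
            pending.add((src, dst, sport, dport))
        elif flags == "RA" and (dst, src, dport, sport) in pending:
            # The server (src) refused the SYN previously sent by the client (dst).
            pending.discard((dst, src, dport, sport))
            failed.append((dst, src, sport))
    return failed


def suspected_scanners(failed, min_ports=10):
    """Keep (attacker, server) pairs with at least `min_ports` distinct closed ports."""
    ports_by_pair = defaultdict(set)
    for attacker, server, port in failed:
        ports_by_pair[(attacker, server)].add(port)
    return {pair: sorted(ports) for pair, ports in ports_by_pair.items()
            if len(ports) >= min_ports}


def records_from_pcap(path):
    # Lazy import: only this adapter needs Scapy installed.
    import scapy.all as scapy

    for pkt in scapy.rdpcap(path):
        if pkt.haslayer("IP") and pkt.haslayer("TCP"):
            tcp = pkt["TCP"]
            yield pkt["IP"].src, pkt["IP"].dst, tcp.sport, tcp.dport, str(tcp.flags)
```

A possible call would be suspected_scanners(find_failed_connections(records_from_pcap("file.pcap"))). Keeping the matching logic free of Scapy objects makes it easy to try out on hand-written tuples first.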
Running the script: As you can see, there are a lot of failed connections from the same IP address on different ports. If you look closely at the picture, you will see that port 22 is not listed, because it is an open port and the connection succeeded.

Conclusion

Port scanning comes in many types, and what we saw in this blog was just one example. I recommend that you go through the Scapy documentation, try different scan types in your own environment, analyze the traffic manually and then automate the detection. That way you will be able to detect each scan type.

Question - 2

Imagine you are working as a cybersecurity analyst at a prestigious firm. Recently, your company has been experiencing a surge in cyber attacks, particularly through phishing emails and websites. These attacks have not only compromised sensitive information but also tarnished the reputation of the company. In light of these events, your team has been tasked with developing a robust solution to detect and mitigate phishing websites effectively. Leveraging your expertise in Python programming and cybersecurity, your goal is to create a program that can accurately identify phishing websites based on various features and indicators.

Assignment Task: Using the Python programming language, develop a phishing website detection system that analyzes website characteristics and determines the likelihood of it being a phishing site

answer: Phishing is a form of fraudulent attack where the attacker tries to gain sensitive information by posing as a reputable source. In a typical phishing attack, a victim opens a compromised link that poses as a credible website. The victim is then asked to enter their credentials, but since it is a “fake” website, the sensitive information is routed to the hacker and the victim gets “hacked.” Phishing is popular since it is a low effort, high reward attack.
Most modern web browsers, antivirus software and email clients are pretty good at detecting phishing websites at the source, helping to prevent attacks. To understand how they work, this blog post will walk you through a tutorial that shows you how to build your own phishing URL detector using Python and machine learning: Identify the criteria that can recognize fake URLs Build a decision tree that can iterate through the criteria Train our model to recognize fake vs real URLs Evaluate our model to see how it performs Check for false positives/negatives Get Started: Install ML Tools With This Ready-To-Use Python Environment To follow along with the code in this Python phishing detection tutorial, you’ll need to have a recent version of Python installed, along with all the packages used in this post. The quickest way to get up and running is to install the Phishing URL Detection runtime for Windows or Linux, which contains a version of Python and all the packages you’ll need. Required Packages In order to download the ready-to-use phishing detection Python environment, you will need to create an ActiveState Platform account. Just use your GitHub credentials or your email address to register. Signing up is easy and it unlocks the ActiveState Platform’s many benefits for you! 
Supported Platforms

For Windows users, run the following at a CMD prompt to automatically download and install our CLI, the State Tool, along with the Phishing URL Detection runtime into a virtual environment:

    powershell -Command "& $([scriptblock]::Create((New-Object Net.WebClient).DownloadString('https://platform.activestate.com/dl/cli/install.ps1'))) -activate-default Pizza-Team/Phishing-URL-Detection"

For Linux users, run the following to automatically download and install our CLI, the State Tool, along with the Phishing URL Detection runtime into a virtual environment:

    sh <(curl -q https://platform.activestate.com/dl/cli/install.sh) --activate-default Pizza-Team/Phishing-URL-Detection

1 - How to Identify A Fraudulent URL

A fraudulent domain or phishing domain is a URL that looks suspicious for a variety of reasons. Most commonly, the URL:

Is misspelled
Points to the wrong top-level domain
Is a combination of a valid and a fraudulent URL
Is incredibly long
Is just an IP address
Has a low pagerank
Has a young domain age
Ranks poorly on the Alexa Top 1 Million Sites

All these are characteristics of a phishing URL that can help us distinguish it from a valid URL. These characteristics can be converted into machine learning feature sets such as numbers, labels and booleans. The University of California, Irvine put together a dataset identifying fraudulent versus valid URLs. Feature sets are divided into four main categories:

Address Bar-Based Features – these are features extracted from the URL itself, like URL length >54 characters, or whether it contains an IP address, uses a URL shortening service like TinyURL or Bitly, or employs redirection.
Additional features may also include:

Adding a prefix or suffix separated by (-) to the domain
Having sub-domain and multi-sub-domains
Existence of HTTPS
Domain registration age
Favicon loading from a different domain
Using a non-standard port

Abnormal Features – these may include:

Loading images in the body from a different URL
Minimal use of meta tags
The use of a Server Form Handler (SFH)
Submitting information to email
An abnormal URL

HTML and JavaScript-Based Features – these can include things like:

Website forwarding
Status bar customization, typically using JavaScript to display a fake URL
Disabling the ability to right-click so users can’t view page source code
Using pop-up windows
iFrame redirection

Domain-Based Features – these can include:

Unusually young domains
Suspicious DNS record
Low volume of website traffic
PageRank, where 95% of phishing webpages have no PageRank
Whether the site has been indexed by Google

2 - Building A Decision Tree

Given all the criteria that can help us identify phishing URLs, we can use a machine learning algorithm, such as a decision tree classifier, to help us decide whether a URL is valid or not. First, let’s download the UC Irvine dataset and explore its contents.
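The dataset ships these features already encoded, but it can help to see how a few of the address bar-based ones might be derived from a raw URL. The following is a minimal sketch using only the Python standard library. The function name, the shortener list and the 1/-1 encoding (1 meaning the suspicious trait is present) are illustrative assumptions and do not necessarily match the encoding used in the UC Irvine dataset.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical, deliberately short list of URL shortening services.
SHORTENERS = {"bit.ly", "tinyurl.com"}


def address_bar_features(url):
    """Return a dict of simple address-bar features for one URL.

    Encoding used here (an assumption for this sketch):
    1 = suspicious trait present, -1 = absent.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # A host that parses as an IP address instead of a domain name is suspicious.
    try:
        ipaddress.ip_address(host)
        has_ip = True
    except ValueError:
        has_ip = False

    def flag(condition):
        return 1 if condition else -1

    return {
        "having_ip_address": flag(has_ip),
        "long_url": flag(len(url) > 54),
        "shortening_service": flag(host in SHORTENERS),
        "having_at_symbol": flag("@" in url),
        "prefix_suffix": flag("-" in host),
        "no_https": flag(parsed.scheme != "https"),
        "non_standard_port": flag(parsed.port not in (None, 80, 443)),
    }
```

For instance, address_bar_features("http://192.168.0.1:8080/login") flags the IP address, the missing HTTPS and the non-standard port. Features such as domain age or PageRank need external lookups and are left out of this sketch.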
The feature list contains:

having_IP_Address { -1,1 }
URL_Length { 1,0,-1 }
Shortining_Service { 1,-1 }
having_At_Symbol { 1,-1 }
double_slash_redirecting { -1,1 }
Prefix_Suffix { -1,1 }
having_Sub_Domain { -1,0,1 }
SSLfinal_State { -1,1,0 }
Domain_registeration_length { -1,1 }
Favicon { 1,-1 }
port { 1,-1 }
HTTPS_token { -1,1 }
Request_URL { 1,-1 }
URL_of_Anchor { -1,0,1 }
Links_in_tags { 1,-1,0 }
SFH { -1,1,0 }
Submitting_to_email { -1,1 }
Abnormal_URL { -1,1 }
Redirect { 0,1 }
on_mouseover { 1,-1 }
RightClick { 1,-1 }
popUpWidnow { 1,-1 }
Iframe { 1,-1 }
age_of_domain { -1,1 }
DNSRecord { -1,1 }
web_traffic { -1,0,1 }
Page_Rank { -1,1 }
Google_Index { 1,-1 }
Links_pointing_to_page { 1,0,-1 }
Statistical_report { -1,1 }

And finally, the Result designates whether the URL is valid or not:

Result { -1,1 }

Where -1 denotes an invalid URL and 1 is a valid URL. Now let’s jump into the code. First, we load the required modules:

    # To perform operations on dataset
    import pandas as pd
    import numpy as np

    # Machine learning model
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Visualization
    from sklearn import metrics
    from sklearn.metrics import confusion_matrix
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.tree import export_graphviz

Next we read the dataset and set the output file paths:

    df = pd.read_csv('.../dataset.csv')
    dot_file = '.../tree.dot'
    confusion_matrix_file = '.../confusion_matrix.png'

And then print the first rows:

    print(df.head())

      -1 1 1.1 1.2 -1.1 -1.2 -1.3 -1.4 -1.5 1.3 1.4 -1.6 1.5 -1.7 1.6 ... -1.9 -1.10 0 1.7 1.8 1.9 1.10 -1.11 -1.12 -1.13 -1.14 1.11 1.12 -1.15 -1.16
    0 1 1 1 1 1 -1 0 1 -1 1 1 -1 1 0 -1 ... 1 1 0 1 1 1 1 -1 -1 0 -1 1 1 1 -1
    1 1 0 1 1 1 -1 -1 -1 -1 1 1 -1 1 0 -1 ... -1 -1 0 1 1 1 1 1 -1 1 -1 1 0 -1 -1
    2 1 0 1 1 1 -1 -1 -1 1 1 1 -1 -1 0 0 ... 1 1 0 1 1 1 1 -1 -1 1 -1 1 -1 1 -1
    3 1 0 -1 1 1 -1 1 1 -1 1 1 1 1 0 0 ... 1 1 0 -1 1 -1 1 -1 -1 0 -1 1 1 1 1
    4 -1 0 -1 1 -1 -1 1 1 -1 1 1 -1 1 0 0 ... -1 -1 0 1 1 1 1 1 1 1 -1 1 -1 -1 1

The output shows the first 5 rows and the 31 columns of the dataset, where each column contains a value for one of the attributes we discussed in the above section (30 features plus the Result). Note that the column labels in this output are themselves data values (-1, 1, 1.1, ...): the CSV file appears to have no header row, so pandas used the first record as the header. Passing header=None to read_csv would keep that record as data.

3 - Train the Model

As always, the first step in training a machine learning model is to split the dataset into testing and training data:

    X = df.iloc[:, :-1]
    y = df.iloc[:, -1]
    Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=0)

Since the features are categorical values (-1, 0, 1), a Decision Tree, Random Forest classifier or Logistic Regression is a reasonable choice, since these models are well suited to classification. In this case, I chose to work with a Decision Tree, because it’s straightforward and tends to give good results on this kind of data.

    model = DecisionTreeClassifier()
    model.fit(Xtrain, ytrain)

4 - Evaluate the Model

Now that the model is trained, let’s see how well it does on the test data:

    ypred = model.predict(Xtest)
    # classification_report expects the true labels first, then the predictions
    print(metrics.classification_report(ytest, ypred))
    print("\n\nAccuracy Score:", metrics.accuracy_score(ytest, ypred).round(2)*100, "%")

We used the model to predict Xtest data. Now let’s compare the results to ytest and see how well we did:

                  precision    recall  f1-score   support

              -1       0.95      0.95      0.95      1176
               1       0.96      0.96      0.96      1588

       micro avg       0.96      0.96      0.96      2764
       macro avg       0.96      0.96      0.96      2764
    weighted avg       0.96      0.96      0.96      2764

    Accuracy Score: 96.0 %

Not bad! We made literally no modifications to the data and achieved an accuracy score of 96%. From here, you can dive deeper into the data and see if there’s any transformation that can be done to further improve the accuracy of prediction.

5 - Identify False Positives & False Negatives

The results of any decision tree evaluation are likely to contain both false positives (URLs that are actually valid, but that our model indicates are not), as well as false negatives (URLs that are actually bad, but our model indicates are fine).
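The two error types just defined can be counted directly from the true labels and the predictions. The sketch below uses only the standard library. It assumes, as in the dataset description, that -1 marks a phishing URL and 1 a valid one, and it treats phishing as the positive class; the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, phishing=-1):
    """Count TP/TN/FP/FN with the phishing label as the positive class."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == phishing:
            # Model says "phishing": correct (tp) or a valid URL wrongly flagged (fp).
            counts["tp" if actual == phishing else "fp"] += 1
        else:
            # Model says "valid": a missed phishing URL (fn) or correct (tn).
            counts["fn" if actual == phishing else "tn"] += 1
    return counts


def error_rates(counts):
    """Return (false positive rate, false negative rate) from the counts above."""
    valid_total = counts["fp"] + counts["tn"]
    phishing_total = counts["fn"] + counts["tp"]
    fpr = counts["fp"] / valid_total if valid_total else 0.0
    fnr = counts["fn"] / phishing_total if phishing_total else 0.0
    return fpr, fnr
```

With the variables from this post, confusion_counts(ytest, ypred) would return the four counts as plain numbers, which is handy when a false negative (a missed phishing URL) is considered more costly than a false positive.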
To help resolve these instances, let’s draw out a confusion matrix (a table with 4 different combinations of predicted and actual values) for our results. The matrix will help us identify:

True Positives
True Negatives
False Positives (Type 1 Error)
False Negatives (Type 2 Error)

    mat = confusion_matrix(ytest, ypred)
    sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False)
    plt.xlabel('true label')
    plt.ylabel('predicted label')
    plt.savefig(confusion_matrix_file)

As you can see, the numbers of false positives and false negatives are pretty low compared to our true positives and negatives, so we can be fairly confident in our results. To see how the decision tree arrived at these decisions, we can visualize it with sklearn and Graphviz.

    export_graphviz(model, out_file=dot_file, feature_names=X.columns.values)

    >> dot -Tpng tree.dot -o tree.png

We use export_graphviz to create a dot file of the decision tree, which is a text file that lets us visualize the actual bifurcations in decisions. Then, using the command line tool dot, we convert the text file to a PNG image which shows our final “tree” of decisions (open it in a new tab to view the details):

Decision Tree

Phishing URL Detection with Python: Summary

These days, when everyone is working from home, there’s a lot less opportunity to just casually ask your office colleagues if they’ve received a suspicious email like the one you just got. And attackers know it, driving a 300% increase in cybercrime since the start of the pandemic. It’s always good practice to check every link before you click on it, but of course, busy employees can get careless. This blog post showed you how, given a set of criteria that can typically identify phishing URLs, you can build and train a simple decision tree model to evaluate any given URL, and indicate whether it is actually valid or not with 96% accuracy. Now, if only it was as easy as this to prevent people from clicking fraudulent links in the first place!
You can find the criteria for evaluating phishing URLs in UC Irvine’s dataset. To get started building your own URL phishing detector, sign up for a free ActiveState Platform account so you can download our Phishing URL Detection runtime environment and get started faster.

Expected Procedure: 1. Accept 2 web URL. One real and another one phishing.

answer: URL phishing attacks can use various means to trick a user into clicking on the malicious link. For example, a phishing email may claim to be from a legitimate company asking the user to reset their password due to a potential security incident. Alternatively, the malicious email may claim that the user needs to verify their identity for some reason by clicking on the malicious link. Once the link has been clicked, the user is directed to the malicious phishing page. This page may be designed to harvest a user’s credentials or other sensitive information under the guise of updating a password or verifying a user’s identity. Alternatively, the site may serve a “software update” for the user to download and execute that is actually malware.

How To Identify URL Phishing

URL phishing attacks use trickery to convince the target that they are legitimate. Some of the ways to detect a URL phishing attack are to:

Ignore Display Names: Phishing emails can be configured to show anything in the display name. Instead of looking at the display name, check the sender’s email address to verify that it comes from a trusted source.
Verify the Domain: Phishers will commonly use domains with minor misspellings or that seem plausible. For example, company.com may be replaced with cormpany.com or an email may be from company-service.com. Look for these misspellings; they are good indicators.
Check the Links: URL phishing attacks are designed to trick recipients into clicking on a malicious link. Hover over the links within an email and see if they actually go where they claim.
Enter suspicious links into a phishing verification tool like phishtank.com, which will tell you if they are known phishing links. If possible, don’t click on a link at all; visit the company’s site directly and navigate to the indicated page. How To Protect From URL Phishing URL phishing attacks can be detected in a few different ways. Some of the common solutions include: URL Filtering: Some phishing URLs are used multiple times and are included in threat intelligence feeds. Blocking these known-bad URLs can help to prevent less-sophisticated phishing emails from reaching users’ inboxes. Domain Reputation: Anti-phishing products commonly look for warning signs of phishing URLs within emails. For example, a domain that is only a few hours old is likely malicious. DMARC Verification: DMARC verification uses SPF or DKIM to verify that an email originates from the alleged source domain. This helps with detecting and blocking spoofed source addresses. These common phishing detection mechanisms can catch the low-hanging fruit. However, phishers are growing more sophisticated and using methods that bypass these common techniques. For example, phishing sites may be hosted on SaaS solutions, which provides them with legitimate domains. Protecting against these more sophisticated attacks requires a more robust approach to URL scanning. URL Phishing Protection With Check Point Check Point and Avanan have developed an anti-phishing solution that provides improved URL phishing protection compared to common techniques. This includes post-delivery protection, endpoint protection to defend against zero-day threats, and the use of contextual and business data to identify sophisticated phishing emails. 2. Analyze the data from both the websites. answer:Before you begin any type of sales conversation, you first need to understand how the client is performing. 
Using Website Performance insights will enable you to identify key insights on your client's business, allowing you to make informed suggestions on the ways in which their website engages within the right context of their competitive landscape. We’ll cover the following key points: Analyzing digital performance. Uncovering the digital strategy of your prospect and its competitors. Uncovering the technologies used on websites. Getting Started Get started by choosing a prospect you want to engage with, either in an email, a call, or an in-person meeting. From the Sales Intelligence home page, type your prospect/customer's domain in the Search bar, and follow the instructions below. How to Analyze Website Performance Benchmarking your prospects’ website reach and analyzing the quality of their traffic is important in understanding how effective their marketing activities are. This analysis will allow you to better understand the growth potential, areas for improvement, and competitor strengths. Use the ‘Compare’ button to add up to four additional domains as a comparison - choose websites that compete with your prospect. Not sure who you should use as a comparison? Click here to learn how to find the right competitors and benchmark their performance. From the left menu, select Traffic and Engagement to understand and benchmark their reach, health, and performance. Use the following metrics to assess your prospect’s reach and generate the right insight: Use total visits to understand the total audience reach of your prospect vs. their main competitors. Total Visits Use Device Distribution to understand how their audience is behaving based on the devices they are using to visit the site. Device Distribution Engagement overview: See a summary of how successful a website is in generating and retaining visitors, which also highlights key insights on website experience. 
Pages/Visit: The average amount of time spent, or amount of pages being viewed, on an average user visit to the site. For eCommerce sites, a low amount of time or number of pages means consumers are likely not being exposed to enough products or even reaching the checkout page. Bounce Rate: This is a vital performance metric within the e-commerce industry in particular, as it shows the average amount of visitors who leave a website after viewing only 1 page, hence not converting as well as viewing a limited amount of content/products. Engagement Overview Uncover the Digital Strategy of Your Prospect and Its Competitors Traffic channels show you the ways in which people come to a website and are a suggestion of the overall effort a business puts into its digital marketing strategy. From the left menu, select Marketing Channels as a 'behind the scenes' view into the marketing strategies of your prospects and their competitors. Tip: By viewing total visits, you can identify which website is stronger in different channels. By viewing a percentage, you can compare which channels are the most significant for each website. Channels overview What traffic sources you can track with Similarweb and what they reveal Direct: Use this metric to assess a website’s brand strength (awareness and demand) Email: A website that receives a large amount of traffic from email is likely to have a large loyal customer base that engages via an owned mailing list. Referrals: A website that receives a large amount of traffic from Referrals is likely to have a strong affiliate strategy or enjoy significant media coverage. Social: A website that generates high and consistent traffic from Social is likely to have a loyal community of users. Organic Search: The site is well-optimized for SEO. When there’s a correlation with Direct traffic, it indicates strong brand awareness as many of the Organic visits are generated by branded terms. 
Paid Search: A website that generates a large amount of traffic from Paid Search is spending advertising budgets to increase brand awareness or target relevant audiences for specific products Display Advertising: A website that generates a large amount of traffic from Display ads is spending advertising budgets on increasing brand awareness or targeting relevant audiences for specific products. If you want to identify the best way for your client to optimize their search engine strategies, for example, we can use the organic search channel to highlight keyword opportunities that your client is currently losing traffic for. Uncovering the Technologies Used on Websites From the left menu, select Website Technologies. It's a great way to see what technologies are already embedded on a website. Use this to find the following insights: Whether they are already working with your competitors If you are offering easy integrations to certain technologies, check if they are already working with the relevant solutions. Which other technologies are they working with, and is that an indication that they are likely to buy. 3. Identify the phishing site. answer:A phishing website is a website used by cybercriminals for malicious purposes, like credential theft or financial fraud. People frequently visit phishing websites having clicked on a phishing link in a malicious email. Phishing websites can be created using spoofed or lookalike domains or they can be built as part of a compromised legitimate website (this is a social engineering technique known as water-holing). Cybercriminals can use phishing websites in multiple different ways. 
For example, the target might be presented with a log-in screen to enter their credentials, which are then scraped by the cybercriminal for use in account takeover attacks; or they might be prompted to enter payment details to confirm an order or pay for an item that will never arrive; or they might even automatically download malicious files or do so via a prompt on the webpage. As phishing websites are one of the most common types of payload used in phishing attacks, here are our top six methods for identifying them. (Plus, take a read of this article for more information on how to spot a phishing link.)

Six tips for how to identify fraudulent websites

Check the URL

One of the first steps you should take to check whether a website is legitimate is to look at the URL. There should be a padlock symbol in the address bar and the URL should begin with 'https://'. This indicates that the connection to the website is encrypted and secured with an SSL/TLS certificate. However, although it's good practice to look for these details, you can't rely on this information alone. It's estimated that over half of all phishing websites now use SSL protection in a bid to fool visitors. Another indicator you need to look at is the spelling of the web address. Cybercriminals take advantage of the fact that people tend to skim read information. As such, they will create web addresses that are similar to well-known and trusted ones to launch their phishing attacks. For example, a web address that usually ends in '.org' may be changed to '.com', or letters could be substituted with numbers, such as ‘amazon.com’ changed to ‘amaz0n.com’.

Redirects to phishing websites, including URL shorteners

Be aware that if you clicked on a link in an email or SMS message that looks legitimate, you could have been redirected to a fraudulent site.
The cybercriminal can use text that appears innocent (the URL for the legitimate website or even a prompt like ‘Sign in’) to hide their malicious URL. To try to avoid detection, cybercriminals launching advanced attacks can also put redirects in place once an email has been delivered. This is known as post-delivery weaponization. Similarly, URL shorteners can be used to hide the phishing link, with legitimate services being used to avoid detection. The text within the phishing email will contain the shortener URL, which will redirect to the phishing website once clicked. So even if you’ve clicked on a seemingly harmless hyperlink, you need to remain alert to the risk of phishing. Take a close look at the content Is the website looking sub-standard, for example low-quality images or branding (including logos) or poor spelling and grammar? This can signal that you’re on a phishing website. Most legitimate businesses will invest a lot of money and time in creating a well-designed and highly polished website where the language is correct, the graphics are sharp, and the user experience makes sense. Here are some common red flags you should look for: Simple spelling and grammar mistakes Subpar language (for example, broken English) Low-resolution images Another indication that you may be on a phishing website is the lack of a ‘contact us’ page. Authentic businesses usually provide contact details, including their postal address, phone number, email address and social media links. If this has been omitted, treat it with suspicion. While the lack of contact details can still indicate a phishing website today, some cybercriminals create simple ‘contact us’ pages or add this information to the webpage footer to make their attacks appear more legitimate. In templated brand impersonation attacks, the legitimate brand’s information might be scraped from elsewhere and used on the phishing website. 
Think about your journey

Did you visit the website directly, through a search engine, or did you click on an emailed link? If you’re having doubts about the legitimacy of a website and you arrived there by clicking a link, then before you take any action, renavigate there by typing a known address (e.g. ‘www.amazon.com’) into your browser or search for the brand name via a search engine. Even if you believe the email to be from a reputable source, if you weren’t expecting it, then use the two previous steps to check the legitimacy of the website (and if in doubt, navigate there another way or contact the sender using a method that isn’t the original email).

A good tip to avoid a successful phishing attack in this instance is to bookmark your frequently visited websites once you've verified their authenticity. That way, you can rest assured that you're in the right place and won't fall victim to phishing attacks that impersonate those brands. If it's a new website that you haven't visited before, take the time to manually visit the website via your usual browser and ensure there doesn’t appear to be anything malicious about the site, using tips such as those from this article.

Expected Code: 1.
Phishing Website Detection with Python

answer:

# coding: utf-8

import random

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split

# ## Collection of Data
legitimate_urls = pd.read_csv("legitimate-urls.csv")
phishing_urls = pd.read_csv("phishing-urls.csv")

print(len(legitimate_urls))
print(len(phishing_urls))

# ## Data Preprocessing
# #### The data is in two dataframes with the same column names, so we merge them into one.
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
urls = pd.concat([legitimate_urls, phishing_urls])

print(urls.head(5))
print(len(urls))
print(urls.columns)

# #### Removing unnecessary columns
urls = urls.drop(urls.columns[[0, 3, 5]], axis=1)
print(urls.columns)

# #### Because the two dataframes were merged, the top ~1000 rows hold the legitimate URLs
# #### and the bottom ~1000 rows hold the phishing URLs. Shuffle the rows so that the
# #### training set and the test set both contain a mix of the two classes.
urls = urls.sample(frac=1, random_state=100).reset_index(drop=True)

# #### Removing the class variable from the dataset
urls_without_labels = urls.drop('label', axis=1)
print(urls_without_labels.columns)
labels = urls['label']

# #### Splitting the data into train data and test data
# random.seed only seeds Python's own random module; the split itself is made
# reproducible by random_state below.
random.seed(100)
data_train, data_test, labels_train, labels_test = train_test_split(
    urls_without_labels, labels, test_size=0.20, random_state=100)
print(len(data_train), len(data_test), len(labels_train), len(labels_test))
print(labels_train.value_counts())
print(labels_test.value_counts())

# #### Checking whether the data is split in equal distribution or not
# (disabled; the counts below do not match test_size=0.20 and appear to come from an earlier run)
"""
train_0_dist = 711/1410
print(train_0_dist)
train_1_dist = 699/1410
print(train_1_dist)
test_0_dist = 306/605
print(test_0_dist)
test_1_dist = 299/605
print(test_1_dist)
"""

# ## Random Forest
RFmodel = RandomForestClassifier()
RFmodel.fit(data_train, labels_train)
rf_pred_label = RFmodel.predict(data_test)
# print(list(labels_test)), print(list(rf_pred_label))

# Rows of the confusion matrix are the true labels, columns are the predicted labels.
cm2 = confusion_matrix(labels_test, rf_pred_label)
print(cm2)
print(accuracy_score(labels_test, rf_pred_label))

"""
# Saving the model to a file
import pickle
file_name = "RandomForestModel.sav"
pickle.dump(RFmodel, open(file_name, 'wb'))
"""
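The script above prints the raw confusion matrix and the accuracy. For phishing detection it is often more informative to know how many phishing sites were missed and how many legitimate sites were wrongly flagged. The sketch below shows how those figures follow from a 2x2 confusion matrix. It assumes the layout scikit-learn uses for binary labels sorted as 0 then 1, that is `[[TN, FP], [FN, TP]]`, and it assumes that label 1 means "phishing", which the source does not state. `binary_metrics` is a hypothetical helper, and the counts in `example_cm` are made up for illustration.

```python
def binary_metrics(cm):
    """Derive summary metrics from a 2x2 confusion matrix laid out as
    [[TN, FP], [FN, TP]] (rows = true class, columns = predicted class)."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total,
        # Of the sites flagged as phishing, how many really were phishing.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the real phishing sites, how many were caught.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Of the legitimate sites, how many were wrongly flagged.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Illustrative counts only, not results from the dataset above.
example_cm = [[190, 10], [20, 180]]
for name, value in binary_metrics(example_cm).items():
    print(f"{name}: {value:.3f}")
```

Under the same assumptions, `binary_metrics(cm2)` could be called on the matrix computed in the script, provided the labels really are binary.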