Answer the following questions using data analysis.
What is the most frequent time of day for internet activity? How often does the IP address change? How often does the device change? What is the average usage per hour, per day, and per month?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv(r"C:\Users\ahlad\Downloads\ML_DL_Assignment_1.zip")
data.head()
| | name | start_time | usage_time | IP | MAC | upload | download | total_transfer | seession_break_reason |
|---|---|---|---|---|---|---|---|---|---|
0 | user1 | 2022-05-10 02:59:32 | 00:00:36:28 | 10.55.14.222 | 48:E7:DA:58:22:E9 | 15861.76 | 333168.64 | 349030.40 | Idle-Timeout |
1 | user1 | 2022-05-10 18:53:27 | 00:01:49:56 | 10.55.2.253 | 48:E7:DA:58:22:E9 | 16957.44 | 212152.32 | 229109.76 | Idle-Timeout |
2 | user1 | 2022-05-10 21:20:44 | 00:01:35:00 | 10.55.2.253 | 48:E7:DA:58:22:E9 | 14080.0 | 195153.92 | 209233.92 | Idle-Timeout |
3 | user1 | 2022-05-11 00:37:42 | 00:00:26:00 | 10.55.2.253 | 48:E7:DA:58:22:E9 | 5242.88 | 40806.4 | 46049.28 | Idle-Timeout |
4 | user1 | 2022-05-11 02:59:38 | 00:00:11:52 | 10.55.2.253 | 48:E7:DA:58:22:E9 | 22067.2 | 10772.48 | 32839.68 | Idle-Timeout |
import datetime
from IPython.display import display
# Data cleaning and preprocessing
data.dropna(inplace=True) # Drop any missing values
data.drop_duplicates(inplace=True) # Drop any duplicate values
data['start_time'] = pd.to_datetime(data['start_time']) # Convert timestamp column to datetime
# What is the most frequent internet activity time of the day?
data['hour'] = data['start_time'].dt.hour
freq_activity_time = data['hour'].mode()[0]
print("The most frequent internet activity time of the day is: ", freq_activity_time, "hours.")
The most frequent internet activity time of the day is: 22 hours.
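`mode()[0]` returns a single hour and hides both ties and how close the runner-up hours are. A minimal sketch of looking at the full per-hour session count instead, run on a hypothetical five-row sample (timestamps copied from the `data.head()` output; `sample`, `sessions_per_hour` and `busiest_hours` are illustrative names, not part of the original notebook):

```python
import pandas as pd

# Hypothetical sample standing in for the real `data` frame
sample = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2022-05-10 02:59:32",
        "2022-05-10 18:53:27",
        "2022-05-10 21:20:44",
        "2022-05-11 00:37:42",
        "2022-05-11 02:59:38",
    ])
})

# Number of sessions that started in each hour of the day
sessions_per_hour = sample["start_time"].dt.hour.value_counts()

# Keep every hour that reaches the maximum count, so ties stay visible
top_count = sessions_per_hour.max()
busiest_hours = sorted(sessions_per_hour[sessions_per_hour == top_count].index.tolist())

print(sessions_per_hour)
print("Busiest hour(s):", busiest_hours)
```

On the real data the same two lines can be applied to `data['hour']` directly.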
# How often does the IP change?
num_ip_changes = (data['IP'] != data['IP'].shift()).sum()
print("The IP changes ", num_ip_changes, "times.")
The IP changes 2304 times.
# How often does the device change?
num_device_changes = (data['MAC'] != data['MAC'].shift()).sum()
print("The MAC changes ", num_device_changes, "times.")
The MAC changes 1224 times.
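Comparing a column with its own `shift()` over the whole frame also counts the very first row (it is compared with `NaN`) and, if the file holds several users, every row where one user's sessions end and the next user's begin. A minimal sketch of counting changes per user instead, assuming sessions should be ordered by `start_time` within each `name`; the helper `count_changes` and the `user2` rows are hypothetical:

```python
import pandas as pd

def count_changes(df: pd.DataFrame, column: str) -> pd.Series:
    """Count, per user, how many sessions differ from that user's previous session in `column`."""
    ordered = df.sort_values(["name", "start_time"])
    previous = ordered.groupby("name")[column].shift()
    # A user's first session has no predecessor (NaN), so it is not a change
    changed = previous.notna() & (ordered[column] != previous)
    return changed.groupby(ordered["name"]).sum()

# Hypothetical sample: user1 rows follow data.head(); user2 rows are invented
sample = pd.DataFrame({
    "name": ["user1", "user1", "user1", "user2", "user2"],
    "start_time": pd.to_datetime([
        "2022-05-10 02:59:32",
        "2022-05-10 18:53:27",
        "2022-05-10 21:20:44",
        "2022-05-10 08:00:00",
        "2022-05-10 09:00:00",
    ]),
    "IP": ["10.55.14.222", "10.55.2.253", "10.55.2.253", "10.55.0.1", "10.55.0.2"],
    "MAC": ["48:E7:DA:58:22:E9"] * 3 + ["AA:AA:AA:AA:AA:01", "AA:AA:AA:AA:AA:02"],
})

ip_changes = count_changes(sample, "IP")
mac_changes = count_changes(sample, "MAC")
print(ip_changes)
print(mac_changes)
```

Dividing each user's count by that user's number of sessions (or by the number of days observed) would turn the raw count into a rate, which is closer to "how often".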
# What is the average usage per hour, per day and per month?
data['date'] = data['start_time'].dt.date  # Calendar date of each session
data['month'] = data['start_time'].dt.month  # Month number of each session
data['total_transfer'] = pd.to_numeric(data['total_transfer'])
avg_usage_hour = data.groupby('hour')['total_transfer'].mean()
avg_usage_day = data.groupby('date')['total_transfer'].mean()
avg_usage_month = data.groupby('month')['total_transfer'].mean()
print("The average internet usage per hour is:\n", avg_usage_hour)
print("The average internet usage per day is:\n", avg_usage_day)
print("The average internet usage per month is:\n", avg_usage_month)
The average internet usage per hour is:
hour
0     464530.443023
1     530880.856788
2     431576.112743
3     345303.341176
4     359809.443333
5     275960.910769
6     468959.586757
7     292886.830164
8     366681.918762
9     377480.638954
10    393259.119955
11    309492.445992
12    310137.981415
13    335270.579648
14    472403.712765
15    517005.111506
16    403919.401872
17    525423.692116
18    666590.764187
19    389841.785382
20    355862.804027
21    474038.339233
22    449600.499185
23    407785.083903
Name: total_transfer, dtype: float64
The average internet usage per day is:
date
2022-05-09    109844.480000
2022-05-10    151600.782667
2022-05-11    411055.589200
2022-05-12    340207.616000
2022-05-13    297072.926250
...
2022-11-01    374462.644706
2022-11-02    463347.552895
2022-11-03    348276.877241
2022-11-04    424498.885517
2022-11-05    362904.137143
Name: total_transfer, Length: 154, dtype: float64
The average internet usage per month is:
month
5     311177.156960
6     338418.082988
7     418583.993765
8     479042.438202
9     482955.522841
10    549467.626233
11    399804.112119
Name: total_transfer, dtype: float64
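The `groupby(...).mean()` calls average `total_transfer` over sessions, so each figure is the mean volume of a session that started in that hour, day, or month. If the question is read as the typical total volume per period, one option is to sum per calendar period first and then average those totals. A minimal sketch under that assumption, on a hypothetical five-row sample (values copied from the `data.head()` output; the variable names are illustrative). Periods with no sessions at all are not counted here.

```python
import pandas as pd

# Hypothetical sample standing in for the real `data` frame
sample = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2022-05-10 02:59:32",
        "2022-05-10 18:53:27",
        "2022-05-10 21:20:44",
        "2022-05-11 00:37:42",
        "2022-05-11 02:59:38",
    ]),
    "total_transfer": [349030.40, 229109.76, 209233.92, 46049.28, 32839.68],
})

start = sample["start_time"]

# Total transfer in each calendar hour, day, and month that has at least one session
hourly_totals = sample.groupby(start.dt.floor("h"))["total_transfer"].sum()
daily_totals = sample.groupby(start.dt.date)["total_transfer"].sum()
monthly_totals = sample.groupby(start.dt.to_period("M"))["total_transfer"].sum()

# Average of those totals across the observed periods
avg_per_hour = hourly_totals.mean()
avg_per_day = daily_totals.mean()
avg_per_month = monthly_totals.mean()

print("Average total per active hour:", avg_per_hour)
print("Average total per active day:", avg_per_day)
print("Average total per active month:", avg_per_month)
```

Grouping by `dt.to_period("M")` rather than the bare month number also keeps the same month in different years apart, which matters once the log spans more than one year.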