IP Analysis Data Overview
Getting an overview of the data
The list_of_dfs variable is a dictionary containing the following dataframes:
flow_started (event based)
flow_closed (event based)
upd_classification (event based)
upd_fqdn (event based)
upd_ssl (event based)
upd_network (event based)
report (sent every second)
A flow is defined by its IP addresses (source and destination) and its ports (source and destination). Each flow is assigned a unique identifier, flow_id.
All the dataframes are linked through this flow_id.
If the individual updates of a flow are not of interest, all the dataframes (except report) can be merged into a single one, keeping only the latest update for each flow.
# Assumption: d is the list_of_dfs dictionary described above
d = list_of_dfs

flows_info = d["flow_started"]

# For each update dataframe, keep only the most recent row per flow
classification = d["upd_classification"].sort(by="time").group_by("flow_id").last().drop("time")
network = d["upd_network"].sort(by="time").group_by("flow_id").last().drop("time")
fqdn = d["upd_fqdn"].sort(by="time").group_by("flow_id").last().drop("time")
ssl = d["upd_ssl"].sort(by="time").group_by("flow_id").last().drop("time")

# Full joins on flow_id keep flows that are missing from some of the dataframes
flows_info = flows_info.join(classification, on="flow_id", how="full", coalesce=True)
flows_info = flows_info.join(network, on="flow_id", how="full", coalesce=True)
flows_info = flows_info.join(fqdn, on="flow_id", how="full", coalesce=True)
flows_info = flows_info.join(ssl, on="flow_id", how="full", coalesce=True)

flows_info.shape
# returns (45, 64)
45 IP connections were identified, and up to 64 pieces of information can be attached to each connection:
source IP & port: geo location, flags
destination IP & port: geo location, flags
layer 3, 4, 7: value, group, attributes
application: name, group, attributes
cellular network: bearer/QoS, RAN, APN, slice, DNS
SSL: client, server
An overview of the flows for which an application was recognized, showing a selection of columns:
import polars as pl

# Select a few columns and keep only flows with a recognized application
overview = flows_info.select([
    "application_value",
    "qfi",
    "pdu_session_id",
    "rans",
    "apn",
    "slice_service_type",
    "slice_differentiator",
    "server_versions_text",
]).filter(
    (pl.col("application_value").is_not_null()) &
    (pl.col("application_value") != "unknown")
)

pl.Config.set_tbl_rows(-1)  # Show all rows
overview
| application_value | qfi | pdu_session_id | rans | apn | slice_service_type | slice_differentiator | server_versions_text |
|---|---|---|---|---|---|---|---|
| Google Analytics | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| Google APIs | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| Yahoo Ad Tech | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | |
| Yahoo Ad Tech | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | |
| Google Analytics | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | |
| Yahoo | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| Yahoo | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| Yahoo Ad Tech | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | |
| Yahoo | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| Google APIs | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| Yahoo | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| Yahoo | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| GSM Association | 6 | 9 | ['NR1'] | ims.mnc001.mcc001.gprs | 1 | | |
| Yahoo | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| Yahoo Ad Tech | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | |
| Google Shared Services | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| Yahoo | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| Google Analytics | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| Google APIs | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | |
| Yahoo | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| Microsoft Bing | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| Yahoo | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | |
| Yahoo | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | |
| Google APIs | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |
| Yahoo | 6 | 10 | ['NR1'] | apn.mnc001.mcc001.gprs | 1 | | ['TLSv1_3'] |