IP Analysis Data Overview

Getting an overview of the data

Despite its name, the list_of_dfs variable is a dictionary; the code below refers to it as d. It contains the following dataframes:

  • flow_started (event based)

  • flow_closed (event based)

  • upd_classification (event based)

  • upd_fqdn (event based)

  • upd_ssl (event based)

  • upd_network (event based)

  • report (sent every second)

A flow is defined by its source and destination IP addresses and its source and destination ports. Each flow gets a unique identifier: flow_id.

All the different dataframes are linked with this flow_id.

If the successive updates of a flow are not of interest, all the event dataframes (everything except report) can be combined into a single dataframe that keeps only the latest update for each flow.

# One row per flow, taken from the flow_started events
flows_info = d["flow_started"]

# For each update type, keep only the most recent row per flow_id
classification = d["upd_classification"].sort(by="time").group_by("flow_id").last().drop("time")
network = d["upd_network"].sort(by="time").group_by("flow_id").last().drop("time")
fqdn = d["upd_fqdn"].sort(by="time").group_by("flow_id").last().drop("time")
ssl = d["upd_ssl"].sort(by="time").group_by("flow_id").last().drop("time")

flows_info = flows_info.join(classification, on='flow_id', how='full', coalesce=True)
flows_info = flows_info.join(network, on='flow_id', how='full', coalesce=True)
flows_info = flows_info.join(fqdn, on='flow_id', how='full', coalesce=True)
flows_info = flows_info.join(ssl, on='flow_id', how='full', coalesce=True)

flows_info.shape
# returns (45, 64)

45 IP connections (flows) were identified, and up to 64 pieces of information (columns) may be attached to each connection:

  • source IP & port: geo location, flags

  • destination IP & port: geo location, flags

  • layer 3, 4, 7: value, group, attributes

  • application: name, group, attributes

  • cellular network: bearer/QoS, RAN, APN, slice, DNS

  • SSL: client, server

An overview of the flows for which an application was recognized, showing a selection of columns:

import polars as pl
overview = flows_info.select([
    "application_value",
    "qfi",
    "pdu_session_id",
    "rans",
    "apn",
    "slice_service_type",
    "slice_differentiator",
    "server_versions_text"
]).filter(
    (pl.col("application_value").is_not_null()) &
    (pl.col("application_value") != "unknown")
)

pl.Config.set_tbl_rows(-1)  # Show all rows
overview

application_value       qfi  pdu_session_id  rans     apn                     slice_service_type  slice_differentiator  server_versions_text
----------------------  ---  --------------  -------  ----------------------  ------------------  --------------------  --------------------
Google Analytics        6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
Google APIs             6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
Yahoo Ad Tech           6    10              ['NR1']  apn.mnc001.mcc001.gprs  1
Yahoo Ad Tech           6    10              ['NR1']  apn.mnc001.mcc001.gprs  1
Google Analytics        6    10              ['NR1']  apn.mnc001.mcc001.gprs  1
Yahoo                   6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
Yahoo                   6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
Yahoo Ad Tech           6    10              ['NR1']  apn.mnc001.mcc001.gprs  1
Yahoo                   6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
Google APIs             6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
Yahoo                   6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
Yahoo                   6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
GSM Association         6    9               ['NR1']  ims.mnc001.mcc001.gprs  1
Yahoo                   6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
Yahoo Ad Tech           6    10              ['NR1']  apn.mnc001.mcc001.gprs  1
Google Shared Services  6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
Yahoo                   6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
Google Analytics        6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
Google APIs             6    10              ['NR1']  apn.mnc001.mcc001.gprs  1
Yahoo                   6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
Microsoft Bing          6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
Yahoo                   6    10              ['NR1']  apn.mnc001.mcc001.gprs  1
Yahoo                   6    10              ['NR1']  apn.mnc001.mcc001.gprs  1
Google APIs             6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']
Yahoo                   6    10              ['NR1']  apn.mnc001.mcc001.gprs  1                                         ['TLSv1_3']