IP Analysis Data Overview
=========================

Getting an overview of the data
--------------------------------

The ``list_of_dfs`` variable is a dictionary containing the following dataframes:

- ``flow_started`` (event based)
- ``flow_closed`` (event based)
- ``upd_classification`` (event based)
- ``upd_fqdn`` (event based)
- ``upd_ssl`` (event based)
- ``upd_network`` (event based)
- ``report`` (sent every second)

A flow is defined by its IP addresses (source and destination) and its ports (source and destination). Each flow gets a unique identifier, ``flow_id``, and all the dataframes are linked through this ``flow_id``.

If we are not interested in the successive updates of a flow, we can join all the dataframes (except ``report``) and keep only the latest update of each flow.

.. code-block:: python

    # Assumption: ``d`` is the dictionary of dataframes described above.
    d = list_of_dfs

    flows_info = d["flow_started"]

    # For each update table, keep only the most recent row of every flow.
    classification = d["upd_classification"].sort(by="time").group_by("flow_id").last().drop("time")
    network = d["upd_network"].sort(by="time").group_by("flow_id").last().drop("time")
    fqdn = d["upd_fqdn"].sort(by="time").group_by("flow_id").last().drop("time")
    ssl = d["upd_ssl"].sort(by="time").group_by("flow_id").last().drop("time")

    # Full joins keep flows that appear in only some of the tables.
    flows_info = flows_info.join(classification, on="flow_id", how="full", coalesce=True)
    flows_info = flows_info.join(network, on="flow_id", how="full", coalesce=True)
    flows_info = flows_info.join(fqdn, on="flow_id", how="full", coalesce=True)
    flows_info = flows_info.join(ssl, on="flow_id", how="full", coalesce=True)

    flows_info.shape  # returns (45, 64)

45 IP connections were identified, and up to 64 attributes may be attached to each connection:

- source IP & port: geo location, flags
- destination IP & port: geo location, flags
- layer 3, 4, 7: value, group, attributes
- application: name, group, attributes
- cellular network: bearer/QoS, RAN, APN, slice, DNS
- SSL: client, server

The following gives an overview of the flows for which an application was recognized, showing a selection of columns:

.. code-block:: python

    import polars as pl

    overview = flows_info.select([
        "application_value",
        "qfi",
        "pdu_session_id",
        "rans",
        "apn",
        "slice_service_type",
        "slice_differentiator",
        "server_versions_text",
    ]).filter(
        # Keep only flows with a recognized application.
        (pl.col("application_value").is_not_null())
        & (pl.col("application_value") != "unknown")
    )

    pl.Config.set_tbl_rows(-1)  # Show all rows
    overview

====================== ===== ================ ======= ====================== ==================== ====================== ======================
application_value      qfi   pdu_session_id   rans    apn                    slice_service_type   slice_differentiator   server_versions_text
====================== ===== ================ ======= ====================== ==================== ====================== ======================
Google Analytics       6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
Google APIs            6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
Yahoo Ad Tech          6     10               ['NR1'] apn.mnc001.mcc001.gprs 1
Yahoo Ad Tech          6     10               ['NR1'] apn.mnc001.mcc001.gprs 1
Google Analytics       6     10               ['NR1'] apn.mnc001.mcc001.gprs 1
Yahoo                  6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
Yahoo                  6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
Yahoo Ad Tech          6     10               ['NR1'] apn.mnc001.mcc001.gprs 1
Yahoo                  6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
Google APIs            6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
Yahoo                  6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
Yahoo                  6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
GSM Association        6     9                ['NR1'] ims.mnc001.mcc001.gprs 1
Yahoo                  6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
Yahoo Ad Tech          6     10               ['NR1'] apn.mnc001.mcc001.gprs 1
Google Shared Services 6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
Yahoo                  6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
Google Analytics       6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
Google APIs            6     10               ['NR1'] apn.mnc001.mcc001.gprs 1
Yahoo                  6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
Microsoft Bing         6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
Yahoo                  6     10               ['NR1'] apn.mnc001.mcc001.gprs 1
Yahoo                  6     10               ['NR1'] apn.mnc001.mcc001.gprs 1
Google APIs            6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
Yahoo                  6     10               ['NR1'] apn.mnc001.mcc001.gprs 1                                           ['TLSv1_3']
====================== ===== ================ ======= ====================== ==================== ====================== ======================
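
The "latest update per flow" logic used to build ``flows_info`` can be made explicit with a small standard-library sketch. It is not the polars pipeline itself, only an illustration of what the sort, group-by, last and full-join sequence is expected to compute. The toy events and the ``src_port`` column are hypothetical, and unlike polars, a value that is missing for a flow is simply an absent key rather than a null.

```python
from typing import Any

Event = dict[str, Any]


def latest_per_flow(events: list[Event]) -> dict[int, Event]:
    """Return, for each flow_id, the fields of its most recent event (without 'time')."""
    latest: dict[int, Event] = {}
    # Iterating in ascending time order means later events overwrite earlier ones.
    for event in sorted(events, key=lambda e: e["time"]):
        latest[event["flow_id"]] = {k: v for k, v in event.items() if k != "time"}
    return latest


def merge_flows(*tables: dict[int, Event]) -> dict[int, Event]:
    """Full-join style merge: keep every flow_id seen in any table."""
    merged: dict[int, Event] = {}
    for table in tables:
        for flow_id, fields in table.items():
            merged.setdefault(flow_id, {"flow_id": flow_id}).update(fields)
    return merged


# Hypothetical toy events; only the table names and some column names come from the text.
flow_started = [
    {"flow_id": 1, "time": 0.0, "src_port": 40000},
    {"flow_id": 2, "time": 0.5, "src_port": 40001},
]
upd_classification = [
    {"flow_id": 1, "time": 2.0, "application_value": "Yahoo"},
    {"flow_id": 1, "time": 1.0, "application_value": "unknown"},
]
upd_ssl = [{"flow_id": 3, "time": 1.5, "server_versions_text": ["TLSv1_3"]}]

# flow_started is kept as is; the update tables are reduced to their latest row per flow.
started = {e["flow_id"]: dict(e) for e in flow_started}
flows_info = merge_flows(
    started,
    latest_per_flow(upd_classification),
    latest_per_flow(upd_ssl),
)

for flow_id in sorted(flows_info):
    print(flow_id, flows_info[flow_id])
```

In this sketch, flow 1 ends up classified with its most recent value, flow 2 has no classification at all, and flow 3 appears even though it has no ``flow_started`` event, which is the behaviour a full join is meant to preserve.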