bursty_dynamics.trains

This module provides functions for detecting trains of events, calculating the burstiness parameter (BP) and memory coefficient (MC) of each train, and retrieving summary information about the trains.

bursty_dynamics.trains.train_detection(df, subject_id, time_col, max_iet, time_unit='days', min_burst=3, only_trains=True)

Detects and assigns train IDs to events in the provided DataFrame based on the specified parameters.

Parameters

df : DataFrame

The DataFrame containing the data.

subject_id : str

The column name for subject IDs.

time_col : str

The column name for the datetime values.

max_iet : int

Maximum inter-event time (IET) allowed between consecutive events in a train, in the units given by time_unit.

time_unit : str, optional

Unit of time for max_iet. One of 'seconds', 'minutes', 'hours', 'days', 'weeks', 'months', or 'years'. Default is 'days'.

min_burst : int, optional

Minimum number of events required to form a train. Default is 3.

only_trains : bool, optional

Whether to return only the events that form trains. Default is True.

Returns

DataFrame

DataFrame with an added train_id column that identifies the train each event belongs to.

Examples

>>> import pandas as pd
>>> from bursty_dynamics.trains import train_detection
>>> data = {
...     'subject_id': [1, 1, 1, 1, 2, 2],
...     'event_time': ['2023-01-01', '2023-01-02', '2023-01-10', '2023-01-20', '2023-01-01', '2023-01-03']
... }
>>> df = pd.DataFrame(data)
>>> train_df = train_detection(df, 'subject_id', 'event_time', max_iet=30, time_unit='days', min_burst=2)
>>> train_df
     subject_id  event_time  train_id
0      1         2023-01-01    1
1      1         2023-01-02    1
2      1         2023-01-10    1
3      1         2023-01-20    1
4      2         2023-01-01    1
5      2         2023-01-03    1
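
The grouping rule described by max_iet and min_burst can be illustrated with plain pandas. The following is a minimal sketch, not the library's implementation: the function name detect_trains_sketch and the max_iet_days argument are hypothetical, only day units are handled, and it always behaves as if only_trains=True.

```python
import pandas as pd

def detect_trains_sketch(df, subject_id, time_col, max_iet_days, min_burst=3):
    """Illustrative only: label trains per subject using a gap threshold in days."""
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    out = out.sort_values([subject_id, time_col]).reset_index(drop=True)

    # Gap to the previous event of the same subject (NaT for a subject's first event).
    gap = out.groupby(subject_id)[time_col].diff()

    # A candidate train starts at a subject's first event or after a gap above max_iet.
    starts = gap.isna() | (gap > pd.Timedelta(days=max_iet_days))
    out["_cand"] = starts.astype(int).groupby(out[subject_id]).cumsum()

    # Drop candidates with fewer than min_burst events.
    size = out.groupby([subject_id, "_cand"])[time_col].transform("size")
    out = out[size >= min_burst].copy()

    # Renumber the surviving trains 1, 2, ... within each subject.
    changed = out.groupby(subject_id)["_cand"].diff().fillna(1).ne(0)
    out["train_id"] = changed.astype(int).groupby(out[subject_id]).cumsum()
    return out.drop(columns="_cand").reset_index(drop=True)
```

Applied to the example data with max_iet_days=30 and min_burst=2, this sketch assigns train_id 1 to every row, in line with the output shown above.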

bursty_dynamics.trains.train_info(train_df, subject_id, time_col, summary_statistic=None)

Calculate summary statistics for train data.

Parameters

train_df : DataFrame

DataFrame containing train information.

subject_id : str

Name of the column containing subject IDs.

time_col : str

Name of the column containing timestamps.

summary_statistic : bool, optional

Whether to print summary statistics. Default is None, in which case no summary statistics are printed.

Returns

DataFrame

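DataFrame with one row per train, including event counts, train start and end times, train duration in years, and the number of trains per subject.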

Examples

>>> train_info(train_df, subject_id='subject_id', time_col='event_time')
    subject_id train_id  unique_event_counts  total_term_counts  train_start  train_end  train_duration_yrs  total_trains
0      1          1          4                   4                2023-01-01   2023-01-20    0.05                 1
1      2          1          2                   2                2023-01-01   2023-01-03    0.01                 1
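
Several of the columns above can be approximated with a pandas group-by over (subject_id, train_id). This is a minimal sketch under stated assumptions, not the library's code: the name train_info_sketch and the n_events column are hypothetical, a year is assumed to be 365.25 days, and the unique_event_counts and total_term_counts columns are omitted because their exact definitions are not given here.

```python
import pandas as pd

def train_info_sketch(train_df, subject_id, time_col):
    """Illustrative only: one summary row per (subject, train)."""
    df = train_df.copy()
    df[time_col] = pd.to_datetime(df[time_col])

    # Event count plus first and last timestamp of each train.
    info = (
        df.groupby([subject_id, "train_id"])[time_col]
        .agg(n_events="size", train_start="min", train_end="max")
        .reset_index()
    )

    # Duration in years, assuming 365.25 days per year (an assumption of this sketch).
    days = (info["train_end"] - info["train_start"]).dt.days
    info["train_duration_yrs"] = (days / 365.25).round(2)

    # Number of distinct trains per subject.
    info["total_trains"] = info.groupby(subject_id)["train_id"].transform("nunique")
    return info
```

With the example train_df, the 19-day train of subject 1 rounds to 0.05 years and the 2-day train of subject 2 rounds to 0.01 years.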

bursty_dynamics.trains.train_scores(train_df, subject_id, time_col, min_event_n=None, scatter=False, hist=False)

Calculate Burstiness Parameter (BP) and Memory Coefficient (MC) for each train_id per subject_id.

Parameters

train_df : DataFrame

Input DataFrame with train IDs assigned, e.g. the output of train_detection.

subject_id : str

Name of the column containing subject IDs.

time_col : str

Name of the column containing the event timestamps.

min_event_n : int, optional

Minimum number of events required in a train for it to be included in the dataset. Defaults to None.

scatter : bool, optional

Whether to draw a scatter plot. Defaults to False.

hist : bool or str, optional

Type of histogram to plot. Defaults to False. Options:

  • True: plot histograms for both BP and MC.
  • "BP": plot a histogram for BP only.
  • "MC": plot a histogram for MC only.
  • "Both": plot histograms for both BP and MC on the same plot.
  • False: do not plot any histograms.

Returns

tuple or DataFrame

The return value depends on scatter and hist:

  • Both scatter and hist enabled: (merged_df, scatter_plot, hist_plot).
  • Only scatter enabled: (merged_df, scatter_plot).
  • Only hist enabled: (merged_df, hist_plot).
  • Neither enabled: merged_df.

Notes

  • merged_df : DataFrame

    The input DataFrame with burstiness parameter (BP) and memory coefficient (MC) for each train_id per subject_id.

  • scatter_plot : matplotlib.figure.Figure or None

    The figure object containing the scatter plot (if scatter=True).

  • hist_plot : matplotlib.figure.Figure or None

    The figure object containing the histogram(s) (if hist is enabled).

  • Multiple events occurring at the same time are aggregated into a single event when calculating the BP and MC.

Examples

>>> train_scores(train_df, subject_id='subject_id', time_col='event_time', min_event_n=3)
    subject_id  train_id  BP         MC
0      1           1      -0.19709   1.0
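
To make the two scores concrete, the sketch below computes them by hand for subject 1's train (events on days 0, 1, 9, and 19 relative to 2023-01-01). It is an assumption, not a statement about the library's internals: it uses the finite-size corrected burstiness parameter of Kim and Jo (2016) and the memory coefficient of Goh and Barabási (2008), with population standard deviations. The function name burstiness_and_memory is hypothetical. Under these assumptions the result agrees with the example output above.

```python
import numpy as np

def burstiness_and_memory(timestamps_in_days):
    """Illustrative only: BP (finite-size corrected) and MC for one train."""
    # Collapse events sharing a timestamp into one event, and sort.
    t = np.unique(np.asarray(timestamps_in_days, dtype=float))
    n = t.size                      # number of distinct events
    iet = np.diff(t)                # inter-event times

    # Coefficient of variation of the inter-event times (population std).
    r = iet.std() / iet.mean()

    # Finite-size corrected burstiness parameter (assumed formula).
    bp = (np.sqrt(n + 1) * r - np.sqrt(n - 1)) / (
        (np.sqrt(n + 1) - 2) * r + np.sqrt(n - 1)
    )

    # Memory coefficient: Pearson correlation between consecutive inter-event times.
    a, b = iet[:-1], iet[1:]
    mc = np.mean((a - a.mean()) * (b - b.mean())) / (a.std() * b.std())
    return float(bp), float(mc)

bp, mc = burstiness_and_memory([0, 1, 9, 19])
print(round(bp, 5), round(mc, 5))  # -0.19709 1.0
```

The sketch needs at least three inter-event times with non-zero spread to avoid division by zero in the memory coefficient; shorter or perfectly regular trains would need separate handling.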