bursty_dynamics.trains

This module provides functions for detecting trains of events, calculating the burstiness parameter (BP) and memory coefficient (MC) of each train, and summarizing information about the trains.

bursty_dynamics.trains.train_detection(df, subject_id, time_col, max_iet, time_unit='days', min_burst=3, only_trains=True)

Detects and assigns train IDs to events in the provided DataFrame based on the specified parameters.

Parameters

df : DataFrame

The DataFrame containing the data.

subject_id : str

The column name for subject IDs.

time_col : str

The column name for the datetime values.

max_iet : int

Maximum inter-event time (IET) allowed between consecutive events in the same train, in the units given by time_unit.

time_unit : str, optional

Unit of time for the intervals. One of 'seconds', 'minutes', 'hours', 'days', 'weeks', 'months', or 'years'. Default is 'days'.

min_burst : int, optional

Minimum number of events required to form a train. Default is 3.

only_trains : bool, optional

Whether to return only the events that form trains. Default is True.

Returns

DataFrame

The DataFrame with an added train_id column indicating which train each event belongs to.

Examples

>>> import pandas as pd
>>> from bursty_dynamics.trains import train_detection
>>> data = {
...     'subject_id': [1, 1, 1, 1, 2, 2],
...     'event_time': ['2023-01-01', '2023-01-02', '2023-01-10', '2023-01-20', '2023-01-01', '2023-01-03']
... }
>>> df = pd.DataFrame(data)
>>> train_df = train_detection(df, 'subject_id', 'event_time', max_iet=30, time_unit='days', min_burst=2)
>>> train_df
     subject_id  event_time  train_id
0      1         2023-01-01    1
1      1         2023-01-02    1
2      1         2023-01-10    1
3      1         2023-01-20    1
4      2         2023-01-01    1
5      2         2023-01-03    1
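
The grouping rule described by max_iet and min_burst can be illustrated with plain pandas. The following is a minimal sketch under stated assumptions, not the library's implementation: detect_trains_sketch and its max_iet_days argument are hypothetical, only day-based gaps are handled, and train IDs are assumed to restart at 1 for each subject, as the example output above suggests.

```python
import pandas as pd


def detect_trains_sketch(df, subject_id, time_col, max_iet_days, min_burst=3):
    """Hypothetical helper: label runs of events whose gaps are <= max_iet_days."""
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    out = out.sort_values([subject_id, time_col]).reset_index(drop=True)

    # Gap to the previous event of the same subject (NaT for a subject's first event).
    gap = out.groupby(subject_id)[time_col].diff()

    # A new run starts at a subject's first event or when the gap exceeds the limit.
    new_run = (gap.isna() | (gap > pd.Timedelta(days=max_iet_days))).astype(int)

    # Assumption: IDs are numbered from 1 within each subject.
    out["train_id"] = new_run.groupby(out[subject_id]).cumsum()

    # Keep only runs with at least min_burst events; surviving IDs are not renumbered.
    run_size = out.groupby([subject_id, "train_id"])[time_col].transform("count")
    return out[run_size >= min_burst].reset_index(drop=True)


df = pd.DataFrame({
    "subject_id": [1, 1, 1, 1, 2, 2],
    "event_time": ["2023-01-01", "2023-01-02", "2023-01-10", "2023-01-20",
                   "2023-01-01", "2023-01-03"],
})
result = detect_trains_sketch(df, "subject_id", "event_time", max_iet_days=30, min_burst=2)
print(result)
```

With the data from the example above, every gap is within 30 days, so each subject's events fall into a single train.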

bursty_dynamics.trains.train_info(train_df, subject_id, time_col, summary_statistic=None)

Calculate summary statistics for train data.

Parameters

train_df : DataFrame

DataFrame containing train information.

subject_id : str

Name of the column containing subject IDs.

time_col : str

Name of the column containing timestamps.

summary_statistic : bool, optional

Whether to print summary statistics. Defaults to None, in which case nothing is printed.

Returns

DataFrame

DataFrame with one row per train, giving its event counts, start and end times, duration in years, and the subject's total number of trains.

Examples

>>> from bursty_dynamics.trains import train_info
>>> train_info(train_df, subject_id='subject_id', time_col='event_time')
    subject_id train_id  unique_event_counts  total_term_counts  train_start  train_end  train_duration_yrs  total_trains
0      1          1          4                   4                2023-01-01   2023-01-20    0.05                 1
1      2          1          2                   2                2023-01-01   2023-01-03    0.01                 1
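
To make the columns in this output concrete, the sketch below rebuilds a similar table with a pandas groupby. It is an illustration only, not the library's code. It assumes that unique_event_counts counts distinct timestamps, that total_term_counts counts rows, and that a year is 365.25 days; none of these is confirmed by the source.

```python
import pandas as pd

train_df = pd.DataFrame({
    "subject_id": [1, 1, 1, 1, 2, 2],
    "event_time": pd.to_datetime([
        "2023-01-01", "2023-01-02", "2023-01-10", "2023-01-20",
        "2023-01-01", "2023-01-03",
    ]),
    "train_id": [1, 1, 1, 1, 1, 1],
})

# One row per (subject, train) with counts and the first and last timestamps.
summary = (
    train_df.groupby(["subject_id", "train_id"])["event_time"]
    .agg(
        unique_event_counts="nunique",  # assumed: distinct timestamps
        total_term_counts="count",      # assumed: all rows in the train
        train_start="min",
        train_end="max",
    )
    .reset_index()
)

# Assumption: duration is expressed in years of 365.25 days, rounded to 2 decimals.
summary["train_duration_yrs"] = (
    (summary["train_end"] - summary["train_start"]).dt.days / 365.25
).round(2)

# Number of distinct trains each subject has.
summary["total_trains"] = summary.groupby("subject_id")["train_id"].transform("nunique")
print(summary)
```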

bursty_dynamics.trains.train_scores(train_df, subject_id, time_col, min_event_n=None, scatter=False, hist=False)

Calculate Burstiness Parameter (BP) and Memory Coefficient (MC) for each train_id per subject_id.

Parameters

train_df : pd.DataFrame

Input DataFrame of events, including a train_id column.

subject_id : str

Name of the column containing subject IDs.

time_col : str

Name of the column containing the date.

min_event_n : int, optional

Minimum number of events a train must contain to be scored. Defaults to None.

scatter : bool, optional

Whether to draw a scatter plot. Defaults to False.

hist : bool or str, optional

Type of histogram to plot. Defaults to False. Options:

  • True: Plot histograms for both BP and MC.
  • "BP": Plot histogram for BP only.
  • "MC": Plot histogram for MC only.
  • "Both": Plot histograms for both BP and MC on the same plot.
  • False: Do not plot any histograms.

Returns

tuple or DataFrame

The return value depends on which plots are requested:

  • Both scatter and hist enabled: (merged_df, scatter_plot, hist_plot).
  • Only scatter enabled: (merged_df, scatter_plot).
  • Only hist enabled: (merged_df, hist_plot).
  • Neither enabled: merged_df.

Notes

  • merged_df : DataFrame

    The input DataFrame with the burstiness parameter (BP) and memory coefficient (MC) for each train_id per subject_id.

  • scatter_plot : matplotlib.figure.Figure or None

    The figure object containing the scatter plot (if scatter=True).

  • hist_plot : matplotlib.figure.Figure or None

    The figure object containing the histogram or histograms (if hist is enabled).

  • Multiple events occurring at the same time are aggregated into a single event when calculating the BP and MC.

Examples

>>> from bursty_dynamics.trains import train_scores
>>> train_scores(train_df, subject_id='subject_id', time_col='event_time', min_event_n=3)
    subject_id  train_id  BP         MC
0      1           1      -0.19709   1.0
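
The two scores can be worked through by hand for subject 1. The sketch below is a hedged illustration, not the library's code: bp_mc_sketch is a hypothetical helper. It assumes that BP is the finite-size-corrected burstiness of Kim and Jo (2016), computed from the population standard deviation of the inter-event times, and that MC is the Pearson correlation between consecutive inter-event times. These definitions are inferred from the example output, not stated in the source.

```python
import numpy as np
import pandas as pd


def bp_mc_sketch(times):
    """Hypothetical helper: BP and MC for the events of a single train."""
    # Collapse simultaneous events, then take inter-event times (IETs) in days.
    t = pd.Series(pd.to_datetime(times)).drop_duplicates().sort_values()
    iet = t.diff().dropna().dt.total_seconds().to_numpy() / 86400.0
    n = len(t)  # number of distinct events

    # Assumed definition: finite-size-corrected burstiness, with r = std / mean
    # of the IETs (population standard deviation, ddof=0).
    r = iet.std() / iet.mean()
    bp = (np.sqrt(n + 1) * r - np.sqrt(n - 1)) / (
        (np.sqrt(n + 1) - 2) * r + np.sqrt(n - 1)
    )

    # Assumed definition: Pearson correlation between each IET and the next one.
    mc = np.corrcoef(iet[:-1], iet[1:])[0, 1]
    return bp, mc


bp, mc = bp_mc_sketch(["2023-01-01", "2023-01-02", "2023-01-10", "2023-01-20"])
print(round(float(bp), 5), round(float(mc), 5))
```

For these four events the inter-event times are 1, 8, and 10 days, and the sketch prints -0.19709 1.0. This agrees with the example output above, which suggests but does not confirm that the library uses these definitions.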