bursty_dynamics.trains
This module contains functions for detecting trains, calculating the burstiness parameter (BP) and memory coefficient (MC) of each train, and summarizing train information.
- bursty_dynamics.trains.train_detection(df, subject_id, time_col, max_iet, time_unit='days', min_burst=3, only_trains=True)
Detects and assigns train IDs to events in the provided DataFrame based on the specified parameters.
Parameters
- df : DataFrame
The DataFrame containing the data.
- subject_id : str
The column name for subject IDs.
- time_col : str
The column name for the datetime values.
- max_iet : int
Maximum inter-event time (IET): the largest allowed gap between consecutive events in a train, in units specified by time_unit.
- time_unit : str, optional
Unit of time for the intervals: 'seconds', 'minutes', 'hours', 'days', 'weeks', 'months', or 'years'. Default is 'days'.
- min_burst : int, optional
Minimum number of events required to form a train. Default is 3.
- only_trains : bool, optional
Whether to return only the events that form trains. Default is True.
Returns
- DataFrame
DataFrame with an added train_id column indicating the train each event belongs to.
Examples
>>> import pandas as pd
>>> from bursty_dynamics.trains import train_detection
>>> data = {
...     'subject_id': [1, 1, 1, 1, 2, 2],
...     'event_time': ['2023-01-01', '2023-01-02', '2023-01-10', '2023-01-20', '2023-01-01', '2023-01-03']
... }
>>> df = pd.DataFrame(data)
>>> train_df = train_detection(df, 'subject_id', 'event_time', max_iet=30, time_unit='days', min_burst=2)
>>> train_df
   subject_id  event_time  train_id
0           1  2023-01-01         1
1           1  2023-01-02         1
2           1  2023-01-10         1
3           1  2023-01-20         1
4           2  2023-01-01         1
5           2  2023-01-03         1
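The grouping rule behind max_iet and min_burst can be illustrated with plain pandas. This is a minimal sketch and not the library's implementation: detect_trains_sketch and max_iet_days are hypothetical names, gaps are measured in days only, and the way train IDs are numbered is an assumption.

```python
import pandas as pd


def detect_trains_sketch(df, subject_id, time_col, max_iet_days, min_burst=3):
    """Hypothetical illustration: split each subject's events into trains."""
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    out = out.sort_values([subject_id, time_col])

    # Gap to the previous event of the same subject (NaT for the first event).
    gap = out.groupby(subject_id)[time_col].diff()

    # A new train starts at a subject's first event or after a gap > max_iet.
    new_train = gap.isna() | (gap > pd.Timedelta(days=max_iet_days))

    # Running count of train starts per subject gives a per-subject train number.
    # Assumption: IDs are assigned before filtering, so they may skip values.
    out['train_id'] = new_train.astype(int).groupby(out[subject_id]).cumsum()

    # Keep only groups with at least min_burst events.
    size = out.groupby([subject_id, 'train_id'])[time_col].transform('count')
    return out[size >= min_burst]


df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 2, 2],
    'event_time': ['2023-01-01', '2023-01-02', '2023-01-10',
                   '2023-01-20', '2023-01-01', '2023-01-03'],
})
out = detect_trains_sketch(df, 'subject_id', 'event_time',
                           max_iet_days=30, min_burst=2)
```

With a 30-day threshold every gap in the sample data is small enough, so each subject's events fall into a single train, as in the example output. Lowering the threshold to 5 days would split subject 1's later events into separate groups that are too short to count as trains.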
- bursty_dynamics.trains.train_info(train_df, subject_id, time_col, summary_statistic=None)
Calculate summary statistics for train data.
Parameters
- train_df : DataFrame
DataFrame containing train information.
- subject_id : str
Name of the column containing subject IDs.
- time_col : str
Name of the column containing timestamps.
- summary_statistic : bool, optional
Whether to print summary statistics. Default is False.
Returns
- DataFrame
DataFrame with calculated train information.
Examples
>>> train_info(train_df, subject_id='subject_id', time_col='event_time')
   subject_id  train_id  unique_event_counts  total_term_counts train_start  train_end  train_duration_yrs  total_trains
0           1         1                    4                  4  2023-01-01 2023-01-20                0.05             1
1           2         1                    2                  2  2023-01-01 2023-01-03                0.01             1
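The columns in this output can be approximated with a pandas groupby. The sketch below is an illustration only. It assumes that unique_event_counts counts distinct timestamps, that total_term_counts counts rows, and that durations are converted with 365.25 days per year. None of these details is confirmed by the documentation above.

```python
import pandas as pd

# Sample data matching the example; train_id is filled in by hand here.
train_df = pd.DataFrame({
    'subject_id': [1, 1, 1, 1, 2, 2],
    'event_time': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-10',
                                  '2023-01-20', '2023-01-01', '2023-01-03']),
    'train_id': [1, 1, 1, 1, 1, 1],
})

# One row per (subject, train) with counts and the first/last timestamp.
info = (
    train_df.groupby(['subject_id', 'train_id'])['event_time']
    .agg(unique_event_counts='nunique',
         total_term_counts='count',
         train_start='min',
         train_end='max')
    .reset_index()
)

# Duration in years (assumed conversion factor: 365.25 days per year).
duration_days = (info['train_end'] - info['train_start']).dt.days
info['train_duration_yrs'] = (duration_days / 365.25).round(2)

# Number of distinct trains detected for each subject.
info['total_trains'] = info.groupby('subject_id')['train_id'].transform('nunique')
```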
- bursty_dynamics.trains.train_scores(train_df, subject_id, time_col, min_event_n=None, scatter=False, hist=False)
Calculate Burstiness Parameter (BP) and Memory Coefficient (MC) for each train_id per subject_id.
Parameters
- train_df : pd.DataFrame
Input DataFrame.
- subject_id : str
Name of the column containing subject IDs.
- time_col : str
Name of the column containing the date.
- min_event_n : int, optional
Minimum number of events a train must contain to be included. Defaults to None.
- scatter : bool, optional
Whether to plot a scatter plot. Defaults to False.
- hist : bool or str, optional
Type of histogram to plot. Defaults to False. Options:
  - True: Plot histograms for both BP and MC.
  - "BP": Plot histogram for BP only.
  - "MC": Plot histogram for MC only.
  - "Both": Plot histograms for both BP and MC on the same plot.
  - False: Do not plot any histograms.
Returns
- tuple or DataFrame
The return value depends on which plots are requested:
  - scatter and hist both enabled: (merged_df, scatter_plot, hist_plot).
  - only scatter enabled: (merged_df, scatter_plot).
  - only hist enabled: (merged_df, hist_plot).
  - neither enabled: merged_df.
- merged_df : DataFrame
The input DataFrame with the burstiness parameter (BP) and memory coefficient (MC) for each train_id per subject_id.
- scatter_plot : matplotlib.figure.Figure or None
The figure object containing the scatter plot (if scatter=True).
- hist_plot : matplotlib.figure.Figure or None
The figure object containing the histogram(s) (if hist is enabled).
Notes
Multiple events occurring at the same time are aggregated into a single event when calculating the BP and MC.
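Because the return value is either a bare DataFrame or a tuple, depending on scatter and hist, calling code may want to normalize it. The helper below is a hypothetical convenience and not part of bursty_dynamics. It relies only on the return shapes listed under Returns.

```python
import pandas as pd


def split_scores_result(result):
    """Hypothetical helper: return (merged_df, list_of_figures)."""
    if isinstance(result, tuple):
        # First element is the scores DataFrame, the rest are figure objects.
        return result[0], list(result[1:])
    # No plots were requested, so the result is the DataFrame itself.
    return result, []


# Stand-in values for illustration; a real call would pass the output of
# train_scores(...) instead.
scores = pd.DataFrame({'subject_id': [1], 'train_id': [1]})
merged_df, figures = split_scores_result(scores)
```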
Examples
>>> train_scores(train_df, subject_id='subject_id', time_col='event_time', min_event_n=3)
   subject_id  train_id       BP   MC
0           1         1 -0.19709  1.0
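The BP and MC values in this example can be reproduced by hand. The sketch below assumes a finite-size-corrected burstiness parameter, B = (sqrt(n+1) r - sqrt(n-1)) / ((sqrt(n+1) - 2) r + sqrt(n-1)), where r is the population standard deviation of the inter-event times divided by their mean and n is the number of events. It takes MC as the Pearson correlation between consecutive inter-event times. These formulas match the output shown above for subject 1, but whether the library computes them exactly this way is an assumption, and the helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def burstiness_parameter(iets, n_events):
    """Finite-size-corrected burstiness (assumed formula, see lead-in)."""
    iets = np.asarray(iets, dtype=float)
    r = iets.std() / iets.mean()  # population std (ddof=0) over mean
    a, b = np.sqrt(n_events + 1), np.sqrt(n_events - 1)
    return (a * r - b) / ((a - 2) * r + b)


def memory_coefficient(iets):
    """Pearson correlation between each IET and the one that follows it."""
    iets = np.asarray(iets, dtype=float)
    return np.corrcoef(iets[:-1], iets[1:])[0, 1]


# Events of subject 1, train 1 from the example above.
times = pd.Series(pd.to_datetime(
    ['2023-01-01', '2023-01-02', '2023-01-10', '2023-01-20']))

# Simultaneous events collapse into one, as described in the Notes.
times = times.drop_duplicates().sort_values()

# Inter-event times in days: 1, 8 and 10.
iets = times.diff().dropna().dt.days.to_numpy()

bp = burstiness_parameter(iets, n_events=len(times))
mc = memory_coefficient(iets)
print(round(float(bp), 5), round(float(mc), 5))
```

With only three inter-event times, MC is computed from two pairs and is therefore always +1 or -1 unless the values are constant, so scores for very short trains should be read with care.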