Feature specifications

timeseriesflattener.specs

class PredictorSpec(value_frame: ValueFrame, lookbehind_distances: InitVar[Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]], aggregators: Sequence[Aggregator], fallback: int | float | str | None, column_prefix: str = 'pred')

Bases: object

Specification for a temporal predictor.

The value_frame must contain columns:

- entity_id_col_name: The name of the column containing the entity ids.
- value_timestamp_col_name: The name of the column containing the timestamps for each value.
- Additional columns containing the values to aggregate. The names of these columns are used for feature naming.
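
As an illustration of what a single lookbehind window computes, here is a minimal pure-Python sketch. It is not timeseriesflattener's implementation: plain tuples stand in for the value frame and the prediction times, `lookbehind_feature` is a hypothetical helper, both window edges are assumed to be inclusive, and `statistics.mean` stands in for an Aggregator.

```python
from __future__ import annotations

import datetime as dt
from statistics import mean
from typing import Callable, Sequence

# Hypothetical stand-ins for a value frame row and a prediction-time row.
ValueRow = tuple[str, dt.datetime, float]  # (entity_id, timestamp, value)
PredRow = tuple[str, dt.datetime]  # (entity_id, pred_timestamp)


def lookbehind_feature(
    values: Sequence[ValueRow],
    pred_times: Sequence[PredRow],
    lookbehind: dt.timedelta,
    aggregator: Callable[[list[float]], float],
    fallback: float | None,
) -> list[float | None]:
    """Aggregate each entity's values inside [pred_time - lookbehind, pred_time].

    Assumption: both window edges are inclusive; the library may differ.
    """
    out: list[float | None] = []
    for entity_id, pred_ts in pred_times:
        in_window = [
            value
            for eid, ts, value in values
            if eid == entity_id and pred_ts - lookbehind <= ts <= pred_ts
        ]
        # No values inside the window: emit the fallback instead.
        out.append(aggregator(in_window) if in_window else fallback)
    return out


values = [
    ("a", dt.datetime(2020, 1, 1), 1.0),
    ("a", dt.datetime(2020, 1, 5), 3.0),
    ("a", dt.datetime(2019, 1, 1), 100.0),  # outside a 30-day window
]
pred_times = [("a", dt.datetime(2020, 1, 10)), ("b", dt.datetime(2020, 1, 10))]
print(lookbehind_feature(values, pred_times, dt.timedelta(days=30), mean, fallback=None))
# [2.0, None]
```

Entity "b" has no values in the window, so it receives the fallback.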

aggregators: Sequence[Aggregator]
column_prefix: str = 'pred'
property df: pl.DataFrame
fallback: int | float | str | None
lookbehind_distances: InitVar[Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]]
value_frame: ValueFrame

class PredictionTimeFrame(init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame], entity_id_col_name: str = 'entity_id', timestamp_col_name: str = 'pred_timestamp', prediction_time_uuid_col_name: str = 'prediction_time_uuid')

Bases: object

Specification for prediction times, i.e. the times for which predictions are made.

init_df must be a dataframe (pandas or polars) containing columns:

- entity_id_col_name: The name of the column containing the entity ids.
- timestamp_col_name: The name of the column containing the timestamps for when to make a prediction.
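
The sketch below mimics the column contract of PredictionTimeFrame using a list of dicts instead of a dataframe. The class `PredictionTimesSketch` is hypothetical, and so is the `<entity id>-<ISO timestamp>` format it uses for the prediction-time UUID; the library's actual prediction_time_uuid values may be built differently.

```python
from __future__ import annotations

import datetime as dt
from dataclasses import dataclass


@dataclass
class PredictionTimesSketch:
    """Hypothetical stand-in for PredictionTimeFrame, using dict rows."""

    rows: list[dict]
    entity_id_col_name: str = "entity_id"
    timestamp_col_name: str = "pred_timestamp"
    prediction_time_uuid_col_name: str = "prediction_time_uuid"

    def required_columns(self) -> list[str]:
        # The two columns the documentation says init_df must contain.
        return [self.entity_id_col_name, self.timestamp_col_name]

    def collect(self) -> list[dict]:
        missing = [
            col for col in self.required_columns()
            if any(col not in row for row in self.rows)
        ]
        if missing:
            raise ValueError(f"Missing required columns: {missing}")
        # Hypothetical UUID format: "<entity id>-<ISO timestamp>".
        return [
            {
                **row,
                self.prediction_time_uuid_col_name: (
                    f"{row[self.entity_id_col_name]}-"
                    f"{row[self.timestamp_col_name].isoformat()}"
                ),
            }
            for row in self.rows
        ]


frame = PredictionTimesSketch(
    rows=[{"entity_id": "a", "pred_timestamp": dt.datetime(2020, 1, 10)}]
)
print(frame.collect()[0]["prediction_time_uuid"])  # a-2020-01-10T00:00:00
```

A per-row key of this kind is what lets features computed from different specs be joined back onto the same prediction time.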

collect() → DataFrame
entity_id_col_name: str = 'entity_id'
init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame]
prediction_time_uuid_col_name: str = 'prediction_time_uuid'
required_columns() → Sequence[str]
timestamp_col_name: str = 'pred_timestamp'

class BooleanOutcomeSpec(init_frame: InitVar[TimestampValueFrame], lookahead_distances: Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]], aggregators: Sequence[Aggregator], output_name: str, column_prefix: str = 'outc')

Bases: object

Specification for a boolean outcome, e.g. whether a patient received a treatment or not.

The init_frame must contain columns:

- entity_id_col_name: The name of the column containing the entity ids. Must be a string, and the column’s values must be strings which are unique.
- value_timestamp_col_name: The name of the column containing the timestamps of when the event occurs. Must be a string, and the column’s values must be datetimes.
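
A boolean outcome asks whether an event occurs within the lookahead window. The hypothetical `boolean_outcome` helper below sketches that idea on plain tuples, assuming the window (pred_time, pred_time + lookahead], i.e. the prediction time itself is excluded and the window end is included. The library's boundary handling, aggregators, and output column naming may differ.

```python
from __future__ import annotations

import datetime as dt
from typing import Sequence


def boolean_outcome(
    event_times: Sequence[tuple[str, dt.datetime]],
    pred_times: Sequence[tuple[str, dt.datetime]],
    lookahead: dt.timedelta,
) -> list[int]:
    """Return 1 if the entity has any event in (pred_time, pred_time + lookahead], else 0.

    Assumption: the window excludes the prediction time and includes its end.
    """
    return [
        int(
            any(
                eid == entity_id and pred_ts < ts <= pred_ts + lookahead
                for eid, ts in event_times
            )
        )
        for entity_id, pred_ts in pred_times
    ]


events = [("a", dt.datetime(2020, 2, 1))]
preds = [("a", dt.datetime(2020, 1, 10)), ("a", dt.datetime(2020, 3, 1))]
print(boolean_outcome(events, preds, dt.timedelta(days=30)))  # [1, 0]
```

The second prediction time comes after the only event, so nothing falls inside its lookahead window.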

aggregators: Sequence[Aggregator]
column_prefix: str = 'outc'
property df: DataFrame
init_frame: InitVar[TimestampValueFrame]
lookahead_distances: Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]
output_name: str

class OutcomeSpec(value_frame: ValueFrame, lookahead_distances: InitVar[Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]], aggregators: Sequence[Aggregator], fallback: int | float | str | None, column_prefix: str = 'outc')

Bases: object

Specification for an outcome. If your outcome is binary/boolean, you can use BooleanOutcomeSpec instead.
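
Lookahead distances accept either a single timedelta or a (timedelta, timedelta) tuple. The minimal sketch below shows one way to interpret both forms, assuming a bare timedelta d means the window (0, d] after the prediction time and a tuple (start, end) means (start, end]. These boundary conventions are assumptions rather than documented behavior, `lookahead_feature` and `as_bounds` are hypothetical helpers, and the built-in `max` stands in for an Aggregator.

```python
from __future__ import annotations

import datetime as dt
from typing import Callable, Sequence, Union

# A lookahead distance: a bare timedelta or a (start, end) pair, as in the signature above.
Window = Union[dt.timedelta, tuple[dt.timedelta, dt.timedelta]]


def as_bounds(window: Window) -> tuple[dt.timedelta, dt.timedelta]:
    """Normalize a window to (start, end) offsets from the prediction time."""
    # Assumption: a bare timedelta d is treated as the window (0, d].
    if isinstance(window, dt.timedelta):
        return dt.timedelta(0), window
    return window


def lookahead_feature(
    values: Sequence[tuple[str, dt.datetime, float]],
    pred_times: Sequence[tuple[str, dt.datetime]],
    window: Window,
    aggregator: Callable[[list[float]], float],
    fallback: float | None,
) -> list[float | None]:
    start, end = as_bounds(window)
    out: list[float | None] = []
    for entity_id, pred_ts in pred_times:
        hits = [
            v
            for eid, ts, v in values
            if eid == entity_id and pred_ts + start < ts <= pred_ts + end
        ]
        # Fall back when the entity has no values inside the window.
        out.append(aggregator(hits) if hits else fallback)
    return out


values = [
    ("a", dt.datetime(2020, 1, 6), 7.0),
    ("a", dt.datetime(2020, 2, 10), 9.0),
]
preds = [("a", dt.datetime(2020, 1, 1)), ("b", dt.datetime(2020, 1, 1))]
print(lookahead_feature(values, preds, dt.timedelta(days=30), max, 0.0))  # [7.0, 0.0]
print(
    lookahead_feature(
        values, preds, (dt.timedelta(days=30), dt.timedelta(days=60)), max, 0.0
    )
)  # [9.0, 0.0]
```

The tuple form makes it possible to target a window that starts some time after the prediction, here days 30 to 60.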

aggregators: Sequence[Aggregator]
column_prefix: str = 'outc'
property df: DataFrame
fallback: int | float | str | None
lookahead_distances: InitVar[Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]]
value_frame: ValueFrame

class StaticFrame(init_df: InitVar[pl.DataFrame | pd.DataFrame], entity_id_col_name: str = 'entity_id')

Bases: object

collect() → DataFrame
entity_id_col_name: str = 'entity_id'
init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame]

class StaticSpec(value_frame: StaticFrame, column_prefix: str, fallback: int | float | str | None)

Bases: object

Specification for a static feature, e.g. the sex of a person.

The value_frame must contain columns:

- entity_id_col_name: The name of the column containing the entity ids. Must be a string, and the column’s values must be strings which are unique.
- Additional columns containing the values of the static feature. The names of these columns are used for feature naming.
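
A static feature has no time dimension, so conceptually it is a lookup by entity id onto every prediction time, with the fallback used for entities that are missing from the static frame. The sketch below illustrates this with plain dicts; `join_static` and its `<prefix>_<value column>` naming scheme are hypothetical and need not match the column names the library generates.

```python
from __future__ import annotations

import datetime as dt
from typing import Any


def join_static(
    static_rows: list[dict[str, Any]],
    pred_times: list[tuple[str, dt.datetime]],
    column_prefix: str,
    fallback: Any,
    entity_id_col_name: str = "entity_id",
) -> list[dict[str, Any]]:
    """Attach each entity's static values to its prediction times (sketch only)."""
    by_entity = {row[entity_id_col_name]: row for row in static_rows}
    # The documentation requires unique entity ids in the static frame.
    if len(by_entity) != len(static_rows):
        raise ValueError("entity ids must be unique in a static frame")
    first = static_rows[0] if static_rows else {}
    value_cols = [col for col in first if col != entity_id_col_name]
    out = []
    for entity_id, pred_ts in pred_times:
        row = by_entity.get(entity_id, {})
        features = {
            # Hypothetical naming scheme: "<prefix>_<value column>".
            f"{column_prefix}_{col}": row.get(col, fallback)
            for col in value_cols
        }
        out.append({"entity_id": entity_id, "pred_timestamp": pred_ts, **features})
    return out


static_rows = [{"entity_id": "a", "sex": "female"}]
preds = [("a", dt.datetime(2020, 1, 1)), ("b", dt.datetime(2020, 1, 1))]
result = join_static(static_rows, preds, column_prefix="pred", fallback=None)
print([r["pred_sex"] for r in result])  # ['female', None]
```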

column_prefix: str
fallback: int | float | str | None
value_frame: StaticFrame

class TimeDeltaSpec(init_frame: TimestampValueFrame, fallback: int | float | str | None, output_name: str, column_prefix: str = 'pred', time_format: Literal['seconds', 'minutes', 'hours', 'days', 'years'] = 'days')

Bases: object

Specification for a time delta feature, i.e. the time between a prediction timestamp and a value timestamp. Useful for e.g. calculating age or the time since a certain event.

init_frame must contain columns:

- entity_id_col_name: The name of the column containing the entity ids. Must be a string, and the column’s values must be strings which are unique.
- value_timestamp_col_name: The name of the column containing the timestamps of when the event occurs. Must be a string, and the column’s values must be datetimes.

Further arguments:

- output_name: The desired name of the feature column.
- time_format: The unit in which the time delta is expressed; one of 'seconds', 'minutes', 'hours', 'days' or 'years' (default 'days').

column_prefix: str = 'pred'
property df: pl.DataFrame
fallback: int | float | str | None
init_frame: TimestampValueFrame
output_name: str
time_format: Literal['seconds', 'minutes', 'hours', 'days', 'years'] = 'days'
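
To illustrate the idea, the sketch below computes a time delta such as age and converts it to the requested unit. It is a hypothetical helper, not the library's code: `time_delta_feature`, the sign convention (prediction time minus event time), and the 365.25-day year used for 'years' are all assumptions.

```python
from __future__ import annotations

import datetime as dt
from typing import Literal

TimeFormat = Literal["seconds", "minutes", "hours", "days", "years"]

# Assumption: a year is 365.25 days; the library's conversion may differ.
_SECONDS_PER_UNIT = {
    "seconds": 1.0,
    "minutes": 60.0,
    "hours": 3600.0,
    "days": 86400.0,
    "years": 365.25 * 86400.0,
}


def time_delta_feature(
    event_ts: dt.datetime | None,
    pred_ts: dt.datetime,
    time_format: TimeFormat = "days",
    fallback: float | None = None,
) -> float | None:
    """Time from the event to the prediction time, in `time_format` units."""
    if event_ts is None:
        # No event timestamp for this entity: use the fallback.
        return fallback
    return (pred_ts - event_ts).total_seconds() / _SECONDS_PER_UNIT[time_format]


birth = dt.datetime(2000, 1, 1)
print(time_delta_feature(birth, dt.datetime(2000, 1, 11)))  # 10.0
print(time_delta_feature(birth, dt.datetime(2020, 1, 1), "years"))  # 20.0
print(time_delta_feature(None, dt.datetime(2020, 1, 1), fallback=-1.0))  # -1.0
```

Using a birth date as the event timestamp turns this into an age feature, which is why the documentation mentions age as a use case.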

class TimestampValueFrame(init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame], entity_id_col_name: str = 'entity_id', value_timestamp_col_name: str = 'timestamp')

Bases: object

Timestamps, useful for computing e.g. age.

Must contain columns:

- entity_id_col_name: The name of the column containing the entity ids. Must be a string, and the column’s values must be strings which are unique.
- value_timestamp_col_name: The name of the column containing the timestamps. Must be a string, and the column’s values must be datetimes.
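
The column requirements above can be checked with a few lines of Python. The hypothetical `validate_timestamp_rows` helper below works on a list of dicts rather than a pandas or polars frame; it only illustrates the stated contract and does not reproduce whatever validation TimestampValueFrame performs internally.

```python
from __future__ import annotations

import datetime as dt


def validate_timestamp_rows(
    rows: list[dict],
    entity_id_col_name: str = "entity_id",
    value_timestamp_col_name: str = "timestamp",
) -> None:
    """Raise if rows violate the documented TimestampValueFrame requirements."""
    seen: set[str] = set()
    for row in rows:
        eid = row[entity_id_col_name]
        ts = row[value_timestamp_col_name]
        if not isinstance(eid, str):
            raise TypeError(f"entity id {eid!r} is not a string")
        if eid in seen:
            raise ValueError(f"duplicate entity id {eid!r}")
        if not isinstance(ts, dt.datetime):
            raise TypeError(f"timestamp {ts!r} is not a datetime")
        seen.add(eid)


validate_timestamp_rows([{"entity_id": "a", "timestamp": dt.datetime(2000, 1, 1)}])
```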

collect() → DataFrame
entity_id_col_name: str = 'entity_id'
init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame]
value_timestamp_col_name: str = 'timestamp'