Feature specifications

timeseriesflattener.specs

class PredictorSpec(value_frame: ValueFrame, lookbehind_distances: InitVar[Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]], aggregators: Sequence[Aggregator], fallback: int | float | str | None, column_prefix: str = 'pred')

Bases: object

Specification for a temporal predictor.

The value_frame must contain columns:

entity_id_col_name: The name of the column containing the entity ids.
value_timestamp_col_name: The name of the column containing the timestamps for each value.
Additional columns containing the values to aggregate. The names of these columns are used for feature naming.
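
To illustrate what a predictor specification describes, here is a minimal pure-Python sketch; it is not the library's implementation. For one prediction time it gathers the values whose timestamps fall inside the lookbehind window, applies an aggregator, and returns the fallback when the window is empty. The function name `aggregate_lookbehind`, the tuple-based record layout, and the boundary handling (window start exclusive, prediction time inclusive) are assumptions made for this example.

```python
from __future__ import annotations

import datetime as dt
from statistics import mean
from typing import Callable, Sequence

# Hypothetical record layout for this sketch: (entity_id, timestamp, value).
Record = tuple[str, dt.datetime, float]


def aggregate_lookbehind(
    records: Sequence[Record],
    entity_id: str,
    prediction_time: dt.datetime,
    lookbehind: dt.timedelta,
    aggregator: Callable[[Sequence[float]], float],
    fallback: float | None,
) -> float | None:
    """Aggregate one entity's values in (prediction_time - lookbehind, prediction_time]."""
    window_start = prediction_time - lookbehind
    values = [
        value
        for rec_entity, timestamp, value in records
        if rec_entity == entity_id and window_start < timestamp <= prediction_time
    ]
    # An empty window yields the fallback instead of calling the aggregator.
    return aggregator(values) if values else fallback


records = [
    ("a", dt.datetime(2024, 1, 1), 1.0),
    ("a", dt.datetime(2024, 1, 8), 3.0),
    ("b", dt.datetime(2024, 1, 8), 10.0),
]
pred_time = dt.datetime(2024, 1, 10)

# Both values for entity "a" lie within 30 days, so the mean is taken over them.
mean_30d = aggregate_lookbehind(records, "a", pred_time, dt.timedelta(days=30), mean, 0)
# No value lies within the last day, so the fallback is returned.
mean_1d = aggregate_lookbehind(records, "a", pred_time, dt.timedelta(days=1), mean, 0)
```

Each combination of value column, lookbehind distance, and aggregator would yield one feature in this picture, which is why the spec takes sequences for both distances and aggregators.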

aggregators: Sequence[Aggregator]
column_prefix: str = 'pred'
property df: pl.DataFrame
fallback: int | float | str | None
static from_primitives(df: pl.DataFrame, entity_id_col_name: str, lookbehind_days: Sequence[float | tuple[float, float]], aggregators: Sequence[AggregatorName], value_timestamp_col_name: str = 'timestamp', column_prefix: str = 'pred', fallback: int | float | str | None = 0) -> PredictorSpec

Create a PredictorSpec from primitives.
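
The signature suggests that `lookbehind_days` mirrors `lookbehind_distances`, with day counts in place of `dt.timedelta` objects. The sketch below shows one way such a conversion could look. `days_to_distances` is a hypothetical helper written for illustration, not a function of the library, and reading a tuple as the two bounds of a window is an assumption.

```python
from __future__ import annotations

import datetime as dt
from typing import Sequence


def days_to_distances(
    days: Sequence[float | tuple[float, float]],
) -> list[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]:
    """Convert day counts (or pairs of day counts) to timedeltas (hypothetical helper)."""
    distances: list[dt.timedelta | tuple[dt.timedelta, dt.timedelta]] = []
    for entry in days:
        if isinstance(entry, tuple):
            # Assumed to describe the two bounds of a window.
            first, second = entry
            distances.append((dt.timedelta(days=first), dt.timedelta(days=second)))
        else:
            distances.append(dt.timedelta(days=entry))
    return distances


distances = days_to_distances([30, (90, 365), 0.5])
```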

lookbehind_distances: InitVar[Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]]
value_frame: ValueFrame

class PredictionTimeFrame(init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame], entity_id_col_name: str = 'entity_id', timestamp_col_name: str = 'pred_timestamp', prediction_time_uuid_col_name: str = 'prediction_time_uuid')

Bases: object

Specification for prediction times, i.e. the times for which predictions are made.

init_df must be a dataframe (pandas or polars) containing columns:

entity_id_col_name: The name of the column containing the entity ids.
timestamp_col_name: The name of the column containing the timestamps at which predictions are made.

collect() -> DataFrame
entity_id_col_name: str = 'entity_id'
init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame]
prediction_time_uuid_col_name: str = 'prediction_time_uuid'
required_columns() -> Sequence[str]
timestamp_col_name: str = 'pred_timestamp'

class BooleanOutcomeSpec(init_frame: InitVar[TimestampValueFrame], lookahead_distances: Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]], aggregators: Sequence[Aggregator], output_name: str, column_prefix: str = 'outc')

Bases: object

Specification for a boolean outcome, e.g. whether a patient received a treatment or not.

The init_frame must contain columns:

entity_id_col_name: The name of the column containing the entity ids. Must be a string, and the column’s values must be unique strings.
value_timestamp_col_name: The name of the column containing the timestamps at which the event occurs. Must be a string, and the column’s values must be datetimes.
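
Conceptually, a boolean outcome asks whether an event occurs within a lookahead window after the prediction time. The following is a minimal sketch of that idea in plain Python, not the library's implementation. `event_in_lookahead` is a hypothetical name, and the boundary handling (prediction time exclusive, window end inclusive) is an assumption.

```python
import datetime as dt
from typing import Sequence


def event_in_lookahead(
    event_times: Sequence[dt.datetime],
    prediction_time: dt.datetime,
    lookahead: dt.timedelta,
) -> bool:
    """Return True if any event falls in (prediction_time, prediction_time + lookahead]."""
    window_end = prediction_time + lookahead
    return any(prediction_time < t <= window_end for t in event_times)


# One entity's event timestamps, e.g. when a treatment was received.
treatment_times = [dt.datetime(2024, 3, 1)]
pred_time = dt.datetime(2024, 1, 10)

within_30d = event_in_lookahead(treatment_times, pred_time, dt.timedelta(days=30))
within_90d = event_in_lookahead(treatment_times, pred_time, dt.timedelta(days=90))
```

Because only the timestamps matter here, the input needs no value column, which matches the two required columns listed above.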

aggregators: Sequence[Aggregator]
column_prefix: str = 'outc'
property df: DataFrame
static from_primitives(df: pl.DataFrame | pd.DataFrame, entity_id_col_name: str, lookahead_days: Sequence[float | tuple[float, float]], aggregators: Sequence[AggregatorName], value_timestamp_col_name: str = 'timestamp', column_prefix: str = 'outc') -> BooleanOutcomeSpec

Create a BooleanOutcomeSpec from primitives.

init_frame: InitVar[TimestampValueFrame]
lookahead_distances: Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]
output_name: str

class OutcomeSpec(value_frame: ValueFrame, lookahead_distances: InitVar[Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]], aggregators: Sequence[Aggregator], fallback: int | float | str | None, column_prefix: str = 'outc')

Bases: object

Specification for an outcome. If your outcome is binary/boolean, you can use BooleanOutcomeSpec instead.

aggregators: Sequence[Aggregator]
column_prefix: str = 'outc'
property df: DataFrame
fallback: int | float | str | None
static from_primitives(df: pl.DataFrame, entity_id_col_name: str, lookahead_days: Sequence[float | tuple[float, float]], aggregators: Sequence[AggregatorName], value_timestamp_col_name: str = 'timestamp', column_prefix: str = 'outc') -> OutcomeSpec

Create an OutcomeSpec from primitives.

lookahead_distances: InitVar[Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]]
value_frame: ValueFrame

class StaticFrame(init_df: InitVar[pl.DataFrame | pd.DataFrame], entity_id_col_name: str = 'entity_id')

Bases: object

collect() -> DataFrame
entity_id_col_name: str = 'entity_id'
init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame]

class StaticSpec(value_frame: StaticFrame, column_prefix: str, fallback: int | float | str | None)

Bases: object

Specification for a static feature, e.g. the sex of a person.

The value_frame must contain columns:

entity_id_col_name: The name of the column containing the entity ids. Must be a string, and the column’s values must be unique strings.
Additional columns containing the values of the static feature. The names of these columns are used for feature naming.

column_prefix: str
fallback: int | float | str | None
static from_primitives(df: DataFrame, entity_id_col_name: str, column_prefix: str, fallback: int | float | str | None) -> StaticSpec

Create a StaticSpec from primitives.

value_frame: StaticFrame

class TimeDeltaSpec(init_frame: TimestampValueFrame, fallback: int | float | str | None, output_name: str, column_prefix: str = 'pred', time_format: Literal['seconds', 'minutes', 'hours', 'days', 'years'] = 'days')

Bases: object

column_prefix: str = 'pred'
property df: pl.DataFrame
fallback: int | float | str | None
static from_primitives(df: pl.DataFrame, entity_id_col_name: str, output_name: str, value_timestamp_col_name: str = 'timestamp', column_prefix: str = 'pred', fallback: int | float | str | None = 0) -> TimeDeltaSpec

Create a TimeDeltaSpec from primitives.

init_frame: TimestampValueFrame
output_name: str
time_format: Literal['seconds', 'minutes', 'hours', 'days', 'years'] = 'days'

Specification for a time delta feature for an entity, i.e. the time between a prediction timestamp and a value timestamp (e.g. a birthdate). Useful for e.g. calculating age or the time since a certain event. Joins on the entity_id column.

init_frame must contain columns:

entity_id_col_name: The name of the column containing the entity ids. Must be a string, and the column’s values must be unique strings.
value_timestamp_col_name: The name of the column containing the timestamps at which the event occurs. Must be a string, and the column’s values must be datetimes.

output_name: The desired name of the feature column.
time_format: One of 'seconds', 'minutes', 'hours', 'days', or 'years'; defaults to 'days'.
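
The time delta described above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the library's code: `time_delta` and `SECONDS_PER_UNIT` are hypothetical names, the delta is taken as prediction timestamp minus value timestamp (so that an age comes out positive), and a year is treated as 365.25 days.

```python
import datetime as dt

# Seconds per unit; treating a year as 365.25 days is an assumption of this sketch.
SECONDS_PER_UNIT = {
    "seconds": 1,
    "minutes": 60,
    "hours": 3600,
    "days": 86400,
    "years": 86400 * 365.25,
}


def time_delta(
    prediction_time: dt.datetime,
    value_time: dt.datetime,
    time_format: str = "days",
) -> float:
    """Return the time from value_time to prediction_time in the requested unit."""
    elapsed = prediction_time - value_time
    return elapsed.total_seconds() / SECONDS_PER_UNIT[time_format]


birthdate = dt.datetime(2000, 1, 1)
pred_time = dt.datetime(2000, 1, 11)

age_days = time_delta(pred_time, birthdate)            # default unit: days
age_hours = time_delta(pred_time, birthdate, "hours")
```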

class TimestampValueFrame(init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame], entity_id_col_name: str = 'entity_id', value_timestamp_col_name: str = 'timestamp')

Bases: object

Timestamps, useful for computing e.g. age.

Must contain columns:

entity_id_col_name: The name of the column containing the entity ids. Must be a string, and the column’s values must be unique strings.
value_timestamp_col_name: The name of the column containing the timestamps. Must be a string, and the column’s values must be datetimes.

collect() -> DataFrame
entity_id_col_name: str = 'entity_id'
init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame]
value_timestamp_col_name: str = 'timestamp'