Feature specifications
timeseriesflattener.specs
- class PredictorSpec(value_frame: ValueFrame, lookbehind_distances: InitVar[Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]], aggregators: Sequence[Aggregator], fallback: int | float | str | None, column_prefix: str = 'pred')
Bases: object
Specification for a temporal predictor.
- The value_frame must contain columns:
  entity_id_col_name: The name of the column containing the entity ids.
  value_timestamp_col_name: The name of the column containing the timestamps for each value.
  Additional columns containing the values to aggregate. The names of these columns will be used for feature naming.
- aggregators: Sequence[Aggregator]
- column_prefix: str = 'pred'
- property df: pl.DataFrame
- fallback: int | float | str | None
- static from_primitives(df: pl.DataFrame, entity_id_col_name: str, lookbehind_days: Sequence[float | tuple[float, float]], aggregators: Sequence[AggregatorName], value_timestamp_col_name: str = 'timestamp', column_prefix: str = 'pred', fallback: int | float | str | None = 0) -> PredictorSpec
Create a PredictorSpec from primitives.
- lookbehind_distances: InitVar[Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]]
- value_frame: ValueFrame
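To illustrate what a lookbehind window with a fallback means, here is a minimal sketch in plain Python that does not use timeseriesflattener. It assumes that a single number of days `d` stands for the window from `d` days before the prediction time up to the prediction time, that a tuple `(a, b)` stands for the window between `a` and `b` days before it, that both window edges are inclusive, and that the fallback is returned when no value falls inside the window. The helper names `to_window` and `aggregate_lookbehind` are hypothetical and the library's actual boundary handling may differ.

```python
import datetime as dt
from statistics import mean
from typing import Union

# Hypothetical alias: a lookbehind given in days, either a single distance or a pair.
Days = Union[float, tuple[float, float]]


def to_window(lookbehind: Days) -> tuple[dt.timedelta, dt.timedelta]:
    """Normalise a lookbehind in days to a (nearest, furthest) pair of timedeltas."""
    if isinstance(lookbehind, tuple):
        first, second = lookbehind
        nearest, furthest = min(first, second), max(first, second)
    else:
        # Assumption: a single distance means "from the prediction time back to d days".
        nearest, furthest = 0.0, float(lookbehind)
    return dt.timedelta(days=nearest), dt.timedelta(days=furthest)


def aggregate_lookbehind(values, prediction_time, lookbehind, aggregator, fallback):
    """Aggregate the (timestamp, value) pairs that fall inside the lookbehind window."""
    nearest, furthest = to_window(lookbehind)
    start, end = prediction_time - furthest, prediction_time - nearest
    # Assumption: both window edges are inclusive.
    in_window = [value for timestamp, value in values if start <= timestamp <= end]
    return aggregator(in_window) if in_window else fallback


values = [
    (dt.datetime(2023, 1, 1), 1.0),
    (dt.datetime(2023, 1, 20), 3.0),
    (dt.datetime(2023, 2, 5), 10.0),  # after the prediction time, never included
]
prediction_time = dt.datetime(2023, 1, 31)

mean_30_days = aggregate_lookbehind(values, prediction_time, 30, mean, fallback=0)
mean_10_to_20_days = aggregate_lookbehind(values, prediction_time, (10, 20), mean, fallback=0)
mean_1_to_5_days = aggregate_lookbehind(values, prediction_time, (1, 5), mean, fallback=0)
```

With this data, the 30-day window covers the first two values, the 10-to-20-day window covers only the value from 20 January, and the 1-to-5-day window is empty, so the fallback is returned.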
- class PredictionTimeFrame(init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame], entity_id_col_name: str = 'entity_id', timestamp_col_name: str = 'pred_timestamp', prediction_time_uuid_col_name: str = 'prediction_time_uuid')
Bases: object
Specification for prediction times, i.e. the times for which predictions are made.
- init_df must be a dataframe (pandas or polars) containing columns:
  entity_id_col_name: The name of the column containing the entity ids.
  timestamp_col_name: The name of the column containing the timestamps for when to make a prediction.
- entity_id_col_name: str = 'entity_id'
- init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame]
- prediction_time_uuid_col_name: str = 'prediction_time_uuid'
- timestamp_col_name: str = 'pred_timestamp'
- class BooleanOutcomeSpec(init_frame: InitVar[TimestampValueFrame], lookahead_distances: Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]], aggregators: Sequence[Aggregator], output_name: str, column_prefix: str = 'outc')
Bases: object
Specification for a boolean outcome, e.g. whether a patient received a treatment or not.
- The init_frame must contain columns:
  entity_id_col_name: The name of the column containing the entity ids. Must be a string, and the column’s values must be strings which are unique.
  value_timestamp_col_name: The name of the column containing the timestamps of when the event occurs. Must be a string, and the column’s values must be datetimes.
- aggregators: Sequence[Aggregator]
- column_prefix: str = 'outc'
- property df: DataFrame
- static from_primitives(df: pl.DataFrame | pd.DataFrame, entity_id_col_name: str, lookahead_days: Sequence[float | tuple[float, float]], aggregators: Sequence[AggregatorName], value_timestamp_col_name: str = 'timestamp', column_prefix: str = 'outc') -> BooleanOutcomeSpec
Create a BooleanOutcomeSpec from primitives.
- init_frame: InitVar[TimestampValueFrame]
- lookahead_distances: Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]
- output_name: str
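The idea of a boolean outcome within a lookahead window can be sketched in plain Python, without timeseriesflattener. This is only an illustration: the function name `event_in_lookahead` is hypothetical, and the choice to exclude events at the prediction time itself while including events exactly at the end of the window is an assumption, not documented library behaviour.

```python
import datetime as dt


def event_in_lookahead(event_times, prediction_time, lookahead):
    """Return True if any event falls inside the lookahead window after prediction_time."""
    window_end = prediction_time + lookahead
    # Assumption: the window is open at the prediction time and closed at its end.
    return any(prediction_time < timestamp <= window_end for timestamp in event_times)


# Timestamps of the event (e.g. a treatment) for a single entity.
event_times = [dt.datetime(2023, 3, 10)]
prediction_time = dt.datetime(2023, 1, 1)

within_30_days = event_in_lookahead(event_times, prediction_time, dt.timedelta(days=30))
within_90_days = event_in_lookahead(event_times, prediction_time, dt.timedelta(days=90))
```

Here the event on 10 March lies outside the 30-day window but inside the 90-day window, so the two results are False and True respectively.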
- class OutcomeSpec(value_frame: ValueFrame, lookahead_distances: InitVar[Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]], aggregators: Sequence[Aggregator], fallback: int | float | str | None, column_prefix: str = 'outc')
Bases: object
Specification for an outcome. If your outcome is binary/boolean, you can use BooleanOutcomeSpec instead.
- aggregators: Sequence[Aggregator]
- column_prefix: str = 'outc'
- property df: DataFrame
- fallback: int | float | str | None
- static from_primitives(df: pl.DataFrame, entity_id_col_name: str, lookahead_days: Sequence[float | tuple[float, float]], aggregators: Sequence[AggregatorName], value_timestamp_col_name: str = 'timestamp', column_prefix: str = 'outc') -> OutcomeSpec
Create an OutcomeSpec from primitives.
- lookahead_distances: InitVar[Sequence[dt.timedelta | tuple[dt.timedelta, dt.timedelta]]]
- value_frame: ValueFrame
- class StaticFrame(init_df: InitVar[pl.DataFrame | pd.DataFrame], entity_id_col_name: str = 'entity_id')
Bases: object
- entity_id_col_name: str = 'entity_id'
- init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame]
- class StaticSpec(value_frame: StaticFrame, column_prefix: str, fallback: int | float | str | None)
Bases: object
Specification for a static feature, e.g. the sex of a person.
- The value_frame must contain columns:
  entity_id_col_name: The name of the column containing the entity ids. Must be a string, and the column’s values must be strings which are unique.
  Additional columns containing the values of the static feature. The names of the columns will be used for feature naming.
- column_prefix: str
- fallback: int | float | str | None
- static from_primitives(df: DataFrame, entity_id_col_name: str, column_prefix: str, fallback: int | float | str | None) -> StaticSpec
Create a StaticSpec from primitives.
- value_frame: StaticFrame
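As a rough illustration of how a static feature with a fallback could be attached to prediction times, the following plain-Python sketch looks up one value per entity and uses the fallback when the entity has none. It does not use timeseriesflattener. The function `add_static_feature` and the `prefix_name` column naming are hypothetical, and the library's real column names may differ.

```python
def add_static_feature(prediction_rows, static_values, column_prefix, feature_name, fallback):
    """Attach one static value per entity to each prediction row, using fallback if missing."""
    # Assumption: the output column is named "<column_prefix>_<feature_name>".
    column = f"{column_prefix}_{feature_name}"
    return [
        {**row, column: static_values.get(row["entity_id"], fallback)}
        for row in prediction_rows
    ]


prediction_rows = [{"entity_id": "a"}, {"entity_id": "b"}]
sex_by_entity = {"a": "female"}  # entity "b" has no recorded value

rows_with_sex = add_static_feature(
    prediction_rows, sex_by_entity, column_prefix="pred", feature_name="sex", fallback="unknown"
)
```

Each entity id maps to at most one value, which is why the sketch uses a dictionary keyed on the entity id.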
- class TimeDeltaSpec(init_frame: TimestampValueFrame, fallback: int | float | str | None, output_name: str, column_prefix: str = 'pred', time_format: Literal['seconds', 'minutes', 'hours', 'days', 'years'] = 'days')
Bases: object
Specification for a time delta feature for an entity, i.e. the time between a prediction timestamp and a value timestamp (e.g. a birthdate). Useful for e.g. calculating age or the time since a certain event. Joins on the entity_id column.
- init_frame must contain columns:
  entity_id_col_name: The name of the column containing the entity ids. Must be a string, and the column’s values must be strings which are unique.
  value_timestamp_col_name: The name of the column containing the timestamps of when the event occurs. Must be a string, and the column’s values must be datetimes.
output_name: the desired name of the feature column.
time_format:
- column_prefix: str = 'pred'
- property df: pl.DataFrame
- fallback: int | float | str | None
- static from_primitives(df: pl.DataFrame, entity_id_col_name: str, output_name: str, value_timestamp_col_name: str = 'timestamp', column_prefix: str = 'pred', fallback: int | float | str | None = 0) -> TimeDeltaSpec
Create a TimeDeltaSpec from primitives.
- init_frame: TimestampValueFrame
- output_name: str
- time_format: Literal['seconds', 'minutes', 'hours', 'days', 'years'] = 'days'
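The time delta described for TimeDeltaSpec, i.e. the time between a prediction timestamp and a value timestamp expressed in a chosen unit, can be sketched in plain Python as follows. This does not use timeseriesflattener. The function `time_delta` and the `SECONDS_PER_UNIT` table are hypothetical, and treating a year as 365.25 days is an assumption that may not match the library's own conversion.

```python
import datetime as dt

# Hypothetical conversion table for the units listed in time_format.
SECONDS_PER_UNIT = {
    "seconds": 1.0,
    "minutes": 60.0,
    "hours": 3600.0,
    "days": 86400.0,
    "years": 365.25 * 86400.0,  # assumption: a year is 365.25 days
}


def time_delta(prediction_time, value_time, time_format="days"):
    """Time from value_time to prediction_time, expressed in the requested unit."""
    elapsed = prediction_time - value_time
    return elapsed.total_seconds() / SECONDS_PER_UNIT[time_format]


birthdate = dt.datetime(1990, 1, 1)
prediction_time = dt.datetime(2020, 1, 1)

age_in_days = time_delta(prediction_time, birthdate, "days")
age_in_years = time_delta(prediction_time, birthdate, "years")
```

With a birthdate as the value timestamp, the result in years is approximately the age of the entity at the prediction time.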
- class TimestampValueFrame(init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame], entity_id_col_name: str = 'entity_id', value_timestamp_col_name: str = 'timestamp')
Bases: object
Timestamps, useful for computing e.g. age.
- Must contain columns:
  entity_id_col_name: The name of the column containing the entity ids. Must be a string, and the column’s values must be strings which are unique.
  value_timestamp_col_name: The name of the column containing the timestamps. Must be a string, and the column’s values must be datetimes.
- entity_id_col_name: str = 'entity_id'
- init_df: dataclasses.InitVar[polars.dataframe.frame.DataFrame | pandas.core.frame.DataFrame]
- value_timestamp_col_name: str = 'timestamp'