Core concepts

Feature View

To load features into models, aligned uses the concept of a feature view. A feature view can be seen as the equivalent of a data model in the BI domain. It can therefore represent the gold / mart, silver / intermediate, or bronze / staging layer, if you want to get crazy.

Schema Definition

Defining the schema is almost as easy as setting up a dataclass.

Let's use the following schema as an example.

Column name | Data type
zipcode | Int
location_type | String
population | Int
event_timestamp | Datetime
created_timestamp | Datetime

To define the features, we can use the following code.

from aligned import feature_view, String, Int64, EventTimestamp, Timestamp, FileSource


@feature_view(...)
class Zipcode:

    # The entity: the key that identifies a row
    zipcode = Int64().as_entity()

    # The event timestamp, used when working with historic data
    event_timestamp = EventTimestamp()
    created_timestamp = Timestamp()

    location_type = String()
    population = Int64()

This defines all the columns above with their data types, plus some extra semantic meaning, such as which column is the entity and which is the event timestamp in the case of historic data.
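For comparison, here is roughly what the same schema could look like as a plain dataclass. This is only a sketch to illustrate the analogy: `ZipcodeRow` and the sample values are made up for illustration and are not part of aligned. The dataclass carries the column names and types, but none of the semantic meaning (entity, event timestamp) that the feature view adds.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass
class ZipcodeRow:
    # Hypothetical plain-Python counterpart of the Zipcode feature view
    zipcode: int
    location_type: str
    population: int
    event_timestamp: datetime
    created_timestamp: datetime


# Made-up sample values, only to show that the schema is usable
row = ZipcodeRow(
    zipcode=150,
    location_type="urban",
    population=1000,
    event_timestamp=datetime(2024, 1, 1),
    created_timestamp=datetime(2024, 1, 2),
)

# The declared columns, in definition order
column_names = [f.name for f in fields(ZipcodeRow)]
```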

@feature_view

But what is this @feature_view?

It contains all the metadata related to our schema. This could be our main source, a materialized source, owners, descriptions, expected freshness, and more.

Source

The main source of our features.

@feature_view(
    name="zipcode",
    source=FileSource.parquet_at("data/zipcode_table.parquet")
)
class Zipcode:
    ...

Materialized Source

The materialized source of our features.

This can be useful for caching downstream transformations, or for moving data to a more performant data store.

@feature_view(
    name="zipcode",
    source=FileSource.csv_at("data/zipcode_table.csv"),
    materialized_source=FileSource.parquet_at("data/zipcode_table.parquet")
)
class Zipcode:
    ...

Freshness

Most use cases are unlikely to be streaming, so data is often loaded on a schedule. aligned therefore lets you define how long features can go without an update and still be acceptable, and also at what point the delay becomes unacceptable.

from datetime import timedelta

@feature_view(
    name="zipcode",
    source=FileSource.parquet_at("data/zipcode_table.parquet"),
    acceptable_freshness=timedelta(hours=1),
    unacceptable_freshness=timedelta(hours=3)
)
class Zipcode:
    event_timestamp = EventTimestamp()
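To make the two thresholds concrete, the sketch below shows one way they could be read: data younger than the acceptable threshold is fine, data older than the unacceptable threshold is not, and anything in between is a warning zone. This is an illustration of the idea only, not aligned's implementation; the `freshness_status` function and its return labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(
    last_updated: datetime,
    now: datetime,
    acceptable: timedelta,
    unacceptable: timedelta,
) -> str:
    """Classify the age of the newest data against two thresholds."""
    age = now - last_updated
    if age <= acceptable:
        return "fresh"
    if age < unacceptable:
        return "warning"
    return "stale"


# Fixed reference time so the example is deterministic
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
acceptable = timedelta(hours=1)
unacceptable = timedelta(hours=3)

status_30_min = freshness_status(now - timedelta(minutes=30), now, acceptable, unacceptable)
status_2_hours = freshness_status(now - timedelta(hours=2), now, acceptable, unacceptable)
status_4_hours = freshness_status(now - timedelta(hours=4), now, acceptable, unacceptable)
```

With the values from the feature view above, data that is 30 minutes old falls in the first zone, 2 hours old in the middle zone, and 4 hours old in the last.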

Metadata

Furthermore, you can also add a description, tags, and a list of contacts.

@feature_view(
    name="zipcode",
    source=FileSource.parquet_at("data/zipcode_table.parquet"),
    description="The zipcode features in Norway",
    contacts=["MatsMoll"],
    tags=["eta-team"]
)
class Zipcode:
    ...

Load data

Finally, we can load data with the following code.

df = await Zipcode.query().all().to_pandas()

Or if we have a loaded feature store.

df = await store.feature_view("zipcode").all().to_pandas()
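The `await` calls above need an async context, such as a notebook or an `async def` function. In a plain script, one option is to wrap the call with `asyncio.run`. The sketch below uses a hypothetical `load_zipcodes` coroutine as a stand-in for `Zipcode.query().all().to_pandas()`, so that it runs without aligned installed; the returned row is made up.

```python
import asyncio


async def load_zipcodes() -> list[dict]:
    # Hypothetical stand-in for: await Zipcode.query().all().to_pandas()
    await asyncio.sleep(0)
    return [{"zipcode": 150, "location_type": "urban", "population": 1000}]


def main() -> list[dict]:
    # asyncio.run starts an event loop, runs the coroutine, and closes the loop
    return asyncio.run(load_zipcodes())


rows = main()
```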