Aligned - Control your ML data debt

Learn how Aligned works, and which problems it tries to solve, in a few minutes.

Installation

Step-by-step guides to setting up your system and installing the library.

Architecture guide

Learn how the internals work and contribute.

Examples

See practical examples of how to use Aligned.

API reference

Look up the classes and functions that Aligned exposes.

Quick start

Here we describe how you can quickly get started using Aligned.

Installing dependencies

Install aligned through your favorite Python package manager.

Pip

pip install aligned

Poetry

poetry add aligned

Now that aligned is installed, we can start describing the data logic of our system.
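If you want to confirm that the install succeeded, a minimal sketch using only the Python standard library could look like this. The helper name `installed_version` is made up for this example and is not part of aligned.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical check: prints a different message depending on the environment.
version = installed_version("aligned")
if version is None:
    print("aligned is not installed in this environment")
else:
    print(f"aligned {version} is installed")
```

Run this with the same interpreter that pip or Poetry installed into, otherwise the check may look at a different environment.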

Basic usage

Here we go through some basic usage. We will define a minimal titanic model, and show how to load features for training and inference.

Define a data source

Our first step will be to point to a data source. aligned currently supports the following data sources:

PostgreSQL
Redshift
Parquet
CSV
S3 files

For the following example, we will use a CSV file located in the MatsMoll/aligned-example GitHub repo.

from aligned import FileSource
repo_url = "https://github.com/MatsMoll/aligned-example/raw/main"
titanic_source = FileSource.csv_at(f"{repo_url}/data/titanic.csv")

We have now defined a reference to a data file containing different features. However, we have not defined which features exist, or how we want to identify each row.
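To see why a file reference alone is not enough, consider what a raw CSV gives you. The sketch below uses the standard library `csv` module on a small in-memory sample. The rows are illustrative only and are not read from the real file.

```python
import csv
import io

# Hypothetical sample rows, written by hand for illustration only.
raw_csv = """passenger_id,age,sibsp,sex,survived
1,22.0,1,male,0
2,38.0,1,female,1
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Every value is a plain string, and nothing marks passenger_id as the row key.
print(rows[0])
print({column: type(value).__name__ for column, value in rows[0].items()})
```

The file knows neither the types of its columns nor which column identifies a row, so that information has to be declared somewhere else.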

Here is where a FeatureView comes in.

Your first Feature View

The classic titanic data set contains a lot of features, but we will only focus on a subset of them here. For this use case, it makes the most sense to load features based on a passenger_id. Therefore, we set passenger_id as the entity, which is done with the .as_entity() method.

Furthermore, we want to define that the titanic_source contains an age feature of type Float, sibsp which is an Int32, sex which is a String, and finally survived, a Bool stating whether the passenger survived.

from aligned import feature_view, Int32, Bool, Float, String
from examples.titanic.source import titanic_source

@feature_view(
    name="titanic",
    description="Features from the titanic dataset",
    batch_source=titanic_source,
)
class TitanicPassenger:
    
    passenger_id = Int32().as_entity()

    age = Float().description("Some of the ages are floats, such as `0.8`")

    sibsp = Int32().description("Number of siblings on titanic")
    has_siblings = sibsp > 0
    
    sex = String().accepted_values(["male", "female"])
    is_male, is_female = sex.one_hot_encode(['male', 'female'])

    survived = Bool().is_required()
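To make the derived features concrete, here is a plain-Python sketch of what `has_siblings`, `is_male` and `is_female` are meant to compute for a single row. This is not how aligned evaluates them internally. `PassengerRow` and `derive_features` are hypothetical names, and raising on an unexpected `sex` value is an assumption made for this sketch, not documented aligned behaviour.

```python
from dataclasses import dataclass

ACCEPTED_SEX_VALUES = ("male", "female")


@dataclass(frozen=True)
class PassengerRow:
    """Hypothetical stand-in for one row of the titanic source."""
    passenger_id: int
    sibsp: int
    sex: str


def derive_features(row: PassengerRow) -> dict:
    """Compute the derived features described by the feature view."""
    if row.sex not in ACCEPTED_SEX_VALUES:
        # Assumption for this sketch: reject values outside the accepted set.
        raise ValueError(f"Unexpected value for sex: {row.sex!r}")
    return {
        "passenger_id": row.passenger_id,
        "has_siblings": row.sibsp > 0,       # mirrors `sibsp > 0`
        "is_male": row.sex == "male",        # one-hot column for "male"
        "is_female": row.sex == "female",    # one-hot column for "female"
    }


example = derive_features(PassengerRow(passenger_id=10, sibsp=2, sex="female"))
print(example)
```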

Create a Model Contract

Now that we have our features, let's define which features our ML model will use, and what it will predict.

We state that we want to use the features from TitanicPassenger, which can be done with passenger = TitanicPassenger(). Furthermore, we want to make sure that the passenger.survived feature is our classification label.

from aligned import model_contract
from examples.titanic.passenger import TitanicPassenger

passenger = TitanicPassenger()

@model_contract(
    name="titanic",
    description="A model predicting if a passenger will survive on titanic",
    features=[
        passenger.age,
        passenger.is_male,
        passenger.has_siblings
    ],
)
class TitanicModel:

    survived = passenger.survived.as_classification_label()

Why create feature views?

Notice that we create an instance of the feature view and reference its features in the model contract. This is on purpose: it gives the developer code completion for all the available features, and lets linters catch errors faster! 🚀 Furthermore, this tells aligned which entities are needed to query all the wanted features, as they are derived based on the referenced feature views.

This model was a fairly simple one, as only one data source was used, with trivial transformations. However, check out our examples for more complicated use cases.
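The idea of deriving the needed entities from the referenced features can be sketched in a few lines of plain Python. This illustrates the concept only and is not aligned's implementation. `FeatureRef`, `VIEW_ENTITIES`, and the second `ticket` view are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class FeatureRef:
    """Hypothetical reference to a feature and the view that owns it."""
    view: str
    name: str


# Hypothetical registry: which entity columns identify a row in each view.
VIEW_ENTITIES: Dict[str, Set[str]] = {
    "titanic": {"passenger_id"},
    "ticket": {"ticket_id"},  # made-up second view, only to show the union
}


def required_entities(features: Iterable[FeatureRef]) -> Set[str]:
    """Union the entities of every view that a model's features come from."""
    entities: Set[str] = set()
    for feature in features:
        entities |= VIEW_ENTITIES[feature.view]
    return entities


titanic_model_features = [
    FeatureRef("titanic", "age"),
    FeatureRef("titanic", "is_male"),
    FeatureRef("titanic", "has_siblings"),
]
print(required_entities(titanic_model_features))
```

Because every feature carries a reference to its view, the caller never has to list the entities by hand.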

Data lineage

One of the most powerful features of Aligned is its capability to understand the data lineage implicitly. The lineage is captured both between transformations and between models, making it possible to understand and view how data flows through the ML system. Even more powerful, Aligned can remove unneeded transformations and reduce the computational load, since it knows every consumer of our features.

Aligned UI presenting the data lineage of a model
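As a rough illustration of how known lineage allows unneeded transformations to be skipped, the sketch below walks a dependency map transcribed by hand from the TitanicPassenger view. The names `DEPENDS_ON` and `needed_features` are hypothetical, and the traversal is a generic graph walk, not Aligned's actual algorithm.

```python
# Each derived feature maps to the features it is computed from.
# Transcribed by hand from the TitanicPassenger view for this sketch.
DEPENDS_ON = {
    "has_siblings": ["sibsp"],
    "is_male": ["sex"],
    "is_female": ["sex"],
}


def needed_features(requested):
    """Return every feature that must be loaded or computed for `requested`."""
    needed = set()
    stack = list(requested)
    while stack:
        feature = stack.pop()
        if feature in needed:
            continue
        needed.add(feature)
        # Follow the lineage back to the features this one is derived from.
        stack.extend(DEPENDS_ON.get(feature, []))
    return needed


model_features = ["age", "is_male", "has_siblings"]
needed = needed_features(model_features)
skipped = set(DEPENDS_ON) - needed

print(sorted(needed))   # ['age', 'has_siblings', 'is_male', 'sex', 'sibsp']
print(sorted(skipped))  # ['is_female']
```

Since the titanic model never consumes `is_female`, that transformation does not have to run for this model.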

Load a data set

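We can finally load data with a few lines of code. The code shown below will load features from the batch source defined in our FeatureViews.

# `store` is assumed to be an already loaded feature store.
# `await` requires an async context, such as a notebook or an async function.
entities = {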
    "passenger_id": [10, 11, 20, 100]
}
df = await store.model("titanic").features_for(entities).to_pandas()
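The snippet above uses `await`, so in a regular script it has to run inside a coroutine. A minimal sketch of that pattern, with a hypothetical `load_features` coroutine standing in for the real feature request, could look like this:

```python
import asyncio


async def load_features(entities: dict) -> dict:
    """Hypothetical stand-in for an awaitable feature request."""
    await asyncio.sleep(0)  # simulate waiting on I/O
    return {"passenger_id": entities["passenger_id"]}


async def main() -> dict:
    entities = {"passenger_id": [10, 11, 20, 100]}
    # In real code, this is where the awaited feature request would go.
    return await load_features(entities)


# asyncio.run starts an event loop, runs main() to completion, and closes the loop.
result = asyncio.run(main())
print(result)
```

In a notebook an event loop is usually already running, so there you can typically `await` directly instead of calling `asyncio.run`.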

Load inference features

However, for inference we may want features with lower latency. Therefore, we can load data from a low-latency store with the following code.

redis_store = store.with_source(Redis.localhost())

entities = {
    "passenger_id": [10, 11, 20, 100]
}
df = await redis_store.model("titanic").features_for(entities).to_pandas()

Notice that the only difference is that we define which store to use, and it will now load the features from a Redis key-value store.
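The design choice here is that the query stays the same while the source is swapped. A toy sketch of that pattern, using a hypothetical `SketchStore` class and made-up rows rather than aligned's own classes, might look like this:

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class SketchStore:
    """Hypothetical store: the same query interface over a swappable source."""
    source_name: str
    rows: Dict[int, dict]

    def with_source(self, source_name: str, rows: Dict[int, dict]) -> "SketchStore":
        # Return a new store and leave the original untouched.
        return replace(self, source_name=source_name, rows=rows)

    def features_for(self, entities: dict) -> List[dict]:
        return [self.rows[pid] for pid in entities["passenger_id"] if pid in self.rows]


# Made-up rows for illustration only.
batch_rows = {10: {"passenger_id": 10, "has_siblings": True}}
cache_rows = {10: {"passenger_id": 10, "has_siblings": True}}

batch_store = SketchStore("csv", batch_rows)
online_store = batch_store.with_source("in-memory cache", cache_rows)

entities = {"passenger_id": [10]}
print(batch_store.features_for(entities) == online_store.features_for(entities))
```

The calling code only changes in the one line that picks the store, which mirrors the difference between the two snippets above.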

Getting help

Everyone will stumble upon some challenges. Therefore, we will happily help you whenever you need some guidance. The best way to get help is to contact us in the Discord, so why not join us there already.

Submit an issue

The Aligned project is still in an early development phase. Therefore, some bugs may exist. If you find one, submit an issue, or open a PR if you know how to fix it and want to contribute.

Join the community

Our community is just starting to grow. So get in touch and join us in our Discord.