Introduction
Getting started
Learn how Aligned works, and which problems it tries to solve, in a few minutes.
Installation
Step-by-step guides to setting up your system and installing the library.
Architecture guide
Learn how the internals work and contribute.
Examples
See practical examples of how to use Aligned.
API reference
Learn to easily customize and modify your app's visual design to fit your brand.
Quick start
Here we describe how you can quickly get started using Aligned.
Installing dependencies
Install aligned through your favorite Python package manager.
Pip
pip install aligned
Poetry
poetry add aligned
Now that aligned is installed, we can start describing our system's data logic.
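To confirm the installation from Python, one option is to ask the standard library for the installed version. This is only a minimal sketch: the helper name installed_version is made up for illustration and is not part of aligned.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string if `aligned` is installed, otherwise None.
    print(installed_version("aligned"))
```

If this prints None, the package was most likely installed into a different environment than the one running the script.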
Basic usage
Here we go through some basic usage. We will define a minimal titanic model, and show how to load features for training and inference.
Define a data source
Our first step will be to point to a data source. aligned currently supports the following data sources:
- PostgreSQL
- Redshift
- Parquet
- CSV
- S3 files
For the following example, we will use a CSV file located in the MatsMoll/aligned-example GitHub repo.
from aligned import FileSource
repo_url = "https://github.com/MatsMoll/aligned-example/raw/main"
titanic_source = FileSource.csv_at(f"{repo_url}/data/titanic.csv")
We have now defined a reference to a data file containing different features. However, we have not defined which features exist, or how we want to identify each row.
This is where a FeatureView comes in.
Your first Feature View
The classic titanic data set contains a lot of features, but we will only focus on a subset of them here. Furthermore, for this use case it makes the most sense to load features based on a passenger_id. Therefore, we will set passenger_id as the entity, which is done with the .as_entity() method.
Furthermore, we want to define that titanic_source contains an age feature of type Float, sibsp which is an Int32, sex which is a String, and finally survived, a Bool that tells whether the passenger survived.
from aligned import feature_view, Int32, Bool, Float, String
from examples.titanic.source import titanic_source

@feature_view(
    name="titanic",
    description="Features from the titanic dataset",
    batch_source=titanic_source,
)
class TitanicPassenger:
    # The entity identifies each row
    passenger_id = Int32().as_entity()

    age = Float().description("Some of the ages are floats, such as `0.8`")

    sibsp = Int32().description("Number of siblings on titanic")
    # Derived feature: true when sibsp is greater than zero
    has_siblings = sibsp > 0

    sex = String().accepted_values(["male", "female"])
    # One derived feature per accepted value
    is_male, is_female = sex.one_hot_encode(["male", "female"])

    survived = Bool().is_required()
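To make the derived features concrete, the plain-Python sketch below mimics what has_siblings, is_male, and is_female are expected to contain for a single row. It does not use aligned and is not how the library computes them internally. The function name derive_features and the sample row are made up for illustration.

```python
from typing import Any, Dict, List


def derive_features(row: Dict[str, Any]) -> Dict[str, Any]:
    """Compute the derived columns of the feature view for one raw row."""
    categories: List[str] = ["male", "female"]
    derived = dict(row)
    # Mirrors `has_siblings = sibsp > 0`.
    derived["has_siblings"] = row["sibsp"] > 0
    # One-hot encoding: one boolean column per accepted category.
    for category in categories:
        derived[f"is_{category}"] = row["sex"] == category
    return derived


# Illustrative row, not taken from the real data set.
sample = {"passenger_id": 1, "age": 0.8, "sibsp": 2, "sex": "female"}
print(derive_features(sample))
```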
Create a Model Contract
Now that we have our features, let's define which features our ML model will use, and what it will predict.
We want to use the features from TitanicPassenger, which can be done by creating an instance with passenger = TitanicPassenger(). Furthermore, we want to make sure that the passenger.survived feature is our classification label.
from aligned import model_contract
from examples.titanic.passenger import TitanicPassenger

passenger = TitanicPassenger()

@model_contract(
    name="titanic",
    description="A model predicting if a passenger will survive on titanic",
    features=[
        passenger.age,
        passenger.is_male,
        passenger.has_siblings,
    ],
)
class TitanicModel:
    survived = passenger.survived.as_classification_label()
Why create feature views?
Notice that we create an instance of the feature view and reference its attributes in the model contract. This is on purpose, to provide the developer with code completion for all the available features, and so that linters can catch errors faster! 🚀 Furthermore, this tells aligned which entities are needed to query all the wanted features, as they are derived based on the referenced feature views.
This model was a fairly simple one, as only one data source was used with trivial transformations. However, check out our examples for more complicated use cases.
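The idea that the needed entities follow from the referenced feature views can be sketched in plain Python. This is an illustration of the concept only, not aligned's implementation. The VIEW_ENTITIES registry and the required_entities helper are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical registry: feature view name -> entity columns that identify a row.
VIEW_ENTITIES: Dict[str, Set[str]] = {
    "titanic": {"passenger_id"},
}


def required_entities(features: List[Tuple[str, str]]) -> Set[str]:
    """Union of the entities of every feature view that a model references."""
    needed: Set[str] = set()
    for view_name, _feature_name in features:
        needed |= VIEW_ENTITIES[view_name]
    return needed


# Each feature is written as (feature view name, feature name).
model_features = [
    ("titanic", "age"),
    ("titanic", "is_male"),
    ("titanic", "has_siblings"),
]
print(required_entities(model_features))  # {'passenger_id'}
```

A model that referenced features from several views would need the union of their entities, which is why the request for the titanic model only asks for passenger_id.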
Data lineage
One of the most powerful features of Aligned is its capability to understand the data lineage implicitly. The lineage is captured both between transformations and between models, making it possible to understand and view how data flows through the ML system. Even more powerful, Aligned can remove unneeded transformations and reduce the computational load, since it knows every consumer of our features.
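As a rough illustration of how lineage makes it possible to skip unneeded transformations, the sketch below walks a small dependency graph built from the titanic features. It is a generic graph traversal under assumed names (DEPENDS_ON, needed_features, skippable_transformations), not Aligned's actual lineage engine.

```python
from typing import Dict, List, Set

# Hypothetical lineage graph: feature -> features it is computed from.
# Raw columns have no dependencies; derived features list their inputs.
DEPENDS_ON: Dict[str, List[str]] = {
    "age": [],
    "sibsp": [],
    "sex": [],
    "has_siblings": ["sibsp"],
    "is_male": ["sex"],
    "is_female": ["sex"],
}


def needed_features(requested: List[str]) -> Set[str]:
    """Return the requested features plus everything they depend on."""
    needed: Set[str] = set()
    stack = list(requested)
    while stack:
        feature = stack.pop()
        if feature in needed:
            continue
        needed.add(feature)
        stack.extend(DEPENDS_ON[feature])
    return needed


def skippable_transformations(requested: List[str]) -> Set[str]:
    """Derived features that no requested feature depends on."""
    needed = needed_features(requested)
    return {name for name, deps in DEPENDS_ON.items() if deps and name not in needed}


model_inputs = ["age", "is_male", "has_siblings"]
print(sorted(needed_features(model_inputs)))
print(sorted(skippable_transformations(model_inputs)))
```

For the model defined earlier, is_female is never consumed, so its transformation is the one that could be left out.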
Load a data set
We can finally load data with a few lines of code. The code shown below will load features from the batch source defined in our FeatureViews.
# `store` is assumed to be a feature store instance that has already been created
entities = {
    "passenger_id": [10, 11, 20, 100]
}
df = await store.model("titanic").features_for(entities).to_pandas()
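The call above uses await, so it has to run inside an async context such as a notebook cell or a coroutine. In a regular script, one way is to wrap the call in a coroutine and hand it to asyncio.run. The sketch below uses a stand-in coroutine named load_features (hypothetical, not part of aligned) in place of the real store call so that it runs on its own.

```python
import asyncio
from typing import Dict, List


async def load_features(entities: Dict[str, List[int]]) -> List[Dict[str, int]]:
    """Stand-in for an awaitable feature request; returns one row per entity."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return [{"passenger_id": pid} for pid in entities["passenger_id"]]


async def main() -> List[Dict[str, int]]:
    entities = {"passenger_id": [10, 11, 20, 100]}
    # In real code, the awaited store call would go here instead.
    return await load_features(entities)


if __name__ == "__main__":
    rows = asyncio.run(main())
    print(len(rows))
```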
Load inference features
However, for inference we may want features with lower latency. Therefore, we can easily load data from a low latency storage with the following code.
redis_store = store.with_source(Redis.localhost())
entities = {
    "passenger_id": [10, 11, 20, 100]
}
df = await redis_store.model("titanic").features_for(entities).to_pandas()
Notice that the only difference is that we define which store to use, and the features will now be loaded from a Redis key-value store.
Getting help
Everyone will stumble upon some challenges, and we will happily help you whenever you need some guidance. The best way to get help is to contact us on Discord, so why not join us there already?
Submit an issue
The Aligned project is still in an early development phase, so some bugs may exist. If you find one, please submit an issue, or, if you know how to fix the problem, open a PR and contribute.
Join the community
Our community is just starting to grow. So get in touch and join us on Discord.