1 | | -# Data model and concepts |
2 | | - |
3 | | -### Concepts |
4 | | - |
5 | | -The top-level namespace within Feast is a [project](data-model-and-concepts.md#project). Users define one or more [feature views](data-model-and-concepts.md#feature-view) within a project. Each feature view contains one or more [features](data-model-and-concepts.md#feature) that relate to a specific [entity](data-model-and-concepts.md#entity). A feature view must always have a [data source](data-model-and-concepts.md#data-source). This source is used during the generation of training [datasets](data-model-and-concepts.md#dataset) and when materializing feature values into the online store. |
6 | | - |
7 | | - |
8 | | - |
9 | | -### Project |
10 | | - |
11 | | -Projects provide complete isolation of feature stores at the infrastructure level. This is accomplished through resource namespacing, e.g., prefixing table names with the associated project. Each project should be considered a completely separate universe of entities and features. It is not possible to retrieve features from multiple projects in a single request. We recommend having a single feature store and a single project per environment \(`dev`, `staging`, `prod`\). |
12 | | - |
13 | | -{% hint style="info" %} |
14 | | -Projects are currently being supported for backward compatibility reasons. The concept and functionality provided by Projects may change in the future as we simplify the Feast API. |
15 | | -{% endhint %} |
16 | | - |
17 | | -### Data Source |
18 | | - |
19 | | -Feast uses a time-series data model to represent data. This data model is used to interpret feature data in data sources in order to build training datasets or when materializing features into an online store. |
20 | | - |
21 | | -Below is an example data source with a single entity \(`driver`\) and two features \(`trips_today`, and `rating`\). |
22 | | - |
23 | | - |
24 | | - |
25 | | -### Entity |
26 | | - |
27 | | -An entity is a collection of semantically related features. Users define entities to map to the domain of their use case. For example, a ride-hailing service could have customers and drivers as their entities, which group related features that correspond to these customers and drivers. |
28 | | - |
29 | | -```python |
30 | | -driver = Entity(name='driver', value_type=ValueType.STRING, join_key='driver_id') |
31 | | -``` |
32 | | - |
33 | | -Entities are defined as part of feature views. Entities are used to identify the primary key on which feature values should be stored and retrieved. These keys are used during the lookup of feature values from the online store and the join process in point-in-time joins. It is possible to define composite entities \(more than one entity object\) in a feature view. |
34 | | - |
35 | | -Entities should be reused across feature views. |
36 | | - |
37 | | -### Feature |
38 | | - |
39 | | -A feature is an individual measurable property observed on an entity. For example, a feature of a `customer` entity could be the number of transactions they have made on an average month. |
40 | | - |
41 | | -Features are defined as part of feature views. Since Feast does not transform data, a feature is essentially a schema that only contains a name and a type: |
42 | | - |
43 | | -```python |
44 | | -trips_today = Feature( |
45 | | - name="trips_today", |
46 | | - dtype=ValueType.FLOAT |
47 | | -) |
48 | | -``` |
49 | | - |
50 | | -Together with [data sources](data-model-and-concepts.md#data-source), they indicate to Feast where to find your feature values, e.g., in a specific parquet file or BigQuery table. Feature definitions are also used when reading features from the feature store, using [feature references](data-model-and-concepts.md#feature-references). |
51 | | - |
52 | | -Feature names must be unique within a [feature view](data-model-and-concepts.md#feature-view). |
53 | | - |
54 | | -### Feature View |
55 | | - |
56 | | -A feature view is an object that represents a logical group of time-series feature data as it is found in a data source. Feature views consist of one or more entities, features, and a data source. Feature views allow Feast to model your existing feature data in a consistent way in both an offline \(training\) and online \(serving\) environment. |
57 | | - |
58 | | -{% tabs %} |
59 | | -{% tab title="driver\_trips\_feature\_view.py" %} |
60 | | -```python |
61 | | -driver_stats_fv = FeatureView( |
62 | | - name="driver_activity", |
63 | | - entities=["driver"], |
64 | | - features=[ |
65 | | - Feature(name="trips_today", dtype=ValueType.INT64), |
66 | | - Feature(name="rating", dtype=ValueType.FLOAT), |
67 | | - ], |
68 | | - input=BigQuerySource( |
69 | | - table_ref="feast-oss.demo_data.driver_activity" |
70 | | - ) |
71 | | -) |
72 | | -``` |
73 | | -{% endtab %} |
74 | | -{% endtabs %} |
75 | | - |
76 | | -Feature views are used during |
77 | | - |
78 | | -* The generation of training datasets by querying the data source of feature views in order to find historical feature values. A single training dataset may consist of features from multiple feature views. |
79 | | -* Loading of feature values into an online store. Feature views determine the storage schema in the online store. |
80 | | -* Retrieval of features from the online store. Feature views provide the schema definition to Feast in order to look up features from the online store. |
81 | | - |
82 | | -{% hint style="info" %} |
83 | | -Feast does not generate feature values. It acts as the ingestion and serving system. The data sources described within feature views should reference feature values in their already computed form. |
84 | | -{% endhint %} |
| 1 | +# Data model |
85 | 2 |
|
86 | 3 | ### Dataset |
87 | 4 |
|
@@ -147,42 +64,3 @@ Example of an entity dataframe with feature values joined to it: |
147 | 64 |
|
148 | 65 |  |
149 | 66 |
|
150 | | -### **Online Store** |
151 | | - |
152 | | -The Feast online store is used for low-latency online feature value lookups. Feature values are loaded into the online store from data sources in feature views using the `materialize` command. |
153 | | - |
154 | | -The storage schema of features within the online store mirrors that of the data source used to populate the online store. One key difference between the online store and data sources is that only the latest feature values are stored per entity key. No historical values are stored. |
155 | | - |
156 | | -Example batch data source |
157 | | - |
158 | | - |
159 | | - |
160 | | -Once the above data source is materialized into Feast \(using `feast materialize`\), the feature values will be stored as follows: |
161 | | - |
162 | | - |
163 | | - |
164 | | -### Offline Store |
165 | | - |
166 | | -An offline store is a storage and compute system where historic feature data can be stored or accessed for building training datasets or for sourcing data for materialization into the online store. |
167 | | - |
168 | | -Offline stores are used primarily for two reasons |
169 | | - |
170 | | -1. Building training datasets |
171 | | -2. Querying data sources for feature data in order to load these features into your online store |
172 | | - |
173 | | -Feast does not actively manage your offline store. Instead, you are asked to select an offline store \(like `BigQuery` or the `File` offline store\) and then to introduce batch sources from these stores using [data sources](data-model-and-concepts.md#data-source) inside feature views. |
174 | | - |
175 | | -Feast will use your offline store to query these sources. It is not possible to query all data sources from all offline stores, and only a single offline store can be used at a time. For example, it is not possible to query a BigQuery table from a `File` offline store, nor is it possible for a `BigQuery` offline store to query files in your local file system. |
176 | | - |
177 | | -Please see [feature\_store.yaml](../reference/feature-store-yaml.md#overview) for configuring your offline store. |
178 | | - |
179 | | -### **Provider** |
180 | | - |
181 | | -A provider is an implementation of a feature store using specific feature store components targeting a specific environment**.** More specifically, a provider is the target environment to which you have configured your feature store to deploy and run. |
182 | | - |
183 | | -Providers are built to orchestrate various components \(offline store, online store, infrastructure, compute\) inside an environment. For example, the `gcp` provider may only support `BigQuery` as an offline store and `datastore` as the online store, but it ensures that these components can work together seamlessly. |
184 | | - |
185 | | -Providers also come with default configurations which makes it easier for users to start a feature store in a specific environment. |
186 | | - |
187 | | -Please see [feature\_store.yaml](../reference/feature-store-yaml.md#overview) for configuring a provider. |
188 | | - |