Commit fa5e287

Update README.md
I'll do the restructure to incorporate ecosystems in a day or two. But wanted to add this now.
1 parent: b5fdbb6 · commit: fa5e287

File tree

1 file changed (+1, -1)


README.md

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ The choice of transformations used in augmentation is an important consideration
[Self-Supervision Area Page](self-supervision.md)

-The need for large, labeled datasets has motivated methods to pre-train latent representations of the input space using unlabeled data and to use the resulting knowledge-rich representations in downstream tasks. Because the representations allow for knowledge transfer to downstream tasks, these tasks require less labeled data. For example, language models can be pre-trained to predict the next token in a textual input to learn representations of words or sub-tokens. These word representations are then used in downstream models such as sentiment classifiers. This paradigm, called "self-supervision", has revolutionized how we train (and pre-train) models. Importantly, these self-supervised pre-trained models learn without manual labels or hand-curated features. This reduces the engineering effort to create and maintain features and makes models significantly easier to deploy and maintain. This shift has allowed more data to be fed to the model and has moved the focus to understanding what data to use.
+The need for large, labeled datasets has motivated methods to pre-train latent representations of the input space using unlabeled data and to use the resulting knowledge-rich representations in downstream tasks. Because the representations allow for knowledge transfer to downstream tasks, these tasks require less labeled data. This paradigm, called "self-supervision", has revolutionized how we train (and pre-train) models. These models, recently termed "foundation models" by the [Stanford initiative](https://arxiv.org/abs/2108.07258) around understanding self-supervised ecosystems, have shifted the focus away from hand-labeled data and toward understanding what data to feed to these models.

As self-supervised data is often curated from large, public data sources (e.g., Wikipedia), it can contain popularity bias, where the long tail of rare things is not well represented in the training data. As [Orr et al.](https://arxiv.org/pdf/2010.10363.pdf) show, some popular models (e.g., BERT) rely on context memorization and struggle to resolve this long tail, as they are incapable of seeing a rare thing enough times to memorize the diverse set of patterns associated with it. The long tail problem even propagates to downstream tasks, such as the retrieval tasks from [AmbER](https://arxiv.org/pdf/2106.06830.pdf). One exciting future direction for addressing the long tail, which lies at the intersection of AI and years of research from the data management community, is the integration of structured knowledge into the model. Structured knowledge is the core idea behind the tail success of [Bootleg](https://arxiv.org/pdf/2010.10363.pdf), a system for Named Entity Disambiguation.
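The removed paragraph above describes the self-supervision recipe: pre-train a language model on unlabeled text, then reuse its representations in a downstream task such as sentiment classification. As a rough sketch of that recipe (not part of this commit), the snippet below extracts frozen representations from a pre-trained encoder via the Hugging Face `transformers` library and fits a small classifier on them; the model name, the toy texts and labels, and the use of `scikit-learn` are illustrative assumptions.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

# Load an encoder pre-trained with a self-supervised objective
# (BERT's masked-token prediction is one variant of the recipe described above).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

# Toy labeled examples for the downstream task (placeholders, not real data).
texts = ["a delightful, moving film", "a tedious and joyless mess"]
labels = [1, 0]  # 1 = positive sentiment, 0 = negative sentiment

# Extract frozen sentence representations; no manual labels were needed to learn these.
with torch.no_grad():
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    features = encoder(**batch).last_hidden_state[:, 0, :].numpy()  # [CLS] vectors

# A lightweight downstream model trained on the pre-trained representations;
# because the representations transfer knowledge, little labeled data is required.
classifier = LogisticRegression().fit(features, labels)
print(classifier.predict(features))
```

In practice the encoder is usually fine-tuned end-to-end on the downstream task; the frozen-feature variant here just keeps the sketch short.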