Recent work has proposed more sophisticated hand-crafted primitives:

- [Cutout](https://arxiv.org/abs/1708.04552) randomly masks patches of the input image during training.
- [Mixup](https://arxiv.org/pdf/1710.09412.pdf) augments a training dataset with convex combinations of training examples. There is substantial empirical [evidence](https://papers.nips.cc/paper/2019/file/36ad8b5f42db492827016448975cc22d-Paper.pdf) that Mixup can improve generalization and adversarial robustness. A recent [theoretical analysis](https://arxiv.org/abs/2010.04819) helps explain these gains, showing that the Mixup loss can be approximated by the standard ERM loss plus additional regularization terms.
- [CutMix](https://arxiv.org/abs/1905.04899.pdf) combines the two approaches above: instead of summing two input images (like Mixup), CutMix pastes a random patch from one image onto the other and sets the label to a weighted sum of the two image labels, proportional to the area of each patch.
- [MixMatch](https://arxiv.org/pdf/1905.02249.pdf) and [ReMixMatch](https://arxiv.org/abs/1911.09785.pdf) extend the utility of these techniques to semi-supervised settings.
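As a concrete illustration, Mixup fits in a few lines. Below is a minimal NumPy sketch; the function name and batch layout are illustrative assumptions, not from the papers above:

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Replace a batch with convex combinations of pairs of examples (Mixup).

    x: (batch, ...) array of inputs; y: (batch, num_classes) one-hot labels.
    alpha parameterizes the Beta distribution the mixing weight is drawn from.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)            # mixing weight lambda ~ Beta(alpha, alpha)
    perm = rng.permutation(len(x))          # random partner for each example
    x_mix = lam * x + (1 - lam) * x[perm]   # convex combination of inputs
    y_mix = lam * y + (1 - lam) * y[perm]   # identical combination of labels
    return x_mix, y_mix
```

Because the same weight mixes both inputs and labels, the mixed labels remain valid probability distributions over classes.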
While these primitives have yielded compelling performance gains, they can often produce unnatural images and distort image semantics. However, data augmentation techniques such as [AugMix](https://arxiv.org/abs/1912.02781) can mix together various unnatural augmentations to produce images that appear more natural.
Several open questions remain in data augmentation and synthetic data generation.

_This area is a stub, you can help by improving it._
## Data Valuation
Quantifying the contribution of each training datapoint to an end model is useful in a number of settings:

1. in __active learning__, knowing the value of our training examples can help guide us in collecting more data;
2. when __compensating__ individuals for the data they contribute to a training dataset (_e.g._ search engine users contributing their browsing data or patients contributing their medical data);
3. for __explaining__ a model's predictions and __debugging__ its behavior.
However, data valuation can be quite tricky.
The first challenge lies in selecting a suitable criterion for quantifying a datapoint's value. Most criteria aim to measure the gain in model performance attributable to including the datapoint in the training dataset. A common [approach](https://conservancy.umn.edu/handle/11299/37076), dubbed "leave-one-out", simply computes the difference in performance between a model trained on the full dataset and one trained on the full dataset minus one example. Recently, [Ghorbani _et al._](https://proceedings.mlr.press/v97/ghorbani19c/ghorbani19c.pdf) proposed a data valuation scheme based on the [Shapley value](https://en.wikipedia.org/wiki/Shapley_value), a classic solution in game theory for distributing rewards in cooperative games. Empirically, Data Shapley valuations are more effective in downstream applications (_e.g._ active learning) than "leave-one-out" valuations. Moreover, they have several intuitive properties not shared by other criteria.
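A minimal sketch of "leave-one-out" valuation, assuming a scikit-learn-style classifier and a held-out validation set (the helper name and signature are illustrative, not from the cited works):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def leave_one_out_values(X_train, y_train, X_val, y_val):
    """Value each point as the drop in validation accuracy when it is removed.

    value(i) = score(model on all data) - score(model on all data minus i)
    """
    base = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    base_score = base.score(X_val, y_val)
    values = np.empty(len(X_train))
    for i in range(len(X_train)):
        keep = np.arange(len(X_train)) != i   # drop point i, retrain from scratch
        model = LogisticRegression(max_iter=1000).fit(X_train[keep], y_train[keep])
        values[i] = base_score - model.score(X_val, y_val)
    return values
```

Note the cost: one full retraining per datapoint, which is exactly the expense the approximation methods discussed below aim to avoid.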
Computing exact valuations according to either of these criteria requires retraining the model from scratch many times, which can be prohibitively expensive for large models. Thus, a second challenge lies in finding a good approximation for these measures. [Influence functions](https://arxiv.org/pdf/1703.04730.pdf) provide an efficient estimate of the "leave-one-out" measure that requires access only to the model's gradients and Hessian-vector products. Shapley values can be estimated with Monte Carlo samples or, for models trained via stochastic gradient descent, a simple gradient-based [approach](https://proceedings.mlr.press/v97/ghorbani19c/ghorbani19c.pdf).
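The Monte Carlo estimator can be sketched as follows. Here `utility` is an abstract callable supplied by the caller (_e.g._ validation accuracy of a model retrained on the given subset); the function name and truncation-free loop are illustrative simplifications of the scheme in Ghorbani _et al._:

```python
import numpy as np

def monte_carlo_shapley(n, utility, num_perms=50, rng=None):
    """Estimate Data Shapley values by averaging each point's marginal
    contribution over random permutations of the training set.

    utility(indices) -> performance of a model trained on those points.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    values = np.zeros(n)
    for _ in range(num_perms):
        perm = rng.permutation(n)
        prev = utility([])                    # utility of the empty set
        for k, i in enumerate(perm):
            cur = utility(perm[: k + 1])      # utility after adding point i
            values[i] += cur - prev           # marginal contribution of i
            prev = cur
    return values / num_perms
```

In practice each call to `utility` means retraining a model, so implementations truncate permutations early once marginal contributions become negligible.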
Automated methods for slice discovery include:
- [SliceFinder](https://research.google/pubs/pub47966/) is an interactive framework for finding interpretable slices of data.
- [SliceLine](https://mboehm7.github.io/resources/sigmod2021b_sliceline.pdf) uses a fast slice-enumeration method to make the process of slice discovery efficient and parallelizable.
- [GEORGE](https://arxiv.org/pdf/2011.12945.pdf) uses standard approaches to cluster representations of a deep model in order to discover underperforming subgroups of data.
- [Multiaccuracy Audit](https://arxiv.org/abs/1805.12317) is a model-agnostic approach that searches for slices on which the model performs poorly by training a simple "auditor" model to predict the full model's residual from input features. This idea of fitting a simple model to predict the predictions of the full model is also used in the context of [explainable ML](https://arxiv.org/pdf/1910.07969.pdf).
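The auditor idea behind Multiaccuracy-style slice discovery can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function name, the choice of a shallow decision tree as auditor, and the worst-decile threshold are all assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def flag_candidate_slices(features, residuals, max_depth=3, quantile=0.9):
    """Fit a simple 'auditor' to predict the full model's residuals from
    input features; points whose predicted residual is unusually large are
    flagged as belonging to a candidate underperforming slice."""
    auditor = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    auditor.fit(features, residuals)
    predicted = auditor.predict(features)
    threshold = np.quantile(predicted, quantile)   # worst 10% by default
    return predicted >= threshold
```

Because the auditor is a shallow tree, the flagged region can be read off its decision path, which is what makes the discovered slice interpretable.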
Future directions for slice discovery will continue to improve our understanding of how to find slices that are interpretable, task-relevant, error-prone, and susceptible to distribution shift.