Defining Datasets and Transferring Between Them #34
Replies: 2 comments
I've done some work following the initial dump of ideas and our discussion, writing it up on Overleaf, with the goal of also tying into some of the theoretical predictions made in #24. Apologies in advance as there's a lot of repetition of the above, but hopefully this is a more coherent account with more consistent notation, and the implications follow clearly from it.
The key implications here are:
- If we set …
- Broadly, we would expect transfer success to depend on the similarity of the task (and therefore also of the domain, which is relevant to the task). E.g. learning a supervised CIFAR classifier should aid an unsupervised CIFAR classifier, because the domain is identical even if the task is not.
- The task, in other words, depends on …
- Unless we have all possible members of the feature (and/or label) space(s) in our dataset, we cannot measure the space generally. We do have the aforementioned empirical estimators.
- It follows that e.g. …
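Since we only ever observe a finite sample, the spaces and distributions can only be estimated empirically, as the list above notes. A minimal sketch of what such empirical estimators could look like (all names here are illustrative, not from the write-up):

```python
# Empirical estimators of the label space and the marginal P(X) from a
# finite dataset of (x, y) pairs. Illustrative sketch only.
from collections import Counter

def empirical_label_space(dataset):
    """Estimate the label space Y as the set of labels actually observed."""
    return {y for _, y in dataset}

def empirical_marginal(dataset):
    """Estimate P(X) as relative frequencies of the observed feature values."""
    counts = Counter(x for x, _ in dataset)
    n = len(dataset)
    return {x: c / n for x, c in counts.items()}

dataset = [("a", 0), ("b", 1), ("a", 0), ("c", 1)]
print(empirical_label_space(dataset))    # {0, 1}
print(empirical_marginal(dataset)["a"])  # 0.5
```

The estimators can only ever see the support of the sample, which is exactly why the space itself cannot be measured generally from the dataset.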
A point I'm stuck on:
This also helps generate a typology for the metrics:
all of which is relevant to the potential generalisation of these metrics (assuming they perform well) in the conclusion. It's also relevant to the prediction that PAD might struggle with data dropping alone, since the relative size of the two datasets may influence its result even though this isn't relevant to e.g. domain similarity. Hopefully this helps make the above more coherent, and the extra ideas are clear.
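To make the PAD point concrete, here's a deliberately degenerate sketch (a majority-vote "classifier" stands in for a trained domain classifier; names and numbers are illustrative): even when the two distributions are identical, dropping data from one side alone can shift the PAD estimate.

```python
# Proxy A-Distance (PAD) sketch: train a classifier to tell the two
# datasets apart, then PAD = 2 * (1 - 2 * err), where err is the
# domain-classifier error rate.

def proxy_a_distance(err):
    """PAD from the domain-classifier error rate."""
    return 2.0 * (1.0 - 2.0 * err)

def majority_classifier_error(n_source, n_target):
    """Error of a trivial classifier that always predicts the larger
    dataset's domain label."""
    return min(n_source, n_target) / (n_source + n_target)

# Balanced datasets: majority guessing is at chance, so PAD = 0.
print(proxy_a_distance(majority_classifier_error(1000, 1000)))  # 0.0

# Drop 90% of the target data: the same trivial classifier now looks
# "accurate", and PAD rises purely because of the size imbalance
# (err = 100/1100, so PAD = 18/11, roughly 1.64).
print(proxy_a_distance(majority_classifier_error(1000, 100)))
```

A real PAD implementation would fit an actual classifier on held-out splits, but the size sensitivity shown here is the mechanism behind the data-dropping prediction above.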
Other Definitions of Datasets, Domains, and Tasks
To set up theoretical expectations, it would be good to have a rough definition of datasets to use. Initially, I considered using the definition from the optimal transport dataset distance paper, which is that a dataset $\mathcal{D}$ is a set of feature-label pairs $(x,y) \in \mathcal{X} \times \mathcal{Y}$, where $\mathcal{X}$ is a feature space and $\mathcal{Y}$ is a label space.
However, thinking this definition through made me realise a few things, some more relevant than others:
I then considered the paper A Survey on Transfer Learning (originally linked An introduction to domain adaptation and transfer learning), which instead of defining a dataset defines a domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$, where $\mathcal{X}$ is a feature space and $P(X)$ is a marginal distribution with $X = \{x_1, x_2, \ldots, x_n\} \in \mathcal{X}$. Given a specific domain, a task is denoted by $\mathcal{T} = \{\mathcal{Y}, f(\cdot)\}$, where $\mathcal{Y}$ is a label space and $f(\cdot)$ is a function that is learned from pairs of training data.
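The survey's two definitions can be written down almost verbatim as data structures, which may help when making the framework concrete later (class and field names here are illustrative, and distributions are finite dicts for simplicity):

```python
# A direct encoding of the survey's definitions: a domain is a feature
# space plus a marginal distribution, and a task is a label space plus
# a predictive function f(.) learned from training pairs.
from dataclasses import dataclass
from typing import Any, Callable, Dict, Set

@dataclass
class Domain:
    feature_space: Set[Any]          # the space X
    marginal: Dict[Any, float]       # P(X), here over a finite support

@dataclass
class Task:
    label_space: Set[Any]            # the space Y
    predictor: Callable[[Any], Any]  # f(.), learned from (x, y) pairs

# A toy binary task over a two-element feature space.
domain = Domain(feature_space={"a", "b"}, marginal={"a": 0.5, "b": 0.5})
task = Task(label_space={0, 1}, predictor=lambda x: 0 if x == "a" else 1)
print(task.predictor("b"))  # 1
```

Note that the dataset itself appears in neither structure: it is a sample used to estimate the marginal and to learn $f(\cdot)$, which matches the separation the survey draws.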
What I find helpful about this framework is that, with a small adjustment, it can be made more agnostic to which of the three broad types of learning is being performed.
Defining Transfer Learning
In the Survey on Transfer Learning paper, transfer learning is defined as starting from a source domain $\mathcal{D}_S$ and learning task $\mathcal{T}_S$, and a target domain $\mathcal{D}_T$ and learning task $\mathcal{T}_T$, and using the knowledge from $\mathcal{D}_S$ and $\mathcal{T}_S$ to aid in the learning of the target predictive function $f_T(\cdot)$.
An important condition is that either $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$. If neither of these holds, we are no longer performing transfer learning.
The first condition implies that either $\mathcal{X}_S \neq \mathcal{X}_T$ or $P_S(X) \neq P_T(X)$. So either the feature space must be different, or the marginal distribution of variables drawn from that feature space must be different. If we consider learning $f(\cdot)$ as learning $P(Y|X)$, then the second condition implies that either $\mathcal{Y}_S \neq \mathcal{Y}_T$ or $P(Y_S|X_S) \neq P(Y_T|X_T)$.
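These two conditions are easy to state as a decision procedure, which is a useful sanity check on the definition (the dict-based representation of spaces and distributions is purely illustrative):

```python
# Classify a source/target pair by the survey's two conditions:
# domain shift (X or P(X) differ) and task shift (Y or P(Y|X) differ).
# Spaces and distributions are represented as comparable values here.

def transfer_setting(source, target):
    """source/target are dicts with keys 'X', 'PX', 'Y', 'PYX'."""
    domain_shift = source["X"] != target["X"] or source["PX"] != target["PX"]
    task_shift = source["Y"] != target["Y"] or source["PYX"] != target["PYX"]
    if not (domain_shift or task_shift):
        return "same domain and task: not transfer learning"
    if domain_shift and not task_shift:
        return "domain shift only"
    if task_shift and not domain_shift:
        return "task shift only"
    return "domain and task shift"

# CIFAR vs rotated CIFAR: same feature space, different marginal P(X),
# same label space and conditional.
src = {"X": "images", "PX": "cifar", "Y": {0, 1}, "PYX": "f"}
tgt = {"X": "images", "PX": "rotated-cifar", "Y": {0, 1}, "PYX": "f"}
print(transfer_setting(src, tgt))  # domain shift only
```

The first branch captures the condition above exactly: if neither a domain nor a task shift is present, the setting degenerates to ordinary learning.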
From the perspective of datasets, this has some implications:
Defining Datasets for Our Purposes
So, some considerations:
So, given the above definitions of dataset, domain, and task (albeit with some confusingly overlapping notation between the two papers), I think where I'm at is to go down the following path:
If we're agreed that this all makes sense, then I'll write it out more formally (and with non-overlapping notation for e.g. domain and dataset).
Transfer Learning vs Transfer Attacks
It's also worth considering another point: that the task of a transfer attack differs from transfer learning. In a transfer attack, we learn $f_S: \mathcal{X}_S \rightarrow \mathcal{Y}_S$ for the surrogate dataset and $f_T: \mathcal{X}_T \rightarrow \mathcal{Y}_T$ for the target. We then want to train an attack $A$ on $f_S$ such that the cost when applied to $f_T$ is maximised (assuming we just want inputs to be classified incorrectly; if we want to assign a specific label, it's not easy to imagine what that kind of attack looks like when the label spaces are not the same). Here, though $\mathcal{X}_S$ and $\mathcal{X}_T$ must allow for the same inputs (although both elements of the domain could be different - consider e.g. CIFAR vs rotated CIFAR), we can't assume that $Y_S$ and $Y_T$ belong to the same label space. So the mapping of $X$ to $Y$ becomes particularly relevant, even when the feature space is the same.
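The untargeted case above can be sketched in a few lines. This is a toy illustration under the assumption that the surrogate and target share a feature space: an FGSM-style perturbation is crafted against a linear surrogate and then applied to a similar (but not identical) target model. All models, data, and the step size are made up for illustration.

```python
# Toy untargeted transfer attack: craft a perturbation that raises the
# surrogate's loss (FGSM-style sign step) and apply it to the target.
import numpy as np

w_s = np.array([1.0, -1.0])   # surrogate linear classifier f_S
w_t = np.array([0.9, -1.1])   # target classifier f_T, similar but distinct

x = np.array([0.5, -0.5])     # input correctly classified as +1 by both
y = 1.0

# For a linear score y * (w . x), the loss gradient w.r.t. x is
# proportional to -y * w, so the FGSM attack direction is sign(-y * w_s).
eps = 1.2
x_adv = x + eps * np.sign(-y * w_s)

def predict(w, x):
    """Sign classifier: +1 if w . x >= 0 else -1."""
    return 1.0 if w @ x >= 0 else -1.0

print(predict(w_s, x), predict(w_t, x))          # both correct: 1.0 1.0
print(predict(w_s, x_adv), predict(w_t, x_adv))  # attack transfers: -1.0 -1.0
```

The attack transfers here because the two weight vectors are close; with mismatched label spaces, as noted above, even defining "incorrect" for $f_T$ requires some mapping between $\mathcal{Y}_S$ and $\mathcal{Y}_T$.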