Multi-Task Experiment Notes
Notes
- Test the independent attributes dataset; determine how to set values so that accuracy is not always 100%.
- Add noise to the school dataset to test distance v. relatedness.
- Add error bars to the relatedness plot.
- Add error bars to the nTask plot.
Datasets
Results
Pareto Tradeoffs
Independent Attributes Artificial Dataset
- 2/22/07 For R=0 and R=0.2, results are as expected: error gets worse as R increases. However, for R > 0.2, results improve. This may be because values are range-checked: random values that are too large are capped at the bounds (0, 1). Thus, for large R, every attribute is either separable or not, with nothing in between. Changed the experiment to fix this.
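The capping explanation can be checked with a small simulation. This is a sketch under an assumed generation scheme (base attribute values uniform on [0, 1], noise uniform on [-R, R]); the actual dataset generator may differ, and `fraction_at_bounds` is a hypothetical helper name.

```python
import numpy as np


def fraction_at_bounds(R, n=100_000, seed=0):
    """Fraction of attribute values pinned exactly at 0 or 1 after adding
    uniform noise of half-width R and clipping to [0, 1].

    Assumed (hypothetical) scheme: base ~ U(0, 1), noise ~ U(-R, R).
    """
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.0, 1.0, size=n)
    noisy = base + rng.uniform(-R, R, size=n)
    # The range check: values outside (0, 1) are capped at the bounds.
    clipped = np.clip(noisy, 0.0, 1.0)
    return float(np.mean((clipped == 0.0) | (clipped == 1.0)))


for R in (0.0, 0.2, 1.0, 5.0):
    print(R, round(fraction_at_bounds(R), 3))
```

Under this assumed scheme the expected fraction of capped values is R/2 for R <= 1 and 1 - 1/(2R) for R >= 1, so at R = 5 roughly 90% of values sit on a bound, which matches the "either separable or not" behavior described above.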
Preference Dataset
- 2/24/07 For SVM, the tradeoff surface does not look like a tradeoff surface: some points on it are dominated. It looks fine for DFL, though. For SVM, the cause could be that the optimal value includes the sum of the errors.
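A quick way to confirm the SVM issue is to list which points on the surface are dominated. The sketch below assumes both objectives are minimized; the function names are hypothetical and the point values are made up for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def dominated_indices(points: List[Sequence[float]]) -> List[int]:
    """Indices of points dominated by some other point.
    On a genuine Pareto tradeoff surface this list is empty."""
    return [
        i
        for i, p in enumerate(points)
        if any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical (objective 1, objective 2) pairs, one per weight setting.
surface = [(0.10, 0.90), (0.20, 0.50), (0.30, 0.60), (0.50, 0.20)]
print(dominated_indices(surface))  # [2]: (0.30, 0.60) is dominated by (0.20, 0.50)
```

This pairwise check is quadratic in the number of points, which is fine for the handful of weight settings on one surface.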
Accuracy v. Weights
- Accuracy does not change smoothly with the weights, although there is some trend for more weight on error to increase accuracy.
- For the SVM, accuracy tends to increase as the weight for all different models increases. Results still need to be averaged over folds.
- For DFL, overfitting seems to be present for large weights on the error rate. Results are not very different across the school datasets. DFL does not appear to work well on the preference data, probably because those data are already distances, so DFL ends up computing distances between distances. For the artificial data, accuracy is nearly always 100%.
- Average accuracy for pass/fail on the school data is 47%, with SVM slightly better than DFL. For comparison, a simple 1NN in Weka yields 71% accuracy and SMO yields 72%.
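The 1NN baseline and the pending averaging over folds can be sketched together. This is a minimal stand-in, not Weka's implementation: it uses unnormalized Euclidean distance, and the function names and toy data are hypothetical. The standard deviation over folds is one option for the error bars listed under Notes.

```python
import math
import statistics


def one_nn_accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test points whose nearest training point
    (unnormalized Euclidean distance) carries the same label."""
    correct = 0
    for x, y in zip(test_X, test_y):
        nearest = min(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))
        correct += int(train_y[nearest] == y)
    return correct / len(test_y)


def mean_and_std_over_folds(fold_accuracies):
    """Average per-fold accuracies; the sample standard deviation can
    serve as an error bar (0.0 if there is only one fold)."""
    mean = statistics.mean(fold_accuracies)
    std = statistics.stdev(fold_accuracies) if len(fold_accuracies) > 1 else 0.0
    return mean, std


# Toy usage with made-up points: two well-separated clusters.
train_X = [(0, 0), (0, 1), (5, 5), (5, 6)]
train_y = [0, 0, 1, 1]
print(one_nn_accuracy(train_X, train_y, [(0.2, 0.1), (5.1, 5.4)], [0, 1]))
```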
Analysis of Accuracy v. Weight Curves
3/17/07
Assume 25 tasks and vary the number of examples.
| Num. examples | 50 | 100 | 150 | 250 | 300 | 350 | 400 |
|---|---|---|---|---|---|---|---|
| Avg. accuracy | 93% | 90% | | 95% | | | |
| Shape | smooth | flat | | | | | |
| asWeightTo1 | + | 0 | | | | | |
| Variance | 5% | 5% | | | | | |
Implementation Notes
School Dataset
- 2/22/07 Binarized the class at 50% of the exam score. This made many examples duplicates, which were removed, reducing the number of examples by 66% to approximately 5000. The experiment uses this reduced set, shuffled before applying stratification and cross-validation.
- 2/24/07 Building a kernel matrix for this dataset will be hard because each task has a different number of examples. The current solution is therefore to skip the kernel matrix entirely and work only in the original feature space.
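The 2/22/07 preprocessing (binarize, drop duplicates, shuffle, stratify) can be sketched as follows. The cutoff is left as a parameter because it is not recorded here whether "50% of exam score" refers to half the maximum score or to some other reference; the function names, the `(features, score)` row layout, and the round-robin fold assignment are all hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def binarize_and_dedupe(rows, threshold):
    """rows: iterable of (features, exam_score). Label is 1 if score >= threshold.
    Keeps the first occurrence of each (features, label) pair."""
    seen = set()
    out = []
    for x, score in rows:
        item = (tuple(x), int(score >= threshold))
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def stratified_folds(examples, k, seed=0):
    """Shuffle, group by label, then deal each class round-robin into k folds
    so every fold gets roughly the same class proportions."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    by_class = defaultdict(list)
    for ex in shuffled:
        by_class[ex[1]].append(ex)
    folds = [[] for _ in range(k)]
    i = 0
    for label in sorted(by_class):
        for ex in by_class[label]:
            folds[i % k].append(ex)
            i += 1
    return folds


# Made-up rows: two exact duplicates after binarization at a cutoff of 50.
rows = [((1, 0), 60), ((1, 0), 70), ((0, 1), 30), ((0, 1), 20), ((1, 1), 55), ((0, 0), 45)]
examples = binarize_and_dedupe(rows, threshold=50)
folds = stratified_folds(examples, k=2)
```

Continuing the round-robin counter across classes keeps fold sizes balanced even when one class is much larger than the other.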