1519 | 1519 | "So we are also looking for MAXIMAL values of GT when possible\n", |
1520 | 1520 | "\n", |
1521 | 1521 | "** 1. Capital loss:**\n", |
1522 | | - "This feature contains 3 subfeatures (capital gain amounts of 10-20K, 20-30K and 90-100K) with majortiy '>50K' class and clearly discriminated from the rest of the subfeatures. Among all features, capital gain fulfills all of our simplified feature importance criteria.\n", |
| 1522 | + "This feature contains 3 subfeatures (capital gain amounts of 10-20K, 20-30K and 90-100K) with majority '>50K' class and clearly discriminated from the rest of the subfeatures. Among all features, capital gain fulfills all of our simplified feature importance criteria.\n", |
1523 | 1523 | "\n", |
1524 | 1524 | "** 2. Capital gain:**\n", |
1525 | | - "This feature contains 2 subfeatures (capital loss amounts of 1.5-2K, 2.5-3K) with majortiy '>50K' class and 1 subfeature (2-2.5K) with R > 50%. These attribute groups are clearly discriminated from the rest of the subfeatures. Capital loss fulfills all of our simplified feature importance criteria.\n", |
| 1525 | + "This feature contains 2 subfeatures (capital loss amounts of 1.5-2K, 2.5-3K) with majority '>50K' class and 1 subfeature (2-2.5K) with R > 50%. These attribute groups are clearly discriminated from the rest of the subfeatures. Capital loss fulfills all of our simplified feature importance criteria.\n", |
1526 | 1526 | "\n", |
1527 | 1527 | "** 3. Education Level:**\n", |
1528 | | - "It contains subfeatures \"Masters\", \"Doctorate\" and \"Prof. School\" with majortiy '>50K' class, and \"Bachelors\", with a high percentage of \">50K\" earners in that group R > 0.5. Althought these groups discriminate between the two income classes, the '>50K' (green) disbribution spreads a bit more among the rest of the groups in comparison to capital gain and loss. \n", |
| 1528 | + "It contains subfeatures \"Masters\", \"Doctorate\" and \"Prof. School\" with majority '>50K' class, and \"Bachelors\", with a high percentage of \">50K\" earners in that group R > 0.5. Although these groups discriminate between the two income classes, the '>50K' (green) distribution spreads a bit more among the rest of the groups in comparison to capital gain and loss. \n", |
1529 | 1529 | "\n", |
1530 | 1530 | "** 4. Marital Status:**\n", |
1531 | 1531 | "Contains the subfeature \"Married-civ-Spouse\" that is clearly discriminated from the rest of attribute groups, and which contains a significant proportion of '>50K' earners, R >0.5.\n", |
1532 | 1532 | "\n", |
1533 | 1533 | "** 5. Age:**\n", |
1534 | 1534 | "The class distribution appears to be the envelope of a normal distribution, which is good for expectation values in relation to '>50K' class. Near the center we have subfeatures corresponding to age groups 40-50, 50-60 with a significant proportion of '>50K', R >0.5.\n", |
1535 | 1535 | "\n", |
1536 | | - "The rest of the features do not fulfill our \"rule of thumb\" criteria except for \"Occupation\", which contains a several \">50K\" groups wiht R>0.5, but no '>50K' majority groups. In addition, 'occupation' seems to be correlated or dependent on the education feature (possibly redundant)." |
| 1536 | + "The rest of the features do not fulfill our \"rule of thumb\" criteria except for \"Occupation\", which contains several \">50K\" groups with R>0.5, but no '>50K' majority groups. In addition, 'occupation' seems to be correlated or dependent on the education feature (possibly redundant)." |
1537 | 1537 | ] |
1538 | 1538 | }, |
1539 | 1539 | { |