(sitting at SKA)
It is my intent to complete the classification test suite today, using the benchmark Iris dataset.
The Iris dataset offers 4 features (columns in a .csv) for each of 3 unique plant species, as labelled in the right-most (5th) column. As 2 of these species cannot be separated from each other by a linear function, there are 2 ways to approach this problem:
- Compare only 2 species at a time (in a round-robin) such that we work with A/B, A/C, and finally B/C; or
- Seek a non-linear function with a kernel which supports more than 2 classes at a time
As Karoo GP supports only 2 classes at this time, I am going to work with the first option, and come back to the second when I have time to build a new kernel.
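The round-robin split can be sketched in a few lines of Python. This is only an illustration, not part of Karoo GP: the function name `pairwise_subsets`, the relabelling of each pair to 0 and 1, and the handful of sample rows are all my own assumptions.

```python
from itertools import combinations


def pairwise_subsets(rows):
    """Split labelled rows into one two-class subset per pair of labels.

    Hypothetical helper: each row is a list whose last item is the label.
    """
    labels = sorted({row[-1] for row in rows})
    subsets = {}
    for a, b in combinations(labels, 2):
        # Keep only the rows of the two species being compared and
        # relabel them 0 (first species) and 1 (second species).
        subsets[(a, b)] = [
            row[:-1] + [0 if row[-1] == a else 1]
            for row in rows
            if row[-1] in (a, b)
        ]
    return subsets


# A few sample rows in the Iris layout (4 features, then the species).
rows = [
    [5.1, 3.5, 1.4, 0.2, "setosa"],
    [4.9, 3.0, 1.4, 0.2, "setosa"],
    [7.0, 3.2, 4.7, 1.4, "versicolor"],
    [6.4, 3.2, 4.5, 1.5, "versicolor"],
    [6.3, 3.3, 6.0, 2.5, "virginica"],
    [5.8, 2.7, 5.1, 1.9, "virginica"],
]

subsets = pairwise_subsets(rows)
for pair, subset in subsets.items():
    print(pair, len(subset))
```

With 3 species this yields the 3 comparisons A/B, A/C and B/C, each of which can be written out as its own two-class .csv.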
If it is our intent to plot the results, we can engage only 3 dimensions of data at once (without forcing higher dimensions into lower orders). However, the Iris dataset is composed of 4 features (a, b, c and d). A separation found on a subset of the features still holds when the remaining features are added back, since the function can simply ignore them, so it is safe to test a lower order of the featureset as long as the features we select are decisive in the identification of the flower species. As such, we remove column d from all 3 comparisons.
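Removing column d can be sketched as follows, using only the Python standard library. The helper name `drop_column` and the header names (a, b, c, d, species) are assumptions for illustration; the actual .csv may be labelled differently.

```python
import csv
import io


def drop_column(table, name):
    """Return a copy of table (header row first) without the named column."""
    idx = table[0].index(name)
    return [row[:idx] + row[idx + 1:] for row in table]


# Sample rows in the Iris layout; the header names are assumed.
raw = """a,b,c,d,species
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

table = list(csv.reader(io.StringIO(raw)))
reduced = drop_column(table, "d")

for row in reduced:
    print(",".join(row))
```

The same call would be applied to each of the 3 pairwise files, leaving features a, b and c plus the label, which is what can be plotted in 3 dimensions.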
This resulted in what appears to be a fully functional classification by Karoo GP, with both linear and non-linear functions that produce 100% (50/50) correct classifications.