Racket Machine Learning — K-Nearest Neighbors
This package provides an implementation of the k-Nearest Neighbors (k-NN) algorithm for classification. It provides a straightforward classifier function that takes a data set and an individual and returns the set of predicted classifier values for that individual.
The classifier function provided by this module can be used by the higher-order classification functions classify, cross-classify, and partitioned-test-classify provided by the package rml-core.
For more information on the k-NN algorithm, see Wikipedia and Scholarpedia.
1 Module rml-knn/classifier
(require rml-knn/classifier)    package: rml-knn
This module contains the procedures that implement the k-NN classifier itself. The classifier function returned from make-knn-classifier will in turn provide a list of classifier values predicted for an individual. Alternatively, the nearest-k function will provide the set of closest neighbors that the classifier uses.
> (require rml/data rml/individual rml-knn/classifier)
> (define iris-data
    (load-data-set
     (path->string (collection-file-path "test/iris_training_data.csv" "rml"))
     'csv
     (list (make-feature "sepal-length" #:index 0)
           (make-feature "sepal-width" #:index 1)
           (make-feature "petal-length" #:index 2)
           (make-feature "petal-width" #:index 3)
           (make-classifier "classification" #:index 4))))
> (define an-iris
    (make-individual #:data-set iris-data
                     "sepal-length" 6.3
                     "sepal-width" 2.5
                     "petal-length" 4.9
                     "petal-width" 1.5
                     "classification" "Iris-versicolor"))
> (define classify (make-knn-classifier 5))
> (classify iris-data default-partition an-iris)
'("Iris-virginica")
The code block above demonstrates the classifier by constructing an individual and classifying it against the loaded data-set. Note that in this example the classifier returned Iris-virginica, whereas the individual was labeled as Iris-versicolor.
constructor
(make-knn-classifier k)
  k : exact-positive-integer?
procedure
(nearest-k dataset partition individual k) → list?
  dataset : data-set?
  partition : exact-nonnegative-integer?
  individual : individual?
  k : exact-positive-integer?
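To make the relationship between the nearest neighbors and the predicted value concrete, here is a minimal, self-contained sketch of the general k-NN idea in plain Racket. It is not this package's implementation and does not use the rml data structures. The sample struct and the functions euclidean-distance, k-nearest, and knn-predict are hypothetical names introduced only for this illustration, and the choice of Euclidean distance with a simple majority vote is an assumption.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical labeled sample: a list of numeric features plus a label.
(struct sample (features label))

;; Euclidean distance between two equal-length lists of numbers.
(define (euclidean-distance xs ys)
  (sqrt (for/sum ([x (in-list xs)] [y (in-list ys)])
          (let ([d (- x y)]) (* d d)))))

;; Return the k samples closest to query, nearest first.
;; If fewer than k samples exist, all of them are returned.
(define (k-nearest samples query k)
  (define sorted
    (sort samples <
          #:key (lambda (s) (euclidean-distance (sample-features s) query))))
  (take sorted (min k (length sorted))))

;; Predict a label by majority vote among the k nearest samples.
;; Ties go to the tied label that appears nearest to the query.
(define (knn-predict samples query k)
  (define labels (map sample-label (k-nearest samples query k)))
  (define counts
    (for/fold ([acc (hash)]) ([l (in-list labels)])
      (hash-update acc l add1 0)))
  (argmax (lambda (l) (hash-ref counts l)) labels))

;; Made-up training data for illustration only.
(define training
  (list (sample '(1.0 1.0) "a")
        (sample '(1.2 0.9) "a")
        (sample '(5.0 5.1) "b")
        (sample '(4.8 5.3) "b")
        (sample '(5.2 4.9) "b")))

;; Two of the three nearest samples are labeled "a", so the vote yields "a".
(knn-predict training '(1.1 1.0) 3)
```

In this sketch k-nearest plays a role loosely analogous to nearest-k, and knn-predict to the classifier function, but the actual procedures operate on data-set and individual values and may differ in distance measure and tie handling.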
1.1 Data Transformations
From Scholarpedia:
… is a transformation which exploits uncertainty in feature values in order to increase classification performance. Fuzzification replaces the original features by mapping original values of an input feature into 3 fuzzy sets representing linguistic membership functions in order to facilitate the semantic interpretation of each fuzzy set.
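As a rough illustration of mapping one feature value into three fuzzy sets, the following plain-Racket sketch computes membership degrees for the linguistic labels low, medium, and high. The quoted passage does not specify the shape of the membership functions, so the linear (triangular) shapes used here are an assumption, and fuzzify is a hypothetical helper that is not part of this package.

```racket
#lang racket/base

;; Fuzzify x over the feature range [lo, hi] into three membership degrees.
;; Assumptions: lo < hi, and linear membership functions where
;;   low    is 1 at lo and falls to 0 at the midpoint,
;;   medium is 1 at the midpoint and falls to 0 at lo and hi,
;;   high   is 0 at the midpoint and rises to 1 at hi.
(define (fuzzify x lo hi)
  (define mid (/ (+ lo hi) 2))
  (define half (- mid lo))
  ;; Clamp x into [lo, hi] so out-of-range values saturate.
  (define cx (max lo (min hi x)))
  (define low (max 0 (/ (- mid cx) half)))
  (define high (max 0 (/ (- cx mid) half)))
  (define medium (- 1 (abs (/ (- cx mid) half))))
  (list (cons 'low low) (cons 'medium medium) (cons 'high high)))

;; Example with a made-up range of 4.0 to 8.0 for a sepal length of 6.3.
(fuzzify 6.3 4.0 8.0)
```

With these shapes the three degrees always sum to 1, so a value slightly above the midpoint is mostly medium and partly high. Other membership shapes, such as Gaussian curves, would serve equally well for the transformation described in the quotation.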