Generates synthetic positive instances using nearmiss algorithm.
Arguments
- df
data.frame or tibble. Must have 1 factor variable and remaining numeric variables.
- var
Character, name of variable containing factor variable.
- k
An integer. Number of nearest neighbor that are used to generate the new examples of the minority class.
- under_ratio
A numeric value for the ratio of the majority-to-minority frequencies. The default value (1) means that all other levels are sampled down to have the same frequency as the least occurring level. A value of 2 would mean that the majority levels will have (at most) (approximately) twice as many rows than the minority level.
References
Inderjeet Mani and I Zhang. knn approach to unbalanced data distributions: a case study involving information extraction. In Proceedings of workshop on learning from imbalanced datasets, 2003.
Examples
circle_numeric <- circle_example[, c("x", "y", "class")]
res <- nearmiss(circle_numeric, var = "class")
res <- nearmiss(circle_numeric, var = "class", k = 10)
res <- nearmiss(circle_numeric, var = "class", under_ratio = 1.5)