step_adasyn creates a specification of a recipe step that generates synthetic positive instances using the ADASYN algorithm.
step_adasyn(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  column = NULL,
  over_ratio = 1,
  neighbors = 5,
  skip = TRUE,
  seed = sample.int(10^5, 1),
  id = rand_id("adasyn")
)

# S3 method for step_adasyn
tidy(x, ...)
recipe | A recipe object. The step will be added to the sequence of operations for this recipe. |
---|---|
... | One or more selector functions to choose which variable is used to sample the data. See |
role | Not used by this step since no new variables are created. |
trained | A logical to indicate if the quantities for preprocessing have been estimated. |
column | A character string of the variable name that will be populated (eventually) by the |
over_ratio | A numeric value for the ratio of the minority-to-majority frequencies. The default value (1) means that all other levels are sampled up to have the same frequency as the most frequent level. A value of 0.5 means that the minority levels will have at most (approximately) half as many rows as the majority level. |
neighbors | An integer. The number of nearest neighbors used to generate the new examples of the minority class. |
skip | A logical. Should the step be skipped when the recipe is baked by |
seed | An integer that will be used as the seed when applied. |
id | A character string that is unique to this step to identify it. |
x | A |
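The effect of over_ratio described above can be illustrated with a little arithmetic. The following is a rough base R sketch, not the package's internal code: the helper name target_counts() is hypothetical, and rounding down with floor() is an assumption. Because ADASYN only approximately reaches the target, the real step may return slightly different counts.

```r
# Hypothetical helper: approximate row count per level after upsampling.
# Illustrates the `over_ratio` description only; not the package internals.
target_counts <- function(counts, over_ratio = 1) {
  majority <- max(counts)
  # assumed rounding rule: round the target down to a whole number of rows
  target <- floor(majority * over_ratio)
  # levels already at or above the target are left unchanged
  pmax(counts, target)
}

# class frequencies taken from the example data in this page
counts <- c(stem = 9539, other = 50316)

target_counts(counts, over_ratio = 1)    # stem = 50316, other = 50316
target_counts(counts, over_ratio = 0.5)  # stem = 25158, other = 50316
```

With the default over_ratio = 1 the minority level is brought up to the majority count, which matches the class table shown in the examples for this step.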
An updated version of recipe with the new step added to the sequence of existing steps (if any). For the tidy method, a tibble with a column terms, which is the variable used to sample.
All columns in the data are sampled and returned by juice() and bake().
All columns used in this step must be numeric with no missing data.
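The numeric, complete-data requirement follows from how ADASYN works: it measures distances between rows and creates new rows by interpolating between a minority point and one of its neighbors. The sketch below illustrates that idea in base R under simplifying assumptions. It is not the implementation used by this step: adasyn_sketch() and its arguments are hypothetical names, seeds are drawn with probability proportional to the share of majority neighbors (the original paper allocates a fixed count per point instead), and the interpolation partner is picked among the nearest minority rows.

```r
# Illustrative ADASYN-style sketch in base R (hypothetical helper, not the
# code used by step_adasyn()).
adasyn_sketch <- function(x, y, minority, k = 5, n_new = 10) {
  x <- as.matrix(x)
  # distances and interpolation need complete numeric data
  stopifnot(is.numeric(x), !anyNA(x))

  min_idx <- which(y == minority)
  stopifnot(length(min_idx) >= 2, k < nrow(x))

  d <- as.matrix(dist(x))  # pairwise Euclidean distances
  diag(d) <- Inf           # a point is not its own neighbor

  # for each minority row: share of majority rows among its k nearest neighbors
  r <- vapply(min_idx, function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    mean(y[nn] != minority)
  }, numeric(1))

  # normalise into sampling weights; fall back to uniform if all are zero
  w <- if (sum(r) > 0) r / sum(r) else rep(1 / length(r), length(r))

  # harder points (more majority neighbors) are chosen as seeds more often
  seeds <- min_idx[sample.int(length(min_idx), n_new, replace = TRUE, prob = w)]

  synth <- t(vapply(seeds, function(i) {
    others <- setdiff(min_idx, i)
    nn <- others[order(d[i, others])][seq_len(min(k, length(others)))]
    j <- nn[sample.int(length(nn), 1)]   # random nearby minority row
    lambda <- runif(1)
    x[i, ] + lambda * (x[j, ] - x[i, ])  # point on the segment from i to j
  }, numeric(ncol(x))))
  colnames(synth) <- colnames(x)
  synth
}

# usage on simulated data: 20 majority rows and 5 minority rows
set.seed(123)
x <- cbind(a = c(rnorm(20), rnorm(5, 3)), b = c(rnorm(20), rnorm(5, 3)))
y <- rep(c("maj", "min"), c(20, 5))
new_rows <- adasyn_sketch(x, y, minority = "min", k = 3, n_new = 8)
dim(new_rows)
```

A factor or character predictor would fail the is.numeric() check, and a missing value would make the distances undefined, which is why such columns need to be converted or imputed before this kind of step.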
When used in modeling, users should strongly consider using the option skip = TRUE so that the extra sampling is not conducted outside of the training set.
He, H., Bai, Y., Garcia, E. and Li, S. 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. Proceedings of IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference. pp.1322-1328.
#>
#>  <NA>  stem other
#>     0  9539 50316

ds_rec <- recipe(Class ~ age + height, data = okc) %>%
  step_meanimpute(all_predictors()) %>%
  step_adasyn(Class) %>%
  prep()

sort(table(bake(ds_rec, new_data = NULL)$Class, useNA = "always"))
#>
#>  <NA>  stem other
#>     0 50316 50316

# since `skip` defaults to TRUE, baking the step has no effect
baked_okc <- bake(ds_rec, new_data = okc)
table(baked_okc$Class, useNA = "always")
#>
#>  stem other  <NA>
#>  9539 50316     0

library(ggplot2)

ggplot(circle_example, aes(x, y, color = class)) +
  geom_point() +
  labs(title = "Without ADASYN")

recipe(class ~ ., data = circle_example) %>%
  step_adasyn(class) %>%
  prep() %>%
  bake(new_data = NULL) %>%
  ggplot(aes(x, y, color = class)) +
  geom_point() +
  labs(title = "With ADASYN")