itemCoding {arules} | R Documentation |
The order in which items are stored in an itemMatrix
is called the item coding. The following generic functions and S4 methods are used to translate between the binary representation in the itemMatrix format (used in transactions, rules and itemsets), item labels and numeric item IDs (i.e., the column numbers in the binary representation).
encode(x, ...) ## S4 method for signature 'list' encode(x, itemLabels, itemMatrix = TRUE) ## S4 method for signature 'character' encode(x, itemLabels, itemMatrix = TRUE) ## S4 method for signature 'numeric' encode(x, itemLabels, itemMatrix = TRUE) compatible(x, y) recode(x, ...) ## S4 method for signature 'itemMatrix' recode(x, itemLabels = NULL, match = NULL) ## S4 method for signature 'itemsets' recode(x, itemLabels = NULL, match = NULL) ## S4 method for signature 'rules' recode(x, itemLabels = NULL, match = NULL) decode(x, ...) ## S4 method for signature 'list' decode(x, itemLabels) ## S4 method for signature 'numeric' decode(x, itemLabels)
x |
a vector or a list of vectors of character strings
(for |
itemLabels |
a vector of character strings used for coding where
the position of an item label in the vector gives the item's column ID.
Alternatively, a |
itemMatrix |
return an object of class |
y |
an object of class |
match |
deprecated: used |
... |
further arguments. |
Item compatibility:
If you deal with several datasets or different subsets of the same dataset and want to combine or compate the found itemsets or rules, then you need to make sure that all transaction sets have a compatible item coding. That is, the sparse matrices representing the items have columns for the same items in exactly the same order. The coercion to transactions with as(x, "transactions")
will create the item coding by adding items when they are encountered in the dataset. This can lead to different item codings (different order, missing items) for even only slightly different datasets. You can use the method compatible
to check if two sets have the same item coding.
If you work with many sets, then you should first define a common item coding by creating a vector with all possible item labels and then use either encode
to create transactions or recode
to make a different set compatible.
The following function help with creating and changing the item coding to make them compatible.
encode
converts from readable item labels to an itemMatrix using a
given coding. With this method it is possible to create several compatible
itemMatrix
objects (i.e., use the same binary representation for
items) from data.
decode
converts from the column IDs used in the itemMatrix representation to
item labels. decode
is used by LIST
.
recode
recodes an itemMatrix object so its coding is compatible
with another itemMatrix object specified in itemLabels
(i.e., the columns are reordered to match).
recode
always returns an object
of class itemMatrix
.
For encode
with itemMatrix = TRUE
an object
of class itemMatrix
is returned.
Otherwise the result is of the same type as x
, e.g., a
list or a vector.
Michael Hahsler
LIST
,
associations-class
,
itemMatrix-class
data("Adult") ## Example 1: Manual decoding ## Extract the item coding as a vector of item labels. iLabels <- itemLabels(Adult) head(iLabels) ## get undecoded list (itemIDs) list <- LIST(Adult[1:5], decode = FALSE) list ## decode itemIDs by replacing them with the appropriate item label decode(list, itemLabels = iLabels) ## Example 2: Manually create an itemMatrix using iLabels as the common item coding data <- list( c("income=small", "age=Young"), c("income=large", "age=Middle-aged") ) # Option a: encode to match the item coding in Adult iM <- encode(data, itemLabels = Adult) iM inspect(iM) compatible(iM, Adult) # Option b: coercion plus recode to make it compatible to Adult # (note: the coding has 115 item columns after recode) iM <- as(data, "itemMatrix") iM compatible(iM, Adult) iM <- recode(iM, itemLabels = Adult) iM compatible(iM, Adult) ## Example 3: use recode to make itemMatrices compatible ## select first 100 transactions and all education-related items sub <- Adult[1:100, itemInfo(Adult)$variables == "education"] itemLabels(sub) image(sub) ## After choosing only a subset of items (columns), the item coding is now ## no longer compatible with the Adult dataset compatible(sub, Adult) ## recode to match Adult again sub.recoded <- recode(sub, itemLabels = Adult) image(sub.recoded) ## Example 4: manually create 2 new transaction for the Adult data set ## Note: check itemLabels(Adult) to see the available labels for items twoTransactions <- as( encode(list( c("age=Young", "relationship=Unmarried"), c("age=Senior") ), itemLabels = Adult), "transactions") twoTransactions inspect(twoTransactions) ## Example 5: Use a common item coding # coercion to transactions will produce different item codings trans1 <- as(list( c("age=Young", "relationship=Unmarried"), c("age=Senior") ), "transactions") trans1 trans2 <- as(list( c("age=Middle-aged", "relationship=Married"), c("relationship=Unmarried", "age=Young") ), "transactions") trans2 compatible(trans1, trans2) # produce common item coding (all item labels in the two sets) commonItemLabels <- union(itemLabels(trans1), itemLabels(trans2)) commonItemLabels trans1 <- recode(trans1, itemLabels = commonItemLabels) trans1 trans2 <- recode(trans2, itemLabels = commonItemLabels) trans2 compatible(trans1, trans2) ## Example 6: manually create a rule and calculate interest measures aRule <- new("rules", lhs = encode(list(c("age=Young", "relationship=Unmarried")), itemLabels = Adult), rhs = encode(list(c("income=small")), itemLabels = Adult) ) quality(aRule) <- interestMeasure(aRule, measure = c("support", "confidence", "lift"), transactions = Adult) inspect(aRule)