bowbowbow / Apriori

Apriori Assignment

C++ Implementation of Apriori Algorithm

To run the implementation

Keep project files in one folder.
compile using command make.

To compile without using the makefile, type the following command.

g++ -std=c++11 apriori.cpp -o apriori.exe

(Note that -std=c++11 option is must be given in g++.)

Run using following command.

apriori.exe [min support (%)] [input file name] [output file name]

ex. apriori.exe 5 input.txt output.txt

Summary of the algorithm

Apriori algorithm is to find frequent itemsets using an iterative level-wise approach based on candidate generation.

Input: A database of transactions, the minimum support count threshold

Output: frequent itemsets in the database

The algorithm solves the problem with a two-step approach.

Step1. Frequent Itemset Generation

Generate all itemsets whose support with a value of minimum support or greater.

But the proccess requires a lot of computation (O(3^(k)-2^(k+1)+1), k=the number of item).

The key idea of apriori algorithmto reduce this operation is that any subset of a frequent itemset must be frequent.

Therefore, if there is any itemset which is infrequent, its superset should not be generated/tested.

As a result, follow the steps below to proceed with this process.

Generate length (k+1)-candidate itemsets from length k frequent itemsets. (This process is called joining.)

And delete newly generated (k+1)-items if the item set that removed one element is not in (k)-candidate. (This process is called pluning.)

Calculate the support of the candidates and remove candidates with support less than min support.

Proceed 1 again until there are no more candidates left.

Step2. Associate Rule Generation

Generate high confidence rules from each frequent itemset, where each rule is a binary partition of a frequent itemset.

Any other specification of the implementation and testing

Note that I use c++11, not c++. therefore -std=c++11 option is must be given in g++.

Since the test code uploaded to Hanyang Portal is for Windows, I added checker class for using in Unix.

About input file

Input file format

[item_id]\t[item_id]\n

[item_id]\t[item_id]\t[item_id]\t[item_id]\t[item_id]\n

[item_id]\t[item_id]\t[item_id]\t[item_id]\n

Row is transaction
[item_id] is a numerical value
There is no duplication of items in each transaction

About output file

output file format

[item_set]\t[associative_item_set]\t[support(%)]\t[confidence(%)]\n

Support: probability that a transaction contains [item_set] [associative_item_set]
Confidence: conditional probability that a transaction having [item_set] also contains [associative_item_set]
The value of support and confidence should be rounded to two decimal places.

Oct	NOV	Dec
	24
2019	2020	2021

bowbowbow / Apriori

README.md

Apriori Assignment

To run the implementation

Summary of the algorithm

Any other specification of the implementation and testing

About input file

About output file

About

Releases

Packages

Languages

bowbowbow / Apriori

Join GitHub today

GitHub is where the world builds software

Launching GitHub Desktop

Launching GitHub Desktop

Launching Xcode

Launching Visual Studio

Latest commit

Git stats

Files

README.md

Apriori Assignment

To run the implementation

Summary of the algorithm

Any other specification of the implementation and testing

About input file

About output file

About

Topics

Resources

License

Releases

Packages 0

Languages

Essential cookies

Always active

Analytics cookies

Packages