palak_report
Bachelor of Technology
In
Computer Science & Information Technology
(KIT-851)
By
Ishika Mittal(1900270110028)
Palak Aggarwal(1900270110035)
Devendra Kumar Gupta(1900270110019)
Deep Narayan Maurya(1900270110018)
Certified that “Ishika Mittal, Palak Aggarwal, Devendra Kumar Gupta and Deep Narayan Maurya”
have carried out the Project work entitled “Market Basket Analysis” for the award of Bachelor of
Technology from Ajay Kumar Garg Engineering College, Ghaziabad under our supervision.
To the best of our knowledge, this work has not been submitted earlier to any university for the award
of any degree.
We would like to express our sincerest gratitude to all the people who have contributed towards the
successful completion of our project.
We would like to extend our heartfelt thanks to the Head of Information Technology Department Dr.
Anu Chaudhary, for nurturing a congenial yet competitive environment in the department, which
motivates all the students to pursue higher goals.
We want to express our special gratitude to our guide, “Ms. Mrignainy Kansal, Assistant
Professor”, Department of Information Technology, Ajay Kumar Garg Engineering College,
Ghaziabad, for her constant support, guidance, encouragement and much-needed motivation. Her
sincerity, thoroughness and perseverance have been a constant source of inspiration for us.
Last but not least, we would like to extend our thanks to all the teaching and non-teaching staff
members of our department, and to all our colleagues who helped us in the completion of the project.
1. ISHIKA MITTAL
2. PALAK AGGARWAL
3. DEVENDRA KUMAR GUPTA
4. DEEP NARAYAN MAURYA
Date :
DECLARATION
We hereby declare that the project entitled “Market Basket Analysis” submitted to the Department of Information
Technology, Ajay Kumar Garg Engineering College, Ghaziabad, is a record of original work done by
us under the guidance of “Ms. Mrignainy Kansal, Assistant Professor” Department of Information
Technology, Ajay Kumar Garg Engineering College Ghaziabad and this project work is submitted as a
Ishika Mittal ( )
Palak Aggarwal ( )
Date:
Place: Ghaziabad (U.P.)
TABLE OF CONTENTS
Chapter I: Introduction
1.1 Preface
1.2 Objectives
1.3 Scope
Chapter II: Requirement Analysis and Feasibility Study
2.1 Requirement Analysis
2.1.1 Information Gathering
2.1.2 Functional Requirements
2.1.3 Non-Functional Requirements
2.1.3.1 Software Requirements
2.1.3.2 Hardware Requirements
2.1.3.3 Performance Requirements
2.1.3.4 Safety Requirements
2.1.3.5 Security Requirements
2.1.3.6 Software Quality Attributes
2.2 Feasibility Study
2.2.1 Technical Feasibility
2.2.2 Economic Feasibility
2.2.3 Scheduling Feasibility
Chapter III: System Analysis and Design
3.1 System Analysis
3.1.1 Process Specification
3.2 System Design
3.2.1 E-R Diagram
3.2.2 Use Case Diagram
3.2.3 Class Diagram
3.2.4 Sequence Diagram
3.2.5 Data Flow Diagram (DFD)
3.2.6 Output Design
Chapter IV: Coding and Testing
4.1 Coding
4.2 Testing
4.2.1 Functionality Testing
Chapter V: Implementation
5.1 User Manual
5.2 Loading Procedure
5.3 Starting Procedure
Chapter VI: Maintenance Features
Chapter VII: Advantages and Limitations
7.1 Advantages
7.2 Limitations
Chapter VIII: Conclusion and Suggestions for further work
8.1 Conclusion
8.2 Recommendations
8.3 Suggestions for further work
References
Appendix 1: Definitions
Appendix 2: Organization of the study
Appendix 3: Source code
Research paper
LIST OF FIGURES
LIST OF TABLES
Table Title
Table.1 Functional Requirements of the model
Table.2 Non-Functional Requirements of the model
CHAPTER I: INTRODUCTION
1.1 Preface
Market basket analysis (MBA), a data mining technique used in marketing, finds associations
between items or itemsets; these associations can then be used to analyze consumer behavior. In this
research, we show that consumer preferences change over time. For example, in winter people wear
warm clothes, while in summer they wear light clothes. Similarly, consumer purchase behavior changes
with time. We study the problem of discovering associations between items, with the aim of finding an
algorithm that can assess changing trends in the purchase behavior of consumers in a retail market
and capture results that show how the association rules vary. In this research, we study the
relationship between association rules and time. We focus mainly on methods of finding an effective
minimum threshold value, which plays a significant role in the results. The two major algorithms
used for frequent itemset mining are the Apriori and Eclat algorithms. The two take different
approaches and differ in efficiency across dataset sizes, and the characteristics of the dataset
strongly affect their performance.
Market basket analysis is a statistical technique used to analyze transactional data and discover
patterns and relationships among items that are frequently purchased together. It is widely used in the
retail industry and can help businesses improve their product offerings, layout, pricing strategies, and
marketing campaigns.
The preface of market basket analysis lies in the assumption that there are often relationships between
the items in a customer's market basket. By analyzing transaction data, businesses can identify these
relationships and use them to make data-driven decisions that can increase revenue and customer
satisfaction.
Market basket analysis involves the use of algorithms and statistical models to identify the most
common item combinations, also known as itemsets. These itemsets are used to generate association
rules that describe the relationships between the items. The strength of these association rules is
measured by metrics such as support, confidence, and lift.
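As a concrete illustration of these three metrics, the following minimal Python sketch computes support, confidence, and lift for a single rule. The toy transactions and the function names are illustrative assumptions and are not taken from the project's dataset or source code.

```python
# Toy transactions (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence divided by the baseline frequency of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


bread_milk_support = support({"bread", "milk"}, transactions)
bread_milk_confidence = confidence({"bread"}, {"milk"}, transactions)
bread_milk_lift = lift({"bread"}, {"milk"}, transactions)
```

In this toy data the lift of the rule bread -> milk is below 1, which suggests that buying bread makes milk slightly less likely than its baseline frequency. A lift above 1 would indicate a positive association.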
Overall, market basket analysis provides businesses with valuable insights into customer behavior and
preferences. By understanding which items are frequently purchased together, businesses can make
informed decisions about product bundles, pricing strategies, and marketing campaigns. This can lead
to increased revenue, customer loyalty, and overall business success.
1.2 Objective
The main objective of MBA is to improve the efficiency of marketing and sales strategy using the
consumer transactional data collected during sales transactions. The Market Basket Analysis system
gathers frequent itemsets from a set of transactions and other resources, which are then classified
according to their semantic orientation and intensity. The proposed system is software that collects
frequent items from multiple transactional databases; the data obtained is then analyzed for
association mining. Of the several algorithms available for association rule mining, the
Apriori algorithm is used in our Market Basket Analysis system:
a. To identify the frequent items in the transactions on the basis of support and confidence.
b. To generate association rules from the frequent itemsets.
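The two objectives above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration of the Apriori idea on hypothetical toy data; the names `apriori` and `generate_rules` and the threshold values are assumptions made for this sketch and do not reproduce the project's actual source code.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent itemset: support} using level-wise candidate generation."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for candidate in current:
            sup = sum(candidate <= t for t in transactions) / n
            if sup >= min_support:
                level[candidate] = sup
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        keys = list(level)
        joined = {a | b for a in keys for b in keys if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


# Hypothetical toy data and thresholds.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=0.5)
rules = generate_rules(frequent, min_confidence=0.6)
```

The prune step relies on the property that every subset of a frequent itemset is also frequent, which is what lets Apriori discard most candidates without counting them.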
Most retail markets focus on what their customers buy but ignore when they buy it, which is also
a major factor in purchase behavior. This thesis focuses not just on “what” the customer buys but
also on “when” they buy it. According to Forbes magazine, marketers are constantly looking into the
future, trying to predict the next big trend, and data-driven marketing is the top trend right now,
one in which time plays a highly significant role. Market basket analysis identifies items that are
frequently purchased together in a transaction or shopping cart. It is widely used in the retail industry
and is also known as affinity analysis or association rule mining. The objective of market basket
analysis is to identify the most common item combinations, so that businesses can make data-driven
decisions to optimize their sales and marketing strategies.
The primary goal of market basket analysis is to identify which items are frequently bought together,
and which items are not. This can help businesses improve their product offerings, layout, and
marketing strategies. By knowing which products are often bought together, businesses can create
product bundles and cross-sell or upsell additional products to customers. For example, a store may
notice that people who buy bread also tend to buy milk and eggs, so they could create a promotion
where all three items are sold together at a discount, encouraging customers to purchase all three items
at once.
Another objective of market basket analysis is to help businesses understand the behavior and
preferences of their customers. By analyzing customer transaction data, businesses can learn which
products are most popular, which items are typically bought together, and how often customers return
to the store. This information can be used to tailor marketing campaigns and product offerings to
specific customer segments, increasing customer loyalty and satisfaction.
Market basket analysis can also help businesses optimize their inventory management and product
placement strategies. By knowing which products are frequently purchased together, businesses can
ensure that those items are always in stock and located near each other in the store. This can help
improve customer satisfaction by making it easier for customers to find what they need, reducing the
time they need to spend in the store and increasing the likelihood of making a purchase.
Additionally, market basket analysis can be used to identify trends and patterns over time. By tracking
changes in customer behavior and purchasing patterns, businesses can detect changes in customer
preferences and adjust their strategies accordingly. For example, if a store notices that sales of certain
products are declining over time, they may want to investigate why and consider replacing those
products with more popular alternatives.
One of the most significant benefits of market basket analysis is that it can help businesses improve
their pricing strategies. By identifying the most common item combinations, businesses can set prices
that encourage customers to purchase additional items. For example, a store may offer a discount on a
complementary product, encouraging customers to purchase both items together. This can help
increase overall revenue and customer loyalty.
In conclusion, market basket analysis is a powerful tool for businesses looking to understand customer
behavior, optimize their inventory management and product placement strategies, and improve their
sales and marketing efforts. By analyzing customer transaction data, businesses can identify which
items are frequently purchased together, understand customer preferences, and make data-driven
decisions that can increase revenue and customer satisfaction.
1.3 Scope
The first domain is the creation of personalized recommendations, a well-known
methodology nowadays. With the explosion of e-commerce, personalized recommendations
appeared as part of the marketing process. The basic idea is to suggest items to
customers based on their preferences. The first way to do this is to suggest items similar to the ones
that the customer has purchased in the past, which is called content-based filtering. The second way is
to look for similar customers and recommend items that they have purchased, which is called
collaborative filtering. A third way is the hybrid recommendation
system, which, as the name suggests, combines
collaborative and content-based filtering and can be very effective in certain cases. These strategies
are often used by companies to implement cross-selling and upselling.
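To make the similar-customers approach (commonly known as user-based collaborative filtering) more concrete, here is a minimal Python sketch. The customer names, purchase histories, the Jaccard similarity measure, and the `recommend` function are all assumptions chosen for illustration; a production recommender would use far richer signals.

```python
from collections import Counter

# Hypothetical purchase histories, for illustration only.
purchases = {
    "alice": {"bread", "milk", "eggs"},
    "bob": {"bread", "milk", "butter"},
    "carol": {"tea", "sugar"},
}


def jaccard(a, b):
    """Overlap between two item sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, purchases, top_n=3):
    """Suggest items bought by similar customers that `user` has not bought."""
    own = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        similarity = jaccard(own, items)
        for item in items - own:
            scores[item] += similarity
    # Keep only items backed by at least one customer with some overlap.
    return [item for item, score in scores.most_common(top_n) if score > 0]


alice_suggestions = recommend("alice", purchases)
carol_suggestions = recommend("carol", purchases)
```

Alice shares two items with Bob, so Bob's remaining item is suggested to her; Carol shares nothing with the others, so the sketch returns no suggestions for her.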
The second domain where market basket analysis is used is the analysis of spatial distribution in
chain stores. With the growing number of products available today, physical space in stores
has become a problem. Stores increasingly invest money and time in finding the
arrangement of items that leads to more sales. Knowing in advance which items
are commonly purchased together, a store can change its layout in order to sell more
products. This is also very helpful for inventory or stock management within the store. Having
several stores in different areas of the city or the country also helps chain stores reach more
customers and develop marketing strategies based on customer demographics. In this way
chain stores conduct targeted marketing, which also leads to price variation for the same
product across stores.
The last domain is in the creation of marketing strategies that focus on discounts and promotions
which are solely based on customer behavior through which special sales or targeted promotions can
be performed. When sales campaigns are prepared, promoted items must be chosen very carefully.
The main goal of a campaign is to entice customers to visit the store and buy more than they usually
do. Profit margins on promoted items are usually cut; therefore, non-promoted items with a higher
profit margin should be sold together with the promoted items.
Therefore, the items chosen should make the promotion effective enough to generate higher sales.
The scope of market basket analysis is wide-ranging, as it can be applied to a variety of industries and
business settings. The technique can be used to analyze transactional data from retail stores, online
shopping platforms, hospitality businesses, and more.
One of the primary applications of market basket analysis is in the retail industry, where it is used to
identify product bundles and cross-selling opportunities. By analyzing transaction data, retailers can
identify which products are frequently purchased together and create bundled promotions that offer
discounts on these item combinations. This can help increase sales and customer loyalty.
Market basket analysis can also be used in e-commerce businesses to improve product
recommendations and personalize the customer experience. By analyzing customer transaction data,
e-commerce platforms can recommend products that are likely to be of interest to the customer based
on their past purchases.
In the hospitality industry, market basket analysis can be used to optimize menu design and pricing
strategies. By analyzing transaction data from restaurant customers, businesses can identify which
items are frequently ordered together and create meal deals or promotional packages that offer
discounts on these item combinations.
● The system must carry out only actions specified by the user (browse, modify, delete, add).
The datasets were provided by Instacart and were taken from Kaggle to
perform the analysis. They contain complete information on over 3 million
grocery orders from more than 200,000 Instacart users. The product and customer data
include 50,000 unique products, the day of the week and time of purchase, and the product aisles
and departments. Exploring the data shows that dairy products, fruits and vegetables were purchased
the most across all departments, and that people tend to purchase, and to reorder 60% of their previous
orders, mostly on Sunday and Monday.
The overall methods which are used while gathering information are:
● Interviewing: Interviews allow the analyst to gather information from the individuals
or groups who are the current users of the existing system or potential users of the proposed
system. This is a basic source of qualitative and helpful information. It also allows the analyst or
developer to discover areas of misunderstanding and problems. User interviews are conducted to
determine the qualitative information. These interviews, which were instructed interviews,
provided an opportunity to gather information from respondents who have been involved in the process
for a long time.
Interviewing for market basket analysis may involve questions about the basic concepts of market
basket analysis, its applications, and the tools and techniques used for analysis. Here are some example
questions that may be asked in an interview:
● Can you explain the difference between support, confidence, and lift in association rule mining?
● What are some common algorithms used for market basket analysis, and what are their strengths
and weaknesses?
● How can market basket analysis be used to optimize inventory management and product
placement strategies?
● How can market basket analysis be used to improve pricing strategies and revenue generation?
● How can market basket analysis be used to personalize the customer experience and improve
product recommendations?
● How do you measure the effectiveness of a market basket analysis model, and what metrics do
you use?
● Can you describe a project where you used market basket analysis to solve a business problem,
and what were the outcomes?
● How do you ensure data quality and completeness when performing market basket analysis?
● What are some effective methods of gathering information for market basket analysis, apart from
analyzing transactional data? Can you explain one such method in detail?
● Observation: Observation can reveal missed facts, new ways to improve existing procedures, and
work duplicated inadvertently. It can also reveal what other fact-finding methods cannot. But
this task is delicate because some people do not like to be observed when they work. Observation
gives analysts the opportunity to go behind the scenes to learn how things work. Observation
should look for:
● Operational inefficiencies
Observation and information gathering are critical steps in conducting market basket analysis. In order
to analyze customer transaction data and identify meaningful patterns, businesses must first gather the
necessary data and understand the context in which the data was collected. This section
discusses the process of observation and information gathering in market basket analysis and how
businesses can ensure they are collecting high-quality data for analysis.
Defining the problem
The first step in market basket analysis is defining the problem or question that the analysis is intended
to answer. For example, a business may want to understand which products are commonly purchased
together or which products are most frequently returned. Defining the problem clearly will help
businesses focus their data collection efforts and ensure that the data collected is relevant to the
analysis.
Collecting data
Once the problem has been defined, the next step is to collect the data necessary to perform the
analysis. Data can be collected through various channels, including point of sale systems, customer
surveys, and online purchase histories. It is essential to collect accurate and complete data to ensure
that the analysis is valid and reliable.
Exploring data
Once the data is cleaned and preprocessed, businesses can begin to explore the data to understand its
characteristics and identify patterns. This step involves visualizing the data using techniques such as
scatterplots, histograms, and boxplots. Exploring the data can help businesses identify outliers, detect
trends, and gain insights into customer behavior.
Interpreting results
After applying market basket analysis techniques, businesses must interpret the results and draw
conclusions. This step involves identifying the most significant associations and determining how they
can be used to improve business outcomes. For example, a business may decide to create a product
bundle based on the most commonly purchased items or adjust product placements to encourage
cross-selling.
Validating results
Finally, businesses must validate the results of their market basket analysis to ensure that the findings
are reliable and accurate. This step involves testing the analysis on a different dataset to ensure that the
results are consistent and can be replicated. Validating the results is essential to ensure that the insights
gained from the analysis can be applied effectively to real-world business problems.
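One simple way to carry out this validation step is to recompute a rule's confidence on data that was not used to mine it and check that the value stays close. The sketch below assumes hypothetical toy baskets, an illustrative tolerance of 0.1, and helper names (`rule_confidence`, `is_stable`) invented for this example.

```python
def rule_confidence(antecedent, consequent, transactions):
    """Confidence of antecedent -> consequent, or None if the rule never fires."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return None
    return sum(consequent <= t for t in matching) / len(matching)


def is_stable(antecedent, consequent, mined_on, held_out, tolerance=0.1):
    """True if the rule's confidence differs by at most `tolerance` across datasets."""
    first = rule_confidence(antecedent, consequent, mined_on)
    second = rule_confidence(antecedent, consequent, held_out)
    if first is None or second is None:
        return False
    return abs(first - second) <= tolerance


# Hypothetical data: one set used for mining, one held back for validation.
mined_on = [{"bread", "butter"}, {"bread", "butter"}, {"bread"}, {"milk"}]
held_out = [{"bread", "butter"}, {"bread", "milk"}, {"bread", "butter", "milk"}, {"tea"}]

bread_butter_stable = is_stable({"bread"}, {"butter"}, mined_on, held_out)
milk_bread_stable = is_stable({"milk"}, {"bread"}, mined_on, held_out)
```

A rule whose confidence shifts sharply between the two datasets, like milk -> bread here, is a candidate for being an artifact of the original sample rather than a pattern that generalizes.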
NFR4 Supportability The system must be easy to support.
2.1.3.1 Software Requirements
If there is extensive damage to a wide portion of the database due to catastrophic failure, such as a
disk crash, the recovery method restores a past copy of the database that was backed up to archival
storage (typically tape) and reconstructs a more current state by reapplying or redoing the operations
of committed transactions from the backed-up log, up to the time of failure.
Security systems need database storage just like many other applications. However, the special
requirements of the security market mean that vendors must choose their database partner carefully.
2.1.3.6 Software Quality Attributes
Software quality attributes refer to the characteristics of software that determine its
overall quality. Market basket analysis software should have the following software
quality attributes:
● Availability: The system shall be available more than 99% of the time.
● Performance: The system shall respond in a timely fashion and shall not consume inordinate
amounts of system resources.
● Correctness: The system shall return valid and correct data and results to user requests.
● Accuracy: The accuracy of the market basket analysis software is critical in ensuring the validity
and reliability of the results obtained. The software should be able to accurately identify
association rules between items in transactions.
● Speed: The software should be able to process large volumes of data quickly. In the retail industry,
data is generated in real time, and fast decision-making is critical.
● Ease of use: The software should have an intuitive and user-friendly interface that makes it easy for
users to interact with and analyze data. The software should also have clear documentation and
user manuals to guide users through the analysis process.
● Reliability: The software should be reliable and stable, with minimal downtime or errors. In the
retail industry, decisions are made based on insights obtained from analysis, and unreliable
software can lead to poor decisions.
● Security: The software should have adequate security measures to protect data from unauthorized
access, theft, or loss. The software should also be compliant with industry-specific regulations and
standards.
● Compatibility: The software should be compatible with other software and systems used in the
retail industry, including point-of-sale systems, data warehousing systems, and customer
relationship management systems.
● Customizability: The software should be customizable to meet the specific needs of different
businesses. This includes the ability to add or remove features, modify algorithms, and tailor the software
to meet the unique requirements of individual businesses.
● Portability: Portability measures the ease with which a software system can be transferred or
adapted to different environments, platforms, or operating systems. It ensures that the software can
be deployed and run effectively across various target environments without significant
modifications or dependencies.
● Testability: Testability evaluates how easily the software system can be tested to identify defects,
validate functionality, and ensure its quality. A testable software system has well-defined test cases,
debuggable code, and effective testing techniques to support thorough and efficient testing
processes.
● Scalability: Scalability refers to the software system's ability to handle increasing workloads, users,
and data volumes without compromising performance or functionality.
The adoption of market basket analysis was aided by the advent of electronic point-of-sale (POS)
systems. Compared to handwritten records kept by store owners, the digital records generated by POS
systems made it easier for applications to process and analyze large volumes of purchase
data. Implementation of market basket analysis requires a background in statistics and data science and
some algorithmic computer programming skills. For those without the needed technical skills,
commercial, off-the-shelf tools exist.
Market Basket Analysis is modelled on association rule mining, i.e., the IF {}, THEN {} construct. For
example, IF a customer buys bread, THEN they are likely to buy butter as well.
● Antecedent: Items or itemsets found within the data are antecedents. In simpler words, it is the IF
component, written on the left-hand side. In the above example, bread is the antecedent.
● Consequent: A consequent is an item or set of items found in combination with the antecedent. It is
the THEN component, written on the right-hand side. In the above example, butter is the
consequent.
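The IF/THEN structure maps naturally onto a small data type. The following Python sketch is one possible representation, assuming a hypothetical `Rule` class with `applies_to` and `suggestions` methods; the names are chosen for this example only.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]  # the IF part (left-hand side)
    consequent: FrozenSet[str]  # the THEN part (right-hand side)

    def applies_to(self, basket):
        """True when the basket contains every antecedent item."""
        return self.antecedent <= set(basket)

    def suggestions(self, basket):
        """Consequent items the basket does not already contain."""
        basket = set(basket)
        if not self.applies_to(basket):
            return frozenset()
        return self.consequent - basket


# The bread -> butter example from the text.
bread_then_butter = Rule(frozenset({"bread"}), frozenset({"butter"}))
for_bread_basket = bread_then_butter.suggestions({"bread", "jam"})
for_milk_basket = bread_then_butter.suggestions({"milk"})
```

A basket containing bread triggers the rule and yields butter as a suggestion, while a basket without bread yields nothing.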
The following feasibility studies have been done for the project-
2.2.1 Technical Feasibility:
Technical feasibility is one of the most important criteria for selecting material for digitisation. The
physical characteristics of source material and the project goals for capturing, presenting and storing
the digital surrogates dictate the technical requirements. Libraries must evaluate those requirements
for each project and determine whether they can be met with the resources available. If the existing
staff, hardware and software resources cannot meet the requirements, then the project will need
funding to upgrade equipment or hire an outside conversion agency. The system was found to be
technically feasible in terms of-
Market basket analysis is technically feasible and has been widely adopted by businesses in the retail
industry. The technology and tools required to perform market basket analysis are readily available,
and the process can be automated using software tools. To perform market basket analysis, businesses
need access to transaction data that includes information on the items purchased, the date and time of
purchase, and the customer who made the purchase. This data can be collected using point-of-sale
systems or other data collection tools. The process of market basket analysis involves data
preprocessing, which includes data cleaning, transformation, and reduction. This step involves
removing duplicate or irrelevant data, converting the data into a suitable format, and reducing the size
of the data set to improve processing speed and efficiency.
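The cleaning, transformation, and reduction steps can be illustrated with a short Python sketch. It assumes that raw data arrives as (order id, item name) rows and that items appearing in fewer than `min_item_count` orders can be dropped; the row layout, the threshold, and the `preprocess` name are assumptions for this example, not a description of the project's pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical raw rows: (order id, item name as typed at the till).
raw_rows = [
    ("o1", " Bread "), ("o1", "milk"), ("o1", "milk"),  # repeated line item
    ("o2", "bread"), ("o2", "BUTTER"),
    ("o3", "milk"), ("o3", ""),                         # empty item name
    ("o4", "saffron"),
]


def preprocess(rows, min_item_count=2):
    """Clean item names, group rows into baskets, and drop rare items."""
    baskets = defaultdict(set)
    for order_id, item in rows:
        name = item.strip().lower()      # cleaning: normalize names
        if name:                         # cleaning: skip empty names
            baskets[order_id].add(name)  # transformation: a set removes duplicates
    # Reduction: keep only items that occur in enough orders.
    counts = Counter(item for items in baskets.values() for item in items)
    keep = {item for item, count in counts.items() if count >= min_item_count}
    reduced = {order_id: items & keep for order_id, items in baskets.items()}
    # Drop orders left empty after the reduction.
    return {order_id: items for order_id, items in reduced.items() if items}


clean_baskets = preprocess(raw_rows)
```

Dropping rare items before mining reduces the number of candidate itemsets, at the cost of losing rules that involve those items.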
Once the data has been preprocessed, the actual analysis is performed using algorithms such as
Apriori, FP-Growth, or Eclat. These algorithms mine frequent itemsets, from which association rules
describing patterns and relationships between items in transactions are derived.
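To show how one of these algorithms differs from level-wise candidate counting, the sketch below illustrates the core idea of Eclat: store, for each item, the set of transaction ids that contain it, and obtain the count of a larger itemset by intersecting those sets. It is a minimal illustration on hypothetical toy data; the function name `eclat` and the absolute `min_count` threshold are assumptions for this example.

```python
from collections import defaultdict


def eclat(transactions, min_count):
    """Return {frequent itemset: number of transactions containing it}."""
    # Vertical layout: item -> set of transaction ids containing the item.
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)

    frequent = {}

    def extend(prefix, candidates):
        for index, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with the remaining candidates to grow the itemset.
            longer = []
            for other, other_tids in candidates[index + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    longer.append((other, shared))
            if longer:
                extend(itemset, longer)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


# Hypothetical toy data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_counts = eclat(transactions, min_count=2)
```

Because counts come from set intersections rather than repeated passes over the transactions, this style can be efficient when the transaction id sets fit in memory, though relative performance depends on the dataset.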
The results of the analysis can be presented in various forms, including tables, graphs, or
visualizations. The insights obtained from market basket analysis can be used to inform business
decisions, such as product bundling, pricing strategies, and inventory management. Technical
feasibility of market basket analysis also depends on the scalability of the analysis process. As the
volume of data increases, businesses need to ensure that their systems can handle the increased
workload. Cloud-based solutions can be used to scale the analysis process and provide businesses
with the necessary computing power to handle large data sets.
In conclusion, market basket analysis is technically feasible and has been widely adopted by
businesses in the retail industry. The technology and tools required to perform market basket analysis
are readily available, and the process can be automated using software tools. The scalability of the
analysis process is critical in ensuring that businesses can handle large volumes of data and obtain
valuable insights to inform their decision-making. This type only derives insights from past data and
is the most frequently used approach. The analysis here does not make any predictions but rates the
association between products using statistical techniques. For those familiar with the basics of Data
Analysis, this type of modelling is known as unsupervised learning.
A system that can be developed technically, and that will be used if installed, must still be a good
investment for the organization. In the economic feasibility study, the development cost of creating the
system is evaluated against the ultimate benefit derived from the new system. Financial benefits
must equal or exceed the costs. The system is economically feasible. It does not require any
additional hardware or software.
Market basket analysis is not only technically feasible, but also economically feasible for businesses
in the retail industry. The insights obtained from market basket analysis can help businesses
optimize their sales and marketing strategies, leading to increased revenue and profitability.
By identifying the most common item combinations and customer preferences, businesses can create
product bundles and cross-sell or upsell additional products to customers. This can increase the average
transaction value and overall revenue. For example, if a store notices that people who buy bread also
tend to buy milk and eggs, they could create a promotion where all three items are sold together at a
discount, encouraging customers to purchase all three items at once.
Market basket analysis can also help businesses optimize their inventory management and product
placement strategies. By knowing which products are frequently purchased together, businesses can
ensure that those items are always in stock and located near each other in the store. This can help
improve customer satisfaction by making it easier for customers to find what they need, reducing the
time they need to spend in the store and increasing the likelihood of making a purchase. It can also help
reduce inventory costs by ensuring that the right products are in stock, reducing the need for excess
inventory.
Additionally, market basket analysis can help businesses tailor their marketing campaigns to specific
customer segments, increasing customer loyalty and satisfaction. By analyzing customer transaction
data, businesses can learn which products are most popular, which items are typically bought together,
and how often customers return to the store. This information can be used to create targeted marketing
campaigns that appeal to specific customer segments, increasing the effectiveness of marketing efforts
and reducing marketing costs.
In conclusion, market basket analysis is economically feasible for businesses in the retail industry. The
insights obtained from market basket analysis can help businesses optimize their sales and marketing
strategies, leading to increased revenue and profitability. By identifying customer preferences and
optimizing inventory management and product placement strategies, businesses can reduce costs and
increase customer satisfaction. Furthermore, by tailoring marketing campaigns to specific customer
segments, businesses can increase the effectiveness of their marketing efforts and reduce marketing
costs.
Scheduling feasibility estimates how long the system will take to develop. If the project has a high
likelihood of completion by the desired due date, then schedule feasibility is considered to be high. It
ensures that the project is completed before the project or technology becomes obsolete. Scheduling
feasibility also refers to the ability of a business to schedule and execute market basket analysis in a
timely and efficient manner. In order to perform market basket analysis, businesses need to have access
to transaction data, which may be collected through point-of-sale systems or other data sources.
Once the data is collected, businesses need to have the resources and expertise to analyze the data
using statistical techniques and algorithms. This may require specialized software or tools, as well as
trained analysts who can interpret the results and provide insights to the business.
In terms of scheduling, businesses need to balance the time and effort required to perform market
basket analysis with other priorities and responsibilities. Depending on the size and complexity of the
data, market basket analysis may take several hours or even days to complete.
To ensure scheduling feasibility, businesses may need to plan ahead and allocate the necessary
resources and time to perform market basket analysis. This may involve dedicating staff or contracting
with external experts to perform the analysis, as well as setting aside time in the business's schedule to
review the results and make data-driven decisions.
In conclusion, scheduling feasibility of market basket analysis depends on the availability of data,
resources, and expertise required to perform the analysis, as well as the time and effort required to
analyze the results. To ensure scheduling feasibility, businesses may need to plan ahead and allocate
the necessary resources and time to perform market basket analysis, balancing the time and effort
required with other priorities and responsibilities.
The project was found to have high scheduling feasibility. The following is a breakdown of the
activities as anticipated to be carried out-
Information about customer purchases and transaction details is delivered to us through six different
datasets. The Order and Product datasets form the base of the complete transactions and were merged
into a single dataset through the common Product ID and Order ID variables. Later, the aisles and
departments datasets were merged with the combined order and product dataset through aisle ID and
department ID to form a master dataset for the analysis. SAS Studio was used for preparing the data and
for the other data manipulation operations needed before the analysis.
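The merge described above can be illustrated with a minimal Python sketch. The report itself used SAS Studio, so this is only an analogy: the tiny in-memory tables, the column names and the helper `build_master` are hypothetical and stand in for the real datasets.

```python
# Hypothetical miniature tables; names and values are illustrative only.
aisles = {1: "fresh fruits", 2: "bread"}
departments = {10: "produce", 20: "bakery"}
products = {
    100: {"product_name": "Banana", "aisle_id": 1, "department_id": 10},
    200: {"product_name": "Baguette", "aisle_id": 2, "department_id": 20},
}
order_products = [
    {"order_id": 1, "product_id": 100},
    {"order_id": 1, "product_id": 200},
    {"order_id": 2, "product_id": 100},
]


def build_master(order_products, products, aisles, departments):
    """Join order lines to products, then to aisles and departments by ID."""
    master = []
    for row in order_products:
        product = products[row["product_id"]]
        master.append({
            "order_id": row["order_id"],
            "product_id": row["product_id"],
            "product_name": product["product_name"],
            "aisle": aisles[product["aisle_id"]],
            "department": departments[product["department_id"]],
        })
    return master


master = build_master(order_products, products, aisles, departments)
for record in master:
    print(record)
```

Each row of `master` carries the order, the product and its aisle and department, which is the shape needed to group products by order later on.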
The first step in data collection is to define the scope of the analysis. The scope defines the products or
categories of products that will be included in the analysis. For example, a retailer may choose to
analyze the purchasing patterns of customers in the grocery section, or they may choose to analyze the
purchasing patterns of customers in the clothing section.
The second step is to collect transaction data. Transaction data is the data that describes the products
that customers buy in each transaction. It includes the date and time of the transaction, the product
code or name, and the quantity purchased. This data can be collected from the retailer's point of sale
(POS) system or through surveys or interviews with customers.
The third step is to clean and process the transaction data. Cleaning and processing involve removing
any incomplete or inaccurate data, standardizing the product names, and identifying the transactions
that are relevant to the analysis. For example, a retailer may choose to exclude transactions that
involve gift cards or returns.
The fourth step is to transform the transaction data into a format that can be analyzed using market
basket analysis techniques. The transformed data typically consists of a matrix where each row
represents a transaction and each column represents a product. The cells in the matrix contain binary
values indicating whether a product was purchased in a particular transaction.
The fifth step is to apply market basket analysis techniques to the transformed data. These techniques
typically involve calculating support, confidence, and lift values for each combination of products.
Support refers to the percentage of transactions that contain a particular combination of products.
Confidence refers to the percentage of transactions that contain one product given that another product
is already in the transaction. Lift refers to the ratio of the observed support of a combination of
products to the expected support if the products were independent.
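The fourth and fifth steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the four transactions and the function names `support`, `confidence` and `lift` are hypothetical and are not taken from the project data.

```python
# Hypothetical transactions used only to illustrate the definitions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Binary transaction matrix: one row per transaction, one column per product.
items = sorted(set().union(*transactions))
matrix = [[1 if item in t else 0 for item in items] for t in transactions]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the combined itemset divided by support of the antecedent."""
    combined = set(antecedent) | set(consequent)
    return support(combined, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Observed confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


print(items)
print(matrix)
print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
print(lift({"bread"}, {"milk"}, transactions))
```

A lift below 1 means the two products appear together less often than independence would predict, while a lift above 1 indicates a positive association.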
In summary, the data collection process for market basket analysis involves defining the scope of the
analysis, collecting transaction data, cleaning and processing the data, transforming the data into a
format that can be analyzed, and applying market basket analysis techniques. The insights gained from
market basket analysis can help retailers to optimize their product assortments, promotions, and store
layouts to improve sales and customer satisfaction.
Fig.2 Snapshot of the dataset
The data collected was mapped manually to integer values as shown in Fig.3. For example,
“Fruit” was labeled as 1, “Bread” as 2, “Soups” as 4, and so on.
Fig.3 Mapped to integers
The mapped integer values were then saved in a text file and given as the input to the system. The
Apriori algorithm was used to process the input data, and the result was produced as a list of rules
describing the items that are strongly associated with each other.
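The mapping step can be sketched as follows. Only the three labels quoted in the text are used; the sample baskets, the helper names `encode` and `to_text`, and the one-transaction-per-line text layout are assumptions made for illustration, since the report does not specify the file format.

```python
# Labels quoted in the text; other products are omitted on purpose.
item_to_id = {"Fruit": 1, "Bread": 2, "Soups": 4}


def encode(transactions, item_to_id):
    """Replace each product name by its integer label, sorted within a basket."""
    return [sorted(item_to_id[name] for name in basket) for basket in transactions]


def to_text(encoded):
    """Assumed layout: one transaction per line, labels separated by spaces."""
    return "\n".join(" ".join(str(i) for i in row) for row in encoded)


# Hypothetical baskets for illustration.
baskets = [["Bread", "Fruit"], ["Soups"], ["Soups", "Fruit", "Bread"]]
encoded = encode(baskets, item_to_id)
text = to_text(encoded)
print(text)
```

The resulting string could be written to a text file and read back line by line as the input to the rule mining step.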
Data preprocessing is a critical step in any data analysis project, including market basket analysis. The
goal of data preprocessing is to transform the raw data into a format that is suitable for analysis, which
involves cleaning, integrating, transforming, and reducing data. This section discusses the data
preprocessing steps for market basket analysis.
Data cleaning: The first step in data preprocessing is to clean the data. This step involves identifying
and correcting errors or inconsistencies in the data. For market basket analysis, this may include
removing incomplete or duplicate transactions, standardizing product names, and dealing with missing
or inconsistent data.
Data transformation: The third step is to transform the data into a format that is suitable for analysis.
For market basket analysis, this typically involves creating a transaction matrix, where each row
represents a unique transaction, and each column represents a unique product. The cells in the matrix
contain binary values indicating whether a product was purchased in a particular transaction.
Data reduction: The fourth step is to reduce the data to a manageable size. This is important because
market basket analysis involves analyzing all possible combinations of products, which can quickly
become computationally intensive with large datasets. Techniques such as filtering, sampling, and
aggregation can be used to reduce the size of the dataset while preserving the relevant information.
Data discretization: The fifth step is to discretize the data, which involves converting continuous data
into discrete categories. For market basket analysis, this may involve grouping products into categories
based on their attributes, such as brand or price range. This step is important because many market
basket analysis techniques require discrete data to calculate association rules.
Data normalization: The sixth step is to normalize the data, which involves scaling the data to a
common range or format. Normalization is important because it ensures that all variables are on the
same scale, which is necessary for many market basket analysis techniques.
Let I = {I1, I2, …, Im} be an itemset, that is, the set of all items. Let D, the data, be a set of
database transactions where each transaction T is a nonempty itemset such that T ⊆ I. Each transaction
is associated with an identifier called a TID (or Tid). Let A be a set of items (itemset). A transaction
T is said to contain A if A ⊆ T. An association rule is an implication of the form A ⇒ B, where A ⊂ I,
B ⊂ I, and A ∩ B = φ. A is called the antecedent of the rule and B the consequent.
The rule A ⇒ B holds in the data set (transactions) D with support s, where s is the percentage of
transactions in D that contain A ∪ B (i.e., the union of set A and set B, or both A and B). This is
taken as the probability P(A ∪ B). The rule A ⇒ B has confidence c in the transaction set D, where c is
the percentage of transactions in D containing A that also contain B. This is taken to be the
conditional probability P(B|A). That is,
support(A ⇒ B) = P(A ∪ B)
confidence(A ⇒ B) = P(B|A)
Rules that satisfy both a minimum support threshold (called min_sup) and a minimum confidence
threshold (called min_conf) are called “Strong”.
The Apriori algorithm is a level-wise, breadth-first algorithm that counts transactions and uses prior
knowledge of frequent itemset properties. It is one of the data mining algorithms used to find the
frequent items or itemsets in a given data repository. Apriori uses an iterative approach known as a
level-wise search, in which n-itemsets are used to explore (n+1)-itemsets. To improve the efficiency
of the level-wise generation of frequent itemsets, the Apriori property is used: all non-empty subsets
of a frequent itemset must also be frequent. This follows from the anti-monotone property of the
support measure: the support of an itemset never exceeds the support of any of its subsets. Each
iteration is a two-step process consisting of join and prune actions.
The Apriori algorithm is a popular method used in market basket analysis to find frequent itemsets
and generate association rules between them. The algorithm works by first finding frequent individual
items, then extending the itemsets by adding one item at a time until no more frequent itemsets can be
found. This section discusses the Apriori algorithm with an example.
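The level-wise search can be sketched in Python as below. This is a minimal, unoptimized illustration rather than the project's implementation: the function name `apriori`, the five transactions and the minimum support count of 3 are assumptions chosen for the example.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return a dict mapping each frequent itemset (sorted tuple) to its count."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    current = {k: n for k, n in counts.items() if n >= min_support_count}
    frequent = dict(current)

    k = 1
    while current:
        prev = sorted(current)
        candidates = set()
        # Join: combine k-itemsets that share their first k-1 items.
        for i, a in enumerate(prev):
            for b in prev[i + 1:]:
                if a[:-1] == b[:-1]:
                    cand = a + (b[-1],)
                    # Prune: every k-subset of the candidate must be frequent.
                    if all(s in current for s in combinations(cand, k)):
                        candidates.add(cand)
        # Count the surviving candidates against the transactions.
        counts = {c: sum(1 for t in transactions if t.issuperset(c)) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical transactions for illustration.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
frequent = apriori(transactions, min_support_count=3)
for itemset in sorted(frequent, key=lambda s: (len(s), s)):
    print(itemset, frequent[itemset])
```

In this toy data every pair is frequent, but the triple {A, B, C} appears in only two transactions, so the search stops after the second level.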
Consider a dataset of transactions in a grocery store, where each transaction is a list of items purchased
by a customer. The dataset contains the following transactions:
Cheese 3
Eggs 2
The frequent itemsets of size 2 can be found by joining the frequent individual items with one another
and then counting the frequency of each resulting pair. The itemsets with frequency greater than or
equal to a minimum support threshold are considered frequent.
{Milk, Bread} 2
{Milk, Cheese} 2
{Bread, Cheese} 3
{Bread, Eggs} 1
{Cheese, Eggs} 1
Association rules are generated by calculating the confidence of each possible rule and selecting the
rules with a confidence greater than or equal to a minimum confidence threshold.
The resulting association rules indicate that customers who purchase bread are highly likely to also
purchase cheese, and vice versa. Similarly, customers who purchase milk are likely to purchase bread
or cheese. These insights can be used by the grocery store to optimize product placement, promotions,
and pricing to increase sales and customer satisfaction.
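As a rough check of the rule generation step, the sketch below computes confidence directly from the counts that appear in the text. Only rules whose antecedent count is listed (Cheese and Eggs) can be evaluated this way; the function name `rules_from_counts` and the minimum confidence of 0.6 are assumptions for illustration.

```python
# Counts taken from the listings in the text; other item counts are not available here.
item_counts = {"Cheese": 3, "Eggs": 2}
pair_counts = {
    frozenset({"Milk", "Bread"}): 2,
    frozenset({"Milk", "Cheese"}): 2,
    frozenset({"Bread", "Cheese"}): 3,
    frozenset({"Bread", "Eggs"}): 1,
    frozenset({"Cheese", "Eggs"}): 1,
}


def rules_from_counts(item_counts, pair_counts, min_conf):
    """Confidence(X => Y) = count({X, Y}) / count(X) for antecedents with known counts."""
    rules = []
    for pair, count in pair_counts.items():
        for antecedent in pair:
            if antecedent not in item_counts:
                continue  # antecedent count is not listed, so the rule is skipped
            (consequent,) = pair - {antecedent}
            conf = count / item_counts[antecedent]
            if conf >= min_conf:
                rules.append((antecedent, consequent, conf))
    return sorted(rules, key=lambda r: (-r[2], r[0], r[1]))


rules = rules_from_counts(item_counts, pair_counts, min_conf=0.6)
for antecedent, consequent, conf in rules:
    print(f"{antecedent} => {consequent}: confidence {conf:.2f}")
```

With these counts the rule Cheese => Bread has confidence 3/3, which matches the statement that customers who buy cheese are highly likely to buy bread as well.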
i. Joining: Joining is the process of combining itemsets to form larger itemsets during the
generation of frequent itemsets. The join operation is applied iteratively to progressively
generate candidate itemsets of increasing length. The frequent itemsets of size k are compared
pairwise to determine whether they share a common prefix, i.e. the same first k-1 items. If two
itemsets share such a prefix, they are joined to create a new itemset of size k+1 that contains all
the unique items from both sets. The resulting joined itemset is then checked for support against
the transactional data to determine whether it qualifies as a frequent itemset.
For example, let's consider a market basket analysis where the frequent itemsets are being generated. If
{A, B} and {A, C} are frequent itemsets, their common prefix {A} is identified, and the join
operation combines them to form the new itemset {A, B, C}. This new itemset will then be
evaluated for support to determine if it qualifies as a frequent itemset.
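The join step can be sketched as a small Python function. It assumes itemsets are stored as sorted tuples of equal length; the name `join` is chosen here for illustration.

```python
def join(frequent_k):
    """Join k-itemsets (sorted tuples) that share their first k-1 items."""
    frequent_k = sorted(frequent_k)
    candidates = set()
    for i, a in enumerate(frequent_k):
        for b in frequent_k[i + 1:]:
            if a[:-1] == b[:-1]:
                # Same prefix: append the last item of b to a.
                candidates.add(a + (b[-1],))
    return sorted(candidates)


candidates = join([("A", "B"), ("A", "C"), ("B", "C")])
print(candidates)
```

Only {A, B} and {A, C} share the prefix {A}, so the single candidate produced is {A, B, C}, as in the example above.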
ii. Pruning: Pruning is the process of removing infrequent itemsets or subsets of itemsets that do not
meet the specified support threshold. Pruning helps reduce the number of itemsets that need to be
evaluated, thereby improving the efficiency of the market basket analysis algorithm. Using the
pruning technique, if an itemset is determined to be infrequent, there is no need to generate or
evaluate its supersets because they are guaranteed to be infrequent as well. This pruning step
significantly reduces the number of candidate itemsets that need to be considered during the
analysis, leading to improved efficiency.
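The pruning check can be sketched in Python as follows. It assumes candidates and frequent itemsets are stored as sorted tuples; the name `prune` is chosen here for illustration.

```python
from itertools import combinations


def prune(candidates, frequent_k):
    """Keep only candidates whose every subset one item smaller is frequent."""
    frequent = set(frequent_k)
    kept = []
    for cand in candidates:
        subsets = combinations(cand, len(cand) - 1)
        if all(s in frequent for s in subsets):
            kept.append(cand)
    return kept


# {B, C} is not frequent here, so {A, B, C} can be discarded without counting it.
pruned = prune([("A", "B", "C")], [("A", "B"), ("A", "C")])
kept = prune([("A", "B", "C")], [("A", "B"), ("A", "C"), ("B", "C")])
print(pruned, kept)
```

Because the check uses only the itemsets already known to be frequent, pruned candidates never need to be counted against the transaction data.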
The Apriori property is an important factor to consider before proceeding with the algorithm. It
states that if itemset X is joined with itemset Y, then Support(X ∪ Y) ≤ min(Support(X),
Support(Y)), so a combined itemset can never be more frequent than either of its parts. The
strength of an association rule, i.e. how strong the relationship between the items in the
transactions is, is measured using support and confidence. The support of an item is the number
of transactions containing the item. Items that do not meet the minimum support are excluded
from further processing. Support determines how often a rule is applicable to a given data set.
Confidence is defined as the conditional probability that a transaction containing the LHS will
also contain the RHS:
Confidence(LHS ⇒ RHS) = P(RHS | LHS) = support(LHS ∪ RHS) / support(LHS)
Confidence determines how frequently the items in the RHS appear in the transactions that
contain the LHS. Both components must be measured when determining the rules. A rule that has
very low support may occur simply by chance. Confidence, on the other hand, measures the
reliability of the inference made by the rule. The Apriori algorithm consists of three main steps.
System design is the process and art of defining the architecture, components, modules, interfaces, and
data for a system to satisfy the requirements specified by the stakeholder or customer. The project is
designed in phases to ensure that all necessary areas of the market basket analysis system are covered.
3.2.1 ER diagram for database related applications
An entity relationship diagram (ERD) shows the relationships of entity sets stored in a database. An
entity in this context is an object, a component of data. An entity set is a collection of similar entities.
The main entities in market basket analysis are transactions, items, and customers. A transaction
represents a purchase made by a customer, and contains one or more items. An item is a product that
can be purchased, and is part of one or more transactions. A customer is a person who makes one or
more transactions, and can purchase one or more items in each transaction.
In this ER diagram, the main entities are represented as rectangles, and the relationships between them
are represented as diamonds.
The "Transaction" entity is represented by the rectangle labeled "Transaction", and contains the
attributes "Transaction ID" and "Transaction Date". Each transaction is uniquely identified by a
transaction ID, and contains one or more items.
The "Item" entity is represented by the rectangle labeled "Item", and contains the attributes "Item ID"
and "Item Name". Each item is uniquely identified by an item ID, and can be part of one or more
transactions.
The "Customer" entity is represented by the rectangle labeled "Customer", and contains the attributes
"Customer ID" and "Customer Name". Each customer is uniquely identified by a customer ID, and can
make one or more transactions.
The relationships between the entities are represented by diamonds. The relationship between
"Transaction" and "Item" is represented by the diamond labeled "Contains", which indicates that each
transaction can contain one or more items, and each item can be part of one or more transactions. The
relationship between "Transaction" and "Customer" is represented by the diamond labeled "Made By",
which indicates that each transaction is made by one customer, and each customer can make one or
more transactions.
Fig.4 ER diagram
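One possible relational mapping of this ER diagram is sketched below using Python's built-in sqlite3 module. The table names `sales_transaction` and `transaction_item` and the sample rows are assumptions made for illustration; the many-to-many "Contains" relationship becomes a junction table, and "Made By" becomes a foreign key on the transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE item (
    item_id   INTEGER PRIMARY KEY,
    item_name TEXT NOT NULL
);
-- "Made By": each transaction belongs to exactly one customer.
CREATE TABLE sales_transaction (
    transaction_id   INTEGER PRIMARY KEY,
    transaction_date TEXT NOT NULL,
    customer_id      INTEGER NOT NULL REFERENCES customer(customer_id)
);
-- "Contains": many-to-many between transactions and items.
CREATE TABLE transaction_item (
    transaction_id INTEGER NOT NULL REFERENCES sales_transaction(transaction_id),
    item_id        INTEGER NOT NULL REFERENCES item(item_id),
    PRIMARY KEY (transaction_id, item_id)
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO customer VALUES (1, 'Customer A')")
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "Bread"), (2, "Milk")])
conn.execute("INSERT INTO sales_transaction VALUES (1, '2023-01-01', 1)")
conn.executemany("INSERT INTO transaction_item VALUES (?, ?)", [(1, 1), (1, 2)])

rows = conn.execute("""
    SELECT t.transaction_id, i.item_name
    FROM sales_transaction AS t
    JOIN transaction_item AS ti ON ti.transaction_id = t.transaction_id
    JOIN item AS i ON i.item_id = ti.item_id
    ORDER BY t.transaction_id, i.item_name
""").fetchall()
print(rows)
```

The final query returns one row per (transaction, item) pair, which is the long format from which baskets can be assembled for the analysis.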
A UML use case diagram is the primary form of system/software requirements for a new software
program under development. Use cases specify the expected behaviour (what), and not the exact method
of making it happen (how). Once specified, use cases can be given both a textual and a visual
representation (i.e. a use case diagram). A key concept of use case modelling is that it helps us design
a system from the end user's perspective. It is an effective technique for communicating system
behaviour in the user's terms by specifying all externally visible system behaviour.
This use case diagram is a graphic depiction of the interactions among the elements of the market
basket analysis system. It represents the methodology used in system analysis to identify, clarify, and
organize the requirements of the system.
A use case diagram is a graphical representation of the interactions between the users and the system in
a specific scenario. In market basket analysis, a use case diagram can be used to illustrate the various use
cases or scenarios where the market basket analysis system can be used. The Admin is an actor with
additional privileges and responsibilities, including managing users, configuring analysis parameters, and
viewing system reports. The System is an actor that performs the core functionalities of generating
market basket analysis and processing transaction data. Please note that this is a simplified
representation, and the actual use case diagram for market basket analysis may vary depending on the
specific requirements and system design. Visual modeling tools or software can be used to create a more
detailed and visually appealing representation of the use case diagram.
The main actors in a market basket analysis use case diagram are the customers and the analysts. The
customers are the ones who make purchases and generate transactions, while the analysts are the ones
who use the market basket analysis to gain insights and make decisions.
The use case diagram for market basket analysis can be represented as follows:
In this use case diagram, the main actors are represented by stick figures, and the use cases are
represented by ovals.
The "Make Purchase" use case represents the scenario where the customer makes a purchase, which
generates a transaction in the system.
The "Retrieve Transaction Data" use case represents the scenario where the analyst retrieves the
transaction data from the system, which can be used for market basket analysis.
The "Analyze Transaction Data" use case represents the scenario where the analyst analyzes the
transaction data using market basket analysis techniques, such as Apriori algorithm or association
rules, to gain insights into customer behavior and preferences.
The "Generate Insights" use case represents the scenario where the analyst generates insights and
recommendations based on the analysis of the transaction data, which can be used to inform business
decisions, such as product placement, marketing campaigns, or pricing strategies.
Fig.5 Use Case Diagram
Class diagrams are the main building blocks of every object-oriented method. A class diagram can
be used to show classes, relationships, interfaces, associations, and collaborations, and its notation
is standardized in UML. Since classes are the building blocks of an application that is based on
OOP, the class diagram has the appropriate structure to represent classes, inheritance,
relationships, and everything else that OOP has in its context. It describes the various kinds of
objects and the static relationships between them. The main purposes of class diagrams are:
● This is the only UML diagram which can appropriately depict various aspects of OOP concepts.
● It is the base for deployment and component diagrams.
Given below is the class diagram for the object-oriented application.
A class diagram is a type of structural diagram in UML (Unified Modeling Language) that shows the
static structure of a system or software application by modeling its classes, their attributes and
methods, and their associations with other classes. In market basket analysis, a class diagram
can be used to illustrate the classes and the relationships involved in the analysis of the transaction
data.
The main classes in market basket analysis are Transaction, Item, and Customer. Each of these
classes has attributes and methods that describe their characteristics and behavior. Here is an example
class diagram for market basket analysis:
In this class diagram, the main classes are represented as rectangles with their attributes and methods
listed below. The relationships between classes are represented by lines between the rectangles.
The "Transaction" class has attributes such as "transactionID" and "transactionDate", and a method
called "getItems" which returns a list of items in the transaction. It has a many-to-many relationship
with the "Item" class, as each transaction can contain multiple items.
The "Item" class has attributes such as "itemID" and "itemName", and a method called
"getTransactions" which returns a list of transactions containing the item. Its relationship with the
"Transaction" class is likewise many-to-many, as multiple transactions can contain the same item.
The "Customer" class has attributes such as "customerID" and "customerName", and a method called
"getTransactions" which returns a list of transactions made by the customer. It has a one-to-many
relationship with the "Transaction" class, as each customer can make multiple transactions.
Overall, the class diagram for market basket analysis provides a structural overview of the main
classes involved in the analysis of transaction data, and their relationships to each other. It can be
used to guide the development and implementation of the market basket analysis system, and to
ensure that the necessary classes and methods are present to support the analysis.
In this class diagram, we have four main classes: Transaction, Item, MarketBasket, and Customer.
● The Transaction class represents an individual transaction made by a customer and contains
attributes such as transactionId, date, time, and customerId.
● The Item class represents individual items available for purchase and contains attributes such as
itemId, name, description, and price.
● The MarketBasket class represents a collection of items purchased together in a single
transaction and contains attributes such as basketId, transaction, and items (referring to a
collection of Item objects).
● The Customer class represents the customer involved in transactions and contains attributes
such as customerId, name, and address.
Fig.6 – Class Diagram
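The four classes listed above could be expressed in Python roughly as follows. This is a sketch under stated assumptions: the attribute names follow the list above in snake_case, while the method `item_names` and the sample objects are hypothetical additions for illustration.

```python
import datetime
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:
    customer_id: int
    name: str
    address: str


@dataclass
class Item:
    item_id: int
    name: str
    description: str
    price: float


@dataclass
class Transaction:
    transaction_id: int
    date: datetime.date
    time: datetime.time
    customer_id: int


@dataclass
class MarketBasket:
    basket_id: int
    transaction: Transaction
    items: List[Item] = field(default_factory=list)

    def item_names(self) -> List[str]:
        """Hypothetical helper: product names in the basket, sorted."""
        return sorted(item.name for item in self.items)


# Hypothetical sample objects.
customer = Customer(7, "Customer A", "Ghaziabad")
txn = Transaction(1, datetime.date(2023, 1, 1), datetime.time(10, 30), customer.customer_id)
basket = MarketBasket(1, txn, [Item(2, "Milk", "1 litre", 1.2), Item(1, "Bread", "Loaf", 0.9)])
print(basket.item_names())
```

Keeping the items on the MarketBasket object, rather than on Transaction, mirrors the class list above, where the basket is the collection of items bought in one transaction.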
This is the UML sequence diagram of Market Basket Analysis, which shows the interaction between the
objects of the system.
This is the Login Sequence Diagram of Market Basket Analysis, where the admin logs in to their account
using their credentials. After login, the user can manage all the operations of the system. All the
pages are secure, and the user can access them only after login. The diagram below demonstrates how
the login page works. The various objects interact over the course of the sequence, and the user will
not be able to access the page without verifying their identity.
A sequence diagram is a type of behavioral diagram in UML (Unified Modeling Language) that shows the
interactions and messages exchanged between objects in a system over time, in chronological order. It
represents the dynamic behavior of a system or a specific scenario. In market basket analysis, a
sequence diagram can be used to illustrate the interactions between the objects involved in the
analysis of the transaction data, such as the customer, the transaction, and the market basket
analysis system.
Here is an example sequence diagram for market basket analysis:
In this sequence diagram, the main objects are represented as rectangles, and their interactions are
represented by arrows.
The "Customer" object initiates the sequence by making a purchase, which creates a new "Transaction"
object with the relevant details, such as the transaction ID and the items purchased.
The "Transaction" object sends a message to the "Market Basket Analysis System" object, which
triggers the system to analyze the transaction data using the Apriori algorithm or other association rule
techniques.
The "Market Basket Analysis System" object sends a response back to the "Transaction" object with the
results of the analysis, such as the frequent itemsets or association rules discovered.
The "Transaction" object updates its state to reflect the analysis results, and may send messages to
other objects, such as the "Customer" object, to provide recommendations or other information based
on the analysis.
Overall, the sequence diagram for market basket analysis provides a dynamic view of the interactions
between the objects involved in the analysis of transaction data, and how they work together to
support the analysis process. It can be used to guide the development and implementation of the
market basket analysis system, and to ensure that the necessary interactions and messages are present
to support the analysis.
Fig.7 Login Sequence Diagram
Also known as DFD, Data flow diagrams are used to graphically represent the flow of data in a
business information system. DFD describes the processes that are involved in a system to transfer data
from the input to the file storage and reports generation.
Data flow diagrams can be divided into logical and physical. The logical data flow diagram describes
flow of data through a system to perform certain functionality of a business. The physical data flow
diagram describes the implementation of the logical data flow.
A data flow diagram (DFD) is a graphical representation of the flow of data through a system. In
market basket analysis, a DFD can be used to illustrate the flow of transaction data and the analysis
process.
Here is an example data flow diagram for market basket analysis
A DFD focuses on how data moves through the different stages or components of a system,
highlighting the transformations that occur along the way. DFDs are commonly used in systems
analysis and design to visualize the data flow and to understand the information requirements and
processing steps of a system.
The main components of a data flow diagram are:
● Processes: Processes represent the functions, tasks, or activities that manipulate or transform data.
They are depicted as rectangles or circles with labels indicating the specific operation being
performed on the data.
● Data Flows: Data flows represent the movement of data between processes, external entities, and
data stores. They are shown as arrows indicating the direction of data flow.
● External Entities: External entities are sources or destinations of data that interact with the
system but are outside the scope of the system being modeled.
● Data Stores: Data stores represent the repositories or storage locations where data is persisted or
retrieved.
In this data flow diagram, the main processes are represented as rectangles, and the data flows between
them are represented by arrows.
The "Transaction Data" source provides the input data for the market basket analysis system. This may
come from a variety of sources, such as point-of-sale systems, online purchases, or customer loyalty
programs.
The "Preprocessing" process involves cleaning and transforming the transaction data to prepare it for
analysis. This may include tasks such as removing duplicates, formatting the data, and converting it into
a suitable format for the analysis algorithms.
The "Market Basket Analysis" process performs the actual analysis on the preprocessed transaction data
using algorithms such as Apriori or FP-Growth. The output of this process includes frequent itemsets,
association rules, or other relevant metrics.
The "Visualization" process creates visual representations of the analysis results, such as charts, graphs,
or reports, to help users understand and interpret the findings.
The "Recommendations" process provides personalized recommendations based on the analysis results,
such as suggesting related or complementary products to customers.
Overall, the data flow diagram for market basket analysis provides an overview of how data flows
through the system and the processes involved in preparing, analyzing, and visualizing the transaction
data. It can be used to guide the development and implementation of the market basket analysis system,
and to ensure that the necessary data flows and processes are present to support the analysis.
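The flow from transaction data through preprocessing and analysis to recommendations can be sketched as three small Python functions. This is an illustrative pipeline only: the function names, the raw sample data and the use of simple pair counts in place of a full Apriori or FP-Growth run are all assumptions.

```python
from collections import Counter
from itertools import combinations


def preprocess(raw_transactions):
    """Normalize product names, drop blanks and repeated items, skip empty baskets."""
    cleaned = []
    for basket in raw_transactions:
        items = frozenset(name.strip().lower() for name in basket if name.strip())
        if items:
            cleaned.append(items)
    return cleaned


def analyze(transactions, min_count):
    """Count co-occurring pairs and keep those seen at least min_count times."""
    pair_counts = Counter()
    for basket in transactions:
        pair_counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


def recommend(item, frequent_pairs):
    """Suggest partners of item, most frequent first."""
    partners = Counter()
    for (a, b), n in frequent_pairs.items():
        if a == item:
            partners[b] += n
        elif b == item:
            partners[a] += n
    return [name for name, _ in sorted(partners.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical raw input with inconsistent formatting.
raw = [["Bread ", "milk"], ["bread", "Milk", "eggs"], ["", "eggs"], []]
cleaned = preprocess(raw)
pairs = analyze(cleaned, min_count=2)
print(pairs)
print(recommend("bread", pairs))
```

Each function corresponds to one process in the diagram, and the value returned by one stage is the data flow consumed by the next.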
The admin module in market basket analysis typically handles the management and configuration of
the system. Its main functions include:
● User management: The admin module allows the system administrator to manage user accounts
and permissions, such as creating new accounts, deleting accounts, and assigning access rights.
● Data management: The admin module provides tools for managing the transaction data used in
the analysis, such as importing new data, exporting data, and cleaning the data.
● Algorithm configuration: The admin module allows the system administrator to configure the
parameters and settings for the market basket analysis algorithms, such as the support and
confidence thresholds, the maximum itemset size, and the algorithmic approach.
● Report generation: The admin module enables the system administrator to generate reports and
visualizations of the market basket analysis results, such as frequent itemsets, association rules,
and other metrics.
● System monitoring: The admin module provides tools for monitoring the performance and
resource usage of the market basket analysis system, such as tracking the number of transactions
processed, the processing time for each transaction, and the CPU and memory usage of the
system.
Overall, the admin module plays a crucial role in ensuring the smooth operation and effective use of
the market basket analysis system. By providing the necessary tools for managing and configuring the
system, it enables the system administrator to optimize the performance and accuracy of the analysis,
and to meet the needs and requirements of the users.
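The algorithm configuration described above could be represented as a small validated settings object. This is only a sketch: the class name `MiningConfig`, its field names and its default values are assumptions, not the project's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiningConfig:
    """Hypothetical admin-configurable settings for the analysis."""

    min_support: float = 0.10      # minimum fraction of transactions
    min_confidence: float = 0.40   # minimum rule confidence
    max_itemset_size: int = 3      # largest itemset to consider
    algorithm: str = "apriori"     # "apriori" or "fp-growth"

    def __post_init__(self) -> None:
        # Reject settings that would make the analysis meaningless.
        if not 0.0 < self.min_support <= 1.0:
            raise ValueError("min_support must be in (0, 1]")
        if not 0.0 < self.min_confidence <= 1.0:
            raise ValueError("min_confidence must be in (0, 1]")
        if self.max_itemset_size < 1:
            raise ValueError("max_itemset_size must be at least 1")
        if self.algorithm not in {"apriori", "fp-growth"}:
            raise ValueError("unknown algorithm: " + self.algorithm)


config = MiningConfig(min_support=0.2, min_confidence=0.5)
print(config)
```

Validating the values when the object is created means an invalid threshold is rejected at configuration time rather than during a long analysis run.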
A Level 1 Data Flow Diagram (DFD) is a more detailed representation of a system's data flow than a
higher-level DFD, such as a Context Level DFD. It provides a more granular view of the processes,
data flows, external entities, and data stores within the system. In a Level 1 DFD, the top-level
process from the Context Level DFD is decomposed into sub-processes, representing more specific
activities or functions. The Level 1 DFD focuses on showing the major processes and their
interactions, providing a clearer understanding of how data flows through the system.
● Data Flows: Data flows represent the movement of data between processes, external entities,
and data stores. They show how data is transformed or exchanged between the different
components of the system. Data flows are labeled to describe the type of data being transmitted,
and they are represented by arrows indicating the direction of data flow.
● External Entities: External entities are sources or destinations of data that interact with the
system. They can be users, other systems, or organizations. External entities are depicted as
rectangles or squares and are connected to processes or data stores through data flows.
● Data Stores: Data stores represent the repositories or storage locations where data is persisted or
retrieved. They can be databases, files, or any other form of data storage. Data stores are
depicted as rectangles with parallel lines representing stored data. Processes interact with data
stores through data flows.
A Level 1 DFD provides a more detailed and refined view of the system than a higher-level DFD,
allowing for a better understanding of the system's functionality and data flow. It serves as a
foundation for further decomposition into subsequent levels of DFDs, where each sub-process can be
detailed and expanded further. A DFD focuses on how data moves through different stages or components
of a system, highlighting the transformations that occur along the way. DFDs are commonly used in
systems analysis and design to visualize the data flow and to understand the information requirements
and processing steps of a system.
Consumers nowadays have a wide range of options in almost every domain. In the past, a consumer who
wanted to buy something could only choose a product from the catalog of the store. However, in the
new era of information and globalization, the list of options has grown enormously. Consumers can now
choose from a huge variety of products and their variants. Limitations such as geography and season
are no longer an issue, and products that were once considered luxury goods are now considered common
products. All of this has opened up nearly limitless possibilities for companies. However, these
possibilities have also brought a large number of new competitors into the market. Retail stores
therefore look for marketing strategies to attract new customers and keep their current ones, for
example by offering efficient promotions and proper product planning. Market basket analysis, which
has been practiced in other countries, has shown remarkable success. Multinational retailers such as
Walmart and Tesco have been using market basket analysis to achieve higher profit. To gain insights
from market basket analysis, however, we need information about what our customers buy and when they
buy it. Hence the importance of customer purchase data, which reflects customer behavior.
In the last two decades there has been explosive growth in data, but not all data is relevant. So,
companies started to use data to discover and extract relevant information. This process of
extracting useful information is called Data Mining, also known as the Knowledge Discovery in
Databases (KDD) process. Data mining allows a search for valuable information in large volumes of
data (Weiss & Indurkhya, 1998). It is widely used in fields such as manufacturing, marketing, CRM,
retail trade, psychology, and education. There are several data mining techniques that help to
extract meaningful knowledge and solve organizational problems. Some of them are Neural Networks,
Artificial Intelligence, Classification, Association, Prediction, Clustering, Regression, Sequence
discovery, and Visualization.
The application of data mining techniques in various fields has been very effective so far. For
example, in the field of healthcare, it can help healthcare insurers detect fraud and abuse,
healthcare organizations make customer relationship management decisions, physicians identify
effective treatments and best practices, and patients receive better and more affordable healthcare
services (Hian Chye Koh and Gerald Tan). In the field of marketing, customer segmentation involves
the subdivision of an entire customer base into smaller customer groups, consisting of customers who
are similar within each specific segment (Woo, Bae, & Park, 2005).
One domain of data mining is the analysis of transactional data. In a transactional database, each
transaction is a collection of items. A well-established technique for analyzing and finding the
relationships and patterns between items is market basket analysis. It is one of the most interesting
research areas of data mining and has received growing attention from researchers.
Most retail markets focus on what their customers buy but ignore when they buy it, which is also a
significant factor in purchase behavior. This thesis focuses not just on “what” the customer buys but
also on “when” they buy it. According to Forbes magazine, marketers are constantly looking into the
future, trying to predict the next big trend, and data-driven marketing is currently the top trend,
one in which time plays a highly significant role.
Fig.9 Level 1 DFD
In market basket analysis, the billing detail table is a key component of the data collection
process. The billing detail table contains information about each transaction, including the items
that were purchased, the price of each item, and the total amount paid for the transaction.
By collecting data from the billing detail table over a period of time, retailers can gain insights
into customer purchasing patterns and identify which items are frequently purchased together. This
information can then be used to make product recommendations, optimize store layout, and increase
sales.
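As a rough sketch of this data collection step, billing detail rows can be grouped into one basket per bill and then scanned for item pairs that occur together. The row layout `(bill_id, item, unit_price)`, the sample rows and the function names are assumptions for illustration; the actual table schema is not shown in this report.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical billing-detail rows: (bill_id, item, unit_price).
billing_rows = [
    (1, "Bread", 30.0), (1, "Milk", 25.0),
    (2, "bread ", 30.0), (2, "Butter", 50.0), (2, "Milk", 25.0),
    (3, "Bread", 30.0), (3, "Butter", 50.0),
]


def rows_to_baskets(rows):
    """Group rows by bill and normalise item names (trim, lower-case)."""
    baskets = defaultdict(set)
    for bill_id, item, _price in rows:
        name = item.strip().lower()
        if name:  # ignore rows with an empty item name
            baskets[bill_id].add(name)
    return dict(baskets)


def pair_counts(baskets):
    """Count how many bills contain each unordered pair of items."""
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return counts


baskets = rows_to_baskets(billing_rows)
for pair, count in pair_counts(baskets).most_common():
    print(pair, count)
```

Sorting the items before forming pairs ensures that ("bread", "milk") and ("milk", "bread") are counted as the same pair.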
supervised learning models like classification and regression. It essentially aims to mimic the market
to analyze what causes what to happen. Essentially, it considers items purchased in a sequence to
determine cross-selling. For example, buying an extended warranty is more likely to follow the
purchase of an iPhone. While it isn't as widely used as a descriptive MBA, it is still a very valuable
tool for marketers. This type of analysis is beneficial for competitor analysis. It compares purchase
history between stores, between seasons, between two time periods, between different days of the
week, etc., to find interesting patterns in consumer behaviour. For example, it can help determine why
some users prefer to purchase the same product at the same price on Amazon vs Flipkart. The answer
can be that the Amazon reseller has more warehouses and can deliver faster, or maybe something more
profound like user experience.
Algorithms that use association rules include AIS, SETM and Apriori. The Apriori algorithm is
commonly cited by data scientists in research articles about market basket analysis. It identifies
frequent individual items in the database and then extends them to larger and larger itemsets,
keeping only those itemsets that still appear sufficiently often.
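The level-wise idea behind Apriori can be sketched in a few lines. This is a simplified teaching version under stated assumptions (toy data, a hypothetical function name `apriori`, no performance optimizations) and not the implementation used in the project.

```python
from itertools import combinations


def apriori(baskets, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)

    def frequent(candidates):
        # Keep only candidates that occur in enough baskets.
        kept = {}
        for candidate in candidates:
            s = sum(candidate <= basket for basket in baskets) / n
            if s >= min_support:
                kept[candidate] = s
        return kept

    current = frequent({frozenset([item]) for basket in baskets for item in basket})
    result = dict(current)
    k = 2
    while current:
        previous = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        }
        current = frequent(candidates)
        result.update(current)
        k += 1
    return result


sample = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
for itemset, s in sorted(apriori(sample, 0.5).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate is only counted if all of its smaller subsets were already found to be frequent.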
3.2.6 Output design
As shown in Table 5, sample data of 24 transactions was taken as input to the system. When the
support was 10% and the confidence 40%, 21 rules were generated, of which 12 were found to be strong
with 100% confidence. When the support was 20% with the same confidence, 2 rules were generated, of
which 1 was found to be strong, and when the support was 30%, no rule was generated.
7 transactions were taken as input to the system. When the support was 10% and the confidence 50%, 62
rules were generated, of which 31 were found to be strong with 100% confidence. When the support was
10% with 60% confidence, 31 rules were generated, all of which were found to be strong. When the
support was 20% and the confidence was 70%, 10 rules were generated, all of which were found to be
strong, and when the support was 30%, no rule was generated.
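The effect of the thresholds on the number of rules can be reproduced on toy data with the sketch below. It is restricted to rules with a single item on each side, it treats a rule as "strong" when its confidence is 100% (an interpretation of the text above), and it uses made-up baskets, so the counts it prints are not the figures reported in this section.

```python
from itertools import permutations


def pair_rules(baskets, min_support, min_confidence):
    """Rules a -> c between single items that meet both thresholds."""
    baskets = [frozenset(b) for b in baskets]
    n = len(baskets)
    items = sorted({item for basket in baskets for item in basket})
    rules = []
    for a, c in permutations(items, 2):
        both = sum(1 for basket in baskets if a in basket and c in basket)
        a_count = sum(1 for basket in baskets if a in basket)
        rule_support = both / n
        rule_confidence = both / a_count
        if rule_support >= min_support and rule_confidence >= min_confidence:
            rules.append((a, c, rule_support, rule_confidence))
    return rules


def summarize(baskets, min_support, min_confidence):
    """Return (number of rules, number of rules with 100% confidence)."""
    rules = pair_rules(baskets, min_support, min_confidence)
    strong = [rule for rule in rules if rule[3] == 1.0]
    return len(rules), len(strong)


toy_baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
for min_support in (0.25, 0.50, 0.75):
    print(min_support, summarize(toy_baskets, min_support, 0.4))
```

As in the results above, raising the support threshold reduces the number of generated rules until none remain.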
● PYTHON
Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code
readability with the use of significant indentation via the off-side rule.
Python is dynamically typed and garbage-collected. It supports multiple programming paradigms,
including structured (particularly procedural), object-oriented and functional programming. It is often
described as a "batteries included" language due to its comprehensive standard library.
Guido van Rossum began working on Python in the late 1980s as a successor to the ABC
programming language and first released it in 1991 as Python 0.9.0. Python 2.0 was released in
2000. Python 3.0, released in 2008, was a major revision not completely backward-compatible with
earlier versions. Python 2.7.18, released in 2020, was the last release of Python 2.
Python uses dynamic typing and a combination of reference counting and a cycle-detecting garbage
collector for memory management. It uses dynamic name resolution (late binding), which binds
method and variable names during program execution.
Its design offers some support for functional programming in the Lisp tradition. It has filter, map
and reduce functions; list comprehensions, dictionaries, sets, and generator expressions. The
standard library has two modules (itertools and functools) that implement functional tools borrowed
from Haskell and Standard ML.
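The functional tools listed above can be illustrated with a short sketch on made-up basket data. Note that in Python 3, reduce is imported from the functools module rather than being a built-in.

```python
from functools import reduce
from itertools import chain

# Illustrative baskets (not project data).
baskets = [["bread", "milk"], ["bread", "butter"], ["milk"]]

# map: apply a function to every element.
sizes = list(map(len, baskets))

# filter: keep only the elements that satisfy a predicate.
with_bread = list(filter(lambda basket: "bread" in basket, baskets))

# reduce: fold a sequence into a single value.
total_items = reduce(lambda acc, size: acc + size, sizes, 0)

# itertools.chain: flatten the baskets into one stream of items.
all_items = list(chain.from_iterable(baskets))

# Set comprehension and generator expression.
distinct_items = {item for basket in baskets for item in basket}
bread_share = sum(1 for basket in baskets if "bread" in basket) / len(baskets)

print(sizes, total_items, sorted(distinct_items), bread_share)
```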
● MACHINE LEARNING
Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the
use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.
IBM has a rich history with machine learning. One of its own, Arthur Samuel, is credited with coining
the term “machine learning” with his research around the game of checkers. Robert Nealey, the
self-proclaimed checkers master, played the game on an IBM 7094 computer in 1962, and he lost to the
computer.
Over the last couple of decades, the technological advances in storage and processing power have
enabled some innovative products based on machine learning, such as Netflix’s recommendation
engine and self-driving cars.
Machine learning is an important component of the growing field of data science. Through the use of
statistical methods, algorithms are trained to make classifications or predictions, and to uncover key
insights in data mining projects. These insights subsequently drive decision making within
applications and businesses, ideally impacting key growth metrics. As big data continues to expand
and grow, the market demand for data scientists will increase. They will be required to help identify
the most relevant business questions and the data to answer them.
HTML. The resulting page is compiled and executed on the server to deliver a document. The
compiled pages, as well as any dependent Java libraries, contain Java byte code rather than machine
code. JSPs are usually used to deliver HTML and XML documents, but through the use of
OutputStream, they can deliver other types of data as well.
JSP pages use several delimiters for scripting functions. The most basic is <% ... %>, which encloses a
JSP scriptlet. A scriptlet is a fragment of Java code that is run when the user requests the page.
This standard was developed by Sun Microsystems as an alternative to Microsoft's Active Server Pages
(ASP) technology. When used with Java Database Connectivity (JDBC), JSP provides a dynamic way
to create database-driven websites.
● Portability: JSP can be deployed across many platforms. All these components can be run
across Web servers.
● Configured for reusability: JSP components can be reused across servlets, JavaBeans and
Enterprise JavaBeans (EJB).
● Simplification: JSP is simple in the processes of development and maintenance.
● HTML
HTML is a computer language devised to allow website creation. These websites can then be viewed
by anyone else connected to the Internet. It is relatively easy to learn, with the basics being accessible
to most people in one sitting; and quite powerful in what it allows you to create. It is constantly
undergoing revision and evolution to meet the demands and requirements of the growing Internet
audience.
Hypertext is the method by which you move around on the web — by clicking on special text called
hyperlinks which bring you to the next page. The fact that it is hyper just means it is not linear —
i.e. you can go to any place on the Internet whenever you want by clicking on links — there is no set
order to do things in.
Markup is what HTML tags do to the text inside them. They mark it as a certain type of text. HTML
consists of a series of short codes typed into a text file. The text is then saved as an HTML file
and viewed through a browser like Internet Explorer. The tags are what separate normal text from
HTML code. You might know them as the words between the <angle-brackets>. The tags themselves
don't appear when you view your page through a browser, but their effects do. The simplest tags do
nothing more than apply formatting to some text.
Web browsers receive HTML documents from a web server or from local storage and render the
documents into multimedia web pages. HTML elements are the building blocks of HTML pages. With
HTML constructs, images and other objects such as interactive forms may be embedded into the
rendered page. HTML can embed programs written in a scripting language such as JavaScript, which
affects the behavior and content of web pages. Inclusion of CSS defines the look and layout of content.
● CSS
HTML was originally designed as a simple way of presenting information, with the aesthetics of a web
page being far less important than the content (and largely being left up to the web browser). Now
that the web has become as popular as it has, the presentation of your content has become almost
critical to a site's success. CSS is the key presentational technology that is used to design
websites. Cascading Style Sheets (CSS) is a style sheet language used for describing the
presentation of a document written in a markup language like HTML.
CSS is designed to enable the separation of presentation and content, including layout, colors, and
fonts. This separation can improve content accessibility, provide more flexibility and control in the
specification of presentation characteristics, enable multiple web pages to share formatting by
specifying the relevant CSS in a separate .css file, and reduce complexity and repetition in the
structural content. CSS has a simple syntax and uses a number of English keywords to specify the
names of various style properties. A style sheet consists of a list of rules. Each rule or rule-set consists
of one or more selectors, and a declaration block.
Well-authored CSS improves the accessibility of web content, allowing access through myriad devices
(handheld PDAs, for example) and ensuring that web users with disabilities are still able to receive
it. It also eliminates the need for browser-specific hacks and tags, which means your site has a
better chance of working across all major browsers. Another of CSS's boons is that you define things
once, making it far more efficient than defining everything in HTML on every page. This means:
Bootstrap also comes with several JavaScript components in the form of jQuery plugins. Each
Bootstrap component consists of an HTML structure, CSS declarations, and in some cases
accompanying JavaScript code. The most prominent components of Bootstrap are its layout
components, as they affect an entire web page. The basic layout component is called "Container", as
every other element in the page is placed in it. Once a container is in place, other Bootstrap layout
components implement a CSS Flexbox layout through defining rows and columns.
Bootstrap's components are well adapted to the ecosystem of popular JS MVC frameworks like
Angular. Bootstrap provides several ways to include it in your project.
● JavaScript
JavaScript, often abbreviated as JS, is a programming language. Alongside HTML and CSS, JavaScript
is one of the core technologies of the World Wide Web. JavaScript enables interactive web pages and is
an essential part of web applications. The vast majority of websites use it for client-side page behavior,
and all major web browsers have a dedicated JavaScript engine to execute it.
It was originally developed by Netscape as a means to add dynamic and interactive elements to
websites. While JavaScript is influenced by Java, the syntax is more similar to C.
JavaScript is a client-side scripting language, which means the source code is processed by the client's
web browser rather than on the web server. This means JavaScript functions can run after a webpage
has loaded without communicating with the server. Like server-side scripting languages, such as PHP
and ASP, JavaScript code can be inserted anywhere within the HTML of a webpage. However, only the
output of server-side code is displayed in the HTML, while JavaScript code remains fully visible in the
source of the webpage. It can also be referenced in a separate .js file, which may also be viewed in a
browser.
JavaScript functions can be called within <script> tags or when specific events take place. Examples
include onClick, onMouseDown, onMouseUp, onKeyDown, onKeyUp, onFocus, onBlur, onSubmit,
and many others. While standard JavaScript is still used for performing basic client-side functions,
many web developers now prefer to use JavaScript libraries like jQuery to add more advanced
dynamic elements to websites.
● MySQL
MySQL is an Oracle-backed open source relational database management system (RDBMS) based on
Structured Query Language (SQL). MySQL runs on virtually all platforms, including Linux, UNIX
and Windows. Although it can be used in a wide range of applications, MySQL is most often
associated with web applications and online publishing.
MySQL is based on a client-server model. The core of MySQL is MySQL server, which handles all of
the database instructions (or commands). MySQL operates along with several utility programs which
support the administration of MySQL databases. Commands are sent to MySQL server via the MySQL
client, which is installed on a computer. MySQL enables data to be stored and accessed across multiple
storage engines. MySQL users aren't required to learn new commands; they can access their data using
standard SQL commands.
MySQL is written in C and C++ and is available on over 20 platforms, including Mac, Windows, Linux
and Unix. MySQL is offered in two different editions: the open source MySQL Community Server and
the proprietary Enterprise Server. For security, MySQL uses an access privilege and encrypted
password system that enables host-based verification. MySQL clients can connect to MySQL Server
using several protocols, including TCP/IP sockets on any platform. MySQL supports a number of client
and utility programs, command-line programs and administration tools such as MySQL Workbench.
MySQL was originally developed to handle large databases quickly. Although MySQL is typically
installed on only one machine, it is able to send the database to multiple locations, as users are able to
access it via different MySQL client interfaces. These interfaces send SQL statements to the server and
then display the results. Today, MySQL is the RDBMS behind many of the top websites in the world
and countless corporate and consumer-facing web-based applications, including Facebook, Twitter and
YouTube.
MySQL was designed to be compatible with other systems. It supports deployment in virtualized
environments. Users can transfer their data to a SQL Server database by using database migration
tools. The database object semantics between SQL Server and MySQL are similar, but not identical.
There are architectural differences that must be considered when migrating from SQL Server to
MySQL. In MySQL, there is no difference between a database and a schema, while SQL Server treats
the two as separate entities.
4.2 Testing
System testing is the stage of implementation which is aimed at ensuring that the system works
accurately and efficiently before live operation commences. Testing is the process of executing the
program with the intent of finding errors and missing operations, together with a complete
verification to determine whether the objectives are met and the user requirements are satisfied. The
ultimate aim is quality assurance. Tests are carried out and the results are compared with the
expected results. In case of erroneous results, debugging is done. Using detailed testing strategies,
a test plan is carried out on each module. The various tests performed are unit testing, integration
testing and user acceptance testing.
The black box testing method has been used for testing our project. Black box testing, also known as
behavioral testing, is a software testing method in which the internal structure, design, or
implementation of the item being tested is not known to the tester. These tests can be functional or
non-functional, though they are usually functional. Testing has been done at all levels of the
system: unit, integration, system and acceptance. The black box testing method can be used at all
stages of the project.
Functionality testing, also known as functional testing, is a type of software testing that focuses on
verifying that a system or software application meets its functional requirements and performs as
expected. It aims to ensure that the software functions correctly and performs the tasks it is
designed to do. The purpose of functionality testing is to validate the behavior of the software
against the functional specifications or requirements. It involves testing the features, functions,
and interactions of the system to ensure they work as intended. More generally, the purpose of
testing is to discover errors. Testing is the process of trying to discover every conceivable fault
or weakness in a work product. It provides a way to check the functionality of components,
subassemblies, assemblies and/or a finished product. It is the process of exercising software with
the intent of ensuring that the software system meets its requirements and user expectations and does
not fail in an unacceptable manner. Unit testing was performed to test the correctness of the
different modules. Test case 1: Correctness of the output.
● Data Input: The data input functionality of the market basket analysis needs to be tested to ensure
that it is capable of taking data from various sources such as flat files, databases, or data warehouses.
The input data should also be tested for correctness and completeness, to ensure that all relevant data
is included in the analysis.
● Pre-processing: Market basket analysis requires data pre-processing, which includes tasks such as
cleaning, transforming, and reducing data. The functionality of pre-processing tasks should be tested
to ensure that the data is transformed correctly and that there are no data inconsistencies.
● Association Rule Mining: Association rule mining is the core functionality of market basket
analysis, which involves the generation of association rules based on the input data. The accuracy
and effectiveness of the association rule mining process should be tested to ensure that the generated
association rules are meaningful and relevant.
● Post-Processing: The post-processing functionality involves the presentation of results to the user,
which may include graphical representations of association rules or other data visualization
techniques. The functionality of post-processing should be tested to ensure that the results are
presented accurately and in a user-friendly format.
● Performance: The performance of the market basket analysis algorithm should also be tested to
ensure that it can handle large datasets efficiently and that the analysis results are generated in a
reasonable amount of time.
The goal of functionality testing is to identify any functional defects or deviations from the desired
behavior and report them for resolution. Test cases are designed based on functional requirements and
specifications, and the software is tested against these predefined test scenarios to validate its
functionality.
By conducting functionality testing, organizations can ensure that their software meets user expectations,
functions correctly, and delivers the intended functionality. It helps improve the quality and reliability of
the software and enhances the overall user experience.
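A black box test case for output correctness might look like the following sketch. The `support` function stands in for a module under test and is hypothetical, as are the test data and the expected values; the tests check only inputs and outputs, not the internal implementation.

```python
import unittest


def support(itemset, baskets):
    """Hypothetical unit under test: fraction of baskets containing the itemset."""
    if not baskets:
        raise ValueError("no transactions supplied")
    wanted = frozenset(itemset)
    return sum(wanted <= frozenset(basket) for basket in baskets) / len(baskets)


class SupportBlackBoxTest(unittest.TestCase):
    BASKETS = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"milk"},
        {"bread", "milk", "butter"},
    ]

    def test_known_support(self):
        # "bread" and "milk" appear together in 2 of the 4 baskets.
        self.assertAlmostEqual(support({"bread", "milk"}, self.BASKETS), 0.5)

    def test_absent_item_has_zero_support(self):
        self.assertEqual(support({"jam"}, self.BASKETS), 0.0)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            support({"bread"}, [])
```

Saved in a file, such tests can be run with `python -m unittest`.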
CHAPTER V: IMPLEMENTATION
Implementation is the stage in the project where the theoretical design is turned into a working system.
It involves careful planning, investigation of the current system and its constraints on implementation,
design of methods to achieve the changeover, and an evaluation of changeover methods. Apart from
planning, the major tasks in preparing for implementation are the education and training of users. The
implementation process begins with preparing a plan for the implementation of the system. According
to this plan, the activities are carried out, discussions are held regarding the equipment and
resources, and any additional equipment needed to implement the new system is acquired. In a network
backup system no additional resources are required. The most critical stage in achieving a successful
new system is giving the users confidence that the new system will work and be effective. The
system can be implemented only after thorough testing is done and it is found to be working according
to specification. This method also offers the greatest security since the old system can take over if
errors are found or there is an inability to carry out a certain transaction while using the new system.
This phase outlines what the user needs to maximize the potential of the system and how to use the
● Copy the Travel and Tourism System Directory onto the hard disk drive.
Maintenance keeps the software industry captive, tying up system resources. It means restoring
something to its original condition. Maintenance follows conversion to the extent that changes are
necessary to maintain satisfactory operations relative to changes in the user's environment. Maintenance
often includes minor enhancements or corrections to problems that surface in the system's operation.
Maintenance also covers fixing reported problems, changing the interface with other software or
hardware, and enhancing the software. Any system developed should be secured and protected
against possible hazards. Security measures are provided to prevent unauthorized access to the database
at various levels. An uninterrupted power supply should be provided so that a power failure or voltage
fluctuations will not erase the data in the files. Password protection and simple procedures to prevent
unauthorized access are provided to the users. The system allows the user to enter the system only
with a proper user name and password. The database is frequently backed up on a hard drive to prevent
any possible data loss.
During the early stages of project planning, it is important to identify the resources and schedule for
development of the Maintenance & Operations Plan. The roles and responsibilities of the various
resources must be determined and an overall approach developed. This overall approach will include
elements such as:
· Approach to determine maintenance planning needs
· Tasks associated with developing final plan
· Assumptions, constraints, dependencies related to maintenance planning or product maintenance
· Options for staffing maintenance and operations, including outsourcing
· Costs associated with maintenance and operations, including cost of outsourced resources
· Identification of key players or user groups involved in planning and/or maintenance
· Roles and responsibilities of individuals involved in planning and/or maintenance
· Timelines for developing and implementing maintenance plan components
· Method for decision making on organizational and operational issues - e.g., how will you determine
who is responsible for maintenance, support strategy, maintenance change management
strategies, governance, etc.
· Relationships to other project plan elements, such as Change Management, Communications, Test
Plan, and Implementation & Transition Plan, during the life of the project.
Project deliverables that require maintenance after implementation should be addressed in the
Maintenance & Operations Plan. It may be a good idea to develop a control mechanism early in the
project that will ensure these items are included in the Maintenance & Operations Plan.
Most project processes will have maintenance and operations equivalents, including change
management, governance processes, testing, communications and the like. Review project planning
elements to determine those needed on an ongoing basis and include them in the Maintenance &
Operations Plan.
CHAPTER VII: Advantages and Limitations of the Developed System
7.1 Advantages
Market basket analysis offers several advantages for businesses. It can help them improve sales, customer
service, and inventory management. Some of these advantages are discussed below.
● Improved Sales: By understanding which products are often purchased together, businesses can
create targeted promotions and product recommendations that lead to increased sales.
● Better Inventory Management: Analyzing transaction and sales data can help businesses optimize
their inventory management by identifying which products are commonly purchased together and
ensuring that they are always in stock.
● Enhanced Customer Loyalty: By providing personalized recommendations based on customer
purchase history, businesses can improve customer loyalty and increase the likelihood of repeat
purchases.
● Improved Pricing Strategies: Market basket analysis can help businesses optimize their pricing
strategies by identifying which products are often purchased together and adjusting their pricing
accordingly.
● Improved Marketing Campaigns: By understanding customer purchase behavior, businesses can
create more effective marketing campaigns that target specific customer segments and offer
personalized promotions.
● Improved Product Placement: Market basket analysis can help businesses optimize product
placement by identifying which products are often purchased together and placing them in close
proximity to each other.
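Each advantage above relies on knowing which products are purchased together. The following is a minimal sketch of how pair co-occurrence and pair support could be counted. The transactions and the function names `pair_counts` and `pair_support` are hypothetical and chosen for illustration; they are not taken from the developed system.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions for illustration only (not the project's data).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives every pair one canonical (alphabetical) ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def pair_support(transactions):
    """Support of a pair = fraction of transactions containing both items."""
    n = len(transactions)
    return {pair: c / n for pair, c in pair_counts(transactions).items()}

counts = pair_counts(transactions)
support = pair_support(transactions)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count, support[top_pair])
```

In this toy data the pair (bread, butter) appears in three of the four baskets, which is the kind of signal that could inform a promotion, a stocking rule, or shelf placement.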
7.2 Limitations
While transaction data analysis offers several advantages for businesses, there are also
some potential disadvantages, as listed below.
● Limited Insight into Causal Relationships: Market basket analysis can only identify relationships
between products that are frequently purchased together. It cannot determine the causal
relationship between them. This means that businesses may not fully understand why certain
products are frequently purchased together or how changes in one product may affect the sales of
other products.
● Dependence on Data Quality: The accuracy and quality of the data used in market basket analysis
can significantly impact the validity of the results. Poor data quality or incomplete data can lead
to inaccurate or incomplete insights. This can ultimately lead to ineffective marketing strategies.
● Inability to Account for External Factors: Market basket analysis is limited to the data available
within a particular data set. It does not take into account external factors that may influence
customer behavior, such as seasonality, competition, or economic conditions.
● Time and Resource Intensive: Analyzing consumer transaction data can be a complex and
time-consuming process that requires specialized skills and resources, such as data analysts and
statistical software.
● Ethical Concerns: Analyzing consumer data can raise ethical concerns related to customer privacy
and data security. Businesses must ensure that they are collecting and analyzing customer data in
an ethical and responsible manner and in compliance with relevant data privacy laws.
CHAPTER VIII: Conclusion and Suggestions for Further Work
8.1 Conclusion
Frequent itemset mining is the most important association rule mining method and is used in many
applications such as Market Basket Analysis, Clustering, Series Analysis, Object Mining, etc. However,
the minimum threshold value is the most important factor in determining the quality of the results. It
strongly affects the output of both algorithms: a very low value adds a lot of unwanted data,
whereas a very large value leads to a loss of information. Hence it should be chosen carefully.
Various methods have been proposed for this, but the choice depends largely on the requirements, for
example taking the average of all the values or relying on the characteristics of the dataset. Dynamic
threshold values, which change at every step of the algorithm, also solve the problem to some extent,
for example by reducing the support value by 5% at every step or by using a formula that dynamically
adjusts the threshold according to the characteristics of the dataset.
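The two threshold strategies mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the method used in the project. "Reducing the support value by 5%" is read here as a relative reduction (multiplying by 0.95 at each step), and "taking the average" is read as the mean support of the single items. The function names `dynamic_thresholds` and `mean_item_support` are hypothetical.

```python
def dynamic_thresholds(initial_support, steps, reduction=0.05, floor=0.0):
    """Return the support threshold used at each step.

    Assumption: the threshold is multiplied by (1 - reduction) after every
    step and is never allowed to drop below `floor`.
    """
    thresholds = []
    current = initial_support
    for _ in range(steps):
        thresholds.append(current)
        current = max(floor, current * (1.0 - reduction))
    return thresholds

def mean_item_support(transactions):
    """A data-driven starting threshold: the average support of single items."""
    n = len(transactions)
    items = {item for basket in transactions for item in basket}
    supports = [sum(item in basket for basket in transactions) / n
                for item in items]
    return sum(supports) / len(supports)

# Hypothetical usage: start from the mean item support and relax it per step.
baskets = [{"a", "b"}, {"a"}]
start = mean_item_support(baskets)
schedule = dynamic_thresholds(start, steps=3)
print(start, schedule)
```

A floor is included because an unbounded reduction would eventually admit the unwanted low-support itemsets that the threshold is meant to exclude.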
Apriori and Eclat are the most commonly used algorithms in association rule mining. Both give the same
results but differ in their efficiency on datasets of different sizes. Eclat is faster and reduces the
time and space required to produce results on small and medium-sized datasets. Apriori generates a new
set of candidate itemsets at each step and uses a horizontal approach. It uses more memory, but the
Apriori algorithm is faster on larger datasets because the vertical approach of Eclat consumes more
memory and becomes ineffective. Hence the Eclat algorithm is well suited to small and medium-sized
datasets, and the Apriori algorithm is effective on larger datasets.
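The difference between the horizontal and vertical approaches can be illustrated with a compact sketch of both algorithms. These are simplified textbook-style versions written for this illustration, not the project's implementation. The function names, the `min_count` parameter (an absolute support count) and the sample baskets are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise search over the horizontal layout (a list of baskets)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Scan every transaction to count each candidate of size k.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))]
    return frequent

def eclat(transactions, min_count):
    """Depth-first search over the vertical layout (item -> transaction ids)."""
    tidsets = {}
    for tid, basket in enumerate(transactions):
        for item in basket:
            tidsets.setdefault(item, set()).add(tid)
    frequent = {}

    def extend(prefix, candidates):
        for idx, (item, tids) in enumerate(candidates):
            if len(tids) < min_count:
                continue
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Intersect tid-sets instead of rescanning the transactions.
            rest = [(other, tids & other_tids)
                    for other, other_tids in candidates[idx + 1:]]
            extend(itemset, rest)

    extend(frozenset(), sorted(tidsets.items()))
    return frequent

# Hypothetical baskets; both algorithms should agree on the result.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
result_apriori = apriori(baskets, min_count=2)
result_eclat = eclat(baskets, min_count=2)
print(result_apriori == result_eclat)
```

The sketch shows only that the two layouts produce the same frequent itemsets. It makes no claim about relative speed or memory use, which depend on the dataset and on implementation details.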
8.2 Recommendations
The input data given to the application consists of integer values mapped from the transaction
database. The mapping is currently done manually. If a database converter is built, the system will
work effectively with any data format. In future, the application could be made more efficient by
using a faster algorithm than the Apriori algorithm.
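A converter of the kind suggested above could be sketched as follows. This is a minimal illustration, assuming the transactions arrive as CSV text with one transaction per row and one item per cell. That input format, the function names and the choice to start IDs at 1 are all assumptions rather than details of the developed system.

```python
import csv
import io

def read_baskets(csv_text):
    """Parse CSV text where each row is one transaction (assumed format)."""
    reader = csv.reader(io.StringIO(csv_text))
    return [[cell.strip() for cell in row if cell.strip()] for row in reader]

def build_item_index(baskets):
    """Assign an integer ID to every distinct item, in order of first appearance."""
    index = {}
    for basket in baskets:
        for item in basket:
            if item not in index:
                index[item] = len(index) + 1  # IDs start at 1 (arbitrary choice)
    return index

def encode(baskets, index):
    """Replace item names with their integer IDs, sorted within each basket."""
    return [sorted(index[item] for item in basket) for basket in baskets]

# Hypothetical input for illustration.
raw = "bread,butter,milk\nbread,butter\nmilk,eggs\n"
baskets = read_baskets(raw)
index = build_item_index(baskets)
encoded = encode(baskets, index)
print(index)
print(encoded)
```

Keeping the `index` dictionary alongside the encoded data allows mined itemsets to be translated back into item names when the results are reported.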
Incorporate additional data sources: In addition to transaction data, market basket analysis can be
enhanced by incorporating other data sources such as customer demographics, weather data, or social
media data. This can help businesses better understand the factors that influence purchasing behavior
and identify new opportunities for product development or marketing.
Implement real-time analysis: Real-time market basket analysis can provide businesses with
immediate insights into customer behavior and preferences. Developing algorithms that can analyze
data in real time can help businesses respond more quickly to changes in the market and improve their
overall competitiveness.
Develop advanced association rule mining techniques: Advanced association rule mining techniques
such as sequential pattern mining and temporal association rule mining can help businesses identify
more complex patterns and trends in customer behavior. These techniques can be used to generate
more targeted marketing campaigns and improve product recommendations.
Incorporate machine learning and predictive analytics: Machine learning and predictive analytics
can be used to develop more accurate and personalized product recommendations. By analyzing
customer data and behavior patterns, businesses can develop algorithms that can predict future
purchasing behavior and recommend products that are more likely to be of interest to individual
customers.
Integrate market basket analysis with other data analytics techniques: Market basket analysis can be
integrated with other data analytics techniques such as clustering, segmentation, and sentiment
analysis to provide businesses with a more comprehensive understanding of customer behavior. This
can help businesses identify new market opportunities and develop more effective marketing
strategies.
By exploring these areas of further work in market basket analysis, businesses can gain a deeper
understanding of customer behavior and preferences, and develop more effective strategies for product
development, marketing, and customer engagement.
REFERENCES
[1] Saravanan, V. (2021). An empirical study on market basket analysis using association rule mining.
International Journal of Management, Technology And Engineering, 11(2), 367-372.
[2] Pradhan, R. K., & Das, B. (2021). A survey on frequent itemset mining algorithms for market basket
analysis. International Journal of Scientific & Engineering Research, 12(2), 127-134.
[3] Joes Korstanje, "Using the famous Apriori algorithm in Python to do frequent item set
mining for basket analysis", Proceedings of Medium, 2021.
[4] Ishwari Joshi, Priya Khanna, Minal Sabale, Nikita Tathawade, "A Comparative Study of
Association Mining Algorithms for Market Basket Analysis", Proceedings of IJARIIE-
ISSN(O)-23954396, 2021.
[5] I. Qoniah and A. T. Priandika, "Basketball Market Analysis To Determine Rule Association
With Apriori Algorithm (Case Study: Tb. Menara)," J. Teknol. and Sist. Inf., vol. 1, no. 2, pp.
26–33, 2020.
[6] M Qisman, R Rosadi, A S Abdullah, "Market basket analysis using Apriori algorithm to find
consumer patterns in buying goods through transaction data (a case study of Mizan computer
retail stores)", Proceedings of Journal of Physics: Conference Series, 2020.
[7] Edgar Lawrence, Bagus Mulyawan, Novario Jaya Perdana, "MARKET BASKET ANALYST
BASED ON WEBSITE USING ECLAT ALGORITHM (CASE STUDY POLA
PHARMACY)", Proceedings of ijiksi.v8i2.11497, 2020.
[8] Yang, J., Zhang, J., & Huang, Y. (2020). A fast and scalable approach to market basket analysis
using efficient mining of negative frequent itemsets. IEEE Transactions on Big Data, 6(4), 1-1.
[9] Alzahrani, A., & Kurniawan, M. (2019). Application of market basket analysis in retail sector:
A systematic review. Journal of Theoretical and Applied Electronic Commerce Research, 14(3),
48-62.
[10] Hu, Y., Gao, M., Huang, W., & Zhang, X. (2019). An efficient approach to market
basket analysis with novel concept hierarchy generation. Knowledge-Based Systems, 164, 31-43.
[11] Kumar, S., & Rani, R. (2019). An overview of market basket analysis techniques: Applications
and challenges. International Journal of Advanced Science and Technology, 28(16), 150-158.
[12] Gao, B., & Wang, X. (2019). Association rules mining and analysis of purchasing behavior in
cross-border e-commerce. Journal of Business Research, 100, 484-497.
[13] Oztekin, A., & Ture, M. (2019). A comparative analysis of association rule mining algorithms.
International Journal of Computer Science and Information Security, 17(5), 152-160.
[14] Li, Z., Yu, G., & Guo, X. (2019). Research on optimization of association rules mining
algorithm based on Apriori. Journal of Physics: Conference Series, 1174(1), 012057.
[15] HYS Samarasinghe, WJ Samaraweera, CP Waduge, "Market Basket Analysis: A Profit Based
Product Promotion Forecasting", Proceedings of 12th International Research Conference 2019, KDU,
2019.
Appendix 1: Definitions
Record: A record is a value that contains other values, typically in fixed number and sequence and
typically indexed by names.
Database: A database is an organized collection of data. The data are typically organized to model
relevant aspects of reality in a way that supports processes requiring this information.
Object: An object is a location in memory having a value and referenced by an identifier.
Chapter Two: This chapter outlines the requirement analysis and feasibility study. A feasibility analysis
evaluates the project's potential for success; therefore, perceived objectivity is an essential factor in
the credibility of the study for potential investors and lending institutions. There are five types of
feasibility study: technical, operational, legal, economic and scheduling feasibility. [2]
Chapter Three: This chapter explains the system analysis, methodology, and system design of the
proposed system.
Chapter Four: This chapter deals with the coding and testing of the project. Testing has been performed
manually by designing various test cases using normal, exceptional and extreme data.
Chapter Five: This chapter deals with system implementation and documentation of the new system, which
includes the tools, software and hardware requirements of the new system. System testing and
maintenance of the new system are also discussed here.
Chapter Six: This chapter highlights the maintenance aspect of the project. Regular maintenance is necessary
to keep the system updated and reduce risks.
Chapter Seven: This chapter explains the advantages of the system and why this system
will be extremely beneficial. It also lists the various limitations of the system.
Chapter Eight: This chapter summarizes the entire project and draws conclusions and recommendations
from the project.