Monday, May 1, 2017

Data Mining’s Application for Analyzing Performance of Social Empowerment Activities

Pankaj Gupta
Department of Computer Science and Engineering, BIT Mesra, Ranchi (off-campus: Noida).

Abstract
Social development activities are flourishing across diverse areas of societal endeavor, despite numerous cross-sectoral hurdles in their way. They range from providing basic human services, such as education, health, and entrepreneurship, to more advanced interventions, depending on the demand at the outset. However, when it comes to discovering true success cases around the globe, recapitulating their paths to accumulate knowledge, and, above all, using newly emerged information technology methods to archive and disseminate model cases, few initiatives stand out. This has happened for many reasons, among them improper program design, inaccurate site selection, incorrect break-even analysis, insufficient funding, unbalanced manpower selection, inappropriate budget allocation, and inadequate feedback and monitoring. Apart from these, there are many hidden parameters that are not even visible. Furthermore, the visible and invisible parameters are so intricately intermingled that a lag in one can derail the whole project and eventually cause the program to fail. Not surprisingly, all of these parameters depend on data and information about implemented programs or projects, which such projects mostly lack. Thus, in most cases, the lack of data and information on their appropriateness (or inappropriateness) has made projects fail despite the devoted efforts of their implementers. This paper focuses on data mining applications and their use in formulating performance-analysis tools for social development activities. In this context, it provides justifications for including data mining applications in monitoring and evaluation tools for various social development applications.
Specifically, this paper gives in-depth analytical observations to establish knowledge for accepting or rejecting various social activities and to help transform contemporary human society into a knowledge society.
Keywords: Data Mining, Social Activities, Empowerment, Knowledge

Introduction
All information pertaining to a successful organization is truly its asset. Information such as client lists, vendor lists, product details, employee information, and corporate strategy is invaluable. Without an appropriate supply of information, a business cannot operate properly (Utimaco, 2005). The same holds for any sort of venture, whether it provides services to the scientific community, academia, civil society, or individuals. However, to make an intelligent decision, the information needs to be processed and compiled. Data mining is a method of collecting and processing data that ultimately supports informed decision making. In today’s information-based environment, data mining is steadily coming to the fore and attracting more and more attention. This is because data mining concerns the acquisition, assessment, and analysis of data by automatic or semi-automatic means, and, applied to quantities of data large or small, it can uncover meaningful patterns and rules. These patterns and rules help enterprises improve their marketing, sales, and customer support operations and better understand their end users. Over the years, corporations have accumulated very large databases from applications such as enterprise resource planning (ERP), customer relationship management (CRM), and other operational systems. It is widely believed that untapped value is hidden inside these data, and data mining techniques can help bring these patterns out. Data are currently being collected and accumulated across a wide variety of fields at an unprecedented pace. Data are no longer a static matter for an enterprise or an organization; they have become an intrinsic and highly dynamic part of every management process. For these reasons, data mining algorithms are essential to research on intelligent decision making.
To cope with this new arena of research, there is an urgent need for a new generation of computational theories and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volumes of digital data. At the same time, data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention (Boulicaut, Esposito, Giannotti & Pedreschi, 2004; Bramer, 1999; Fayyad, Piatetsky-Shapiro & Smyth, 1996; Freitas, 2002; Kargupta & Chen, 2001; Kloesgen & Zytkow, 2002; Larose, 2004; Miller & Han, 2001). This paper focuses on the application of data mining algorithms in establishing social development management systems. To that end, it illustrates a few real-world applications with specific attention to data mining algorithms, the challenges involved in those applications of knowledge discovery, and contemporary and future research directions for establishing knowledge centers that help society make intelligent decisions. It also examines how data mining algorithms may be applied to build decision support systems. Until now, however, little research has been conducted to measure the impact of such activities on society, and few cost-benefit analyses have been carried out. This article therefore attempts to formulate measurement criteria using data mining. Finally, it discusses a few challenges, with some hints on future research directions, before concluding.

Background
In contrast to heuristics (which contain general recommendations based on statistical evidence or theoretical reasoning), algorithms consist of completely defined, finite sets of steps, operations, or procedures that produce a particular outcome. Data mining algorithms work on finite patterns and occurrences in the data, and their outcomes can be quantified using mathematical formulations (Abbass, Sarker & Newton, 2002; Adamo, 2001; Kantardzic, 2002; Yoon & Kerschberg, 1993). Historically, the concept of finding useful patterns in data has been given a variety of names, including data mining, knowledge extraction, information discovery, information harvesting, data archeology, data warehousing, data repository, and data pattern processing. The term data mining has mainly been used by statisticians, data analysts, and the management information system (MIS) communities, and it has also gained popularity in the database field (Chakrabarti, 2002; Fayyad, Piatetsky-Shapiro & Smyth, 1996; Hand, Mannila & Smyth, 2001; Liu & Motoda, 1998a, 1998b; Pal & Mitra, 2004; Perner & Petrou, 1999; Pyle, 1999). Development partners and researchers implementing development projects, however, have remained aloof from data mining techniques, both for preserving their data or content and for deriving their project outcomes. Data remain a critical means of project evaluation, yet data processing is treated merely as a way of converting raw data into tables or charts. The patterns within the data remain hidden, and the transformation of those data into knowledge elements has not yet gained concrete momentum. Furthermore, no mathematical formulation has yet been derived that handles the transformation of data into knowledge while also measuring its impact on society, or quantifying the impact of that transformation.
The traditional method of turning data into knowledge relies on manual analysis and interpretation. For example, in the health-care industry, it is common for physicians or specialists to periodically analyze current trends and changes in health-care data. The specialists then provide a report detailing the analysis to the authority, and this report ultimately becomes the basis for future decision making and planning in health-care management. In a totally different category of application, planetary geologists sift through remotely sensed images of planets and asteroids, carefully locating and cataloging geologic objects of interest such as impact craters. The same situation arises for a village information center established in a remote corner of a geographically dispersed region. Few ready-made formulas, algorithms, hypotheses, or measurement criteria have evolved to recognize the pattern of growth and implementation of such centers, the nature of their operation, the sustainability of their existence, or the replication of success cases in applicable states or stages. Be it science, research, marketing, finance, health care, a retail shop, a community center, or any other field, the classical approach to data analysis relies fundamentally on one or more analysts becoming intimately familiar with the data and serving as an interface between the data and the users and end products (Berthold & Hand, 1999; Fayyad, Piatetsky-Shapiro & Smyth, 1996; Maimon & Last, 2000; Mattison, 1997). Nevertheless, in recent years many entrepreneurs have been formulating measurement criteria in areas that include marketing, finance (especially investment), fraud detection, data access, data cleaning, manufacturing, telecommunications, and Internet agents.
This paper includes a few data mining algorithms based on rough set theory (RS) (Cox, 2004; Curotto & Ebecken, 2005; Kantardzic, 2002; Myatt, 2006; Nanopoulos, Katsaros & Manolopoulos, 2003; Thuraisingham, 1999; Zhou, Li, Meng & Meng, 2004), which are used to extract decision-making rules from a data set. Rough set theory provides a neat methodology for formalizing and calculating the results of data mining problems. In the early 1980s, Z. Pawlak, in cooperation with other researchers, developed rough set data analysis (RSDA) (Pawlak, 1982). Following its main adage, “let the data speak for themselves”, RSDA tries to identify internal characteristics of a data set, such as categorization, dependency, and association rules, without invoking external metrics and judgment (Drewry et al., 2002).
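The rough-set ideas above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the author’s algorithm: it treats the loan records of Table-1 (transcribed into the hypothetical constant `LOANS`) as the universe, uses debt level, income level, and employment status as condition attributes and credit risk as the decision attribute, and computes Pawlak’s lower and upper approximations together with the degree of dependency of the decision on the conditions. The helper names `indiscernibility_classes` and `approximations` are illustrative only.

```python
from collections import defaultdict

# Loan records transcribed from Table-1: (id, debt, income, employment, credit_risk)
LOANS = [
    (1, "High", "High", "Self-Employed", "Bad"),
    (2, "High", "High", "Salaried", "Bad"),
    (3, "High", "Low", "Self-Employed", "Bad"),
    (4, "High", "Low", "Salaried", "Bad"),
    (5, "Low", "High", "Self-Employed", "Bad"),
    (6, "Low", "High", "Salaried", "Bad"),
    (7, "Low", "Low", "Self-Employed", "Bad"),
    (8, "Low", "Low", "Salaried", "Bad"),
    (9, "High", "High", "Self-Employed", "Good"),
    (10, "Low", "High", "Self-Employed", "Good"),
    (11, "Low", "Low", "Salaried", "Good"),
]
CONDITION_ATTRS = (1, 2, 3)  # column positions of debt, income, employment
DECISION_ATTR = 4            # column position of credit risk


def indiscernibility_classes(rows, attrs):
    """Partition object ids into classes whose values agree on every attribute in attrs."""
    classes = defaultdict(set)
    for row in rows:
        classes[tuple(row[a] for a in attrs)].add(row[0])
    return list(classes.values())


def approximations(classes, target):
    """Return the (lower, upper) rough-set approximations of the id set target."""
    lower = set().union(*(c for c in classes if c <= target))  # certainly in target
    upper = set().union(*(c for c in classes if c & target))   # possibly in target
    return lower, upper


classes = indiscernibility_classes(LOANS, CONDITION_ATTRS)
universe = {row[0] for row in LOANS}
positive_region = set()  # objects classified with certainty by the condition attributes
for label in ("Bad", "Good"):
    target = {row[0] for row in LOANS if row[DECISION_ATTR] == label}
    lower, upper = approximations(classes, target)
    positive_region |= lower
    print(label, "lower:", sorted(lower), "upper:", sorted(upper))

# Degree of dependency of credit risk on the three condition attributes
gamma = len(positive_region) / len(universe)
print("gamma =", round(gamma, 3))
```

On the Table-1 data, the lower approximation of “Good” is empty while its upper approximation is {1, 5, 8, 9, 10, 11}, because loan seekers 1 and 9 (and likewise 5 and 10, and 8 and 11) share identical condition values but have different credit risks. Only the five “Bad” cases {2, 3, 4, 6, 7} are classified with certainty, so the degree of dependency is 5/11, roughly 0.45. Certain rules would be read off the lower approximations, for instance “debt = high, income = high, employment = salaried -> bad credit risk”.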

Analyzing Social Activities using Data Mining
The output of a data mining algorithm is typically a pattern or a set of patterns that are valid in the given data. A pattern is defined as a statement (expression) in a given language that describes (relationships among) the facts in a subset of the given data, and is in some sense simpler than the enumeration of all the facts in the subset (Drewry et al., 2002, p. 2). A given data mining algorithm usually depends on a built-in class of patterns, and the particular language of patterns considered depends on the characteristics of the given data (the attributes and their values). Mining association rules is a useful method for analyzing data that describe transactions, lists of items, unique phrases (in text mining), and so forth. For predictive modeling, the decision tree algorithm is probably the most popular technique.
This section constitutes the main thrust of the paper and includes a few models and patterns of data mining algorithms that could be used to deduce possible measurement criteria for social development processes.
The following example explains some of the basics of decision tree algorithms. Table 1 shows a data set that could be used to predict credit risk. In this example, fictionalized information was generated on loan seekers, including their debt level, income level, type of employment, and whether they were a good or bad credit risk.
Loan-Seeker-Id | Debt-Level | Income-Level | Employment-Status | Credit-Risk | Remarks
1              | High       | High         | Self-Employed     | Bad         |
2              | High       | High         | Salaried          | Bad         |
3              | High       | Low          | Self-Employed     | Bad         |
4              | High       | Low          | Salaried          | Bad         |
5              | Low        | High         | Self-Employed     | Bad         | Accepted
6              | Low        | High         | Salaried          | Bad         | Accepted
7              | Low        | Low          | Self-Employed     | Bad         |
8              | Low        | Low          | Salaried          | Bad         |
9              | High       | High         | Self-Employed     | Good        | Accepted
10             | Low        | High         | Self-Employed     | Good        | Accepted
11             | Low        | Low          | Salaried          | Good        | Accepted
Table-1 Loan Seekers’ Information

In the example illustrated in Figure-1, the decision tree algorithm might select debt level as the attribute for predicting credit risk, so the first split in the decision tree is made on debt level. According to Table-1, neither of the two resulting nodes is pure: the debt = high node contains four bad and one good credit case, and the debt = low node contains four bad and two good credit cases. Both nodes therefore remain mixed, and the algorithm would continue splitting them on the remaining attributes. On this small data set, splitting on income level or employment status first yields nodes with exactly the same class counts, so debt level wins the first split only as a tie-break.
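The choice of the first split can be checked with a short Python sketch. It assumes an ID3-style criterion (information gain based on Shannon entropy), which is one common choice; the text does not say which criterion the algorithm uses, and methods such as C4.5 or CART use gain ratio or Gini impurity instead. The rows are transcribed from Table-1; the names `ROWS`, `entropy`, and `information_gain` are hypothetical.

```python
import math
from collections import Counter

# Table-1 rows as (debt, income, employment, credit_risk), loan seekers 1-11 in order
ROWS = [
    ("High", "High", "Self-Employed", "Bad"),
    ("High", "High", "Salaried", "Bad"),
    ("High", "Low", "Self-Employed", "Bad"),
    ("High", "Low", "Salaried", "Bad"),
    ("Low", "High", "Self-Employed", "Bad"),
    ("Low", "High", "Salaried", "Bad"),
    ("Low", "Low", "Self-Employed", "Bad"),
    ("Low", "Low", "Salaried", "Bad"),
    ("High", "High", "Self-Employed", "Good"),
    ("Low", "High", "Self-Employed", "Good"),
    ("Low", "Low", "Salaried", "Good"),
]
ATTRIBUTES = {"Debt-Level": 0, "Income-Level": 1, "Employment-Status": 2}
LABEL = 3  # column holding the credit risk


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr):
    """Reduction in label entropy obtained by splitting rows on column attr."""
    base = entropy([r[LABEL] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[LABEL] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


gains = {name: information_gain(ROWS, col) for name, col in ATTRIBUTES.items()}
for name, gain in gains.items():
    print(f"{name}: gain = {gain:.4f} bits")

# Class counts in the two child nodes of a first split on debt level
for value in ("High", "Low"):
    print(f"debt = {value}:", dict(Counter(r[LABEL] for r in ROWS if r[0] == value)))
```

For Table-1, all three attributes give the same gain of about 0.016 bits, because each of them splits the eleven cases into a group of five (four bad, one good) and a group of six (four bad, two good). The low gain also reflects how weakly any single attribute separates good from bad credit risks in this small, fictionalized sample.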

 
Department stores may use data mining to understand customer behavior, sales trends, and market behavior, and to predict market strategy. This can be done using data such as those in Table 2, which combines two forms of table: a case table and a nested table. A case table contains the case information related to the non-nested part of the data, and a nested table contains information related to the nested part of the data. In this example, there are two input tables to the mining model. One table, the case table, contains information about customer demographics. The other, the nested table, contains information about customer purchases. In database technology, a nested table is similar to a transaction table. In the example, the age groups could be made broader at the cost of accuracy, whereas finer age groups make the algorithm more complex. The same trade-off applies to the other parameters.


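Customer-Id | Age-Group | Marital-Status | Wealth-Group | Product         | Quantity
1           | c         | M              | B            | Washing Machine | 1
            |           |                |              | TV              | 1
            |           |                |              | Shampoo         | 2
2           | e         | S              | C            | Diet-coke       | 12
            |           |                |              | TV              | 1
            |           |                |              | Jelly           | 3
            |           |                |              | Cake            | 2
3           | b         | M              | A            | Coke            | 3
            |           |                |              | Cake            | 1
            |           |                |              | Jelly           | 1

Customer-Id, Age-Group, Marital-Status, and Wealth-Group form the case table; Product and Quantity together form the nested Product Purchase table.
Age-Group: a = below 15, b = 15-20, c = 21-26, d = 27-32, e = 33-38, f = 39-44, g = 45-50, h = 51-56, i = 57-62, j = above 62
Marital-Status: M = married, S = separated, D = divorced, U = unmarried
Wealth-Group: A = less than 50,000; B = 50,000-250,000; C = 251,000-450,000; D = above 451,000
Table-2 Customer Demographics (Case Table) and Product Purchases (Nested Table)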
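One way to hold a case table together with its nested table in code is sketched below in Python. The `Customer` class and the `to_case_rows` helper are hypothetical names chosen for illustration, and the age codes are written in lower case as in the Age-Group legend. The sketch flattens each customer’s nested purchases into one 0/1 column per product, a common preparation step for algorithms that accept only a single flat table.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    """One case-table row; purchases stands in for that customer's nested-table rows."""
    customer_id: int
    age_group: str       # a-j, per the Age-Group legend
    marital_status: str  # M, S, D or U
    wealth_group: str    # A-D
    purchases: dict[str, int] = field(default_factory=dict)  # product -> quantity


# Table-2 transcribed: case columns plus the nested Product Purchase table
CUSTOMERS = [
    Customer(1, "c", "M", "B", {"Washing Machine": 1, "TV": 1, "Shampoo": 2}),
    Customer(2, "e", "S", "C", {"Diet-coke": 12, "TV": 1, "Jelly": 3, "Cake": 2}),
    Customer(3, "b", "M", "A", {"Coke": 3, "Cake": 1, "Jelly": 1}),
]


def to_case_rows(customers):
    """Flatten cases and nested purchases into flat rows with one 0/1 column per product."""
    products = sorted({p for c in customers for p in c.purchases})
    rows = []
    for c in customers:
        row = {
            "Customer-Id": c.customer_id,
            "Age-Group": c.age_group,
            "Marital-Status": c.marital_status,
            "Wealth-Group": c.wealth_group,
        }
        # 1 if the customer bought the product at all; the quantity is dropped here
        row.update({p: int(p in c.purchases) for p in products})
        rows.append(row)
    return rows


for row in to_case_rows(CUSTOMERS):
    print(row)
```

Flattening discards the quantities, so a tool that consumes nested tables directly, or a model that keeps quantity as a numeric column, may be preferable when purchase volume matters.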

As another example, consider the patterns hidden inside data. Data mining finds hidden patterns inside data sets, and these patterns can be used to solve many business problems. Table 3 presents a few business questions that are difficult to answer without data mining, yet whose answers are essential for decisions on predictive marketing (de Ville, 2001; de Ville, 2006; Weiss & Indurkhya, 1997). The data set behind Table 3 could include fields such as Cust_ID, Income, Other_Income, Loan, Age_Group, Area_Residence, Home_Years, Value_House, Home_Type, Insured, Type_of_Insurance, Education_Level, and Leave_Yes_No. Association rule mining is another fundamental technique in data mining.

Question Number | Question (Data Mining Application)
1 | Identifying the customers most likely to depart, based on customer demographic information (decision tree without nested table)
2 | Grouping heterogeneous customers into subgroups based on the customer profile, to generate a mailing list for marketing purposes (clustering without nested table)
3 | Finding other products the customer may be interested in, based on the products the customer has already purchased (cross-selling using a decision tree with nested table)
4 | Grouping customers into more or less homogeneous groups based on the customer profile and the list of banking products they have subscribed to (clustering with nested table)
Table-3 Information for Predictive Marketing
In some real-life applications, such as market basket analysis in supermarket chain stores, data sets can be too large for manual analysis, and potentially valuable relations among attributes may not be evident at a glance. An association rule mining algorithm can find frequent patterns (sets of database attributes) in a given data set and generate association rules among those attributes. For example, some items, such as milk and cereal or bread and butter, are frequently sold together, and such items can be displayed together to make shopping more convenient. Association rule mining is generally applicable where the data set is large and it is useful to find frequent patterns and their associations, for example in market basket analysis, medical research, and intrusion detection. Similarly, algorithms may be devised for various other social activities, such as a ready-made garments databank (bridging the gap between developed and developing countries), NGO networks engaged in social development work, a skill and capacity development databank (migration of skilled workers), a jobs databank (for youths and the jobless), an online blood bank (for emergencies and disasters), and a microcredit databank, for the overall benefit of society.
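The frequent-pattern idea can be illustrated with a minimal level-wise search in the spirit of the Apriori algorithm (Agrawal & Srikant, 1994), sketched in Python. It is not the author’s implementation: the baskets are the three product lists from Table-2 with quantities ignored, and the thresholds `min_support=0.5` and `min_confidence=0.8` are arbitrary illustrative values. The join step simply unites pairs of frequent itemsets, which is less efficient than the original prefix-based join but, after the prune step, produces the same candidate set.

```python
from itertools import combinations

# Baskets taken from the nested Product Purchase table in Table-2 (quantities ignored)
BASKETS = [
    {"Washing Machine", "TV", "Shampoo"},
    {"Diet-coke", "TV", "Jelly", "Cake"},
    {"Coke", "Cake", "Jelly"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)


def apriori(baskets, min_support):
    """Return {frozenset: support} for every itemset meeting min_support (level-wise search)."""
    items = {frozenset([i]) for b in baskets for i in b}
    level = {s for s in items if support(s, baskets) >= min_support}
    frequent = {}
    k = 1
    while level:
        frequent.update({s: support(s, baskets) for s in level})
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c, baskets) >= min_support}
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for rules X -> Y from frequent itemsets."""
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = supp / frequent[antecedent]  # subsets of frequent sets are frequent
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, confidence


frequent = apriori(BASKETS, min_support=0.5)
for a, c, conf in rules(frequent, min_confidence=0.8):
    print(sorted(a), "->", sorted(c), f"confidence={conf:.2f}")
```

On these three baskets, the only frequent pair is {Cake, Jelly}, with support 2/3, giving the rules Cake -> Jelly and Jelly -> Cake, each with confidence 1.0. With so few baskets this only demonstrates the mechanics; meaningful rules need far larger transaction data.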

Future Issues and Challenges
Future data mining algorithms should address larger databases, high dimensionality, overfitting, assessment of statistical significance, dynamic databases, adaptation of knowledge theory, treatment of missing and noisy data, complex relationships between fields, understandability of discovered patterns, user interaction and prior knowledge, and integration and versatility with other systems (Wang, 2003). When measuring the performance impact of social development activities, future research should formulate a homogeneous pattern of implementation, given that the environment, economy, culture, and other parameters vary at the peripheries. Specifically, for knowledge centers, there should be a symmetric matrix to follow as a guideline, over which each node, sub-node, or other discrete knowledge center could be established. This would reduce design cost, operating expenditure, and monitoring complexity, and would help measure performance quantitatively. Although three patterns of implementation model exist, numerous debates continue across the globe about their advantages and disadvantages. A systematic approach, in the form of a mathematical formula and its resulting algorithm, would ease debates of this scale and lead to a verified threshold as output. Furthermore, quantifying knowledge development from immensely discrete activities of a qualitative nature will remain a challenge for future researchers. Finally, using data mining algorithms to measure performance impact demands large volumes of data of varying nature. Much of these data were not archived during the last decade of implementation (collection and archival of existing data), and most of them need to be transformed into recognized data sets so that verified data readers can use them (transformation to a recognized database structure).
Before concluding, a pattern of data transformation is portrayed in Figure-2. If a community wishes to synthesize data and transform them into knowledge, the transformation patterns become visible. The vertical pattern is relatively thorough and involves several stages of action during the transformation process, though it deserves rigorous study and close observation. Researchers may derive separate algorithms for this transformation process so that an acceptable measurement indicator may evolve in future.

Conclusion
It is well recognized that real-world knowledge-measurement applications vary in their underlying data, their complexity, the amount of human involvement they require, and the degree to which parts of the discovery process can be automated. In most applications, however, an indispensable part of the measurement process is that the analyst explores and sifts through the raw data to become familiar with them and to get a feel for what the data may cover. Furthermore, an explicit specification of what one is actually looking for very often arises only during an interactive process of data exploration, analysis, and segmentation (Stumme, Wille & Wille, 1998). Therefore, appropriate data mining techniques, with timely feedback analysis of the executed results, deserve immediate attention if the results are to be accurate. Removing probabilistic uncertainty, redundant effort, and the abundance of heterogeneous data is a difficult task when determining reasonable mathematical formulae to measure the impact of social development processes. The complexity grows further for projects or programmes related to newly evolved ICTs. Many developing and transitional economies are entangled in severe social problems within the vicious cycle of poverty, which makes it extremely difficult to evolve performance indicators for ICT-based initiatives. Without a verified mathematical model, such indicators are diverse, tend to diverge, and become unreliable in the longer run. Moreover, data mining algorithms should incorporate design, development, implementation, and operational factors, in addition to mathematical models for cost-benefit analysis. Above all, data mining should bring success cases to the forefront through rigorous analysis, so that they can be easily replicated elsewhere with minimal adjustment.

References


1. Abbass, H. A., Sarker, R. A., & Newton, C. S. (Eds.) (2002). Data mining: A heuristic approach. Hershey, PA: IGI Global.
2. Adamo, J.-M. (2001). Data mining for association rules and sequential patterns: Sequential and parallel algorithms. Springer-Verlag.
3. Agrawal, R. & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In Proceedings of the 20th International Conference on Very Large Data Bases (pp. 487-499), Santiago, Chile.
4. Agrawal, R., Imielinski, T. & Swami, A. (1993). Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (pp. 207-216), Washington, DC.
5. Boulicaut, J.-F., Esposito, F., Giannotti, F. & Pedreschi, D. (Eds.) (2004). Knowledge discovery in databases. In Proceedings of the PKDD 2004: 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy.
6. Bramer, M. A. (Ed.) (1999). Knowledge discovery and data mining: Theory and practice. IEE Books.
7. Chakrabarti, S. (2002). Mining the Web: Discovering knowledge from hypertext data. Morgan Kaufmann.
8. Cox, E. (2004). Fuzzy modeling and genetic algorithms for data mining and exploration. Morgan Kaufmann.
9. Curotto, C. L. & Ebecken, N. F. F. (2005). Implementing data mining algorithms in Microsoft® SQL Server™. WIT Press.
10. de Ville, B. (2001). Microsoft data mining: Integrated business intelligence for e-commerce and knowledge management.
11. de Ville, B. (2006). Decision trees for business intelligence and data mining: Using SAS Enterprise Miner. SAS Press.
12. Drewry et al. (2002). Current state of data mining. Department of Computer Science, University of Virginia.
13. Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P. (1996). From data mining to knowledge discovery in databases (a survey). AI Magazine, 17(3), 37-54.
14. Giuffrida, G., Cooper, L. G., & Chu, W. W. (1998). A scalable bottom-up data mining algorithm for relational databases. In Proceedings of the Tenth International Conference on Scientific and Statistical Database Management (pp. 206-209).
15. Hale, J., Threet, J., & Shenoi, S. (1994). A practical formalism for imprecise inference control. IFIP Transactions A: Computer Science and Technology, 60, 139-156.
16. Han, J., Kamber, M. & Chiang, J. (1997). Metarule-guided mining of multi-dimensional association rules using data cubes. In Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD’97) (pp. 207-210).
17. Kloesgen, W. & Zytkow, J. (Eds.) (2002). Handbook of data mining and knowledge discovery. Oxford University Press.
18. Larose, D. T. (2004). Discovering knowledge in data: An introduction to data mining. Wiley-Interscience.
19. Utimaco (2005). Data encryption: The foundation of enterprise security. Foxboro, MA: Utimaco Safeware, Inc.
20. Wang, J. (Ed.) (2003). Data mining opportunities and challenges. IRM Press.
21. Zhou, C., Li, Z., Meng, Y. & Meng, Q. (2004). A data mining algorithm based on rough set theory.
