Performance of Classification Methods to Evaluate Groundwater (Case Study: Shoosh Aquifer)

Author
Assistant professor, Faculty of Sciences, Shahid Rajaee Teacher Training University, Tehran, Iran
Abstract
The objective of this study was to classify the Shoosh Aquifer in Khuzestan Province, Iran, into several zones of differing water quality. To this end, the performance of two classification methods (discriminant function and cluster analysis) was evaluated for classifying groundwater by pollution level, with emphasis on the problem of over-fitting to the training data. An over-fitted model generally has poor predictive performance because it can exaggerate minor fluctuations in the data. Cluster analysis (CA) was used to describe the spatial similarity of sampling stations with respect to the measured parameters. Three methods of variable selection were used: regularized discriminant analysis, principal component analysis and Wilks' lambda. The best algorithm for variable selection was Wilks' lambda, which reduced the generalization error on the test sample to 0.1 for leave-one-out and 4-fold cross-validation. The second-best algorithm was the regularized discriminant function, with misclassification errors of 0.167 and 0.133 for leave-one-out and 4-fold cross-validation, respectively. Principal component analysis did not prove to be a promising algorithm for variable selection in the classification methods.
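The two ideas combined in the abstract, ranking variables by Wilks' lambda and estimating misclassification error by leave-one-out or k-fold cross-validation, can be illustrated with a minimal sketch. It rests on several assumptions and is not the study's procedure: the data are a hypothetical toy set, Wilks' lambda is computed one variable at a time (within-group sum of squares divided by total sum of squares), and a nearest-centroid rule stands in for the discriminant function. The function names are illustrative only.

```python
from collections import defaultdict

def wilks_lambda(values, labels):
    """Univariate Wilks' lambda: within-group SS / total SS (smaller = better separation)."""
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    groups = defaultdict(list)
    for v, g in zip(values, labels):
        groups[g].append(v)
    within_ss = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)
    return within_ss / total_ss

def nearest_centroid(train_X, train_y, x):
    """Assumed stand-in classifier: assign x to the class with the closest mean vector."""
    groups = defaultdict(list)
    for row, label in zip(train_X, train_y):
        groups[label].append(row)
    best_label, best_dist = None, float("inf")
    for label, rows in groups.items():
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cv_error(X, y, k):
    """k-fold misclassification rate; k == len(X) gives leave-one-out."""
    n = len(X)
    errors = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            if nearest_centroid(train_X, train_y, X[i]) != y[i]:
                errors += 1
    return errors / n

# Hypothetical toy data: variable 0 separates the two classes, variable 1 is noise.
X = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [1.1, 6.0],
     [3.0, 5.5], [3.2, 3.5], [2.9, 4.5], [3.1, 5.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Rank variables by Wilks' lambda and keep the one with the smallest value.
lambdas = [wilks_lambda([row[j] for row in X], y) for j in range(2)]
best = min(range(2), key=lambda j: lambdas[j])
X_sel = [[row[best]] for row in X]

loo_error = cv_error(X_sel, y, k=len(X))   # leave-one-out
fold4_error = cv_error(X_sel, y, k=4)      # 4-fold
print(lambdas, best, loo_error, fold4_error)
```

Because each held-out sample is classified by a model fitted without it, the resulting error rate estimates generalization error rather than fit to the training data, which is the over-fitting concern raised in the abstract.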