A Provenance-based Approach to Evaluate Data Quality in eScience

Abstract

Data quality is growing in relevance as a research topic. Quality assessment has been progressively incorporated in many business environments, and in software engineering practices. eScience environments, however, because of the multiplicity and heterogeneity of data sources and scientific experts involved in a given problem, complicate data quality assessment. This paper deals with the evaluation of the quality of data managed by eScience applications. Our approach is based on data provenance, i.e. the history of the origins and transformations applied to a given data product. Our contributions include (a) the specification of a framework to track data provenance and use it to derive quality information, (b) a model for data provenance based on the Open Provenance Model, and (c) a methodology to evaluate the quality of data based on its provenance. Our proposal is validated experimentally by a prototype that takes advantage of the Taverna workflow system. Copyright © 2014 Inderscience Enterprises Ltd.

Repositorio da Producao Cientifica e Intelectual da Unicamp

Last time updated on 10/04/2020
