{"@type": "dcat:Dataset", "accessLevel": "public", "accrualPeriodicity": "irregular", "bureauCode": ["026:00"], "contactPoint": {"@type": "vcard:Contact", "fn": "Kanishka Bhaduri", "hasEmail": "mailto:kanishka.bhaduri-1@nasa.gov"}, "description": "There has been a tremendous increase in the volume of sensor data collected over the last decade for different monitoring tasks. For example, petabytes of earth science data are collected from modern satellites, in-situ sensors and different climate models. Similarly, huge amounts of flight operational data are downloaded for different commercial airlines. These different types of datasets need to be analyzed for finding outliers. Information extraction from such rich data sources using advanced data mining methodologies is a challenging task not only due to the massive volume of data, but also because these datasets are physically stored at different geographical locations with only a subset of features available at any location. Moving these petabytes of data to a single location may waste a lot of bandwidth. To solve this problem, in this paper, we present a novel algorithm which can identify outliers in the entire data without moving all the data to a single location. The method we propose only centralizes a very small sample from the different data subsets at different locations. We analytically prove and experimentally verify that the algorithm offers high accuracy compared to complete centralization with only a fraction of the communication cost. We show that our algorithm is highly relevant to both earth sciences and aeronautics by describing applications in these domains. The performance of the algorithm is demonstrated on two large publicly available datasets: (1) the NASA MODIS satellite images and (2) a simulated aviation dataset generated by the \u2018Commercial Modular Aero-Propulsion System Simulation\u2019 (CMAPSS).", "distribution": [{"@type": "dcat:Distribution", "description": "DAnom.pdf", "downloadURL": "https://c3.nasa.gov/dashlink/static/media/publication/DAnom_2.pdf", "format": "PDF", "mediaType": "application/pdf", "title": "DAnom.pdf"}], "identifier": "DASHLINK_367", "issued": "2011-05-05", "keyword": ["ames", "dashlink", "nasa"], "landingPage": "https://c3.nasa.gov/dashlink/resources/367/", "modified": "2025-03-31", "programCode": ["026:029"], "publisher": {"@type": "org:Organization", "name": "Dashlink"}, "title": "Distributed Anomaly Detection using 1-class SVM for Vertically Partitioned Data"}