{"@type": "dcat:Dataset", "accessLevel": "public", "accrualPeriodicity": "irregular", "bureauCode": ["026:00"], "contactPoint": {"@type": "vcard:Contact", "fn": "Kanishka Bhaduri", "hasEmail": "mailto:kanishka.bhaduri-1@nasa.gov"}, "description": "The problem of distance-based outlier detection is difficult to solve efficiently in very large datasets because of potential quadratic time complexity. We address this problem and develop sequential and distributed algorithms that are significantly more efficient than state-of-the-art methods while still guaranteeing the same outliers. By combining simple but effective indexing and disk block accessing techniques, we have developed a sequential algorithm iOrca that is up to an order-of-magnitude faster than the state-of-the-art. The indexing scheme is based on sorting the data points in order of increasing distance from a fixed reference point and then accessing those points based on this sorted order. To speed up the basic outlier detection technique, we develop two distributed algorithms (DOoR and iDOoR) for modern distributed multi-core clusters of machines, connected on a ring topology. The first algorithm passes data blocks from each machine around the ring, incrementally updating the nearest neighbors of the points passed. By maintaining a cutoff threshold, it is able to prune a large number of points in a distributed fashion. The second distributed algorithm extends this basic idea with the indexing scheme discussed earlier. In our experiments, both distributed algorithms exhibit significant improvements compared to the state-of-the-art distributed methods.", "distribution": [{"@type": "dcat:Distribution", "description": "Paper", "downloadURL": "https://c3.nasa.gov/dashlink/static/media/publication/KDD_11_outlier.pdf", "format": "PDF", "mediaType": "application/pdf", "title": "KDD_11_outlier.pdf"}], "identifier": "DASHLINK_450", "issued": "2011-08-15", "keyword": ["ames", "dashlink", "nasa"], "landingPage": "https://c3.nasa.gov/dashlink/resources/450/", "modified": "2025-03-31", "programCode": ["026:029"], "publisher": {"@type": "org:Organization", "name": "Dashlink"}, "title": "Algorithms for Speeding up Distance-Based Outlier Detection"}