A clustering method for repeat analysis in DNA sequences

Metadata Updated: September 6, 2025

Background
A computational system for analysis of the repetitive structure of genomic sequences is described. The method uses suffix trees to organize and search the input sequences; this data structure has been used previously for efficient computation of exact and degenerate repeats.
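
As a rough illustration of how a suffix-based index exposes exact repeats, here is a minimal sketch. It is not the RepeatFinder implementation: it swaps the suffix tree for a naively built suffix array, and the function name `exact_repeats` and the `min_len` parameter are assumptions made for this example. Sorting the suffixes places repeated substrings next to each other, so the common prefix of each neighbouring pair is a repeat.

```python
def exact_repeats(seq, min_len=3):
    """Return {substring: sorted start positions} for exact repeats.

    Illustrative only: a naively sorted suffix array stands in for the
    suffix tree, which is fine for short inputs but not for whole genomes.
    """
    # Sort suffix start positions by the suffix text they point at.
    suffix_array = sorted(range(len(seq)), key=lambda i: seq[i:])
    repeats = {}
    for a, b in zip(suffix_array, suffix_array[1:]):
        # Length of the common prefix of two neighbouring suffixes.
        k = 0
        while a + k < len(seq) and b + k < len(seq) and seq[a + k] == seq[b + k]:
            k += 1
        if k >= min_len:
            repeats.setdefault(seq[a:a + k], set()).update((a, b))
    return {sub: sorted(pos) for sub, pos in repeats.items()}


if __name__ == "__main__":
    print(exact_repeats("ACGTACGT", min_len=3))
```

The sketch reports every shared prefix of adjacent suffixes, so shorter repeats nested inside longer ones (such as `CGT` inside `ACGT`) appear as separate entries; a real tool would typically filter or merge these.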

Results
The resulting software tool collects all repeat classes and outputs summary statistics as well as a multi-FASTA file of sequences that can be used as the target of searches. Its use is demonstrated here on several complete microbial genomes, the entire Arabidopsis thaliana genome, and a large collection of rice bacterial artificial chromosome end sequences.
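
To make the output described above concrete, the following sketch writes one FASTA record per repeat class and returns a small summary. The record header layout, the statistics chosen, and the names `write_multifasta` and `classes` are assumptions for illustration; the actual output format of the tool is not specified here.

```python
import io


def write_multifasta(classes, handle, width=60):
    """Write one FASTA record per repeat class; return summary statistics.

    `classes` is a hypothetical mapping:
    class name -> (representative sequence, copy count).
    """
    total_bases = 0
    for name, (sequence, copies) in classes.items():
        # Header line starts with '>' as in standard FASTA.
        handle.write(f">{name} copies={copies} length={len(sequence)}\n")
        # Wrap the sequence at a fixed line width.
        for i in range(0, len(sequence), width):
            handle.write(sequence[i:i + width] + "\n")
        total_bases += len(sequence) * copies
    return {"classes": len(classes), "total_repeat_bases": total_bases}


if __name__ == "__main__":
    buffer = io.StringIO()
    stats = write_multifasta({"class1": ("ACGTACGT", 3)}, buffer, width=4)
    print(buffer.getvalue(), stats)
```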


Conclusions
We propose a new clustering method for analysis of the repeat data captured in suffix trees. This method has been incorporated into a system that can find repeats in individual genome sequences or sets of sequences and organize those repeats into classes. It quickly and accurately creates repeat databases from small and large genomes. The associated software (RepeatFinder) should prove helpful in the analysis of repeat structure for both complete and partial genome sequences.
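
The abstract does not spell out the clustering rule, so the sketch below shows only one plausible way to organize repeat copies into classes. It assumes that two copies belong to the same class when they were reported as a repeat pair or when their coordinates overlap, and it groups them with a union-find structure. The function name `cluster_repeats` and the half-open `(start, end)` interval convention are hypothetical.

```python
def cluster_repeats(pairs):
    """Group repeat copies into classes (illustrative, not RepeatFinder's rule).

    `pairs` is a list of ((start1, end1), (start2, end2)) repeat pairs with
    half-open coordinates. Copies are merged when paired or overlapping.
    """
    intervals = sorted({iv for pair in pairs for iv in pair})
    parent = {iv: iv for iv in intervals}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Copies reported together as a repeat pair share a class.
    for a, b in pairs:
        union(a, b)

    # Sweep left to right, merging copies whose coordinates overlap.
    current, reach = None, None
    for iv in intervals:
        if current is not None and iv[0] < reach:
            union(current, iv)
            reach = max(reach, iv[1])
        else:
            current, reach = iv, iv[1]

    classes = {}
    for iv in intervals:
        classes.setdefault(find(iv), []).append(iv)
    return sorted(classes.values())


if __name__ == "__main__":
    print(cluster_repeats([((0, 10), (50, 60)), ((55, 65), (100, 110))]))
```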

Access & Use Information

Public: This dataset is intended for public access and use.
License: No license information was provided. If this work was prepared by an officer or employee of the United States government as part of that person's official duties, it is considered a U.S. Government Work.

Dates

Metadata Created Date July 24, 2025
Metadata Updated Date September 6, 2025

Metadata Source

Harvested from Healthdata.gov

Additional Metadata

Resource Type Dataset
Publisher National Institutes of Health
Maintainer NIH
Identifier https://healthdata.gov/api/views/9m8z-466d
Data First Published 2025-07-14
Data Last Modified 2025-09-06
Category NIH
Public Access Level public
Bureau Code 009:25
Metadata Context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Metadata Catalog ID https://healthdata.gov/data.json
Schema Version https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id a0ec9877-d398-4bc5-92e3-b4c4b7de46d2
Harvest Source Id 651e43b2-321c-4e4c-b86a-835cfc342cb0
Harvest Source Title Healthdata.gov
Homepage URL https://healthdata.gov/d/9m8z-466d
Program Code 009:033
Source Datajson Identifier True
Source Hash 596514f2c1a3eef800a4410a56d84ae7bf750010a18233f09975200db53c5532
Source Schema Version 1.1
