Permutation-validated principal components analysis of microarray data

Metadata Updated: September 7, 2025

Background
In microarray data analysis, the comparison of gene-expression profiles with respect to different conditions and the selection of biologically interesting genes are crucial tasks. Multivariate statistical methods have been applied to analyze these large datasets. Less work has been published concerning the assessment of the reliability of gene-selection procedures. Here we describe a method to assess reliability in multivariate microarray data analysis using permutation-validated principal components analysis (PCA). The approach is designed for microarray data with a group structure.

Results
We used PCA to detect the major sources of variance underlying the hybridization conditions followed by gene selection based on PCA-derived and permutation-based test statistics. We validated our method by applying it to well characterized yeast cell-cycle data and to two datasets from our laboratory. We could describe the major sources of variance, select informative genes and visualize the relationship of genes and arrays. We observed differences in the level of the explained variance and the interpretability of the selected genes.
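
As a rough illustration of the first step (using PCA to find the major sources of variance across hybridizations), the sketch below runs PCA through a singular value decomposition with NumPy. The toy matrix `expr`, the function name `pca_by_svd`, and the choice to center each gene across arrays are assumptions made for this example; the dataset description does not specify the authors' preprocessing or implementation.

```python
import numpy as np

# Hypothetical toy matrix: 6 genes (rows) x 4 arrays (columns).
# Arrays 0-1 and arrays 2-3 stand in for two groups of replicates.
expr = np.array([
    [1.0, 1.2, 3.0, 3.1],   # genes 0-2 differ between the two groups
    [0.9, 1.1, 2.8, 3.2],
    [1.1, 0.9, 3.1, 2.9],
    [2.0, 2.1, 2.0, 1.9],   # genes 3-5 show only small scatter
    [1.5, 1.4, 1.6, 1.5],
    [0.5, 0.6, 0.5, 0.4],
])

def pca_by_svd(matrix):
    """Return explained-variance fractions, gene scores and array loadings."""
    # Center each gene across arrays (an assumption of this sketch).
    centered = matrix - matrix.mean(axis=1, keepdims=True)
    # Thin SVD: centered = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)   # share of total variance per component
    gene_scores = u * s               # coordinates of genes on each component
    return explained, gene_scores, vt # rows of vt are array loadings

explained, gene_scores, array_loadings = pca_by_svd(expr)
print("explained variance fractions:", np.round(explained, 3))
print("array loadings on component 1:", np.round(array_loadings[0], 3))
```

In this toy example the first component separates the two groups of arrays, so plotting gene scores and array loadings on the leading components would give the kind of joint view of genes and hybridizations that the abstract mentions.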


Conclusions
Combining data visualization and permutation-based gene selection, permutation-validated PCA enables one to illustrate gene-expression variance between several conditions and to select genes by taking into account the relationship of between-group to within-group variance of genes. The method can be used to extract the leading sources of variance from microarray data, to visualize relationships between genes and hybridizations and to select informative genes in a statistically reliable manner. This selection accounts for the level of reproducibility of replicates or group structure as well as gene-specific scatter. Visualization of the data can support a straightforward biological interpretation.
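
The idea of weighing between-group against within-group variance under permutation can be sketched for a single gene as follows. This is a simplified stand-in, not the PCA-derived statistic used in the study: the ratio of between-group to within-group sums of squares, the function names, the toy values and the number of permutations are all assumptions chosen for illustration.

```python
import random
from statistics import mean

def variance_ratio(values, labels):
    """Between-group over within-group sum of squares for one gene."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = mean(values)
    between = sum(len(vs) * (mean(vs) - grand_mean) ** 2 for vs in groups.values())
    within = sum((v - mean(vs)) ** 2 for vs in groups.values() for v in vs)
    # Small constant guards against division by zero for perfectly tight groups.
    return between / (within + 1e-12)

def permutation_p_value(values, labels, n_perm=1000, seed=0):
    """Share of label shufflings whose ratio is at least the observed one."""
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    observed = variance_ratio(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break the link between values and groups
        if variance_ratio(values, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 4 replicate arrays in each of two groups.
labels = ["A"] * 4 + ["B"] * 4
informative_gene = [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.8, 3.2]
flat_gene = [2.0, 2.1, 1.9, 2.0, 2.1, 1.9, 2.0, 2.0]

p_informative = permutation_p_value(informative_gene, labels)
p_flat = permutation_p_value(flat_gene, labels)
print("informative gene p-value:", round(p_informative, 3))
print("flat gene p-value:", round(p_flat, 3))
```

A gene whose replicates agree within groups but differ between groups gets a small permutation p-value, while a gene with only gene-specific scatter does not. With many genes tested at once, some correction for multiple testing would also be needed, which this sketch leaves out.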

Access & Use Information

Public: This dataset is intended for public access and use. License: No license information was provided. If this work was prepared by an officer or employee of the United States government as part of that person's official duties it is considered a U.S. Government Work.

Dates

Metadata Created Date	July 24, 2025
Metadata Updated Date	September 7, 2025

Metadata Source

Harvested from Healthdata.gov

Additional Metadata

Resource Type	Dataset
Publisher	National Institutes of Health
Maintainer	NIH
Identifier	https://healthdata.gov/api/views/tc4r-7t92
Data First Published	2025-07-14
Data Last Modified	2025-09-06
Category	NIH
Public Access Level	public
Bureau Code	009:25
Metadata Context	https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
Metadata Catalog ID	https://healthdata.gov/data.json
Schema Version	https://project-open-data.cio.gov/v1.1/schema
Catalog Describedby	https://project-open-data.cio.gov/v1.1/schema/catalog.json
Harvest Object Id	461cfaae-eff8-4571-984e-c1c6ac4e9547
Harvest Source Id	651e43b2-321c-4e4c-b86a-835cfc342cb0
Harvest Source Title	Healthdata.gov
Homepage URL	https://healthdata.gov/d/tc4r-7t92
Program Code	009:033
Source Datajson Identifier	True
Source Hash	ca1771989c38c2e4d37c2069801c541d9061f26f52af7bb877a4e880cd61be14
Source Schema Version	1.1
