Microarrays are a powerful tool in bioinformatics because they provide quantitative measurements of transcription-level gene expression for thousands of genes simultaneously. Although DNA sequencing has become automated, understanding the function of genes remains a huge challenge. For example, given that most cells in our body carry the same set of genetic material, how do cells differentiate? How are complex biological pathways genetically regulated? What are the genetic causes of disease?
Microarrays are a new and evolving technology. Experiments are costly, and large numbers of 40 MB image files are generated routinely, yet there are no standardized methods for image processing and information extraction. Consequently, compression is a necessary tool, and it raises an interesting question: what is the effect of compression loss on the extracted features?
We propose SLOCO, a practical lossless and lossy-with-refinement compression scheme. We show empirically that the error in extracted gene expression levels caused by our compression loss is smaller than the variability in repeated experiments, which in turn is smaller than the variability caused by different image processing methods. In fact, compression has a denoising effect on the extracted features. We further study rate-distortion performance using multiple nonlinear distortion measures.
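The comparison between compression-induced error and replicate variability can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration: uniform quantization stands in for lossy compression and is not the SLOCO scheme, the feature is simply the log2 of the mean spot intensity, and the noise model, parameter values, and function names are hypothetical.

```python
import math
import random


def log_expression(pixels):
    """Toy extracted feature: log2 of the mean spot intensity (assumed)."""
    return math.log2(sum(pixels) / len(pixels))


def quantize(pixels, step):
    """Uniform quantization to multiples of `step`.

    A stand-in for lossy compression, NOT the SLOCO scheme.
    """
    return [step * round(p / step) for p in pixels]


def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def make_replicate(rng, true_level, n_pixels):
    """One synthetic replicate spot: spot-level and pixel-level noise (assumed model)."""
    spot_factor = rng.lognormvariate(0.0, 0.1)
    return [true_level * spot_factor * rng.lognormvariate(0.0, 0.2)
            for _ in range(n_pixels)]


def simulate(n_spots=200, n_pixels=50, step=16, seed=0):
    """Return (RMS compression error, RMS replicate difference) in log2 units."""
    rng = random.Random(seed)
    compression_err = []
    replicate_diff = []
    for _ in range(n_spots):
        true_level = rng.uniform(500.0, 20000.0)
        rep_a = make_replicate(rng, true_level, n_pixels)
        rep_b = make_replicate(rng, true_level, n_pixels)
        feat_a = log_expression(rep_a)
        feat_b = log_expression(rep_b)
        feat_a_lossy = log_expression(quantize(rep_a, step))
        compression_err.append(feat_a_lossy - feat_a)  # error due to loss
        replicate_diff.append(feat_b - feat_a)         # replicate variability
    return rms(compression_err), rms(replicate_diff)


if __name__ == "__main__":
    comp, rep = simulate()
    print(f"RMS compression error:    {comp:.5f}")
    print(f"RMS replicate difference: {rep:.5f}")
```

Under these assumed settings the quantization error in the extracted feature is much smaller than the difference between replicates. This only mirrors the shape of the empirical comparison described above and says nothing about the actual results.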
Figure 1: This display shows 0.5/100 of an inkjet microarray image in false color: red indicates high intensities and blue indicates low intensities. Each spot corresponds to one gene.