FlowJo Data Corruption

Thanks to the Amgen staff for their swift co-operation and feedback on the data corruption fix. They integrated the patch within a week and bumped the version from 1.5.0 to 1.5.2 (r45365; presumably there was an internal, unreleased 1.5.1).

The problem (and other similar issues with "in-house" XML processing), as detailed below, does not affect most Windows users, but does affect Linux users of R. More precisely, it affects anyone processing FlowJo workspace files on a system whose native encoding is not iso8859-1/latin1/codepage 1252. So essentially only English MS Windows users are unaffected: Linux and Solaris default to UTF-8, and R users on CJK Windows are affected as well.
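
To make the mismatch concrete, here is a small Python sketch (Python purely for illustration; the byte-level behaviour does not depend on the language, and the label text is made up). The same latin1 bytes decode cleanly as latin1 or codepage 1252 but are rejected as UTF-8, while UTF-8 bytes read as latin1 silently turn into mojibake.

```python
from typing import Optional

# Hypothetical label containing one non-ASCII character (MICRO SIGN, U+00B5).
LABEL = "CD3 \u00b5-chain"

# The same text as ISO-8859-1 (latin1) bytes: the micro sign is the single byte 0xB5.
RAW = LABEL.encode("latin-1")


def decode_as(raw: bytes, encoding: str) -> Optional[str]:
    """Decode raw bytes strictly; return None if they are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # Reading latin1 bytes as UTF-8 fails: 0xB5 cannot start a UTF-8 sequence.
    print("as UTF-8:  ", decode_as(RAW, "utf-8"))
    # Reading them as latin1 or codepage 1252 recovers the original text.
    print("as latin1: ", decode_as(RAW, "latin-1"))
    print("as cp1252: ", decode_as(RAW, "cp1252"))
    # The reverse mismatch: UTF-8 bytes read as latin1 give mojibake, not an error.
    print("UTF-8 read as latin1:", LABEL.encode("utf-8").decode("latin-1"))
```

The second direction is the nastier one: nothing fails, the text is just wrong.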

The acknowledgements on slide 16 of the flowFlowJo FICCS presentation list these people:
Gary Means Florian Hahne
Katie Newhall Nolwenn Le Meur
Nishant Gopalakrishnan
CompBio Robert Gentleman
Hugh Rand
Research Information Systems Adam Triester
Lauren Buchholz
Sharon Wong-Madden Becton Dickinson
Jack Dunne
BioStats Errol Strain
Cheng Su Perry Haaland
University of Cambridge
Vincent Plagnol

Most of the Amgen/Becton Dickinson people are probably Windows users; the FHCRC people probably only reviewed the work and did not actually use it. Vincent Plagnol is primarily a Linux user, and has also published on flow cytometry in the last year... I don't know about the others.

Isn't it fun to find out that a scientific "discovery" may amount to carelessness and data corruption?

Duncan Temple Lang (author of RSXML) has posted the result of our discussion online (link here).

--- On Mon, 15/3/10, Hin-Tak Leung <htl10@...> wrote:

> From: Hin-Tak Leung <htl10@...>
> Subject: silent data corruption in flowFlowJo, and fix
> To: paboyoun@..., gosinkj@...
> Cc: bioconductor@...
> Date: Monday, 15 March, 2010, 2:20
> Hi,
> Commit r41352 from j.gosink broke flowFlowJo in Bioc's
> nightly check for most of summer/autumn 2009. Just before
> the BioC 2.5 code freeze, p.aboyoun committed r42419, which
> uses iconv() to strip multibyte data so that the nightly
> check passes. Unfortunately it "fixes" some FlowJo
> workspace files but breaks others. I have finally found the
> time to look at it; it is actually fairly serious and
> causes silent data corruption. Here is the fix; please
> review and commit.
> The underlying issue is this: FlowJo workspace files are,
> in most (all?) cases, XML with iso8859-1 encoding (a.k.a.
> 'latin1'). With win32 R, which defaults to codepage 1252 (a
> superset of latin1), R check passes: everything is in
> latin1 and the data stripping has no effect. On Linux and
> other "modern" unix systems, which default to UTF-8, R
> check fails: not all iso8859-1 text is valid UTF-8 text and
> vice versa, and the multibyte data stripping also causes
> data corruption.
> The proper fix is to query libxml2 for the XML encoding
> and set the encoding explicitly; it is a substantial
> rewrite. As a side effect, the code may also run faster:
> most of the gsub() calls don't need to be 'g'. The
> regular expressions are only concerned with manipulating the
> header and only need to match the first instance.
> Cheers,
> Hin-Tak Leung
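
The failure mode and the fix described in the email can also be sketched in Python. This is not the flowFlowJo code: the XML fragment is made up (not the real FlowJo schema), `errors="ignore"` merely stands in for the iconv() stripping workaround, and Python's bundled expat parser stands in for libxml2. Stripping silently drops the non-ASCII character; reading the encoding from the XML declaration (a single first-match search, in the spirit of sub() rather than gsub()), or letting the parser honour it, keeps the data intact.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical workspace fragment (NOT the real FlowJo schema): it declares
# ISO-8859-1 and stores the micro sign as the single latin1 byte 0xB5.
WORKSPACE = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    "<Workspace><Label>CD3 \u00b5-chain</Label></Workspace>\n"
).encode("latin-1")

# The encoding lives only in the XML declaration at the top of the file, so a
# single first-match search is enough (the analogue of R's sub() vs gsub()).
XML_DECL_ENCODING = re.compile(rb"""<\?xml[^>]*?encoding=["']([A-Za-z0-9._-]+)["']""")


def declared_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or `default` if absent."""
    match = XML_DECL_ENCODING.search(raw)
    return match.group(1).decode("ascii") if match else default


def strip_then_decode(raw: bytes) -> str:
    """Stand-in for the stripping workaround: assume UTF-8, drop undecodable bytes."""
    return raw.decode("utf-8", errors="ignore")


def decode_declared(raw: bytes) -> str:
    """Decode using the encoding the file itself declares."""
    return raw.decode(declared_encoding(raw))


def parse_with_parser(raw: bytes) -> ET.Element:
    """Hand the raw bytes to the XML parser, which honours the declared encoding."""
    return ET.fromstring(raw)


if __name__ == "__main__":
    print("declared encoding:", declared_encoding(WORKSPACE))
    print("strip workaround: ", strip_then_decode(WORKSPACE).splitlines()[1])
    print("declared decode:  ", decode_declared(WORKSPACE).splitlines()[1])
    print("parser label:     ", parse_with_parser(WORKSPACE).findtext("Label"))
```

Note that the stripping variant raises no error at all, which is exactly what makes this kind of corruption silent.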

Hin-Tak Leung, last updated 2010-03-25

Get Outmoded Bonsai at SourceForge.net. Fast, secure and Free Open Source software downloads