Today, there is a growing need for organizations to continuously analyze and process large waves of incoming data from the Internet. Such data processing is often governed by complex dataflow systems, deployed atop highly scalable infrastructures that must manage data efficiently in order to enhance performance and reduce costs.

Current workflow management systems enforce strict temporal synchronization among the various processing steps; however, this is not the most desirable behavior in many scenarios. For example, for dataflows that continuously analyze data upon the insertion or update of entries in a data store, it would be wise to assess the level of modification in the data before triggering the dataflow, so as to minimize the number of executions (processing steps), reducing overhead and improving performance while keeping the dataflow results within given coverage and freshness limits.

To this end, we introduce the notion of Quality-of-Data (QoD), which describes the level of modification necessary on a data store to trigger processing steps, thereby conveying the level of performance specified through data requirements. This notion can be especially beneficial in cloud computing, where a dataflow computing service (SaaS) may provide different QoD levels for different budgets.

In this article we propose Fluχ, a novel dataflow model, with framework and programming library support, for orchestrating data-based processing steps over a NoSQL data store, where triggering is based on the evaluation and dynamic enforcement of QoD constraints that are defined (and possibly adjusted automatically) for different sets of data. With Fluχ we demonstrate how dataflows can be made to respond to quality boundaries that bring controlled and improved performance, rationalization of resources, and task prioritization.
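To make the triggering semantics concrete, the sketch below gives one plausible reading of a QoD-bound trigger: a processing step fires only once the modifications accumulated on its input data (tracked here as pending update count, elapsed time since the last execution, and fraction of items changed) cross user-defined thresholds. All names and thresholds (QoDConstraint, ModificationTracker, the example bounds) are hypothetical illustrations, not the actual Fluχ programming library.

```python
import time

class QoDConstraint:
    """Hypothetical QoD bound: the step triggers when ANY threshold is crossed.
    Illustrative only; not the Fluχ API."""
    def __init__(self, max_updates=100, max_seconds=60.0, max_changed_fraction=0.10):
        self.max_updates = max_updates                    # pending writes tolerated before firing
        self.max_seconds = max_seconds                    # freshness bound on results
        self.max_changed_fraction = max_changed_fraction  # coverage bound: share of items modified

class ModificationTracker:
    """Accumulates modifications on a data container between step executions."""
    def __init__(self, total_items):
        self.total_items = total_items
        self.pending_updates = 0
        self.changed_keys = set()
        self.last_trigger = time.monotonic()

    def record_update(self, key):
        self.pending_updates += 1
        self.changed_keys.add(key)

    def should_trigger(self, qod: QoDConstraint) -> bool:
        elapsed = time.monotonic() - self.last_trigger
        changed_fraction = len(self.changed_keys) / max(self.total_items, 1)
        return (self.pending_updates >= qod.max_updates
                or elapsed >= qod.max_seconds
                or changed_fraction >= qod.max_changed_fraction)

    def reset(self):
        self.pending_updates = 0
        self.changed_keys.clear()
        self.last_trigger = time.monotonic()

def run_processing_step():
    # Placeholder for the actual dataflow step (e.g., re-running an analysis job).
    print("processing step executed")

# Usage: buffer incoming writes and launch the step only when the QoD bound is violated.
qod = QoDConstraint(max_updates=500, max_seconds=30.0, max_changed_fraction=0.05)
tracker = ModificationTracker(total_items=10_000)

def on_store_write(key):
    tracker.record_update(key)
    if tracker.should_trigger(qod):
        run_processing_step()
        tracker.reset()
```

Under this reading, relaxing the thresholds trades result freshness and coverage for fewer executions, which is the performance/cost lever the QoD notion exposes (and which a SaaS provider could price per QoD level).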