Gene selection for tumor classification using neighborhood rough sets and entropy measures.
J Biomed Inform. 2017 Feb 12.
Authors: Chen Y, Zhang Z, Zheng J, Ma Y, Xue Y
With the development of bioinformatics, tumor classification from gene expression data has become an important technology for cancer diagnosis. Since gene expression data often contain thousands of genes and only a small number of samples, gene selection becomes a key step for tumor classification. Attribute reduction based on rough sets has been successfully applied to gene selection, as it is data-driven and requires no additional information. However, the traditional rough set method deals with discrete data only. Gene expression data containing real-valued or noisy measurements are therefore usually discretized in a preprocessing step, which may result in poor classification accuracy. In this paper, we propose a novel gene selection method based on the neighborhood rough set model, which can deal with real-valued data while maintaining the original gene classification information. Moreover, this paper introduces an entropy measure within the framework of neighborhood rough sets for tackling the uncertainty and noise of gene expression data. Using this measure can lead to the discovery of compact gene subsets. Finally, a gene selection algorithm is designed based on neighborhood granules and the entropy measure. Experiments on two gene expression data sets show that the proposed gene selection method is effective for improving the accuracy of tumor classification.
PMID: 28215562 [PubMed - as supplied by publisher]
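The abstract above (PMID 28215562) does not spell out its entropy measure, so the following is only a minimal sketch of one commonly used formulation of neighborhood conditional entropy, not necessarily the authors' definition. The function name, the Euclidean distance, the radius `delta`, and the toy expression values are all assumptions made for illustration. A gene subset scores lower when each sample's neighborhood granule contains mostly samples of its own class.

```python
import numpy as np

def neighborhood_conditional_entropy(X, y, genes, delta):
    """Assumed formulation: -(1/n) * sum_i log(|N(x_i) with same class| / |N(x_i)|),
    where N(x_i) is the set of samples within distance delta of x_i on the chosen genes."""
    sub = X[:, genes]
    # Pairwise Euclidean distances restricted to the selected genes.
    diff = sub[:, None, :] - sub[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    neighbors = dist <= delta                 # each sample is its own neighbor
    same_class = y[:, None] == y[None, :]
    granule_size = neighbors.sum(axis=1)
    consistent_size = (neighbors & same_class).sum(axis=1)
    return float(-np.mean(np.log(consistent_size / granule_size)))

# Hypothetical toy data: 4 samples, 2 genes, values already scaled to [0, 1].
X = np.array([[0.10, 0.50],
              [0.20, 0.10],
              [0.90, 0.45],
              [0.80, 0.12]])
y = np.array([0, 0, 1, 1])

h_gene0 = neighborhood_conditional_entropy(X, y, [0], delta=0.3)  # gene 0 separates the classes
h_gene1 = neighborhood_conditional_entropy(X, y, [1], delta=0.3)  # gene 1 mixes the classes
print(h_gene0, h_gene1)
```

Under these assumptions, a greedy selection loop would repeatedly add the gene that lowers this score the most; working on neighborhoods avoids discretizing the real-valued expression levels.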
Analyzing Structural Changes in SNOMED CT's Bacterial Infectious Diseases Using a Visual Semantic Delta.
J Biomed Inform. 2017 Feb 12.
Authors: Ochs C, Case JT, Perl Y
Thousands of changes are applied to SNOMED CT's concepts during each release cycle. These changes are the result of efforts to improve or expand the coverage of health domains in the terminology. Understanding which concepts changed, how they changed, and the overall impact of a set of changes is important for editors and end users. Each SNOMED CT release comes with delta files, which identify all of the individual additions and removals of concepts and relationships. These files typically contain tens of thousands of individual entries, overwhelming users. They also do not identify the editorial processes that were applied to individual concepts and they do not capture the overall impact of a set of changes on a subhierarchy of concepts. In this paper we introduce a methodology and accompanying software tool called a SNOMED CT Visual Semantic Delta ("semantic delta" for short) to enable a comprehensive review of changes in SNOMED CT. The semantic delta displays a graphical list of editing operations that provides semantics and context to the additions and removals in the delta files. However, there may still be thousands of editing operations applied to a set of concepts. To address this issue, a semantic delta includes a visual summary of changes that affected sets of structurally and semantically similar concepts. The software tool for creating semantic deltas offers views of various granularities, allowing a user to control how much change information they view. In this tool a user can select a set of structurally and semantically similar concepts and review the editing operations that affected their modeling. The semantic delta methodology is demonstrated on SNOMED CT's Bacterial infectious disease subhierarchy, which has undergone a significant remodeling effort over the last two years.
PMID: 28215561 [PubMed - as supplied by publisher]
Leveraging Health Informatics to Foster a Smart Systems Response to Health Disparities and Health Equity Challenges.
J Biomed Inform. 2017 Feb 15.
Authors: Jay Carney T, Kong AY
Informaticians are challenged to design health IT solutions for complex problems like health disparities but are only achieving mixed results in demonstrating a direct impact on health outcomes. This presentation of collective intelligence and the corresponding terms of smart health, knowledge ecosystem, enhanced health disparities informatics capacities, knowledge exchange, big data, and situational awareness is a means of demonstrating the complex challenges informatics professionals face in trying to model, measure, and manage an intelligence and a smart systems response to health disparities. A critical piece in our understanding of collective intelligence for public and population health rests in our understanding of any public and population health as a living and evolving network of individuals, organizations, and resources. This discussion represents a step in advancing the conversation of what a smart response to health disparities should represent and how informatics can drive the design of intelligent systems to assist in eliminating health disparities and achieving health equity.
PMID: 28214562 [PubMed - as supplied by publisher]
How Are You Feeling?: A Personalized Methodology for Predicting Mental States from Temporally Observable Physical and Behavioral Information.
J Biomed Inform. 2017 Feb 14.
Authors: Tuarob S, Tucker CS, Kumara S, Lee Giles C, Pincus AL, Conroy DE, Ram N
It is believed that anomalous mental states such as stress and anxiety not only cause suffering for the individuals, but also lead to tragedies in some extreme cases. The ability to predict the mental state of an individual at both current and future time periods could prove critical to healthcare practitioners. Currently, the practical way to predict an individual's mental state is through mental examinations that involve psychological experts performing the evaluations. However, such methods can be time- and resource-consuming, limiting their applicability to a wide population. Furthermore, some individuals may be unaware of their mental states or may feel uncomfortable expressing themselves during the evaluations. Hence, their anomalous mental states could remain undetected for a prolonged period of time. The objective of this work is to demonstrate the ability of advanced machine learning based approaches to generate mathematical models that predict current and future mental states of an individual. The problem of mental state prediction is transformed into a time series forecasting problem, where an individual is represented as a multivariate time series stream of monitored physical and behavioral attributes. A personalized mathematical model is then automatically generated to capture the dependencies among these attributes, which is used for prediction of mental states for each individual. In particular, we first illustrate the drawbacks of traditional multivariate time series forecasting methodologies such as vector autoregression. Then, we show that such issues could be mitigated by using machine learning regression techniques which are modified for capturing temporal dependencies in time series data.
A case study using the data from 150 human participants illustrates that the proposed machine learning based forecasting methods are more suitable for high-dimensional psychological data than the traditional vector autoregressive model in terms of both magnitude of error and directional accuracy. These results not only present a successful usage of machine learning techniques in psychological studies, but also serve as a building block for multiple medical applications that could rely on an automated system to gauge individuals' mental states.
PMID: 28213145 [PubMed - as supplied by publisher]
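For the abstract above (PMID 28213145), one generic way to let a regression model capture temporal dependencies is to feed it lagged copies of every monitored attribute. The sketch below illustrates only that idea and is not the authors' method. The helper `make_lagged`, the attribute names `mood` and `activity`, and the synthetic data are hypothetical. Least squares is used purely to keep the example dependency-free; with it the model reduces to a single equation of a vector autoregression, and any other regressor could be fitted on the same lagged matrix.

```python
import numpy as np

def make_lagged(series, target_col, n_lags):
    """Turn a (T, d) multivariate series into a supervised data set:
    each row holds the previous n_lags observations of all d attributes,
    and the target is the next value of the chosen attribute."""
    T = series.shape[0]
    X = np.array([series[t - n_lags:t].ravel() for t in range(n_lags, T)])
    y = series[n_lags:, target_col]
    return X, y

# Hypothetical, noise-free synthetic individual: mood depends on yesterday's
# mood and yesterday's activity.
rng = np.random.default_rng(0)
T = 200
activity = rng.normal(size=T)
mood = np.zeros(T)
for t in range(1, T):
    mood[t] = 0.6 * mood[t - 1] + 0.3 * activity[t - 1]

series = np.column_stack([mood, activity])
X, y = make_lagged(series, target_col=0, n_lags=1)

# Stand-in regressor: ordinary least squares on the lagged features.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
next_mood = series[-1] @ coef   # one-step-ahead forecast from the last observation
print(coef, next_mood)
```

Fitting one such model per participant corresponds to the personalized setting the abstract describes.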
Modeling and Validating HL7 FHIR Profiles Using Semantic Web Shape Expressions (ShEx).
J Biomed Inform. 2017 Feb 14.
Authors: Solbrig HR, Prud'hommeaux E, Grieve G, McKenzie L, Mandel JC, Sharma DK, Jiang G
BACKGROUND: HL7 Fast Healthcare Interoperability Resources (FHIR) is an emerging open standard for the exchange of electronic healthcare information. FHIR resources are defined in a specialized modeling language. FHIR instances can currently be represented in either XML or JSON. The FHIR and Semantic Web communities are developing a third FHIR instance representation format in Resource Description Framework (RDF). Shape Expressions (ShEx), a formal RDF data constraint language, is a candidate for describing and validating the FHIR RDF representation.
OBJECTIVE: Create a FHIR to ShEx model transformation and assess its ability to describe and validate FHIR RDF data.
METHODS: We created the methods and tools that generate the ShEx schemas modeling the FHIR to RDF specification being developed by HL7 ITS/W3C RDF Task Force, and evaluated the applicability of ShEx in the description and validation of FHIR to RDF transformations.
RESULTS: The ShEx models contributed significantly to workgroup consensus. Algorithmic transformations from the FHIR model to ShEx schemas and FHIR example data to RDF transformations were incorporated into the FHIR build process. ShEx schemas representing 109 FHIR resources were used to validate 511 FHIR RDF data examples from the Standards for Trial Use (STU 3) Ballot version. We were able to uncover unresolved issues in the FHIR to RDF specification and detect 10 types of errors and root causes in the actual implementation. The FHIR ShEx representations have been included in the official FHIR web pages for the STU 3 Ballot version since September 2016.
DISCUSSION: ShEx can be used to define and validate the syntax of a FHIR resource, which is complementary to the use of RDF Schema (RDFS) and Web Ontology Language (OWL) for semantic validation.
CONCLUSION: ShEx proved useful for describing a standard model of FHIR RDF data. The combination of a formal model and a succinct format enabled comprehensive review and automated validation.
PMID: 28213144 [PubMed - as supplied by publisher]
Application of Human Augmentics: A Persuasive Asthma Inhaler.
J Biomed Inform. 2017 Feb 10.
Authors: Grossman B, Conner S, Mosnaim G, Albers J, Leigh J, Jones S, Kenyon R
This article describes a tailored health intervention delivered on a mobile phone platform, integrating low-literacy design strategies and basic principles of behavior change, to promote increased adherence and asthma control among underserved minority adolescents. We based the intervention and design principles on theories of Human Augmentics and the Elaboration Likelihood Model. We tested the efficacy of using electronic monitoring devices that incorporate informative and persuasive elements to improve adherence to a prescribed daily medication regimen intended to reduce use of asthma rescue medications. We describe the theoretical framework, hardware and software systems, and results of user testing for design purposes and a clinical pilot study incorporating use of the device and software by the targeted population. The results of the clinical pilot study showed an 83% completion rate for the treatment as well as improved adherence. Of note, 8% and 58% of participants achieved clinically significant adherence targets at baseline and last week of the study, respectively. Rescue asthma medication use decreased from a median of 3 puffs per week at baseline to 0 puffs per week during the last week of the study.
PMID: 28193465 [PubMed - as supplied by publisher]
iGAS: A Framework for Using Electronic Intraoperative Medical Records for Genomic Discovery.
J Biomed Inform. 2017 Feb 10.
Authors: Levin MA, Joseph TT, Jeff JM, Nadukuru R, Ellis SB, Bottinger EP, Kenny EE
OBJECTIVE: Design and implement a HIPAA and Integrating the Healthcare Enterprise (IHE) profile compliant automated pipeline, the integrated Genomics Anesthesia System (iGAS), linking genomic data from the Mount Sinai Health System (MSHS) BioMe biobank to electronic anesthesia records, including physiological data collected during the perioperative period. The resulting repository of multi-dimensional data can be used for precision medicine analysis of physiological readouts, acute medical conditions, and adverse events that can occur during surgery.
MATERIALS AND METHODS: A structured pipeline was developed atop our existing anesthesia data warehouse using open-source tools. The pipeline is automated using scheduled tasks. The pipeline runs weekly, and finds and identifies all new and existing anesthetic records for BioMe participants.
RESULTS: The pipeline went live in June 2015 with 49.2% (n=15,673) of BioMe participants linked to 40,947 anesthetics. The pipeline runs weekly in minimal time. After eighteen months, an additional 3,671 participants were enrolled in BioMe and the number of matched anesthetic records grew 21% to 49,545. Overall percentage of BioMe patients with anesthetics remained similar at 51.1% (n=18,128). Seven patients opted out during this time. The median number of anesthetics per participant was 2 (range 1-144). Collectively, there were over 35 million physiologic data points and 480,000 medication administrations linked to genomic data. To date, two projects are using the pipeline at MSHS.
CONCLUSION: Automated integration of biobank and anesthetic data sources is feasible and practical. This integration enables large-scale genomic analyses that might inform variable physiological response to anesthetic and surgical stress, and examine genetic factors underlying adverse outcomes during and after surgery.
PMID: 28193464 [PubMed - as supplied by publisher]
Use of Ontology Structure and Bayesian Models to Aid the Crowdsourcing of ICD-11 Sanctioning Rules.
J Biomed Inform. 2017 Feb 10.
Authors: Lou Y, Tu SW, Nyulas C, Tudorache T, Chalmers RJ, Musen MA
The International Classification of Diseases (ICD) is the de facto standard international classification for mortality reporting and for many epidemiological, clinical, and financial use cases. The next version of ICD, ICD-11, will be submitted for approval by the World Health Assembly in 2018. Unlike previous versions of ICD, where coders mostly select single codes from pre-enumerated disease and disorder codes, ICD-11 coding will allow extensive use of multiple codes to give more detailed disease descriptions. For example, "severe malignant neoplasms of left breast" may be coded using the combination of a "stem code" (e.g., code for malignant neoplasms of breast) with a variety of "extension codes" (e.g., codes for laterality and severity). The use of multiple codes (a process called post-coordination), while avoiding the pitfall of having to pre-enumerate a vast number of possible disease and qualifier combinations, risks the creation of meaningless expressions that combine stem codes with inappropriate qualifiers. To prevent that from happening, "sanctioning rules" that define legal combinations are necessary. In this work, we developed a crowdsourcing method for obtaining sanctioning rules for the post-coordination of concepts in ICD-11. Our method utilized the hierarchical structures in the domain to improve the accuracy of the sanctioning rules and to lower the crowdsourcing cost. We used Bayesian networks to model crowd workers' skills, the accuracy of their responses, and our confidence in the acquired sanctioning rules. We applied reinforcement learning to develop an agent that constantly adjusted the confidence cutoffs during the crowdsourcing process to maximize the overall quality of sanctioning rules under a fixed budget. Finally, we performed formative evaluations using a skin-disease branch of the draft ICD-11 and demonstrated that the crowdsourced sanctioning rules replicated those defined by an expert dermatologist with high precision and recall.
This work demonstrated that a crowdsourcing approach could offer a reasonably efficient method for generating a first draft of sanctioning rules that subject matter experts could verify and edit, thus relieving them of the tedium and cost of formulating the initial set of rules.
PMID: 28192233 [PubMed - as supplied by publisher]
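The abstract above (PMID 28192233) models worker skill and rule confidence with Bayesian networks whose structure is not given here. As a much simpler illustration of the underlying idea, the sketch below combines yes/no answers from workers of known accuracy into a posterior confidence that a stem-code and extension-code combination is valid. It assumes independent workers and a fixed prior; the function name and all numbers are hypothetical.

```python
def posterior_valid(votes, accuracies, prior=0.5):
    """Posterior probability that a combination is valid, assuming each worker
    answers independently and is correct with the given accuracy (naive Bayes)."""
    like_valid = prior
    like_invalid = 1.0 - prior
    for vote, acc in zip(votes, accuracies):
        # A "yes" vote is likely if the rule is valid and the worker is accurate.
        like_valid *= acc if vote else 1.0 - acc
        like_invalid *= (1.0 - acc) if vote else acc
    return like_valid / (like_valid + like_invalid)

# Hypothetical answers for one candidate sanctioning rule.
single = posterior_valid([True], [0.8])
split = posterior_valid([True, False], [0.8, 0.8])
agree = posterior_valid([True, True, True], [0.8, 0.7, 0.9])
print(single, split, agree)
```

A confidence cutoff applied to this posterior decides whether to accept the rule, reject it, or pay for another answer; the abstract describes an agent that adjusts such cutoffs under a fixed budget.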
Distinguishing Surgical Behavior by Sequential Pattern Discovery.
J Biomed Inform. 2017 Feb 04.
Authors: Huaulmé A, Voros S, Riffaud L, Forestier G, Moreau-Gaudry A, Jannin P
OBJECTIVE: Each surgical procedure is unique because of the particularities of both the patient and the surgeon. In this study, we propose a new approach that uses a pattern discovery method to distinguish surgical behaviors between surgical sites, levels of expertise, and individual surgeons.
METHODS: The developed approach aims to distinguish surgical behaviors based on shared longest frequent sequential patterns between surgical process models. To allow clustering, we propose a new metric called SLFSP. The approach is validated by comparison with a clustering method using Dynamic Time Warping as a metric to characterize the similarity between surgical process models.
RESULTS: Our method outperformed the existing approach. It was able to make a perfect distinction between surgical sites (accuracy of 100%). We reached accuracies above 90% and 85% for distinguishing levels of expertise and individual surgeons, respectively.
CONCLUSION: Clustering based on shared longest frequent sequential patterns outperformed the previous study based on time analysis.
SIGNIFICANCE: The proposed method shows the feasibility of comparing surgical process models, not only by their duration but also by their structure of activities. Furthermore, patterns may reveal risky behaviors, which could be useful information for surgical training to prevent adverse events.
PMID: 28179119 [PubMed - as supplied by publisher]
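The comparison baseline named in the abstract above (PMID 28179119) is Dynamic Time Warping. The following is a minimal textbook sketch of DTW between two activity sequences, using a 0/1 mismatch cost as an assumption; it does not implement the SLFSP metric, which the abstract does not define. The activity labels are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW with cost 0 for equal activities
    and 1 for different ones (an assumed local cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0.0 if a[i - 1] == b[j - 1] else 1.0
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                                     cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Hypothetical surgical process models as sequences of activity labels.
surgeon_a = ["incise", "dissect", "suture"]
surgeon_b = ["incise", "incise", "dissect", "suture"]   # same steps, slower pace
surgeon_c = ["incise", "coagulate", "suture"]           # one different step

d_ab = dtw_distance(surgeon_a, surgeon_b)
d_ac = dtw_distance(surgeon_a, surgeon_c)
print(d_ab, d_ac)
```

Because DTW absorbs differences in pace, two procedures with the same order of activities score as identical here, while a changed activity raises the distance.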
Development of a two-stage gene selection method that incorporates a novel hybrid approach using the cuckoo optimization algorithm and harmony search for cancer classification.
J Biomed Inform. 2017 Feb 02.
Authors: Elyasigomari V, Lee DA, Screen HR, Shaheed MH
For each cancer type, only a few genes are informative. Due to the so-called 'curse of dimensionality' problem, the gene selection task remains a challenge. To overcome this problem, we propose a two-stage gene selection method called MRMR-COA-HS. In the first stage, minimum redundancy and maximum relevance (MRMR) feature selection is used to select a subset of relevant genes. The selected genes are then fed into a wrapper setup that combines a new algorithm, COA-HS, with a support vector machine classifier. The method was applied to four microarray datasets, and the performance was assessed by leave-one-out cross-validation. Comparative performance assessment of the proposed method with other evolutionary algorithms suggested that the proposed algorithm significantly outperforms other methods in selecting fewer genes while maintaining the highest classification accuracy. The functions of the selected genes were further investigated, and it was confirmed that the selected genes are biologically relevant to each cancer type.
PMID: 28163197 [PubMed - as supplied by publisher]
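The evaluation protocol named in the abstract above (PMID 28163197) is leave-one-out cross-validation. The sketch below shows that protocol only; it does not implement MRMR or COA-HS. A nearest-centroid classifier stands in for the support vector machine to keep the example dependency-free, and the function names and toy expression matrix are hypothetical.

```python
import numpy as np

def nearest_centroid(X_train, y_train, x):
    """Stand-in classifier (an assumption): predict the class whose mean
    expression profile is closest to the held-out sample."""
    labels = np.unique(y_train)
    centroids = [X_train[y_train == c].mean(axis=0) for c in labels]
    dists = [np.linalg.norm(x - c) for c in centroids]
    return labels[int(np.argmin(dists))]

def loocv_accuracy(X, y, fit_predict):
    """Hold out each sample once, train on the rest, and report the
    fraction of held-out samples classified correctly."""
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        correct += int(fit_predict(X[train], y[train], X[i]) == y[i])
    return correct / n

# Hypothetical data: 6 samples, 2 already-selected genes, 2 tumor classes.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])

accuracy = loocv_accuracy(X, y, nearest_centroid)
print(accuracy)
```

In a wrapper setup, a score like this would be computed for each candidate gene subset. With the small sample sizes typical of microarray data, leave-one-out uses nearly all samples for training in every fold.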
Extraction of Left Ventricular Ejection Fraction Information from Various Types of Clinical Reports.
J Biomed Inform. 2017 Feb 02.
Authors: Kim Y, Garvin JH, Goldstein MK, Hwang TS, Redd A, Bolton D, Heidenreich PA, Meystre SM
Efforts to improve the treatment of congestive heart failure, a common and serious medical condition, include the use of quality measures to assess guideline-concordant care. The goal of this study is to identify left ventricular ejection fraction (LVEF) information from various types of clinical notes, and to then use this information for heart failure quality measurement. We analyzed the annotation differences between a new corpus of clinical notes from the Echocardiography, Radiology, and Text Integrated Utility package and other corpora annotated for natural language processing (NLP) research in the Department of Veterans Affairs. These reports contain varying degrees of structure. To examine whether existing LVEF extraction modules we developed in prior research improve the accuracy of LVEF information extraction from the new corpus, we created two sequence-tagging NLP modules trained with a new data set, with or without predictions from the existing LVEF extraction modules. We also conducted a set of experiments to examine the impact of training data size on information extraction accuracy. We found that less training data is needed when reports are highly structured, and that combining predictions from existing LVEF extraction modules improves information extraction when reports have less structured formats and a rich set of vocabulary.
PMID: 28163196 [PubMed - as supplied by publisher]
Driven to Distraction: The Nature and Apparent Purpose of Interruptions in Critical Care and Implications for HIT.
J Biomed Inform. 2017 Jan 31.
Authors: Mamykina L, Carter EJ, Sheehan B, Stanley Hum R, Twohig BC, Kaufman DR
OBJECTIVES: To examine the apparent purpose of interruptions in a Pediatric Intensive Care Unit and opportunities to reduce their burden with informatics solutions.
MATERIALS AND METHODS: In this prospective observational study, researchers shadowed clinicians in the unit for one hour at a time, recording all interruptions participating clinicians experienced or initiated, their starting time, duration, and a short description that could help to infer their apparent purpose. All captured interruptions were classified inductively on their source and apparent purpose and on the optimal representational media for fulfilling their apparent purpose.
RESULTS: The researchers observed thirty-four one-hour sessions with clinicians in the unit, including 21 nurses and 13 residents and house physicians. The physicians were interrupted on average 11.9 times per hour and interrupted others 8.8 times per hour. Nurses were interrupted 8.6 times per hour and interrupted others 5.1 times per hour. The apparent purpose of interruptions included Information Seeking and Sharing (n=259, 46.3%), Directives and Requests (n=70, 12%), Shared Decision-Making (n=49, 8.8%), Direct Patient Care (n=36, 6.4%), Social (n=71, 12.7%), Device Alarms (n=28, 5%), and Non-Clinical (n=10, 1.8%); 6.6% were not classified due to insufficient description. Of all captured interruptions, 29.5% were classified as being better served with informational displays or computer-mediated communication.
CONCLUSIONS: Deeper understanding of the purpose of interruptions in critical care can help to distinguish between interruptions that require face-to-face conversation and those that can be eliminated with informatics solutions. The proposed taxonomy of interruptions and representational analysis can be used to further advance the science of interruptions in clinical care.
PMID: 28159645 [PubMed - as supplied by publisher]