Deep-Confidentiality: An IoT-Enabled Privacy-Preserving Framework for Unstructured Big Biomedical Data
With the evolution of the Internet of Things and the adoption of smart technologies, clinical data is growing exponentially. The resulting big biomedical data is confidential, as it contains patients' personal information and findings. Big biomedical data is usually stored in the cloud, which makes it convenient to access and share. Data shared for research purposes can reveal useful and previously unexposed insights. Unfortunately, sharing such sensitive data also leads to privacy threats. Clinical data is generally available in textual format (e.g., perception reports). Within natural language processing, many studies have been published to mitigate privacy breaches in textual clinical data, yet limitations and shortcomings remain that must be addressed. In this article, a novel framework for textual medical data privacy, Deep-Confidentiality, is proposed. The framework improves Medical Entity Recognition (MER) using deep neural networks and sanitization compared with current state-of-the-art techniques. Moreover, a new and generic utility metric is proposed that overcomes the shortcomings of the existing utility metric and provides a truer representation of sanitized documents relative to the originals. To assess its effectiveness, the framework is evaluated on the i2b2-2010 NLP challenge dataset, considered one of the more complex medical datasets for MER. The framework improves MER by 7.8% in recall, 7% in precision, and 3.8% in F1-score compared to existing deep learning models, and improves the data utility of sanitized documents by up to 13.79% when k is 3.
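The abstract does not spell out the proposed utility metric, but the underlying idea of scoring how much informative content a sanitized document retains relative to the original can be illustrated with a small, hypothetical sketch; the tokenizer, the placeholder tags, and the metric itself are assumptions for illustration, not the paper's definitions.

```python
# A minimal, hypothetical utility metric: the fraction of the original document's
# tokens that survive sanitization. This is an illustrative sketch only, not the
# metric proposed in Deep-Confidentiality.
from collections import Counter

def utility(original: str, sanitized: str) -> float:
    """Fraction of the original document's tokens preserved after sanitization."""
    orig_tokens = Counter(original.lower().split())
    sani_tokens = Counter(sanitized.lower().split())
    preserved = sum(min(orig_tokens[t], sani_tokens[t]) for t in orig_tokens)
    total = sum(orig_tokens.values())
    return preserved / total if total else 0.0

original = "Patient John Smith admitted with acute myocardial infarction on 2010-03-02."
sanitized = "Patient [NAME] admitted with acute myocardial infarction on [DATE]."
print(f"utility = {utility(original, sanitized):.2f}")
```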
- Book Chapter
10
- 10.1007/978-3-319-33525-4_16
- Jan 1, 2016
Two scientific domains that are crucial in “Biomedical Big Data”, computing and statistics, do not typically require “training in the responsible conduct of research” or research ethics. While “responsible conduct of research” (RCR) comprises interactions with subjects (human and non-human), it also involves interactions with other scientists, the scientific community, the public, and in some contexts, research funders. Historically, the development or emergence of disciplines and professions tends to involve a semi-simultaneous emergence of professional norms and/or codes of conduct. However, Biomedical Big Data is not emerging as a single discipline or profession, and engages practitioners from many diverse backgrounds. Moreover, the place of the data analyst or the computer scientist developing analytic algorithms seems to be too granular to be considered specifically within the activities that comprise “responsible research and innovation” (RRI). Current legal and policy-level considerations of Biomedical Big Data and RRI implicitly assume that the scientists carrying out the research and achieving the innovations are exercising their scientific freedom – i.e., conducting research – responsibly. The assumption is that all scientists are trained to conduct research responsibly. In the United States, federal agencies funding research require that training in RCR be included – some of the time. Because the vast majority of federally funded research has not involved Biomedical Big Data, the RCR training paradigms that have emerged in US institutions over the past 20 years are not particularly relevant for Big Data. While it would be efficient to utilize such established, well-known, easily documented RCR training programs, this chapter discusses how and why this is less likely to support the development of professional norms that are relevant for Biomedical Big Data. This chapter describes an alternative approach that can support ongoing reflection on professional obligations, which can be applied to a wide range of ethical, legal, and social implications (ELSI), including those that have not yet been identified. This may be the greatest strength of this alternative approach for preparing practitioners for Biomedical Big Data, because the ability to apply prior learning in ethics to previously unseen problems is especially critical in the current era of dynamic and massive data accumulation. To support the development of normative ethical practices among practitioners in Biomedical Big Data, this chapter reviews the guidelines for professional practice from three statistical associations (American Statistical Association; Royal Statistical Society; International Statistical Institute) and from the Association for Computing Machinery. These can be leveraged to ensure that, in their work with Biomedical Big Data, participants know and understand the ethical, legal, and social implications of that work. Formal integration of these (or other relevant) guidelines into the preparation for practice with data (big and small) can help in dealing with ethical challenges currently arising with Big Data in biomedical research; moreover, this integration can also help deal with challenges that have not yet arisen. These outcomes, which are consistent with recent calls in Europe for the institutionalization of reflection and reasoning around ELSI across scientific disciplines, are only possible as long as the integration effort does not follow a currently dominant paradigm for training in RCR.
Preparing scientists to engage competently in conversations around ethical issues in Biomedical Big Data requires purposeful, discipline-relevant, and developmental training that can come from, and support, a culture of ethical biomedical research and practice with Big Data.
- Research Article
7
- 10.20517/jsegc.2021.07
- Jan 1, 2022
- Journal of Smart Environments and Green Computing
The term big data has been in use since the 1990s; it usually refers to complex data sets whose size is beyond the ability of commonly used software to handle within a reasonable period of time. In recent years, big data analytics has helped to improve health care by enabling personalized medicine and regulatory analysis, providing clinical risk intervention and forecast analysis, reducing waste and variability in patient care, standardizing medical terminology and patient registration, and reducing fragmentation of solutions. This paper provides an overview of big data in healthcare. We summarize several kinds of medical big data, including electronic health records, medical image data, healthcare system big data, the health Internet of Things and healthcare informatics, remote medical monitoring big data, biomedical big data, and other sources. Furthermore, we discuss methods for handling different kinds of medical big data. Additionally, we analyze the privacy of medical big data and summarize methods and technologies to protect it. For some special cases, we list additional analyses and methods. Most importantly, we discuss the potential challenges and future research directions related to big data in healthcare.
- Research Article
15
- 10.1016/j.ijmedinf.2019.05.022
- Jun 5, 2019
- International Journal of Medical Informatics
Measuring the effect of different types of unsupervised word representations on Medical Named Entity Recognition
- Conference Article
3
- 10.1145/3359789.3359829
- Dec 9, 2019
The literature has extensively studied various location privacy-preserving mechanisms (LPPMs) in order to improve the location privacy of the users of location-based services (LBSes). Such privacy, however, comes at the cost of degrading the utility of the underlying LBSes. The main body of previous work has used a generic distance-only based metric to quantify the quality loss incurred while employing LPPMs. In this paper, we argue that using such generic utility metrics misleads the design and evaluation of LPPMs, since generic utility metrics do not capture the actual utility perceived by the users. We demonstrate this for ride-hailing services, a popular class of LBS with complex utility behavior. Specifically, we design a privacy-preserving ride-hailing service, called PRide, and demonstrate the significant distinction between its generic and tailored metrics. Through various experiments we show the significant implications of using generic utility metrics in the design and evaluation of LPPMs. Our work concludes that LPPM design and evaluation should use utility metrics that are tailored to the individual LBSes.
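As a rough illustration of the “generic, distance-only” utility-loss metric the authors critique, the sketch below measures quality loss purely as the geographic distance between a user's true location and the location an LPPM releases; the coordinates and the haversine formulation are illustrative assumptions, and a tailored ride-hailing metric (for example, the change in estimated pickup time or fare) would measure something different.

```python
# A sketch of a generic, distance-only utility-loss metric for an LPPM:
# quality loss = great-circle distance between true and obfuscated locations.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

true_loc = (40.7128, -74.0060)        # actual position (illustrative)
obfuscated_loc = (40.7306, -73.9866)  # location released by the LPPM (illustrative)
print(f"generic utility loss = {haversine_km(*true_loc, *obfuscated_loc):.2f} km")
```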
- Research Article
3
- 10.52214/vib.v7i.8403
- Jun 2, 2021
- Voices in Bioethics
 Introduction
 With the rapid advancements in neurotechnological machinery and improved analytical insights from machine learning in neuroscience, the availability of big brain data has increased tremendously. Neurological health research is done using digitized brain data.[1] There must be adequate data governance to secure the privacy of subjects participating in brain research and treatments. If not properly regulated, the research methods could lead to significant breaches of the subject’s autonomy and privacy. This paper will address the necessity for neuroprotection laws, which effectively govern the use of big brain data to ensure respect for patient privacy and autonomy.
 Background
 Artificial intelligence and machine learning can be integrated with neuroscience big brain data to drive research studies. This integrative technology allows patterns of electrical activity in neurons to be studied in detail.[2] Specifically, it uses a robotic system that can reason, plan, and exhibit biologically intelligent behavior. Machine learning is a method of computer programming in which the code adapts its behavior based on big brain data.[3] Big brain data is the collection of large amounts of information for the purpose of deciphering patterns through computer analysis using machine learning.[4] The information that these technologies provide is extensive enough to allow a researcher to read a patient’s mind. AI and machine learning technologies work by finding the underlying structure of brain data, which is then described by patterns known as latent factors, eventually resulting in an understanding of the brain’s temporal dynamics.[5]
 Through these technologies, researchers are able to decipher how the human brain computes its performances and thoughts. However, due to the extensive and complex nature of the data processed through AI and machine learning, researchers may gain access to personal information a patient may not wish to reveal. From a bioethical lens, tensions arise in the realm of patient autonomy. Patients are not able to control the transmission of data from their brains that is analyzed by researchers. Governing brain data through laws may enhance the extent of patient privacy in the case where brain data is being used through AI technologies.[6] A responsible approach to governing brain data would require a sophisticated legal structure.
 Analysis
 Impact on Patient Autonomy and Privacy 
 In research pertaining to big brain data, consent forms do not fully cover the vast amounts of information that are collected. According to research, personal data has become the most sought-after commodity for providing content to corporations and the web-based service industry. Unfortunately, data leaks that release private information frequently occur.[7] Storing an individual’s data during research studies on technologies accessible over the internet makes it vulnerable to leaks, jeopardizing the individual’s privacy. These data leaks may allow the patient to be identified easily, as the information provided by AI technologies is personalized and may be decoded through brain fingerprinting methods.[8]
 There has been an extensive growth in the development and use of AI. It is efficient in providing information to radiologists who diagnose various diseases including brain cancer and psychiatric disease, and AI assists in the delivery of telemedicine.[9] However, the ethical pitfall of reduced patient autonomy must be addressed by analyzing current AI technologies and creating more options for patient preference in how the data may be used. For instance, facial recognition technology[10] commonly used in health care produces more information than listed in common consent forms, threatening to undermine informed consent. Facial recognition software collects extensive data and may disclose more information than a person would prefer to provide despite being a useful tool for diagnosing medical and genetic conditions.[11] In addition, people may not be aware that their images are being used to generate more clinical data for other purposes. It is difficult to guarantee the data is anonymized. Consent requirements must include informing people about the complexity of the potential uses of the data; software developers should maximize patient privacy.[12] Furthermore, there is a “human element” in the use of AI technologies as medical providers control the use and the extent to which data is captured or accessed through the AI technologies.[13] People must understand the scope of the technology and have clear communication with the physician or health care provider about how the medical information will be used. 
 Existing Laws for Brain Data Governance 
 A strict system of defined legal responsibilities of medical providers will ensure a higher degree of patient privacy and autonomy when AI technologies and data from machine learning are used. Governing specific algorithmic data is crucial in safeguarding a patient’s privacy and developing a gold standard treatment protocol following the procurement of the information.[14] Certain AI technologies provide more data than others, and legal boundaries should be established to ensure strong performance, quality control, and scope for patient privacy and autonomy. For instance, currently AI technologies are being used in the realm of intensive neurological care. However, there is a significant level of patient uncertainty about how much control patients have over the data’s uses.[15] Calibrated legal and ethical standards will allow important brain data to be securely governed and monitored.
 Once brain signals are recorded and processed from one individual, the data may be merged with other data in Brain-Computer Interface (BCI) technology.[16] To ensure a right and ability to retrieve personal data or pull it from the collection, specific regulations for varying types of data are needed.[17] The importance of consent and patient privacy must be considered by giving patients a transparent view of how brain data is governed.[18] The legal system must address discriminatory issues and risks to patients whose data is used in studies. Laws like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) can serve as effective models to protect aggregated data. These laws govern consumer information and ensure compliance when personal data is collected.[19] California voters recently approved expansion of the CCPA to health data. The Washington Privacy Act, which would have provided rights to access, change, and withdraw personal data, failed to pass. Other states should improve privacy as well,[20] although a federal bill would be preferable. Scientists at the Heidelberg Academy of Sciences argue for data security to be governed in a manner that balances patient privacy and autonomy with the commercial interests of researchers.[21] The balance could be achieved through privacy protections like those in the Washington Privacy Act. Although the Health Insurance Portability and Accountability Act (HIPAA) provides an overall framework to deter the likelihood of dangers to patient protection and privacy, more thorough laws are warranted to combat the pervasive data transfer and analysis that technology has brought to the health care industry.[22] Breaches of patient privacy under current HIPAA regulations include releasing patient information to a reporter without their consent and sending HIV data to a patient’s employer without consent.[23] HIPAA does not cover information being shared with outside contractors who do not have an agreement with technology companies to keep patient data confidential. HIPAA regulations also do not always address blatant breaches of patient data confidentiality.[24] Patients must be provided with methods to monitor the data being analyzed so they can view the extent of private information being generated via AI technologies. In health research, the medical purposes of better diagnosis, earlier detection of diseases, or prevention are ethical justifications for the use of the data if it was collected with permission, the person understood and approved the uses of the data, and the data was deidentified.
 A standard governance framework is required to provide the fairest system of care to patients who allow their brain data to be examined. Informed consent in the neuroscience field could reaffirm the privacy and autonomy of patients by ensuring that they understand the type of information collected. Laws also could protect data after a patient’s death. Malpractice in the scope of brain data could give people a cause of action that is critical in safeguarding patients’ rights. Data breach lawsuits will become common but generally do not cover deidentified data that becomes part of big data collection. A more synchronized approach to the collection and consent process will encourage an understanding of how big data is used to diagnose and treat patients. Some altruistic people may even be more likely to consent if they know the large-scale data collection is helpful to treat and diagnose people. Others should have the ability to opt out of sharing neurological data, especially when there is no certainty surrounding deidentification.[25]
 Conclusion
 Artificial intelligence and machine learning technologies have the potential to aid in the diagnosis and treatment of people globally by extracting and aggregating brain data specific to individuals. However, the secure use of the data is necessary to build trust between care providers and patients, as well as in balancing the bioethical principles of beneficence and patient autonomy. We must ensure the highest quality of care to patients, while protecting their privacy, informed consent, and clinical trust. More sophis
- Research Article
50
- 10.1186/s12911-016-0357-5
- Aug 30, 2016
- BMC Medical Informatics and Decision Making
Background: Healthcare providers around the world generate a huge amount of biomedical data, stored in either legacy (paper-based) format or electronic medical records (EMR), collectively referred to as big biomedical data (BBD). To realize the promise of BBD for clinical use and research, an essential step is to extract key data elements from unstructured medical records into patient-centered electronic health records with computable data elements. Our objective is to introduce a novel solution, known as a double-reading/entry system (DRESS), for extracting clinical data from unstructured medical records (MRs) and creating a semi-structured electronic health record database, and to demonstrate its reproducibility empirically. Methods: Utilizing modern cloud-based technologies, we have developed a comprehensive system that includes multiple subsystems, from capturing MRs in clinics, to securely transferring MRs, storing and managing cloud-based MRs, facilitating both machine learning and manual reading, and performing iterative quality control before committing the semi-structured data into the desired database. To evaluate the reproducibility of the data elements extracted by DRESS, we conducted a blinded reproducibility study with 100 MRs from patients who had undergone surgical treatment of lung cancer in China. The study uses the Kappa statistic to measure concordance of discrete variables, and the correlation coefficient to measure reproducibility of continuous variables. Results: Using DRESS, we have demonstrated the feasibility of extracting clinical data from unstructured MRs to create a semi-structured, patient-centered electronic health record database. The reproducibility study with 100 patients’ MRs showed an overall high reproducibility of 98%, varying across six modules (pathology, radio/chemotherapy, clinical examination, surgery information, medical imaging, and general patient information). Conclusions: DRESS uses double reading, double entry, and independent adjudication to manually curate structured data elements from unstructured clinical data. Further, through distributed computing strategies, DRESS protects data privacy by dividing MR data into de-identified modules. Finally, through an internet-based computing cloud, DRESS enables many data specialists to work in a virtual environment to achieve the necessary scale of processing thousands of MRs within days. This hybrid system probably represents a workable solution to the big medical data challenge. Electronic supplementary material: The online version of this article (doi:10.1186/s12911-016-0357-5) contains supplementary material, which is available to authorized users.
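The two reproducibility measures named in this abstract, the Kappa statistic for discrete data elements and the correlation coefficient for continuous ones, can be sketched as follows; the readers, variables, and values are invented for illustration, and the sketch assumes scikit-learn and SciPy are available.

```python
# Illustrative reproducibility check for a double-reading/double-entry workflow:
# Cohen's kappa for a discrete element, Pearson correlation for a continuous one.
from sklearn.metrics import cohen_kappa_score
from scipy.stats import pearsonr

# Discrete element (e.g., tumour stage) coded independently by reader A and reader B
reader_a = ["I", "II", "II", "III", "I", "IV", "II", "III"]
reader_b = ["I", "II", "III", "III", "I", "IV", "II", "III"]
print("kappa:", round(cohen_kappa_score(reader_a, reader_b), 3))

# Continuous element (e.g., tumour diameter in mm) entered by the two readers
size_a = [12.0, 25.5, 30.1, 8.4, 15.0]
size_b = [12.5, 25.0, 30.0, 8.4, 15.2]
r, _ = pearsonr(size_a, size_b)
print("correlation:", round(r, 3))
```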
- Research Article
3
- 10.3390/app12178539
- Aug 26, 2022
- Applied Sciences
Chinese medical texts contain a large number of medical named entities. Automatic recognition of these entities from medical texts is key to developing medical informatics. In the field of Chinese medical information extraction, annotated Chinese medical text data are scarce, and this shortage of labeled data leads to low recognition performance in the named entity recognition task. Therefore, this paper proposes a Chinese medical entity recognition model based on multi-neural-network fusion and an improved Tri-Training algorithm. The model performs semi-supervised learning through the improved Tri-Training algorithm. According to the characteristics of the medical entity recognition task and medical data, the method is improved in terms of the division of the initial sub-training set, the construction of the base classifiers, and the voting method used to integrate the learners. In addition, this paper proposes a multi-neural-network fusion entity recognition model for base classifier construction; the model learns feature information jointly by combining an Iterated Dilated Convolutional Neural Network (IDCNN) with a BiLSTM. Experimental verification shows that the proposed model outperforms other models and that incorporating and improving the semi-supervised learning algorithm improves the performance of Chinese medical entity recognition.
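A hedged PyTorch sketch of the fusion idea described above: a BiLSTM branch and an IDCNN-style dilated-convolution branch run over the same token embeddings, and their features are concatenated for per-token labeling. The layer sizes, dilation schedule, and the absence of a CRF layer or Tri-Training loop are simplifications, not the paper's configuration.

```python
# Sketch of a BiLSTM + dilated-CNN fusion tagger for token-level entity labels.
import torch
import torch.nn as nn

class FusionTagger(nn.Module):
    def __init__(self, vocab_size, num_labels, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden // 2, bidirectional=True, batch_first=True)
        # Three dilated convolutions with growing dilation, in the spirit of an IDCNN
        self.convs = nn.ModuleList([
            nn.Conv1d(emb_dim if i == 0 else hidden, hidden, kernel_size=3,
                      dilation=d, padding=d) for i, d in enumerate([1, 2, 4])
        ])
        self.out = nn.Linear(hidden * 2, num_labels)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.emb(token_ids)                        # (batch, seq_len, emb_dim)
        lstm_feats, _ = self.bilstm(x)                 # (batch, seq_len, hidden)
        c = x.transpose(1, 2)                          # (batch, emb_dim, seq_len)
        for conv in self.convs:
            c = torch.relu(conv(c))
        cnn_feats = c.transpose(1, 2)                  # (batch, seq_len, hidden)
        fused = torch.cat([lstm_feats, cnn_feats], dim=-1)
        return self.out(fused)                         # per-token label scores

scores = FusionTagger(vocab_size=5000, num_labels=9)(torch.randint(0, 5000, (2, 20)))
print(scores.shape)  # torch.Size([2, 20, 9])
```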
- Research Article
23
- 10.15252/embr.201642917
- Jul 11, 2016
- EMBO reports
The trading of personal medical and health data will drastically increase given the boon of mobile devices used to monitor health, and the interest large internet companies have in this resource. Legal frameworks need to be adapted and upgraded to protect patient privacy and consent, and to ensure the quality of individual health data for research.
- Research Article
5
- 10.1088/1742-6596/1624/3/032041
- Oct 1, 2020
- Journal of Physics: Conference Series
All medical big data reuse projects face the challenging problem of collecting and transforming heterogeneous data from different sources in a distributed research network. Both openEHR and the OMOP CDM are open-source approaches for medical data. In this paper, the principles, feasibility, implementation, and characteristics of these two main methods for clinical data secondary use are compared and discussed. We analyzed two data conversion frameworks from medical data secondary-use projects conducted in China and the United States, summarized the experience of designing the data ETL process, and compared, against the literature, the principles, implementation, and characteristics of an openEHR-based data acquisition system and a medical data reuse approach based on the Common Data Model. OpenEHR, originating in the Scandinavian countries, is a promising two-level modeling approach for extracting data from various medical databases. It separates the work of medical experts and software engineers, and changes in medical knowledge can be embedded in new prototypes without affecting the EHR system. However, some shortcomings overshadow its advantages, such as poor compatibility with medical data other than EHRs, difficulty in defining prototypes, a steep learning curve, and the lack of mature development tools and guidelines. We adopted a minimalist data transformation model in the Xiangya medical big data acquisition system, based on openEHR, to solve the large-scale data exchange problem faced by the distributed clinical data center. Many experimental projects have proved the feasibility and utility of the OMOP CDM for multiple, disparate health databases. This is why it is widely used as the model framework for patient-level prediction and safety surveillance: it provides a transformation from source data into a standard vocabulary, which addresses semantic interoperability; technology neutrality that does not rely on any particular computer technology; an open community with open resources and free tools; and the generation of aggregated analysis results directly from de-identified data. Some issues should be considered in its use: not all source data encodings can be converted to the standard vocabulary, so there will be some loss of semantics, and concept matching requires considerable time and effort. Moreover, the model and vocabulary were originally developed for pharmaceutical safety research and clinical observation data, and the development of vocabularies in other fields is limited. In conclusion, both openEHR and CDMs are designed for exporting and reusing data from distributed clinical databases. The former is suitable for collecting data from distributed EHR systems and building medical big data warehouses, while the latter is a better model for sharing data across decentralized medical databases.
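As a toy illustration of the kind of ETL step such projects perform, the sketch below maps one invented source diagnosis row onto a few fields of the OMOP CDM condition_occurrence table; the source layout and the concept lookup value are assumptions, and real mappings rely on the standardized OMOP vocabularies and full table definitions.

```python
# Hypothetical mini-ETL step: map a source EHR diagnosis row onto a subset of
# OMOP CDM condition_occurrence fields. The concept id is an illustrative value only.
from datetime import date

source_row = {"patient_id": "P001", "dx_code": "I21.9", "dx_date": "2020-05-14"}

# Hypothetical lookup from a source ICD-10 code to a standard OMOP concept id
concept_lookup = {"I21.9": 312327}  # illustrative mapping, not an authoritative value

condition_occurrence = {
    "person_id": source_row["patient_id"],
    "condition_concept_id": concept_lookup.get(source_row["dx_code"], 0),
    "condition_start_date": date.fromisoformat(source_row["dx_date"]),
    "condition_source_value": source_row["dx_code"],
}
print(condition_occurrence)
```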
- Book Chapter
4
- 10.1007/978-981-19-2199-5_20
- Jan 1, 2022
Currently, the value in biomedical big data has been ascending. Notable sources of biomedical big data include medical claims data, electronic medical records, public health data, medical research data, medical market and costs data, individual behavior and mentality data, human genetics and omics data, social demography data, environmental and geographic information data, and health network and media data. Biomedical big data can be applied to conduct omics studies and association studies among different omics, quickly search for biomarkers and develop new drugs, rapidly screen unknown pathogens and find suspicious agents, carry out the real-time implementation of biological monitoring and public health surveillance, prevent and control of emerging infectious diseases, understand the changing spectrum of diseases in the population, evaluate the efficacy and safety of drugs and vaccines in real world, conduct precise health management, and allow more powerful work on data mining. Meanwhile, biomedical big data does face challenges, including standardization and normalization of data collection, merging of current data islands, computer management of massive data, efficient use of data for research and practice, and lack of talents with the comprehensive expertise in both biomedicine and information technology. Biomedical big data will change the pattern of medical practice and improve the quality of healthcare services.
- Research Article
28
- 10.1111/odi.13007
- Dec 28, 2018
- Oral Diseases
Biomedical big data amasses from different sources such as electronic health records, health research, wearable devices and social media. Recent advances in data capturing, storage and analysis techniques have facilitated conversion of a wealth of knowledge in biomedical big data into evidence-based actionable plans to enhance population health and well-being. The delay in reaping the benefits of biomedical big data in dentistry is mainly due to the slow adoption of electronic health record systems, unstructured clinical records, tattered communication between data silos and perceiving oral health as a separate entity from general health. Recent recognition of the complex interplay between oral and general health has acknowledged the power of oral health big data to glean new insights on disease prevention and management. This review paper summarizes recent advances, limitations and challenges in biomedical big data in health care with emphasis on oral health and discusses the potential future applications of oral health big data to improve the quality and efficiency of personalized health care.
- Conference Article
4
- 10.1109/icaicta.2019.8904397
- Sep 1, 2019
In the domain of NLP (Natural Language Processing), there are studies on automated ICD coding from text data. Due to the sparsity of the training data, it is difficult to process codes that represent rare diseases; therefore, in many previous studies, the number of targeted codes was often limited by their frequency in the training data. In this work, we propose a method for generating ICD-10 codes using a sequence-to-sequence (seq2seq) model. A seq2seq model can automatically learn the regularity, more specifically the hierarchical structure, found in the character sequences of ICD codes, so it can make use of the interactions and relations among the codes during training, which a conventional classifier fails to exploit. As a result, the proposed method can predict even codes that did not appear in the training data. In our experiment, we randomly divided all pairs of ICD-10 codes and their natural-language descriptions into two equal halves. Using the first half, we trained a seq2seq model that converts the token sequence of an ICD-10 description into the character sequence representing its ICD-10 code. We then tested the model on the second half, which was not included in the training data, and found that it assigned the correct code to 80% of the descriptions, while a conventional classifier trained on the same data did not assign any correct code. In addition, we experimented with ICD code assignment on medical test data, where our proposed model showed a higher accuracy score than the conventional classifier. Moreover, we implemented a seq2seq model with attention and, by investigating the attended tokens during prediction, analyzed the performance of our seq2seq model.
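A minimal sketch of the seq2seq formulation described above: encode the word tokens of a description and decode the ICD-10 code character by character. The GRU architecture, dimensions, and teacher-forcing interface are illustrative assumptions rather than the authors' exact model (which also includes an attention variant).

```python
# Sketch of a description-to-code seq2seq model: word-level encoder, character-level decoder.
import torch
import torch.nn as nn

class Seq2SeqICD(nn.Module):
    def __init__(self, word_vocab, char_vocab, dim=128):
        super().__init__()
        self.enc_emb = nn.Embedding(word_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.dec_emb = nn.Embedding(char_vocab, dim)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, char_vocab)

    def forward(self, desc_tokens, code_chars):
        # desc_tokens: (batch, desc_len) word ids of the ICD-10 description
        # code_chars:  (batch, code_len) character ids of the target code (teacher forcing)
        _, h = self.encoder(self.enc_emb(desc_tokens))   # summary of the description
        dec_out, _ = self.decoder(self.dec_emb(code_chars), h)
        return self.out(dec_out)                         # scores over output characters

logits = Seq2SeqICD(word_vocab=8000, char_vocab=40)(
    torch.randint(0, 8000, (4, 12)), torch.randint(0, 40, (4, 5)))
print(logits.shape)  # torch.Size([4, 5, 40])
```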
- Research Article
44
- 10.1371/journal.pone.0182156
- Jul 31, 2017
- PloS one
Open-ended questions have routinely been included in large-scale survey and panel studies, yet there is some perplexity about how to actually incorporate the answers to such questions into quantitative social science research. Tools developed recently in the domain of natural language processing offer a wide range of options for the automated analysis of such textual data, but their implementation has lagged behind. In this study, we demonstrate straightforward procedures that can be applied to process and analyze textual data for the purposes of quantitative social science research. Using more than 35,000 textual answers to the question “What else are you worried about?” from participants of the German Socio-economic Panel Study (SOEP), we (1) analyzed characteristics of respondents that determined whether they answered the open-ended question, (2) used the textual data to detect relevant topics that were reported by the respondents, and (3) linked the features of the respondents to the worries they reported in their textual data. The potential uses as well as the limitations of the automated analysis of textual data are discussed.
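One common way to implement the topic-detection step described above is a bag-of-words model with latent Dirichlet allocation; the sketch below uses invented answers and scikit-learn defaults, and is not the SOEP pipeline itself.

```python
# Illustrative topic detection over open-ended survey answers with LDA.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

answers = ["worried about my pension and savings",
           "climate change and the environment worry me",
           "health of my parents",
           "rising rents and housing costs",
           "air pollution and global warming"]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(answers)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```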
- Supplementary Content
29
- 10.1007/s40258-019-00474-7
- Jan 1, 2019
- Applied Health Economics and Health Policy
There is potential value in incorporating biomedical big data (BBD)—observational real-world patient-level genomic and clinical data in multiple sub-populations—into economic evaluations of precision medicine. However, health economists face practical and methodological challenges when using BBD in this context. We conducted a literature review to identify and summarise these challenges. Relevant articles were identified in MEDLINE, EMBASE, EconLit, University of York Centre for Reviews and Dissemination and Cochrane Library from 2000 to 2018. Articles were included if they studied issues relevant to the interconnectedness of biomedical big data, precision medicine, and health economic evaluation. Nineteen articles were included in the review. Challenges identified related to data management, data quality and data analysis. The availability of large volumes of data from multiple sources, the need to conduct data linkages within an environment of opaque data access and sharing procedures, and other data management challenges are primarily practical and may not be long-term obstacles if procedures for data sharing and access are improved. However, the existence of missing data across linked datasets, the need to accommodate dynamic data, and other data quality and analysis challenges may require an evolution in economic evaluation methods. Health economists face challenges when using BBD in economic evaluations of technologies that facilitate precision medicine. Potential solutions to some of these challenges do, however, exist. Going forward, health economists who present work that uses BBD should document challenges and the solutions they have applied to the challenges to support future researcher endeavours.Electronic supplementary materialThe online version of this article (10.1007/s40258-019-00474-7) contains supplementary material, which is available to authorized users.
- Research Article
25
- 10.1155/2017/4898963
- Jan 1, 2017
- Journal of Healthcare Engineering
Medical entity recognition, a basic task in the language processing of clinical data, has been extensively studied for analyzing admission notes in alphabetic languages such as English. However, much less work has been done on unstructured texts written in Chinese, or on differentiating Chinese drug names between traditional Chinese medicine and Western medicine. Here, we propose a novel cascade-type Chinese medication entity recognition approach that integrates a support vector machine sentence-category classifier with conditional random field-based medication entity recognition. We hypothesized that this approach could avoid the side effects of abundant negative samples and improve the performance of named entity recognition on admission notes written in Chinese. We applied the approach to a test set of 324 Chinese admission notes manually annotated by medical experts. Our data demonstrated that the approach achieved 94.2% precision, 92.8% recall, and 93.5% F-measure for the recognition of traditional Chinese medicine drug names, and 91.2% precision, 92.6% recall, and 91.7% F-measure for the recognition of Western medicine drug names. The differences in F-measure were significant compared with the baseline systems.
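A hedged sketch of the cascade described above: an SVM sentence-category classifier first filters sentences, and a CRF then tags medication entities in the retained sentences. The feature functions, toy labels, and the sklearn-crfsuite dependency are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a two-stage cascade: SVM sentence filter, then CRF token tagging.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
import sklearn_crfsuite

# --- Stage 1: sentence-level filter (SVM over TF-IDF features) ---
sentences = ["patient takes aspirin daily", "no known drug allergies",
             "family history unremarkable", "started metformin 500 mg"]
has_medication = [1, 0, 0, 1]                     # toy sentence-level labels
vec = TfidfVectorizer()
svm = LinearSVC().fit(vec.fit_transform(sentences), has_medication)
print(svm.predict(vec.transform(["continue aspirin as needed"])))

# --- Stage 2: token-level CRF on the sentences the filter keeps ---
def token_features(tokens, i):
    return {"word": tokens[i].lower(), "is_digit": tokens[i].isdigit(),
            "prev": tokens[i - 1].lower() if i else "<s>"}

kept = ["patient takes aspirin daily", "started metformin 500 mg"]   # positives from stage 1
X = [[token_features(s.split(), i) for i in range(len(s.split()))] for s in kept]
y = [["O", "O", "B-DRUG", "O"], ["O", "B-DRUG", "B-DOSE", "I-DOSE"]]  # toy BIO labels
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```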