LEVERAGING UNSTRUCTURED DATA IN THE FIGHT AGAINST CANCER
Data is at the heart of fighting cancer as it’s central to finding effective ways to prevent, diagnose, and treat the disease. The challenge, however, is that much of the data is unstructured. Examples include clinical notes, surgical and lab reports, clinical trial data, and discharge notes.
A prominent US-based healthcare provider recognized the importance of unstructured data for all cancer-related activities – from research through treatment. However, given the volume and complexity of the work, they knew they needed assistance from a company that had Natural Language Processing expertise, experience with the applied tooling, and a deep understanding of medical terminology. For help, they turned to Klarrio US.
A second project where Klarrio was asked to apply its NLP expertise was the analysis of drug labels.
Every manufacturer submits a label for each drug it registers with the United States Food and Drug Administration. The label is structured, but that structure mostly defines the label's sections; a section itself may contain a lot of unstructured free text. Extracting elements of interest from that text, such as the list of clinical studies, the arms of each study, the adverse reactions reported in each arm, and the outcomes those reactions resulted in, enables much deeper research across the entire drug dataset. For example, reactions and outcomes can be correlated across drugs and connected to a common ingredient.
Solution
To perform the work, Klarrio devised a process for redeveloping the annotators, which included the following:
Results
Our work with annotators:
- Improved accuracy
- A cleaner system
Our work with the analysis of drug labels:
- Precision and recall for the extracted entities: approximately 95% and 75% (early results).
- The work on improving accuracy continues.
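For readers less familiar with these two metrics, the following is a minimal sketch of how precision and recall are typically computed for extracted entities. The function name, the (type, text) representation, and the sample entities are assumptions made for illustration; they are not Klarrio's evaluation code or data.

```python
from typing import Set, Tuple

# An entity is represented here as (entity_type, text); this is an assumption.
Entity = Tuple[str, str]


def precision_recall(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float]:
    """Return (precision, recall) of predicted entities against a gold standard."""
    true_positives = len(predicted & gold)
    # Precision: share of extracted entities that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard entities that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented toy data: four annotated entities, three of them extracted.
gold = {
    ("ARM", "placebo"),
    ("ARM", "treatment"),
    ("ADVERSE_REACTION", "nausea"),
    ("ADVERSE_REACTION", "headache"),
}
predicted = {
    ("ARM", "placebo"),
    ("ARM", "treatment"),
    ("ADVERSE_REACTION", "nausea"),
}

precision, recall = precision_recall(predicted, gold)
```

High precision with lower recall, as in the early results above, means that what is extracted is mostly correct while some entities are still being missed.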
Behind the scenes
To enable the client to find and extract the information needed for their research in large volumes of unstructured data, Klarrio’s initial task was to assess an existing set of annotators to improve their accuracy. Klarrio had the added objective of making the maintenance of the annotators more efficient.
This approach was applied to assess and improve additional existing annotators as well as to develop new ones.
For the analysis of drug labels, Klarrio used spaCy, an open-source NLP library. The solution combines spaCy's machine-learning Named Entity Recognizer, which detects the entities, with deep sentence parsing and rule-based matching, which detect the relations between the entities. The extracted entities and relationships are saved back into a relational database for further analysis by researchers and clinicians.
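As an illustration of the last step, here is a minimal sketch of how extracted entities and relations could be stored in a relational database and queried afterwards. It uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, relation types, and sample values are all hypothetical; the actual schema and database used in the project are not described here.

```python
import sqlite3

# Hypothetical schema: one table for entities, one for relations between them.
SCHEMA = """
CREATE TABLE entity (
    id          INTEGER PRIMARY KEY,
    label_id    TEXT NOT NULL,
    section     TEXT NOT NULL,
    entity_type TEXT NOT NULL,
    span_text   TEXT NOT NULL
);
CREATE TABLE relation (
    id            INTEGER PRIMARY KEY,
    relation_type TEXT NOT NULL,
    head_id       INTEGER NOT NULL REFERENCES entity(id),
    tail_id       INTEGER NOT NULL REFERENCES entity(id)
);
"""


def save_extraction(conn, label_id, section, entities, relations):
    """Store entities [(type, text)] and relations [(type, head_idx, tail_idx)].

    Relation indexes refer to positions in the `entities` list.
    """
    ids = []
    for entity_type, span_text in entities:
        cur = conn.execute(
            "INSERT INTO entity (label_id, section, entity_type, span_text) "
            "VALUES (?, ?, ?, ?)",
            (label_id, section, entity_type, span_text),
        )
        ids.append(cur.lastrowid)
    for relation_type, head, tail in relations:
        conn.execute(
            "INSERT INTO relation (relation_type, head_id, tail_id) VALUES (?, ?, ?)",
            (relation_type, ids[head], ids[tail]),
        )
    conn.commit()


def reactions_by_arm(conn):
    """Return (arm, adverse reaction) pairs across all stored labels."""
    return conn.execute(
        "SELECT arm.span_text, reaction.span_text "
        "FROM relation r "
        "JOIN entity arm ON arm.id = r.head_id "
        "JOIN entity reaction ON reaction.id = r.tail_id "
        "WHERE r.relation_type = 'HAS_ADVERSE_REACTION' "
        "ORDER BY arm.span_text, reaction.span_text"
    ).fetchall()


# Invented sample data for one label section.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
save_extraction(
    conn,
    label_id="label-001",
    section="Clinical Studies",
    entities=[
        ("ARM", "placebo arm"),
        ("ADVERSE_REACTION", "nausea"),
        ("OUTCOME", "hospitalization"),
    ],
    relations=[("HAS_ADVERSE_REACTION", 0, 1), ("RESULTED_IN", 1, 2)],
)
pairs = reactions_by_arm(conn)
```

Keeping relations in their own table, rather than as columns on the entity, is one way to let researchers join across drugs, arms, reactions, and outcomes with ordinary SQL.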
The Technology
- Open-source Frameworks, such as spaCy
- Proprietary Analytics
The Expertise
- Data Engineering
- Software Development
- Data Analytics
Join us!
Want to work on similar projects?
Introverts and extroverts, geeks, nerds, and digital poets... Klarrio is the perfect place to learn and teach, experiment and brainstorm, exercise your brain, and feed your passion. Surrounded by people with amazing, world-changing talents.
Contact us!
We're your one-stop cloud-native partner
We design cloud-native, cloud-agnostic software solutions to empower you to control your data, limit cloud costs, and optimize performance, all without compromise. What can Klarrio do for you today?