LEVERAGING UNSTRUCTURED DATA IN THE FIGHT AGAINST CANCER

Challenge

Natural Language Processing Expertise

Data is at the heart of fighting cancer as it’s central to finding effective ways to prevent, diagnose, and treat the disease. The challenge, however, is that much of the data is unstructured. Examples include clinical notes, surgical and lab reports, clinical trial data, and discharge notes.

A prominent US-based healthcare provider recognized the importance of unstructured data for all cancer-related activities – from research through treatment. However, given the volume and complexity of the work, they knew they needed assistance from a company that had Natural Language Processing expertise, experience with the applied tooling, and a deep understanding of medical terminology. For help, they turned to Klarrio US.

A second project where Klarrio was asked to apply its NLP expertise was the analysis of drug labels.

Every manufacturer submits a label for each drug it registers with the United States Food and Drug Administration. The label is structured, but the structure mostly defines the label's sections; a section itself may contain a lot of unstructured free text. Extracting elements of interest from that text, such as the list of clinical studies, the arms of each study, the adverse reactions observed in each arm, and the outcomes those reactions resulted in, enables much deeper research across the whole drug dataset. For example, reactions and outcomes can be correlated across drugs and connected to a common ingredient.
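As a rough illustration of what such extracted elements might look like once structured, the following Python sketch models a label as nested records and groups adverse reactions by ingredient across drugs. All class names, field names, and sample values are hypothetical; they are not taken from the FDA label format or from Klarrio's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


# Hypothetical record types for the elements extracted from a label's free text.
@dataclass
class AdverseReaction:
    name: str
    outcomes: List[str] = field(default_factory=list)


@dataclass
class Arm:
    name: str
    reactions: List[AdverseReaction] = field(default_factory=list)


@dataclass
class ClinicalStudy:
    title: str
    arms: List[Arm] = field(default_factory=list)


@dataclass
class DrugLabel:
    drug: str
    ingredients: List[str]
    studies: List[ClinicalStudy] = field(default_factory=list)


def reactions_by_ingredient(labels: List[DrugLabel]) -> Dict[str, Dict[str, Set[str]]]:
    """Map ingredient -> reaction name -> set of drugs whose label reports it."""
    index: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for label in labels:
        for study in label.studies:
            for arm in study.arms:
                for reaction in arm.reactions:
                    # Simplification: every reaction is attributed to every
                    # ingredient of the drug, regardless of the arm.
                    for ingredient in label.ingredients:
                        index[ingredient][reaction.name].add(label.drug)
    return index
```

With an index of this shape, a reaction reported for several drugs that share an ingredient shows up as a single entry listing those drugs, which is the kind of cross-drug correlation described above.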

Solution

To perform the work, Klarrio devised a process for redeveloping the client's existing annotators, which included the following:

Developing and applying tooling to determine the rules and dictionary dependencies for each annotator. This led to the elimination of unused rules, the restructuring of existing dictionaries, and the generation of new dictionaries.

Introducing reusable models to make maintenance of the annotators more efficient.

Establishing Gold Standard document samples for each annotator and assessing the annotators against them for accuracy.
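The dependency analysis in the first step can be pictured with a minimal Python sketch. It assumes, purely for illustration, that each annotator lists the rules it uses and each rule lists the dictionaries it reads; the function name, the data layout, and the sample identifiers are hypothetical and do not describe Klarrio's actual tooling.

```python
from typing import Dict, Set, Tuple


def find_unused(
    annotator_rules: Dict[str, Set[str]],
    rule_dictionaries: Dict[str, Set[str]],
    all_rules: Set[str],
    all_dictionaries: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Return the rules and dictionaries that no annotator depends on."""
    # Collect every rule referenced by at least one annotator.
    used_rules: Set[str] = set()
    for rules in annotator_rules.values():
        used_rules |= rules
    # Collect every dictionary referenced by a rule that is in use.
    used_dictionaries: Set[str] = set()
    for rule in used_rules:
        used_dictionaries |= rule_dictionaries.get(rule, set())
    return all_rules - used_rules, all_dictionaries - used_dictionaries


# Hypothetical sample data: "r_legacy" is referenced by no annotator.
annotator_rules = {"tumor_size": {"r_size", "r_unit"}, "stage": {"r_stage"}}
rule_dictionaries = {
    "r_size": {"units"},
    "r_stage": {"stages"},
    "r_legacy": {"old_terms"},
}
unused_rules, unused_dictionaries = find_unused(
    annotator_rules,
    rule_dictionaries,
    all_rules={"r_size", "r_unit", "r_stage", "r_legacy"},
    all_dictionaries={"units", "stages", "old_terms"},
)
print(sorted(unused_rules), sorted(unused_dictionaries))
# prints: ['r_legacy'] ['old_terms']
```

The sketch only follows one level of dependencies; rules that depend on other rules would need a transitive traversal.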

Results

Our work with annotators:

Improved accuracy
A cleaner system

Our work with the analysis of drug labels:

Average precision and recall for the extracted entities: approximately 95% precision and 75% recall (early results).
Work on improving accuracy continues.
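For readers unfamiliar with the two metrics, here is a minimal Python sketch of how precision and recall can be computed by comparing extracted entities with a Gold Standard sample. The exact-match criterion on (start, end, label) and the sample values are assumptions for illustration; the source does not state how matches were scored.

```python
from typing import Set, Tuple

# An entity is identified by its character span and label (an assumption).
Entity = Tuple[int, int, str]


def precision_recall(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float]:
    """Precision = TP / predicted entities, recall = TP / gold entities."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sample: three of four gold entities found, no spurious ones.
gold = {(0, 6, "REACTION"), (11, 19, "REACTION"), (41, 52, "ARM"), (60, 70, "OUTCOME")}
predicted = {(0, 6, "REACTION"), (11, 19, "REACTION"), (41, 52, "ARM")}
precision, recall = precision_recall(gold, predicted)
print(precision, recall)  # prints: 1.0 0.75
```

High precision with lower recall, as in the reported early results, means that most extracted entities are correct while some entities in the text are still being missed.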

Behind the scenes

To enable the client to find and extract the information needed for their research in large volumes of unstructured data, Klarrio’s initial task was to assess an existing set of annotators to improve their accuracy. Klarrio had the added objective of making the maintenance of the annotators more efficient.

The same process was then applied to assess and improve additional existing annotators and to develop new ones.

For the analysis of drug labels, Klarrio used spaCy, an open-source NLP library. Its machine-learning Named Entity Recognizer detects the entities, while deep sentence parsing combined with rule-based matching detects the relations between them. The extracted entities and relations are saved to a relational database for further analysis by researchers and clinicians.
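The overall flow (detect entities, derive relations, store them in a relational database) can be sketched in Python as follows. This is a simplified stand-in, not Klarrio's pipeline: to stay runnable without downloading a trained model, it swaps the machine-learning Named Entity Recognizer for spaCy's pattern-based `entity_ruler`, and it replaces dependency-parse rules with a plain same-sentence co-occurrence rule. The labels `ARM` and `REACTION`, the patterns, the table name, and the function names are all hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple

Relation = Tuple[str, str]  # (arm, adverse reaction)


def build_pipeline():
    """Build a small spaCy pipeline; patterns stand in for a trained NER model."""
    import spacy  # imported lazily so the storage helper works without spaCy

    nlp = spacy.blank("en")
    nlp.add_pipe("sentencizer")  # rule-based sentence boundaries
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns([
        {"label": "ARM",
         "pattern": [{"LOWER": {"IN": ["placebo", "treatment"]}}, {"LOWER": "arm"}]},
        {"label": "REACTION",
         "pattern": [{"LOWER": {"IN": ["nausea", "headache"]}}]},
    ])
    return nlp


def extract_relations(nlp, text: str) -> List[Relation]:
    """Pair every REACTION with every ARM mentioned in the same sentence."""
    relations: List[Relation] = []
    for sent in nlp(text).sents:
        arms = [ent.text for ent in sent.ents if ent.label_ == "ARM"]
        reactions = [ent.text for ent in sent.ents if ent.label_ == "REACTION"]
        relations.extend((arm, reaction) for arm in arms for reaction in reactions)
    return relations


def save_relations(conn: sqlite3.Connection, relations: Iterable[Relation]) -> None:
    """Store the extracted (arm, reaction) pairs in a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS arm_reaction ("
        "arm TEXT NOT NULL, reaction TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO arm_reaction (arm, reaction) VALUES (?, ?)", relations
    )
    conn.commit()
```

A typical call would pass the result of `extract_relations(build_pipeline(), text)` to `save_relations` together with a `sqlite3.connect(...)` connection. A production pipeline would more likely load a trained model and express relation rules over the dependency parse, for example with spaCy's `DependencyMatcher`.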

The Technology

Open-source frameworks, such as spaCy
Proprietary Analytics

The Expertise

Data Engineering
Software Development
Data Analytics
