BAM Lab Projects

Discovering and diagnosing PTSD (post-traumatic stress disorder) from doctors' chart notes using NLP and text analytics

This is a collaborative project funded by IBM and CIMVHR. The academic collaborators are from the School of Family Medicine, University of Manitoba and Queen's University, and the Department of Clinical Psychology, Western University. The project will extract doctors' chart notes from EMR systems, anonymize the data, and develop a gold standard for the epidemiology of PTSD by manually inspecting the chart notes. It will also apply NLP and text mining techniques to identify cases of PTSD in the medical data of military veterans and their family members, and compute statistics on care quality, prevalence, and severity of PTSD in the population. The project will assess the effectiveness of the computational approach by validating its results against the gold standard, and will extend the approach with machine learning techniques for diagnosing PTSD. The clinical part of the study will examine statistics on suicide attempts in patients diagnosed with PTSD.
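The validation step against the gold standard can be sketched as follows. This is a minimal illustration, not the project's actual evaluation code: the function name, the per-patient boolean labels, and the choice of sensitivity, specificity, and positive predictive value as metrics are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ValidationResult:
    sensitivity: float  # share of gold-standard PTSD cases the method found
    specificity: float  # share of gold-standard non-cases correctly rejected
    ppv: float          # share of flagged patients that are true cases


def validate_against_gold(predicted, gold):
    """Compare automated case labels with manually reviewed labels.

    Both arguments map a (hypothetical) patient ID to True for "PTSD case".
    Only patients present in both mappings are scored.
    """
    shared = predicted.keys() & gold.keys()
    tp = sum(1 for p in shared if predicted[p] and gold[p])
    fp = sum(1 for p in shared if predicted[p] and not gold[p])
    fn = sum(1 for p in shared if not predicted[p] and gold[p])
    tn = sum(1 for p in shared if not predicted[p] and not gold[p])

    def ratio(numerator, denominator):
        # Avoid division by zero when a class is absent from the sample.
        return numerator / denominator if denominator else 0.0

    return ValidationResult(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
    )


# Made-up labels for four patients, for illustration only.
nlp_labels = {"p1": True, "p2": True, "p3": False, "p4": False}
chart_review = {"p1": True, "p2": False, "p3": True, "p4": False}
print(validate_against_gold(nlp_labels, chart_review))
```

Reporting sensitivity and specificity separately matters in a setting like this because cases are likely to be much rarer than non-cases, so a single accuracy figure could hide a method that misses most true cases.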

A Multilevel Streaming Data Analytics Infrastructure for Predictive Analytics

Figure: Streaming data analytics architecture

Streaming data from a variety of data sources and IoT devices are processed by streaming data processing engines using filtering and aggregation operations before being stored on disk. With high-volume, high-velocity streaming data, more time-consuming operations such as applying machine learning algorithms can create a bottleneck in the flow. The project will create a multi-level stream data processing framework in which filtered and partially processed data can be held temporarily in memory for more time-consuming analytics, after which decisions can be made to discard unnecessary data and store the resulting knowledge on the hard disk. A variety of stream processing and in-memory storage systems will be studied, and a streaming data analytics framework will be developed to analyze data from the web for financial market prediction for the industry partner Gnowit. The project is funded by IBM and SOSCIP (Southern Ontario Smart Computing Innovation Platform).
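The multi-level idea can be illustrated with a small single-process sketch. It does not use any real stream processing engine or in-memory store. The names `level_one` and `LevelTwo`, the event layout, and the batch statistic are hypothetical stand-ins: level one does cheap per-event filtering, while level two buffers the survivors in memory, runs a slower batch analysis, discards the raw events, and keeps only the derived result.

```python
from collections import deque
from statistics import mean


def level_one(events, min_value):
    """Fast path: cheap per-event filtering applied as data arrives."""
    for event in events:
        if event["value"] >= min_value:
            yield event


class LevelTwo:
    """Slow path: buffer filtered events in memory and analyze them in batches."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = deque()   # temporary in-memory storage
        self.knowledge = []     # stands in for durable storage on disk

    def accept(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Run the (placeholder) expensive analysis, then drop the raw events."""
        if not self.buffer:
            return
        batch = list(self.buffer)
        self.buffer.clear()
        self.knowledge.append(
            {"count": len(batch), "mean_value": mean(e["value"] for e in batch)}
        )


# Made-up event stream, for illustration only.
stream = [{"value": v} for v in [1, 5, 7, 2, 9, 11]]
analyzer = LevelTwo(batch_size=2)
for kept in level_one(stream, min_value=5):
    analyzer.accept(kept)
analyzer.flush()  # analyze any partial batch left at the end of the stream
print(analyzer.knowledge)
```

In a real deployment the batch analysis would run asynchronously so that the fast path is never blocked, which is the bottleneck the framework is meant to avoid. Here it runs inline to keep the sketch short.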

Medical / Health Data Analytics

Research on Electronic Medical Records (EMR) and the data stored in the CPCSSN (Canadian Primary Care Sentinel Surveillance Network) data bank. The CPCSSN data store currently holds data on 1.5 million patients, shared by primary care physicians from many provinces and territories across Canada, and serves as a valuable source of anonymized health data.

Using data from this network, the BAM Lab performed an analysis to predict hypertension in patients, described below.

Predictive model for hypertension

Developed a neural network model to predict hypertension from patients' health records. We obtained about 82% accuracy. [Lafreniere et al., 2016]

Analytics of chart notes containing unstructured/semi-structured text data

Analyzing patients' chart notes in the EMR to anonymize the data and automatically extract relevant disease and diagnosis information related to lower back pain.

Created a tool:

  • Medical text processing: Extracting knowledge from doctors' chart notes for analyzing and predicting disease status. [Michael Judd - ongoing work]
    The tool uses a third-party tool to anonymize the text data, extracts medical terminology, and links the information from multiple data sources for analytics.

Fig. 2 - The data processing flow in the artificial neural network model for predicting hypertension.

Text Data Analytics

Goal: Implement text (structured/unstructured/semi-structured) analytics algorithms for big data on in-memory and stream data processing engines and develop novel analytics tools.

Created multiple tools:

  • SPARK-PSO: Text clustering using Particle Swarm Optimization on Spark [Sherar et al., 2017]
  • CAPRI Tool: Semi-structured multiline log data [Zulkernine et al., 2012]
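To make the clustering idea behind SPARK-PSO concrete, here is a small sequential sketch of Particle Swarm Optimization applied to clustering. It is not the Spark implementation from [Sherar et al., 2017]. It assumes that documents have already been turned into numeric vectors (for example term weights), that each particle encodes a full set of k centroids, and that fitness is the mean distance from each vector to its nearest centroid. The parameter values (`w`, `c1`, `c2`, swarm size) are common textbook defaults chosen for illustration.

```python
import math
import random


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    dists = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=dists.__getitem__)
    return idx, dists[idx]


def fitness(centroids, points):
    """Mean distance from each point to its nearest centroid (lower is better)."""
    return sum(nearest(p, centroids)[1] for p in points) / len(points)


def pso_cluster(points, k, n_particles=10, iterations=30,
                w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle is a candidate set of k centroids, seeded from the data.
    positions = [[list(rng.choice(points)) for _ in range(k)]
                 for _ in range(n_particles)]
    velocities = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in pos] for pos in positions]
    pbest_fit = [fitness(pos, points) for pos in positions]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    history = [gbest_fit]  # best fitness seen so far, per iteration

    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + pull toward personal best + pull toward global best.
                    velocities[i][j][d] = (
                        w * velocities[i][j][d]
                        + c1 * r1 * (pbest[i][j][d] - positions[i][j][d])
                        + c2 * r2 * (gbest[j][d] - positions[i][j][d])
                    )
                    positions[i][j][d] += velocities[i][j][d]
            fit = fitness(positions[i], points)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in positions[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in positions[i]], fit
        history.append(gbest_fit)

    labels = [nearest(p, gbest)[0] for p in points]
    return gbest, labels, history


# Made-up 2-D "document vectors", for illustration only.
docs = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids, labels, history = pso_cluster(docs, k=2)
print(labels, history[-1])
```

The fitness evaluation over all points is the expensive, data-parallel part of each iteration, which is presumably what makes the algorithm a natural fit for an engine such as Spark.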

Fig. 3 - OSCAR EMR chart notes.

Big Data Analytics

Created a tool named BiNARY (A Big Data Integration tool for Adhoc Query Processing) [Eftekhari et al., 2016]

Description: A tool that provides a graphical web-based interface to facilitate the execution of big data queries across multiple hybrid back-end data sources.

Fig. 4 - BiNARY Framework outline

Cloud and Service Oriented Architectures

Created CLAaaS, which stands for Cloud-based Analytics as a Service. [Zulkernine et al., 2013]

This software tool is a framework that enables big data analytics on the cloud, along with knowledge sharing.

Fig. 5 - CLAaaS Architecture

Machine Learning

Used machine learning algorithms to teach a vehicle to successfully navigate a track in the shortest time possible. [Song et al., 2017]

Fitness Criteria:
  • Whether the vehicle navigates the track successfully without hitting any walls.
  • Speed of the vehicle.
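One way the two criteria above could be folded into a single score is sketched below. The scoring rule is an assumption made for illustration, not the function used in [Song et al., 2017]: a run that hits a wall earns only partial credit for the fraction of the track covered, while a completed run earns a base score of 1 plus its average speed, so any successful run outranks any failed one and faster successful runs rank higher.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    hit_wall: bool            # True if the vehicle collided with a wall
    track_length: float       # total length of the track
    distance_covered: float   # distance travelled before the run ended
    elapsed_seconds: float    # duration of the run


def fitness(run):
    """Score a run: failed runs land in [0, 1], completed runs above 1."""
    progress = min(run.distance_covered / run.track_length, 1.0)
    if run.hit_wall or progress < 1.0:
        # Partial credit steers the search toward getting further along the track.
        return progress
    if run.elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive for a completed run")
    average_speed = run.track_length / run.elapsed_seconds
    return 1.0 + average_speed


# Made-up runs, for illustration only.
crashed = RunResult(hit_wall=True, track_length=100, distance_covered=40, elapsed_seconds=10)
finished = RunResult(hit_wall=False, track_length=100, distance_covered=100, elapsed_seconds=20)
print(fitness(crashed), fitness(finished))  # 0.4 6.0
```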


Stock Market Prediction

We implemented the three-component Proposed Hybrid Model (PHM) [Wang et al., 2012] to predict the daily stock index.

Although PHM performed well for weekly stock prices, our results showed that the back-propagation neural network (BPNN) component performed better than the other two component models, ARIMA (Autoregressive Integrated Moving Average) and ESM (Exponential Smoothing Model).
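For readers unfamiliar with the components, the sketch below shows simple exponential smoothing, the most basic form of an ESM, and one way of blending component forecasts. It assumes the hybrid combines its components as a weighted sum. The weights and the ARIMA and BPNN forecast values are made-up placeholders, and neither function is taken from the lab's implementation.

```python
def exponential_smoothing_forecast(series, alpha):
    """One-step-ahead forecast using simple exponential smoothing.

    The smoothed level is updated as: level = alpha * x + (1 - alpha) * level.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def combine_forecasts(forecasts, weights):
    """Weighted average of component forecasts, keyed by component name."""
    total = sum(weights[name] for name in forecasts)
    return sum(weights[name] * forecasts[name] for name in forecasts) / total


# Made-up index values and component outputs, for illustration only.
esm = exponential_smoothing_forecast([10, 12, 14], alpha=0.5)  # 12.5
forecasts = {"esm": esm, "arima": 13.0, "bpnn": 13.5}
weights = {"esm": 0.25, "arima": 0.25, "bpnn": 0.5}
print(combine_forecasts(forecasts, weights))  # 13.125
```

With all of the weight on a single component the blend reduces to that component's forecast, which gives a convenient baseline for comparing the hybrid against each component on its own.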