New project: Prevent national health crises by mining public discussion and news to predict vaccination uptake

This is the proposal that I submitted yesterday to the Knight Foundation health data challenge. See the proposal, and vote on it if you like it, at the Knight News Challenge.


Vaccines matter. We want to predict uptake by mining news and social sources. Our initial pilot in Ireland will focus on the uptake of the HPV vaccine, a critical public health issue for women in Ireland. 

This project will develop algorithms for text analysis of large-scale news and social data, with the aim of predicting population-level behavior: specifically, whether vaccination uptake (the pilot would focus on HPV vaccination against cervical cancer) is adequate to prevent a national health crisis.
We want to use all possible sources, however messy and unstructured, and extract their meaning algorithmically. Our “social data” will therefore be unstructured news text, which we will transform into knowledge about population opinions that is useful for predicting population-level behavior.

While the big-data literature is replete with “momentum” predictors (e.g., Google searches predicting flu epidemics, Twitter rates predicting movie revenues, frequencies of specific words predicting cultural trends), our work is less about tracking an emerging fad or meme and more about what a whole population thinks about a particular topic at a given point in time.

Focus of this pilot: women’s health

The project will develop text-analytic techniques for large-scale news and social data to capture population-level opinion, with a view to predicting population-level behavior. The focus of this pilot project is the uptake of the HPV vaccine against cervical cancer in Ireland (began in 2008). HPV-vaccine uptake is a critical public health issue for cancer rates in women, but one that conflicts with religious views on sexual morality.

Our idea: 

It has been shown that, in sufficiently large news and social data sets, systematic changes in language use reflect population-level opinions well enough to predict population-level decision-making. Specifically, whole distributions of words (which are typically power-law distributions) reflect the degree of agreement or disagreement in a population on an issue. Systematic week-to-week changes in the power laws of news and social data can reflect the emerging coalescence of opinions on a topic. To date, this idea has been demonstrated in the domain of high finance.
Systematic changes in weekly power-laws of the words in financial articles have been shown to track trends in the major stock indices (DJI, NIKKEI, FTSE); using 18,000 articles (10M+ words) it has been shown that, as the 2007 stock-bubble emerged, week-to-week changes in the power-law distributions of verb-phrases correlated strongly with market movements. (See previous research on this by team members involved in this proposal, which was covered in The Economist here.)
These distributional shifts show emerging agreement or disagreement in journalistic language: reporters use a progressively narrower set of words to describe the market, reporting on the same small set of companies in the same, overwhelmingly positive language. This demonstration suggests that news and social data may reflect population opinions well enough to be used to track changes in critical social opinions on health.
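
To make the idea of a “weekly power law” concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method used in the research mentioned above: it counts single words rather than verb phrases, and it estimates the exponent with a simple least-squares fit on log-log axes, which is a crude estimator. The function names (`rank_frequency`, `power_law_exponent`, `weekly_exponent_changes`) are hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return word counts sorted from most to least frequent."""
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def power_law_exponent(frequencies):
    """Estimate alpha in frequency ~ rank**(-alpha).

    Assumption: an ordinary least-squares line through
    (log rank, log frequency) is good enough for illustration.
    """
    if len(frequencies) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying distribution,
    # so the exponent is its negation.
    return -sxy / sxx


def weekly_exponent_changes(weekly_texts):
    """Return the exponent for each week and the week-to-week differences."""
    exponents = [power_law_exponent(rank_frequency(t)) for t in weekly_texts]
    changes = [later - earlier for earlier, later in zip(exponents, exponents[1:])]
    return exponents, changes
```

Given one string of pooled article text per week, `weekly_exponent_changes` returns a series of exponents and their differences. A rising exponent means that usage is concentrating on fewer words, which is the kind of narrowing vocabulary described above. Whether such a series tracks vaccination opinion is exactly what the pilot would need to test.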

Describe your project in one sentence.
The prevention of avoidable national health crises by mining news and social data to predict the uptake of vaccinations.
Who is the audience for this project? How does it meet their needs?
Ultimately, this project should be applicable to the entire population of every country where vaccination programmes are available. The audience for this pilot project is, initially, the female population of Ireland; however, we believe that the approach will scale to the global population.
What does success look like?
The prevention of avoidable national health crises. By informing the public about the risk of national health crises ahead of time, we can make a dramatic impact on citizens’ health.
