Artificial Intelligence and Big Data Analytics can help analyze and predict the impact of the COVID-19 virus.
The whole world is paying a high price, both in human lives and in the social distancing needed to flatten the epidemic peak. In this scenario, public institutions, companies and independent researchers are involved in the fight against COVID-19. There are so many lines of research that the White House launched an open call through a Kaggle Challenge to try to obtain some meaningful results.
Revelis couldn’t stand aside, and that’s why an internal task force was created for this goal. Computer Scientists, Data Scientists and Researchers are working hard to achieve tangible outcomes using Big Data Analysis techniques. This activity also involves some researchers from ICAR-CNR.
One of the initial tasks of this task force was to survey the state of the art, open data and everything else that could be used for Artificial Intelligence solutions against the virus.
We decided to share this list of useful resources so that everyone can benefit from it in their own projects against COVID-19. This list is not exhaustive, and that’s why we kindly invite you to contact us to expand it further by writing an e-mail to email@example.com
Resources are classified into five different categories (“Dataset”, “Research Papers”, “Dashboard”, “Tutorial” and “Others”) for easier navigation.
A good Artificial Intelligence solution needs a dataset that is both big enough and of good quality. We have collected different kinds of data, ranging from time series to images and text.
Worldwide epidemic time series
At the Humanitarian Data Exchange site you can find open data about confirmed cases, deaths and recoveries all over the world, starting from 01/22/2020. The data are updated daily and cover each country and, where available, each province.
These days many scientists, like Prof. Ioannidis from Stanford University, are arguing that we need better information to guide decisions. For this reason we are adding other data sources to the list to provide a broad spectrum of choices. You can also obtain worldwide time series from Johns Hopkins and from the European Centre for Disease Prevention and Control, where you can also find a short tutorial on loading the data into an R environment. The R package nCov2019 gives you access to historical data and other functionalities.
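Cumulative time series like these usually need to be turned into daily increments before analysis. The following is a minimal Python sketch of that step, not code from any of the sources above. It assumes a wide CSV layout with four leading metadata columns followed by one column per date; the sample rows, the region names and all the numbers are made up for illustration, and the real files may be laid out differently.

```python
import csv
import io

# Toy sample in an assumed wide layout (one column per date).
# Region names and numbers are invented for illustration only.
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20
,Exampleland,0.0,0.0,10,12,15,15
North,Sampleland,1.0,1.0,100,130,125,160
"""

META_COLUMNS = 4  # assumed number of leading non-date columns


def daily_new_cases(csv_text):
    """Map (country, province) to a list of (date, new cases) pairs."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # The first date has no predecessor, so increments start at the second date.
    dates = header[META_COLUMNS + 1:]
    result = {}
    for row in reader:
        province, country = row[0], row[1]
        cumulative = [int(value) for value in row[META_COLUMNS:]]
        # Clamp at zero: cumulative counts are sometimes revised downwards.
        increments = [max(b - a, 0) for a, b in zip(cumulative, cumulative[1:])]
        result[(country, province)] = list(zip(dates, increments))
    return result


series = daily_new_cases(SAMPLE)
```

Clamping negative differences to zero is one simple way to handle downward revisions; depending on the analysis, it may be preferable to keep them and treat them as corrections.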
Italy epidemic time series
Official data about COVID-19 in Italy are made available by the Italian Civil Protection. Beyond confirmed cases, deaths and recoveries, you can retrieve information such as the number of hospitalized individuals and the number of people in ICU.
COVID-19 diagnosis using Deep Learning is a hot topic. The following is a list of websites where this kind of data can be found:
- X-ray images of ~30k unique patients with pulmonary diseases
- X-ray images of pediatric patients affected by pneumonia
- X-ray images of patients affected by pneumonia
- X-ray images of COVID and normal patients
- X-ray and CT scan images
- X-ray and CT scan images of patients affected by different kinds of pneumonia (including COVID)
The White House released a dataset consisting of 29,000 research papers related to COVID-19. The goal is to find meaningful answers to various questions such as risk factors, virus genetics and many others.
Geo-spatial data are sometimes used in epidemiological studies. Italian demographic data are available at a high geographical resolution.
Social Networks Data
Online social networking sites are a data source where ordinary people, public institutions and newspapers share large volumes of news every day. The CrowdTangle website makes it possible to monitor the activity of Facebook Pages related to COVID-19 in Italy.
Flight connection data
Analyzing virus transmission requires identifying the connections between different areas of the world. Some researchers are working on worldwide flight data to better understand how the virus spread from China to the rest of the world. All the worldwide connections between airports, with an estimate of the average number of trips between them, are available online.
Previous epidemics data
One line of research tries to understand COVID-19 by comparing it with previous epidemics. Here you can find links to data related to:
Data from Korea
The Korean Ministry of Health and Welfare built a web platform to make official data from South Korea available. You won’t be allowed to download the data locally, but you will have a cloud computing platform at your disposal to submit your applications.
Italian researchers from the University of Turin are currently developing an AI system for the automatic diagnosis of COVID-19 pneumonia from ultrasound images. The database is available upon request.
Data from UK government
From the official website of the UK government, it is possible to retrieve information about the number of hospitalized individuals, the daily new cases, the cumulative confirmed cases and deaths. Data are recorded at a province level.
Data from CDC (US)
The CDC has launched public surveillance of COVID-19 activity in the US through a weekly report on hospitalized individuals, emergency department visits and other information. It is also possible to obtain daily data on deaths and cases from this page.
Mobility Data from Apple
Apple is publishing daily reports on requests for directions in Apple Maps for the whole world. This could be a good proxy for actual mobility. You can explore the data through dashboards or download the raw version at this link.
Data from Clinical Studies
The Operation Research and Analytics Lab from MIT has started a project related to COVID-19 analytics. Among their contributions, we would highlight the collection of datasets from various clinical studies.
Many research papers have been written in recent weeks, confirming that researchers are doing their best to make substantial progress in the fight against COVID. These works can be an important guide, and that’s why we want to include some papers that we consider relevant.
Deep Learning applied to CT scan images
Chinese researchers have used about 5,000 chest CT scan images to detect patients infected with COVID-19 with high accuracy. The paper “Artificial Intelligence Distinguishes COVID-19 from Community Acquired Pneumonia on Chest CT” describes the use of Convolutional Neural Networks to extract meaningful features from images and classify a pneumonia as caused by COVID or not.
Poisson Autoregressive forecasting model
Trend predictions are relevant information for decision-makers. An approach to COVID-19 trend analysis is described by Arianna Agosto and Paolo Giudici.
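To give an idea of what a Poisson autoregression looks like in practice, here is a minimal Python sketch of a generic log-linear specification, in which the expected count on a given day depends on the previous day's count and on the previous day's expected count. This is an illustration under our own assumptions, not necessarily the authors' exact model: the function names, the parameter names `omega`, `alpha` and `beta`, and the starting value for the first intensity are all hypothetical.

```python
import math


def intensities(counts, omega, alpha, beta):
    """Conditional means lambda_t of a log-linear Poisson autoregression.

    Assumed recursion (generic form, for illustration):
        log(lambda_t) = omega + alpha * log(1 + y_{t-1}) + beta * log(lambda_{t-1})
    """
    log_lam = omega  # assumed starting value for log(lambda_0)
    lams = [math.exp(log_lam)]
    for y_prev in counts[:-1]:
        log_lam = omega + alpha * math.log1p(y_prev) + beta * log_lam
        lams.append(math.exp(log_lam))
    return lams


def log_likelihood(counts, omega, alpha, beta):
    """Poisson log-likelihood of the observed counts given the parameters."""
    total = 0.0
    for y, lam in zip(counts, intensities(counts, omega, alpha, beta)):
        # log P(Y = y) for a Poisson variable with mean lam
        total += y * math.log(lam) - lam - math.lgamma(y + 1)
    return total
```

The parameters could then be estimated by maximizing `log_likelihood` over `omega`, `alpha` and `beta` with a numerical optimizer of your choice.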
Non-Pharmaceutical Interventions (NPI) Epidemic model
During the COVID-19 emergency many governments decided to take drastic actions such as school closures and social distancing. Ferguson et al. studied the efficacy of this kind of measure.
Case-Fatality-Rate unreported cases estimation
Data reliability is an open question. It is not always possible to test the whole population, and this paper sets out a methodology to estimate the number of unreported cases.
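The general idea behind case-fatality-rate based estimates can be sketched in a few lines of Python. If the observed case fatality rate is higher than a baseline value considered plausible under full ascertainment, the ratio between the two suggests what fraction of cases is being reported. This is a deliberately simplified sketch under our own assumptions, not the paper's method: the function name is hypothetical, the baseline rate is an input you must supply, and no adjustment is made for the delay between confirmation and death.

```python
def estimate_reporting(confirmed, deaths, baseline_cfr):
    """Return (reporting_fraction, estimated_total_cases).

    baseline_cfr is an assumed case fatality rate under full ascertainment;
    it is an input to this sketch, not something the code estimates.
    """
    if confirmed <= 0 or deaths <= 0:
        raise ValueError("confirmed and deaths must be positive")
    if not 0 < baseline_cfr <= 1:
        raise ValueError("baseline_cfr must be in (0, 1]")
    observed_cfr = deaths / confirmed
    # If the observed rate is below the baseline, assume all cases are reported.
    reporting_fraction = min(1.0, baseline_cfr / observed_cfr)
    return reporting_fraction, confirmed / reporting_fraction
```

For example, with 1,000 confirmed cases, 50 deaths and an assumed baseline rate of 1%, the observed rate is 5%, so the sketch suggests that about one case in five is reported, or roughly 5,000 cases in total.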
Deep Learning for Scientific Discovery
Researchers are trying to figure out solutions to a variety of problems related to COVID-19. Deep Learning algorithms are able to capture non-linear relationships in data, and that’s why they are used in many different applications. This paper is a Deep Learning survey that contains: a high-level explanation of Deep Learning theoretical concepts, a diverse set of Deep Learning techniques for different data modalities, a set of methods to improve performance when you have less data (semi-supervised methods), model interpretability techniques and tutorials on how to implement a complete Deep Learning pipeline.
Overview of recent AI solutions to COVID-19
The epidemic outbreak has stimulated a lot of people to work on providing new insights about the virus using Artificial Intelligence. This paper highlights many studies regarding molecular, medical and epidemiological applications, and also gives hints for future research on the subject.
Monitoring COVID contagion growth
The EU think tank CEPS is producing regularly updated reports about the COVID-19 contagion. Two of their contributors are Arianna Agosto and Paolo Giudici, authors of the model we used as a base to develop our own.
Infections and NPI assessment in 11 European countries
In this study, researchers from Imperial College provide estimates of the true number of infections, the reproduction number and the impact of Non-Pharmaceutical Interventions.
Dashboards are visual tools that represent complex information in a concise and dynamic way, with the aim of making the epidemic outbreak understandable in a short time.
Italian Civil Protection Dashboard
COVID-19 diffusion data are available through a dashboard from the Italian Civil Protection.
Real-time epidemic simulation
Online you can find an epidemic simulation dashboard, where it is possible to evaluate the virus diffusion by setting different parameters.
Epidemic Modelling for US
Researchers from Northeastern University, University of Florida and other institutions used a spatio-temporal epidemic model to estimate the number of deaths and infections in the US. You can see their predictions at this link: a single page hosts dashboards with both the predictions and a “what-if” scenario analysis that simulates the contagion growth without containment measures.
Community mobility report from Google
Google decided to make its mobility estimates for the whole world publicly available. You can find data about mobility in parks, transit stations and so on. The spatial granularity is national, and it becomes regional for large areas.
Epidemic Projections by MIT
MIT researchers employed a novel SEIR model, explained in detail at this link, to predict several quantities such as deaths, hospitalized individuals and confirmed cases. Projections are available at a state level.
Tutorials describe practical applications of Artificial Intelligence tools.
Deep Learning for X-Ray analysis tutorial
You can learn how to diagnose COVID-19 by using Convolutional Neural Networks with Python and TensorFlow.
Epidemic models with R tutorial
This R Markdown document is useful for learning epidemic models with R.
SEIR Modelling using R
This tutorial shows how to use a SEIR model that takes the spatial dimension into account through an origin-destination matrix. In this case, the analysis is focused on the Greater Tokyo Area.
Below you can find resources that could not be classified in the previous categories.
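For readers new to compartmental models, a basic SEIR model can be sketched in a few lines of Python. The sketch below is not the tutorial's code (which is written in R) and leaves out the spatial, origin-destination component entirely. It integrates the standard SEIR equations for a single closed population with forward Euler steps; the function name and all parameter values are illustrative assumptions, not estimates for COVID-19.

```python
def simulate_seir(population, initial_infectious, beta, sigma, gamma, days, dt=0.1):
    """Integrate a basic single-population SEIR model with forward Euler steps.

    beta: transmission rate, sigma: 1 / incubation period,
    gamma: 1 / infectious period (all per day).
    Returns one (S, E, I, R) tuple per day, starting with day 0.
    """
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


# Illustrative values only: 5-day incubation, 7-day infectious period.
trajectory = simulate_seir(
    population=1_000_000, initial_infectious=10,
    beta=0.5, sigma=1 / 5, gamma=1 / 7, days=120,
)
```

A spatial version would keep one set of compartments per area and move individuals between areas according to the origin-destination matrix at each step.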
R Package for time series
The tscount package is useful for the analysis of count time series.
BERT is a deep neural network used in various Natural Language Processing tasks to achieve state-of-the-art results. In this GitHub repository, you can find various resources such as pre-trained architectures and research papers about different NLP solutions using BERT.
Research papers with code
It often happens that you discover an interesting paper but don’t have time to implement the software from scratch. At this website, you can find a collection of papers with the associated code.
Many organizations are sharing resources online. You can find other AI resources at the following links:
- Dimensions.ai dataset
- Towards Data Science
Last update: 2021/05/18