The Blue Sky Challenge

1256 registered. Allowed team size: 2 - 6
Submission phase (online)
Starts on: Dec 24, 2021, 12:30 PM
Ends on: Feb 01, 2022, 06:25 PM

Overview

The Blue Sky Challenge 

Let’s keep the sky blue and the earth green!!


Monitoring air quality in various cities across the globe is becoming an utmost necessity as air pollution can lead to several respiratory/cardiovascular diseases. As per reports from WHO, more than 7 million mortalities have been recorded worldwide due to diseases related to air pollution. Increasing urbanisation and industrialisation have a negative impact on air quality which is already alarming in many cities across the world. In practice, the installation of ground stations for local air quality measurement would not be a feasible solution. A mobile application in this regard would enable us to make an informed decision.

To this end, the hackathon seeks solutions along two approaches, organised as two sub-themes: (a) analysing satellite imagery data to estimate the pollutants in a given area, and (b) discovering innovative solutions for smart air quality monitoring systems by integrating sensor technology with machine learning algorithms. Satellite images pose three challenges: only temporal snapshots are available rather than continuous data, the imagery is too large to download over mobile networks to consumer devices, and mobile devices have limited computational capacity for processing it. Similarly, sensor data in many cases needs to be screened for failures, anomalies, and errors. This hackathon targets solutions based on a hybrid approach. The event also provides a platform for technology and innovation enthusiasts from different parts of the world to showcase their skills for the betterment of society.

The participants are expected to find and submit solutions in the proposed sub-themes. The submissions will be then evaluated by experts. 

Each registering team needs to submit max 1000 words on their approach to solving the sub-theme problems. Final decision on winners and other matters will be taken by the panel of judges.

Themes


Sub-theme 1: Blue Sky Above

Goal: Pollution estimation with improved accuracy using a combination of hyper-spectral satellite imagery data and maps. 

Background:

Air pollution is a major public health concern that leads to a number of respiratory/cardiovascular diseases and an estimated 7 million mortalities worldwide [WHO]. Tracking the quality of the air we breathe using a mobile device will help us stay informed about the surrounding air quality and make informed decisions. However, accessible data from local air quality measurement ground stations is not available everywhere. There have been recent advances in the use of map-based land-use regression (LUR) models (Steininger et al.). However, such models may suffer from spatial and temporal inaccuracy due to artefacts, non-man-made sources of pollution, wind speed, pressure, precipitation and temperature. Satellite imagery data could give a better estimate of the pollutants in a given area. The challenges with satellite images are that only temporal snapshots are available rather than continuous data, that the imagery is too large to download over mobile networks to consumer devices, and that mobile devices have limited computational capacity for processing it. This hackathon aims to find solutions to these challenges through a hybrid approach.

Problem Statement: 

The schematic diagram below illustrates the challenge. A code template with APIs is provided to assist you with the implementation. The solution requires two levels of data or image processing. The first processes the hyper-spectral satellite images/data to extract hyper-parameters or another suitable compressed representation/feature set. This compressed information is then sent as input to the second level, a regression machine learning (ML) model. The regression model should process the data from the satellite imagery along with map data from OpenStreetMap to estimate the pollution (nitrogen dioxide, NO2) in the region at specific time instances. The solution should be submitted as a Python code (.py) or notebook (.ipynb) file.
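The two processing levels described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the official code template: the function names (`compress_cube`, `save_features`, `load_features`, `predict_no2`), the per-band mean as the compressed representation, the float32 file layout, and the linear model with made-up weights are all hypothetical choices.

```python
import os
import struct
import tempfile

def compress_cube(cube):
    """Level 1 (assumed): reduce a hyper-spectral cube to one mean per band.

    `cube` is a list of bands; each band is a list of rows of pixel values.
    """
    features = []
    for band in cube:
        pixels = [value for row in band for value in row]
        features.append(sum(pixels) / len(pixels))
    return features

def save_features(features, path):
    """Store the features as little-endian float32 values (4 bytes each)."""
    with open(path, "wb") as handle:
        handle.write(struct.pack(f"<{len(features)}f", *features))

def load_features(path):
    """Read back the float32 features written by save_features."""
    with open(path, "rb") as handle:
        raw = handle.read()
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

def predict_no2(satellite_features, map_features, weights, bias):
    """Level 2 (assumed): linear regression over the concatenated features."""
    inputs = satellite_features + map_features
    return bias + sum(w * x for w, x in zip(weights, inputs))

# Toy 2-band, 2x2-pixel cube; the map feature and weights are made up.
cube = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
feature_path = os.path.join(tempfile.mkdtemp(), "features.bin")
save_features(compress_cube(cube), feature_path)
estimate = predict_no2(load_features(feature_path), [0.5], [1.0, 2.0, 4.0], 10.0)
print(os.path.getsize(feature_path), estimate)
```

Only the file written by `save_features` crosses from the first level to the second, so its size is what a more aggressive compression scheme would need to reduce.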

Dataset: https://s3-ap-southeast-1.amazonaws.com/he-public-data/BlueSkyAbove0287eea4.ipynb

Click here to read more on the dataset - https://s3-ap-southeast-1.amazonaws.com/he-public-data/BlueSkyChallenge39c7d73.pdf

Judging/scoring Criteria: 

1) A dataset similar to the example dataset will be used for judging and testing the same trained network. The solution should be submitted as a python code/ipynb file.
 
2) The location (e.g. 51.5219,-0.1280) and the local time (e.g. 'April 4 2021 1:33PM') at which the NO2 concentration must be estimated will be provided as input. Your model can take up to one week of prior data and extract key information/data from it. The data extracted by your model from the satellite data should be stored as a file. The size of this compressed data/features file (as reported by the Python function os.path.getsize) will be taken into consideration when calculating the score. There will be a penalty of -1 point for every kilobyte of the compressed features file extracted from the satellite imagery data and fed to the regression model. The goal is to achieve maximum accuracy with minimum compressed data/features transferred from the satellite imagery to the regression model.
 
3) For each time instance, 5 sample spatial locations will be chosen. The locations will be within a 50 km circumference from the center of London (51° 30' 35.5140'' N and 0° 7' 5.1312'' W) and at least 7 km apart. 100 time instances will be evaluated. The absolute percentage correctness (i.e., 100% - (percentage of error)) of the estimate against the ground truth (8-hour running average) for each of the time-space samples (5x100) will be summed to get the first score component.
 
4) The execution time of the code will be scored as -1 point for every millisecond of execution (the run time, measured in microseconds with the Python function time.time, is rounded to the nearest millisecond). The execution will be benchmarked on the Amazon Elastic Compute Cloud (EC2) instance specified in the problem statement.
 
5) Proper comments in the code, explanation of the algorithm and presentation/report will involve additional scores up to 10000 points.
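The three numeric score components in criteria 2 to 4 could be computed along the following lines. This is a hedged sketch, not the judges' scoring script: the helper names are hypothetical, a kilobyte is assumed to be 1024 bytes, fractional kilobytes are assumed to count proportionally, and the correctness value is not clipped at zero.

```python
import os
import tempfile
import time

def size_penalty(path, bytes_per_kb=1024):
    """Criterion 2: -1 point per kilobyte of the compressed features file."""
    return -os.path.getsize(path) / bytes_per_kb

def regression_score(estimates, truths):
    """Criterion 3: sum of (100 - absolute percentage error) over all samples."""
    return sum(100.0 - abs(e - t) * 100.0 / abs(t)
               for e, t in zip(estimates, truths))

def timed_call(func, *args):
    """Criterion 4: run func, return (result, penalty) at -1 point per ms."""
    start = time.time()
    result = func(*args)
    elapsed_ms = round((time.time() - start) * 1000)
    return result, -elapsed_ms

# Made-up inputs, only to exercise the three helpers.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"\x00" * 2048)  # 2048-byte stand-in for the features file
features_path = handle.name

file_penalty = size_penalty(features_path)
os.remove(features_path)
accuracy = regression_score([18.0, 33.0, 40.0], [20.0, 30.0, 40.0])
_, run_penalty = timed_call(sum, [1, 2, 3])
print(file_penalty, accuracy, run_penalty)
```

With 5 locations and 100 time instances, `regression_score` would be summed over 500 samples, so a perfect estimator would reach 50000 points before penalties.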

Example Scores: 

Team A: 

Score from regression: 43000 

Score from the hyper-parameter datasize (2048 kb): -2048 points

Score from time of execution (2.242 seconds): -2242 points

Comments/Presentation: 4200 points 

Total: 42910 

Team B: 

Score from regression: 46000 

Score from the hyper-parameter datasize (10084 kb): -10084 points

Score from time of execution (4.648 seconds): -4648 points

Comments/Presentation: 2200 points 

Total: 33468 
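The example totals above can be reproduced with a few lines of arithmetic. The function name `total_score` and its parameter names are hypothetical; the formula simply adds the four listed components.

```python
def total_score(regression, feature_kb, runtime_seconds, presentation):
    """Combine the four components listed in the example scores."""
    size_penalty = -feature_kb                     # -1 point per kilobyte
    time_penalty = -round(runtime_seconds * 1000)  # -1 point per millisecond
    return regression + size_penalty + time_penalty + presentation

team_a = total_score(43000, 2048, 2.242, 4200)
team_b = total_score(46000, 10084, 4.648, 2200)
print(team_a, team_b)  # 42910 33468
```

Team B has the higher regression score but the lower total, because its larger features file and longer run time cost more points than the extra accuracy earns.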

Teams Information: Please check the Rules section

Benchmarking compute cloud information: 

Amazon EC2 

t2.micro, 1 GiB of Memory, 1 vCPUs, EBS only, 64-bit platform 

Code template: 

https://github.com/williamnavaraj/BlueSkyChallenge.git

Potential FAQs: 

1) Should the code run on a mobile device? 

No. While the problem statement aims towards a mobile application/web app, the code is currently intended to be tested on a consumer-grade computer and will be benchmarked on an equivalent cloud computing instance.

2) What dataset will be used for testing? 

A dataset similar to the one provided in the challenge will be used for testing (from the same region, with similar satellite data).

3) Will the ML training time/resources be taken into consideration for the grading? 

No. Training can be carried out on any computing system. However, the resulting trained models should not exceed 4 GB for the compression/hyperparameter extraction and 1 GB for the regression model.

4) What if we get negative total scores? 

Given the penalty points, negative total scores are possible and acceptable. The overall goal is to maximize accuracy within the limited time and computing resources. Solutions will be ranked by total score, with the highest score first.

5) Can we publish this work? 

Yes. You are free to publish the work. 

6) Who owns the IP? 

You own the IP. On submission of the solution, you are contributing to IEEE a license to use your algorithm for potential app development to tackle and create awareness about air pollution.

References: 

https://www.who.int/health-topics/air-pollution#tab=tab_1

Schmitz, O., Beelen, R., Strak, M. et al. High resolution annual average air pollution concentration maps for the Netherlands. Sci Data 6, 190035 (2019). https://doi.org/10.1038/sdata.2019.35

Michael Steininger, Konstantin Kobs, Albin Zehe, Florian Lautenschlager, Martin Becker, and Andreas Hotho. 2020. MapLUR: Exploring a New Paradigm for Estimating Air Pollution Using Deep Learning on Map Images. ACM Trans. Spatial Algorithms Syst. 6, 3, Article 19 (May 2020), 24 pages. DOI: https://doi.org/10.1145/3380973

Sub-theme 2: Blue Sky Below

Goal: Forecasting Sensor Measurements in Smart Air Quality Monitoring System

Background:

Air quality has a significant impact on the overall well-being of humans and society across the globe. The rewards of good air quality are numerous, including substantial health, environmental, and economic benefits.  However, as a result of increasing urbanisation and industrialisation, air quality in major cities around the world is becoming a source of concern. Several nations have made efforts to implement smart city initiatives, in which sensors play a vital role in informing both governing authorities and the general public about real-time air quality levels via mobile or web-based apps.  Traditional sensor monitoring can be made smarter through the adoption of state-of-the-art machine learning algorithms, which will allow for an improvement in the current capabilities of air quality monitoring. In this context, the sub-theme 2 of the hackathon seeks to discover new innovative solutions for developing smart air quality monitoring systems by integrating sensor technology with machine learning algorithms.

Problem Statement: 

A number of factors in the air can affect its quality. Air quality monitoring systems use multiple sensors that monitor various parameters and are available as a complete suite. Temperature and carbon monoxide play a vital role in air quality. The following issue may be addressed via this hackathon in order to make such systems smarter:

Temporal forecasting of temperature and Carbon Monoxide (CO) sensor data one day ahead: It can assist the general public and government officials in anticipating trends early in order to make timely decisions and take preventative actions.

Advanced machine learning algorithms combined with sensor data have the potential to be a leap forward in addressing the problem listed above. Therefore, the primary emphasis of sub-theme 2 is on developing a machine learning algorithm to solve the defined problem. To evaluate the developed algorithm, participants can use the dataset from the air quality chemical multisensor device deployed in the field in an Italian city.

Dataset: https://www.kaggle.com/fedesoriano/air-quality-data-set?select=AirQuality.csv

Dataset Acknowledgements: Saverio De Vito (saverio.devito '@' enea.it), ENEA - National Agency for New Technologies, Energy and Sustainable Economic Development.

Citation Request: S. De Vito, E. Massera, M. Piga, L. Martinotto, G. Di Francia, On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario, Sensors and Actuators B: Chemical, Volume 129, Issue 2, 22 February 2008, Pages 750-757, ISSN 0925-4005. (https://www.sciencedirect.com/science/article/pii/S0925400507007691)

Judging/scoring Criteria:

In total, the dataset contains 15 attributes. In this hackathon, only 4 attributes are to be used. They are:

  • Attribute 00: Date (DD/MM/YYYY)
  • Attribute 01: Time (HH.MM.SS)
  • Attribute 02: CO (mg/m3)
  • Attribute 12: Temperature (°C)

Initial Training Data Period: 7 days from 11/03/2004 00.00.00 to 17/03/2004 23.00.00

Testing Data Period: 7 days from 18/03/2004 00.00.00 to 24/03/2004 23.00.00

Each day of the training and test data periods has 24 data points, from 00.00.00 to 23.00.00.
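Selecting the four attributes and splitting them into the two periods might look like the sketch below. It rests on several assumptions that should be checked against the actual AirQuality.csv file: a semicolon delimiter, decimal commas, and the column names `Date`, `Time`, `CO(GT)` and `T`. The inline `SAMPLE` rows are made up so that the example runs without the file.

```python
import csv
import io
from datetime import datetime

# Made-up rows in the assumed file layout (semicolon-separated, decimal commas).
SAMPLE = """Date;Time;CO(GT);T
10/03/2004;23.00.00;1,2;11,0
11/03/2004;00.00.00;1,2;11,3
17/03/2004;23.00.00;-200;9,8
18/03/2004;00.00.00;0,9;9,5
"""

def read_rows(handle):
    """Parse date, time, CO and temperature from an open text file."""
    rows = []
    for record in csv.DictReader(handle, delimiter=";"):
        stamp = datetime.strptime(
            record["Date"] + " " + record["Time"], "%d/%m/%Y %H.%M.%S")
        rows.append({
            "timestamp": stamp,
            "co": float(record["CO(GT)"].replace(",", ".")),
            "temperature": float(record["T"].replace(",", ".")),
        })
    return rows

def between(rows, start, end):
    """Keep rows whose timestamp lies in the inclusive range [start, end]."""
    return [row for row in rows if start <= row["timestamp"] <= end]

rows = read_rows(io.StringIO(SAMPLE))
train = between(rows, datetime(2004, 3, 11, 0), datetime(2004, 3, 17, 23))
test = between(rows, datetime(2004, 3, 18, 0), datetime(2004, 3, 24, 23))
print(len(train), len(test))  # 2 1
```

To read the real file, `read_rows` would be given an open file handle instead of the `io.StringIO` wrapper.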

Procedure:

a) Initially, train your machine learning model using 7 days data from 11/03/2004 00.00.00 to 17/03/2004 23.00.00.

b) Perform temporal forecasting (one-day-ahead forecasting) for the 8th day using the 7 days of data. Compare the forecast values with the real sensor data and evaluate performance using the Mean Absolute Percentage Error (MAPE) metric. Use the real sensor data for the 8th day as the true value when computing the performance metric for the 8th day.

c) Perform the temporal forecasting for the 9th day after updating the training database with the 8th day's sensor measurements. Compute the forecasting performance metric for the 9th day.

d) Perform the temporal forecasting for the 10th day after updating the training database with the 9th day's sensor measurements. Compute the forecasting performance metric for the 10th day.

e) Perform the temporal forecasting for the 11th day after updating the training database with the 10th day's sensor measurements. Compute the forecasting performance metric for the 11th day.

f) Perform the temporal forecasting for the 12th day after updating the training database with the 11th day's sensor measurements. Compute the forecasting performance metric for the 12th day.

g) Perform the temporal forecasting for the 13th day after updating the training database with the 12th day's sensor measurements. Compute the forecasting performance metric for the 13th day.

h) Perform the temporal forecasting for the 14th day after updating the training database with the 13th day's sensor measurements. Compute the forecasting performance metric for the 14th day.

JUDGING CRITERION: Report the MAPE for each day of the testing period (8th day to 14th day) as well as the average MAPE over the whole period.
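The day-by-day procedure and the MAPE criterion above amount to a walk-forward evaluation with a growing training set. The sketch below illustrates that loop only; `naive_forecast` is a placeholder that repeats the previous day and stands in for whatever model a team actually trains, and the function names and the synthetic series are assumptions made for illustration.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent (actual values must be non-zero)."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

def naive_forecast(history, horizon=24):
    """Placeholder model: repeat the last `horizon` observed values."""
    return history[-horizon:]

def walk_forward(series, train_days=7, test_days=7, per_day=24):
    """Forecast each test day from all earlier data; return the daily MAPEs."""
    daily_mape = []
    for day in range(test_days):
        split = (train_days + day) * per_day
        history = series[:split]                 # grows by one day per step
        actual = series[split:split + per_day]   # real values of the forecast day
        forecast = naive_forecast(history, per_day)
        daily_mape.append(mape(actual, forecast))
    return daily_mape

# Synthetic 14-day hourly series with an identical pattern every day (made up).
series = [1.0 + hour % 6 for hour in range(24)] * 14
daily = walk_forward(series)
average = sum(daily) / len(daily)
print(len(daily), average)
```

The same loop would be run once on the CO series and once on the temperature series, giving seven daily MAPE values and one average for each.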

For evaluation of different teams, ranking orders among the teams will be computed by the judges for each judging criterion. Note: the participants need to use temperature data independently to evaluate their algorithm.

Potential FAQs:

1) Should the code run on a mobile device?

No. The code is currently intended to be tested on a consumer-grade computer and will be benchmarked on an equivalent cloud computing instance.

2) What dataset will be used for evaluation?

The judges will be using the same dataset as provided.

3) Will the ML training time/resources be taken into consideration for the grading?

No.

4) Can we publish this work?

Yes. You are free to publish the work.

5) Who owns the IP?

You own the IP. On submission of the solution, you are contributing to IEEE a license to use your algorithm for potential app development to tackle and create awareness about air quality.

6) The value -200 in the dataset is an anomaly value. Such values can be removed from the training/testing dataset.
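Removing the -200 anomaly values could be as simple as the filter below. The row layout (dictionaries with `co` and `temperature` keys) and the function name are assumptions made for illustration, and the sample values are made up.

```python
ANOMALY = -200.0  # anomaly marker described in the FAQ above

def drop_anomalies(rows, fields=("co", "temperature")):
    """Keep only the rows in which none of the given fields holds the marker."""
    return [row for row in rows
            if all(row[field] != ANOMALY for field in fields)]

rows = [
    {"co": 1.2, "temperature": 11.3},
    {"co": -200.0, "temperature": 9.8},
    {"co": 0.9, "temperature": -200.0},
    {"co": 1.0, "temperature": 9.5},
]
clean = drop_anomalies(rows)
print(len(clean))  # 2
```

Dropping rows leaves some days with fewer than 24 points, so a forecasting loop that slices the series in fixed blocks of 24 would need to index by timestamp instead.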

References:

Similar Work: K. Thiyagarajan, S. Kodagoda, L. Van Nguyen and R. Ranasinghe, "Sensor Failure Detection and Faulty Data Accommodation Approach for Instrumented Wastewater Infrastructures," in IEEE Access, vol. 6, pp. 56562-56574, 2018, doi: 10.1109/ACCESS.2018.2872506.

Prizes: USD 3000 in total

Main Prizes
Winning team (Per Sub-Theme) (2)
USD 1,000

One winning team per sub theme will get prizes worth USD 1000.

Runner up (Per Sub-Theme) (2)
USD 500

One runner up team per sub theme will get prizes worth USD 500.
