Shell.ai

Shell.ai Hackathon

Challenge Over
Technical/ Problem Statement


Hi, where can I get the dataset? I would still like to participate although the contest is over.

 

Hey, once the challenge is over, we cannot share the dataset pertaining to this challenge. However, you can participate in HackerEarth's practice tests.

Can you share a link to the practice tests? It seems the competition is no longer accepting submissions. 

Hi  Harshal, 

I have a small doubt regarding the evaluation of solutions. The leaderboard shown to us uses the result for year 2019, and the private leaderboard will use the result for year 2020. Is it necessary that someone at 24th position on the public leaderboard will also be ranked 24th on the private leaderboard, or can the ranking improve or decline? Is there any way to see the private leaderboard?

Regards

Shahrukh

Private leaderboard can have different ranking as your solution is evaluated on year 2020. Having said that, if you have a good solution for 2019 (public leaderboard), you would also have a good solution for year 2020 (private leaderboard) as well since the nature of the problem is exactly same. Private leaderboard will be made available to view after the Hackathon ends.

Hi Harshal,

We have observed 30 demand points with zero demand for all previous years (2010-2018). Would you like us to continue this for 2019 and 2020 with zero values, or should we forecast them using our own logic so that we have clean data for forecasting?

Hi Mohammed, as a moderator I can't comment on this. You can use your best judgement and progress.

For the leaderboard submission (upload prediction file), we can upload only 1 file, and the sample has only SCS, FCS and DS_ij. What about the demand D_i values for 2019? We need to upload those as well, right? Should we do it in 2 sheets of the same Excel file, or do we need to submit without the D_i 2019 values? Kindly advise.

Hi Mohammed, D_i values are calculated by evaluation script using constraint No. 6.

hello @harshil patel ,

I had a doubt: in the DS matrix, is the value of supply the sum of SCS and FCS, or is it Smax itself?

Hi Vedang, Can you please elaborate your question? Apologies for not understanding.

Thanks Harshil, I have got the answer. Actually I had asked the wrong question, sorry for that.

welcome Vedang :)

Hello, I am currently encountering an issue where I am constantly getting a constraint 5 violation. I have checked that the total demand to each supply point is either equal or less than the available supply. Has anyone else encountered this, if so any suggestions for fixes?

As a moderator, I can advise you to check the definition of constraint 5 again. Specifically: 1) how Summation is happening in D_ij matrix and 2) How Smax_j is calculated. More details in the problem statement. If your problem still persists, you can mail your solution file to HackerEarth and we will examine it.

Hi Johnny, one more point: ensure that constraint 5 is followed by solutions of both year 2019 and 2020 independently. 

Hi Harshil, many thanks for your suggestions! I have ensured that the total demand (across all 4096 demand points) routed to each supply point does not exceed the available supply given by 200*SCS+400*FCS, and that this holds for both 2019 and 2020. This still results in a constraint 5 violation. I have, however, reduced the total demand to each supply point so that there is a minuscule excess of supply (of the order of 1E-5), and HackerEarth accepts my solution. This does not seem methodical or robust, so I have emailed support@hackerearth.com with my solution file for further checks on your end. Does the site only handle floats up to a certain number of significant figures, which might be contributing to a rounding error on your end?
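
A local check along the lines described above can be sketched as follows. It assumes, as stated in the post, that the supply available at point j is 200*SCS_j + 400*FCS_j and that constraint 5 compares it with the sum of DS over all demand points. The function name, the toy numbers and the 1e-9 margin are illustrative assumptions; the precision used by the actual evaluation script is not stated in this thread.

```python
import math

def constraint5_violations(ds, scs, fcs, margin=0.0):
    """Return the supply indices j whose routed demand exceeds
    200*SCS_j + 400*FCS_j minus an optional safety margin."""
    violations = []
    for j in range(len(scs)):
        smax_j = 200 * scs[j] + 400 * fcs[j]
        # math.fsum limits the accumulation error when summing many floats
        routed = math.fsum(row[j] for row in ds)
        if routed > smax_j - margin:
            violations.append(j)
    return violations

# Toy instance: 3 demand points, 2 supply points (hypothetical numbers).
scs = [1, 0]  # slow charging stations per supply point
fcs = [0, 1]  # fast charging stations per supply point
ds = [
    [100.0, 150.0],
    [60.0, 150.0],
    [40.0, 100.0],
]

print(constraint5_violations(ds, scs, fcs))               # []
print(constraint5_violations(ds, scs, fcs, margin=1e-9))  # [0, 1]
```

Both supply points are used exactly at capacity here, so the strict check passes while the check with a margin flags them. Solutions that sit exactly on the bound are the ones most exposed to rounding differences between a local check and a remote evaluator.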

Preferential Bias towards Underestimated Demands

Dear Organizers, 

I believe the EV problem as constructed is "biased" towards the underestimation of demand. Please allow me to elaborate. 

Let us assume that the actual demand for the year 2019 is D19_true. We have two hypothetical forecasts D19_over (overestimation) and D19_under (underestimation). Further assume: MAE(D19_true - D19_over) = MAE(D19_true - D19_under). In other words, Cost_DM is identical for both the demand predictions. 

Now, in the case of D19_over scenario, one will need more "supply" to balance the excess demand. That means Cost_IF_over will always be higher than the Cost_IF_under case. Moreover, due to more demand, one needs to transport more "materials" to achieve optimal transport. Thus, Cost_CD_over will also be more than Cost_CD_under. 

Thus, the overestimation of demand leads to more overall cost. In other words, identical Cost_DM can lead to significantly different "scores" due to over/under estimation; underestimation scenario will always lead to "better" scores. I would like to emphasize that I am assuming Cost_DM is the same for under and over estimated cases. 

From a business perspective, underestimation case is more lucrative. However, overestimation is better from a consumer point of view. 

I believe you can make the playing field level, if you use asymmetric cost for Cost_DM. You can penalize underestimation more than overestimation. Since you have access to the true demand, you can figure out the exact penalty. Changing Cost_DM does not impact the participants' coding. 

Best regards,

Sukanta
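
The argument in the post above can be illustrated with a small sketch. The symmetric function mirrors the MAE-style Cost_DM described in the post; the asymmetric variant and its weights (2.0 for under-forecasting, 1.0 for over-forecasting) are hypothetical values chosen only for illustration, not something the organizers defined.

```python
def mae(d_true, d_forecast):
    """Symmetric mean absolute error between true and forecast demand."""
    return sum(abs(t - f) for t, f in zip(d_true, d_forecast)) / len(d_true)

def asymmetric_mae(d_true, d_forecast, w_under=2.0, w_over=1.0):
    """Mean absolute error with separate (hypothetical) weights for
    under-forecasting and over-forecasting."""
    total = 0.0
    for t, f in zip(d_true, d_forecast):
        err = f - t
        total += w_over * err if err >= 0 else w_under * (-err)
    return total / len(d_true)

d_true = [10.0, 20.0, 30.0]
d_over = [12.0, 22.0, 32.0]   # overestimates every point by 2
d_under = [8.0, 18.0, 28.0]   # underestimates every point by 2

print(mae(d_true, d_over), mae(d_true, d_under))                        # 2.0 2.0
print(asymmetric_mae(d_true, d_over), asymmetric_mae(d_true, d_under))  # 2.0 4.0
```

With the symmetric cost the two forecasts are indistinguishable, while the asymmetric cost separates them.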

Wow what a finding. Agree

Hi Sukanta, great observation! I must appreciate your understanding of the problem. To be honest, this is also a real dilemma for the EV charging business: whether to maximize the profit or to maximize the margins. In the demand underestimation case, your designed EV infrastructure will be fully utilized at all times and you will maximize margins. But in this case, some of your customers will be dissatisfied and you may lose out on potential profits (albeit at lower margins, had you installed some more capacity). In the overestimation case, your EV infrastructure may be somewhat underutilized, but your customers will be happy and there is an opportunity to maximize profits at slightly lower margins. So the real metric to focus on here is Net Present Value (NPV), along with margins and profits. However, it is outside the scope of the current competition. For this competition, let's play by the already set rules. Once again, a great observation!

Hello. I have a question about Cost of Customer Dissatisfaction. You said that this cost is calculated by multiplying the distance matrix with the demand-supply matrix. Here, Distance Matrix is the distances from EVERY demand point to EVERY supply point. Why should demand(i) care about every supply(j) in the region? As long as there are supply(j) points close to demand(i) and have enough capacity, there's no need to calculate distance and dissatisfaction to ALL supply(j) points in the entire region. Please tell me why you chose to define objective function this way. 

'Close to' is a subjective term. We don't know how close these supply points are to the demand points unless we develop a solution satisfying all constraints. So, to convert the subjective 'close to' into an objective value, we have added a cost of customer dissatisfaction. Think about this: if we don't have this cost, your optimization algorithm has no incentive to obtain a solution where these supply points are indeed 'close to' the demand points.
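
A minimal sketch of this idea, assuming the cost is proportional to the sum over all (i, j) pairs of distance_ij * DS_ij (the exact scaling in the problem statement is not reproduced in this thread). The function name and the numbers are illustrative.

```python
def dissatisfaction_cost(dist, ds):
    """Sum of distance_ij * DS_ij over all demand points i and supply points j."""
    return sum(
        d * q
        for dist_row, ds_row in zip(dist, ds)
        for d, q in zip(dist_row, ds_row)
    )

# 2 demand points x 2 supply points (hypothetical distances).
dist = [
    [1.0, 5.0],
    [4.0, 2.0],
]
near = [[10.0, 0.0], [0.0, 10.0]]  # each demand served by its nearest supply
far = [[0.0, 10.0], [10.0, 0.0]]   # each demand served by the farther supply

print(dissatisfaction_cost(dist, near))  # 30.0
print(dissatisfaction_cost(dist, far))   # 90.0
```

Pairs with DS_ij = 0 contribute nothing, so a demand point only "cares" about the supply points that actually serve it, and serving it from nearby points gives the lower cost.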

Hi All,

I am unable to find the datasets that are to be provided. Please help me with the navigation for these datasets

You will need to 1) register for this challenge, 2) create/join a team and 3) start the challenge. You will then be redirected to the data/submission/leaderboard page, from where you can download the data.

For calculating the cost, kindly help us understand the Cost of Demand Mismatch: how will we access Dtrue,i to calculate it?

D_History is given to you from year 2012 to 2018. D_True for year 2019 and 2020 is not given (hidden from you). You have to forecast Demand (D_Forecast) for year 2019 and 2020 as close as possible to D_True. You actually can't compute the Cost of Demand Mismatch on your side, as D_True is hidden from you. We do it when you submit your solution.

thanks, Harshil, for nicely explaining.

Harshil, D_Forecast is not a part of the solution submission (you are only asking for DS, SCS and FCS from us). So how will you even compute the Cost of Demand Mismatch?

I guess D_Forecast is derived from the DS matrix which we submit. So, for example, for a demand point 'i' it will be the sum over j of DS[i][j].

As mentioned by Kirushikesh DB, D_Forecast_i is computed using DS_ij using constraint 6 by us.
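
The relationship described in the two replies above can be sketched as a row sum: the forecast demand at point i is recovered by adding up DS[i][j] over all supply points j. The function name and the toy matrix are illustrative assumptions.

```python
def forecast_from_ds(ds):
    """Recover D_Forecast_i as the sum over j of DS[i][j] for each demand point i."""
    return [sum(row) for row in ds]

# 2 demand points x 3 supply points (hypothetical values).
ds = [
    [3.0, 1.5, 0.0],
    [0.0, 2.0, 2.5],
]

print(forecast_from_ds(ds))  # [4.5, 4.5]
```

This is why no separate D_i column is needed in the submission: the DS matrix already encodes the forecast.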

What float precision is used to evaluate the DS Matrix constraints? 

Can you please inform us which constraint are you referring to?

Hi Harshal,

Thank you for the explanation. I would request a small clarification.

Di - EV Charging demand at i th point.

DMij - Demand Supply Matrix (it says how much demand of each demand point is supplied by each supply point).

Here Di is understood well with values from the provided file,  Demand_ History.csv

And understood DMij is the Demand supply matrix we need to create ourselves.

THE QUESTION IS: WHAT VALUES (NUMBERS) WILL DMij HAVE? CAN YOU GIVE AN EXAMPLE SPECIFYING ONE i POINT AND ONE j POINT WITH NUMBERS, SO WE CAN UNDERSTAND IT WELL?

This explanation will really help us to completely understand the entire concept.

 

Hi Mohammed, Please go through the webinar video (Q&A section at the end) where we have addressed this question. Also, suggest you to revisit last two constraints to understand it better.

Thank you Harshil for the advice. Let's say that for some i-th point the demand is Di = D0 = 13.1195, and there are 100 parking slots through which the demand can be met. We know the nearby parking slots will be responsible for meeting most of this demand, and for far parking slots the values will be zero. Can you give one example in detail of how this is done, considering the data given for 2018 for Di and the parking slots (number of FCS and SCS)? This is to understand it well.

 

 

Technically: Demand at i_th demand point can be satisfied by ANY j_th supply point. Ideally: Satisfy the demand by NEARBY supply point given that all constraints are satisfied. Factually: demand will be AUTOMATICALLY satisfied by nearby points if you minimize the cost function. That's how the cost function is formulated.
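
To make the shape of one DS row concrete, here is a toy sketch using the demand value 13.1195 from the question above. The distances, the capacities and the nearest-first rule are assumptions made purely for illustration; this is not the organizers' method, and a real solution comes from optimizing over all demand and supply points jointly.

```python
def allocate_nearest_first(demand, distances, capacity):
    """Fill one DS row by serving the demand from the nearest supply
    points first, limited by each supply point's remaining capacity."""
    row = [0.0] * len(distances)
    remaining = demand
    for j in sorted(range(len(distances)), key=lambda k: distances[k]):
        take = min(remaining, capacity[j])
        row[j] = take
        remaining -= take
        if remaining <= 0:
            break
    return row

# One demand point, three candidate supply points (hypothetical numbers).
distances = [7.0, 2.0, 4.0]
capacity = [200.0, 10.0, 200.0]
row = allocate_nearest_first(13.1195, distances, capacity)

# Roughly [0.0, 10.0, 3.1195]: the nearest point is filled to its capacity,
# the next nearest takes the rest, and the farthest gets zero.
print(row)
```

The row sums back to the demand of that point, and the entries for far supply points are zero, which is the pattern the reply above describes.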

Hello everyone, 

My team and I are facing issues predicting the demand values for 2019 and 2020 because of the very small number of rows given (i.e. 2010 to 2018). I suspect we are using the wrong approach, and I would appreciate it if anyone could suggest a data preparation method that works.

Hi Yusuf, As a moderator I would refrain from answering this question to be fair to everyone. Fellow participants, please help Yusuf.

Hello Yusuf, try to use some time series forecasting method or other machine learning algorithms. I hope these 9 points are enough to predict the demand for 2019 and 2020, because I got a good result using one of those.
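
As one possible starting point (not necessarily the method used by the poster above), a per-point linear trend can be fitted by ordinary least squares and extrapolated. The sketch assumes nine yearly values for 2010 to 2018, as mentioned in the question; the history values are made up, and clamping negative forecasts to zero is an added assumption.

```python
def linear_trend_forecast(years, values, target_year):
    """Fit value = intercept + slope * year by least squares and
    extrapolate to target_year, clamping negative results to zero."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return max(0.0, intercept + slope * target_year)

years = list(range(2010, 2019))              # 2010..2018, nine points
history = [2.0 + 0.5 * k for k in range(9)]  # hypothetical demand at one point

print(linear_trend_forecast(years, history, 2019))  # 6.5
print(linear_trend_forecast(years, history, 2020))  # 7.0
```

The same function can be applied independently to each demand point; with so few observations, simple models like this are often a reasonable baseline to compare against.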

Thank you Kirushikesh 

Hi, the second cost function is the mismatch between predicted and true demand forecasts. So without the true forecasts available, how shall we take that cost function into consideration for computing the overall cost for years 2019 and 2020? 


D_History is given to you from year 2012 to 2018. D_True for year 2019 and 2020 is not given (hidden from you). You have to forecast Demand (D_Forecast) for year 2019 and 2020 as close as possible to D_True.

Hello Harshil, so how can we include "Cost of Demand Mismatch" in our cost function for optimizing our model?

Hi Deep, it's totally up to you how you want to formulate your problem mathematically. I can't comment on the specifics.

hi Harshil

I find in the sample submission file that all demand point index values for data types SCS and FCS are NULL. Does it always need to be like that, or should we insert some values in future submissions?

Hi Kunisetty, yes. SCS and FCS are quantities related to the supply point, so the demand point index will be disregarded for them while evaluating your submission. It is best to keep the demand point index empty/blank for SCS and FCS.

Hi, 

In your evaluation script, how is the distance function defined? Is it Euclidean (L2) or taxicab (L1)?

Hi Sukanta, we have considered direct/Euclidean distance. Great question, and thanks for clarifying it for other participants as well.
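
For reference, a distance matrix using the Euclidean (L2) definition confirmed above could be built as in this sketch. The coordinates and function names are hypothetical, and the taxicab (L1) function is included only for contrast.

```python
import math

def distance_matrix(demand_xy, supply_xy):
    """Euclidean distance from every demand point to every supply point."""
    return [
        [math.hypot(dx - sx, dy - sy) for (sx, sy) in supply_xy]
        for (dx, dy) in demand_xy
    ]

def taxicab(p, q):
    """L1 distance, shown only to contrast with the Euclidean definition."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

demand_xy = [(0.0, 0.0), (3.0, 4.0)]  # hypothetical demand coordinates
supply_xy = [(3.0, 4.0), (0.0, 4.0)]  # hypothetical supply coordinates

print(distance_matrix(demand_xy, supply_xy))  # [[5.0, 4.0], [0.0, 3.0]]
print(taxicab(demand_xy[0], supply_xy[0]))    # 7.0
```

For the full problem this would give a matrix with one row per demand point and one column per supply point.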

Hello. Please, what evaluation script are you referring to?

hey... I am unable to find the dataset mentioned in the problem_statement.pdf

can anyone help me out??

You will need to 1) register for this challenge, 2) create/join a team and 3) start the challenge. You will then be redirected to the data/submission/leaderboard page, from where you can download the data.

How will we get the Dtrue value? Is it given?

D_History is given to you from year 2012 to 2018. D_True for year 2019 and 2020 is not given. You have to forecast Demand (D_Forecast) for year 2019 and 2020 as close as possible to D_True.

Hi guys. I think I understand the problem. Does anyone know what class of problems this challenge belongs to. I mean what kind of algorithms we use to solve this kind of problem. I know (basic) graph theory algorithms, machine learning models (Random forest, SVM, ...), but I can't match my algorithms to this problem.

Hi dimbos1997, this is a constrained optimization problem. You can read more on it here: https://en.wikipedia.org/wiki/Constrained_optimization

So this is not a machine learning problem. Please change the title in HackerEarth website. 

Please solve the following queries

  1. What does DS represent and how can it be calculated?
  2. Do supply_point_index and demand_point_index represent the same thing? If not, how do we decide the correlation between demand and supply?

Hi Rajan,

I will answer your second question first.

2) A supply point is a (public/private parking) location where more EV charging stations can potentially be installed to cater to the demand that increases year on year. There are a total of 100 supply points for this problem, whose indices and coordinates are already given. A demand point is the center (representative) point of a grid cell where the demand of that grid area is aggregated. We have considered 64x64 (4096 in total) demand points for this problem. The index, coordinates and historic demand of each demand point are also shared with you.

1) Consumers from the i_th demand point can satisfy their EV charging demand from any j_th supply location. So, the DS matrix represents how much of the demand from the i_th demand point is catered for by the j_th supply point. Naturally, it would be a matrix of size 4096x100.

How is DS calculated? 

What is DS_ij?

Consumers from the i_th demand point can satisfy their EV charging demand from any j_th supply location. So, the DS_ij matrix represents how much of the demand from the i_th demand point is catered for by the j_th supply point. Naturally, it would be a matrix of size 4096x100.

How is DS calculated?

DS is the result of the optimization problem that you are supposed to formulate and solve.
