Feature Engineering + H2O Gradient Boosting (GBM) in R Scores 0.936

March 23, 2017
1 minute

With less than 3 days to go, this script is meant to give beginners some feisty ideas, a machine learning workflow, and motivation for the ongoing machine learning challenge.

Here’s a quick workflow of what I’ve done below:

  1. Load data and explore
  2. Data Pre-processing
  3. Dropped Features
  4. One Hot Encoding
  5. Feature Engineering
  6. Model Training
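
To make steps 2 to 5 concrete, here is a minimal base R sketch on toy data. It is not the script from this post: the data is simulated, and every column name except loan_status (member_id, loan_amnt, annual_inc, grade) is a hypothetical stand-in. The median imputation and the loan-to-income ratio are just example choices.

```r
# Toy data standing in for the challenge data.
# All column names except loan_status are hypothetical.
set.seed(42)
n <- 200
loans <- data.frame(
  member_id   = seq_len(n),                        # identifier, no predictive value
  loan_amnt   = round(runif(n, 1000, 35000)),
  annual_inc  = round(runif(n, 20000, 150000)),
  grade       = factor(sample(c("A", "B", "C"), n, replace = TRUE),
                       levels = c("A", "B", "C")),
  loan_status = rbinom(n, 1, 0.25)
)
loans$annual_inc[sample(n, 10)] <- NA              # simulate missing values

# Step 2: pre-processing, here a simple median imputation
loans$annual_inc[is.na(loans$annual_inc)] <-
  median(loans$annual_inc, na.rm = TRUE)

# Step 3: drop features that carry no signal
loans$member_id <- NULL

# Step 4: one-hot encode the categorical column.
# "- 1" removes the intercept so every level gets its own indicator column.
grade_dummies <- model.matrix(~ grade - 1, data = loans)
loans <- cbind(loans[, setdiff(names(loans), "grade")], grade_dummies)

# Step 5: feature engineering, e.g. a loan-to-income ratio
loans$loan_to_income <- loans$loan_amnt / loans$annual_inc

str(loans)
```

Tree-based models such as GBM can often handle factor columns directly, so whether one-hot encoding helps is worth checking on a validation set rather than assuming.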

Good Luck!

Note: For more feature engineering ideas, spend time exploring how each feature relates to the loan_status variable. For categorical vs. categorical data, create dodged bar plots. For categorical vs. continuous data, create density plots and use fill=as.factor(loan_status).
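
The two plot types in the note could be drawn roughly like this with ggplot2. This is a sketch on simulated data: grade and int_rate are hypothetical column names used only for illustration, and only loan_status and the fill=as.factor(loan_status) mapping come from the note itself.

```r
library(ggplot2)

# Simulated data; grade and int_rate are hypothetical columns.
set.seed(1)
n <- 300
loans <- data.frame(
  grade       = sample(c("A", "B", "C"), n, replace = TRUE),
  int_rate    = rnorm(n, mean = 13, sd = 4),
  loan_status = rbinom(n, 1, 0.3)
)

# Categorical vs categorical: dodged bar plot of counts per grade,
# with one bar per loan_status value placed side by side.
p_bar <- ggplot(loans, aes(x = grade, fill = as.factor(loan_status))) +
  geom_bar(position = "dodge") +
  labs(fill = "loan_status")

# Categorical vs continuous: overlaid density curves of int_rate,
# one per loan_status value; alpha keeps both curves visible.
p_density <- ggplot(loans, aes(x = int_rate, fill = as.factor(loan_status))) +
  geom_density(alpha = 0.5) +
  labs(fill = "loan_status")

print(p_bar)
print(p_density)
```

If the bars or curves for the two loan_status values look clearly different for a feature, that feature (or a transformation of it) is a reasonable candidate for feature engineering.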

To help the community, feel free to contribute the equivalent Python / C++ script in the comments below.

Update: You can get the Python script for this solution from Jin Cong Ho’s comment below.


Script (R)
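
As a rough illustration of the model training step, the sketch below fits an H2O GBM on simulated data. It is not the original script and will not reproduce the 0.936 score: the column names other than loan_status, the 80/20 split, and the hyperparameter values are all assumptions, and AUC is used only as an assumed evaluation metric. It needs the h2o package and a Java runtime.

```r
library(h2o)

h2o.init(nthreads = -1)   # start or connect to a local H2O cluster (needs Java)
h2o.no_progress()         # silence progress bars

# Simulated data; all columns except loan_status are hypothetical.
set.seed(7)
n <- 500
loans <- data.frame(
  loan_amnt = runif(n, 1000, 35000),
  int_rate  = rnorm(n, mean = 13, sd = 4),
  grade     = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
# Make the target depend weakly on int_rate; a factor target tells H2O
# to treat this as classification rather than regression.
loans$loan_status <- factor(rbinom(n, 1, plogis((loans$int_rate - 13) / 4)))

# Move the data into H2O and hold out roughly 20% for validation
hf     <- as.h2o(loans)
splits <- h2o.splitFrame(hf, ratios = 0.8, seed = 7)
train  <- splits[[1]]
valid  <- splits[[2]]

# Fit the GBM; hyperparameter values here are illustrative, not tuned
gbm <- h2o.gbm(
  x                = setdiff(names(loans), "loan_status"),
  y                = "loan_status",
  training_frame   = train,
  validation_frame = valid,
  ntrees           = 50,
  max_depth        = 3,
  learn_rate       = 0.1,
  seed             = 7
)

valid_auc <- h2o.auc(gbm, valid = TRUE)          # AUC on the validation frame
preds     <- as.data.frame(h2o.predict(gbm, valid))
head(preds)                                      # predicted class and class probabilities
```

Call h2o.shutdown(prompt = FALSE) when finished to stop the local cluster.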


Resources – Handy Algorithms for this Challenge

About the Author

Making an effort to help people understand machine learning. I believe your educational background shouldn't stop you from pursuing ML and data science. Earned a Master's in F/M; a self-taught data science professional. Previously worked at Analytics Vidhya. Now solving ML and growth challenges at HackerEarth!
