Jarvis is my attempt at making data analysis scripts easier to write, read, and maintain.

The first stage is to design a good set of APIs. Here is what we have so far.

These APIs fall into the following categories:

  1. I/O related: read
  2. EDA related: peek, scatter, dist, log_transform
  3. Feature engineering related: concat, check_missing, corr, corr2, get_numerical_feats, get_cat_feats, plot_feats_target_corr, plot_cat_target_corr, fillna_group_mean, label_encode, get_skewness, remove_skew_coxbox
  4. Model related: rmsle_cv, lasso, ENet, KRR, GBoost, model_xgb, model_lgb, AveragingModels, StackingAveragedModels, rmsle, get_best_score
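
To give a feel for what helpers in these categories might look like, here is a minimal sketch of two of them, `check_missing` and `rmsle`. The signatures and behavior below are assumptions made for illustration, not the actual Jarvis implementation: the real functions most likely operate on pandas DataFrames, and the real `rmsle` may expect targets that are already log-transformed. The sketch uses plain Python so it runs without dependencies, with a table represented as a dict mapping column names to lists of values.

```python
import math


def check_missing(columns):
    """Hypothetical sketch: report the fraction of missing values per column.

    `columns` maps column name -> list of values, with None marking a
    missing value. Only columns with at least one missing value are
    returned, ordered from most to least missing.
    """
    ratios = {}
    for name, values in columns.items():
        missing = sum(1 for v in values if v is None)
        if missing:
            ratios[name] = missing / len(values)
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


def rmsle(y_true, y_pred):
    """Hypothetical sketch: root mean squared logarithmic error.

    Uses the standard definition sqrt(mean((log1p(pred) - log1p(true))^2)).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    squared = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data, made up for this example.
table = {
    "a": [1, None, 3, None],
    "b": [1, 2, 3, 4],
    "c": ["x", None, "y", "z"],
}
print(check_missing(table))          # {'a': 0.5, 'c': 0.25}
print(rmsle([1, 2, 3], [1, 2, 3]))   # 0.0
```

The point of small, single-purpose helpers like these is that an analysis script reads as a sequence of named steps rather than as inline data-wrangling code.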

Rather than being designed from first principles, these APIs and their implementations emerged from users’ contributions on Kaggle. Even so, a strong pattern is already visible, and their reusability is improving day by day.

So far I’ve done two case studies using this library. Let’s see how they go.