Bhishan Poudel, Ph.D. Candidate

Data Scientist

LinkedIn GitHub Twitter Stack Overflow



NOTE:
1. Please click on the Project name (blue line) to expand the project details. (Click again to hide.)
2. The GitHub repositories are private and accessible only to the author, but the HTML pages on this website are all public.

Section A: Regression

A01: King County Seattle House Price Prediction (Regression)

GitHub

README

Statistics Report

a01 Data Processing

a02 Data processing Script

a03 Regression Statistics

a04 Regression EDA

a05 Regression EDA: bokeh

a06 Regression EDA: plotly

a07 Regression EDA: pixiedust

a08 Regression EDA: pandas profiling

b01 Regression Modelling (Boosting): Hist Gradient Boosting

b02 Regression Modelling (Boosting): XGBoost

b03 Regression Modelling (Boosting): LightGBM

b04 Regression Modelling (Boosting): CatBoost

e01 Regression Modelling (Ensemble): Stacking and Blending

m01 Regression Modelling (sklearn): linear and polynomial regression

m02 Regression Modelling (sklearn): sklearn methods

m03 Regression Modelling (sklearn): Random Forest

m04 Regression Modelling (statsmodels): linear OLS

s01 Regression Modelling (Special): pycaret

s02 Feature Engineering (Featuretools): XGBoost

s03 Feature Engineering (Featuretools): LightGBM

s04 Feature Engineering (Featuretools): CatBoost

w01 Model Interpretation: Yellowbrick, Lime, Eli5

w02 Model Interpretation: What If Tool (WIT)

w03 Model Interpretation: Dalex

w04 Model Interpretation: Dtreeviz

x01 Big Data Analysis: PySpark

x02 Big Data Analysis: PySpark Random Forest Tuning

y01 Deep Learning: Keras

z01 Best Model: CatBoost

z02 Best Model: XGBoost

z03 Best Model: Overall

A02: Allstate Insurance (Insurance: Regression)

GitHub

README

a01 Exploratory Data Analysis

a02 Data Processing

b01 Modelling

b02 Modelling: PySpark


Section B: Classification

BX.01: Fraud Detection (Binary Classification)

GitHub

README

a01 Classification EDA

a02 Classification Statistics

b01 Classification Modelling (Boosting): XGBoost

b02 Classification Modelling (Boosting): LightGBM

b03 Classification Modelling (Boosting): CatBoost

e01 Classification Modelling (Ensemble): Stacking

m01 Classification Modelling (sklearn): Undersampling

m02 Classification Modelling (sklearn): Logistic Regression SMOTE

m03 Classification Modelling (sklearn): Decision Tree

m04 Classification Modelling (sklearn): Calibrated Classification

b05 Classification Modelling (sklearn): Isolation Forest and LOF

s01 Classification Modelling (Special): pycaret (lda)

s02 Classification Modelling (Special): evalML

x01 Classification Modelling (Big Data): dask

x02 Classification Modelling (Big Data): vaex

x03 Classification Modelling (Big Data): PySpark

y01 Classification Modelling (Deep Learning): keras large model

y02 Classification Modelling (Deep Learning): keras simple model

y03 Classification Modelling (Deep Learning): keras oversampling

y04 Classification Modelling (Deep Learning): keras classifier sklearn api

BX.02: Customer Churn (Binary Classification)

GitHub

README

a01 Exploratory Data Analysis

a01 Exploratory Data Analysis (Plotly)

a02 Customer Churn: Data Processing

bx01 Modelling (Boosting): XGBoost with HyperbandCV

bx02 Modelling (Boosting): XGBoost with Bayes Optimization

bl01 Modelling (Boosting): LightGBM Classifier with sklearn pipeline and HyperbandCV

bl02 Modelling (Boosting): LightGBM Classifier with Optuna HPO

bl03 Modelling (Boosting): LightGBM Classifier with Hyperopt HPO

bc01 Modelling (Boosting): CatBoostClassifier with optuna hyperparameter tuning

ml01 Modelling (Sklearn): LogisticRegression

ml02 Modelling (Sklearn): LogisticRegressionCV

splr01 Modelling (Special): (Pycaret) Logistic Regression

spn01 Modelling (Special): (Pycaret) Naive Bayes

spx01 Modelling (Special): (Pycaret) XGBoost

spdla01 Modelling (Special): (Pycaret) Linear Discriminant Analysis

sflr01 Modelling (Special): (featuretools) Logistic Regression

se01 Modelling (Special): (evalml) Built-in Algorithm

w01 Model Interpretation: (What If Tool) Logistic Regression

wbl Model Interpretation: (LOFO) Logistic Regression

w01 Model Interpretation: (Interpret) Builtin Estimators Logistic Regression and Boosting

y01 Deep Learning: (Keras) Sequential Simple Model

BX.03: Porto Seguro Auto Insurance (Binary Classification)

GitHub

README

a01 Exploratory Data Analysis

a02 Modelling: LightGBM

a03 Modelling: XGBoost

a04 Modelling: Keras Entity Embedding

a05 Modelling: Stacking different Models

a06 Feature Selection: Boruta and Target Permutation

BX.04: Breast Cancer Wisconsin (Binary Classification)

GitHub

README

a01 Exploratory Data Analysis

b01 Modelling: (Boosting) XGBoost

y01 Deep Learning: Keras Sequential with class_weight

y02 Deep Learning: Keras Sequential

BY.01: Prudential Insurance (Multiclass Classification)

GitHub

README

a01 Exploratory Data Analysis

a02 Multiclass Classification Statistics

a03 Data Preprocessing

a04 Data Preprocessing Script

b01 Modelling: Linear Regression

b02 Modelling: RF Classifier

b03 Modelling: RF Classifier AUC ROC

b04 Modelling: XGBoost Multiclass Classification

b05 Modelling: XGBoost Linear Regression and Poisson Regression with Offset

c01 Multiclass Model Interpretation: eli5, shap and pdpbox


Section C: Timeseries Analysis

C01: Timeseries Analysis for Web Traffic Data

GitHub

README

a01 Data Processing

b01 Timeseries visualization and EDA

c01 Timeseries statistics

d01 Timeseries modelling: ARIMA

d02 Timeseries modelling: VAR

e01 Timeseries modelling: sklearn

f01 Timeseries modelling: tsfresh and xgboost

g01 Timeseries modelling: fbprophet

g02 Timeseries modelling: fbprophet holidays

h01 Timeseries modelling: deep learning


Section D: Natural Language Processing (NLP)

D01: Twitter Sentiment Analysis (Analytics Vidhya Hackathon: Identify the Sentiment)

GitHub

a00 README

a01 Text Data Processing

a02 Text Data EDA

a03 Scattertext for positive and negative sentiments

a03b Result: Twitter Sentiment HTML

b01 Text Data Modelling: BoW + Word2Vec + TF-IDF

b02 Text Data Modelling: TF-IDF + Logistic Regression

c01 Sentiment Analysis: ktrain

c01 Sentiment Analysis: ktrain, neptune

c01 Sentiment Analysis: ktrain, neptune HPO

c02 Sentiment Analysis: simpletransformers + RoBERTa

d01 Sentiment Analysis: (keras) LSTM

d02 Sentiment Analysis: (keras) GRU, CNN, LSTM

e01 Sentiment Analysis: (transformers) Small data with torch and distilbert

e02 Sentiment Analysis: (transformers) Full data with keras and distilbert

e03 Sentiment Analysis: BERT and Tensorflow

e03 Sentiment Analysis: BERT, Tensorflow, and Neptune

D02: Toxic Comments (Multilabel Classification)

GitHub

README

a01 Text Data Processing

a02 Text Data EDA

a03 Text Data EDA: Plotly

m01 Text Data Binary Classification (toxic or not)

s01 Text data modelling: spacy

y01 Deep Learning: GRU and Fasttext

y01b Deep Learning: GRU, Fasttext, Badwords

y02 Deep Learning: Transformers PyTorch BERT

y02b Deep Learning: Transformers PyTorch XLNet

y02c Deep Learning: Transformers PyTorch DistilBERT

y03 Bert Client: XGBoost

y03b Bert Client: Keras Sequential

D03: Consumer Complaints (Multiclass Classification)

GitHub

README

a01 Text Processing

a02 EDA for Text Data

b01 Text Data Modelling: TF-IDF and Sklearn Classifiers

b02 Text Data Modelling: LinearSVC

c01 Model Evaluation: Yellowbrick

c02 Model Evaluation: scikit-plot

d01 Text Data Modelling: PySpark

e01 Text Data Modelling: simpletransformers


Section E: Insurance Data Modelling

E01: French Motor Claims (Pure Premium Modelling)

GitHub

README

a01 Data Cleaning

b01 Frequency Modelling (Poisson Regressor)

b02 Severity Modelling (Gamma Regressor)

b03 Pure Premium Modelling (Tweedie Regressor)

b04 Tweedie Model vs FrequencySeverity Model

b05 Lorenz Curves Comparison

c01 XGBoost with Tweedie Regression

d01 GAM Linearized Modelling using pyGAM


Section F: Financial Data Analysis

F01: Credit Risk (Banking: Financial Modelling (Scorecard))

GitHub

README

a01 EDA for Credit Risk Data

a02 Data Processing

b01 Risk Modelling: PDModel Gini KS CreditScore Scorecard



Chapter 2: SQL

2.01: SQL Queries for Hospital Management

GitHub

README

a01 SQL Queries using PostgreSQL, SQLAlchemy and pandas

a02 SQL Queries using sqlite3 and pandas

a03 Using Pandas only



Chapter 3: Business Projects

3.01: Spanish Translation A/B Testing

GitHub

README

a01 Spanish Translation A/B Testing with Extensive EDA and Statistical Tests




