I hold a B.Tech in Computer Science from Glocal University, Saharanpur. I’m eager to learn and I love solving technical problems. My main areas of interest are data structures, algorithms, mathematics, and real-world data science problems.
I have good experience with Machine Learning, Deep Learning and NLP. I have worked with scikit-learn, XGBoost and Keras to solve real-world classification, regression and clustering problems using Naive Bayes, Logistic Regression, SVM, SVD, MLP and other techniques. I have posted my Data Science projects here.
I completed my Summer 2015 training in Java at Hewlett-Packard Education Services (HPES), Noida, where I secured an 'A' grade.
For my Final Year Project, I trained different machine learning algorithms on a dataset and compared the accuracy of each model. I have also set up a multi-cluster environment for data processing at my university using Apache Hadoop. I have some knowledge of Apache Hadoop and Pig.
Built a recommendation system for apparel that returns the most similar products, using an Amazon dataset. Techniques used: TF-IDF-W2Vec, VGG-16.
Predicted movies for users with the Surprise library and SVD. Trained several models and used the output of each model as a feature for the next, with XGBoost as the final model, to reduce RMSE.
Performed exploratory data analysis on cancer data from Kaggle, ran various models, plotted confusion matrices, and returned probability scores to make the models interpretable. Techniques used: TF-IDF, Naive Bayes, Logistic Regression, SVM, Random Forest.
Predicted whether a question from the Quora dataset is a duplicate, and reduced the loss through hyperparameter tuning. Techniques used: Avg-W2Vec, Logistic Regression, Linear SVM and XGBoost.
Predicted the pickup density of cabs at a given time and location in New York City using simple models such as Rolling Window and its variants, regression models such as Random Forest and XGBoost, and time series forecasting with Fourier features.
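As a rough illustration of the rolling-window idea, here is a minimal sketch that forecasts each time bin as the mean of the previous few bins. The function name, the window size and the pickup counts are made up for this example; it is not the project's actual code.

```python
from collections import deque


def rolling_mean_forecast(counts, window=3):
    """Forecast each time bin as the mean of the previous `window` bins."""
    history = deque(maxlen=window)  # keeps only the most recent bins
    forecasts = []
    for count in counts:
        # Use past bins only; fall back to 0.0 when there is no history yet.
        forecasts.append(sum(history) / len(history) if history else 0.0)
        history.append(count)
    return forecasts


# Made-up pickup counts for five consecutive time bins in one region.
pickups = [10, 12, 14, 13, 20]
forecast = rolling_mean_forecast(pickups, window=3)
```

A baseline like this gives a reference error that the regression models have to beat.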
Visualized the dataset, applied various models, and compared their performance across several metrics. Techniques used: KNN, Naive Bayes, Logistic Regression, Decision Tree.
Found the top TF-IDF words, built their co-occurrence matrix, and used Truncated SVD to derive word vectors. Clustered the top words with KMeans and plotted each cluster as a word cloud according to TF-IDF score. Techniques used: TF-IDF, Truncated SVD, KMeans++.
Predicted the tag related to a question and improved the micro-averaged F1 score by tuning the hyperparameters of Logistic Regression and SVM.
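For reference, micro-averaged F1 pools true positives, false positives and false negatives across all questions and tags before computing a single score. The sketch below shows that calculation in plain Python; the function name and the tag sets are invented for the example and do not come from the project.

```python
def micro_f1(true_tags, predicted_tags):
    """Micro-averaged F1 over per-question tag sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_tags, predicted_tags):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # tags predicted correctly
        fp += len(pred - truth)   # tags predicted but not present
        fn += len(truth - pred)   # tags present but missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Invented example: two questions with their true and predicted tags.
truth = [{"python", "pandas"}, {"java"}]
predicted = [{"python"}, {"java", "spring"}]
score = micro_f1(truth, predicted)  # tp=2, fp=1, fn=1
```

Because the counts are pooled, frequent tags influence the score more than rare ones.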