AI LEARNING ENGINEER

Email: dileepdba@outlook.com

A Portfolio by Dileep

This portfolio is a compilation of notebooks that I created for data analysis or for exploring machine learning algorithms.

Dive into Data Analysis

If you’ve just started to learn about data, or if you’re not quite sure how it works, this book offers a wealth of information. Data Analytics Made Accessible breaks down data analysis into an easy-to-follow, digestible format.

The link can be found here

Sequence Analysis of Covid-19 Genome

COVID-19, the emerging infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has affected global health and the economy since it was identified at the end of December 2019. We did some basic analysis of its DNA sequences: we work with FASTA files and perform manipulations such as reverse complementing and transcription.

The link can be found here
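
As a rough illustration of the manipulations mentioned above, here is a minimal sketch using only the Python standard library. It is not the notebook's code (which may well use a library such as Biopython); the function names and the toy sequence are assumptions made for this example.

```python
# Translation table mapping each DNA base to its complement.
# "N" (unknown base) maps to itself. This table is an assumption of the sketch.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna):
    """Complement every base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(dna):
    """Transcribe a coding-strand DNA sequence to mRNA by replacing T with U."""
    return dna.replace("T", "U").replace("t", "u")


def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is everything after ">".
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical toy record, not real SARS-CoV-2 data.
toy = ">toy_sequence\nATGC\nGGTA\n"
sequences = read_fasta(toy)
seq = sequences["toy_sequence"]
print(seq, reverse_complement(seq), transcribe(seq))
```

For real genome files, a dedicated parser is usually the safer choice, since FASTA headers and line wrapping vary between sources.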

AN APPROACH FOR TABLE SCRAPING - AZURE FORM RECOGNIZER

Invoice handling is an important task in many fields that deal with large amounts of data in many different formats. Tables use layout to organize information and convey meaning, and they can represent and communicate complex information to readers.

The link can be found here.

Face Detection with Python and the Azure Face API

Computer vision is an exciting and growing field with many interesting problems to solve. One of them is face detection: the ability of a computer to recognize that a photograph contains a human face and to tell you where it is located. In this article, you’ll learn about face detection with the Azure Face API and Python. The code can be found here.

Stand-alone projects.

Home Credit Default Risk

Home Credit Bank poses a credit-scoring challenge, with a lot of data about applicants and their previous behavior. The code can be found here.

Classification problems.

Titanic: Machine Learning from Disaster

Azure ML Studio.

Titanic: Machine Learning from Disaster is a knowledge competition on Kaggle. Many people started practicing machine learning with this competition, and so did I. This is a binary classification problem: based on information about Titanic passengers, we predict whether they survived or not. A general description and the data are available on Kaggle. The Titanic dataset provides interesting opportunities for feature engineering.

Regression problems.

Loan Prediction

Github

Loan Prediction is a knowledge and learning hackathon on Analytics Vidhya. Dream Housing Finance deals in home loans. The company wants to automate the loan eligibility process in real time, based on the customer details provided in the online application form. Based on a customer’s information, we predict whether they should receive a loan or not.

Natural language processing.

Bag of Words

Github

Bag of Words Meets Bags of Popcorn is a sentiment analysis problem. Based on the text of reviews, we predict whether they are positive or negative. A general description and the data are available on Kaggle.
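
To make the bag-of-words idea concrete, here is a minimal sketch in plain Python. It is only an illustration under simple assumptions (lowercasing, a naive regular-expression tokenizer, made-up example reviews) and is not the notebook's code; the helper names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(docs):
    """Map every distinct token in the corpus to a column index."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(doc, vocab):
    """Turn one document into a vector of token counts (word order is ignored)."""
    counts = Counter(tokenize(doc))
    vector = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:  # tokens unseen during vocabulary building are dropped
            vector[vocab[tok]] = n
    return vector


# Made-up reviews used purely for illustration.
reviews = ["good good movie", "bad movie"]
vocab = build_vocabulary(reviews)
vectors = [vectorize(r, vocab) for r in reviews]
print(vocab)
print(vectors)
```

Count vectors like these can then be passed to any classifier to predict whether a review is positive or negative.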

NLP. Text summarization

Github

This notebook shows how a text can be summarized by choosing several of its most important sentences. I explore various methods of doing this, using a news article as the example.
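
One simple way to pick "important" sentences is to score each sentence by how frequent its words are in the whole text. The sketch below illustrates that idea only; it is not necessarily one of the methods used in the notebook, and the stop-word list, function names, and sample text are assumptions.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on", "it", "for"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        # Average word frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)


# Made-up text used purely for illustration.
sample = "Cats chase mice. Cats chase cats. The weather is nice."
print(summarize(sample, n_sentences=1))
```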

Clustering

Clustering with KMeans

Github

Clustering is an approach to unsupervised machine learning, and KMeans is one of the clustering algorithms. In this notebook I demonstrate how it works. The data describes various types of seeds and their parameters.
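
The core loop of KMeans (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched in plain Python as follows. This is an illustrative implementation under simple assumptions (random initial centroids, squared Euclidean distance), not the notebook's code, and the toy points are made up rather than taken from the seeds data.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Two made-up, well-separated groups of 2D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

The result depends on the initial centroids, which is why library implementations typically run several initializations and keep the best one.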

Data exploration and analysis

Telematic data

Github

I have a dataset with telematic information about 10 cars driving during one day. I visualise the data, search for insights, and analyse the behavior of each driver.

Recommendation systems.

Collaborative filtering

Github

Recommenders are systems that predict the ratings users would give to items. There are several approaches to building such systems, and one of them is collaborative filtering. This notebook shows several examples of collaborative filtering algorithms.
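
As one possible illustration, the sketch below predicts a missing rating with user-based collaborative filtering: it weights other users' ratings of the item by how similar those users are to the target user. It is not necessarily one of the algorithms in the notebook; the cosine similarity over co-rated items, the function names, and the ratings are all assumptions made for this example.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two users' ratings, computed over co-rated items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        total_weight += abs(sim)
    # None signals that no similar user has rated the item.
    return weighted_sum / total_weight if total_weight else None


# Made-up ratings: user -> {item: rating}.
ratings = {
    "ann": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "cat": {"a": 1, "b": 5, "c": 2},
}
prediction = predict_rating(ratings, "ann", "c")
print(prediction)
```

Because "ann" rates items exactly as "bob" does, the prediction is pulled toward bob's rating of item "c" rather than cat's.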

Handwritten digit recognition

This is a personal project that puts image recognition methods into practice. It is a site (which also works on mobile) where a user can draw a digit, and machine learning models (an FNN and a CNN) try to recognize it. After that, the models can use the drawn digit for training to improve their accuracy. The code can be found here.

Anomaly Detection

Github

This repository contains notebooks demonstrating multiple methods for detecting anomalies in a dataset.
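
The description above does not name specific methods, so the following is only one common baseline, shown as a sketch: flagging values that fall outside 1.5 times the interquartile range (IQR). The function name, the multiplier, and the data are assumptions and are not taken from the repository.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up readings with one obviously unusual value.
readings = [10, 11, 12, 11, 10, 12, 11, 100]
print(iqr_outliers(readings))
```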