All things Data

# Category: Statistics

• ## Data Science Resources, ETL Practices, Beginner’s Guide to Seaborn

1. Most Active Data Scientists, Free Books, Notebooks & Tutorials on GitHub. In this article, I’ve listed the most active data scientists on GitHub so that you can follow them and see what they are up to (especially their projects). Before moving forward, check out this ~2-minute video on students using GitHub! Open Source Data […]

• ## Math-Heavy Topics in Data Science!

1. Support-vector machine. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use SVM in a probabilistic […]
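
To make the idea concrete, here is a minimal sketch of a *linear* soft-margin SVM in plain Python. It is an illustration under stated assumptions, not the post's code: it minimizes the regularized hinge loss by full-batch subgradient descent (libraries typically use dedicated quadratic-programming or coordinate-descent solvers instead), and the names `train_linear_svm` and `predict`, as well as the values of `lam`, `lr`, and `epochs`, are hypothetical choices.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Fit w, b for a linear soft-margin SVM; labels must be +1 or -1.

    Minimizes (lam / 2) * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))
    with full-batch subgradient descent.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        # Gradient of the L2 regularizer (the bias is not regularized)
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Hinge loss is active for this example: add its subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Assign x to category +1 or -1 by the side of the hyperplane it falls on."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


X = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(predict(w, b, (4, 4)), predict(w, b, (-4, -4)))  # → 1 -1
```

Because the output is only the sign of w·x + b, the model is non-probabilistic, as the excerpt says; turning the raw score into a probability needs an extra calibration step such as Platt scaling.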

• ## Analyzing Diabetes Patterns amongst Indians, A Beginner’s Guide to Pearson’s Correlation Coefficient, Deep Learning in Cyber Security & Much More!

1. Juicing out the Diabetes Patterns amongst Indians using Machine Learning. The data indicates that developing countries are going to witness a 266% increase in the diabetic population. The score of the training model was a magnificent 100%, which means it classified all the elements correctly, as is evident as a result […]
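
This issue also links to a beginner's guide to Pearson's correlation coefficient. As a hedged sketch (the guide's own code is not shown here), r can be computed straight from its definition, r = cov(x, y) / (σ_x σ_y); the 1/n factors cancel, so raw sums of deviations suffice. The function name `pearson_r` and the toy numbers are illustrative assumptions, not data from the diabetes study.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (proportional to the covariance)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3], [1, 3, 2]))  # → 0.5
```

The result always lies between -1 (perfect negative linear relationship) and +1 (perfect positive one).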

• ## Interview with a Kaggle Master & More

1. Exclusive Interview with 2x Kaggle Master Gilles Vandewiele! “I think one of the nice things about the data science field is that it is so multi-disciplinary and that anyone who aspires to become a data scientist can do so.” – Gilles Vandewiele. Golden words! As a beginner in data science, this quote gives me […]

• ## Resources to learn Linear Regression

Linear regression shows the linear relationship between the independent (predictor) variable and the dependent (response) variable. It is one of the simplest statistical regression methods used for predictive analysis in machine learning. How is a math equation used in building a linear regression model? Did you know that a single equation helps build a linear regression model in the machine learning world? Yes, you heard it right. Since our school days, we have come across the equation of a straight line, y = mx + c, where m is the slope and c is the intercept.
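
A hedged sketch of how the straight-line equation y = mx + c becomes a model: ordinary least squares picks the m and c that minimize the sum of squared vertical errors, which for a single predictor has the closed form m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and c = ȳ − m·x̄. The function name `fit_line` and the sample points (which lie exactly on y = 2x + 1) are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + c for a single predictor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the x values must not all be equal")
    m = sxy / sxx
    # Intercept: the fitted line always passes through (mean_x, mean_y)
    c = mean_y - m * mean_x
    return m, c


m, c = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(m, c)  # → 2.0 1.0
print(m * 5 + c)  # prediction for x = 5 → 11.0
```

With noisy real data the points will not sit exactly on a line, but the same two formulas still give the best-fitting line in the least-squares sense.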

• ## Basics of Data Science using Python

After going through the topic above, what questions come to your mind? Probably: Why Python? How can we use Python to implement data science? And what are its advantages and disadvantages? We will answer all these questions and also talk about the libraries we can use to implement data science.

1. NumPy: NumPy arrays are similar to Python’s built-in list type in some ways, but they provide much more efficient storage and data operations as the arrays grow in size.
2. SciPy: It is built on top of NumPy and provides further extensions for scientific computing, such as matrix rank, matrix inverse, polynomial equations, LU decomposition, and so on.
3. Pandas: Pandas is a pillar of any data science workflow because it allows you to perform data processing, wrangling, and munging.
4. Matplotlib: This library is built on NumPy arrays and supports many kinds of plots, such as line charts, bar charts, histograms, and so on.
5. Scikit-learn: Scikit-learn is by far one of the most important Python libraries for machine learning, as it allows you to create machine learning models while also providing utility functions for data preparation, post-model analysis, and evaluation.
6. TensorFlow: TensorFlow is a software library that uses data flow graphs to perform numerical computations.
7. Keras: Keras is a Python library that is widely used for training deep learning models. It was created with the goal of allowing quick experimentation.
8. PyTorch: PyTorch works with tensors (multidimensional arrays) and makes it simple to move them to GPUs for faster processing during neural network training.
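
A short sketch of the NumPy storage point above, plus the kind of matrix routines mentioned under SciPy. It assumes NumPy is installed; the array size and matrix values are arbitrary. For small cases like this, NumPy's own `np.linalg` already covers rank and inverse, while `scipy.linalg` adds routines such as LU decomposition (`scipy.linalg.lu`).

```python
import sys

import numpy as np

N = 100_000

# A plain list stores pointers to separate Python int objects.
values = list(range(N))
# An ndarray stores the raw 8-byte integers in one contiguous buffer.
arr = np.arange(N, dtype=np.int64)

# Rough memory comparison: the list's pointer array plus each int object
# (CPython caches small ints, so this slightly overcounts).
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
array_bytes = arr.nbytes

# Vectorized arithmetic: one expression, no Python-level loop.
squares = arr ** 2

# Basic linear algebra lives in np.linalg.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
rank = np.linalg.matrix_rank(A)
A_inv = np.linalg.inv(A)

print(list_bytes > array_bytes)  # → True
print(rank)  # → 2
print(np.allclose(A @ A_inv, np.eye(2)))  # → True
```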

• ## Decision Tree From Scratch!! - Part I

Introduction: In this blog post, I am going to talk about a powerful supervised learning algorithm that is often used in machine learning competitions: the Decision Tree algorithm. It can be used for both classification and regression tasks. In this post, I will discuss the need for tree-based algorithms, the basics of […]
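
As a hedged illustration of how a classification tree chooses a split (the post's own method is cut off above), the sketch below scores every candidate threshold on one numeric feature with the Gini impurity used by CART-style trees and keeps the lowest weighted impurity. The helper names `gini` and `best_split` and the toy data are hypothetical; a full tree would apply this search recursively over every feature, and a regression tree would swap Gini for a variance-based criterion.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, labels):
    """Return (threshold, weighted_gini) for the best 'x <= threshold' split.

    Returns (None, inf) if every x is identical (no valid split exists).
    """
    n = len(xs)
    best_threshold, best_score = None, float("inf")
    # Splitting at the largest value would leave the right side empty, so skip it
    for t in sorted(set(xs))[:-1]:
        left = [lab for x, lab in zip(xs, labels) if x <= t]
        right = [lab for x, lab in zip(xs, labels) if x > t]
        # Impurity of the children, weighted by how many samples each receives
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score


print(best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))  # → (3, 0.0)
```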

• ## Logistic Regression in Machine Learning (from Scratch!!)

Introduction: In this blog post, I would like to continue my series on “building from scratch.” I will discuss a linear classifier called Logistic Regression. This blog post covers the following topics: the basics of a classifier, decision boundaries, the maximum likelihood principle, the logistic regression equation, the logistic regression cost function, and the gradient descent algorithm. After the discussion of […]
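
Here is a hedged, minimal sketch of the pieces listed above in plain Python: the logistic (sigmoid) equation, the cross-entropy cost that follows from the maximum likelihood principle, and batch gradient descent on that cost. It is not the post's code; the names `train_logistic`, `predict_proba`, and `cross_entropy`, the learning rate, the epoch count, and the toy one-feature data are assumptions. The decision boundary is where the predicted probability equals 0.5, i.e. where w·x + b = 0.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """P(y = 1 | x) under the logistic regression equation."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def cross_entropy(w, b, X, y):
    """Mean negative log-likelihood (the logistic regression cost)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(w, b, xi)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


def train_logistic(X, y, lr=0.1, epochs=5000):
    """Batch gradient descent on the cross-entropy cost; labels are 0 or 1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # d(cost)/d(score) for one example is simply (p - y)
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


X = [(0.5,), (1.0,), (1.5,), (3.5,), (4.0,), (4.5,)]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print([round(predict_proba(w, b, x)) for x in X])  # → [0, 0, 0, 1, 1, 1]
```

The gradient takes the simple form (p − y)·x because the sigmoid's derivative cancels against the derivative of the log in the cost, which is one reason this cost is paired with the logistic equation.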

• ## Probability Distributions that every Data Scientist must know

Introduction: The probability of an event tells us how likely it is that the event will occur. The applications of probability begin with the numbers p₀, p₁, p₂, … that give the probability of each possible outcome. There are dozens of famous and useful choices for p; I will discuss four of them in this post. Before going […]
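
The four distributions the post covers are cut off in this excerpt, so the binomial distribution below is only an illustrative choice, not necessarily one of the four. This short sketch computes the numbers p₀, p₁, …, pₙ for the number of successes in n independent trials with success probability p, using pₖ = C(n, k)·pᵏ·(1 − p)ⁿ⁻ᵏ, and checks that they sum to 1. The helper name `binomial_pmf` is hypothetical.

```python
from math import comb


def binomial_pmf(n, p):
    """Return [p_0, p_1, ..., p_n]: the probability of k successes in n independent trials."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


probs = binomial_pmf(3, 0.5)  # e.g. number of heads in 3 fair coin flips
print(probs)  # → [0.125, 0.375, 0.375, 0.125]
print(sum(probs))  # the p_k of any distribution sum to 1 → 1.0
mean = sum(k * pk for k, pk in enumerate(probs))  # expected value n*p = 1.5
```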