1. Most Active Data Scientists, Free Books, Notebooks & Tutorials on GitHub
In this article, I’ve listed the most active data scientists on GitHub so that you can follow them and see what they are up to (especially their projects). Before moving forward, check out this roughly two-minute video on students using GitHub!
- Open Source Data Science: This repository encourages you to leverage open-source education and become a self-taught data scientist. If you prefer learning from books over any other method, you have a lot to take home from this repository.
- Python Projects: Keen to do interesting Python projects but don’t know where to start? Check out some interesting projects done in Python, understand them, and let them inspire you to start one of your own.
If you check their profiles, you’ll see that they have avidly contributed knowledge in the form of books, projects, and tutorials for the benefit of the worldwide ML community. You can check out the original article here.
2. Good ETL Practices with Apache Airflow
- In an ETL (Extract, Transform, Load) process, data is pulled (extracted) from a source system, converted (transformed) into a format that can be analyzed, and stored (loaded) in a warehouse or other system.
- Extract, Load, Transform (ELT) is an alternative but related approach designed to push processing down to the database to improve performance.
- In this guide, we will cover good practices for ETL implementation, using a datastream implemented on the Apache Airflow platform.
- If there is no error, open the Apache Airflow user interface at its address (wait about 5 minutes before opening the terminal).
- To get a full picture of their assets and operations, organizations move data from their many sources into a data warehouse or data lake and run analytics against it.
- Connectors (data sources and destinations): In a digital technology ecosystem, many devices hold a great diversity of data and objects. These are kept in object storage, which can be described as a data lake, and a set of these constitutes Big Data.
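To make the extract, transform, and load steps described above concrete, here is a minimal sketch in plain Python. It does not use Airflow itself, and it is not the guide's own implementation. The CSV content, the `orders` table, and the function names are all hypothetical, chosen only for illustration. In an Airflow setup, each function would typically become its own task.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a real source system.
RAW_CSV = """order_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,eur
"""


def extract(text):
    """Extract: read raw rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, then normalize types and casing."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), float(row["amount"]), row["currency"].upper())
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    # INSERT OR REPLACE keeps the load idempotent: a rerun does not duplicate rows.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn):
    """Run extract -> transform -> load and return the stored rows."""
    load(transform(extract(RAW_CSV)), conn)
    query = "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(run_pipeline(sqlite3.connect(":memory:")))
```

Keeping each step a separate function, and making the load step safe to rerun, is one common way to keep a pipeline easy to retry when a single step fails.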
3. A Beginner’s Guide To Seaborn: The Simplest Way to Learn
- Seaborn allows us to make complicated plots even in a single line of code!
- In this tutorial, we will be using three libraries to get the job done: Matplotlib, Seaborn, and Pandas.
- A box plot is used for depicting groups of numerical data through their quartiles.
- Factor plots make it easy to separate plots by categorical classes. You can make more visualizations like these by simply changing the variable names and running the same lines of code.
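As a rough illustration of the points above, the sketch below draws a box plot and a plot split by a categorical class. The DataFrame and its column names are made up for this example and do not come from the original tutorial. It also assumes a recent Seaborn release, where the function formerly named `factorplot` is called `catplot`.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data for illustration only.
df = pd.DataFrame(
    {
        "species": ["a", "a", "a", "b", "b", "b"],
        "sex": ["m", "f", "m", "f", "m", "f"],
        "weight": [4.1, 3.8, 4.4, 5.9, 6.3, 5.7],
    }
)

# Box plot: one box per species, summarizing "weight" through its quartiles.
ax = sns.boxplot(data=df, x="species", y="weight")
ax.set_title("Weight by species")
plt.savefig("boxplot.png")
plt.close()

# catplot (factorplot in older releases): one panel per value of "sex".
grid = sns.catplot(data=df, x="species", y="weight", col="sex", kind="box")
grid.savefig("catplot.png")
```

Swapping the column names passed to `x`, `y`, and `col` is enough to produce the same kinds of plots for other variables.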
You can check out the entire article here.
I hope you found this blog post insightful. Please share it with your friends and family. You can reach out to me on LinkedIn. I am quite active there and will be happy to have a conversation with you. Please feel free to drop your feedback in the comments; it helps me improve the quality of my work. I will keep sharing more content as I grow and mature as a Data Scientist. Until next time, Keep Hustling & Keep Up with Data Science. Happy Learning 🙂