Various Libraries for Data Science.

Libraries are collections of methods and functions that enable you to perform a wide variety of actions without writing the code yourself. They usually contain built-in modules providing different functionality that you can use directly. Larger libraries of this kind are sometimes called frameworks.

The focus here is on Python Libraries.

1. Scientific Libraries-

a) Pandas: It offers data structures and tools for effective data cleaning, manipulation, and analysis, and it can work with many different types of data. Its central data structure is the DataFrame, a 2-D table of rows and columns designed for easy indexing so you can quickly select and transform your data.

b) NumPy: It is based on multidimensional arrays and enables you to apply mathematical functions to entire arrays at once. NOTE: Pandas is built on top of NumPy (see the sketch after this list).
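The following is a minimal sketch of how the two libraries fit together, assuming pandas and NumPy are installed; the column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: apply a mathematical function to an entire array at once (no explicit loop)
temps_c = np.array([20.5, 22.0, 19.8, 25.1])
temps_f = temps_c * 9 / 5 + 32

# Pandas: wrap the arrays in a 2-D DataFrame of rows and columns
df = pd.DataFrame({"celsius": temps_c, "fahrenheit": temps_f})

# Easy indexing: filter rows by a condition and select a column by name
print(df[df["celsius"] > 20]["fahrenheit"])
```

Because each DataFrame column is backed by a NumPy array, the same vectorized operations apply to the columns directly.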

2. Visualization Libraries- Data visualization is a great way to communicate with others and to show the results of an analysis in a meaningful way. These libraries enable you to create charts, graphs, and maps.

a) Matplotlib: The most widely used Python plotting library, producing highly customizable charts and graphs.

b) Seaborn: Built on top of Matplotlib, it generates statistical plots such as heat maps, time-series plots, and violin plots (a small sketch follows this list).
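A minimal sketch, assuming matplotlib and seaborn are installed; "tips" is one of seaborn's bundled example datasets (fetched on first use).

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Load one of seaborn's bundled example datasets
tips = sns.load_dataset("tips")

# Draw a Seaborn violin plot onto a Matplotlib figure
fig, ax = plt.subplots()
sns.violinplot(data=tips, x="day", y="total_bill", ax=ax)
ax.set_title("Total bill by day")
plt.show()
```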

3. Machine Learning and Deep Learning Libraries-

a) Scikit-learn (for ML): It contains tools for statistical modeling, including regression, classification, and clustering. It is built on NumPy, SciPy, and Matplotlib. The high-level interface of this library enables you to build models quickly and simply, and it runs on ordinary CPUs without requiring a GPU (Graphics Processing Unit). A minimal example follows this list.

b) Keras (for DL and neural networks): Its high-level interface enables you to build standard deep learning models quickly (also sketched after this list).

c) TensorFlow: A lower-level framework used for large-scale production of deep learning models; it is geared toward production and deployment.

d) PyTorch: Used for experimentation, making it easy for researchers to test their ideas.
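As an example of scikit-learn's high-level interface, here is a minimal classification sketch using its bundled iris dataset; the choice of model, split ratio, and random seed are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load a small bundled dataset and split it into training and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# The same fit/predict pattern applies to regression and clustering estimators
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```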
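And a minimal Keras sketch of a standard feed-forward network, assuming TensorFlow (which bundles Keras) is installed; the input shape, layer sizes, and optimizer are arbitrary choices, and a PyTorch version would follow the same idea using torch.nn.

```python
from tensorflow import keras

# A small feed-forward classifier: 4 input features, 3 output classes
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])

# Choose an optimizer and loss; training would then be model.fit(X, y)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```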

There are various other libraries-

Apache Spark- A cluster-computing framework that enables you to process data on compute clusters, which means you can process data in parallel across multiple computers simultaneously. Spark offers much of the same functionality as Pandas, NumPy, and Scikit-learn, but at scale. Apache Spark data processing jobs for very large datasets can be written in Python, R, Scala, and SQL, as sketched below.
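A minimal PySpark sketch, assuming the pyspark package and a local Spark installation; the sample rows are made up. On a real cluster the same code runs in parallel across many machines.

```python
from pyspark.sql import SparkSession

# Start a Spark session (local here; on a cluster this connects to the cluster manager)
spark = SparkSession.builder.appName("spark-example").getOrCreate()

# A Spark DataFrame looks much like a Pandas DataFrame, but it is distributed
df = spark.createDataFrame(
    [("Alice", 34), ("Bob", 29), ("Cara", 41)],
    ["name", "age"])

# Transformations such as filters are executed in parallel across the workers
df.filter(df.age > 30).show()

spark.stop()
```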

Let's talk about Scala libraries, which are complementary to Spark.

a) Vegas: For statistical data visualization. It works with data files as well as Spark DataFrames.

b) BigDL: For deep learning.