Categories of Data Science Tools

Many open-source tools are available for data science. They can be grouped into the following categories.

Data Management is the process of persisting and retrieving data. Persisting data means that the data remains available even after the process that created it has ended.
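As a minimal sketch of what persistence means in practice, the example below writes a record with one database connection, closes it, and reads the record back with a new connection. It uses SQLite from the Python standard library only as a convenient stand-in for a data management system; the table name, the column names and the helper functions save_reading and load_readings are hypothetical and chosen for illustration.

```python
import os
import sqlite3
import tempfile


def save_reading(db_path, sensor, value):
    """Persist one reading to disk, then close the connection."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS readings (sensor TEXT, value REAL)"
        )
        conn.execute("INSERT INTO readings VALUES (?, ?)", (sensor, value))
        conn.commit()
    finally:
        conn.close()


def load_readings(db_path):
    """Open a fresh connection and retrieve everything stored so far."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT sensor, value FROM readings").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical database file in a temporary directory.
    db_path = os.path.join(tempfile.mkdtemp(), "example.db")
    save_reading(db_path, "thermometer", 21.5)
    # The writing connection is already closed; the data is still there.
    print(load_readings(db_path))  # prints [('thermometer', 21.5)]
```

The second connection stands in for a later process: the data outlives whatever wrote it because it lives in the file, not in memory.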

Data Integration and Transformation, often referred to as ETL (Extract, Transform, and Load), is the process of retrieving data from remote data management systems, transforming it, and loading it into a local data management system.
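The three ETL steps can be sketched as three small functions. This is an illustrative outline only: the CSV text stands in for data pulled from a remote system, an in-memory SQLite database stands in for the local data management system, and all names (REMOTE_CSV, extract, transform, load, the people table) are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for data retrieved from a remote system.
REMOTE_CSV = "name,height_cm\nAda,170\nLin,  165 \nSam,\n"


def extract(raw_text):
    """Extract: parse the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows without a height, convert centimetres to metres."""
    cleaned = []
    for row in rows:
        height = (row["height_cm"] or "").strip()
        if not height:
            continue  # skip incomplete records
        cleaned.append((row["name"].strip(), float(height) / 100))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into the local database."""
    conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, height_m REAL)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(REMOTE_CSV)))
    print(conn.execute("SELECT name, height_m FROM people").fetchall())
```

Keeping the three steps as separate functions mirrors the way dedicated ETL tools let each stage be configured and tested on its own.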

Data Visualization is part of the initial data exploration process and can also be part of the final deliverable.

Model Building is the process of creating machine learning and deep learning models by applying an appropriate learning algorithm to a large amount of data.
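To make the idea of a learning algorithm concrete, here is a deliberately tiny sketch that learns a straight line from example data using ordinary least squares. Real model building tools handle far larger datasets and far richer models; the function names fit_line and predict and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Learn slope and intercept of y = slope * x + intercept from data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned model to a new input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical training data that follows y = 2x + 1.
    model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(model)
    print(predict(model, 10))
```

The pair returned by fit_line is the model: it is what a deployment step would later make available to other applications.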

Model Deployment makes a machine learning or deep learning model available to third-party applications.

Model Monitoring and Assessment ensures continuous performance and quality checks on deployed models. These checks cover accuracy, fairness, and robustness.
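A monitoring check for accuracy might look roughly like the sketch below, which compares a deployed model's recent predictions against the true labels and flags the model when accuracy falls under a threshold. It covers accuracy only; fairness and robustness checks need their own metrics. The function names and the 0.9 default threshold are assumptions for illustration, not values taken from any particular tool.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def check_model(predictions, labels, min_accuracy=0.9):
    """Return (ok, score); ok is False when accuracy is below the threshold."""
    score = accuracy(predictions, labels)
    return score >= min_accuracy, score


if __name__ == "__main__":
    # Hypothetical batch of recent predictions and their true labels.
    ok, score = check_model([1, 0, 1, 1], [1, 0, 0, 1])
    print(ok, score)  # prints False 0.75
```

Running such a check on every new batch of labelled data is what makes the monitoring continuous.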

Code Asset Management uses versioning and collaborative features to facilitate teamwork.

Data Asset Management supports backups, replication, and access rights management.

Development Environments, commonly known as IDEs (Integrated Development Environments), help data scientists implement, execute, test, and deploy their work.

Execution Environments are tools where data processing, model training, and model deployment take place.