pandas is a software library written for the Python programming language for data manipulation and analysis. 

In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license.

The name is derived from "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals. It is also a play on the phrase "Python data analysis".



Library Features

  • DataFrame object for data manipulation with integrated indexing.
  • Tools for reading and writing data between in-memory data structures and different file formats.
  • Data alignment and integrated handling of missing data.
  • Reshaping and pivoting of data sets.
  • Label-based slicing, fancy indexing, and subsetting of large data sets.
  • Data structure column insertion and deletion.
  • Group by engine allowing split-apply-combine operations on data sets.
  • Data set merging and joining.
  • Hierarchical axis indexing to work with high-dimensional data in a lower-dimensional data structure.
  • Time-series functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging.
  • Data filtering.
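
A few of the features listed above can be sketched in a handful of lines. This is a minimal illustration, not a complete tour: the table contents, column names, and manager names are invented for the example.

```python
import pandas as pd

# Toy data invented for illustration; not taken from any real dataset.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south"],
        "product": ["a", "a", "b", "b"],
        "units": [10, 4, None, 7],  # None is stored as NaN (missing data)
    }
)

# Missing-data handling: replace the gap with 0 before aggregating.
sales["units"] = sales["units"].fillna(0)

# Filtering with a boolean mask.
large_orders = sales[sales["units"] > 5]

# Split-apply-combine: total units per region.
units_per_region = sales.groupby("region")["units"].sum()

# Reshaping: one row per region, one column per product.
wide = sales.pivot(index="region", columns="product", values="units")

# Merging: attach a (hypothetical) manager name to every row by region.
managers = pd.DataFrame({"region": ["north", "south"], "manager": ["Ana", "Bo"]})
merged = sales.merge(managers, on="region", how="left")

# Time series: a daily date range and a 2-day moving average.
dates = pd.date_range("2024-01-01", periods=4, freq="D")
daily = pd.Series([1.0, 3.0, 5.0, 7.0], index=dates)
moving_avg = daily.rolling(window=2).mean()
```

Printing any of the intermediate objects (for example `print(wide)`) in a notebook cell is a quick way to see what each step produces.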



Official Documentation


Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+. There is also a procedural "pylab" interface based on a state machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of Matplotlib.


Matplotlib was originally written by John D. Hunter. It has since developed an active community and is distributed under a BSD-style license. Michael Droettboom was nominated as Matplotlib's lead developer shortly before John Hunter's death in August 2012, and was later joined by Thomas Caswell.


Matplotlib 2.0.x supports Python versions 2.7 through 3.6. Python 3 support started with Matplotlib 1.2. Matplotlib 1.4 is the last version to support Python 2.6. Matplotlib has pledged not to support Python 2 past 2020 by signing the Python 3 Statement.


Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible.


  • Create publication quality plots.
  • Make interactive figures that can zoom, pan, update.
  • Customize visual style and layout.
  • Export to many file formats.
  • Embed in JupyterLab and Graphical User Interfaces.
  • Use a rich array of third-party packages built on Matplotlib.
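
As a rough sketch of the object-oriented API described above, the following draws a single line, customizes its labels, and exports the figure to a file. The data and the output file name `squares.png` are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt

# Arbitrary example data: the squares of 0..4.
x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

# Create a figure and a single set of axes, then draw on the axes object.
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, y, marker="o", label="y = x^2")

# Customize labels, title, and legend.
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line plot")
ax.legend()

# Export to a file; the format is inferred from the file extension.
fig.savefig("squares.png", dpi=150)
```

In a notebook the figure is normally displayed inline; in a script, calling `plt.show()` opens it in an interactive window where one is available.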

Official Documentation


Plotly, also known by its URL, plot.ly, is a technical computing company headquartered in Montreal, Quebec, that develops online data analytics and visualization tools. 

Plotly provides online graphing, analytics, and statistics tools for individuals and collaboration, as well as scientific graphing libraries for Python, R, MATLAB, Perl, Julia, Arduino, and REST.

Its open-source graphing libraries produce interactive graphs.


Plotly offers open-source and enterprise products.


  • Dash is an open-source Python, R, and Julia framework for building web-based analytic applications. Many specialized open-source Dash libraries exist that are tailored for building domain-specific Dash components and applications. Some examples are Dash DAQ, for building data acquisition GUIs to use with scientific instruments, and Dash Bio, which enables users to build custom chart types, sequence analysis tools, and 3D rendering tools for bioinformatics applications.
  • Dash Enterprise is Plotly’s paid product for building, testing, deploying, managing and scaling Dash applications organization-wide.
  • Chart Studio Cloud is a free, online tool for creating interactive graphs. It has a point-and-click graphical user interface for importing and analyzing data into a grid and using stats tools. Graphs can be embedded or downloaded.
  • Chart Studio Enterprise is a paid product that allows teams to create, style, and share interactive graphs on a single platform. It offers expanded authentication and file export options, and does not limit sharing and viewing.
  • Data visualization libraries: Plotly.js is an open-source JavaScript library for creating graphs. It powers Plotly.py for Python and Plotly.R for R, as well as the libraries for MATLAB, Node.js, Julia, and Arduino and a REST API. Plotly can also be used to style interactive graphs in Jupyter notebooks.
  • Figure converters, which convert matplotlib, ggplot2, and IGOR Pro graphs into interactive, online graphs.

Official Documentation for Plotly Open Source Graphing Library

Let us have a look at ways in which one could use these libraries.


Open the Colab notebook

Exercise:


Dataset Link: https://cds.santechz.com/userfiles/media/default/datasets/cities.csv


Using the above dataset named cities, do the following analysis:


  • Find out five cities around the world with the lowest population density.
  • Find out five cities around the world with the highest population density.
  • Find out the city with the lowest population density in Australia.
  • Plot the five American cities with the largest populations on a bar chart.
  • Plot the cities of India on a map along with their respective population densities.
  • Compare the total population densities of four countries on a pie chart. The countries are as follows: United States, Japan, Australia and Canada.
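
As a starting point, the pandas calls that the first few tasks are likely to need can be tried on a toy table. Everything here is an assumption made for illustration: the rows are invented, and the column names `city`, `country`, `population`, and `density` are hypothetical, so check the real file's columns (for example with `df.columns` after `pd.read_csv`) before adapting it.

```python
import pandas as pd

# Toy rows invented for illustration; the real cities.csv may use
# different column names and will contain real values.
df = pd.DataFrame(
    {
        "city": ["A", "B", "C", "D", "E", "F"],
        "country": ["X", "X", "Y", "Y", "Y", "Z"],
        "population": [500, 1200, 300, 900, 150, 700],
        "density": [40.0, 95.5, 12.3, 60.1, 8.7, 33.0],
    }
)

# Rows with the lowest and highest values in a column.
lowest = df.nsmallest(2, "density")
highest = df.nlargest(2, "density")

# The same idea restricted to one country.
lowest_in_y = df[df["country"] == "Y"].nsmallest(1, "density")
top_pop_x = df[df["country"] == "X"].nlargest(2, "population")

# Total density per country for a chosen set of countries.
selected = df[df["country"].isin(["X", "Y"])]
density_by_country = selected.groupby("country")["density"].sum()
```

The resulting tables and series can then be handed to a plotting library for the bar chart, map, and pie chart tasks.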