NumPy

NumPy is a Python library for working with numbers and arrays. Its core data structure is the multi-dimensional array, which resembles a list but can have many rows and columns. Because array operations run in compiled code, NumPy handles large amounts of numeric data quickly.

NumPy supports many math operations, such as multiplying matrices, adding or subtracting arrays element by element, and applying functions like squares or logarithms to every element.
It also has dedicated functions for linear algebra and random number generation.
NumPy's core is written in C and can interoperate with C and C++ code. It is faster than looping over regular Python lists because its operations run in precompiled code.
To start using it, install it with pip install numpy and then import it in your Python program.

So, NumPy is a powerful and easy-to-use package for mathematical and data tasks in Python.
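
The operations described above can be sketched as follows. This is a minimal illustration, not a complete tour of the library; the variable names and the seed value are arbitrary choices made for the example.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Element-wise arithmetic: no explicit loop is needed
total = a + b            # [11. 22. 33.]
squares = a ** 2         # [1. 4. 9.]
logs = np.log(a)         # natural logarithm of each element

# Matrix multiplication with the @ operator
m = np.array([[1, 2], [3, 4]])
identity = np.eye(2, dtype=int)
product = m @ identity   # multiplying by the identity leaves m unchanged

# Linear algebra and random numbers
det = np.linalg.det(m)                 # determinant, approximately -2.0
rng = np.random.default_rng(seed=0)    # seeded generator for repeatable results
samples = rng.random(3)                # three floats in [0, 1)

print(total, squares, logs, product, det, samples, sep="\n")
```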

Array

In Python, an array is a collection of elements that all have the same data type, such as all integers, all floats, or all characters. Arrays are different from Python lists because lists can hold elements of mixed types (for example, an integer, a string, and a float in the same list), while arrays require all elements to be of the same type.

Arrays are useful when working with large amounts of data of the same type, such as numbers or characters, because they store values compactly and so use less memory than lists. Like lists, arrays can grow or shrink dynamically, meaning you can add or remove elements without needing to define the array’s size ahead of time.

To work with arrays in Python, you need to import the built-in array module. You can do this using either import array or from array import *.

Once the module is imported, you can create an array using the syntax:

array_name = array(type_code, [elements])

In this syntax, type_code is a single-character string that specifies the type of data the array will hold. For example:

'i' is used for signed integers.
'f' is used for floating-point numbers.
'u' is used for Unicode characters (this type code is deprecated in recent Python versions).

Here is an example of creating an array of integers:

from array import array

numbers = array('i', [1, 2, 3, 4])
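
The two properties described earlier, a single element type and dynamic size, can be checked with a short sketch. The name rejected is a helper variable introduced only for this example.

```python
from array import array

numbers = array('i', [1, 2, 3, 4])

# Arrays grow and shrink without a predeclared size
numbers.append(5)
numbers.pop(0)
print(numbers)            # array('i', [2, 3, 4, 5])
print(numbers.typecode)   # i

# A value of the wrong type is rejected
try:
    numbers.append(2.5)
    rejected = False
except TypeError:
    rejected = True
print("Float rejected:", rejected)   # Float rejected: True
```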

NumPy Arrays

NumPy arrays are a more powerful and flexible type of array provided by the NumPy library, which is widely used in data science and scientific computing.

A NumPy array is an N-dimensional array object, which means it can represent data not only in one dimension (like a simple list) but also in two dimensions (like a table), three dimensions, or even more.
NumPy arrays are often created from Python lists (nested lists for more than one dimension), and all elements in the array share the same data type.
Each NumPy array has a data-type object, known as dtype, which defines the type and size of its elements.
The elements are accessed using zero-based indexing, meaning that the first element is at index 0.

To use NumPy arrays, you must first install and import the NumPy library. You can install it with pip install numpy, and then import it with:

import numpy as np

Once NumPy is imported, you can create an array like this:

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])

In this example, arr is a two-dimensional array with two rows and three columns.
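
A quick way to see the dtype and shape described above is to inspect the array's attributes. The sketch below assumes a typical NumPy installation; the exact integer dtype depends on the platform, and the arrays mixed and small are extra examples not taken from the text.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)    # (2, 3): two rows, three columns
print(arr.ndim)     # 2
print(arr.dtype)    # a platform-dependent integer type such as int64
print(arr[0, 0])    # 1, the first element of the first row

# Mixing integers and floats upcasts every element to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# The dtype can also be requested explicitly
small = np.array([1, 2, 3], dtype=np.int8)
print(small.itemsize)  # 1 (byte per element)
```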

One-Dimensional Array

A one-dimensional array is the simplest form of an array. It is a linear collection of elements arranged in a single row or column.

Example:

import numpy as np

arr1d = np.array([10, 20, 30, 40, 50])

print("Full 1D array:")
print(arr1d)

# Accessing a specific element
print("\nAccess arr1d[2] (Third element):")
print(arr1d[2]) # Output: 30

# Slicing examples
print("\nSlicing arr1d[1:4] (Elements from index 1 to 3):")
print(arr1d[1:4]) # Output: [20 30 40]

print("\nSlicing arr1d[:3] (Elements from start to index 2):")
print(arr1d[:3]) # Output: [10 20 30]

print("\nSlicing arr1d[2:] (Elements from index 2 to end):")
print(arr1d[2:]) # Output: [30 40 50]

Output:

Full 1D array:
[10 20 30 40 50]

Access arr1d[2] (Third element):
30

Slicing arr1d[1:4] (Elements from index 1 to 3):
[20 30 40]

Slicing arr1d[:3] (Elements from start to index 2):
[10 20 30]

Slicing arr1d[2:] (Elements from index 2 to end):
[30 40 50]
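
Beyond the slices shown above, negative indices and step values also work on 1D arrays. One point worth knowing is that a basic slice is a view of the original data rather than a copy. The sketch below illustrates both; the names view and independent are chosen only for the example.

```python
import numpy as np

arr1d = np.array([10, 20, 30, 40, 50])

print(arr1d[-1])     # 50, the last element
print(arr1d[::2])    # [10 30 50], every second element
print(arr1d[::-1])   # [50 40 30 20 10], reversed

# A basic slice is a view: changing it changes the original array
view = arr1d[1:3]
view[0] = 99
print(arr1d)         # [10 99 30 40 50]

# Use .copy() when an independent array is needed
independent = arr1d[1:3].copy()
independent[0] = 0
print(arr1d)         # still [10 99 30 40 50]
```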

Two-Dimensional Array

A two-dimensional array has both rows and columns, similar to a table or matrix. Each row can contain multiple columns of data, and the data is still of the same type throughout the array.

Example:

import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

print("Full 2D array:")
print(arr2d)

# Accessing a specific element
print("\nAccess arr2d[1][2] (Second row, third element):")
print(arr2d[1][2]) # Output: 6

# Slicing examples
print("\nSlicing arr2d[0] (First row):")
print(arr2d[0])

print("\nSlicing arr2d[1, :2] (First two elements of second row):")
print(arr2d[1, :2])

print("\nSlicing arr2d[:, 0] (First column of all rows):")
print(arr2d[:, 0])

Output:

Full 2D array:
[[1 2 3]
[4 5 6]]

Access arr2d[1][2] (Second row, third element):
6

Slicing arr2d[0] (First row):
[1 2 3]

Slicing arr2d[1, :2] (First two elements of second row):
[4 5]

Slicing arr2d[:, 0] (First column of all rows):
[1 4]
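
The example mixes two indexing styles, arr2d[1][2] and arr2d[1, :2]. As a small sketch, the comma form can replace the chained brackets everywhere, and it is the form that supports slicing on several axes at once. The names block and row_sums are illustrative.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

# Both forms return the same element
print(arr2d[1][2])   # 6
print(arr2d[1, 2])   # 6

# Selecting a sub-block: all rows, last two columns
block = arr2d[:, 1:]
print(block.tolist())   # [[2, 3], [5, 6]]

# Iterating over a 2D array yields one row at a time
row_sums = [int(row.sum()) for row in arr2d]
print(row_sums)      # [6, 15]
```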

Three-Dimensional Array

A three-dimensional array consists of multiple 2D arrays stacked together. You can think of it as a cube or a collection of tables. It has depth (or layers), rows, and columns.

Example:

import numpy as np

arr3d = np.array([
[[1, 2, 3], [4, 5, 6]],
[[7, 8, 9], [10, 11, 12]]
])

print("Full 3D array:")
print(arr3d)

# Accessing a specific element
print("\nAccess arr3d[1][0][2] (Second block, first row, third element):")
print(arr3d[1][0][2]) # Output: 9

# Slicing examples
print("\nSlicing arr3d[0] (First block):")
print(arr3d[0])

print("\nSlicing arr3d[1, 1] (Second block, second row):")
print(arr3d[1, 1])

print("\nSlicing arr3d[:, :, 0] (First element of each row in all blocks):")
print(arr3d[:, :, 0])

Output:

Full 3D array:
[[[ 1 2 3]
[ 4 5 6]]

[[ 7 8 9]
[10 11 12]]]

Access arr3d[1][0][2] (Second block, first row, third element):
9

Slicing arr3d[0] (First block):
[[1 2 3]
[4 5 6]]

Slicing arr3d[1, 1] (Second block, second row):
[10 11 12]

Slicing arr3d[:, :, 0] (First element of each row in all blocks):
[[ 1 4]
[ 7 10]]

This array has:

2 blocks (depth layers)
Each block has 2 rows
Each row has 3 columns
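
The block, row, and column counts listed above correspond directly to the array's shape attribute, as this short check suggests:

```python
import numpy as np

arr3d = np.array([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
])

print(arr3d.ndim)    # 3
print(arr3d.shape)   # (2, 2, 3): blocks, rows, columns
print(arr3d.size)    # 12 elements in total

# The comma form reaches the same element as chained brackets
print(arr3d[1, 0, 2])   # 9
```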

Operations on Arrays

The built-in array module provides several methods for modifying arrays. First, import the module:

from array import array

Common Methods (each example starts from arr = array('i', [1, 2, 3])):

Method	Description	Example	Output
`.append(x)`	Adds an element `x` to the end of the array	`arr.append(6)`	`[1, 2, 3, 6]`
`.insert(i, x)`	Inserts element `x` at position `i`	`arr.insert(1, 9)`	`[1, 9, 2, 3]`
`.remove(x)`	Removes the first occurrence of element `x`	`arr.remove(2)`	`[1, 3]`
`.pop([i])`	Removes and returns the element at index `i` (last if `i` not given)	`arr.pop()`	returns `3`
`.index(x)`	Returns the index of the first occurrence of `x`	`arr.index(3)`	`2`
`.reverse()`	Reverses the order of elements in the array in-place	`arr.reverse()`	`[3, 2, 1]`
`.count(x)`	Counts how many times `x` appears in the array	`arr.count(2)`	`1`
`.extend(iterable)`	Adds elements from an iterable (e.g., list or array) to the array	`arr.extend([7, 8])`	`[1, 2, 3, 7, 8]`

Program:

from array import array

# Create an integer array
arr = array('i', [1, 2, 3])
print("Original array:")
print(arr)

# Append an element at the end
arr.append(4)
print("\nAfter append(4):")
print(arr)

# Insert an element at index 1
arr.insert(1, 9)
print("\nAfter insert(1, 9):")
print(arr)

# Remove the first occurrence of value 2
arr.remove(2)
print("\nAfter remove(2):")
print(arr)

# Pop the last element
popped = arr.pop()
print("\nAfter pop():")
print("Popped element:", popped)
print("Array now:", arr)

# Find the index of element 3
index_of_3 = arr.index(3)
print("\nIndex of element 3:")
print(index_of_3)

# Reverse the array
arr.reverse()
print("\nAfter reverse():")
print(arr)

# Count how many times 1 appears
count_1 = arr.count(1)
print("\nCount of 1 in array:")
print(count_1)

# Extend the array with another list
arr.extend([7, 8])
print("\nAfter extend([7, 8]):")
print(arr)

Output:

Original array:
array('i', [1, 2, 3])

After append(4):
array('i', [1, 2, 3, 4])

After insert(1, 9):
array('i', [1, 9, 2, 3, 4])

After remove(2):
array('i', [1, 9, 3, 4])

After pop():
Popped element: 4
Array now: array('i', [1, 9, 3])

Index of element 3:
2

After reverse():
array('i', [3, 9, 1])

Count of 1 in array:
1

After extend([7, 8]):
array('i', [3, 9, 1, 7, 8])
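
The methods above belong to the built-in array module. NumPy arrays have a fixed size, so the closest NumPy counterparts are functions that return a new array instead of modifying the original. The sketch below shows a few of them; the pairing with the array-module methods is an informal comparison, not an official mapping.

```python
import numpy as np

arr = np.array([1, 2, 3])

appended = np.append(arr, 4)         # [1 2 3 4]
inserted = np.insert(arr, 1, 9)      # [1 9 2 3]
deleted = np.delete(arr, 1)          # removes the element at index 1 -> [1 3]
reversed_arr = arr[::-1]             # [3 2 1]
count_of_2 = int(np.count_nonzero(arr == 2))   # 1
index_of_3 = int(np.where(arr == 3)[0][0])     # 2

print(appended, inserted, deleted, reversed_arr, count_of_2, index_of_3)
print(arr)   # the original is unchanged: [1 2 3]
```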

Concatenating Arrays

Concatenation means joining two or more arrays into a single array. This is very common when you're combining datasets, rows of data, or features in machine learning and data analysis.

Concatenation can happen in:

Row-wise direction (adding more rows): axis=0
Column-wise direction (adding more columns): axis=1

You can use:

np.concatenate() → more general
np.vstack() → vertical stack (row-wise)
np.hstack() → horizontal stack (column-wise)

Example 1: Concatenating 1D Arrays

import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

result = np.concatenate((a, b))
print(result)

Output:

[1 2 3 4 5 6]

Example 2: 2D Arrays – Row-wise Concatenation

a = np.array([[1, 2],[3, 4]])
b = np.array([[5, 6],[7, 8]])

row_concat = np.concatenate((a, b), axis=0)
print(row_concat)

Output:

[[1 2]
[3 4]
[5 6]
[7 8]]

Example 3: 2D Arrays – Column-wise Concatenation

a = np.array([[1, 2],[3, 4]])
b = np.array([[5, 6],[7, 8]])

col_concat = np.concatenate((a, b), axis=1)
print(col_concat)

Output:

[[1 2 5 6]
[3 4 7 8]]

Example 4: 2D Arrays – Vertical Stack (Row-wise, like axis=0)

It is the same as np.concatenate((a, b), axis=0).

a = np.array([[1, 2],[3, 4]])

b = np.array([[5, 6],[7, 8]])

print(np.vstack((a, b)))

Output:

[[1 2]
[3 4]
[5 6]
[7 8]]

Example 5: 2D Arrays – Horizontal Stack (Column-wise, like axis=1)

It is the same as np.concatenate((a, b), axis=1).

a = np.array([[1, 2],[3, 4]])
b = np.array([[5, 6],[7, 8]])

print(np.hstack((a, b)))

Output:

[[1 2 5 6]
[3 4 7 8]]

Example 6: 2D Arrays – Depth Stack

It joins arrays along the 3rd dimension (depth).

a = np.array([[1, 2],[3, 4]])
b = np.array([[5, 6],[7, 8]])

print(np.dstack((a, b)))

Output:

[[[1 5]
[2 6]]

[[3 7]
[4 8]]]
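
Comparing the result shapes is a compact way to see how the stacking functions differ. The sketch below also shows np.stack, which is not covered above, and what happens when the shapes do not line up. The array c and the flag mismatch_raised are introduced only for this example.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(np.vstack((a, b)).shape)   # (4, 2)
print(np.hstack((a, b)).shape)   # (2, 4)
print(np.dstack((a, b)).shape)   # (2, 2, 2)

# np.stack adds a new axis instead of extending an existing one
print(np.stack((a, b)).shape)    # (2, 2, 2), with a and b as the two blocks

# Shapes must agree on every axis except the one being joined
c = np.array([[9, 10, 11]])      # shape (1, 3)
try:
    np.concatenate((a, c), axis=0)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print("Mismatch raised ValueError:", mismatch_raised)
```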

Reshaping Arrays

Reshaping means changing the shape of an array (the number of elements along each dimension) without changing its data. The total number of elements must stay the same.

It is useful when you want to reorganize data for analysis, machine learning, or matrix operations.

Function:

numpy.reshape(array, shape), or equivalently array.reshape(shape)

Example 1: 1D → 2D Reshaping

import numpy as np

arr = np.arange(12) # Creates array from 0 to 11 (12 elements)
print("Original Array:", arr)

reshaped = arr.reshape(3, 4) # Reshape into 3 rows and 4 columns
print("Reshaped to 3x4:\n", reshaped)

Output:

Original Array: [ 0 1 2 3 4 5 6 7 8 9 10 11]

Reshaped to 3x4:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]

Example 2: Using -1 for Auto Dimension

arr = np.arange(12)
reshaped = arr.reshape(4, -1) # -1 lets NumPy figure out the number of columns
print("Reshaped with -1:\n", reshaped)

Output:

Reshaped with -1:
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]

Example 3: 1D → 3D Reshaping

arr = np.arange(8)
reshaped = arr.reshape(2, 2, 2)
print("Reshaped to 2x2x2:\n", reshaped)

Output:

Reshaped to 2x2x2:
[[[0 1]
[2 3]]

[[4 5]
[6 7]]]

Example 4: Flattening (3D/2D → 1D)

arr = np.arange(12).reshape(3, 4)
flat = arr.reshape(-1)
print("Flattened Array:", flat)

Output:

Flattened Array: [ 0 1 2 3 4 5 6 7 8 9 10 11]
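
Two details are easy to miss when reshaping: the new shape has to account for every element, and flatten() (unlike a plain reshape, which usually returns a view) always produces a copy. A small sketch, with size_error and flat_copy as illustrative names:

```python
import numpy as np

arr = np.arange(12)

# The new shape must account for exactly 12 elements
try:
    arr.reshape(5, 3)
    size_error = False
except ValueError:
    size_error = True
print("5x3 rejected:", size_error)   # 5x3 rejected: True

# np.reshape(arr, shape) is the function form of arr.reshape(shape)
grid = np.reshape(arr, (3, 4))
print(grid.shape)        # (3, 4)

# flatten() always returns a copy, so edits do not reach the original
flat_copy = grid.flatten()
flat_copy[0] = 100
print(grid[0, 0])        # 0
```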

Splitting Arrays

Splitting means dividing one array into multiple sub-arrays.

This is useful when you want to:

Separate data into chunks
Divide features and labels in machine learning
Organize large arrays into smaller sections

NumPy Functions for Splitting:

Function	Description
`np.split()`	Split an array into equal parts
`np.array_split()`	Split into unequal parts if needed
`np.hsplit()`	Split along columns (axis=1)
`np.vsplit()`	Split along rows (axis=0)

Example 1: Using np.split() – Equal Split (1D Array)

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
split_arr = np.split(arr, 3) # Split into 3 equal parts

print("Original Array:", arr)
print("Split Arrays:", split_arr)

Output:

Original Array: [1 2 3 4 5 6]
Split Arrays: [array([1, 2]), array([3, 4]), array([5, 6])]

Example 2: np.array_split() – Unequal Split Allowed

arr = np.array([1, 2, 3, 4, 5, 6, 7])
split_arr = np.array_split(arr, 3)

print("Split into 3 (unequal):", split_arr)

Output:

Split into 3 (unequal): [array([1, 2, 3]), array([4, 5]), array([6, 7])]

Example 3: Splitting a 2D Array (Row-wise)

arr2d = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
split_rows = np.vsplit(arr2d, 2)

print("Original Array:\n", arr2d)
print("Row-wise Split:\n", split_rows)

Output:

Original Array:
[[1 2]
[3 4]
[5 6]
[7 8]]
Row-wise Split:
[array([[1, 2],
[3, 4]]),
array([[5, 6],
[7, 8]])]

Example 4: Splitting a 2D Array (Column-wise)

split_cols = np.hsplit(arr2d, 2)

print("Column-wise Split:\n", split_cols)

Output:

Column-wise Split:
[array([[1],
[3],
[5],
[7]]),
array([[2],
[4],
[6],
[8]])]

Statistical Operations on Arrays

Function / Attribute	Description
`ndim`	Returns the number of array dimensions
`shape`	Returns a tuple of array dimensions
`size`	Returns the total number of elements
`transpose()`	Transposes the array (rows ↔ columns)
`ravel()`	Flattens the array into 1D
`np.zeros()`	Creates an array of all zeros
`np.ones()`	Creates an array of all ones
`np.linspace()`	Creates evenly spaced values between a range
`np.max()`	Returns the maximum element
`np.min()`	Returns the minimum element
`np.sum()`	Returns the sum of all elements
`np.sqrt()`	Returns the square root of each element
`np.std()`	Returns the standard deviation
`np.sort()`	Returns a sorted copy of the array

Program

import numpy as np

# Create a 2D array

arr = np.array([[10, 20, 30], [40, 50, 60]])

# 1D array for math ops

arr1d = np.array([4, 1, 9, 3, 5])

# Zeros, Ones, Linspace arrays

zeros_arr = np.zeros((2, 2))

ones_arr = np.ones((2, 2))

lin_arr = np.linspace(1, 5, 5)

print("Original 2D Array:\n", arr)

# Properties

print("\nNumber of Dimensions (ndim):", arr.ndim)

print("Shape:", arr.shape)

print("Size:", arr.size)

# Transformations

print("\nTranspose:\n", arr.transpose())

print("Ravel (flattened):", arr.ravel())

# Array creation

print("\nZeros Array:\n", zeros_arr)

print("Ones Array:\n", ones_arr)

print("Linspace Array (1 to 5, 5 parts):", lin_arr)

# Statistical and Math Operations

print("\n1D Array for math:", arr1d)

print("Maximum:", np.max(arr1d))

print("Minimum:", np.min(arr1d))

print("Sum:", np.sum(arr1d))

print("Square Roots:", np.sqrt(arr1d))

print("Standard Deviation:", np.std(arr1d))

print("Sorted Array:", np.sort(arr1d))

Output:

Original 2D Array:

[[10 20 30]

[40 50 60]]

Number of Dimensions (ndim): 2

Shape: (2, 3)

Size: 6

Transpose:

[[10 40]

[20 50]

[30 60]]

Ravel (flattened): [10 20 30 40 50 60]

Zeros Array:

[[0. 0.]

[0. 0.]]

Ones Array:

[[1. 1.]

[1. 1.]]

Linspace Array (1 to 5, 5 parts): [1. 2. 3. 4. 5.]

1D Array for math: [4 1 9 3 5]

Maximum: 9

Minimum: 1

Sum: 22

Square Roots: [2. 1. 3. 1.73205081 2.23606798]

Standard Deviation: 2.6533 (rounded to four decimal places)

Sorted Array: [1 3 4 5 9]
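
The functions above reduce the whole array to one value by default. Most of them also accept an axis argument, and np.std accepts ddof to switch between the population and sample formulas. A brief sketch, reusing the arrays from the program; the rounding is only for display.

```python
import numpy as np

arr = np.array([[10, 20, 30], [40, 50, 60]])

print(np.sum(arr, axis=0))    # column sums: [50 70 90]
print(np.sum(arr, axis=1))    # row sums: [ 60 150]
print(np.max(arr, axis=1))    # largest value in each row: [30 60]
print(np.mean(arr))           # 35.0

arr1d = np.array([4, 1, 9, 3, 5])
population_std = np.std(arr1d)        # divides by n (ddof=0, the default)
sample_std = np.std(arr1d, ddof=1)    # divides by n - 1
print(round(float(population_std), 4))   # 2.6533
print(round(float(sample_std), 4))       # 2.9665
```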

Data Handling using Pandas

Pandas is a powerful open-source data manipulation and analysis library for Python.

It provides two main data structures, Series and DataFrame, for working with structured data.
These structures are fast and flexible, and they make it easy to handle and analyze large datasets in much the same way as tables or spreadsheets.
Pandas is used to analyze, clean, explore, and manipulate data in a structured and efficient way.

It’s widely used in:

Data science
Machine learning
Real-world data processing
CSV/Excel file handling

Key Features

1. Data Structures

Series → A one-dimensional array-like object. It’s like a single column from a DataFrame.
DataFrame → This is the main structure in Pandas, similar to a table or spreadsheet. It consists of rows and columns, and you can easily manipulate, filter, and analyze data within it.

2. Data Manipulation

Handling missing data easily.
Data filtering, sorting, grouping, and aggregation.
Merging and joining datasets.
Time-series functionality (date parsing, resampling, etc.).

3. Performance

Built on NumPy, so it’s optimized for performance.
Vectorized operations for fast computation.

4. Integration

Works well with libraries like NumPy, Matplotlib, and Scikit-learn.

5. Indexing and Selection

Selecting rows and columns using labels or indices
Conditional selection using boolean indexing

6. Data handling

You can import/export data from various formats like CSV, Excel, SQL databases, JSON, HTML, etc.
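
Several of the features listed above can be seen together in a small sketch. The names, departments, and salaries are invented for illustration, and filling the missing value with the column mean is just one possible strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing salary
df = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena", "Kiran"],
    "Dept": ["IT", "HR", "IT", "HR"],
    "Salary": [60000.0, np.nan, 52000.0, 48000.0],
})

# Handling missing data: fill the gap with the column mean
df["Salary"] = df["Salary"].fillna(df["Salary"].mean())

# Label-based and position-based selection
first_name = df.loc[0, "Name"]     # row label 0, column "Name"
first_salary = df.iloc[0, 2]       # first row, third column

# Boolean indexing, sorting, and grouping
it_staff = df[df["Dept"] == "IT"]
by_salary = df.sort_values("Salary", ascending=False)
dept_mean = df.groupby("Dept")["Salary"].mean()

print(it_staff)
print(by_salary)
print(dept_mean)
```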

Advantages

1. Efficient Data Handling

Pandas provides fast, flexible, and expressive data structures (DataFrames and Series), which are optimized for performance. These structures allow you to perform complex data manipulation quickly and efficiently.

2. Easy to Use

The syntax is intuitive, making it easy to learn for beginners and efficient for experienced users. The API is highly user-friendly, allowing you to perform data manipulations with minimal code.

3. Handling Missing Data

Pandas provides built-in methods for dealing with missing data (NaN), such as filling, dropping, or forward/backward filling, which is critical for real-world data analysis.

4. Support for Various File Formats

Pandas can read and write data from a variety of formats: CSV, Excel, JSON, HDF5, SQL databases, and more. This makes it a great choice for integrating different data sources.

5. Integration with Other Libraries

Pandas integrates seamlessly with other Python libraries like NumPy (for numerical computations), Matplotlib/Seaborn (for visualization), and Scikit-learn (for machine learning), allowing for a full data science workflow.

Disadvantages

1. Memory Consumption

Pandas is fast and powerful, but it can be memory-intensive, especially when working with large datasets. A DataFrame is normally held entirely in RAM, which can lead to performance issues on very large datasets (more than a few GB).

2. Learning Curve

While basic operations in Pandas are easy to learn, more advanced features (like multi-indexing, pivoting, or complex aggregation) can be tricky for newcomers to grasp.

3. Inconsistent Performance on Specific Operations

While Pandas is highly optimized, some operations (like string manipulations, or using apply() with custom functions) can be slower than vectorized operations with NumPy or more specialized libraries.
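
The difference between apply() and a vectorized operation can be sketched on a tiny Series. Both give the same values here; the performance gap only becomes noticeable on large data, which this example does not attempt to measure. The prices and the 1.18 multiplier are invented.

```python
import numpy as np
import pandas as pd

prices = pd.Series([100.0, 250.0, 80.0])

# apply() calls the Python function once per value
with_apply = prices.apply(lambda p: p * 1.18)

# Vectorized: one operation over the whole Series
vectorized = prices * 1.18

# The two approaches agree on the result
same_values = bool(np.allclose(with_apply, vectorized))
print(same_values)
```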

Applications

1. Data Analysis

Exploratory Data Analysis (EDA): Pandas is commonly used for EDA to analyze datasets, calculate statistics (mean, median, mode), and understand the distribution of data.
Data Summarization: It’s used for calculating summary statistics like average, standard deviation, etc., as well as grouping data by categories for aggregation.

2. Data Cleaning

Handling Missing Data: Removing, filling, or interpolating missing values in datasets.
Data Transformation: Renaming columns, changing data types, applying functions across columns/rows.
Dealing with Outliers: Detecting and handling outliers in datasets.

3. Machine Learning Data Preprocessing

Data Preprocessing: Before applying machine learning algorithms, data often needs to be cleaned, transformed, and standardized. Pandas is a key tool in preparing data for machine learning models.
Feature Engineering: Creating new features from existing data (e.g., time-based features, aggregating data) is done with Pandas.
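
A minimal sketch of the cleaning and feature-engineering steps described above might look like this. The column names and values are hypothetical, and filling the missing age with the median is only one of several reasonable choices.

```python
import pandas as pd

# Hypothetical raw data: a missing age, and ages stored as text
raw = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],
    "age": ["28", None, "26"],
    "joined": ["2021-03-15", "2020-11-01", "2022-07-30"],
})

# Data transformation: rename columns and fix data types
clean = raw.rename(columns={"name": "Name", "age": "Age", "joined": "Joined"})
clean["Age"] = pd.to_numeric(clean["Age"])           # text -> numbers, None -> NaN
clean["Joined"] = pd.to_datetime(clean["Joined"])    # text -> datetime

# Handling missing data: fill the gap with the median age
clean["Age"] = clean["Age"].fillna(clean["Age"].median())

# Feature engineering: derive a new column from an existing one
clean["JoinYear"] = clean["Joined"].dt.year

print(clean)
print(clean["Age"].describe())   # summary statistics for one column
```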

Pandas Data Structures

Pandas provides two main data structures:

1. Series

A Series is a one-dimensional labeled array that can hold any data type (integers, strings, floats, Python objects, etc.).

Key Features:

Homogeneous data (all elements share one dtype; mixed values are stored with the generic object dtype)
Associated labels (index)
Can be thought of like a column in a spreadsheet or SQL table

Creation of Series

Series are generally created from:

Arrays
Lists
Dict

From Arrays:

import pandas as pd

import numpy as np

# Create a pandas Series from a NumPy array

arr= np.array([10, 20, 30, 40])

s = pd.Series(arr)

print(s)

Output:

0 10

1 20

2 30

3 40

dtype: int64

The left column is the index.
The right column is the data.

From Lists:

import pandas as pd

# Creating a Series from a list

s = pd.Series([10, 20, 30, 40])

print(s)

Output:

0 10

1 20

2 30

3 40

dtype: int64

From Dictionary:

import pandas as pd

# Create a pandas Series from a dictionary

dict_data = {'a': 100, 'b': 200, 'c': 300}

series_from_dict = pd.Series(dict_data)

print(series_from_dict)

Output:

a 100

b 200

c 300

dtype: int64
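
Once a Series has labels, values can be read by label or by position. The sketch below uses invented subject marks, and it also shows what happens to the dtype when the values are of mixed types.

```python
import pandas as pd

# A custom index supplies the labels
marks = pd.Series([85, 92, 78], index=["maths", "physics", "chemistry"])

print(marks["physics"])     # 92, access by label
print(marks.iloc[0])        # 85, access by position
print(marks[marks > 80])    # boolean selection keeps maths and physics
print(marks.mean())         # 85.0

# Mixed values are allowed, but the Series falls back to the object dtype
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)          # object
```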

2. DataFrames

A DataFrame is a two-dimensional labeled data structure with columns (like a spreadsheet or SQL table).

Key Features:

Heterogeneous data (each column can have a different data type)
Labeled axes (rows and columns)
Flexible indexing and powerful data manipulation

DataFrames in pandas can be created from:

List
List of tuples
Dictionary
Excel spreadsheet files
CSV (comma-separated values) files

From a list:

import pandas as pd

# List of lists

data = [['Swathi', 'Vizag'], ['Surya', 'Hyderabad'], ['Chinnu', 'Pune']]

df = pd.DataFrame(data, columns=['Name', 'City'])

print(df)

Output:

Name City

0 Swathi Vizag

1 Surya Hyderabad

2 Chinnu Pune

From a list of tuples:

import pandas as pd

# List of tuples

data = [('Swathi', 'Vizag'), ('Surya', 'Hyderabad'), ('Chinnu', 'Pune')]

df = pd.DataFrame(data, columns=['Name', 'City'])

print(df)

Output:

Name City

0 Swathi Vizag

1 Surya Hyderabad

2 Chinnu Pune

From a Dictionary

import pandas as pd

# Dictionary mapping each column name to a list of values

data = {

'Name': ['Swathi', 'Surya', 'Chinnu'],

'City': ['Vizag', 'Hyderabad', 'Pune'],

'Age': [28, 32, 26],

'Salary': [60000, 75000, 52000]

}

df = pd.DataFrame(data)

print(df)

Output:

Name City Age Salary

0 Swathi Vizag 28 60000

1 Surya Hyderabad 32 75000

2 Chinnu Pune 26 52000
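
With the DataFrame built, a few common follow-up steps are inspecting column types, selecting a column, and filtering rows. The sketch below reuses the same data; the 10% raise is an invented example of a derived column.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Swathi", "Surya", "Chinnu"],
    "City": ["Vizag", "Hyderabad", "Pune"],
    "Age": [28, 32, 26],
    "Salary": [60000, 75000, 52000],
})

print(df.shape)               # (3, 4)
print(df.dtypes)              # text columns and integer columns side by side
print(df["Name"].tolist())    # ['Swathi', 'Surya', 'Chinnu']

# Rows where Age is above 27
older = df[df["Age"] > 27]
print(older["Name"].tolist()) # ['Swathi', 'Surya']

# A derived column (the 10% raise is an invented example)
df["RaisedSalary"] = df["Salary"] * 1.10
print(df[["Name", "RaisedSalary"]])
```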

From an Excel File (.xlsx)

import pandas as pd

df = pd.read_excel('data.xlsx') # requires the openpyxl package: pip install openpyxl

print(df)

You need to have the Excel file (data.xlsx) in your directory, or provide the full path.

From a CSV File (.csv)

import pandas as pd

df = pd.read_csv('data.csv')

print(df)

You need to have the CSV file (data.csv) in your directory, or provide the full path.
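
To try the CSV functions without a file on disk, the text can be wrapped in io.StringIO, which pandas treats like an open file. The CSV content below is a hypothetical stand-in for data.csv.

```python
import io
import pandas as pd

# Stand-in for the contents of data.csv (hypothetical)
csv_text = "Name,City,Age\nSwathi,Vizag,28\nSurya,Hyderabad,32\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.head(1))       # first row only

# Writing back out: index=False omits the row labels
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```

With a real file, the same calls take a path instead, for example pd.read_csv('data.csv') and df.to_csv('output.csv', index=False), where output.csv is a file name chosen for illustration.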
