dreaife

3523 words

18 minutes

Learning Basic SciPy Usage

2024-01-09

cs-base

python

SciPy#

Introduction#

SciPy is an open-source Python library for mathematics and scientific computing.

It is built on NumPy and is used across mathematics, science, and engineering, where many advanced abstractions and physical models depend on it.

SciPy includes modules for optimization, linear algebra, integration, interpolation, special functions, fast Fourier transforms, signal processing and image processing, solving ordinary differential equations, and other computations commonly used in science and engineering.

Applications#

SciPy is widely used in mathematics, science, and engineering for tasks such as optimization, linear algebra, integration, interpolation, curve fitting, special functions, fast Fourier transforms, signal and image processing, and solving ordinary differential equations.

The synergy between NumPy and SciPy enables efficient solutions to many problems, with broad applications in astronomy, biology, meteorology and climate science, as well as materials science and other disciplines.

Installation#

```shell
python3 -m pip install -U pip
python3 -m pip install -U scipy
```

Verify the installation:

```python
import scipy

print(scipy.__version__)
```

Module List#

The following lists some commonly used SciPy modules and their official API URLs:

| Module name | Function / Description | Reference documentation |
| --- | --- | --- |
| scipy.cluster | Clustering (vector quantization, hierarchical clustering) | cluster API |
| scipy.constants | Physical and mathematical constants | constants API |
| scipy.fft | Fast Fourier Transform | fft API |
| scipy.integrate | Integration | integrate API |
| scipy.interpolate | Interpolation | interpolate API |
| scipy.io | Data input/output | io API |
| scipy.linalg | Linear algebra | linalg API |
| scipy.misc | Miscellaneous utilities (deprecated in recent SciPy releases) | misc API |
| scipy.ndimage | N-dimensional image processing | ndimage API |
| scipy.odr | Orthogonal distance regression | odr API |
| scipy.optimize | Optimization algorithms | optimize API |
| scipy.signal | Signal processing | signal API |
| scipy.sparse | Sparse matrices | sparse API |
| scipy.spatial | Spatial data structures and algorithms | spatial API |
| scipy.special | Special mathematical functions | special API |
| scipy.stats | Statistical functions | stats API |

For more module content, see the official documentation: https://docs.scipy.org/doc/scipy/reference/

SciPy Constants Module#

SciPy’s constants module, scipy.constants, provides many built-in physical and mathematical constants, along with unit conversion factors.

Pi is a mathematical constant—the ratio of a circle’s circumference to its diameter, approximately 3.14159, commonly denoted by the symbol π.

The following prints pi:

```python
from scipy import constants

print(constants.pi)
```

The following prints the golden ratio:

```python
from scipy import constants

print(constants.golden)
```

We can use the dir() function to see which constants are contained in the constants module:

```python
from scipy import constants

print(dir(constants))
```

Unit Types#

The constants module contains the following kinds of units:

SI prefixes: The International System of Units (SI) prefixes denote decimal multiples and submultiples of units; each prefix is a power of ten (for example, centi equals 0.01). The 20 prefixes from yotta down to yocto are available as constants:

```python
from scipy import constants

print(constants.yotta)    # 1e+24
print(constants.zetta)    # 1e+21
print(constants.exa)      # 1e+18
print(constants.peta)     # 1000000000000000.0
print(constants.tera)     # 1000000000000.0
print(constants.giga)     # 1000000000.0
print(constants.mega)     # 1000000.0
print(constants.kilo)     # 1000.0
print(constants.hecto)    # 100.0
print(constants.deka)     # 10.0
print(constants.deci)     # 0.1
print(constants.centi)    # 0.01
print(constants.milli)    # 0.001
print(constants.micro)    # 1e-06
print(constants.nano)     # 1e-09
print(constants.pico)     # 1e-12
print(constants.femto)    # 1e-15
print(constants.atto)     # 1e-18
print(constants.zepto)    # 1e-21
print(constants.yocto)    # 1e-24
```

Binary prefixes: Each constant is the corresponding power of 1024 (kibi equals 1024).

```python
from scipy import constants

print(constants.kibi)    # 1024
print(constants.mebi)    # 1048576
print(constants.gibi)    # 1073741824
print(constants.tebi)    # 1099511627776
print(constants.pebi)    # 1125899906842624
print(constants.exbi)    # 1152921504606846976
print(constants.zebi)    # 1180591620717411303424
print(constants.yobi)    # 1208925819614629174706176
```

Mass units: Each constant is the unit expressed in kilograms (gram is 0.001).

```python
from scipy import constants

print(constants.gram)        # 0.001
print(constants.metric_ton)  # 1000.0
print(constants.grain)       # 6.479891e-05
print(constants.lb)          # 0.45359236999999997
print(constants.pound)       # 0.45359236999999997
print(constants.oz)          # 0.028349523124999998
print(constants.ounce)       # 0.028349523124999998
print(constants.stone)       # 6.3502931799999995
print(constants.long_ton)    # 1016.0469088
print(constants.short_ton)   # 907.1847399999999
print(constants.troy_ounce)  # 0.031103476799999998
print(constants.troy_pound)  # 0.37324172159999996
print(constants.carat)       # 0.0002
print(constants.atomic_mass) # 1.66053904e-27
print(constants.m_u)         # 1.66053904e-27
print(constants.u)           # 1.66053904e-27
```

Angle units: Each constant is the unit expressed in radians (degree is 0.017453292519943295).

```python
from scipy import constants

print(constants.degree)     # 0.017453292519943295
print(constants.arcmin)     # 0.0002908882086657216
print(constants.arcminute)  # 0.0002908882086657216
print(constants.arcsec)     # 4.84813681109536e-06
print(constants.arcsecond)  # 4.84813681109536e-06
```

Time units: Each constant is the unit expressed in seconds (hour is 3600.0).

```python
from scipy import constants

print(constants.minute)      # 60.0
print(constants.hour)        # 3600.0
print(constants.day)         # 86400.0
print(constants.week)        # 604800.0
print(constants.year)        # 31536000.0
print(constants.Julian_year) # 31557600.0
```

Length units: Each constant is the unit expressed in meters (nautical_mile is 1852.0).

```python
from scipy import constants

print(constants.inch)              # 0.0254
print(constants.foot)              # 0.30479999999999996
print(constants.yard)              # 0.9143999999999999
print(constants.mile)              # 1609.3439999999998
print(constants.mil)               # 2.5399999999999997e-05
print(constants.pt)                # 0.00035277777777777776
print(constants.point)             # 0.00035277777777777776
print(constants.survey_foot)       # 0.3048006096012192
print(constants.survey_mile)       # 1609.3472186944373
print(constants.nautical_mile)     # 1852.0
print(constants.fermi)             # 1e-15
print(constants.angstrom)          # 1e-10
print(constants.micron)            # 1e-06
print(constants.au)                # 149597870691.0
print(constants.astronomical_unit) # 149597870691.0
print(constants.light_year)        # 9460730472580800.0
print(constants.parsec)            # 3.0856775813057292e+16
```

Pressure units: Each constant is the unit expressed in pascals, the SI unit of pressure (psi is 6894.757293168361).

```python
from scipy import constants

print(constants.atm)         # 101325.0
print(constants.atmosphere)  # 101325.0
print(constants.bar)         # 100000.0
print(constants.torr)        # 133.32236842105263
print(constants.mmHg)        # 133.32236842105263
print(constants.psi)         # 6894.757293168361
```

Area units: Each constant is the unit expressed in square meters, where one square meter is the area of a square with a side length of 1 meter (hectare is 10000.0).

```python
from scipy import constants

print(constants.hectare) # 10000.0
print(constants.acre)    # 4046.8564223999992
```

Volume units: Each constant is the unit expressed in cubic meters. One cubic meter is the volume of a cube with sides of 1 meter; it equals 1,000 liters (1 liter is 1 cubic decimeter) and 1,000,000 cubic centimeters (liter is 0.001).

```python
from scipy import constants

print(constants.liter)            # 0.001
print(constants.litre)            # 0.001
print(constants.gallon)           # 0.0037854117839999997
print(constants.gallon_US)        # 0.0037854117839999997
print(constants.gallon_imp)       # 0.00454609
print(constants.fluid_ounce)      # 2.9573529562499998e-05
print(constants.fluid_ounce_US)   # 2.9573529562499998e-05
print(constants.fluid_ounce_imp)  # 2.84130625e-05
print(constants.barrel)           # 0.15898729492799998
print(constants.bbl)              # 0.15898729492799998
```

Speed units: Each constant is the unit expressed in meters per second (speed_of_sound is 340.5).

```python
from scipy import constants

print(constants.kmh)            # 0.2777777777777778
print(constants.mph)            # 0.44703999999999994
print(constants.mach)           # 340.5
print(constants.speed_of_sound) # 340.5
print(constants.knot)           # 0.5144444444444445
```

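Temperature units: Values are in kelvin. zero_Celsius is 0 °C expressed in kelvin (273.15), and degree_Fahrenheit is the size of one Fahrenheit degree in kelvin.

```python
from scipy import constants

print(constants.zero_Celsius)      # 273.15
print(constants.degree_Fahrenheit) # 0.5555555555555556
```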

Energy units: Each constant is the unit expressed in joules; the joule (symbol J) is the SI derived unit of energy, work, or heat (calorie is 4.184).

```python
from scipy import constants

print(constants.calorie)      # 4.184
```

Power units: Each constant is the unit expressed in watts; the watt (symbol W) is the SI unit of power. One watt is defined as one joule per second (1 J/s), i.e., the rate of energy conversion, use, or dissipation (horsepower is 745.6998715822701).

```python
from scipy import constants

print(constants.hp)         # 745.6998715822701
print(constants.horsepower) # 745.6998715822701
```

Force units: Each constant is the unit expressed in newtons; the newton (symbol N) is the SI unit of force, named after Isaac Newton, the founder of classical mechanics (kilogram_force is 9.80665).

```python
from scipy import constants

print(constants.dyn)             # 1e-05
print(constants.dyne)            # 1e-05
print(constants.lbf)             # 4.4482216152605
print(constants.pound_force)     # 4.4482216152605
print(constants.kgf)             # 9.80665
print(constants.kilogram_force)  # 9.80665
```
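
Because every unit constant above is the size of that unit in SI units, conversions reduce to a multiplication or a division. The following is a minimal sketch of that idea; the input values are hypothetical, and `convert_temperature` is the helper that scipy.constants provides for scales with an offset.

```python
from scipy import constants

# Multiplying by a unit constant converts *to* SI; dividing converts *from* SI.
marathon_miles = 26.2                              # hypothetical input value
marathon_m = marathon_miles * constants.mile       # miles -> meters
marathon_km = marathon_m / constants.kilo          # meters -> kilometers

speed_kmh = 90.0                                   # hypothetical input value
speed_ms = speed_kmh * constants.kmh               # km/h -> m/s

# Temperature scales have an offset, so a plain factor is not enough.
boiling_k = constants.convert_temperature(100.0, 'Celsius', 'Kelvin')

print(marathon_km, speed_ms, boiling_k)
```

The same pattern applies to any of the unit groups listed in this section.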

SciPy Optimizers#

SciPy’s optimize module provides implementations of common optimization algorithms that we can call directly to solve optimization problems, such as finding the minimum of a function or the roots of equations.

Root Finding#

NumPy can find the roots of polynomials and solve linear systems, but it cannot find the roots of a general nonlinear equation such as:

x + cos(x) = 0

For that we can use SciPy’s optimize.root function, which takes two required parameters:

fun - the function representing the equation.
x0 - initial guess for the root.

The function returns an object containing information about the solution.

```python
import numpy as np
from scipy.optimize import root

def eqn(x):
    # root() passes x as a NumPy array, so use np.cos rather than math.cos
    return x + np.cos(x)

myroot = root(eqn, 0)

print(myroot.x)
# See more information
# print(myroot)
```

Minimizing Functions#

A function represents a curve with maxima and minima.

A high point is called a maximum.
A low point is called a minimum.
The highest point on the entire curve is the global maximum; the rest are local maxima.
The lowest point on the entire curve is the global minimum; the rest are local minima.

You can use the scipy.optimize.minimize() function to minimize a function.

minimize() accepts the following parameters:

fun - the function to optimize
x0 - initial guess
method - the name of the method to use; values include ‘CG’, ‘BFGS’, ‘Newton-CG’, ‘L-BFGS-B’, ‘TNC’, ‘COBYLA’, and ‘SLSQP’, among others.
callback - a function called after each optimization iteration.

options - a dictionary of solver-specific settings, for example:

```
{
     "disp": boolean - print convergence messages
     "gtol": number - tolerance on the gradient norm used as the stopping criterion
}
```

The minimization of x^2 + x + 2 using BFGS:

```python
from scipy.optimize import minimize

def eqn(x):
    return x**2 + x + 2

mymin = minimize(eqn, 0, method='BFGS')

print(mymin)
```
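
To show how the callback and options parameters fit together, here is a small sketch that minimizes the same function and reads individual fields from the result. The `visited` list and the chosen `gtol` value are illustrative assumptions, not part of the original example.

```python
from scipy.optimize import minimize

def eqn(x):
    # x arrives as a 1-element NumPy array; return a scalar
    return x[0]**2 + x[0] + 2

visited = []  # hypothetical helper: records the iterate after each step

result = minimize(
    eqn,
    x0=[0.0],
    method='BFGS',
    callback=lambda xk: visited.append(xk.copy()),
    options={'gtol': 1e-6, 'disp': False},
)

print(result.success)  # whether the optimizer reports convergence
print(result.x)        # location of the minimum
print(result.fun)      # function value at the minimum
print(len(visited))    # number of recorded iterations
```

Analytically, x^2 + x + 2 has its minimum at x = -0.5 with value 1.75, which gives a quick check on the numerical result.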

SciPy Sparse Matrices#

A sparse matrix is a matrix in which the vast majority of the elements are zero. Conversely, if most elements are nonzero, the matrix is dense.

In science and engineering, large sparse matrices frequently arise when solving linear models.

For example, a matrix with 35 elements of which only 9 are nonzero (and 26 are zero) has a sparsity of 26/35 ≈ 74% and a density of 9/35 ≈ 26%.
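
The sparsity and density figures can be reproduced with a few lines of NumPy. The 5×7 shape and the positions of the nonzero entries below are assumptions chosen only to match the counts in the text (9 nonzero values out of 35).

```python
import numpy as np

# Hypothetical 5x7 matrix with 9 nonzero entries out of 35
dense = np.zeros((5, 7))
rows = [0, 0, 1, 2, 2, 3, 4, 4, 4]
cols = [1, 5, 3, 0, 6, 2, 1, 4, 6]
dense[rows, cols] = np.arange(1, 10)

nonzero = np.count_nonzero(dense)
density = nonzero / dense.size   # fraction of nonzero entries
sparsity = 1 - density           # fraction of zero entries

print(nonzero, f"{sparsity:.0%}", f"{density:.0%}")
```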

SciPy’s scipy.sparse module provides functions for working with sparse matrices.

We primarily use the following two types of sparse matrices:

CSC - Compressed Sparse Column, compressed by column.
CSR - Compressed Sparse Row, compressed by row.

In this chapter we primarily use CSR matrices.

CSR Matrix#

We can create a CSR matrix by passing an array to the scipy.sparse.csr_matrix() function.

```python
# Create a CSR matrix.
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])
print(csr_matrix(arr))
```

data: Use the data attribute to view the stored data (excluding zero elements):

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

print(csr_matrix(arr).data)
```

count_nonzero(): Use count_nonzero() to count the total number of nonzero elements:

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

print(csr_matrix(arr).count_nonzero())
```

eliminate_zeros(): Use eliminate_zeros() to remove explicitly stored zero entries from the matrix (it modifies the matrix in place):

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

mat = csr_matrix(arr)
mat.eliminate_zeros()

print(mat)
```

sum_duplicates(): Use sum_duplicates() to merge duplicate entries (entries stored at the same position) by adding them together:

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

mat = csr_matrix(arr)
mat.sum_duplicates()

print(mat)
```

tocsc(): Convert CSR to CSC using tocsc():

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

newarr = csr_matrix(arr).tocsc()

print(newarr)
```
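
To make "compressed by row" and "compressed by column" concrete, the sketch below prints the three arrays that each format stores for the same 3×3 example. The attribute names data, indices, and indptr are the ones SciPy exposes on CSR and CSC matrices.

```python
import numpy as np
from scipy.sparse import csr_matrix

arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])

csr = csr_matrix(arr)
print(csr.data)     # nonzero values, read row by row
print(csr.indices)  # column index of each stored value
print(csr.indptr)   # row i owns data[indptr[i]:indptr[i + 1]]

csc = csr.tocsc()
print(csc.data)     # nonzero values, read column by column
print(csc.indices)  # row index of each stored value
print(csc.indptr)   # column j owns data[indptr[j]:indptr[j + 1]]
```

CSR therefore makes row slicing cheap, while CSC does the same for columns, which is one reason to convert between them.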

SciPy Graph Structures#

Graphs are one of the most powerful frameworks in algorithmic theory.

A graph is a set of nodes (vertices) and edges representing relationships; nodes correspond to objects and edges connect them.

SciPy provides the scipy.sparse.csgraph module to handle graph structures.

Adjacency Matrix#

An Adjacency Matrix is a matrix representing the adjacency relationships between vertices.

A graph consists of two sets: V (vertices) and E (edges); edges may carry weights indicating the strength of the connection between nodes.


We can store all vertex data in a 1D array and the relationships between vertices (edges or arcs) in a 2D array; this 2D array is called the adjacency matrix.

Adjacency matrices can distinguish between directed and undirected graphs.

An undirected graph represents two-way relationships; its edges have no direction, so its adjacency matrix is symmetric.

A directed graph’s edges have direction and represent one-way relationships, so its adjacency matrix need not be symmetric.
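
The difference between the two kinds of graph shows up directly in the adjacency matrix. The two small three-node graphs below are hypothetical examples chosen for illustration.

```python
import numpy as np

# Undirected graph with edges 0-1 and 1-2: each edge appears in both directions
undirected = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])

# Directed graph with edges 0->1 and 1->2 only
directed = np.array([
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
])

is_sym_undirected = np.array_equal(undirected, undirected.T)
is_sym_directed = np.array_equal(directed, directed.T)

# Entry [i, j] > 0 means there is an edge from node i to node j
edges_directed = [tuple(e) for e in np.argwhere(directed > 0).tolist()]

print(is_sym_undirected, is_sym_directed, edges_directed)
```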

Connected Components#

View all connected components using connected_components().

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.sparse import csr_matrix

arr = np.array([
    [0, 1, 2],
    [1, 0, 0],
    [2, 0, 0]
])

newarr = csr_matrix(arr)

# Returns (number of components, component label of each node)
print(connected_components(newarr))
```

Dijkstra — Shortest Path Algorithm#

Dijkstra’s algorithm computes the shortest paths from one node to all others.

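SciPy provides the dijkstra() function to compute the shortest paths from one node to the others. It accepts, among others, the following parameters:

return_predecessors: Boolean; if True, the predecessor matrix is returned along with the distances so that the actual paths can be reconstructed; if False, only the distances are returned.
indices: Index (or indices) of the source nodes; shortest paths are computed only from these nodes. If omitted, paths are computed from every node.
limit: The maximum distance to compute; nodes farther away than this are reported as unreachable (infinite distance).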

```python
# Find the shortest paths from node 0 to all other nodes:
import numpy as np
from scipy.sparse.csgraph import dijkstra
from scipy.sparse import csr_matrix

arr = np.array([
    [0, 1, 2],
    [1, 0, 0],
    [2, 0, 0]
])

newarr = csr_matrix(arr)

# Returns (distances, predecessors)
print(dijkstra(newarr, return_predecessors=True, indices=0))
```
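
The predecessor array returned with return_predecessors=True can be walked backwards to recover an actual route. The sketch below does this for the path from node 1 to node 2 in the same graph; `rebuild_path` is a hypothetical helper written for this illustration, not a SciPy function.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

arr = np.array([
    [0, 1, 2],
    [1, 0, 0],
    [2, 0, 0],
])

dist, pred = dijkstra(csr_matrix(arr), return_predecessors=True, indices=1)

def rebuild_path(predecessors, source, target):
    """Walk the predecessor array backwards from target to source."""
    path = [target]
    while path[-1] != source:
        prev = predecessors[path[-1]]
        if prev < 0:       # SciPy marks "no predecessor" with a negative value
            return []      # target is unreachable from source
        path.append(int(prev))
    return path[::-1]

path = rebuild_path(pred, source=1, target=2)
print(dist[2], path)
```

Nodes 1 and 2 are not directly connected, so the route goes through node 0 and its length is 1 + 2 = 3.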

Floyd Warshall — Floyd-Warshall Algorithm#

The Floyd-Warshall algorithm solves the all-pairs shortest path problem.

SciPy uses floyd_warshall() to find the shortest paths between all pairs of elements.

```python
# Find the shortest paths between all pairs:
import numpy as np
from scipy.sparse.csgraph import floyd_warshall
from scipy.sparse import csr_matrix

arr = np.array([
    [0, 1, 2],
    [1, 0, 0],
    [2, 0, 0]
])

newarr = csr_matrix(arr)

print(floyd_warshall(newarr, return_predecessors=True))
```

Bellman Ford — Bellman-Ford Algorithm#

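The Bellman-Ford algorithm solves the single-source shortest path problem and, unlike Dijkstra’s algorithm, supports negative edge weights.

SciPy’s bellman_ford() computes shortest paths from the given source nodes (or from every node when indices is omitted); it works on directed graphs and on graphs with negative edge weights, provided the graph contains no negative cycles.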

```python
# Find the shortest paths from node 0 on a graph with a negative weight:
import numpy as np
from scipy.sparse.csgraph import bellman_ford
from scipy.sparse import csr_matrix

arr = np.array([
    [0, -1, 2],
    [1, 0, 0],
    [2, 0, 0]
])

newarr = csr_matrix(arr)

print(bellman_ford(newarr, return_predecessors=True, indices=0))
```

Depth-First Order#

depth_first_order() returns the depth-first traversal order from a node.

It accepts the following parameters:

Graph
The starting element for traversal

```python
# Given an adjacency matrix, return the depth-first traversal order:
import numpy as np
from scipy.sparse.csgraph import depth_first_order
from scipy.sparse import csr_matrix

arr = np.array([
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [2, 1, 1, 0],
    [0, 1, 0, 1]
])

newarr = csr_matrix(arr)

print(depth_first_order(newarr, 1))
```

Breadth-First Order#

breadth_first_order() returns the breadth-first traversal order from a node.

It accepts the following parameters:

Graph
The starting element for traversal

```python
# Given an adjacency matrix, return the breadth-first traversal order:
import numpy as np
from scipy.sparse.csgraph import breadth_first_order
from scipy.sparse import csr_matrix

arr = np.array([
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [2, 1, 1, 0],
    [0, 1, 0, 1]
])

newarr = csr_matrix(arr)

print(breadth_first_order(newarr, 1))
```

SciPy Spatial Data#

Spatial data, also known as geometric data, is used to represent information about the position, shape, size, and distribution of objects, such as points in coordinates.

SciPy handles spatial data via the scipy.spatial module, for example, determining whether a point lies within a boundary, computing the nearest point around a given point, and finding all points within a given distance.

Triangulation#

Triangulation in trigonometry and geometry is a method of measuring the distance to a target by using the angles at known endpoints of fixed reference lines.

Polygon triangulation divides a polygon into multiple triangles; we can use these triangles to compute the polygon’s area.

Topology tells us that every surface admits a triangulation.

Suppose a triangulation of a surface exists; let p be the total number of vertices (identical vertices counted once), a the number of edges, and n the number of triangles; then e = p - a + n is a topological invariant of the surface. In other words, regardless of the particular triangulation, e yields the same value. e is called the Euler characteristic.
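
As a sanity check on the formula e = p - a + n, the sketch below counts vertices, edges, and triangles in a Delaunay triangulation of five planar points. A triangulated region of the plane without holes is a topological disk, whose Euler characteristic is 1; the point set is the same one used in the Delaunay example in this section.

```python
import numpy as np
from scipy.spatial import Delaunay

points = np.array([[2, 4], [3, 4], [3, 0], [2, 2], [4, 1]])
tri = Delaunay(points)

# Collect every undirected edge exactly once
edges = set()
for simplex in tri.simplices:
    for i in range(3):
        u, v = int(simplex[i]), int(simplex[(i + 1) % 3])
        edges.add((min(u, v), max(u, v)))

p = len(np.unique(tri.simplices))  # number of vertices used
a = len(edges)                     # number of edges
n = len(tri.simplices)             # number of triangles
e = p - a + n                      # Euler characteristic

print(p, a, n, e)
```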

Delaunay() triangulation is used for triangulating a set of points.

```python
# Create triangles from given points:
import numpy as np
from scipy.spatial import Delaunay
import matplotlib.pyplot as plt

points = np.array([
    [2, 4],
    [3, 4],
    [3, 0],
    [2, 2],
    [4, 1]
])

simplices = Delaunay(points).simplices    # indices of vertices of triangles

plt.triplot(points[:, 0], points[:, 1], simplices)
plt.scatter(points[:, 0], points[:, 1], color='r')

plt.show()
```

Convex Hull#

A convex hull is a concept in computational geometry.

In a real vector space V, given a set X, the intersection of all convex sets containing X is called the convex hull of X. The convex hull of X can be constructed by convex combinations of all points in X (X1, … Xn).

We can create a convex hull using the ConvexHull() method.

```python
# Create a convex hull from given points:
import numpy as np
from scipy.spatial import ConvexHull
import matplotlib.pyplot as plt

points = np.array([
    [2, 4],
    [3, 4],
    [3, 0],
    [2, 2],
    [4, 1],
    [1, 2],
    [5, 0],
    [3, 1],
    [1, 2],
    [0, 2]
])

hull = ConvexHull(points)
hull_points = hull.simplices    # index pairs of the hull's edges

plt.scatter(points[:, 0], points[:, 1])
for simplex in hull_points:
    plt.plot(points[simplex, 0], points[simplex, 1], 'k-')

plt.show()
```

KD-Tree#

A kd-tree (short for k-dimensional tree) is a tree data structure used for storing points in a k-dimensional space to enable fast retrieval. It is commonly used for searching high-dimensional data (e.g., range searches and nearest-neighbor searches).

KDTree() returns a KDTree object.

The query() method returns the distance to the nearest neighbor and the index of that neighbor in the input points.

```python
# Nearest distance to (1,1):
from scipy.spatial import KDTree

points = [(1, -1), (2, 3), (-2, 3), (2, -3)]

kdtree = KDTree(points)

res = kdtree.query((1, 1))

print(res)
```
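
Beyond the single nearest neighbor, a kd-tree also supports the range and k-nearest searches mentioned above. This sketch reuses the same points; the search radius of 2.5 is an arbitrary value chosen for illustration.

```python
from scipy.spatial import KDTree

points = [(1, -1), (2, 3), (-2, 3), (2, -3)]
kdtree = KDTree(points)

# Two nearest neighbors of (1, 1): distances and indices, closest first
distances, indices = kdtree.query((1, 1), k=2)

# Indices of all points within distance 2.5 of (1, 1)
nearby = kdtree.query_ball_point((1, 1), r=2.5)

print(distances, indices, sorted(nearby))
```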

Distance Matrix#

In mathematics, a distance matrix is a matrix whose elements are the distances between points (a two-dimensional array). Given N points in Euclidean space, the distance matrix is an N×N symmetric matrix with non-negative real entries, conceptually similar to an adjacency matrix, but the latter only indicates whether there is a connection between points and does not contain information about the actual distances between points. Therefore, a distance matrix can be viewed as a weighted form of an adjacency matrix.

For example, we analyze the following 2D points a to f. Here, we use the Euclidean distances between points as the distance measure.
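
A distance matrix can be computed directly with scipy.spatial.distance_matrix. The three points below are hypothetical (a 3-4-5 right triangle) and stand in for the points of the original figure.

```python
import numpy as np
from scipy.spatial import distance_matrix

# Hypothetical points forming a 3-4-5 right triangle
pts = np.array([[0, 0], [3, 0], [0, 4]])

dm = distance_matrix(pts, pts)  # Euclidean distances by default (p=2)

print(dm)
```

The result is symmetric with zeros on the diagonal, as the definition requires.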

Euclidean Distance#

In mathematics, the Euclidean distance or Euclidean metric is the standard (straight-line) distance between two points in Euclidean space. Using this distance makes Euclidean space a metric space. The associated norm is called the Euclidean norm. Earlier literature called it the Pythagorean distance.

Euclidean distance (euclidean metric) is a commonly used distance definition, referring to the true distance between two points in an m-dimensional space, or the natural length of a vector (i.e., the distance from the origin). In 2D and 3D space, the Euclidean distance is simply the actual distance between two points.

1
from scipy.spatial.distance import euclidean
2

3
p1 = (1, 0)
4
p2 = (10, 2)
5

6
res = euclidean(p1, p2)
7

8
print(res)

Manhattan Distance#

The Manhattan distance, coined by Hermann Minkowski in the 19th century, is a term in geometry used in metric spaces to denote the sum of the absolute distances along each axis between two points in a standard coordinate system.

Under the Manhattan metric, movement is restricted to the axis directions (up, down, left, right); the Manhattan distance between two points is the length of the shortest path under that constraint.

Manhattan and Euclidean distances compared: in the usual illustration, the red, blue, and yellow paths all have the same Manhattan length (12), while the green straight line shows the Euclidean distance, 6×√2 ≈ 8.49.
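
SciPy exposes the Manhattan distance as cityblock() in scipy.spatial.distance. The sketch below reproduces the comparison numerically, assuming two points that are 6 units apart along each axis.

```python
from scipy.spatial.distance import cityblock, euclidean

p1 = (0, 0)
p2 = (6, 6)   # assumed: 6 units apart along each axis

manhattan = cityblock(p1, p2)   # |6 - 0| + |6 - 0|
straight = euclidean(p1, p2)    # sqrt(6**2 + 6**2)

print(manhattan, straight)
```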

Cosine Distance#

Cosine distance is defined as 1 minus the cosine similarity, where cosine similarity measures how similar two vectors are by the cosine of the angle between them.

An angle of 0 degrees has a cosine of 1, so vectors pointing in the same direction have a cosine distance of 0; for any other angle the cosine is less than 1, with a minimum of -1.

```python
# Compute the cosine distance between p1 and p2:
from scipy.spatial.distance import cosine

p1 = (1, 0)
p2 = (10, 2)

res = cosine(p1, p2)

print(res)
```

Hamming Distance#

In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it counts the number of substitutions required to transform one string into another.

Hamming weight is the Hamming distance of a string relative to a zero string of the same length; that is, the number of nonzero elements in the string: for binary strings, the number of 1s, so the Hamming weight of 11101 is 4.

The Hamming distance between 1011101 and 1001001 is 2.
The Hamming distance between 2143896 and 2233796 is 3.
The Hamming distance between “toned” and “roses” is 3.

```python
# Compute the Hamming distance between two points.
# Note: SciPy returns the proportion of differing positions, not the count.
from scipy.spatial.distance import hamming

p1 = (True, False, True)
p2 = (False, True, True)

res = hamming(p1, p2)

print(res)
```
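
Because scipy.spatial.distance.hamming() returns a fraction rather than a count, it takes one extra step to recover the textbook Hamming distance. The sketch below uses the binary strings 1011101 and 1001001 from the examples above, encoded as arrays of digits.

```python
import numpy as np
from scipy.spatial.distance import hamming

a = np.array([1, 0, 1, 1, 1, 0, 1])   # 1011101
b = np.array([1, 0, 0, 1, 0, 0, 1])   # 1001001

fraction = hamming(a, b)                   # share of positions that differ
count = int(round(fraction * len(a)))      # number of positions that differ
direct = int(np.count_nonzero(a != b))     # same count, computed with NumPy

print(fraction, count, direct)
```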

SciPy MATLAB Arrays#

NumPy provides a Python-readable format for saving data.

SciPy provides MATLAB interoperability.

SciPy’s scipy.io module provides many functions to work with MATLAB arrays.

Export data to MATLAB format#

The savemat() method can export data in MATLAB format. The method has these parameters:

filename - the name of the file to save the data.
mdict - dictionary containing the data.
do_compression - boolean indicating whether to compress the resulting data. Default is False.

```python
# Export the array as a variable "vec" to a mat file:
from scipy import io
import numpy as np

arr = np.arange(10)

io.savemat('arr.mat', {"vec": arr})
```

Note: The above code will save a file named “arr.mat” on your computer.

Import MATLAB format data#

The loadmat() method can import MATLAB format data.

This method has the following parameters:

filename - the file name to load.

It returns a dictionary whose keys are variable names and whose values are the corresponding arrays, along with a few metadata entries such as __header__.

```python
# Import from a mat file:
from scipy import io
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Export
io.savemat('arr.mat', {"vec": arr})

# Import
mydata = io.loadmat('arr.mat')

print(mydata)

# Display only the MATLAB array with the variable name "vec":
print(mydata['vec'])
```

From the result, you can see the array was originally one-dimensional, but when extracted it gains an extra dimension and becomes a two-dimensional array.

To resolve this, you can pass an extra parameter squeeze_me=True:

```python
from scipy import io
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Export
io.savemat('arr.mat', {"vec": arr})

# Import
mydata = io.loadmat('arr.mat', squeeze_me=True)

print(mydata['vec'])
```

SciPy Interpolation#

What is interpolation?#

In numerical analysis, interpolation is a method or process for estimating new data points within the range of a discrete set of known data points.

In simple terms, interpolation is a method for generating points between given points.

For example, for two points 1 and 2, we can interpolate to obtain points 1.33 and 1.66.
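
The 1.33 and 1.66 figures come from plain linear interpolation. As a minimal sketch, assume the two known values 1 and 2 sit at positions 0 and 1; NumPy's np.interp (used here instead of SciPy for brevity) then gives the values one third and two thirds of the way between them.

```python
import numpy as np

known_x = [0.0, 1.0]   # assumed positions of the two known points
known_y = [1.0, 2.0]   # the known values 1 and 2

new_x = [1 / 3, 2 / 3]                      # positions to fill in
new_y = np.interp(new_x, known_x, known_y)  # linear interpolation

print(new_y)
```

The second value is 1.666..., which the text truncates to 1.66.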

Interpolation has many applications; in machine learning, we often deal with missing data, and interpolation can be used to fill in these values.

This filling approach is called imputation.

Besides imputation, interpolation is frequently used wherever we need to smooth discrete points in a data set.

How to implement interpolation in SciPy?#

SciPy provides the scipy.interpolate module to handle interpolation.

One-dimensional interpolation#

Interpolation for one-dimensional data can be performed with the interp1d() method.

The method takes two inputs, x and y.

The return value is a callable function that you can call with new x values to obtain the corresponding y, i.e., y = f(x).

```python
# Interpolate given xs and ys from 2.1, 2.2... to 2.9:
from scipy.interpolate import interp1d
import numpy as np

xs = np.arange(10)
ys = 2*xs + 1

interp_func = interp1d(xs, ys)

newarr = interp_func(np.arange(2.1, 3, 0.1))

print(newarr)
```

Univariate Interpolation#

In one-dimensional interpolation, the points are fitted to a single curve, whereas in spline interpolation, the points are fitted to functions defined by piecewise polynomials.

Univariate interpolation uses the UnivariateSpline() function, which takes xs and ys and returns a callable function that can be called with new xs.

A piecewise function is a function that has different analytic expressions over different ranges of the independent variable x.

```python
# Find the univariate spline interpolation for nonlinear points at 2.1, 2.2...2.9:
from scipy.interpolate import UnivariateSpline
import numpy as np

xs = np.arange(10)
ys = xs**2 + np.sin(xs) + 1

interp_func = UnivariateSpline(xs, ys)

newarr = interp_func(np.arange(2.1, 3, 0.1))

print(newarr)
```

Radial Basis Function Interpolation#

Radial basis functions are functions defined with respect to fixed reference points.

In surface interpolation we typically use radial basis function interpolation.

```python
# The Rbf() function accepts xs and ys as arguments and returns a callable function
from scipy.interpolate import Rbf
import numpy as np

xs = np.arange(10)
ys = xs**2 + np.sin(xs) + 1

interp_func = Rbf(xs, ys)

newarr = interp_func(np.arange(2.1, 3, 0.1))

print(newarr)
```

SciPy Significance Testing#

A significance test is a hypothesis test: we first make an assumption about a population (random variable) or the form of its distribution, and then use sample information to judge whether that assumption is reasonable, that is, whether the true population differs significantly from the null hypothesis. In other words, a significance test asks whether the difference between the sample and our assumption about the population is due to random variation or to a real discrepancy between the assumption and the population.

Significance testing is used to determine whether there is a difference between experimental and control groups, or between two different treatments, and whether that difference is statistically significant.

SciPy provides the scipy.stats module for significance testing.

Statistical Hypotheses#

A statistical hypothesis concerns the unknown distribution of one or more random variables. A statistical hypothesis that concerns only one or a few unknown parameters within a known distribution is called a parameter hypothesis. The process of testing a statistical hypothesis is called hypothesis testing; testing a parameter hypothesis is called a parameter test.

Null Hypothesis#

The null hypothesis, a term in statistics, also called the original hypothesis, is the hypothesis that is stated before performing a statistical test. When the null hypothesis is true, the test statistic should follow a known probability distribution.

When the computed statistic falls into the rejection region, a rare event has occurred, and the null hypothesis should be rejected.

A hypothesis to be tested is usually denoted as H0 (the null hypothesis), while the alternative hypothesis is denoted as H1 (the alternative hypothesis).

When the null hypothesis is true, deciding to reject it constitutes a Type I error; its probability is usually denoted α.
When the null hypothesis is false, deciding not to reject it constitutes a Type II error; its probability is usually denoted β.
α + β does not necessarily equal 1.

Typically, only the maximum probability of making a Type I error, α, is constrained; β is not considered. This kind of hypothesis testing is called significance testing, and α is the significance level.

Common α values are 0.01, 0.05, 0.10, etc. In general, depending on the research question, if the cost of making a wrong decision is high, you choose a smaller α to reduce such errors; otherwise, you may choose a larger α.
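The meaning of α can be checked with a small simulation. The sketch below is only an illustration: the sample size, trial count, and seed are arbitrary choices, and it assumes a t-test on normal data. Both samples are drawn from the same distribution, so the null hypothesis is true and every rejection is a Type I error.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05
n_trials = 2000                 # arbitrary number of repeated experiments

false_rejections = 0
for _ in range(n_trials):
    # Both samples come from the same distribution, so H0 is true.
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    if ttest_ind(a, b).pvalue <= alpha:
        false_rejections += 1   # rejecting a true H0 is a Type I error

type_i_rate = false_rejections / n_trials
print(type_i_rate)  # expected to be close to alpha
```

The observed rejection rate should land near the chosen α, which is what "the probability of a Type I error is α" means in practice.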

Alternative Hypothesis#

The alternative hypothesis is one of the fundamental concepts in statistics; it covers any proposition about the population distribution under which the null hypothesis does not hold. It is also called the opposite hypothesis.

The alternative hypothesis is the one we accept when the null hypothesis is rejected.

For example, in evaluating students, we might adopt:

“Students are below average” — as the null hypothesis
“Students are above average” — as the alternative hypothesis.

One-Sided Test#

A one-sided test, also known as a one-tailed test, uses the area of the tail on one side of the density curve to construct the critical region.

When the alternative hypothesis concerns only one direction of departure from the hypothesized value, the test is called a one-tailed test.

Example:

For the null hypothesis:

“Mean equals k”

We can choose one of the following alternative hypotheses:

“Mean less than k”
“Mean greater than k”
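As a rough illustration of the two one-sided alternatives, the sketch below uses ttest_1samp() with its alternative argument (this assumes a reasonably recent SciPy version that supports the argument). The value of k, the sample, and the seed are made up for the example.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)
k = 0.0                                            # hypothesized mean (made up)
sample = rng.normal(loc=1.0, scale=1.0, size=50)   # true mean is above k

# H1: "Mean less than k"
p_less = ttest_1samp(sample, popmean=k, alternative='less').pvalue
# H1: "Mean greater than k"
p_greater = ttest_1samp(sample, popmean=k, alternative='greater').pvalue

print(p_less)
print(p_greater)
```

Because the sample mean lies above k here, only the "greater" alternative should yield a small p-value. For the t-test the two one-sided p-values add up to 1.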

Two-Sided Test#

A two-sided test, also known as a two-tailed test, uses the areas in both tails of the distribution to construct the critical region.

It is used when the test concerns departures on both sides of the hypothesized value.

Example:

For the null hypothesis:

“Mean equals k”

We can have alternative hypotheses:

“Mean not equal to k”

In this case, both sides (less than or greater than k) are checked.
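To make "both tails" concrete, here is a minimal sketch that computes a two-sided p-value by hand for a one-sample t-test and compares it with ttest_1samp(). The sample, k, and seed are illustrative assumptions.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

rng = np.random.default_rng(2)
k = 0.0                                  # hypothesized mean (made up)
sample = rng.normal(loc=0.3, size=40)

n = sample.size
# t statistic: distance of the sample mean from k in standard-error units
t_stat = (sample.mean() - k) / (sample.std(ddof=1) / np.sqrt(n))
# Two-sided p-value: area in both tails beyond |t|
p_two_sided = 2 * t.sf(abs(t_stat), df=n - 1)

res = ttest_1samp(sample, popmean=k)     # two-sided by default
print(t_stat, res.statistic)
print(p_two_sided, res.pvalue)
```

The hand-computed values should agree with the ones SciPy returns, since the default alternative of ttest_1samp() is two-sided.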

Alpha Value#

The alpha value is the significance level.

The significance level, denoted α, is the probability of rejecting the null hypothesis when it is actually true, i.e., the probability of a Type I error.

It determines how close to the extremes the data must be before the null hypothesis is rejected.

It is usually set to 0.01, 0.05, or 0.1.

P-value#

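The P-value indicates how extreme the observed data are: it is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed.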

Compare the P-value with alpha to determine statistical significance.

If the p value <= alpha, we reject the null hypothesis and say the data are statistically significant; otherwise, we fail to reject the null.
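The decision rule above can be written as a tiny helper. The function name is_significant and the two groups below are hypothetical, introduced only for this sketch.

```python
import numpy as np
from scipy.stats import ttest_ind

def is_significant(p_value, alpha=0.05):
    """Return True when the null hypothesis is rejected at level alpha."""
    return p_value <= alpha

rng = np.random.default_rng(3)
control = rng.normal(loc=0.0, size=100)   # made-up control group
treated = rng.normal(loc=1.0, size=100)   # made-up group with a shifted mean

p = ttest_ind(control, treated).pvalue
print(p, is_significant(p))

print(is_significant(0.05))  # the boundary case p == alpha counts as significant
print(is_significant(0.2))
```

Note that failing to reject the null hypothesis is not the same as proving it true; it only means the data do not provide enough evidence against it.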

T Test#

The T-test is used to determine whether there is a significant difference between the means of two samples, which helps judge whether they come from the same distribution.

By default, this is a two-sided test.

The function ttest_ind() takes two independent samples, which do not need to be the same size, and returns a result that unpacks like a tuple of the t-statistic and p-value.

# Find whether values v1 and v2 come from the same distribution:
import numpy as np
from scipy.stats import ttest_ind

v1 = np.random.normal(size=100)
v2 = np.random.normal(size=100)

res = ttest_ind(v1, v2)

print(res)

# If you only want the p-value
res = ttest_ind(v1, v2).pvalue
print(res)
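The two samples passed to ttest_ind() may differ in size and in variance. In that situation a common choice is Welch's t-test, which ttest_ind() performs when equal_var=False. The sketch below uses made-up data and an arbitrary seed.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(4)
small = rng.normal(loc=0.0, scale=1.0, size=80)    # 80 observations, low spread
large = rng.normal(loc=2.0, scale=3.0, size=120)   # 120 observations, high spread

# Default: Student's t-test, which assumes equal variances
student = ttest_ind(small, large)
# Welch's t-test: does not assume equal variances
welch = ttest_ind(small, large, equal_var=False)

print(student.statistic, student.pvalue)
print(welch.statistic, welch.pvalue)

# The result also unpacks like a tuple
stat, p = welch
```

The two variants give different statistics here because the groups differ in both size and variance.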

KS Test#

The KS (Kolmogorov-Smirnov) test checks whether a sample of values conforms to a given distribution.

The kstest() function takes two arguments: the values to test and the CDF.

CDF stands for Cumulative Distribution Function, also called the distribution function.

The CDF can be given as a string (the name of a distribution in scipy.stats) or as a callable that returns probabilities.

It can be used for one-sided or two-sided tests.

By default, it is a two-sided test. We can pass the alternative argument as one of 'two-sided', 'less', or 'greater'.

# Check whether the given values conform to a normal distribution:
import numpy as np
from scipy.stats import kstest

v = np.random.normal(size=100)

res = kstest(v, 'norm')

print(res)
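The options described above can be tried side by side. The following sketch passes the CDF both as a string and as a callable, tests a clearly non-normal sample, and runs a one-sided variant. The samples and seed are illustrative only.

```python
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(5)
normal_sample = rng.normal(size=200)
uniform_sample = rng.uniform(0, 1, size=200)   # clearly not standard normal

# The CDF given as a distribution name or as a callable
by_name = kstest(normal_sample, 'norm')
by_callable = kstest(normal_sample, norm.cdf)
print(by_name.statistic, by_callable.statistic)

# Uniform data tested against the standard normal CDF: very small p-value
uniform_res = kstest(uniform_sample, 'norm')
print(uniform_res.pvalue)

# One-sided variant
one_sided = kstest(normal_sample, 'norm', alternative='greater')
print(one_sided.pvalue)
```

Note that 'norm' refers to the standard normal distribution (mean 0, standard deviation 1), so data from a normal distribution with other parameters would also be rejected unless the parameters are supplied.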

Descriptive Statistics#

Using describe() you can view information about an array, including:

nobs — number of observations
minmax — minimum and maximum
mean — arithmetic mean
variance — variance
skewness — skewness
kurtosis — kurtosis

# Display descriptive statistics for an array:
import numpy as np
from scipy.stats import describe

v = np.random.normal(size=100)
res = describe(v)

print(res)
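The individual fields listed above can also be read as attributes of the result. Here is a small sketch with a hand-picked array, so the values are easy to verify; note that the variance uses ddof=1 (the sample variance) by default.

```python
import numpy as np
from scipy.stats import describe

v = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hand-picked values
res = describe(v)

print(res.nobs)      # 8
print(res.minmax)    # (minimum, maximum)
print(res.mean)      # 5.0
print(res.variance)  # sample variance: sum of squared deviations / (n - 1)

# Same value as NumPy's variance with ddof=1
print(np.var(v, ddof=1))
```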

Normality Test (Skewness and Kurtosis)#

A normality test assesses whether observed data come from a normal distribution; it is an important special case of a goodness-of-fit test in statistics.

The normaltest() function is based on skewness and kurtosis (D'Agostino and Pearson's test).

It returns the test statistic and the p-value for the null hypothesis:

“x comes from a normal distribution”

Skewness#

A measure of the symmetry of the data.

For a normal distribution, it is 0.

If negative, the data are skewed to the left.

If positive, the data are skewed to the right.

Kurtosis#

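A measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. By default, kurtosis() uses Fisher's definition, under which a normal distribution has a kurtosis of 0.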

Positive kurtosis means heavy tails.

Negative kurtosis means light tails.

# Find the skewness and kurtosis of values in an array:
import numpy as np
from scipy.stats import skew, kurtosis
from scipy.stats import normaltest

v = np.random.normal(size=100)

print(skew(v))
print(kurtosis(v))

# Check whether the data come from a normal distribution:
print(normaltest(v))
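For contrast, the same functions can be applied to data that are clearly not normal. The sketch below uses an exponential sample (an arbitrary choice for illustration), which is right-skewed and heavy-tailed, so normaltest() should reject the null hypothesis.

```python
import numpy as np
from scipy.stats import skew, kurtosis, normaltest

rng = np.random.default_rng(6)
skewed = rng.exponential(size=500)   # right-skewed, heavy right tail

sk = skew(skewed)
excess = kurtosis(skewed)                 # Fisher definition: normal -> 0
pearson = kurtosis(skewed, fisher=False)  # Pearson definition: normal -> 3

print(sk)        # positive: skewed to the right
print(excess)
print(pearson)   # equals excess + 3

res = normaltest(skewed)
print(res.statistic, res.pvalue)  # small p-value: reject normality
```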

Learning Basic SciPy Usage
https://dreaife.tokyo/en/posts/scipy-guide/
Author: dreaife
Published at: 2024-01-09
License: CC BY-NC-SA 4.0
