Scipy
Introduction
SciPy is an open-source Python library for mathematics and scientific computing.
SciPy is a scientific computing library built on NumPy, used in mathematics, science, engineering, and other fields where many advanced abstractions and physical models require SciPy.
SciPy includes modules for optimization, linear algebra, integration, interpolation, special functions, fast Fourier transforms, signal processing and image processing, solving ordinary differential equations, and other computations commonly used in science and engineering.
Applications
SciPy is a widely used package for mathematics, science, and engineering, capable of handling optimization, linear algebra, integration, interpolation, fitting, special functions, fast Fourier transforms, signal processing, image processing, solvers for ordinary differential equations, and more.
SciPy includes modules for optimization, linear algebra, integration, interpolation, special functions, fast Fourier transforms, signal processing and image processing, solving ordinary differential equations, and other computations common in science and engineering.
The synergy between NumPy and SciPy enables efficient solutions to many problems, with broad applications in astronomy, biology, meteorology and climate science, as well as materials science and other disciplines.
Installation
python3 -m pip install -U pippython3 -m pip install -U scipyVerify the installation:
import scipy
print(scipy.__version__)Module List
The following lists some commonly used SciPy modules and their official API URLs:
| Module name | Function / Description | Reference documentation |
|---|---|---|
| scipy.cluster | Vector quantization | cluster API |
| scipy.constants | Mathematical constants | constants API |
| scipy.fft | Fast Fourier Transform | fft API |
| scipy.integrate | Integration | integrate API |
| scipy.interpolate | Interpolation | interpolate API |
| scipy.io | Data input/output | io API |
| scipy.linalg | Linear algebra | linalg API |
| scipy.misc | Image processing | misc API |
| scipy.ndimage | N-dimensional image | ndimage API |
| scipy.odr | Orthogonal distance regression | odr API |
| scipy.optimize | Optimization algorithms | optimize API |
| scipy.signal | Signal processing | signal API |
| scipy.sparse | Sparse matrices | sparse API |
| scipy.spatial | Spatial data structures and algorithms | spatial API |
| scipy.special | Special mathematical functions | special API |
| scipy.stats | Statistical functions | stats.mstats API |
For more module content, see the official documentation: https://docs.scipy.org/doc/scipy/reference/
SciPy Constants Module
SciPy’s constants module, constants, provides many built-in mathematical constants.
Pi is a mathematical constant—the ratio of a circle’s circumference to its diameter, approximately 3.14159, commonly denoted by the symbol π.
The following prints pi:
from scipy import constants
print(constants.pi)The following prints the golden ratio:
from scipy import constants
print(constants.golden)We can use the dir() function to see which constants are contained in the constants module:
from scipy import constants
print(dir(constants))Unit Types
The constants module contains the following kinds of units:
-
SI prefixes The International System of Units prefixes (SI prefixes) denote multiples and submultiples of units; there are currently 20 prefixes, most of which are powers of ten. (centi equals 0.01):
from scipy import constantsprint(constants.yotta) #1e+24print(constants.zetta) #1e+21print(constants.exa) #1e+18print(constants.peta) #1000000000000000.0print(constants.tera) #1000000000000.0print(constants.giga) #1000000000.0print(constants.mega) #1000000.0print(constants.kilo) #1000.0print(constants.hecto) #100.0print(constants.deka) #10.0print(constants.deci) #0.1print(constants.centi) #0.01print(constants.milli) #0.001print(constants.micro) #1e-06print(constants.nano) #1e-09print(constants.pico) #1e-12print(constants.femto) #1e-15print(constants.atto) #1e-18print(constants.zepto) #1e-21 -
Binary, in bytes Returns byte units (kibi = 1024).
from scipy import constantsprint(constants.kibi) #1024print(constants.mebi) #1048576print(constants.gibi) #1073741824print(constants.tebi) #1099511627776print(constants.pebi) #1125899906842624print(constants.exbi) #1152921504606846976print(constants.zebi) #1180591620717411303424print(constants.yobi) #1208925819614629174706176 -
Mass units Returns kilograms (kg). (gram returns 0.001).
from scipy import constantsprint(constants.gram) #0.001print(constants.metric_ton) #1000.0print(constants.grain) #6.479891e-05print(constants.lb) #0.45359236999999997print(constants.pound) #0.45359236999999997print(constants.oz) #0.028349523124999998print(constants.ounce) #0.028349523124999998print(constants.stone) #6.3502931799999995print(constants.long_ton) #1016.0469088print(constants.short_ton) #907.1847399999999print(constants.troy_ounce) #0.031103476799999998print(constants.troy_pound) #0.37324172159999996print(constants.carat) #0.0002print(constants.atomic_mass) #1.66053904e-27print(constants.m_u) #1.66053904e-27print(constants.u) #1.66053904e-27 -
Angle conversions Returns radians (degree returns 0.017453292519943295).
from scipy import constantsprint(constants.degree) #0.017453292519943295print(constants.arcmin) #0.0002908882086657216print(constants.arcminute) #0.0002908882086657216print(constants.arcsec) #4.84813681109536e-06print(constants.arcsecond) #4.84813681109536e-06 -
Time units Returns seconds (hour returns 3600.0).
from scipy import constantsprint(constants.minute) #60.0print(constants.hour) #3600.0print(constants.day) #86400.0print(constants.week) #604800.0print(constants.year) #31536000.0print(constants.Julian_year) #31557600.0 -
Length units Returns meters (nautical_mile returns 1852.0).
from scipy import constantsprint(constants.inch) #0.0254print(constants.foot) #0.30479999999999996print(constants.yard) #0.9143999999999999print(constants.mile) #1609.3439999999998print(constants.mil) #2.5399999999999997e-05print(constants.pt) #0.00035277777777777776print(constants.point) #0.00035277777777777776print(constants.survey_foot) #0.3048006096012192print(constants.survey_mile) #1609.3472186944373print(constants.nautical_mile) #1852.0print(constants.fermi) #1e-15print(constants.angstrom) #1e-10print(constants.micron) #1e-06print(constants.au) #149597870691.0print(constants.astronomical_unit) #149597870691.0print(constants.light_year) #9460730472580800.0print(constants.parsec) #3.0856775813057292e+16 -
Pressure units Returns pascals, the SI unit of pressure. (psi returns 6894.757293168361).
from scipy import constantsprint(constants.atm) #101325.0print(constants.atmosphere) #101325.0print(constants.bar) #100000.0print(constants.torr) #133.32236842105263print(constants.mmHg) #133.32236842105263print(constants.psi) #6894.757293168361 -
Area units Returns square meters, the metric unit of area; defined as the area of a square with side length 1 meter. (hectare returns 10000.0).
from scipy import constantsprint(constants.hectare) #10000.0print(constants.acre) #4046.8564223999992 -
Volume units
Returns cubic meters; a volume of one cubic meter is the volume of a cube with sides of 1 meter; equal to 1 liter and 1 cubic decimeter, and equal to 1,000,000 cubic centimeters. (liter returns 0.001).
from scipy import constantsprint(constants.liter) #0.001print(constants.litre) #0.001print(constants.gallon) #0.0037854117839999997print(constants.gallon_US) #0.0037854117839999997print(constants.gallon_imp) #0.00454609print(constants.fluid_ounce) #2.9573529562499998e-05print(constants.fluid_ounce_US) #2.9573529562499998e-05print(constants.fluid_ounce_imp) #2.84130625e-05print(constants.barrel) #0.15898729492799998print(constants.bbl) #0.15898729492799998 -
Speed units Returns meters per second. (speed_of_sound returns 340.5).
from scipy import constantsprint(constants.kmh) #0.2777777777777778print(constants.mph) #0.44703999999999994print(constants.mach) #340.5print(constants.speed_of_sound) #340.5print(constants.knot) #0.5144444444444445 -
Temperature units Returns kelvin. (zero_Celsius returns 273.15).
from scipy import constantsprint(constants.zero_Celsius) #273.15print(constants.degree_Fahrenheit) #0.5555555555555556 -
Energy units Returns joules; the joule (symbol J) is the SI derived unit of energy, work, or heat. (calorie returns 4.184).
from scipy import constantsprint(constants.calorie) #4.184 -
Power units Returns watts; the watt (symbol W) is the SI unit of power. One watt is defined as one joule per second (1 J/s), i.e., the rate of energy conversion, use, or dissipation. (horsepower returns 745.6998715822701).
from scipy import constantsprint(constants.hp) #745.6998715822701print(constants.horsepower) #745.6998715822701 -
Dynamical (mechanical) units Returns newtons; the newton (symbol N) is the SI unit of force. It is named after Isaac Newton, the founder of classical mechanics. (kilogram_force returns 9.80665).
from scipy import constantsprint(constants.dyn) #1e-05print(constants.dyne) #1e-05print(constants.lbf) #4.4482216152605print(constants.pound_force) #4.4482216152605print(constants.kgf) #9.80665print(constants.kilogram_force) #9.80665
SciPy Optimizers
SciPy’s optimize module provides implementations of common optimization algorithms that we can call directly to solve optimization problems, such as finding the minimum of a function or the roots of equations.
Root Finding
NumPy can find roots of polynomials and linear equations, but it cannot find roots of nonlinear equations, as shown below:
x + cos(x)
Therefore we can use SciPy’s optimize.root function, which requires two parameters:
- fun - the function representing the equation.
- x0 - initial guess for the root.
The function returns an object containing information about the solution.
from scipy.optimize import rootfrom math import cos
def eqn(x): return x + cos(x)
myroot = root(eqn, 0)
print(myroot.x)# See more information#print(myroot)Minimizing Functions
A function represents a curve with maxima and minima.
- A high point is called a maximum.
- A low point is called a minimum.
- The highest point on the entire curve is the global maximum; the rest are local maxima.
- The lowest point on the entire curve is the global minimum; the rest are local minima.
You can use the scipy.optimize.minimize() function to minimize a function.
minimize() accepts the following parameters:
-
fun - the function to optimize
-
x0 - initial guess
-
method - the name of the method to use; values can be: ‘CG’, ‘BFGS’, ‘Newton-CG’, ‘L-BFGS-B’, ‘TNC’, ‘COBYLA’, ‘SLSQP’.
-
callback - the function called after each optimization iteration.
-
options - dictionary for other parameters:
{"disp": boolean - print detailed description"gtol": number - the tolerance of the error}
The minimization of x^2 + x + 2 using BFGS:
from scipy.optimize import minimize
def eqn(x): return x**2 + x + 2
mymin = minimize(eqn, 0, method='BFGS')
print(mymin)SciPy Sparse Matrices
A sparse matrix is a matrix in which the vast majority of the elements are zero. Conversely, if most elements are nonzero, the matrix is dense.
In science and engineering, large sparse matrices frequently arise when solving linear models.

The above sparse matrix contains only 9 nonzero elements, with 26 zeros. Its sparsity is 74%, density 26%.
SciPy’s scipy.sparse module provides functions for working with sparse matrices.
We primarily use the following two types of sparse matrices:
- CSC - Compressed Sparse Column, compressed by column.
- CSR - Compressed Sparse Row, compressed by row.
In this chapter we primarily use CSR matrices.
CSR Matrix
We can create a CSR matrix by passing an array to the scipy.sparse.csr_matrix() function.
# Create a CSR matrix.import numpy as npfrom scipy.sparse import csr_matrix
arr = np.array([0, 0, 0, 0, 0, 1, 1, 0, 2])print(csr_matrix(arr))- data Use the data attribute to view the stored data (excluding zero elements):
import numpy as npfrom scipy.sparse import csr_matrix
arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
print(csr_matrix(arr).data)- count_nonzero() Use count_nonzero() to count the total number of non-zero elements:
import numpy as npfrom scipy.sparse import csr_matrix
arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
print(csr_matrix(arr).count_nonzero())- eliminate_zeros() Use eliminate_zeros() to remove zero elements from the matrix:
import numpy as npfrom scipy.sparse import csr_matrix
arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
mat = csr_matrix(arr)mat.eliminate_zeros()
print(mat)- sum_duplicates() Use sum_duplicates() to remove duplicates:
import numpy as npfrom scipy.sparse import csr_matrix
arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
mat = csr_matrix(arr)mat.sum_duplicates()
print(mat)- tocsc() Convert CSR to CSC using tocsc():
import numpy as npfrom scipy.sparse import csr_matrix
arr = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 2]])
newarr = csr_matrix(arr).tocsc()
print(newarr)SciPy Graph Structures
Graphs are one of the most powerful frameworks in algorithmic theory.
A graph is a set of nodes (vertices) and edges representing relationships; nodes correspond to objects and edges connect them.
SciPy provides the scipy.sparse.csgraph module to handle graph structures.
Adjacency Matrix
An Adjacency Matrix is a matrix representing the adjacency relationships between vertices.
The adjacency matrix structure consists of two sets: V (vertices) and E (edges); edges may have weights indicating the strength of connections between nodes.

The above sparse matrix contains only 9 nonzero elements, with 26 zeros. Its sparsity is 74%, density 26%.
We can store all vertex data in a 1D array and the relationships between vertices (edges or arcs) in a 2D array; this 2D array is called the adjacency matrix.
Adjacency matrices can distinguish between directed and undirected graphs.
An undirected graph is a bidirectional relationship; edges have no direction:
A directed graph’s edges have direction and represent a one-way relationship:
Connected Components
View all connected components using connected_components().
import numpy as npfrom scipy.sparse.csgraph import connected_componentsfrom scipy.sparse import csr_matrix
arr = np.array([ [0, 1, 2], [1, 0, 0], [2, 0, 0]])
newarr = csr_matrix(arr)
print(connected_components(newarr))Dijkstra — Shortest Path Algorithm
Dijkstra’s algorithm computes the shortest paths from one node to all others.
SciPy uses the dijkstra() function to compute the shortest paths from one element to the others. The dijkstra() function can be configured with the following parameters:
- return_predecessors: Boolean, set to True to traverse all paths; if you do not want to traverse all paths, set to False.
- indices: Indices of the elements; returns all paths to that element.
- limit: The maximum weight of a path.
# Find the shortest path from element 1 to 2:import numpy as npfrom scipy.sparse.csgraph import dijkstrafrom scipy.sparse import csr_matrix
arr = np.array([ [0, 1, 2], [1, 0, 0], [2, 0, 0]])
newarr = csr_matrix(arr)
print(dijkstra(newarr, return_predecessors=True, indices=0))Floyd Warshall — Floyd-Warshall Algorithm
The Floyd-Warshall algorithm solves the all-pairs shortest path problem.
SciPy uses floyd_warshall() to find the shortest paths between all pairs of elements.
# Find the shortest paths between all pairs:import numpy as npfrom scipy.sparse.csgraph import floyd_warshallfrom scipy.sparse import csr_matrix
arr = np.array([ [0, 1, 2], [1, 0, 0], [2, 0, 0]])
newarr = csr_matrix(arr)
print(floyd_warshall(newarr, return_predecessors=True))Bellman Ford — Bellman-Ford Algorithm
The Bellman-Ford algorithm solves the all-pairs shortest path problem.
SciPy uses bellman_ford() to find the shortest paths between all pairs of nodes; it can be used on any graph, including directed graphs and graphs with negative edge weights.
# Find the shortest path from element 1 to element 2 on a graph with negative weights:import numpy as npfrom scipy.sparse.csgraph import bellman_fordfrom scipy.sparse import csr_matrix
arr = np.array([ [0, -1, 2], [1, 0, 0], [2, 0, 0]])
newarr = csr_matrix(arr)
print(bellman_ford(newarr, return_predecessors=True, indices=0))Depth-First Order
depth_first_order() returns the depth-first traversal order from a node.
It accepts the following parameters:
- Graph
- The starting element for traversal
# Given an adjacency matrix, return the depth-first traversal order:import numpy as npfrom scipy.sparse.csgraph import depth_first_orderfrom scipy.sparse import csr_matrix
arr = np.array([ [0, 1, 0, 1], [1, 1, 1, 1], [2, 1, 1, 0], [0, 1, 0, 1]])
newarr = csr_matrix(arr)
print(depth_first_order(newarr, 1))Breadth-First Order
breadth_first_order() returns the breadth-first traversal order from a node.
It accepts the following parameters:
- Graph
- The starting element for traversal
# Given an adjacency matrix, return the breadth-first traversal order:import numpy as npfrom scipy.sparse.csgraph import breadth_first_orderfrom scipy.sparse import csr_matrix
arr = np.array([ [0, 1, 0, 1], [1, 1, 1, 1], [2, 1, 1, 0], [0, 1, 0, 1]])
newarr = csr_matrix(arr)
print(breadth_first_order(newarr, 1))SciPy Spatial Data
Spatial data, also known as geometric data, is used to represent information about the position, shape, size, and distribution of objects, such as points in coordinates.
SciPy handles spatial data via the scipy.spatial module, for example, determining whether a point lies within a boundary, computing the nearest point around a given point, and finding all points within a given distance.
Triangulation
Triangulation in trigonometry and geometry is a method of measuring the distance to a target by using the angles at known endpoints of fixed reference lines.
Polygon triangulation divides a polygon into multiple triangles; we can use these triangles to compute the polygon’s area.
Topology tells us that every surface admits a triangulation.
Suppose a triangulation of a surface exists; let p be the total number of vertices (identical vertices counted once), a the number of edges, and n the number of triangles; then e = p - a + n is a topological invariant of the surface. In other words, regardless of the particular triangulation, e yields the same value. e is called the Euler characteristic.
Delaunay() triangulation is used for triangulating a set of points.
# Create triangles from given points:import numpy as npfrom scipy.spatial import Delaunayimport matplotlib.pyplot as plt
points = np.array([ [2, 4], [3, 4], [3, 0], [2, 2], [4, 1]])
simplices = Delaunay(points).simplices # indices of vertices of triangles
plt.triplot(points[:, 0], points[:, 1], simplices)plt.scatter(points[:, 0], points[:, 1], color='r')
plt.show()Convex Hull
A convex hull is a concept in computational geometry.
In a real vector space V, given a set X, the intersection of all convex sets containing X is called the convex hull of X. The convex hull of X can be constructed by convex combinations of all points in X (X1, … Xn).
We can create a convex hull using the ConvexHull() method.
# Create a convex hull from given points:import numpy as npfrom scipy.spatial import ConvexHullimport matplotlib.pyplot as plt
points = np.array([ [2, 4], [3, 4], [3, 0], [2, 2], [4, 1], [1, 2], [5, 0], [3, 1], [1, 2], [0, 2]])
hull = ConvexHull(points)hull_points = hull.simplices
plt.scatter(points[:,0], points[:,1])for simplex in hull_points: plt.plot(points[simplex,0], points[simplex,1], 'k-')
plt.show()KD-Tree
A kd-tree (short for k-dimensional tree) is a tree data structure used for storing points in a k-dimensional space to enable fast retrieval. It is commonly used for searching high-dimensional data (e.g., range searches and nearest-neighbor searches).
KDTree() returns a KDTree object.
The query() method returns the nearest distance and the nearest location.
# Nearest distance to (1,1):from scipy.spatial import KDTree
points = [(1, -1), (2, 3), (-2, 3), (2, -3)]
kdtree = KDTree(points)
res = kdtree.query((1, 1))
print(res)Distance Matrix
In mathematics, a distance matrix is a matrix whose elements are the distances between points (a two-dimensional array). Given N points in Euclidean space, the distance matrix is an N×N symmetric matrix with non-negative real entries, conceptually similar to an adjacency matrix, but the latter only indicates whether there is a connection between points and does not contain information about the actual distances between points. Therefore, a distance matrix can be viewed as a weighted form of an adjacency matrix.
For example, we analyze the following 2D points a to f. Here, we use the Euclidean distances between points as the distance measure.
Euclidean Distance
In mathematics, the Euclidean distance or Euclidean metric is the standard (straight-line) distance between two points in Euclidean space. Using this distance makes Euclidean space a metric space. The associated norm is called the Euclidean norm. Earlier literature called it the Pythagorean distance.
Euclidean distance (euclidean metric) is a commonly used distance definition, referring to the true distance between two points in an m-dimensional space, or the natural length of a vector (i.e., the distance from the origin). In 2D and 3D space, the Euclidean distance is simply the actual distance between two points.
from scipy.spatial.distance import euclidean
p1 = (1, 0)p2 = (10, 2)
res = euclidean(p1, p2)
print(res)Manhattan Distance
The Manhattan distance, coined by Hermann Minkowski in the 19th century, is a term in geometry used in metric spaces to denote the sum of the absolute distances along each axis between two points in a standard coordinate system.
Manhattan distance can only move in the four cardinal directions (up, down, left, right); the distance between two points using Manhattan distance is the shortest path under those constraints.
Manhattan and Euclidean distances: the red, blue, and yellow lines all have the same length (12) for Manhattan distance, while the green line shows the Euclidean distance is 6×√2 ≈ 8.48.

Cosine Distance
Cosine distance, also known as cosine similarity, measures how similar two vectors are by the cosine of the angle between them.
0 degrees has a cosine value of 1; for any other angle, the cosine value is not greater than 1 and minimum is -1.
# Compute the cosine distance between A and B:from scipy.spatial.distance import cosine
p1 = (1, 0)p2 = (10, 2)
res = cosine(p1, p2)
print(res)Hamming Distance
In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it counts the number of substitutions required to transform one string into another.
Hamming weight is the Hamming distance of a string relative to a zero string of the same length; that is, the number of nonzero elements in the string: for binary strings, the number of 1s, so the Hamming weight of 11101 is 4.
- The Hamming distance between 1011101 and 1001001 is 2.
- The Hamming distance between 2143896 and 2233796 is 3.
- The Hamming distance between “toned” and “roses” is 3.
# Compute the Hamming distance between two points:from scipy.spatial.distance import hamming
p1 = (True, False, True)p2 = (False, True, True)
res = hamming(p1, p2)
print(res)SciPy MATLAB Arrays
NumPy provides a Python-readable format for saving data.
SciPy provides MATLAB interoperability.
SciPy’s scipy.io module provides many functions to work with MATLAB arrays.
Export data to MATLAB format
The savemat() method can export data in MATLAB format. The method has these parameters:
- filename - the name of the file to save the data.
- mdict - dictionary containing the data.
- do_compression - boolean indicating whether to compress the resulting data. Default is False.
# Export the array as a variable "vec" to a mat file:from scipy import ioimport numpy as np
arr = np.arange(10)
io.savemat('arr.mat', {"vec": arr})Note: The above code will save a file named “arr.mat” on your computer.
Import MATLAB format data
The loadmat() method can import MATLAB format data.
This method has the following parameters:
- filename - the file name to load.
Return a structured array whose keys are variable names and whose values are the corresponding variable values.
# Import from a mat file:from scipy import ioimport numpy as np
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9,])
# Exportio.savemat('arr.mat', {"vec": arr})
# Importmydata = io.loadmat('arr.mat')
print(mydata)
# Display only the MATLAB array with the variable name "vec":print(mydata['vec'])From the result, you can see the array was originally one-dimensional, but when extracted it gains an extra dimension and becomes a two-dimensional array.
To resolve this, you can pass an extra parameter squeeze_me=True:
from scipy import ioimport numpy as np
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9,])
# Exportio.savemat('arr.mat', {"vec": arr})
# Importmydata = io.loadmat('arr.mat', squeeze_me=True)
print(mydata['vec'])SciPy Interpolation
What is interpolation?
In numerical analysis, interpolation is a method or process for estimating new data points within the range of a discrete set of known data points.
In simple terms, interpolation is a method for generating points between given points.
For example, for two points 1 and 2, we can interpolate to obtain points 1.33 and 1.66.
Interpolation has many applications; in machine learning, we often deal with missing data, and interpolation can be used to fill in these values.
This filling approach is called imputation.
Besides imputation, interpolation is frequently used wherever we need to smooth discrete points in a data set.
How to implement interpolation in SciPy?
SciPy provides the scipy.interpolate module to handle interpolation.
One-dimensional interpolation
Interpolation for one-dimensional data can be performed with the interp1d() method.
The method takes two inputs, x and y.
The return value is a callable function that you can call with new x values to obtain the corresponding y, i.e., y = f(x).
# Interpolate given xs and ys from 2.1, 2.2... to 2.9:from scipy.interpolate import interp1dimport numpy as np
xs = np.arange(10)ys = 2*xs + 1
interp_func = interp1d(xs, ys)
newarr = interp_func(np.arange(2.1, 3, 0.1))
print(newarr)Univariate Interpolation
In one-dimensional interpolation, the points are fitted to a single curve, whereas in spline interpolation, the points are fitted to functions defined by piecewise polynomials.
Univariate interpolation uses the UnivariateSpline() function, which takes xs and ys and returns a callable function that can be called with new xs.
A piecewise function is a function that has different analytic expressions over different ranges of the independent variable x.
# Find the univariate spline interpolation for nonlinear points at 2.1, 2.2...2.9:from scipy.interpolate import UnivariateSplineimport numpy as np
xs = np.arange(10)ys = xs**2 + np.sin(xs) + 1
interp_func = UnivariateSpline(xs, ys)
newarr = interp_func(np.arange(2.1, 3, 0.1))
print(newarr)Radial Basis Function Interpolation
Radial basis functions are functions defined with respect to fixed reference points.
In surface interpolation we typically use radial basis function interpolation.
# The Rbf() function accepts xs and ys as arguments and returns a callable functionfrom scipy.interpolate import Rbfimport numpy as np
xs = np.arange(10)ys = xs**2 + np.sin(xs) + 1
interp_func = Rbf(xs, ys)
newarr = interp_func(np.arange(2.1, 3, 0.1))
print(newarr)SciPy Significance Testing
A significance test is a hypothesis test conducted by making an a priori assumption about the population (random variable) or its distribution, and then using sample information to determine whether this assumption (the alternative hypothesis) is reasonable; i.e., whether the true population deviates significantly from the null hypothesis. In other words, a significance test asks whether the difference between the sample and our assumption about the population is due to random variation or a real discrepancy between the assumption and the population.
Significance testing is used to determine whether there is a difference between experimental and control groups, or between two different treatments, and whether that difference is statistically significant.
SciPy provides the scipy.stats module to perform SciPy significance testing.
Statistical Hypotheses
A statistical hypothesis concerns the unknown distribution of one or more random variables. A statistical hypothesis that concerns only one or a few unknown parameters within a known distribution is called a parameter hypothesis. The process of testing a statistical hypothesis is called hypothesis testing; testing a parameter hypothesis is called a parameter test.
Null Hypothesis
The null hypothesis, a term in statistics, also called the original hypothesis, is the hypothesis that is stated before performing a statistical test. When the null hypothesis is true, the test statistic should follow a known probability distribution.
When the computed statistic falls into the rejection region, a rare event has occurred, and the null hypothesis should be rejected.
A hypothesis to be tested is usually denoted as H0 (the null hypothesis), while the alternative hypothesis is denoted as H1 (the alternative hypothesis).
- When the null hypothesis is true, deciding to reject it constitutes a Type I error; its probability is usually denoted α.
- When the null hypothesis is false, deciding not to reject it constitutes a Type II error; its probability is usually denoted β.
- α + β does not necessarily equal 1.
Typically, only the maximum probability of making a Type I error, α, is constrained; β is not considered. This kind of hypothesis testing is called significance testing, and α is the significance level.
Common α values are 0.01, 0.05, 0.10, etc. In general, depending on the research question, if the cost of making a wrong decision is high, you choose a smaller α to reduce such errors; otherwise, you may choose a larger α.
Alternative Hypothesis
The alternative hypothesis is one of the fundamental concepts in statistics; it includes any proposition about the population distribution that would render the null hypothesis invalid. It is also called the opposite hypothesis or alternative hypothesis.
The alternative hypothesis can replace the null hypothesis.
For example, in evaluating students, we might adopt:
- “Students are below average” — as the null hypothesis
- “Students are above average” — as the alternative hypothesis.
One-Sided Test
One-sided test, also known as one-tailed or one-sided test, in hypothesis testing, uses the area of the tail on one side of the density curve to construct the critical region for testing.
When our hypothesis tests only one side of the value, it is called a one-tailed test.
Example:
For the null hypothesis:
“Mean equals k”
We can have alternative hypotheses:
- “Mean less than k”
- “Mean greater than k”
Two-Sided Test
Two-sided test, also known as two-tailed or two-sided test, in hypothesis testing, uses the areas in both tails of the distribution to construct the critical region.
When our test concerns both sides of the mean:
Example:
For the null hypothesis:
“Mean equals k”
We can have alternative hypotheses:
“Mean not equal to k”
In this case, both sides (less than or greater than k) are checked.
Alpha Value
The alpha value is the significance level.
The significance level is the probability of committing an error when the population parameter falls within a certain interval, denoted by α.
Data must be sufficiently close to the extremes to reject the null hypothesis.
Usually 0.01, 0.05, or 0.1.
P-value
The P-value indicates how extreme the observed data are.
Compare the P-value with alpha to determine statistical significance.
If the p value <= alpha, we reject the null hypothesis and say the data are statistically significant; otherwise, we fail to reject the null.
T Test
The T-test is used to determine whether there is a significant difference between the means of two variables and whether they come from the same distribution.
This is a two-sided test.
The function ttest_ind() takes two samples of the same size and returns a tuple of the t-statistic and p-value.
# Find whether values v1 and v2 come from the same distribution:import numpy as npfrom scipy.stats import ttest_ind
v1 = np.random.normal(size=100)v2 = np.random.normal(size=100)
res = ttest_ind(v1, v2)
print(res)
# If you only want the p-valueres = ttest_ind(v1, v2).pvalueprint(res)KS Test
The KS test checks whether a given value conforms to a distribution.
The function takes two arguments: the test values and the CDF.
CDF stands for Cumulative Distribution Function, also called the distribution function.
CDF can be a string or a callable function that returns probabilities.
It can be used for one-sided or two-sided tests.
By default, it is a two-sided test. We can pass a string for the alternative as one of two-sided, less, or greater.
# Check whether a given value conforms to a normal distribution:import numpy as npfrom scipy.stats import kstest
v = np.random.normal(size=100)
res = kstest(v, 'norm')
print(res)Descriptive Statistics
Using describe() you can view information about an array, including:
- nobs — number of observations
- minmax — minimum and maximum
- mean — arithmetic mean
- variance — variance
- skewness — skewness
- kurtosis — kurtosis
# Display descriptive statistics for an array:import numpy as npfrom scipy.stats import describe
v = np.random.normal(size=100)res = describe(v)
print(res)Normality Test (Skewness and Kurtosis)
A normality test assesses whether observed data come from a normal distribution; it is an important special case of a goodness-of-fit test in statistics.
Normality tests are based on skewness and kurtosis.
The normaltest() function returns the p-value for the null hypothesis:
“x comes from a normal distribution”
Skewness
A measure of the symmetry of the data.
For a normal distribution, it is 0.
If negative, the data are skewed to the left.
If positive, the data are skewed to the right.
Kurtosis
A measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.
Positive kurtosis means heavy tails.
Negative kurtosis means light tails.
# Find the skewness and kurtosis of values in an array:import numpy as npfrom scipy.stats import skew, kurtosisfrom scipy.stats import normaltest
v = np.random.normal(size=100)
print(skew(v))print(kurtosis(v))
# Check whether the data come from a normal distribution:print(normaltest(v))If this article helped you, please share it with others!
Some information may be outdated





