NumPy for Materials Research: From Zero to Productive

2026-06-26·5 min read·Python Tutorials

Why Every Materials Scientist Needs NumPy

If you process experimental data, NumPy is non-negotiable. It turns raw .txt files from your tensile tester, AFM, or XRD into Python arrays you can analyze, plot, and export. This guide assumes you have Python installed and takes you from zero to productive with real lab data.


import numpy as np

1. The NumPy Array: Your Data Container

A NumPy array is a grid of numbers — like a column in Excel, but faster and more powerful:


# From a Python list
strains = np.array([0.0, 0.005, 0.010, 0.020, 0.050, 0.100])
stresses = np.array([0, 8, 15, 28, 52, 74])  # MPa

# Convenience functions
x = np.linspace(0, 0.5, 200)       # 200 points from 0 to 0.5
zeros = np.zeros((10, 3))          # 10x3 matrix of zeros
ones = np.ones(50)                 # 50-element vector of ones
empty = np.empty(100)              # Uninitialized (fast, but fill before use)

Array Properties


arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)    # (2, 3) — 2 rows, 3 columns
print(arr.ndim)     # 2 — 2-dimensional
print(arr.dtype)    # int64 — data type
print(arr.size)     # 6 — total elements

2. Loading Experimental Data

Most lab instruments export tabular data. NumPy handles all common formats:


# Generic: columns separated by spaces or tabs
data = np.loadtxt("tensile_test.txt", skiprows=0)

# CSV with headers
data = np.loadtxt("mechanical_data.csv", delimiter=",", skiprows=1)

# AFM force curve (Bruker: 320-line header)
raw = np.loadtxt("force_curve.001", skiprows=320)

# Column extraction
z_piezo = raw[:, 0]        # First column
deflection = raw[:, 1]     # Second column

# Multiple columns at once
time, force, displacement = data[:, 0], data[:, 1], data[:, 2]

# Genfromtext: handles missing values
data_with_nans = np.genfromtxt("messy_data.csv", delimiter=",", skip_header=1, filling_values=0)

3. Element-Wise Operations (No Loops)

NumPy's real power: operations apply to every element simultaneously. No for loops needed:


# Arithmetic
stress_pascals = stresses * 1e6          # MPa to Pa
strain_percent = strains * 100           # fraction to percent
engineering_stress = force / area        # F/A for every point

# Math functions
log_stress = np.log(stresses)
exp_decay = np.exp(-time / tau)
sqrt_strain = np.sqrt(strains)

# Comparison
yielded = stresses > 30                 # Boolean array: [False, False, False, False, True, True]

# Convert deflection to force (AFM)
k = 0.06             # Spring constant (N/m)
sensitivity = 45.0   # Deflection sensitivity (nm/V)
deflection_nm = deflection * sensitivity
force = k * deflection_nm * 1e-9 * 1e9     # Convert to nN

4. Statistics and Aggregation


# Basic statistics
mean_stress = np.mean(stresses)         # 29.5 MPa
std_stress = np.std(stresses)           # Standard deviation
median_stress = np.median(stresses)     # Median (robust to outliers)
min_stress = np.min(stresses)
max_stress = np.max(stresses)

# Cumulative
total_energy = np.sum(force)            # Sum
cumulative = np.cumsum(stresses)        # Running total

# Percentiles
q75 = np.percentile(stresses, 75)       # 75th percentile

# Axis operations (for 2D data)
column_means = data.mean(axis=0)        # Mean of each column
row_stds = data.std(axis=1)             # Std of each row

5. Boolean Indexing (Filter Data)

This is probably the NumPy feature you will use most:


# Find all points in the elastic region (strain < 2%)
elastic_mask = strains < 0.02
elastic_stresses = stresses[elastic_mask]

# Combined conditions
valid = (strains > 0) & (strains < 0.1) & (stresses > 0)
valid_data = stresses[valid]

# Count how many points satisfy a condition
n_yielded = np.sum(stresses > 30)       # Count of True values

# Replace values
stresses[stresses < 0] = 0              # No negative stresses
force[force > 100] = 100                # Cap at 100 nN

# Find indices
idx_max = np.argmax(stresses)           # Index of maximum
idx_where = np.where(stresses > 30)[0]  # Indices where condition is True

6. Reshaping and Combining Arrays


# Reshape
flat = data.flatten()                   # Any shape -> 1D
matrix = np.arange(12).reshape(3, 4)    # 1D -> 3x4 matrix

# Stacking
combined = np.column_stack([strains, stresses])  # Side by side
stacked = np.vstack([data1, data2])              # Row-wise append

# Transpose
transposed = matrix.T

# Broadcasting: operations between arrays of different shapes
centered = data - data.mean(axis=0)     # Subtract column means
normalized = data / data.max(axis=0)    # Normalize each column

7. Saving Results


# Simple text export
np.savetxt("results.csv", combined, delimiter=",", header="strain,stress", comments="")

# Binary format (fast, compact)
np.save("processed_data.npy", data)
loaded = np.load("processed_data.npy")

# Multiple arrays
np.savez("experiment.npz", strain=strains, stress=stresses, time=time)
archive = np.load("experiment.npz")
print(archive["strain"])

8. End-to-End Example: AFM Force Curve Processing

Here is a complete pipeline from raw data to mechanical properties:


import numpy as np

# Load
raw = np.loadtxt("force_curve_001.txt", skiprows=320)
z_piezo, deflection_V = raw[:, 0], raw[:, 1]

# Convert to force
k, sens = 0.06, 45.0
deflection_nm = deflection_V * sens
force = k * deflection_nm * 1e-9 * 1e9  # nN

# Baseline correction (pre-contact region)
baseline_idx = 100  # First 100 points are pre-contact
baseline = np.median(force[:baseline_idx])
force_corrected = force - baseline

# Find contact point
force_derivative = np.gradient(force_corrected)
noise = np.std(force_derivative[:50])
contact_idx = np.argmax(force_derivative > 5 * noise)

# Indentation
indentation = z_piezo[contact_idx] - z_piezo
contact_force = force_corrected - force_corrected[contact_idx]

# Fit elastic region
fit_mask = (indentation > 0) & (indentation < 50)
E, _ = np.polyfit(indentation[fit_mask], contact_force[fit_mask], 1)

print(f"Young's modulus: {E:.1f} nN/nm (relative)")

9. Common Pitfalls

Problem	Solution
---------	----------
`np.loadtxt` fails on header lines	Use `skiprows=N` to skip header
Missing values cause errors	Use `np.genfromtxt` with `filling_values`
Wrong shape from slicing	Check with `.shape` — `data[:, 0]` is 1D, `data[:, :1]` is 2D
Operator `*` does element-wise	For matrix multiply, use `@` or `np.dot`
NaN from division by zero	Use `np.divide(a, b, where=b!=0)`

10. When to Use What

Task	NumPy Function
------	---------------
Load tabular data	`loadtxt`, `genfromtxt`
Linear fit	`polyfit(x, y, 1)`
Statistics	`mean`, `std`, `median`, `percentile`
Filter data	Boolean indexing `arr[mask]`
Find peaks	In SciPy: `scipy.signal.find_peaks`
FFT (frequency analysis)	`np.fft.fft`

NumPy is the foundation. Once you are comfortable here, SciPy adds optimization, curve fitting, signal processing — all building on NumPy arrays.

References

Harris, C.R. et al. (2020). Array programming with NumPy. Nature, 585, 357-362.
NumPy documentation: numpy.org/doc
VanderPlas, J. (2016). Python Data Science Handbook. O'Reilly Media.