# stats.awk

## Overview
  **stats.awk** reads numeric values from standard input or files,
  computes basic descriptive statistics (mean, variance, standard deviation,
  min/max), and performs one-pass linear regression when x/y data are available.
  It is intended for use in Unix pipelines on FreeBSD and Linux systems.
  All computations are performed in a single pass over the input stream.
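
A single-pass mean and variance can be computed without storing the samples. The following is a minimal sketch using Welford's update, which is one common way to do this and not necessarily what stats.awk does internally. The function name `onepass_stats` is hypothetical, and the sketch assumes one numeric value in column 1 of every non-empty line.

```sh
# Hypothetical helper (not part of stats.awk): one-pass mean and
# Bessel-corrected variance of column 1 via Welford's update.
onepass_stats() {
  awk '
    NF {                          # skip empty lines
      n++
      d = $1 - mean               # deviation from the previous mean
      mean += d / n               # running mean
      m2 += d * ($1 - mean)       # running sum of squared deviations
    }
    END {
      if (n > 1) {
        var = m2 / (n - 1)        # unbiased (n - 1) variance
        printf "n=%d mean=%g var=%g sd=%g\n", n, mean, var, sqrt(var)
      }
    }
  ' "$@"
}
```

For example, `printf '%s\n' 1 2 3 4 5 | onepass_stats` prints `n=5 mean=3 var=2.5 sd=1.58114`.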

  When processing single-column input with no `xcol` or `ycol` specified,
  stats.awk automatically performs linear regression using the record
  index as x and the column value as y. Descriptive statistics are
  always computed from the y values.
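
The index-as-x regression described above can be illustrated with running sums only. This is a rough sketch, not the script's own code: the function name `index_regression` is hypothetical, and it assumes the record index starts at 1 and counts non-empty lines, which may differ from how stats.awk numbers its records.

```sh
# Hypothetical helper (not part of stats.awk): least-squares fit of
# y = a + b * x, where x is the 1-based index of each non-empty record.
index_regression() {
  awk '
    NF {
      n++
      x = n; y = $1
      sx += x; sy += y                  # running sums of x and y
      sxx += x * x; sxy += x * y        # running sums of x*x and x*y
    }
    END {
      den = n * sxx - sx * sx
      if (n > 1 && den != 0) {
        b = (n * sxy - sx * sy) / den   # slope
        a = (sy - b * sx) / n           # intercept
        printf "a=%g b=%g\n", a, b
      }
    }
  ' "$@"
}
```

For example, `printf '%s\n' 3 5 7 9 | index_regression` prints `a=1 b=2`, since the values follow y = 1 + 2x for x = 1..4.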
    
## Features
- One-pass calculation of:
  - Count (number of samples)
  - Arithmetic Mean
  - Unbiased variance (Bessel corrected)
  - Unbiased standard deviation
  - Maximum / 2nd maximum
  - Minimum / 2nd minimum
- Optional one-pass **linear regression** (y = a + b * x)
- Text, CSV, TSV or JSON output
- Customizable numeric precision and format
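
Tracking the largest and smallest two values also fits in a single pass. The sketch below shows one possible approach. The function name `top_two` is hypothetical, and it assumes that duplicate samples count separately, so the 2nd maximum may equal the maximum; stats.awk may define these differently.

```sh
# Hypothetical helper (not part of stats.awk): one-pass max, 2nd max,
# min and 2nd min of column 1. Duplicates are treated as separate samples.
top_two() {
  awk '
    NF {
      v = $1 + 0
      n++
      if (n == 1) { max1 = v; min1 = v; next }
      if (n == 2) {               # second sample fixes the initial order
        if (v > max1) { max2 = max1; max1 = v } else { max2 = v }
        if (v < min1) { min2 = min1; min1 = v } else { min2 = v }
        next
      }
      if (v > max1) { max2 = max1; max1 = v } else if (v > max2) { max2 = v }
      if (v < min1) { min2 = min1; min1 = v } else if (v < min2) { min2 = v }
    }
    END {
      if (n >= 2)
        printf "max=%g max2=%g min=%g min2=%g\n", max1, max2, min1, min2
    }
  ' "$@"
}
```

For example, `printf '%s\n' 3 1 4 1 5 9 2 6 | top_two` prints `max=9 max2=6 min=1 min2=1`.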

## Requirements
- AWK implementation with:
  - `sqrt()`
  - `sprintf()`
  - `tolower()`
- Tested on: gawk, nawk, FreeBSD awk

## Installation

### Using Makefile
```sh
make install           # Installs script and man page to /usr/local
make uninstall         # Removes installed files
```
You can override the prefix:
```sh
make PREFIX=$HOME/.local install
```

### Manual installation
```sh
chmod +x stats.awk
cp stats.awk /usr/local/bin/
cp stats.awk.1 /usr/local/man/man1/
```

## Usage
```sh
awk -f stats.awk [options] [file ...]
```

## Examples
- Basic stats from column 1:
  ```sh
  seq 1 100 | awk -f stats.awk
  ```
- Linear regression on columns 1 and 2:
  ```sh
  awk -f stats.awk -v xcol=1 -v ycol=2 data.txt
  ```
- CSV output:
  ```sh
  awk -f stats.awk -v xcol=1 -v ycol=2 -v out=csv data.txt
  ```
- JSON output for downstream processing:
  ```sh
  awk -f stats.awk -v xcol=1 -v ycol=2 -v out=json data.txt | jq .
  ```

## Hash values
- The archive contains the files `stats.awk.sha256` and `stats.awk.md5`,
  which hold the SHA-256 and MD5 hash values for stats.awk, respectively.

  - SHA256 (stats.awk) = f8f857f3b99fa37aa3080fe16a8820e710899a0644cba685dc3d07c8465e44a6
  - MD5 (stats.awk) = 66aab1b2c58c98c50e9a6bae8ed39d72

- Users can verify the distributed **stats.awk** using:
  - `sha256 stats.awk` or `md5 stats.awk` on FreeBSD
  - `sha256sum stats.awk` or `md5sum stats.awk` on Linux (GNU coreutils)
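
The hash tools differ between systems, so a small wrapper can make the check portable. The following is a sketch only: the function names `sha256_of` and `verify_sha256` are hypothetical, and it assumes either GNU `sha256sum` or the BSD `sha256` utility (with its `-q` option) is on the `PATH`.

```sh
# Hypothetical helpers (not shipped with stats.awk).

# Print only the SHA-256 digest of a file, using whichever tool exists.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'   # GNU coreutils: "digest  file"
  elif command -v sha256 >/dev/null 2>&1; then
    sha256 -q "$1"                        # BSD: -q prints the digest only
  else
    echo "no SHA-256 tool found" >&2
    return 1
  fi
}

# Compare a file's digest with an expected value; returns 0 on a match.
verify_sha256() {
  if [ "$(sha256_of "$1")" = "$2" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1" >&2
    return 1
  fi
}
```

With these defined, `verify_sha256 stats.awk f8f857f3b99fa37aa3080fe16a8820e710899a0644cba685dc3d07c8465e44a6` reports whether the local copy matches the published SHA256 value.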

## License
SPDX-License-Identifier: BSD-3-Clause
(c) 2026, Takayuki HOSODA

## Author
Takayuki HOSODA
http://www.finetune.co.jp/~lyuka/technote/tools/stats/
