有理近似による数値精度と命令レベル並列性のバランス
このページでは、X86 拡張倍精度(Extended-precision / 80ビット)演算をターゲットとした ANSI C(C89)によるアークタンジェント(arc tangent ; 逆正接)関数の実装を紹介します。 初等関数では多項式近似が一般的に用いられますが、 この実装ではパデ近似(Padé approximation ; 有理近似)を採用することで、 数値精度と最新のスーパースケーラ(superscalar)プロセッサの効率的な利用を両立させています。 分子と分母を独立に評価することで、中核となる近似は命令レベル並列処理(ILP, Instruction-Level Parallelism)や、 現代の CPU で利用可能な積和演算(FMA, fused multiply–add)パイプラインに適しています。 主要区間における理論誤差(絶対値)は 6.5×10-23 未満に抑えられていて、 アークタンジェントの恒等式に基づく引数削減と組み合わせることで、 この実装は値域全体にわたって高い精度を実現しています。 この設計は、移植性、予測可能性、アーキテクチャへの対応を重視していて、 高品質な libm 実装やパフォーマンス重視の数値ソフトウェアに適しています。
_atanl — arc tangent function in C89 (ANSI C)
/* _atanl -- arc tangent function in C89 (ANSI-C), X86 Extended-precision (80-bit)
* _atanl Rev.1.21 (Jan. 22, 2026) (c) 2007-2026 Takayuki HOSODA
* SPDX-License-Identifier: BSD-3-Clause
*/
#include <math.h>
#include <float.h>
#define Q_SQRT2 1.41421356237309504880168872421L
#define Q_PI_2 1.57079632679489661923132169164L
#define Q_PI_4 0.78539816339744830961566084582L
#ifndef LDBL_EPSILON
#define LDBL_EPSILON 1.0842021724855044340E-19L
#endif
long double _atanl(long double x);
/* _atanl(x) computes the principal value of the arc tangent of x. */
long double _atanl(long double x) {
long double q, s, v, w;
long double offset = 0;
if (x == 0) return x;
if (x > (2 / LDBL_EPSILON)) return Q_PI_2;
if (x < -(2 / LDBL_EPSILON)) return -Q_PI_2;
if ((v = fabs(x)) >= 1) {
if (v > Q_SQRT2 + 1.0L) {
offset = -Q_PI_2; v = 1 / v;
} else {
offset = Q_PI_4; v = (v - 1) / (1 + v);
}
} else if (v > Q_SQRT2 - 1.0L) {
offset = -Q_PI_4; v = (1 - v) / (1 + v);
} /* Argument reduction into four regions using atan identities */
s = v * v;
w = 29360128.L; q = 27923620725.L;
w = w * s + 23930643317.L; q = q * s + 742768311285.L;
w = w * s + 501165956970.L; q = q * s + 6684914801565.L;
w = w * s + 3525222266475.L; q = q * s + 28472785265925.L;
w = w * s + 11362119869100.L; q = q * s + 64710875604375.L;
w = w * s + 18420316154955.L; q = q * s + 80639706522375.L;
w = w * s + 14615037006570.L; q = q * s + 51967810869975.L;
w = w * s + 4512611027925.L; q = q * s + 13537833083775.L;
w = w * s; /* (14,14) Pade approximation for atan(x) / x - 1 */
v -= w / q * v; /* atan(x) = x - (atan(x) / x - 1) * x */
s = v + offset; /* |err| < 1.43e-21, x in [0, sqrt(2) - 1] */
return (offset * x >= 0) ? (offset != 0 || x >= 0) ? s : -v: -s;
}
In an empirical evaluation conducted using GCC 7.5, the custom _atanl function was tested on
65536 (= 216) uniformly spaced points per unit interval for x ∈ [0, 8]
in long double-precision values.
The relative error, computed as _atanl(x) / atanref(x) - 1 was found to be within
± 1.0842021724855044340E-19L,
approximately the machine epsilon for IEEE 754 long double precision.
Where atanref(x) denotes a rounded high-precision reference (≈ 40 digits),
not a libm result.
For |x| > 8, argument reduction ensures that approximation errors rapidly diminish.
_atanl(x), shown as _atanl(x) / atanref(x) - 1
The theoretical approximation error of the (14,14) Padé approximation attains its maximum value of approximately -6.46×10-23 at x = √2 - 1.
The plot is calculated in a high-precision (≈40 digits) environment.
_atanl(x)
_atan is a double-precision (64-bit) implementation of the arc-tangent function,
derived from _atanl, the extended-precision version.
To reduce the approximation error near the upper limit
—where the error tends to be large in plain Padé approximations—
some coefficients for higher-order terms are adjusted in a minimax-like manner.
This makes it possible to use an approximation of order (8,10),
instead of the (10,10) order required by a plain Padé approximation for double precision.
With a (10,10) plain Padé approximation, the maximum error is |error| < 3.86×10-17. In contrast, the (8,10) modified Padé approximation achieves a slightly better bound of |error| < 3.14×10-17 over the interval x ∈ [0, √2 - 1].
_atan — arc tangent function in C89 (ANSI-C)
/* _atan -- arc tangent function in C89 (ANSI-C), IEEE 754 double-precision
* _atan Rev.1.21 (Jan. 22, 2026) (c) 2007-2026 Takayuki HOSODA
* SPDX-License-Identifier: BSD-3-Clause
*/
#include <math.h>
#include <float.h>
#ifndef M_SQRT2
#define M_SQRT2 1.41421356237309504880168872421
#endif
#ifndef M_PI_2
#define M_PI_2 1.57079632679489661923132169164
#endif
#ifndef M_PI_4
#define M_PI_4 0.78539816339744830961566084582
#endif
#ifndef DBL_EPSILON
#define DBL_EPSILON 2.2204460492503131E-16
#endif
double _atan(double x);
/* _atan(x) computes the principal value of the arc tangent of x. */
double _atan(double x) {
double q, s, v, w;
double offset = 0;
if (x == 0) return x;
if (x > (2 / DBL_EPSILON)) return M_PI_2;
if (x < -(2 / DBL_EPSILON)) return -M_PI_2;
if ((v = fabs(x)) >= 1) {
if (v > M_SQRT2 + 1) {
offset = -M_PI_2; v = 1 / v;
} else {
offset = M_PI_4; v = (v - 1) / (1 + v);
}
} else if (v >= M_SQRT2 - 1) {
offset = -M_PI_4; v = (1 - v) / (1 + v);
} /* Argument reduction into four regions using atan identities */
s = v * v;
q = 2401244.6571; w = 2073564.8997;
q = q * s + 52026974.7295; w = w * s + 32801340.0004;
q = q * s + 312161850.; w = w * s + 136306170.;
q = q * s + 758107350.; w = w * s + 205633428.;
q = q * s + 800224425.; w = w * s + 101846745.;
q = q * s + 305540235.; w = w * s;
/* (8,10) modified Pade approximation for atan(x) / x - 1 */
v -= w / q * v; /* atan(x) = x - (atan(x) / x - 1) * x */
s = v + offset; /* |err| < 3.2e-17, x in [0, sqrt(2) - 1] */
return (offset * x >= 0) ? (offset != 0 || x >= 0) ? s : -v : -s;
}
In an empirical evaluation conducted using GCC 7.5, the custom _atan function was tested on
65536 (= 216) uniformly spaced points per unit interval for x ∈ [0, 8]
in double-precision values.
The relative error, computed as _atan(x) / atanref(x) - 1 was found to be within
±2.2204460492503131E-16,
approximately the machine epsilon for IEEE 754 double precision.
Where atanref(x) denotes a rounded high-precision reference (≈ 40 digits),
not a libm result.
For |x| > 8, argument reduction ensures that approximation errors rapidly diminish.
_atan(x), shown as _atan(x) / atan(x) - 1,
where atan(x) is a rounded high-precision reference (≈40 digits), not a libm result.
The theoretical approximation error of the (8,10) modified Padé approximation attains its maximum value of approximately -3.14×10-17 near x ≈ 0.4.
The plot is calculated in a high-precision (≈40 digits) environment.
_atan(x)
長年にわたり、初期の 8 ビット CPU から現代の IEEE 754 システムに至るまでのハードウェアの進化を背景として、 テイラー近似、連分数近似、有理近似といったさまざまな手法が研究されてきました。
Although not treated in the present implementation, the derivation of the inverse tangent function can also be formulated via elliptic functions and their inverse mappings. The following references 3–8 are cited for completeness.



© 2000 Takayuki HOSODA.