InvMem

InvMem is a collection of short notes on mathematics and programming.

1.	Kepler-style coincidences	#Math
2.	Note on means	#Math
3.	Generalized Motzkin numbers	#Math
4.	Scale derivative	#Math
5.	Generating generalized Gaussian, Cauchy and related random variables	#Math #Algorithms
6.	Triplane hexagonal coordinates	#Math #Programming #BoardGames
7.	Short brainfuck programs	#Programming
8.	Lattice distance distortion	#Math
9.	Fourier-based area in polar coordinates	#Math
10.	On Weierstrass and Riemann functions	#Math
11.	Pawn protection of initial chess positions	#Math #Chess
12.	Cantor's numbers	#Math
13.	Fraction digit tables	#Math
14.	Multidimensional scaling for vowel-consonant separation	#Algorithms
15.	Vectors with constant and normal components	#Math
16.	Embedding hexagonal grid in a square grid	#Programming #BoardGames
17.	Irrationality plots	#Math
18.	Maximal chess positions	#Chess #Optimization
19.	Weighted minimum and maximum	#Math
20.	Maximal density subsquare-free arrangements	#Optimization #OpenProblem
21.	Sierpinsky-like arrays	#Math
22.	Peaceable queens	#Math #Chess #Optimization
23.	RGB Principal component analysis	#Math #Color
24.	Lightness approximations	#Color #Approximation
25.	Feature scaling	#Math
26.	Mixed radix fractal carpets	#Math
27.	Random strings with identical element counts	#Math
28.	Irrational scoring	#Math
29.	Correlation for vector-valued data	#Math

N1 / Kepler-style coincidences / #Math / 2014.09.23

Take a Kepler triangle - a right triangle with sides 1, √φ, φ. Then the length of its circumscribed circle nearly matches the perimeter of a square with a side equal to its middle edge. This yields an approximation τ ≈ 8 / √φ. Here are some coincidences having a similar form:

Constant	Approximation	Relative Error
τ	12 / 5 φ²	1 / 65270
τ	9 / 2 ³√e	1 / 2144
τ	(11 / 7)³ φ	1 / 1409
τ	8 / √φ	1 / 1043
e	³√10 / 7 S	1 / 3770
e	7 / 4 √S	1 / 3305
e	5 / Ln τ	1 / 1210
e	9 / 8 S	1 / 1186
γ	1 / √3	1 / 4289
P	√8 / 11 S	1 / 3841
P	9 / 11 φ	1 / 1519

Notation:
τ	Circle constant
e	Base of the natural logarithm
φ	Golden ratio
S	Silver ratio
P	Plastic number
γ	Euler-Mascheroni constant

N2 / Note on means / #Math / 2015.02.10

Consider a family of means of two numbers a, b:

M(a, b, p) = ∫_a^bx^{p + 1} dx / ∫_a^bx^p dx

In other words, M(a, b, p) is the mean of a distribution with density ∝ x^p in the interval [a, b]. Many well known means belong to this family:

p	Expression	Name
1	(2 / 3) (a + b - a b / (a + b))	???
0	(a + b) / 2	Arithmetic
-1/2	(a + √a b + b) / 3	Heronian
-1	(b - a) / (Ln b - Ln a)	Logarithmic
-3/2	√a b	Geometric
-2	a b (Ln b - Ln a) / (b - a)	???
-3	2 a b / (a + b)	Harmonic

N3 / Generalized Motzkin numbers / #Math / 2015.02.16

Number of strings of length N consisting of symbols from the alphabet of size b and correctly placed brackets is

⸤N / 2⸥
∑	(N \| 2 i) b^{N - 2 i} C_i ,
i = 0

where (N | 2 i) is a binomial coefficient and C_i is the ith Catalan number. This result is easy to interpret: C_i is the number of possible configurations of 2 i brackets, (N | 2 i) is the number of ways to place a bracket configuration among the N symbols, and b^{N - 2 i} is the number of possible configurations of the non-bracket symbols.

b	OEIS	First terms starting with N = 0
0	A126120	1, 0, 2, 0, 5, 0, 14, 0, 42, ...	Catalan numbers interleaved with 0s
1	A001006	1, 1, 2, 4, 9, 21, 51, 127, 323, ...	Motzkin numbers
2	A000108	1, 2, 5, 14, 42, 132, 429, 1430, 4862, ...	Catalan numbers
3	A002212	1, 3, 10, 36, 137, 543, 2219, 9285, 39587, ...
4	A005572	1, 4, 17, 76, 354, 1704, 8421, 42508, 218318, ...
5	A182401	1, 5, 26, 140, 777, 4425, 25755, 152675, 919139, ...
6	A025230	1, 6, 37, 234, 1514, 9996, 67181, 458562, 3172478, ...

N4 / Scale derivative / #Math / 2015.02.28

Derivative of f(x) is defined as the limit (f(x + h) - f(x)) / h, h → 0. By replacing the shift f + h with scaling x / s, we obtain the scale derivative: g(x) = Limit (f(x / s) - f(x)) / Ln s, s → 1. In the logarithmic x scale, scaling is equivalent to shifting and scale derivative to the normal derivative: t = Ln x; q(t) = f(x); g(x) = Limit (q(t - Ln s) - q(t)) / Ln s, s → 1 = -dq(t) / dt = -df(x) / d Ln x = -x f'(x). It is assumed that x > 0. For x < 0, same results apply in a mirrored form.

g(x) to f(x) is what a gnomon is to the figurate numbers. Infinitesimally larger version of f(x) is obtained by adding g(x) to it: f(x / s) ≈ f(x) + g(x) Ln s. We can continuously "grow" f(x) in size by adding infinitesimal amounts of scaled g(x). Under the condition f(∞) = 0, the original function can be reconstructed as a superposition of g(x) with exponentially increasing scales: f(x) = ∫_-∞⁰ g(x / e^z) dz, z = Ln s. This is easily proven by using the identity g(x) = -dq(t) / dt. If the upper integration limit is different, f(x) is obtained at the corresponding scale.

Alternatively, f(x) can be seen as being reconstructed from area-normalized g(x) at scales from 0 to 1: f(x) = ∫₀¹g(x / s) / s ds. We can "grow" the distribution with PDF equivalent to f(x) by sampling from PDF ∝ g(x / s) starting with s = 0 and increasing it linearly. At any time, distribution of all the values generated so far will match f(x) up to scaling. This trick will only work if f(x) is non-increasing. Otherwise, g(x) will contain negative values and cannot be a proper PDF.

N5 / Generating generalized Gaussian, Cauchy and related random variables / #Math #Algorithms/ 2015.03.31

Consider a generalized Gaussian (PDF ∝ e^{-|x|^p}) and generalized Cauchy (PDF ∝ 1 / (1 + |x|^p)) random variables. An easier to sample bounding functions are required to apply the rejection sampling technique. It is easy to verify that both distributions are log-log convex upwards. Therefore they can be bounded by tangent lines in the log-log scale - that is, by power functions. Piecewise power law PDFs are easily handled by inversion sampling of their pieces with corresponding probabilities. It suffices to sample the x > 0 half of the distribution and symmetrize it by multiplying the results by ±1 at random.

Let's bound f(x) ∝ PDF with h(x) consisting of two segments: h(x) = 1 if x < x₁, A x^-q otherwise. The segments match at x₁, so x₁ = A^{1 / q}. Area below h(x) is S = A^{1 / q} (1 + 1 / (q - 1)). The efficiency (acceptance rate) is ∫₀^+∞ f(x) dx / S. Let's denote the point where h(x) touches f(x) as x_e. Then A = f(x_e) x_e^q and q is the log-log slope of f(x) at x_e. Expressing A through q and solving dS / dq = 0 yields the optimal q. This process leads to a transcendental equation for the generalized Cauchy density: q / (1 - q) = Ln(1 - q / p). A slightly less efficient function that avoids the problem is obtained by assuming q = p (tangent at +∞).

Name	f(x)	q	A	Efficiency	Min efficiency
Generalized Gaussian	e^{-\|x\|^p}	p + 1	((q / p) / e)^{q / p}	(p^-1! e^{1 / p}) / (1 + 1 / p)^{1 + 1 / p}	0 at p → 0
Generalized Cauchy	1 / (1 + \|x\|^p)	p	1	τ (p - 1) / (2 p² Sin(τ / (2 p)))	τ / 8 at p = 2

Scale derivatives of the examined functions can be enclosed with three segments: x^p if x < x1, M if x < x2, A x^-q otherwise, where M = Max f(x). Scale derivative of Cauchy density can be effectively sampled by assuming q = p. In the Gaussian case, optimal q is a solution to the transcendental equation (1 / p + 1 / (1 - q)) q = Ln(1 + q / p). Approximate solution obtained with the aid of Newton's method is listed in the table below.

f(x)	M	q	A	Min efficiency
x^p e^{-\|x\|^p}	1 / e	Max((15 / 7) p + 13 / 19, 1 + p)	((1 + q / p) / e)^{1 + q / p}	0 at p → 0
x^p / (1 + \|x\|^p)²	1 / 4	p	1	2 / (1 + Ln 4) at p → +∞

N6 / Triplane hexagonal coordinates / #Math #Programming #Board_Games / 2015.04.06

Consider a hexagonal-shaped region of a hexagonal grid. A coordinate system can be formed by three semi-axes 120° apart starting from the center of the region.

The system is well suited for board games. Most hexagons are encoded with combinations of two symbols. Axis hexagons are encoded with one symbol. Compared to existing hexagonal chess variants, the system has a pleasant symmetry. Each symbol except 0 appears in equal number of hexes. Hexagonal array can be efficiently stored as three rectangular arrays overlapping at the axes. Conversion to the cubic (3D) coordinates is straightforward: triplane coordinates (n, a, α) are equal to the absolute values of the coordinates of 3D axes (x, y, z) lying in the corresponding sector: (n, a) = (-z, y), (a, α) = (-x, z), (α, n) = (-y, x).

N7 / Short brainfuck programs / #Programming / 2015.04.22

Following programs were generated with Ascension framework. Memory cells are assumed to be integers [0 .. 255], wrapping around.

Text output:

Program	Length	Output
+>-[>>+>+[++>-<<]-<+<+]>---.<<<<++.<<----..+++.>------.<<+.>.+++.------.>>-.<+.	79	Hello World!
+[-[<<[+[--->]-[<<<]]]>>>-]>-.---.>..>.<<<<-.<+.>>>>>.>.<<.<-.<<+.	66	hello world!

For comparison, shortest human-coded "Hello World!" program is 89 instructions long:
++++++++++[>>->++>+++>+++>----->+[++++++++<]<-]>>++.>+.>--..>+.>++.>---.<<.+++.<.<-.>>>+.

Calculating sequences in memory:

Program	Length	Output
-[+>+>--[<]>>]	14	Powers of 2
+[>>-->+[<]<+]	14	Powers of 2
>-[[<+>>>-<-<+]>]	17	Fibonacci sequence
>-[[>-->+<<<+>+]>--]	20	Squares

N8 / Lattice distance distortion / #Math / 2015.05.17

Take a regular lattice with each point connected to a few neighbouring points, thereby forming a graph. Let's only consider physically meaningful neighbour sets: a line connecting a lattice point to any of its neighbours should not intersect other Voronoi cells in more than one point. This requirement is stricter than the cell vertex connectivity. A neighbourhood can be denoted by the number of nearest lattice points satisfying the connectivity requirement. Graph edges can either have unit weights or have weights equal to the Euclidean distances between the points they connect (w suffix). Let {R} be a set of ratios of the shortest graph distances to the Euclidean distances between any two lattice points. Distance distortion is Max {R} / Min {R}. Distortions of various 2D and 3D lattices are summarized in the following tables. f, e, v denote face, edge and vertex cell connectivity.

Voronoi cell	Neighbors	Connectivity	Distortion	%
Triangle	6	ev	2	100
Triangle	3	e	3 / 2	50
Square	8	ev	√2	41
Square	4	e	√2	41
Triangle	6w	ev	2 / √3	15
Hexagon	6	e	2 / √3	15
Square	8w	ev	(1 + √2) / √5	8

Lattice	Voronoi cell	Neighborhood	Connectivity	Distortion	%
Simple cubic	Cube	26	fev	√3	73
Body-centered cubic	Truncated octahedron	8	f	√3	73
Simple cubic	Cube	6	f	√3	73
Simple cubic	Cube	18	fe	4 / √6	63
Body-centered cubic	Truncated octahedron	14	f	√2	41
Hexagonal close-packed	Trapezo-rhombic dodecahedron	12	f	√2	41
Face-centered cubic	Rhombic dodecahedron	12	f	√2	41
Simple cubic	Cube	18w	fe	(1 + √2) / √3	39
Body-centered cubic	Truncated octahedron	14w	f	√3 / 2	22
Simple cubic	Cube	26w	fev	(1 + √3) / √6	12

N9 / Fourier-based area in polar coordinates / #Math / 2015.05.23

Take a Fourier decomposition of a function defined on the [0, τ] interval: f(t) = a₀ + ∑ a_i Sin(i t + φ_i). Then the area of f plotted in polar coordinates is (τ / 2) (a₀² + ∑ a_i² / 2).

N10 / On Weierstrass and Riemann functions / #Math / 2015.06.15

Take the Fourier series:

f_s(t) = ∑ A_k Sin(F_k t)
f_c(t) = ∑ A_k Cos(F_k t)

Weierstrass and Riemann-style functions correspond to exponential and power-law forms of A_k and F_k.

Name	A_k	F_k	Log-log slope β	Dimension D	N terms
Weierstrass	a^-k	b^k	1 + 2 Ln \|a\| / Ln b	2 - Ln \|a\| / Ln b	-Ln ε / Ln \|a\|
Riemann	k^-a	k^b	1 + (2 a - 1) / b	2 - (a - 1 / 2) / b, b ≠ 1	ε^{2 / (1 - 2 a)} / 2

Numerical calculations can only rely on a limited number of terms. Suppose we want to achieve a relative inacuraccy ε. Then ε² is the ratio of the power of the truncated terms to the total power. Solving that equation gives the required number of terms.

There is a very simple non-rigorous way to estimate fractal dimensions of the functions in question. Power density around the k-th term is

P_k = A_k² / (F_{k + 1 / 2} - F_{k - 1 / 2}) ≈ A_k² / F'_k

In that sense both Weierstrass and Riemann functions have an approximately power-law distributed power density. A well-known result pertaining to the fractional Brownian motion states that if the power spectrum is proportional to F^-β, then the fractal dimension is D = (5 - β) / 2.

Weierstrass Sin and Cos functions, a = √2, b = 2, D = 3 / 2:

Riemann Sin and Cos functions, a = 3 / 2, b = 2, D = 3 / 2:

The sine and cosine versions can be viewed as the projections of a single complex-valued function F(t) = ∑ A_k Exp(i F_k t). If the frequency parameter is an integer, the function represents a closed curve on a complex plane.

N11 / Pawn protection of initial chess positions / #Math #Chess / 2015.08.01

Consider all possible arrangements of the chess pieces on the first rank. Assuming that the bishops occupy opposite-colored squares, there are 1440 reflection-distinct positions. 542 of these positions have every pawn protected at least once. Two maximally protected positions with only one pawn being protected once and all other pawns twice are RQBBRNKN and RKBBRNQN. Two minimally protected positions having three unprotected pawns are NNRQBKRB and NNRKBQRB.

The best overall position is apparently BQNRRNKB: the rooks are placed on the central files, the bishops on the main diagonals, each knight has four options for its first move, and only two pawns are protected just once.

N12 / Cantor's numbers / #Math / 2015.08.25

Recall Cantor's first uncountability proof. Start with an interval and a sequence of real numbers x_i. Find the first two numbers from x_i that lie inside the interval. These two numbers form the next interval. Repeat. In case of an infinite number of iterations, any number from the limit interval is not contained in x_i.

Take a sequence of rational numbers within the initial interval, sorted in order of increasing denominator, and in case of equal denominators in order of increasing numerator. What number does Cantor's bracketing procedure converges to?

Take two fractions A = an / ad, B = bn / bd, ad ≤ bd, satisfying the determinant condition |bn ad - an bd| = 1. The first number lying between A and B is the mediant of A and B: M = (an + bn) / (ad + bd). The next number is the mediant of A and M. We thus obtain a system of recurrent equations:

a_i = a_{i - 1} + b_{i - 1}
b_i = 2 a_{i - 1} + b_{i - 1}

Thanks to the mediant property, the determinant condition is preserved throughout the recurrence. Eliminating b_i gives a single recurrent equation a_{i + 1} = 2 a_i + a_{i - 1}, which has a solution a_i = α (1 + √2)ⁱ + β (1 - √2)ⁱ. Plugging in initial conditions gives α = (a₀ + b₀ / √2) / 2. The sought-after Cantor number is the limit:

Lim an_i / ad_i = αn / αd = (√2 an₀ + bn₀) / (√2 ad₀ + bd₀)

N13 / Fraction digit tables / #Math / 2015.11.25

Following images depict binary and ternary digits of irreducible fractions in the (0, 1) interval. Fractions are ordered first by denominators and then by numerators.

$Binary digits of fractions$ $Ternary digits of fractions$

Take the numbers formed by digits lying on the main diagonal. By Cantor's diagonal argument, a complement of the binary diagonal number is irrational.

Binary diagonal number:
Binary: 0.111000111001010110000110111110101011100111111011011010000100011 ... (OEIS A257341)
Decimal: 0.8890003549695230139346719690879129310538480210179543414543465370 ... (OEIS A257342)
Continued fraction: 0, 1, 8, 110, 1, 1, 1, 4, 1, 3, 33, 3, 4, 1, 26, 1, 37, 1, 23, 2, 117, 1, 1, 5, 1, 2, 20, 12, 1, 2, 1, 4, 1, ...

Ternary diagonal number:
Ternary: 0.1002211021120000111220000002000011111210202200100001202102121112 ...
Decimal: 0.3682120994629780785967399570406755660508922444001004151748993819 ...
Continued fraction: 0, 2, 1, 2, 1, 1, 12, 1, 2, 9, 2, 6, 5, 1, 2, 2, 7, 5, 1, 1, 2, 2, 2, 6, 7, 1, 41, 1, 4, 2, 2, 3, 1, ...

N14 / Multidimensional scaling for vowel-consonant separation / #Algorithms / 2015.12.04

Multidimensional scaling (MDS) can be effectively used for character-level text analysis. Denote the distribution of characters appearing at position δ relative to the character a as D(a, δ). Distance between characters a and b can be defined as

ρ(D(a, +1), D(b, +1)) + ρ(D(a, -1), D(b, -1)),

where ρ(Da, Db) is some measure of distance between the two distributions. MDS applied to the distance matrices obtained from texts in natural languages readily reveals clusters of vowels, consonants and punctuation marks. Following images depict the results for English, Russian and Voynich manuscript characters and the L1 metric ρ.

Multidimensional scaling applied to English characters

Multidimensional scaling applied to Russian characters

Multidimensional scaling applied to the Voynich manuscript characters

N15 / Vectors with constant and normal components / #Math / 2016.01.29

Take a 2D vector having one constant component c and the other component coming from a standard normal distribution. PDF of the distribution of the vector lengths is given by ρ(x) = 2 x Exp(-(x² - c²) / 2) / √τ (x² - c²).

The mean of the distribution is approximated by √c² + c / 6 + 4 / τ with a maximum relative error of 1%.

N16 / Embedding hexagonal grid in a square grid / #Programming #Board_Games / 2016.02.01

A subset of square grid locations (i, j) corresponding to a hexagonal grid is given by (i + 2 j) mod 4 = 0. Neighbours of a hex have offsets (0, ±2), (±2, ±1). 3D coordinates of a hex are

x = i / 2
y = (-i + 2 j) / 4
z = (-i - 2 j) / 4

This representation is easy to work with, but has poor memory efficiency, as only 1/4th of all locations are actually used. However, these extra locations can be used to store additional information, for example serve as an intermediate buffer in a cellular automata scenario.

N17 / Irrationality plots / #Math / 2016.02.01

Let x be a real number, a_i its continued fraction representation without the integer part, p_i / q_i its convergents. Let's define irrationality functions:

F_a = ∑ 2^-i / a_i
F_b = ∑ 1 / q_i
F_c = ∑ 2^-i Ln a_i

For terminating continued fractions, a_i is extended with +∞ terms, which does not changes the fraction value. Obviously F_a and F_b converge for all x. F_c converges almost everywhere except for rationals and some transcendentals with fast growing a_i. Numerical calculations of F_c based on machine precision rationals should allways yeild +∞. So for the purpose of drawing F_c, a_i is extended with Khinchin's constant representative of the typical a_i values instead of +∞.

F_a(x):

F_b(x):

F_c(x):

Local irrationality maxima are attained at x whose continued fractions end with an infinite sequence of 1s. Such x have Stern-Brocot tree paths that eventually alternate between left and right ad infinitum, as each run of left or right steps corresponds to a continued fraction term equal to its length. Take an interval [A, B], A = An / Ad, B = Bn / Bd with A, B being unimodular: |An Bd - Ad Bn| = 1. Suppose that B is the more complex fraction in a sense that Bd ≥ Ad and Bn ≥ An. This means that in the Stern-Brocot tree, B is a right child of A. Take the left step by computing a mediant of A and B, then the right step and so on. We thus obtain a system of recurrent equations for both numerators and denominators (see InvMem 12 for a similar recursion in a different context):

Ax_{i + 1} = Bx_i
Bx_{i + 1} = Ax_i + Bx_i

Solution to the above equations is a Fibonacci sequence:

Ax_i = α φⁱ + β (-1 / φ)ⁱ
α = (Ax₀ + φ Bx₀) / (1 + φ)

The sought-after limit, which might be called the golden mediant, is

M_g(A, B) = Lim An / Ad = (An + φ Bn) / (Ad + φ Bd)

If A is the more complex fraction instead, A and B should be swapped before calculating the golden mediant. M_g(A, B) is a point of maximal irrationality on the unimodular interval [A, B]. Golden mediants share the worst rational approximation property (see Hurwitz's theorem) with the golden ratio.

N18 / Maximal chess positions / #Chess #Optimization / 2016.02.04

Following record-breaking positions were generated with Ascension framework. Amount of improvement over the previous records (see "Guide to fairy chess" by Anthony Dickins) is indicated in brackets.

168 moves by both sides, all pieces on the board (+4)

183 moves by both sides, no promotions allowed (+2)

224 moves by both sides, promotions allowed (+1)

89 captures by both sides, all pieces on the board (+1)

50 captures by white, all pieces on the board (+1)

N19 / Weighted minimum and maximum / #Math / 2016.02.22

How can one define a weighted minimum and maximum of the numbers a_i having weights w_i? One approach is to calculate a limit of a weighted power mean:

WMax = Lim ^q√∑ a _i^q w _i^q / ∑ w _i^p, q → +∞ = Max(a_i w_i) / Max(w_i)

WMin = Lim ^q√∑ a _i^q w _i^-q / ∑ w _i^-q, q → -∞ = Min(a_i / w_i) / Min(1 / w_i)

Another idea is to interpret the normalized weight w_i as the probability of a_i existing. Normalize the weights to unit maximum by dividing by Max w_i . Sort a_i in ascending | descending order. The probability of a_i being the maximum | minimum is the probability of it existing and all greater | smaller values not existing:

p_i = w_i ∏ (1 - w_j), j > i.

The sought-after result is the mean of the resulting distribution: WMax | WMin = ∑ a_i p_i.

N20 / Maximal density subsquare-free arrangements / #Optimization #OpenProblem / 2016.02.22

What is the largest number of points that can be placed on a N × N grid so that no four of them form a square? Exact values are known for N up to 9 (OEIS A240443). Following solutions for N = 10 .. 16 were generated with tabu search and simulated annealing via Ascension framework.

10: 50	11: 58	12: 67	13: 76
14: 86	15: 95	16: 106

If only axis-parallel squares are considered, maximal numbers are known for N up to 10 (OEIS A227133). Solutions obtained for N = 11 .. 16:

11: 73	12: 85	13: 98
14: 112	15: 127	16: 142

Last edit: 2016.06.04 (updated with better results)

N21 / Sierpinsky-like arrays / #Math / 2016.03.01

Define a 2 × 2 "seed" ((1, 1), (1, 0)). Start with A = ((1)), iteratively replace every element A_ij with A_ij × Seed. Resulting infinite array is essentially a Sierpinsky triangle. Alternatively, A_ij can be defined as being equal to 1 only if binary representations of i and j do not contain a '1' in the same position: A_ij = 1 if i AND j = 0, 0 otherwise, where AND denotes a bitwise 'and' operation.

Let a(n) = ∑ A_ij for 0 ≤ i, j < n. a(2 n) = 3 a(n), because after a single replacement iteration, every 1 yields a block containing three new 1s. For odd indices, the edge contains incomplete blocks that only have two 1s, so a(2 n + 1) = 2 a(n + 1) + a(n) (A064194).

Two 3D versions arise naturally.

1. Seed = (((1, 1), (1, 1)), ((1, 1), (1, 0))). All three coordinates of nonzero elements never have a '1' bit in identical position: i AND j AND k = 0. Let a(n) = ∑ A_ijk for 0 ≤ i, j, k < n. a(2 n) = 7 a(n). For odd n, in addition to incomplete 2 × 2 × 1 face blocks containing four 1s, there are 3 × 2^Z(n) incomplete 2 × 1 × 1 edge blocks only containing two 1s, where Z(n) is the number of zeroes in the binary representation of n. Thus a(2 n + 1) = 4 a(n + 1) + 3 a(n) - 6 × 2^{Z(2 n + 1)}. a(n) (A269589) is a lower bound for the 3D verson of no subsquares problem (A268239, also see InvMem 20).

2. Seed = (((1, 1), (1, 0)), ((1, 0), (0, 0))). No pair of coordinates of nonzero elements contains a '1' bit in identical position: i AND j = 0, j AND k = 0, k AND i = 0. a(2 n) = 4 a(n), a(2 n + 1) = 3 a(n + 1) + a(n) (A268524).

N22 / Peaceable queens / #Math #Chess #Optimization / 2016.04.03

OEIS A250000 gives the largest M such that M white and M black queens can be placed on a N × N board and queens of opposite color do not attack each other. For N ≥ 14, simulated annealing yields solutions with 28, 32, 37, 42, 47, 52, 58, 64, 70, 77, 84 queens per side.

Solutions typically consist of four polygonal regions. What happens at the limit N → ∞? Replace the N × N board by a unit square. The areas of the regions (see illustration) are:

A1 = (4 x (1 − x) − (1 − 2 y)²) / 8
A2 = (1 − 4 x² − 2 y²) / 8
A = A1 + A2 = (2 x (1 − 2 x) + y (2 − 3 y)) / 4

Maximum of the total area A is 7 / 48, attained at x = 1 / 4, y = 1 / 3. Observation: with a few exceptions having a non-polygonal structure (N = 5, 9), M = Floor(7 / 48 N²).

N23 / RGB Principal component analysis / #Math #Color / 2016.05.29

Principal component analysis applied to gamma-compressed RGB values from a large set of photographic images (MIRFLICKR25000 dataset) yielded the following transformation matrix:

0.576	0.585	0.571
0.721	-0.034	-0.692
-0.386	0.810	-0.441

Relative importance of the axes, as measured by the square root of their eigenvalues, is about 1 : 1/3 : 1/7. The axes closely correspond to black - white, orange - azure (hue 30° - 210°) and green - magenta (hue 120° - 300°) directions. Normalized to unit dynamic range of the axis projections, the matrix is well approximated by

1/3	1/3	1/3
1/2	0	-1/2
-1/4	1/2	-1/4

Projection of the RGB values
normalized to unit maximum
along the main cube diagonal

Approximate PCA planes

N24 / Lightness approximations / #Color #Approximation / 2016.05.29

Lightness of a color can be calculated with the aid of the Lab color system, but in many cases a simple approximation would suffice. Following tables summarize such approximations. R, G, B denote gamma-compressed sRGB values. Equivalent gray level denotes the level of sRGB gray having the same lightness as the color in question. The approximations were obtained by minimizing the L₂ and L_∞ errors. L₂ error is weighted with typical real-world colors (RGB data taken from the MIRFLICKR25000 dataset), and the L_∞ error was calculated along the whole RGB cube. All the approximations give 0 when applied to black and 1 when applied to white.

Equivalent gray level approximation	Error
0.301 R + 0.622 G + 0.077 B	L₂ =	1 / 55
0.279 R + 0.643 G + 0.078 B	L_∞ <	1 / 4
(0.228 R² + 0.696 G² + 0.076 B²)^{1 / 2}	L₂ =	1 / 367
(0.224 R² + 0.700 G² + 0.076 B²)^{1 / 2}	L_∞ <	1 / 38
(0.215 R^p + 0.714 G^p + 0.071 B^p)^{1 / p}, p = 2.169	L₂ =	1 / 496
(0.234 R^p + 0.695 G^p + 0.071 B^p)^{1 / p}, p = 2.063	L_∞ <	1 / 40

Lightness approximation	Error
0.325 R + 0.641 G + 0.034 B	L₂ =	1 / 30
0.288 R + 0.633 G + 0.079 B	L_∞ <	1 / 4
(0.267 R² + 0.674 G² + 0.059 B²)^{1 / 2}	L₂ =	1 / 43
(0.232 R² + 0.692 G² + 0.076 B²)^{1 / 2}	L_∞ <	1 / 19
(0.208 R^p + 0.744 G^p + 0.048 B^p)^{1 / p}, p = 2.909	L₂ =	1 / 47
(0.222 R^p + 0.705 G^p + 0.073 B^p)^{1 / p}, p = 2.134	L_∞ <	1 / 24
(0.213 R^p + 0.730 G^p + 0.057 B^p)^{1 / q}, p = 2.211, q = 2.367	L₂ =	1 / 76

N25 / Feature scaling / #Math / 2016.10.15

Feature scaling directly affects the results of dimensionality reduction methods such as PCA and MDS. Standardization (rescaling to zero mean and unit variance) is the most common method. But if all the features belong to a ratio scale (have a meaningful zero), one can divide by the mean or the root mean square instead.

Normalization factor	Feature importance
1	σ
σ	1
μ	CV
√⟨x²⟩	CV / √1 + CV²

A feature's impact is proportional to its standard deviation after the normalization. Standardization assigns equal importance to all features, while the unit mean and RMS normalization emphasize features with a higher coefficient of variation (CV = σ / μ), which may be more appropriate if relative differences are important. Note that the RMS normalization guarantees bounded feature importance. If all the features have the same units of measurement, the normalization can be skipped, although it would likely be beneficial if the features significantly differ in scale.

Another potential issue is the presence of highly correlated features. Duplicating a feature adds no new information but will distort the results of PCA or MDS. Consider a dataset having D uncorrelated features. Replace each feature i by M_i duplicates. The data will now lie in a subspace spanned by D oblique axes and each feature will be inflated by a factor of √M_i.

Collinear inflation correction (CIC) factors applicable in the general case have the form F_i = √∑ Corr²(i, j), where Corr(i, j) is Pearson's correlation between features i and j. CIC provides a principled and robust alternative to discarding highly correlated features. It is independent of the feature normalization and should be performed in conjunction with it.

N26 / Mixed radix fractal carpets / #Math / 2017.04.18

Sierpinski carpet can be constructed from a single cell by iteratively subdividing the grid into 3 x 3 cell blocks and then removing the cells with both coordinates equal to 1 modulo 3. If 2 is used in place of 3, the procedure would yield a Sierpinski triangle. However, the subdivision factor S and the coordinate test modulo M need not necessarily be the same. Let's call such cases mixed radix carpets (or (hyper)sponges, depending on the dimension).

The total number of cells in a mixed radix carpet at iteration i can be calculated by considering the numbers of cells with various values of coordinates modulo M. Denote by (x; y; ...) the cells with coordinates modulo M identical to x, y, ... up to a permutation, and by N(x; y; ...) the number of such cells. Let's examine the case S = 3, M = 2 and the cells (1; 1) being excluded. Subdivision produces the following cells:

(0; 0) → 4(0; 0), 4(0; 1), 1(1; 1)
(0; 1) → 2(0; 0), 5(0; 1), 2(1; 1)
(1; 1) → 1(0; 0), 4(0; 1), 4(1; 1)

Since the cells (1; 1) are excluded, at iteration i + 1 we have:

N_{i + 1}(0; 0) = 4 N_i(0; 0) + 2 N_i(0; 1)
N_{i + 1}(0; 1) = 4 N_i(0; 0) + 5 N_i(0; 1)

The total number of cells N_{i + 1} = 8 N_i(0; 0) + 7 N_i(0; 1). By substitution, N_{i + 2} = 60 N_i(0; 0) + 51 N_i(0; 1), and through simple algebraic manipulations we obtain N_{i + 1} = 9 N_i − 12 N_{i − 1}. Asymptotic growth rate is the largest root of the characteristic polynomial: R = (9 + √33) / 2, and the fractal dimension is D = Ln R / Ln S.

Results for other values of S, M and other dimensions can be obtained in the same manner. In general, the system of linear recurrence equation is defined by the subdivision matrix with rows and columns corresponding to the excluded cells removed. Here are two more subdivision rules in matrix form:

3D, S = 3, M = 2:

(N(0; 0; 0), N(0; 0; 1), N(0; 1; 1), N(1; 1; 1)) ·

8	12	6	1
4	12	9	2
2	9	12	4
1	6	12	8

2D, S = 2, M = 3:

(N(0; 0), N(0; 1), N(1; 1), N(0; 2), N(2; 2), N(1; 2)) ·

1	2	1	0	0	0
1	1	0	1	0	1
1	0	0	2	1	0
0	1	1	1	0	1
0	0	1	0	1	2
0	1	0	1	1	1

Results for selected cases are summarized in the following table. D₀ is the grid dimension; excluded cells are denoted by the number of '1's in their coordinates modulo M.

S	M	D₀	Excluded	Recurrence	R	D	First terms	OEIS	Images
2	3	2	0	2	2	1	1, 3, 4, 8, 16, 32, 64, 128, ...	A000079, i > 1	8
2	3	2	0, 2	2	2	1	1, 2, 4, 8, 16, 32, 64, 128, ...	A000079	8
2	3	2	1	1, 4	2.562	1.357	1, 2, 6, 14, 38, 94, 246, 622, ...	A026597	8
2	3	2	2	3, 2	3.562	1.833	1, 3, 11, 39, 139, 495, 1763, ...	A007482	8
3	2	2	1, 2	4	4	1.262	1, 4, 16, 64, 256, 1024, ...	A000302	5
3	2	2	0, 1	4	4	1.262	1, 1, 4, 16, 64, 256, 1024, ...	A000302, i > 0	5
3	2	2	1	5	5	1.465	1, 5, 25, 125, 625, 3125, ...	A000351	5
3	2	2	0, 2	5	5	1.465	1, 4, 20, 100, 500, 2500, ...	A005054	5
3	2	2	2	9, −12	7.372	1.818	1, 8, 60, 444, 3276, 24156, ...	A285391	5
3	2	2	0	9, −12	7.372	1.818	1, 5, 36, 264, 1944, 14328, ...	A285392	5
3	2	3	1, 2, 3	8	8	1.893	1, 8, 64, 512, 4096, 32768, ...	A001018	1, 2, 3, 4
3	2	3	0, 1, 2	8	8	1.893	1, 1, 8, 64, 512, 4096, ...	A001018, i > 0	1, 2, 3, 4
3	2	3	1, 2	9	9	2	1, 9, 81, 729, 6561, 59049, ...	A001019	1, 2, 3, 4
3	2	3	0, 2, 3	12	12	2.262	1, 12, 144, 1728, 20736, ...	A001021	1, 2, 3, 4
3	2	3	0, 1, 3	12	12	2.262	1, 6, 72, 864, 10368, ...	A067419	1, 2, 3, 4
3	2	3	1, 3	14	14	2.402	1, 14, 196, 2744, 38416, ...	A001023	1, 2, 3, 4
3	2	3	0, 2	14	14	2.402	1, 13, 182, 2548, 35672, ...	A285399	1, 2, 3, 4
3	2	3	2, 3	20, −48	17.211	2.590	1, 20, 352, 6080, 104704, ...	A285393	1, 2, 3, 4
3	2	3	0, 1	20, −48	17.211	2.590	1, 7, 116, 1984, 34112, ...	A285394	1, 2, 3, 4
3	2	3	2	28, −195, 324	18.322	2.647	1, 21, 399, 7401, 136227, ...	A285396	1, 2, 3, 4
3	2	3	1	28, −195, 324	18.322	2.647	1, 15, 249, 4371, 78693, ...	A285395	1, 2, 3, 4
3	2	3	0, 3	21	21	2.771	1, 18, 378, 7938, 166698, ...	A285400	1, 2, 3, 4
3	2	3	3	32, −195, 216	24.359	2.906	1, 26, 646, 15818, 385822, ...	A285397	4
3	2	3	0	32, −195, 216	24.359	2.906	1, 19, 452, 10948, 266300 ...	A285398	1, 2, 3, 4

N27 / Random strings with identical element counts / #Math / 2018.05.31

Probability of two random binary strings of length n having identical bit counts is

P = ∑_{0 ≤ i, j ≤ n, i + j = n} ((i, j)! / 2ⁿ)² = (n, n)! / 4ⁿ = A000984 / 4ⁿ = √2 / τ (n − 1/2)! / n! ≈ 1 / √6 / 7 + τ n / 2

Probability of two random ternary strings having identical element counts is

P = ∑_{0 ≤ i, j, k ≤ n, i + j + k = n} ((i, j, k)! / 3ⁿ)² = A002893 / 9ⁿ ≈ 1 / (4 / 7 + (2 τ / 3^3/2) n),

where (a, b, c, ...)! = (a + b + c + ...)! / (a! b! c! ...) is the multinomial coefficient.

N28 / Irrational scoring / #Math / 2018.05.31

Irrational point system can be employed to eliminate the possibility of several players having an equal score in a tournament unless their numbers of wins and draws are identical. Due to worst rational approximation properties (Hurwitz's theorem), suitable choices for the win / draw point ratio are 1 + φ, 4 − φ and γ, where φ = (1 + √5) / 2 is the golden ratio and γ = 1 + √2 is the silver ratio. These values result in the following dependencies of the minimal gap between two different scores Δ on the number of games n:

Win / draw ratio	Asymptotic worst case Δ	Worst case n
1 + φ	(1 + φ) / (√5 n) ≈ 1.17 / n	2, 3, 5, 8, 13, 21, 34, ... (A000045)
4 − φ	(4 − φ) / (√5 n) ≈ 1.07 / n	2, 5, 7, 12, 19, 31, 50, ... (A013655)
γ	γ / (√8 n) ≈ 0.85 / n	2, 5, 12, 29, 70, 169, 408, ... (A000129)

Dependencies of the minimal gap between two different scores on the number of games

N29 / Correlation for vector-valued data / #Math / 2018.05.31

Pearson's correlation coefficient can be extended to vector-valued data by replacing multiplication with a dot product. In other words, correlation is the mean dot product between the corresponding elements of the datasets normalized to zero mean and unit mean square vector length.

If you appreciate the effort put into creating this page and would like to see more frequent updates, you can support me on Patreon. Top patrons of the month:

Brian Bucklew
Anton Shepelev
Adam Hill
Tomoyuki Naito
John Metcalf

Research

Other