InvMem is a collection of short notes on mathematics and programming.
Contents
N1 / Keplerstyle coincidences / #Math / 2014.09.23
Take a Kepler triangle  a right triangle with sides 1, √φ, φ. Then the length of its circumscribed circle nearly matches the perimeter of a square with a side equal to its middle edge. This yields an approximation τ ≈ 8 / √φ. Here are some coincidences having a similar form:
Constant  Approximation  Relative Error 

τ  12 / 5 φ^{2}  1 / 65270 
τ  9 / 2 ^{3}√e  1 / 2144 
τ  (11 / 7)^{3} φ  1 / 1409 
τ  8 / √φ  1 / 1043 
e  ^{3}√10 / 7 S  1 / 3770 
e  7 / 4 √S  1 / 3305 
e  5 / Ln τ  1 / 1210 
e  9 / 8 S  1 / 1186 
γ  1 / √3  1 / 4289 
P  √8 / 11 S  1 / 3841 
P  9 / 11 φ  1 / 1519 
Notation:  
τ  Circle constant 
e  Base of the natural logarithm 
φ  Golden ratio 
S  Silver ratio 
P  Plastic number 
γ  EulerMascheroni constant 
N2 / Note on means / #Math / 2015.02.10
Consider a family of means of two numbers a, b:
M(a, b, p) = ∫_{a}^{b}x^{p + 1} dx / ∫_{a}^{b}x^{p} dx
In other words, M(a, b, p) is the mean of a distribution with density ∝ x^{p} in the interval [a, b]. Many well known means belong to this family:
p  Expression  Name 

1  (2 / 3) (a + b  a b / (a + b))  ??? 
0  (a + b) / 2  Arithmetic 
1/2  (a + √a b + b) / 3  Heronian 
1  (b  a) / (Ln b  Ln a)  Logarithmic 
3/2  √a b  Geometric 
2  a b (Ln b  Ln a) / (b  a)  ??? 
3  2 a b / (a + b)  Harmonic 
N3 / Generalized Motzkin numbers / #Math / 2015.02.16
Number of strings of length N consisting of symbols from the alphabet of size b and correctly placed brackets is
⸤N / 2⸥  
∑  (N  2 i) b^{N  2 i} C_{i} , 
i = 0 
where (N  2 i) is a binomial coefficient and C_{i} is the ith Catalan number. This result is easy to interpret: C_{i} is the number of possible configurations of 2 i brackets, (N  2 i) is the number of ways to place a bracket configuration among the N symbols, and b^{N  2 i} is the number of possible configurations of the nonbracket symbols.
b  OEIS  First terms starting with N = 0  

0  A126120  1, 0, 2, 0, 5, 0, 14, 0, 42, ...  Catalan numbers interleaved with 0s 
1  A001006  1, 1, 2, 4, 9, 21, 51, 127, 323, ...  Motzkin numbers 
2  A000108  1, 2, 5, 14, 42, 132, 429, 1430, 4862, ...  Catalan numbers 
3  A002212  1, 3, 10, 36, 137, 543, 2219, 9285, 39587, ...  
4  A005572  1, 4, 17, 76, 354, 1704, 8421, 42508, 218318, ...  
5  A182401  1, 5, 26, 140, 777, 4425, 25755, 152675, 919139, ...  
6  A025230  1, 6, 37, 234, 1514, 9996, 67181, 458562, 3172478, ... 
N4 / Scale derivative / #Math / 2015.02.28
Derivative of f(x) is defined as the limit (f(x + h)  f(x)) / h, h → 0. By replacing the shift f + h with scaling x / s, we obtain the scale derivative: g(x) = Limit (f(x / s)  f(x)) / Ln s, s → 1. In the logarithmic x scale, scaling is equivalent to shifting and scale derivative to the normal derivative: t = Ln x; q(t) = f(x); g(x) = Limit (q(t  Ln s)  q(t)) / Ln s, s → 1 = dq(t) / dt = df(x) / d Ln x = x f'(x). It is assumed that x > 0. For x < 0, same results apply in a mirrored form.
g(x) to f(x) is what a gnomon is to the figurate numbers. Infinitesimally larger version of f(x) is obtained by adding g(x) to it: f(x / s) ≈ f(x) + g(x) Ln s. We can continuously "grow" f(x) in size by adding infinitesimal amounts of scaled g(x). Under the condition f(∞) = 0, the original function can be reconstructed as a superposition of g(x) with exponentially increasing scales: f(x) = ∫_{∞}^{0} g(x / e^{z}) dz, z = Ln s. This is easily proven by using the identity g(x) = dq(t) / dt. If the upper integration limit is different, f(x) is obtained at the corresponding scale.
Alternatively, f(x) can be seen as being reconstructed from areanormalized g(x) at scales from 0 to 1: f(x) = ∫_{0}^{1}g(x / s) / s ds. We can "grow" the distribution with PDF equivalent to f(x) by sampling from PDF ∝ g(x / s) starting with s = 0 and increasing it linearly. At any time, distribution of all the values generated so far will match f(x) up to scaling. This trick will only work if f(x) is nonincreasing. Otherwise, g(x) will contain negative values and cannot be a proper PDF.
N5 / Generating generalized Gaussian, Cauchy and related random variables / #Math #Algorithms/ 2015.03.31
Consider a generalized Gaussian (PDF ∝ e^{xp}) and generalized Cauchy (PDF ∝ 1 / (1 + x^{p})) random variables. An easier to sample bounding functions are required to apply the rejection sampling technique. It is easy to verify that both distributions are loglog convex upwards. Therefore they can be bounded by tangent lines in the loglog scale  that is, by power functions. Piecewise power law PDFs are easily handled by inversion sampling of their pieces with corresponding probabilities. It suffices to sample the x > 0 half of the distribution and symmetrize it by multiplying the results by ±1 at random.
Let's bound f(x) ∝ PDF with h(x) consisting of two segments: h(x) = 1 if x < x_{1}, A x^{q} otherwise. The segments match at x_{1}, so x_{1} = A^{1 / q}. Area below h(x) is S = A^{1 / q} (1 + 1 / (q  1)). The efficiency (acceptance rate) is ∫_{0}^{+∞} f(x) dx / S. Let's denote the point where h(x) touches f(x) as x_{e}. Then A = f(x_{e}) x_{e}^{q} and q is the loglog slope of f(x) at x_{e}. Expressing A through q and solving dS / dq = 0 yields the optimal q. This process leads to a transcendental equation for the generalized Cauchy density: q / (1  q) = Ln(1  q / p). A slightly less efficient function that avoids the problem is obtained by assuming q = p (tangent at +∞).
Name  f(x)  q  A  Efficiency  Min efficiency 

Generalized Gaussian  e^{xp}  p + 1  ((q / p) / e)^{q / p}  (p^{1}! e^{1 / p}) / (1 + 1 / p)^{1 + 1 / p}  0 at p → 0 
Generalized Cauchy  1 / (1 + x^{p})  p  1  τ (p  1) / (2 p^{2} Sin(τ / (2 p)))  τ / 8 at p = 2 
Scale derivatives of the examined functions can be enclosed with three segments: x^{p} if x < x1, M if x < x2, A x^{q} otherwise, where M = Max f(x). Scale derivative of Cauchy density can be effectively sampled by assuming q = p. In the Gaussian case, optimal q is a solution to the transcendental equation (1 / p + 1 / (1  q)) q = Ln(1 + q / p). Approximate solution obtained with the aid of Newton's method is listed in the table below.
f(x)  M  q  A  Min efficiency 

x^{p} e^{xp}  1 / e  Max((15 / 7) p + 13 / 19, 1 + p)  ((1 + q / p) / e)^{1 + q / p}  0 at p → 0 
x^{p} / (1 + x^{p})^{2}  1 / 4  p  1  2 / (1 + Ln 4) at p → +∞ 
N6 / Triplane hexagonal coordinates / #Math #Programming #Board_Games / 2015.04.06
Consider a hexagonalshaped region of a hexagonal grid. A coordinate system can be formed by three semiaxes 120° apart starting from the center of the region.
The system is well suited for board games. Most hexagons are encoded with combinations of two symbols. Axis hexagons are encoded with one symbol. Compared to existing hexagonal chess variants, the system has a pleasant symmetry. Each symbol except 0 appears in equal number of hexes. Hexagonal array can be efficiently stored as three rectangular arrays overlapping at the axes. Conversion to the cubic (3D) coordinates is straightforward: triplane coordinates (n, a, α) are equal to the absolute values of the coordinates of 3D axes (x, y, z) lying in the corresponding sector: (n, a) = (z, y), (a, α) = (x, z), (α, n) = (y, x).
N7 / Short brainfuck programs / #Programming / 2015.04.22
Following programs were generated with Ascension framework. Memory cells are assumed to be integers [0 .. 255], wrapping around.Text output:
Program  Length  Output 

+>[>>+>+[++><<]<+<+]>.<<<<++.<<..+++.>.<<+.>.+++..>>.<+.  79  Hello World! 
+[[<<[+[>][<<<]]]>>>]>..>..>.<<<<.<+.>>>>>.>.<<.<.<<+.  66  hello world! 
For comparison, shortest humancoded "Hello World!" program is 89 instructions long:
++++++++++[>>>++>+++>+++>>+[++++++++<]<]>>++.>+.>..>+.>++.>.<<.+++.<.<.>>>+.
Calculating sequences in memory:
Program  Length  Output 

[+>+>[<]>>]  14  Powers of 2 
+[>>>+[<]<+]  14  Powers of 2 
>[[<+>>><<+]>]  17  Fibonacci sequence 
>[[>>+<<<+>+]>]  20  Squares 
N8 / Lattice distance distortion / #Math / 2015.05.17
Take a regular lattice with each point connected to a few neighbouring points, thereby forming a graph. Let's only consider physically meaningful neighbour sets: a line connecting a lattice point to any of its neighbours should not intersect other Voronoi cells in more than one point. This requirement is stricter than the cell vertex connectivity. A neighbourhood can be denoted by the number of nearest lattice points satisfying the connectivity requirement. Graph edges can either have unit weights or have weights equal to the Euclidean distances between the points they connect (w suffix). Let {R} be a set of ratios of the shortest graph distances to the Euclidean distances between any two lattice points. Distance distortion is Max {R} / Min {R}. Distortions of various 2D and 3D lattices are summarized in the following tables. f, e, v denote face, edge and vertex cell connectivity.
Voronoi cell  Neighbors  Connectivity  Distortion  % 

Triangle  6  ev  2  100 
Triangle  3  e  3 / 2  50 
Square  8  ev  √2  41 
Square  4  e  √2  41 
Triangle  6w  ev  2 / √3  15 
Hexagon  6  e  2 / √3  15 
Square  8w  ev  (1 + √2) / √5  8 
Lattice  Voronoi cell  Neighborhood  Connectivity  Distortion  % 

Simple cubic  Cube  26  fev  √3  73 
Bodycentered cubic  Truncated octahedron  8  f  √3  73 
Simple cubic  Cube  6  f  √3  73 
Simple cubic  Cube  18  fe  4 / √6  63 
Bodycentered cubic  Truncated octahedron  14  f  √2  41 
Hexagonal closepacked  Trapezorhombic dodecahedron  12  f  √2  41 
Facecentered cubic  Rhombic dodecahedron  12  f  √2  41 
Simple cubic  Cube  18w  fe  (1 + √2) / √3  39 
Bodycentered cubic  Truncated octahedron  14w  f  √3 / 2  22 
Simple cubic  Cube  26w  fev  (1 + √3) / √6  12 
N9 / Fourierbased area in polar coordinates / #Math / 2015.05.23
Take a Fourier decomposition of a function defined on the [0, τ] interval: f(t) = a_{0} + ∑ a_{i} Sin(i t + φ_{i}). Then the area of f plotted in polar coordinates is (τ / 2) (a_{0}^{2} + ∑ a_{i}^{2} / 2).
N10 / On Weierstrass and Riemann functions / #Math / 2015.06.15
Take Fourier series:
f_{s}(x) = ∑ A_{i} Sin(F_{i} x)f_{c}(x) = ∑ A_{i} Cos(F_{i} x)
Weierstrass and Riemannstyle functions correspond to exponential and powerlaw forms of A_{i} and F_{i}.
Name  A_{i}  F_{i}  Loglog slope β  Dimension D  N terms 

Weierstrass  a^{i}  b^{i}  1 + 2 Ln a / Ln b  2  Ln a / Ln b  Ln ε / Ln a 
Riemann  i^{a}  i^{b}  1 + (2 a  1) / b  2  (a  1 / 2) / b, b ≠ 1  ε^{2 / (1  2 a)} / 2 
Numerical calculations can only rely on a limited number of terms. Suppose we want to achieve a relative inacuraccy ε. Then ε^{2} is the ratio of the power of the truncated terms to the total power. Solving that equation gives the required number of terms.
There is a very simple nonrigorous way to estimate fractal dimensions of the functions in question. Power density around the ith term is
P_{i} = A_{i}^{2} / (F_{i + 1 / 2}  F_{i  1 / 2}) ≈ A_{i}^{2} / F'_{i}In that sense both Weierstrass and Riemann functions have an approximately powerlaw distributed power density. A wellknown result pertaining to the fractional Brownian motion states that if the power spectrum is proportional to F^{β}, then the fractal dimension is D = (5  β) / 2.
Weierstrass Sin and Cos functions, a = √2, b = 2, D = 3 / 2:
Riemann Sin and Cos functions, a = 3 / 2, b = 2, D = 3 / 2:
Let's modify the cosinebased functions so that they have zero at the origin: f*(x) = 1  f_{c}(x) / f_{c}(0). When plotted in polar coordinates, these modified functions resemble butterflies and leaves.
Modified Weierstrass function, D = 3 / 2, b = 2, 3, 4, 5:Modified Riemann function, D = 3 / 2, b = 2, 3, 4, 5:
N11 / Pawn protection of initial chess positions / #Math #Chess / 2015.08.01
Consider all possible arrangements of the chess pieces on the first rank. Under the restriction of bishops occupying opposite colored squares, there are 1440 reflectiondistinct positions. 542 of these positions have every pawn protected at least once. Two maximally protected positions with only one pawn being protected once and all other pawns twice are RQBBRNKN and RKBBRNQN. Two minimally protected positions having three unprotected pawns are NNRQBKRB and NNRKBQRB.
N12 / Cantor's numbers / #Math / 2015.08.25
Recall Cantor's first uncountability proof. Start with an interval and a sequence of real numbers x_{i}. Find the first two numbers from x_{i} that lie inside the interval. These two numbers form the next interval. Repeat. In case of an infinite number of iterations, any number from the limit interval is not contained in x_{i}.
Take a sequence of rational numbers within the initial interval, sorted in order of increasing denominator, and in case of equal denominators in order of increasing numerator. What number does Cantor's bracketing procedure converges to?
Take two fractions A = an / ad, B = bn / bd, ad ≤ bd, satisfying the determinant condition bn ad  an bd = 1. The first number lying between A and B is the mediant of A and B: M = (an + bn) / (ad + bd). The next number is the mediant of A and M. We thus obtain a system of recurrent equations:
a_{i} = a_{i  1} + b_{i  1}
b_{i} = 2 a_{i  1} + b_{i  1}
Thanks to the mediant property, the determinant condition is preserved throughout the recurrence. Eliminating b_{i} gives a single recurrent equation a_{i + 1} = 2 a_{i} + a_{i  1}, which has a solution a_{i} = α (1 + √2)^{i} + β (1  √2)^{i}. Plugging in initial conditions gives α = (a_{0} + b_{0} / √2) / 2. The soughtafter Cantor number is the limit:
Lim an_{i} / ad_{i} = αn / αd = (√2 an_{0} + bn_{0}) / (√2 ad_{0} + bd_{0})N13 / Fraction digit tables / #Math / 2015.11.25
Following images depict binary and ternary digits of irreducible fractions in the (0, 1) interval. Fractions are ordered first by denominators and then by numerators.
Take the numbers formed by digits lying on the main diagonal. By Cantor's diagonal argument, a complement of the binary diagonal number is irrational.
Binary diagonal number:Binary: 0.111000111001010110000110111110101011100111111011011010000100011 ... (OEIS A257341)
Decimal: 0.8890003549695230139346719690879129310538480210179543414543465370 ... (OEIS A257342)
Continued fraction: 0, 1, 8, 110, 1, 1, 1, 4, 1, 3, 33, 3, 4, 1, 26, 1, 37, 1, 23, 2, 117, 1, 1, 5, 1, 2, 20, 12, 1, 2, 1, 4, 1, ...
Ternary diagonal number:
Ternary: 0.1002211021120000111220000002000011111210202200100001202102121112 ...
Decimal: 0.3682120994629780785967399570406755660508922444001004151748993819 ...
Continued fraction: 0, 2, 1, 2, 1, 1, 12, 1, 2, 9, 2, 6, 5, 1, 2, 2, 7, 5, 1, 1, 2, 2, 2, 6, 7, 1, 41, 1, 4, 2, 2, 3, 1, ...
N14 / Multidimensional scaling for vowelconsonant separation / #Algorithms / 2015.12.04
Multidimensional scaling (MDS) can be effectively used for characterlevel text analysis. Denote the distribution of characters appearing at position δ relative to the character a as D(a, δ). Distance between characters a and b can be defined as
ρ(D(a, +1), D(b, +1)) + ρ(D(a, 1), D(b, 1)),where ρ(Da, Db) is some measure of distance between the two distributions. MDS applied to the distance matrices obtained from texts in natural languages readily reveals clusters of vowels, consonants and punctuation marks. Following images depict the results for English, Russian and Voynich manuscript characters and the L1 metric ρ.
N15 / Vectors with constant and normal components / #Math / 2016.01.29
Take a 2D vector having one constant component c and the other component coming from a standard normal distribution. PDF of the distribution of the vector lengths is given by ρ(x) = 2 x Exp((x^{2}  c^{2}) / 2) / √τ (x^{2}  c^{2}).
The mean of the distribution is approximated by √c^{2} + c / 6 + 4 / τ with a maximum relative error of 1%.
N16 / Embedding hexagonal grid in a square grid / #Programming #Board_Games / 2016.02.01
A subset of square grid locations (i, j) corresponding to a hexagonal grid is given by (i + 2 j) mod 4 = 0. Neighbours of a hex have offsets (0, ±2), (±2, ±1). 3D coordinates of a hex are
x = i / 2y = (i + 2 j) / 4
z = (i  2 j) / 4
This representation is easy to work with, but has poor memory efficiency, as only 1/4th of all locations are actually used. However, these extra locations can be used to store additional information, for example serve as an intermediate buffer in a cellular automata scenario.
N17 / Irrationality plots / #Math / 2016.02.01
Let x be a real number, a_{i} its continued fraction representation without the integer part, p_{i} / q_{i} its convergents. Let's define irrationality functions:
F_{a} = ∑ 2^{i} / a_{i}
F_{b} = ∑ 1 / q_{i}
F_{c} = ∑ 2^{i} Ln a_{i}
For terminating continued fractions, a_{i} is extended with +∞ terms, which does not changes the fraction value. Obviously F_{a} and F_{b} converge for all x. F_{c} converges almost everywhere except for rationals and some transcendentals with fast growing a_{i}. Numerical calculations of F_{c} based on machine precision rationals should allways yeild +∞. So for the purpose of drawing F_{c}, a_{i} is extended with Khinchin's constant representative of the typical a_{i} values instead of +∞.
F_{a}(x):F_{b}(x):
F_{c}(x):
Local irrationality maxima are attained at x whose continued fractions end with an infinite sequence of 1s. Such x have SternBrocot tree paths that eventually alternate between left and right ad infinitum, as each run of left or right steps corresponds to a continued fraction term equal to its length. Take an interval [A, B], A = An / Ad, B = Bn / Bd with A, B being unimodular: An Bd  Ad Bn = 1. Suppose that B is the more complex fraction in a sense that Bd ≥ Ad and Bn ≥ An. This means that in the SternBrocot tree, B is a right child of A. Take the left step by computing a mediant of A and B, then the right step and so on. We thus obtain a system of recurrent equations for both numerators and denominators (see InvMem 12 for a similar recursion in a different context):
Ax_{i + 1} = Bx_{i}Bx_{i + 1} = Ax_{i} + Bx_{i}
Solution to the above equations is a Fibonacci sequence:
Ax_{i} = α φ^{i} + β (1 / φ)^{i}α = (Ax_{0} + φ Bx_{0}) / (1 + φ)
The soughtafter limit, which might be called the golden mediant, is
M_{g}(A, B) = Lim An / Ad = (An + φ Bn) / (Ad + φ Bd)If A is the more complex fraction instead, A and B should be swapped before calculating the golden mediant. M_{g}(A, B) is a point of maximal irrationality on the unimodular interval [A, B]. Golden mediants share the worst rational approximation property (see Hurwitz's theorem) with the golden ratio.
N18 / Maximal chess positions / #Chess #Optimization / 2016.02.04
Following recordbreaking positions were generated with Ascension framework. Amount of improvement over the previous records (see "Guide to fairy chess" by Anthony Dickins) is indicated in brackets.
168 moves by both sides, all pieces on the board (+4)
183 moves by both sides, no promotions allowed (+2)
224 moves by both sides, promotions allowed (+1)
89 captures by both sides, all pieces on the board (+1)
50 captures by white, all pieces on the board (+1)
N19 / Weighted minimum and maximum / #Math / 2016.02.22
How can one define a weighted minimum and maximum of the numbers a_{i} having weights w_{i}? One approach is to calculate a limit of a weighted power mean:
WMax = Lim ^{q}√∑ a _{i}^{q} w _{i}^{q} / ∑ w _{i}^{p}, q → +∞ = Max(a_{i} w_{i}) / Max(w_{i})WMin = Lim ^{q}√∑ a _{i}^{q} w _{i}^{q} / ∑ w _{i}^{q}, q → ∞ = Min(a_{i} / w_{i}) / Min(1 / w_{i})
Another idea is to interpret the normalized weight w_{i} as the probability of a_{i} existing. Normalize the weights to unit maximum by dividing by Max w_{i} . Sort a_{i} in ascending  descending order. The probability of a_{i} being the maximum  minimum is the probability of it existing and all greater  smaller values not existing:
p_{i} = w_{i} ∏ (1  w_{j}), j > i.The soughtafter result is the mean of the resulting distribution: WMax  WMin = ∑ a_{i} p_{i}.
N20 / Maximal density subsquarefree arrangements / #Optimization #OpenProblem / 2016.02.22
What is the largest number of points that can be placed on a N × N grid so that no four of them form a square? Exact values are known for N up to 9 (OEIS A240443). Following solutions for N = 10 .. 16 were generated with tabu search and simulated annealing via Ascension framework.
10: 50  11: 58  12: 67  13: 76 
14: 86  15: 95  16: 106 
If only axisparallel squares are considered, maximal numbers are known for N up to 10 (OEIS A227133). Solutions obtained for N = 11 .. 16:
11: 73  12: 85  13: 98 
14: 112  15: 127  16: 142 
N21 / Sierpinskylike arrays / #Math / 2016.03.01
Define a 2 × 2 "seed" ((1, 1), (1, 0)). Start with A = ((1)), iteratively replace every element A_{ij} with A_{ij} × Seed. Resulting infinite array is essentially a Sierpinsky triangle. Alternatively, A_{ij} can be defined as being equal to 1 only if binary representations of i and j do not contain a '1' in the same position: A_{ij} = 1 if i AND j = 0, 0 otherwise, where AND denotes a bitwise 'and' operation.
Let a(n) = ∑ A_{ij} for 0 ≤ i, j < n. a(2 n) = 3 a(n), because after a single replacement iteration, every 1 yields a block containing three new 1s. For odd indices, the edge contains incomplete blocks that only have two 1s, so a(2 n + 1) = 2 a(n + 1) + a(n) (A064194).
Two 3D versions arise naturally.
1. Seed = (((1, 1), (1, 1)), ((1, 1), (1, 0))). All three coordinates of nonzero elements never have a '1' bit in identical position: i AND j AND k = 0. Let a(n) = ∑ A_{ijk} for 0 ≤ i, j, k < n. a(2 n) = 7 a(n). For odd n, in addition to incomplete 2 × 2 × 1 face blocks containing four 1s, there are 3 × 2^{Z(n)} incomplete 2 × 1 × 1 edge blocks only containing two 1s, where Z(n) is the number of zeroes in the binary representation of n. Thus a(2 n + 1) = 4 a(n + 1) + 3 a(n)  6 × 2^{Z(2 n + 1)}. a(n) (A269589) is a lower bound for the 3D verson of no subsquares problem (A268239, also see InvMem 20).
2. Seed = (((1, 1), (1, 0)), ((1, 0), (0, 0))). No pair of coordinates of nonzero elements contains a '1' bit in identical position: i AND j = 0, j AND k = 0, k AND i = 0. a(2 n) = 4 a(n), a(2 n + 1) = 3 a(n + 1) + a(n) (A268524).
N22 / Peaceable queens / #Math #Chess #Optimization / 2016.04.03
OEIS A250000 gives the largest M such that M white and M black queens can be placed on a N × N board and queens of opposite color do not attack each other. For N ≥ 14, simulated annealing yields solutions with 28, 32, 37, 42, 47, 52, 58, 64, 70, 77, 84 queens per side.
Solutions typically consist of four polygonal regions. What happens at the limit N → ∞? Replace the N × N board by a unit square. The areas of the regions (see illustration) are:
A2 = (1 − 4 x^{2} − 2 y^{2}) / 8
A = A1 + A2 = (2 x (1 − 2 x) + y (2 − 3 y)) / 4
Maximum of the total area A is 7 / 48, attained at x = 1 / 4, y = 1 / 3. Observation: with a few exceptions having a nonpolygonal structure (N = 5, 9), M = Floor(7 / 48 N^{2}).
N23 / RGB Principal component analysis / #Math #Color / 2016.05.29
Principal component analysis applied to gammacompressed RGB values from a large set of photographic images (MIRFLICKR25000 dataset) yielded the following transformation matrix:
0.576  0.585  0.571 
0.721  0.034  0.692 
0.386  0.810  0.441 
Relative importance of the axes, as measured by the square root of their eigenvalues, is about 1 : 1/3 : 1/7. The axes closely correspond to black  white, orange  azure (hue 30°  210°) and green  magenta (hue 120°  300°) directions. Normalized to unit dynamic range of the axis projections, the matrix is well approximated by
1/3  1/3  1/3 
1/2  0  1/2 
1/4  1/2  1/4 
N24 / Lightness approximations / #Color #Approximation / 2016.05.29
Lightness of a color can be calculated with the aid of the Lab color system, but in many cases a simple approximation would suffice. Following tables summarize such approximations. R, G, B denote gammacompressed sRGB values. Equivalent gray level denotes the level of sRGB gray having the same lightness as the color in question. The approximations were obtained by minimizing the L_{2} and L_{∞} errors. L_{2} error is weighted with typical realworld colors (RGB data taken from the MIRFLICKR25000 dataset), and the L_{∞} error was calculated along the whole RGB cube. All the approximations give 0 when applied to black and 1 when applied to white.
Equivalent gray level approximation  Error  
0.301 R + 0.622 G + 0.077 B  L_{2} =  1 / 55 
0.279 R + 0.643 G + 0.078 B  L_{∞} <  1 / 4 
(0.228 R^{2} + 0.696 G^{2} + 0.076 B^{2})^{1 / 2}  L_{2} =  1 / 367 
(0.224 R^{2} + 0.700 G^{2} + 0.076 B^{2})^{1 / 2}  L_{∞} <  1 / 38 
(0.215 R^{p} + 0.714 G^{p} + 0.071 B^{p})^{1 / p}, p = 2.169  L_{2} =  1 / 496 
(0.234 R^{p} + 0.695 G^{p} + 0.071 B^{p})^{1 / p}, p = 2.063  L_{∞} <  1 / 40 
Lightness approximation  Error  
0.325 R + 0.641 G + 0.034 B  L_{2} =  1 / 30 
0.288 R + 0.633 G + 0.079 B  L_{∞} <  1 / 4 
(0.267 R^{2} + 0.674 G^{2} + 0.059 B^{2})^{1 / 2}  L_{2} =  1 / 43 
(0.232 R^{2} + 0.692 G^{2} + 0.076 B^{2})^{1 / 2}  L_{∞} <  1 / 19 
(0.208 R^{p} + 0.744 G^{p} + 0.048 B^{p})^{1 / p}, p = 2.909  L_{2} =  1 / 47 
(0.222 R^{p} + 0.705 G^{p} + 0.073 B^{p})^{1 / p}, p = 2.134  L_{∞} <  1 / 24 
(0.213 R^{p} + 0.730 G^{p} + 0.057 B^{p})^{1 / q}, p = 2.211, q = 2.367  L_{2} =  1 / 76 
N25 / Feature scaling / #Math / 2016.10.15
Feature scaling directly affects the results of dimensionality reduction methods such as PCA and MDS. Standardization (rescaling to zero mean and unit variance) is the most common method. But if all the features belong to a ratio scale (have a meaningful zero), one can divide by the mean or the root mean square instead.
Normalization factor  Feature importance 

1  σ 
σ  1 
μ  CV 
√⟨x^{2}⟩  CV / √1 + CV^{2} 
A feature's impact is proportional to its standard deviation after the normalization. Standardization assigns equal importance to all features, while the unit mean and RMS normalization emphasize features with a higher coefficient of variation (CV = σ / μ), which may be more appropriate if relative differences are important. Note that the RMS normalization guarantees bounded feature importance. If all the features have the same units of measurement, the normalization can be skipped, although it would likely be beneficial if the features significantly differ in scale.
Another potential issue is the presence of highly correlated features. Duplicating a feature adds no new information but will distort the results of PCA or MDS. Consider a dataset having D uncorrelated features. Replace each feature i by M_{i} duplicates. The data will now lie in a subspace spanned by D oblique axes and each feature will be inflated by a factor of √M_{i}.
Collinear inflation correction (CIC) factors applicable in the general case have the form F_{i} = √∑ Corr^{2}(i, j), where Corr(i, j) is Pearson's correlation between features i and j. CIC provides a principled and robust alternative to discarding highly correlated features. It is independent of the feature normalization and should be performed in conjunction with it.
N26 / Mixed radix fractal carpets / #Math / 2017.04.18
Sierpinski carpet can be constructed from a single cell by iteratively subdividing the grid into 3 x 3 cell blocks and then removing the cells with both coordinates equal to 1 modulo 3. If 2 is used in place of 3, the procedure would yield a Sierpinski triangle. However, the subdivision factor S and the coordinate test modulo M need not necessarily be the same. Let's call such cases mixed radix carpets (or (hyper)sponges, depending on the dimension).
The total number of cells in a mixed radix carpet at iteration i can be calculated by considering the numbers of cells with various values of coordinates modulo M. Denote by (x; y; ...) the cells with coordinates modulo M identical to x, y, ... up to a permutation, and by N(x; y; ...) the number of such cells. Let's examine the case S = 3, M = 2 and the cells (1; 1) being excluded. Subdivision produces the following cells:
(0; 0) → 4(0; 0), 4(0; 1), 1(1; 1)(0; 1) → 2(0; 0), 5(0; 1), 2(1; 1)
(1; 1) → 1(0; 0), 4(0; 1), 4(1; 1)
Since the cells (1; 1) are excluded, at iteration i + 1 we have:
N_{i + 1}(0; 0) = 4 N_{i}(0; 0) + 2 N_{i}(0; 1)N_{i + 1}(0; 1) = 4 N_{i}(0; 0) + 5 N_{i}(0; 1)
The total number of cells N_{i + 1} = 8 N_{i}(0; 0) + 7 N_{i}(0; 1). By substitution, N_{i + 2} = 60 N_{i}(0; 0) + 51 N_{i}(0; 1), and through simple algebraic manipulations we obtain N_{i + 1} = 9 N_{i} − 12 N_{i − 1}. Asymptotic growth rate is the largest root of the characteristic polynomial: R = (9 + √33) / 2, and the fractal dimension is D = Ln R / Ln S.
Results for other values of S, M and other dimensions can be obtained in the same manner. In general, the system of linear recurrence equation is defined by the subdivision matrix with rows and columns corresponding to the excluded cells removed. Here are two more subdivision rules in matrix form:
3D, S = 3, M = 2:  (N(0; 0; 0), N(0; 0; 1), N(0; 1; 1), N(1; 1; 1)) · 

2D, S = 2, M = 3:  (N(0; 0), N(0; 1), N(1; 1), N(0; 2), N(2; 2), N(1; 2)) · 

Results for selected cases are summarized in the following table. D_{0} is the grid dimension; excluded cells are denoted by the number of '1's in their coordinates modulo M.
S  M  D_{0}  Excluded  Recurrence  R  D  First terms  OEIS  Images 

2  3  2  0  2  2  1  1, 3, 4, 8, 16, 32, 64, 128, ...  A000079, i > 1  8 
2  3  2  0, 2  2  2  1  1, 2, 4, 8, 16, 32, 64, 128, ...  A000079  8 
2  3  2  1  1, 4  2.562  1.357  1, 2, 6, 14, 38, 94, 246, 622, ...  A026597  8 
2  3  2  2  3, 2  3.562  1.833  1, 3, 11, 39, 139, 495, 1763, ...  A007482  8 
3  2  2  1, 2  4  4  1.262  1, 4, 16, 64, 256, 1024, ...  A000302  5 
3  2  2  0, 1  4  4  1.262  1, 1, 4, 16, 64, 256, 1024, ...  A000302, i > 0  5 
3  2  2  1  5  5  1.465  1, 5, 25, 125, 625, 3125, ...  A000351  5 
3  2  2  0, 2  5  5  1.465  1, 4, 20, 100, 500, 2500, ...  A005054  5 
3  2  2  2  9, −12  7.372  1.818  1, 8, 60, 444, 3276, 24156, ...  A285391  5 
3  2  2  0  9, −12  7.372  1.818  1, 5, 36, 264, 1944, 14328, ...  A285392  5 
3  2  3  1, 2, 3  8  8  1.893  1, 8, 64, 512, 4096, 32768, ...  A001018  1, 2, 3, 4 
3  2  3  0, 1, 2  8  8  1.893  1, 1, 8, 64, 512, 4096, ...  A001018, i > 0  1, 2, 3, 4 
3  2  3  1, 2  9  9  2  1, 9, 81, 729, 6561, 59049, ...  A001019  1, 2, 3, 4 
3  2  3  0, 2, 3  12  12  2.262  1, 12, 144, 1728, 20736, ...  A001021  1, 2, 3, 4 
3  2  3  0, 1, 3  12  12  2.262  1, 6, 72, 864, 10368, ...  A067419  1, 2, 3, 4 
3  2  3  1, 3  14  14  2.402  1, 14, 196, 2744, 38416, ...  A001023  1, 2, 3, 4 
3  2  3  0, 2  14  14  2.402  1, 13, 182, 2548, 35672, ...  A285399  1, 2, 3, 4 
3  2  3  2, 3  20, −48  17.211  2.590  1, 20, 352, 6080, 104704, ...  A285393  1, 2, 3, 4 
3  2  3  0, 1  20, −48  17.211  2.590  1, 7, 116, 1984, 34112, ...  A285394  1, 2, 3, 4 
3  2  3  2  28, −195, 324  18.322  2.647  1, 21, 399, 7401, 136227, ...  A285396  1, 2, 3, 4 
3  2  3  1  28, −195, 324  18.322  2.647  1, 15, 249, 4371, 78693, ...  A285395  1, 2, 3, 4 
3  2  3  0, 3  21  21  2.771  1, 18, 378, 7938, 166698, ...  A285400  1, 2, 3, 4 
3  2  3  3  32, −195, 216  24.359  2.906  1, 26, 646, 15818, 385822, ...  A285397  4 
3  2  3  0  32, −195, 216  24.359  2.906  1, 19, 452, 10948, 266300 ...  A285398  1, 2, 3, 4 
N27 / Random strings with identical element counts / #Math / 2018.05.31
Probability of two random binary strings of length n having identical bit counts is
P = ∑_{0 ≤ i, j ≤ n, i + j = n} ((i, j)! / 2^{n})^{2} = (n, n)! / 4^{n} = A000984 / 4^{n} = √2 / τ (n − 1/2)! / n! ≈ 1 / √6 / 7 + τ n / 2Probability of two random ternary strings having identical element counts is
P = ∑_{0 ≤ i, j, k ≤ n, i + j + k = n} ((i, j, k)! / 3^{n})^{2} = A002893 / 9^{n} ≈ 1 / (4 / 7 + (2 τ / 3^{3/2}) n),where (a, b, c, ...)! = (a + b + c + ...)! / (a! b! c! ...) is the multinomial coefficient.
N28 / Irrational scoring / #Math / 2018.05.31
Irrational point system can be employed to eliminate the possibility of several players having an equal score in a tournament unless their numbers of wins and draws are identical. Due to worst rational approximation properties (Hurwitz's theorem), suitable choices for the win / draw point ratio are 1 + φ, 4 − φ and γ, where φ = (1 + √5) / 2 is the golden ratio and γ = 1 + √2 is the silver ratio. These values result in the following dependencies of the minimal gap between two different scores Δ on the number of games n:
Win / draw ratio  Asymptotic worst case Δ  Worst case n 

1 + φ  (1 + φ) / (√5 n) ≈ 1.17 / n  2, 3, 5, 8, 13, 21, 34, ... (A000045) 
4 − φ  (4 − φ) / (√5 n) ≈ 1.07 / n  2, 5, 7, 12, 19, 31, 50, ... (A013655) 
γ  γ / (√8 n) ≈ 0.85 / n  2, 5, 12, 29, 70, 169, 408, ... (A000129) 
N29 / Correlation for vectorvalued data / #Math / 2018.05.31
Pearson's correlation coefficient can be extended to vectorvalued data by replacing multiplication with a dot product. In other words, correlation is the mean dot product between the corresponding elements of the datasets normalized to zero mean and unit mean square vector length.
If you appreciate the effort put into creating this page and would like to see more frequent updates, you can support me on Patreon. Top patrons of the month:
 Brian Bucklew
 Anton Shepelev
 Adam Hill